Clean code, the SQL: Part 2: Electric Boogaloo

I enjoyed the conversation, because it really got us both thinking more deeply about which areas of our app landscape are in better/worse shape than others.

Advertisements

After the previous discussion about nested views, encapsulation & abstraction, I’d like to write about duplication specifically, and the distinction between actual lines of code being duplicated, versus functional duplication. Because the latter is not OK, but the former is generally acceptable when it’s boilerplate code, or done, again, in the name of performance and efficiency.

boilerplate with alphabet and stuff
letters and numbers and symbols!

 

So, to expand on last week’s “Encapsulation & Abstraction” segment.  The conversation with one of my favorite developers went something like this.

Developer:

While I agree that there’s some over-reliance on nested views, the reason they get implemented a lot is because there’s a particular problem they seem to easily solve: how to encapsulate business-data rules without violating DRY.

Let’s say we have a biz-rule for a “core segment” of data.  For simplicity’s sake, let’s call that rule “Widget A consists of a Widget record and a WidgetSupplement record joined by WidgetID, where Widget.WidgetType is ‘foo’.”  So it seems obvious to create a view WidgetFooComplete, which pulls in the data from both tables and applies the type condition.  This creates a sort of “atomic building block” of data, which can be consumed by apps & data-access methods repeatedly & consistently.

Now, most downstream apps will use that WidgetFooComplete data in its entirety (or nearly).  But let’s say there’s a hot new app that needs some more data about the Widgets, and it needs to go out to the WidgetMoarProperties table.  The natural inclination is to incorporate our existing “building block” view, WidgetFooComplete, into a new view for this app & its dependencies, and call it WidgetFooMoarComplete.

But what’s the alternative?  If we re-create the JOIN/WHERE conditions on the base-tables in this new view, it violates DRY and makes possible future refactoring difficult if that biz-rule changes.

Admittedly, most modern data-access technologies make it easier to create these “building blocks” of joined data entities.  And sometimes those biz-rules belong in the app’s lower layers, but this can lead to writing lots of little disparate queries/db-calls for what should have been one atomic operation.  That can be a maintenance headache, as could dozens (hundreds) of tailored stored-procs for every data-access scenario.

So it seems like nested views can have their place, but “deep” nesting is usually troublesome.  And to prevent the “slippery slope” effect, we have to practice diligence.

Me:

That’s pretty spot-on.  DBAs tend to criticize them (nested views) as a practice in general because of the tendency to over-use and over-rely on them, and because of that slippery slope, where “a little” use turns into “a lot”, and leads to troubleshooting headaches.  And generalizations are just that.

To take some examples in-hand: simple entity relationships, especially when biz-critical, should be A) obvious, and B) documented.  Unified views can serve this purpose, but should only be used where appropriate — i.e. to load an object that gets passed around/up the app stack.  They’re great “atomic building blocks” when you actually need the entire block of data.  But when you don’t — say in a stored-proc that’s doing some data flow operation and only needs a small subset of that data block — it’s probably better to get the relationship logic from the view and copy-paste it (but hopefully not all of it!), while omitting the stuff that’s not needed.

The main reason for this is usually index tuning.  If we’ve crafted some indexes to meet certain query patterns in certain troublesome procs, we want those procs to use those indexes, not just do a full table scan because they’re using a nested-view which does select * .

When we get to more complex business rules, we need to up our diligence game and be more mindful of dependency checking when planning for a rule change.  Proc comment-headers can be helpful here, as can tools that search thru SQL object meta-data and code-bases to produce dependency chains.

The main point is, duplication tends to be OK when it’s not functional duplication, i.e. when the SQL code is more-or-less similar in some places but it’s not exactly the same because the purpose (responsibility) of that module/stored-proc is not the same.

You’re right in that the “31-flavors of tailored procs for data-access” is a big maintenance headache, and sometimes that trumps even the performance concerns.  Again it’s about balance — we have to be mindful of both the biz-rule-maintenance concerns and the performance concerns.

Developer:

I figured.  Sometimes I see DBAs criticize developers’ work without seeming to understand that it doesn’t always come from sloppiness or laziness (although sometimes it does!).  Often, we’re trying to thread that needle of performance vs. maintainability.  In Dev-land, “lazy” is good in the sense of aiming for simplified logic, for ease of both maintenance and understanding.  Painstakingly tailoring each data-access call (stored-proc), while good for performance, is kinda opposite of that.  But, admittedly, we do fall back on SELECT * all too easily.

Mostly, we try to avoid code duplication because it leads to heavier maintenance overhead.  When some modules may perform similar operations, functionally, they will often re-use the same “core” logic, which we in turn encapsulate into its own ‘thing’.  But in SQL modules, as you say, that’s not always performant, so it’s definitely a tightrope-walk.

The “Clean Code” school of thought says, if it’s obvious, it’s “self-documenting”.  I don’t always agree with it, but it comes from maintenance concerns again.  We don’t like situations where someone tweaks the code but doesn’t update the comments, and you end up with misleading comments.  Unfortunately, it does come down to diligence again, and even “good” developers will easily fall back to rarely including comments just to avoid this situation.  Of course, another potential pitfall of supposedly self-documenting code is, what’s “obvious” to one person isn’t necessarily so to everyone else!

(We both enjoy writing, can you tell?)  =P

So basically we agreed to “moderation in all things” and exchanged Buddha statues and sang Kum-Bay-Yah.  I enjoyed the exchange because it really got us both thinking more deeply about which areas of our business/app landscape are in better/worse shape than others.

developer-dba-harmony-buddha
yay collaboration!

To conclude this part.  You will continue to see DBAs rant and rail against nested views and other “sins against SQL”, but:  Developers, don’t take it personally — we’re just trying to eek the most performance-per-$3k-core-license out of our precious servers, and spend less time chasing the white rabbit down the nested-views-hole.  And DBAs, go easy on your Devs — they still outnumber you, and they can whip out a complete web-app using the hottest JavaScript framework and a cloud-of-the-month service, faster than you can tune a server.  Everybody’s valuable, and everybody works toward the same goal: solving the business’s problems thru technology.

Moving on…

Part 3: Misusing & Abusing Datatypes

Because I’m getting long-winded again, let’s wrap up with a final “Clean SQL Code” topic that’s short & sweet.

Well, not really.  There are entire presentations dedicated to this topic.  But I’ll try to keep it condensed.

A date is not a datetime is not a time​ is not a time interval.  Okay?  For the third time, stop interchanging them!  Yes I know, SQL Server is a bit behind some other RDBMS platforms when it comes to this stuff.  Sorry, I don’t work for Microsoft.  I just deal with their tech.

Deep breaths…

More to the point, know your data.  Understand that there can be consequences to repeatedly casting types, or losing precision during conversion, sometimes exponentially so.  Yes I know, we all love loosely-typed (sometimes stringly typed) languages like JS & Python.  Those are wonderful tools for certain jobs/problems.  Again, be mindful and know your flows.

flow with the chart yo
I’m not sure what’s more disturbing.. the fact that this was the first image search result for “flow meme”, or the fact that it’s actually quite appropriate.

Thanks for reading, as always!

Views & Mismatched Datatypes

If you’ve ever wondered “who would win in a fight” amongst SQL datatypes…

Allow me a short revisit to the previous topic, because this example came from “real life” in my actual job in a production environment, and I wanted to share.  And remember, take this with a grain of salt — there are very few absolutes, a DBA’s favorite answer is “it depends”, and even when we spout rules or decrees, there are usually some exceptions.

Now, let’s start at the top.  T-SQL is a strongly typed language.  SQL Server, like most RDBMSs, counts datatypes as a foundational element in its ecosystem.  Things like bit, int, datetime, varchar (aka string), etc.  We also have this concept of NULL.  Any element of any datatype can be NULL, which, coincidentally, means a bit field can actually be 3-valued instead of just binary; BUT, that’s not the topic of this post.  However, it does involve a bit field.  And, as the title says, a VIEW.

First, the larger problem, or “what these little oversights / mis-steps can lead to”, as blogged by someone much smarter than me: Jonathan Kehayias – Implicit Conversions.

Now the diagram:

category, categorygroup, view-category-by-group
In English, the view’s DisplayInMenu field is defined as “If Category’s IsRoot bit is false, always use False; otherwise, use the Group’s DisplayInMenu bit.”  In even plainer English, it means we only want to “Display this Category in the Menu if it’s a ‘Root’ Category and it’s a member of a Group that gets displayed in the Menu.”

Do you see the problem?  It’s not so obvious if you’re not thinking about datatypes, but look closely.  Yes, our view’s DisplayInMenu field is a different datatype than its base field (origin).  That’s because the result of the CASE expression that defines it is “promoted” to the int type, instead of remaining a bit.  The same would be true of a COALESCE or ISNULL expression.  This is an example of datatype precedence.  If you’ve ever wondered “who would win in a fight?” amongst the SQL datatypes, Microsoft has the answer right there in black & white.

This isn’t necessarily a problem, all by its lonesome.  So why did it bite us in the proverbial behind?  Two reasons.  First, query & usage patterns:  vwCategoryByGroup.DisplayInMenu happens to be an important field in our queries, and usually it’s compared against a bit parameter or variable that’s passed down from the app’s lower tier.  Sometimes it’s even JOINed to another column, say, GroupMenuProperty.DisplayInMenu — which is, of course, a bit.  But because it’s an int in the view, SQL is doing extra work every time to implicitly convert those bits to  ints so that each side of the comparison operation is of the same type.  And again, not always, but sometimes, this causes performance problems.

ms-sql-stop-making-me-work-so-hard
I’m tired! Can’t you see I’ve been implicitly converting these datatypes all day?

The second reason is, admittedly, a bit beyond my understanding, so I’ll just explain the symptoms and leave the technical details to the more inquisitive minds.  Basically, during a performance crisis, one of the measures we took was to “fix” the view by turning DisplayInMenu back into a bit.  However, we found that it literally broke the dependent .NET application sitting on top, which started throwing exceptions of the “invalid cast” flavor.  I believe it’s using Entity Framework.  Thanks to a helpful tip from the Brent Ozar Office Hours webcast, I found out that it’s because the EF entity mapped to this view had that field (property?) defined as an int, and in order to make it “see” the datatype change, the code itself would need to be changed, i.e. that property would need to be defined as a bit.  Yay learning!

the-more-you-know
Also, did you know that EF really isn’t all that bad? 😉

So, dear reader, be conscious of your decisions with datatypes, especially when it comes to views with superficial computed columns.  But more to the point, beware of implicit conversions and mis-matched datatypes.  They can be, at best, a source of technical debt; at worst, a silent killer.

Til next time!

Views – the Good, the Bad, & the Lazy

If your VIEW has a dependency tree more than 2 levels deep, you’re doing it wrong.

Let’s talk for a minute about views.  No, not like scenery.  Not like my family (they’re good peeps).  I mean SQL views – i.e. CREATE VIEW vwFooBar which combines commonly used fields (columns) from tables Foo and Bar into a single entity, which datavelopers (term borrowed from Richie Rump, who has an awesome name btw) can then use & abuse without having to know the details of and relationships between those two tables.

So far, this sounds like a great idea, right?  Encapsulation & reusability are great principals in IT/Dev land.  And I agree!  Views have their place, for sure.  But some reason, I find myself constantly deconstructing them– refactoring complex queries & stored-procedures that use them — into their base tables (in the above example, that’s Foo and Bar).  And I find myself asking the proverbial WHY?  Why is this such a common misstep of devs & query writers, that the DBA has to spend such time “fixing” the issues this can cause?  Naturally, that train of thought spawned a blog post.

Let’s dive in!

Organization, Customer, Order: vwCustomerOrder
A very simple example of a 3-table view.

So, somebody created a nice high-level view for us that summarizes Customer, Organization, and Order data into a single entity that should prove pretty handy for things like reports, ad-hoc analysis, and maybe even a display-grid on a page somewhere.  And the happy developers don’t need to care about the foreign keys between the tables or how to JOIN them properly.

But!  Do we see a problem yet?  Yes, our vwCustomerOrder doesn’t include our Customer’s Email.  Well what if we need that in our code?  Do we go through change-management procedures and get the column added to the view?  Do we JOIN the view to the Customer table again just to get the Email?  Neither of those options are ideal.  The latter means that now we’re referencing the Customer table twice; the former involves refactoring, and is even more difficult if the view is an indexed view.

Okay, well let’s say we’ve added Email to the view, and it’s working fine.  Now let’s add another layer of complexity.  Say Customer is actually a VIEW, which consists of base-tables CustomerHeader and CustomerContact, where the latter stores a collection of contact entries for each Customer.  Now, vwCustomerOrder is thus a 2-level nested view.  These aren’t really super fun, but they’re not the most offensive thing in the database.  But again, what happens when we need something from CustomerContact that’s not already in our “master” top-level view vwCustomerOrder?  Maybe the primary Phone1, for example.  So to build that query, we now have to combine our 2-level nested view with a redundant table (i.e. a table that’s already in the view)!  But because it’s so terribly important to some reporting modules or some code bits, it becomes a permanent fixture of the schema.  And on it goes.

Expanded example from earlier:

organization-customer-contact-order-views
Because vwCustomer doesn’t include Phone2, if we built vwCustomerOrder on top of it, we’d have to JOIN again to CustomerContact, when we know it’s already included in vwCustomer. Thus, it’s smarter to build vwCustomerOrder from the base-tables and avoid that double-join (duplicate reference).

This is the problem with VIEWs that are grown organically, re-actively, without careful attention to the underlying schema and dependencies.  And look, I get it.  Mature software system systems DO evolve organically; iteration is the name of the game, and that goes for the database too, not just the app code.  But let’s agree that, like the boyscout principle espoused earlier in my ode to Clean Code, we can try to leave things a little better than how we found them.  DB refactoring is … not necessarily harder, but definitely different (than app code refactoring).  You probably have to jump thru more hoops to get the changes pushed to production.  Regardless, it’s worthwhile.  Why?  Glad you asked!

Nested views are icky for a couple reasons.  First, they’re more difficult to troubleshoot – not because they annoy the snot out of your DBA, but because they legitimately make index tuning more tedious.  Second, laziness begets more laziness – when you’ve got these views, you’re automatically tempted to simply use, re-use, & abuse them, instead of taking the time to analyze their effectiveness and refactor, replace, or reevaluate the problem.  Rampant ill-constructed views can lead to some serious technical debt.  And…

technical-debt-is-bad-mmkay

There’s another nugget of wisdom in software development that’s appropriate here: Less is More.  Your views should address a specific need or use-case, do it well, and call it a day.  Feature-creep is not a goal; views that try to be many things to many people ultimately fail at it, because you end up committing sins against set-theory & the relational model in favor of “oh just one more thing!”  While the purpose of views is ultimately abstraction, we should not forget the underlying parts — tables, indexes, statistics, schema relationships.  In fact, a good view can help us construct the right indexes and visualize the relationships.  Whereas a BAD view will obfuscate indexing needs and confuse (or even outright lie about) the base relationships.

Venn-diagram time.  We’ll use the circle areas to represent that “scope” of each view, meaning the base-tables that comprise it.  In a healthy schema, there will be mostly independent views with a few overlaps in small portions.  These views “do one thing well”, with little redundancy and hardly any nesting.  Conversely, in a haphazard ill-managed schema, there is lots of overlap and redundancy, multi-level nesting, and ugly colors (who the heck likes brown, anyway?).

views-family-good
Views in happy harmony & balanced dependence vs. independence.
views-family-bad
Views in sad land with ugly brown and black overlapping areas. (I was going to try for puce or burnt-umber but Paint is sadly not up to Crayola 64-pack standards.)

So, dear reader, what’s the point?  (I feel like I rhetorically ask that fairly often.)  Build your SQL views with targeted use-cases and clear purpose.  Avoid the laziness trap and the temptation to tack-on “one more thing”.  A view should  be a concrete, concise, abstracted representation of the underlying tables & relationships.  Evaluate existing views for how well they meet the developer needs, and don’t be afraid to deprecate, drop, or rewrite bad code.  And above all, stop the insane nesting.

If your view has a dependency tree more than 2 levels deep, you’re doing it wrong.

Or, more visually…

ridiculous-swiss-army-knife
This is not a useful tool, despite the sheer volume of functionality.

That’s all for this week!  Thanks for reading, and apologies for the slightly longer delay between posts.