After the previous discussion about nested views, encapsulation & abstraction, I’d like to write about duplication specifically, and the distinction between actual lines of code being duplicated, versus functional duplication. Because the latter is not OK, but the former is generally acceptable when it’s boilerplate code, or done, again, in the name of performance and efficiency.
So, to expand on last week’s “Encapsulation & Abstraction” segment. The conversation with one of my favorite developers went something like this.
While I agree that there’s some over-reliance on nested views, the reason they get implemented a lot is because there’s a particular problem they seem to easily solve: how to encapsulate business-data rules without violating DRY.
Let’s say we have a biz-rule for a “core segment” of data. For simplicity’s sake, let’s call that rule “Widget A consists of a
Widgetrecord and a
WidgetSupplementrecord joined by
Widget.WidgetTypeis ‘foo’.” So it seems obvious to create a view
WidgetFooComplete, which pulls in the data from both tables and applies the type condition. This creates a sort of “atomic building block” of data, which can be consumed by apps & data-access methods repeatedly & consistently.
Now, most downstream apps will use that
WidgetFooCompletedata in its entirety (or nearly). But let’s say there’s a hot new app that needs some more data about the Widgets, and it needs to go out to the
WidgetMoarPropertiestable. The natural inclination is to incorporate our existing “building block” view,
WidgetFooComplete, into a new view for this app & its dependencies, and call it
But what’s the alternative? If we re-create the
JOIN/WHEREconditions on the base-tables in this new view, it violates DRY and makes possible future refactoring difficult if that biz-rule changes.
Admittedly, most modern data-access technologies make it easier to create these “building blocks” of joined data entities. And sometimes those biz-rules belong in the app’s lower layers, but this can lead to writing lots of little disparate queries/db-calls for what should have been one atomic operation. That can be a maintenance headache, as could dozens (hundreds) of tailored stored-procs for every data-access scenario.
So it seems like nested views can have their place, but “deep” nesting is usually troublesome. And to prevent the “slippery slope” effect, we have to practice diligence.
That’s pretty spot-on. DBAs tend to criticize them (nested views) as a practice in general because of the tendency to over-use and over-rely on them, and because of that slippery slope, where “a little” use turns into “a lot”, and leads to troubleshooting headaches. And generalizations are just that.
To take some examples in-hand: simple entity relationships, especially when biz-critical, should be A) obvious, and B) documented. Unified views can serve this purpose, but should only be used where appropriate — i.e. to load an object that gets passed around/up the app stack. They’re great “atomic building blocks” when you actually need the entire block of data. But when you don’t — say in a stored-proc that’s doing some data flow operation and only needs a small subset of that data block — it’s probably better to get the relationship logic from the view and copy-paste it (but hopefully not all of it!), while omitting the stuff that’s not needed.
The main reason for this is usually index tuning. If we’ve crafted some indexes to meet certain query patterns in certain troublesome procs, we want those procs to use those indexes, not just do a full table scan because they’re using a nested-view which does
When we get to more complex business rules, we need to up our diligence game and be more mindful of dependency checking when planning for a rule change. Proc comment-headers can be helpful here, as can tools that search thru SQL object meta-data and code-bases to produce dependency chains.
The main point is, duplication tends to be OK when it’s not functional duplication, i.e. when the SQL code is more-or-less similar in some places but it’s not exactly the same because the purpose (responsibility) of that module/stored-proc is not the same.
You’re right in that the “31-flavors of tailored procs for data-access” is a big maintenance headache, and sometimes that trumps even the performance concerns. Again it’s about balance — we have to be mindful of both the biz-rule-maintenance concerns and the performance concerns.
I figured. Sometimes I see DBAs criticize developers’ work without seeming to understand that it doesn’t always come from sloppiness or laziness (although sometimes it does!). Often, we’re trying to thread that needle of performance vs. maintainability. In Dev-land, “lazy” is good in the sense of aiming for simplified logic, for ease of both maintenance and understanding. Painstakingly tailoring each data-access call (stored-proc), while good for performance, is kinda opposite of that. But, admittedly, we do fall back on
SELECT *all too easily.
Mostly, we try to avoid code duplication because it leads to heavier maintenance overhead. When some modules may perform similar operations, functionally, they will often re-use the same “core” logic, which we in turn encapsulate into its own ‘thing’. But in SQL modules, as you say, that’s not always performant, so it’s definitely a tightrope-walk.
The “Clean Code” school of thought says, if it’s obvious, it’s “self-documenting”. I don’t always agree with it, but it comes from maintenance concerns again. We don’t like situations where someone tweaks the code but doesn’t update the comments, and you end up with misleading comments. Unfortunately, it does come down to diligence again, and even “good” developers will easily fall back to rarely including comments just to avoid this situation. Of course, another potential pitfall of supposedly self-documenting code is, what’s “obvious” to one person isn’t necessarily so to everyone else!
(We both enjoy writing, can you tell?) =P
So basically we agreed to “moderation in all things” and exchanged Buddha statues and sang Kum-Bay-Yah. I enjoyed the exchange because it really got us both thinking more deeply about which areas of our business/app landscape are in better/worse shape than others.
Part 3: Misusing & Abusing Datatypes
Because I’m getting long-winded again, let’s wrap up with a final “Clean SQL Code” topic that’s short & sweet.
date is not a
datetime is not a
time is not a time
interval. Okay? For the third time, stop interchanging them! Yes I know, SQL Server is a bit behind some other RDBMS platforms when it comes to this stuff. Sorry, I don’t work for Microsoft. I just deal with their tech.
More to the point, know your data. Understand that there can be consequences to repeatedly
casting types, or losing precision during conversion, sometimes exponentially so. Yes I know, we all love loosely-typed (sometimes stringly typed) languages like JS & Python. Those are wonderful tools for certain jobs/problems. Again, be mindful and know your flows.
Thanks for reading, as always!