Get it? It’s just too punny! … Ok I’m done.
And it’s a great book, no doubt. But those guidelines for application code are not 100% directly applicable to database code.
Let’s back up a second. Why? That sounds about counter-intuitive, no? Ok, more context. See, the traditional (“legacy”?) app consists of about 3 layers, the bottom one being the database. And it’s usually relational, and is usually responsible for far more than simple data access and persistence. Read: complex business rules and process logic. Data flow, not just getters and setters.
So that means, as a developer, DBA, or hybrid “DbDev”, you’re often tasked with writing or improving the stored procedures which house that complex logic. And that’s my topic today: being clean about your SQL code.
Part 1: Comments
There’s a fairly famous quote from the book about comments:
Comments are always failures.
He’s using hyperbole, but for a purpose. While his views on comments may be extreme, most programmers tend to realize the core essence of that chapter, which is that comments only serve to express something in plain English that the code has failed to express clearly enough to be easily and immediately understood.
With SQL scripts, and in particular with stored-procedures, I’m taking a somewhat opposite stance:
Comments are always appreciated, even if they’re potentially outdated or inaccurate.
There are two types of comments in SQL, the
--inline and the
/* block */. Different people have their preferred flavors of
block — sometimes it’s just several lines prefaced with the double-dash
--. And that’s fine, whatever floats your comment-boat.
In particular, I always encourage a comment block at the top of ever stored-proc & other user-defined programmable objects (function, types, etc). Just a small example for illustration:
/* Location: Server.Database Author: NateTheDBA Created: 2012-12-21 Description: Gets users who have not logged in since the given date. Consumers: MyCoolAppName, MyReportServer Revisions 2015-05-15, Nate: removed archive (never used after archive-date) 2017-06-07, Nate: fixed formatting for blog post */ CREATE PROCEDURE GetUsersNotLoggedInSince @SinceDate datetime2 BEGIN --some clever stuff goes here... END
“But wait”, you say, “what about source control?” Yes, all your programmable objects (and even, arguably, your reference data) should be in source control. There are tool-vendors aplenty to help you with that. But guess what? Budgets. Time & effort. Oh, did I mention, legacy legacy legacy? Yes, dear reader, the average business has years (decades) of organically evolved relational databases and processes. Are you the guy or gal to swoop in on your unicorn and seamlessly convert their entire data tier infrastructure to a beautiful DevOps pipeline with shiny rainbows and kittens for all? No? Okay then. Baby-steps.
Yes, my procs are in source control. It’s called “daily automated script-out-objects-to-files which are then committed to SVN”. It’s not built-in to SSMS. Which means that I, or another DBA, or a potential consultant, or a Dev who gets enlisted to help improve a proc that runs for hours when it should only take minutes, would be inconvenienced by the extra trip to a separate tool/system to fetch some change-history just for context. And really, that’s all this is for — CONTEXT. We like to know what it is we’re working on when we start to work on it, without having to traverse a change-tree or go bug 3 other people who “might” have touched it last. I’m not asking for a detailed log of every single time someone touched the thing; just give me the overview, the milestones and significant changes to functionality/features/scope so that I have a jump-off point for troubleshooting/testing/reasoning about it.
“But wait”, you say again, “shouldn’t your name be a sufficient description of what the proc does?” Sure, in theory. Until you have dependencies which need that name to stay the same even after an update or logic-change. Like reports. Or data-connected Excel workbooks. Which are used daily by managers, who will come yelling at you if their worksheets suddenly stop functioning.
Back to comments in general. The reason they’re helpful (besides documentation-headers for objects) is that they provide context and explain intent. Half the time, my job as a DBA is improving or fixing someone else’s code. Therefore, I want to see, in plain English, what it is they’re trying to accomplish, notes about attempts and failures, and the like. Yes, I could have a discussion with them. And I will. But if I’m working on it asynchronously and they’ve moved on to something else, or our hours are different, I want those little nuggets of context and intent to be right there in the script, because that’s where I’m working!
What about queries that get passed-down from the app to the DB? ORMs don’t support pre-pending a comment to their data calls, do they? I wish. Maybe some do, I haven’t researched it, but I know for sure that LINQ doesn’t. But then again, when I’m using a query-capture tool (like DMVs, Profiler, X-events, or a vendor monitoring tool), ORM queries are so painfully obvious in comparison to hand-crafted SQL that I can usually spot them from a mile away, and go bother the app-devs for some context & conversation. If you’re one of the poor unfortunate souls who still passes ad-hoc generated SQL statements down thru ODBC to your DB, then sure, a little comment won’t hurt anybody.
So do your DBAs a favor, comment your SQL code, at least in terms of programmable database objects and ad-hoc scripts. I promise, it’ll make them hate you less. It might even make you love yourself more, because 3 months down the road when you revisit that proc, and you need to remember what it was for and why you did it that way, you’ll see it right there in your very own writing! (OK, typing.)
Part 2: SRP, Encapsulation, and Abstraction
A bit of paraphrase of one of the book’s key points:
A reusable module (function, method) should do one thing, and do it well.
Also, the DRY principle:
Don’t repeat yourself.
When building SQL modules, we’re usually concerned with performance and accuracy, over abstraction and composability. Therefore, repeating oneself is not necessarily a bad thing, when done for the right reasons. Diligence is a big factor here — if there’s a non-trivial relationship between some entities that you’re repeating in several places, and you know that it could become a maintenance headache if that relationship’s definition has to change later, do as much as possible to mitigate the risk of dependency/consistency-loss. This can be documentation, comments, and/or building that relationship into a view.
The latter brings up an interesting topic, one which I had a lively discussion about with a colleague recently (he’s a developer, and a dang good one) — nested views. Because inevitably, the encapsulation of those relationships & business-rules into things like views or ITVF’s can and will lead to nesting those objects into other objects. And troubleshooting many-level-nested views is a particularly frustrating exercise; in fact they’re what some DBAs call one of the “deadly sins of SQL“. But there are perfectly valid reasons and uses for them, sometimes, and I really enjoyed the discussion thread we had on it, so I’ll have to expand on that in another post.
Anyway, I’m already getting long-winded and well over 1k words, so I’ll wrap it up for now, and continue this topic next week.
Thanks for reading, stay tuned!