In what will surely be a controversial post, I give my take on some of the major points of this “SQL style guide” that made the rounds on Hacker News / Reddit
recently a few years ago. Hey, I never claimed to be a source of breaking news.
Now remember kids, these are opinions — everyone’s entitled to theirs, and it doesn’t mean you’re right or wrong. As the author repeatedly points out, consistency among a team / project / environment is more important than anything else. Both the original article, and my responses, are guidelines, written by people with some experience (guided by others with more experience) to help you start a conversation with your own team about how you can get better at writing code. Because that’s what we’re paid to do, among other things.
I agree with most of the points here:
- Consistent descriptive names (I would add “concise” too — autocomplete/intellisense has come a long way, but we’re still typing stuff sometimes)
- White space & indentation
- Datetime literals – hail Saint Bertrand
- Comments – you, dear reader, know my thoughts already
- Some OOP principles should be avoided because they usually lead to poor performance
- Hungarian notation is best left in the ’80s where it belongs
Something I don’t abide by:
- Sticking to the ‘standard’ and void vendor-specific functions/features
Some brief justification for my rejection:
Database code portability is largely a myth — it sounds great, in theory, but most teams/projects never actually do it. If your data persistence layer is ever slated for migration to a new tech-stack, it’s going to require a massive overhaul anyway, in which those vendor-specific functions/code-bits will the least of your worries. More likely, you’ll be swapping-out pieces of the data layer in small chunks to move into other storage tech, like a NoSQL store or a DocumentDB or something; and eventually over time, the whole original data layer will have been moved, and the concern over SQL code portability will be moot.
Furthermore, database vendors give you these features/functions for a reason — they’ve found that it greatly enhances their product and the productivity of developers who work with it. So why not take advantage?
Finally, if your application is ‘cutting-edge’ enough that ALL db access is done via ORM or some kind of repository layer in the code-base… guess what? You don’t have this problem in the first place! Because the database is a dumb state storage mechanism, containing little to no actual code whatsoever (storec procs, functions, etc.). So, port away!
Other basic issues:
- CamelCase (actually TitleCase) is pretty standard in a lot of DB schemas, and I see nothing wrong with it. Despite my love of underscores (
snake_case) , it does make for more awkward typing.
- Plural or singular entity names should match the convention preferred by your overlaying ORM, if you’re at the point of DB design; most of the time, though, you’re working with a DB that you’ve inherited and you have no control over entity naming anyway, so stop whining about it and get on with life.
- Leading vs. trailing commas: I prefer leading, but the arguments against it can sound convincing (not just his, but in general in the tech community) — my experience leans toward being more likely to futz with the middle-to-end of a list than the beginning (1st item), thus making the leading commas more likely to help, but that’s just me. Also, thanks to an awesome member of the Coding Blocks Slack, a point in my favor is that source code comparison (diff) tools will only show the one changed line instead of two, if you’ve had to add to the end of the column list.
Yes, please avoid reserved keywords, replace spaces with underscores (or use TitleCase to avoid having spaces), and use concise yet meaningful table aliases when you’re writing queries. I still remember, when I first started working at my current company, literally gasping and cursing under my breath when I found that some databases actually had a space in the name.
Beyond that, the article goes a bit too deep in the weeds for me, especially the whole “known suffixes” thing — because isn’t that just Hungarian notation on the other end? How about names that make it intuitive, such as
IsActive for a bit flag, or
RecordSequence for a sequential integer that’s not auto-generated (not an
identity value), or
@NumMonths as a parameter for a stored-proc that indicates how many months of reporting to fetch? Common sense should rule the day, not arcane prefix/suffix conventions that will never be documented or enforced.
This whole notion of a “river” feels strange and awkward. It’s made worse by the fact that some clause’s keywords are “too large” for the “standard” river width (which is the width of the
SELECT keyword, ish), such as
group by and
left join). Plus, I’ve yet to see truly excellent tooling support for this kind of style (be it VSCode, Visual Studio, SSMS, SQL Prompt, or other styling tools / auto-formatters). Given that I still largely write my code without continuous automatic re-style-on-the-fly styling assistance, I find this hard to digest.
Side-bar: big kudos to the author for pointing me at this typography article, which challenged my long-ingrained writing preference of double-spacing between sentences. Even now, I do it while I write this post, knowing it’s wrong — I can’t help myself!
For similar reasons,
JOINs and sub-queries don’t need to be “on the other side of the river” — since I’m actually saying “there is no river”, what I mean is, don’t indent those
JOINs or sub-queries excessively. In the
FROM clause, the
JOINed tables are just as important as the first one, so I don’t see the need to reduce their importance by putting them so far to the right. And please for the love of all things holy, stop putting the
JOIN predicates in-line (on the same line) after the joined table — put the
ON conditions to their own line and indent it!
Special note about the “Preferred formalities” section
BETWEEN is mostly evil. I’m not saying you should never use it; just be very clear about why you’re using it, and only use it with discrete valued types (
INT), NOT with continuous (or conceptually/nearly-continuous) value types (
UNION operator is often misused, usually because
UNION ALL is preferred (and is what you really meant anyway), but a blanket statement to “avoid it” misses the point of why it exists in the first place. Likewise, temporary tables (
#temptables) are wonderful tools when used properly and in moderation, but flagrant overuse can lead to what I call
#tempocalypse (which means you’re hammering your
TempDB so hard that its underlying storage system screams for mercy).
Misnamed section “Create syntax”
What he really does here is expound upon table and database design principles. This does not belong in a “Style Guide”; it probably belongs in a “Design Guide”, but because the relational database as a technology is so mature (yes, that means ‘old’) by this point, most of this feels completely unnecessary and redundant. And again, you’re working with inherited designs over 90% of the time, where you don’t get to make these decisions, because they were made for you by your predecessors. If you are so lucky as to be designing a relational model from scratch, look for advice from the tried-and-true architects of the trade.
I do echo and reiterate his advice to look at other data storage tech for things that are not ideally served by an RDBMS, such as EAV models, document storage, key-value storage, text search, etc. There is no shortage of NoSQL tech to meet those needs, and they will do a much better job of it than you or I could in designing a hacked-up SQL model that somewhat does the job but falls over flat when faced with scaling up to a larger size or heavier workload.
As I said at the beginning, these are opinions. I applaud the author’s effort, and the fact that he actually got something together, made it open-source on GitHub, asked for feedback, and actually got traction with several communities discussing it. That’s more than I can say for myself at this point!
I hope that this spurs some discussion on your database development team, and perhaps helps you work toward making a consistent style guide for your own environment. Even if it’s just water-cooler talk, it’s always good to remember why we write code: so that others can read it. (Sure, it’s also to make stuff happen, i.e. “make the computer do things”, but that’s a side-effect — your main goal should always be readability and clarity.)
Do you have some comments, thoughts, disagree? Leave me a comment! I’d love to hear from you. 🙂