It’s that time again! This month, Todd Kleinhans (b/t) asks us how we use databases in our day to day life, i.e. personal use or “outside of our work / day-job”. Actually, the question is kinda vague — because if you think about it, we all use TONS of databases in our daily lives. Your phone’s contact list, your calendar, online shopping, banking.. the list goes on. As I’ve said before, Data is everything.
But what I think he meant, and the way most of the community has interpreted it, is “How do you manage/administrate/build/work-with/develop databases in your day-to-day life outside of work?”. So we’ll go with that.
Now this may out me as “not a real DBA” or some such nonsense, but honestly.. I don’t spend much of my time creating silly playground databases. Not that anybody else’s are ‘silly’ — just look at some of the fantastic posts for this month! Such neat ideas brought to life.
Special shout-out to Kenneth Fisher, who, if you look closely at his screenshot (and it’s not even related to this post), committed the abhorrent sin of creating a database name of pure emojis — FOR SHAME sir! But also you’re awesome. ❤
Me, I’m more of a quick-n-dirty spreadsheet guy. If I need, say, an inventory of my computer parts & gadgets so I know what I can & can’t repair, what materials I have to work with as I tinker, etc.. well, I create a Google Sheet. And it serves my needs well enough. (Spoiler alert: yes, you can view that one; I shared it. But it’s fairly outdated since I moved in March and haven’t had time to re-do inventory since.. last autumn.)
But for blogging in the tech field, you gotta get your hands dirty. So when I dived down the rabbit-hole of the Nested Set Model, of course I created a sample database to write & test the code against. And there have been some additional bits & pieces for blog demos and GitHub samples.
Most of the time, I’m creating databases / entities on SQL 2016 Developer Edition. Of course by now, that’s 2 major versions ‘behind’, but since I don’t run Linux personally (yet?), and I’m not a conference speaker (yet??), I don’t feel a burning need to upgrade. It’s FAR superior to Express Edition, though, so please for the love of all that is holy, if you find yourself using Express for personal/playground use, save yourself the headache and go grab Developer.
Containers/Docker? Meh. If you want to start playing with those, definitely look at 2017 or higher. It sounds appealing in theory — “just spin it up when you need it, spin it down when you don’t!” — and I’m sure that’s great if you’re starved for resources on whatever laptop you’re working with, but if you’ve done your due diligence and set your local SQL instance to appropriate resource limitations (hello, ‘max server memory’ and file-growths!), I’ve found that its impact is quite tolerable.
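For anyone setting those limits on a local instance, here's a minimal sketch; the 4 GB cap is just an example value, so size it to leave room for the OS and whatever else your machine runs:

```sql
-- Cap the instance's memory so it plays nicely with everything else on the box.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 4096;  -- example value only
RECONFIGURE;

-- Give the model database sensible fixed-size file growths, so every new
-- playground database inherits them instead of tiny percentage-based growth.
ALTER DATABASE model MODIFY FILE (NAME = modeldev, FILEGROWTH = 256MB);
ALTER DATABASE model MODIFY FILE (NAME = modellog, FILEGROWTH = 128MB);
```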
But come now. Surely this isn’t just a “shameless self-promotion” post or a grumpy-old-DBA “get off my lawn” post. Right?? Right!
To you folks out there creating your own nifty little databases for personal projects, learning/development, or even hopes & dreams of building a killer app on top of it one day — you’re amazing! Keep doing what you do, and write about it, because I love reading about it. Heck, go try to create the same model in PostgreSQL or MariaDB and see how it goes. We could all use a little cross-stack exposure once in a while.
That’s all I have for this month; short & sweet. I need to finalize plans for virtualizing our main SQL instances (which is really just a migration off bare-metal & onto VMs) within the coming weeks. Yes, we’re that far behind the curve. Now get off my lawn!
Hi folks. It’s been a minute. Frankly it’s been a rough 5 months. From losing my wife, to dealing with the holidays and her birthday, to moving houses again. On the career front, I’m faced with the challenge of virtualizing our core business-critical SQL instances with minimal downtime. And obviously, because of all that personal/life stuff, it’s been difficult to stay focused and productive.
So this #tsql2sday‘s topic is poignant, I suppose — this analogy of “dipping into the cookie jar”. What events or accomplishments can I take sustenance from, draw strength from, during these times?
As usual, we must thank our host, Shane O’Neill (b|t), and remind readers to check out tsqltuesday.com.
Back when I first took this current job, I was worried sick about doing the commute every day. One and a half to two hours in traffic each way. Even if I used toll-roads, it might save 10-15 minutes, but it was still super stressful. But my wife never stopped encouraging me, telling me it would pay off. She put up with the crazy hours, she checked on me periodically, she stayed on the phone to keep me awake sometimes. She reminded me often, when the time was right, to have the telecommute/remote-work conversation with management.
And of course, to nobody’s surprise, she was right. I now work from home 4 days a week, take a vanpool the 5th day, and am much happier and more productive (in general!), much less stressed, and a markedly better driver. More importantly, because of her unwavering support, I can still look back and draw renewed energy from those memories and from her still-present spirit that stays with me in my heart.
One of the first big projects on my plate was upgrading & migrating the SQL 2005 instances to new hardware and SQL 2016. We didn’t use anything super fancy for the actual migration, just backup shipping, essentially. DbaTools came in very handy for copying over the logins/users without having to hash/unhash passwords. The main hiccups were around Agent Jobs and Replication. Jobs were turned off at the old server before they were enabled on the new, but due to lack of documentation and understanding of some, it became difficult to see which ones were critically needed “now” vs. which could wait a while, and which may have dependencies on other SQL instances (via linked-servers) that may or may not have moved.
And replication is just a PITA in general. Fortunately, I took this as a ‘growth opportunity’ to more fully document and understand the replication environment we were dealing with. So at the end of the project, we had a full list of replication pub-subs with their articles, a sense of how long they each took to re-initialize, and a decision-process to consult when adding-to or updating articles within them.
A similar upgrade-migration project came to fruition in 2017: the ERP system upgrade. This was a delicious combo meal of both the database instance upgrade (SQL 2008R2 to 2016) and the application & related services/middleware (Dynamics NAV 2009 to 2017). And as I blogged about, it came with a super sticky horrible side-effect that we’re still dealing with over a year later: a different collation than the rest of our DB environment.
Which reminds me, I need to do some follow-up posts to that. Briefly, the “best” band-aids so far have been thus:
If only dealing with the ERP entities, without joins to other DBs, just leave collations alone. The presentation layer won’t care (whether that’s SSRS or a .NET app).
If relating (joining) ERP entities to other DB entities, I’ll load the ERP data into temp-tables that use the other DB’s collation, then join to those temp-tables. The “conversion” is essentially ‘free on write’, meaning when we load the weird-collation data into a temp-table that’s using the normal-collation, there’s no penalty for it.
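A rough sketch of that temp-table band-aid, with entirely hypothetical table and column names standing in for the ERP and reporting objects:

```sql
-- Hypothetical names throughout: NAV.dbo.Customer is the weird-collation
-- ERP table; dbo.SalesNotes lives in a normal-collation database.
CREATE TABLE #Customer
(
    CustomerNo   varchar(20) COLLATE DATABASE_DEFAULT NOT NULL,  -- current DB's collation
    CustomerName nvarchar(100) NOT NULL
);

-- The collation "conversion" happens on write, essentially for free:
INSERT INTO #Customer (CustomerNo, CustomerName)
SELECT c.[No_], c.[Name]
FROM NAV.dbo.Customer AS c;

-- Now the join is collation-clean:
SELECT n.NoteText, t.CustomerName
FROM dbo.SalesNotes AS n
JOIN #Customer AS t ON t.CustomerNo = n.CustomerNo;
```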
As I said, I’ll try to dive into this more in a future post. It’s been a life-saver for performance problems that cropped up as a result of the upgrade & the different collations.
But the point here is that, even though this project didn’t end up as wildly successful as we’d hoped, it’s still a success, because we learned some key lessons and were able to pivot effectively to address the problems in a reasonable way. And frankly, there was no going back anyway; it’s not like the business would have said “Oh, never mind, we’ll just stick with the old versions of everything” (even though some reticent managers would probably have enjoyed that!). So even when things seem bleak, there’s always a way around.
I’m still trying to figure out what this new chapter of my life looks like, without her. I’m still trying to be the very best DBA & BI-Dev I can, supporting the dozens of requests & projects that the business throws at us. Fortunately, with the incredible SQL Community, a wonderfully supportive family, and a fantastic team of colleagues, I remember how far I’ve come, on whose shoulders I stand, and how much promise the future holds.
Even though the one person I was meant to share it all with is gone; she still smiles down and encourages me in subtle ways.
Go find a SQL Saturday near you, at sqlsaturday.com!
The weekend before last, I attended my 4th #SQLSaturday event; my 2nd in San Diego CA (the others were Orange County CA, which is equally fantastic, and a little closer to work, though about the same distance from home). If you haven’t heard of or been to one, check out the home page and find one coming to a city near you! They’re fabulous FREE training events for the MS data platform, including but certainly not limited to SQL Server. For example, you’ll almost always find Kevin Feasel talking about aRrr or Kafka or Hadoop.
Did I mention free?
So I thought I’d share a few things that I learned this time!
The LinkedIn app’s Killer Feature
Did you know? The LinkedIn app has a “find nearby” feature that uses magic (your phone’s various radios) to instantly connect you with a fellow user who has the app open near you. It’s awesome! Now you don’t even have to look up from your convention coffee and security-blanket (phone) to network — just fire up the app, go to the People tab, hit “Find Nearby”, and commence trolling. =P
No, that’s horrible; be a normal human and talk to people. The tech is just there to help make the post-conversation connection.
Storage Myths Busted
This was an interesting and even slightly entertaining session presented by Max @ SQLHA. One analogy that really stood out to me was this:
SANs have become a bit like the printer industry — You don’t pay a lot for the enclosure, the device itself, i.e. the SAN box & software; but you pay through the nose for ‘refills’, i.e. the drives that your SAN vendor gods deem worthy of their enclosure.
It’s frighteningly accurate. Ask your storage admin what it costs to add a single drive (or pair of drives, if you’re using something with built-in redundancy) to your SAN. Then compare that cost with the same exact drive off the retail market. It’s highway robbery. And we’re letting them get away with it because we can’t evolve fast enough to take advantage of storage virtualization tech (S2D, SOFS, RDMA) that effectively makes servers with locally attached SSDs a superior architecture. (As long as they’re not using a horribly outdated interface like SAS!)
Data Protection and Privacy is Hard
But completely necessary. We all need to become security practitioners to some extent. Even if it just means raising and documenting our concerns to our bosses. The great state of California has even jumped on the bandwagon with its very own privacy act. Still in the early stages, yet on the heels of GDPR, it can only mean that there will be more to come.
A few concrete action items from this, which should be “fairly simple” (with a big ol’ asterisk that says “depending on your organization and your change-management process”).
At least encrypt your database backups. (And make a plan to implement some kind of “full” encryption across the data estate, be it TDE or AE or app-driven encryption via your developer teams.)
Stop using sa! Reset the password, and disable it. Yes, your Agent Jobs will still run just fine.
Disable Named Pipes & Shared Memory protocols; just use TCP/IP. Disable the SQL Browser service.
Cut off your SQL servers from the public Internet (yes, you should still patch them; just download the patches to a fileshare and install them using PowerShell!). And stop letting people run SSMS on the server; that’s what client machines are for!
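A hedged sketch of the first two action items, assuming SQL 2014 or higher for native backup encryption; the database, certificate, path, and password values below are all placeholders:

```sql
-- One-time setup (back up your master key & certificate and guard them well!):
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPasswordHere>';
CREATE CERTIFICATE BackupCert WITH SUBJECT = 'Backup encryption certificate';

-- Encrypted (and compressed) backup:
BACKUP DATABASE [YourDb]
TO DISK = N'D:\Backups\YourDb.bak'
WITH COMPRESSION,
     ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);

-- Neutralize sa: scramble the password, then disable the login.
ALTER LOGIN sa WITH PASSWORD = '<LongRandomThrowawayPassword>';
ALTER LOGIN sa DISABLE;
```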
Here are just a few of the tips I gleaned from the session on columnstore indexes:
They were designed for data warehouses, but…
They’re also great for “operational analytics” — where you want to do aggregate reporting on your ‘live’ data, but that performance usually kinda sucks (and you don’t want to lock up those tables anyway).
Best with SQL 2016 or higher; 2012’s “v1” implementation was horrible, and 2014’s “v2” was semi-usable but still had some major drawbacks.
Best candidate tables are “very large” (millions of rows or more), and best candidate columns have “low cardinality”, meaning they’re not full of unique values — they should be “compressible”. A simple example would be a Customer’s “State of residence” — you probably have millions of customers, but only 50-ish “State”s, and your typical report is asking “Hey, how many Customers per State ‘do X'” — that’s a great candidate for a columnstore index.
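As a sketch (table and column names hypothetical), a nonclustered columnstore covering those low-cardinality reporting columns might look like this; SQL 2016+ even lets you keep writing to the underlying table:

```sql
-- Hypothetical dbo.Customer table with millions of rows.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Customer_Analytics
ON dbo.Customer (StateCode, CustomerType, IsActive);

-- The "how many Customers per State do X" style of query can now
-- scan the compressed columnstore segments in batch mode:
SELECT StateCode, COUNT(*) AS CustomerCount
FROM dbo.Customer
WHERE IsActive = 1
GROUP BY StateCode;
```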
Users Don’t Like Date-Pickers
I mean they’re still better than text-entry boxes, but we can do better. Talking about SSRS here — reporting services, i.e. “the poor-man’s Tableau”.
Picture a typical business user, middle-manager type, going to your SSRS report that you just built for him/her. The first thing it asks them to do is “pick a start-date and an end-date” to define the “reporting period”. But you should know by now that they almost always want to see the current Fiscal Quarter. So you default them to those dates that define the current quarter. Great!
Now they want to quickly compare to the previous quarter, or the same quarter of the previous Fiscal Year. Ruh-roh. Nobody wants to go messing with those lame date-pickers with the pop-up calendar thing.
Give them a clickable label instead, which says “Compare to last Quarter” or “Previous Fiscal Year”.
The click action should be to “drill through” to the same report, while changing the (now internal/hidden) date parameters to the appropriate dates. Presto! The user only had to click once to get exactly what they wanted. Much nicer experience.
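One way those hidden date parameters could be computed, using plain calendar quarters for simplicity (a real fiscal calendar would need its own offset logic):

```sql
-- @AnchorDate is hypothetical; it would default to today on the report.
DECLARE @AnchorDate date = GETDATE();

-- Start of the current calendar quarter (classic DATEADD/DATEDIFF idiom):
DECLARE @CurrQtrStart date =
    DATEADD(QUARTER, DATEDIFF(QUARTER, 0, @AnchorDate), 0);

SELECT
    -- "Compare to last Quarter" => shift both bounds back one quarter:
    DATEADD(QUARTER, -1, @CurrQtrStart)      AS PrevQtrStart,
    DATEADD(DAY, -1, @CurrQtrStart)          AS PrevQtrEnd,
    -- "Previous Fiscal Year" => same quarter, one year back:
    DATEADD(YEAR, -1, @CurrQtrStart)         AS SameQtrLastYearStart,
    DATEADD(DAY, -1, DATEADD(QUARTER, 1,
        DATEADD(YEAR, -1, @CurrQtrStart)))   AS SameQtrLastYearEnd;
```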
I’ll try to have a future post going into detail on this. I’ve got tons of ideas swimming around in my head after FishHeadTed‘s excellent SSRS classes, and not nearly enough time in the day to flesh them out.
Our benevolent overlord prompted us (months ago; I know, I’m always late to the party) to write about what we do in our day-to-day jobs as SQL Server professionals, aka #SQLCareer. The actual idea is to do 4 posts in a row, for 4 working days (either adjacently or the same day each week for 4 weeks).
Sticklers, us? Nope, never.
Without any kind of future guarantees, here is my first.
Emails, tickets, and alerts, oh my! Today was actually pretty quiet on the monitoring front — no major performance issues, no interruptions, and no job failures. That one that occasionally takes 8+ hours did okay last night. More often than not, I have to kill it if I find it’s been running into the work-day. That should be automated. /backlogged
Re-wrote some analyst’s data change script to be more “friendly” and set-based, instead of relying on identity values — which is important when they can differ between Dev & Production environments. Deployed a few change requests, including that one.
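To illustrate the idea with made-up object names, the difference looks something like this:

```sql
-- Fragile: identity values can differ between Dev and Production.
-- UPDATE dbo.Product SET IsDiscontinued = 1 WHERE ProductId IN (1042, 1043);

-- Friendlier: key off a stable business attribute instead
-- (dbo.Product and ProductCode are hypothetical names):
UPDATE p
SET    p.IsDiscontinued = 1
FROM   dbo.Product AS p
WHERE  p.ProductCode IN (N'WIDGET-A', N'WIDGET-B');
```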
On the side of less tedium, I fleshed-out more of the plan to virtualize our remaining bare-metal SQL servers. The usual ‘gotchas’ have to be discussed with the Server Admins — reserving compute resources, preventing the ‘noisy neighbor’ problem, and having enough storage to do the migration.
Yes, that means about 2x the storage as they’re currently using. Got a problem with that? No? Good. 😉
Finally, I worked on some code for querying the SSRS ReportServer database to obtain report metadata (including stored-proc usage, parameters, and user activity). The core concepts came from Ted Stathakis‘s SQLSaturdaysession on SSRS 2016. Hopefully, some of that code will be coming to GitHub soon.
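As a teaser, here’s a simplified sketch of that kind of metadata query; the ReportServer schema below (dbo.Catalog, dbo.ExecutionLog3) is as I understand it for SSRS 2016, so verify against your own version before relying on it:

```sql
USE ReportServer;

-- Which reports get run, by whom, and how often (Type = 2 means "report"):
SELECT c.[Path],
       c.[Name],
       ex.UserName,
       COUNT(*)          AS Executions,
       MAX(ex.TimeStart) AS LastRun
FROM dbo.[Catalog]     AS c
JOIN dbo.ExecutionLog3 AS ex ON ex.ItemPath = c.[Path]
WHERE c.[Type] = 2
GROUP BY c.[Path], c.[Name], ex.UserName
ORDER BY Executions DESC;
```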
“Delight” being sarcastic. No really, I love my job, but sometimes there are just some things that make you go ‘grr arg’.
First up, developing and finalizing some “data cleanup effort” reports. These are things that show the business users “what’s wrong with the data” and give them some guidance on how to “fix it”. Now, because there is no easy “management GUI” for this data, and because actual changes need to go through change-control, it’s not going to be a cake-walk. But given the right inputs and some domain knowledge, they can make the decisions as to what those changes should be, and then hand them off to the BA’s (business analysts). Usually in Excel form.
Next we have the ‘grr’ part. This is where I spent about 40 minutes trying to explain to the 3rd-party software vendor exactly where and how to integrate with our core data systems. Most of the info should not be ‘new’ to them, but since I never know which support tech will be assigned to the issue, it’s best to err on the side of verbosity. To make things more confusing, the folks on my end, who have been interfacing with the vendor thus far, aren’t intimately familiar with the underlying technology bits, so the conversation gets a little dicey before we all arrive on the same page.
Fortunately, since I had their attention, I was able to sneak in a suggestion to upgrade the back-end database for the software (which we self-host, but is essentially managed by them & their software) to something from this decade. MySQL 5.0 is old, guys, mmkay?
A developer needs a new table & some initial data-load to production; it’s passed QA. Great! They wrote the query a little weird, but nothing ol’ SQLPrompt can’t fix.
Commit outstanding stored-proc code & RDL files (SSRS reports) to source control. Even if I’m the only one working on them, I’d rather they be safe & track-able.
Ruh-roh. A developer set off some proc in a Dev db, it’s been running for over 30 minutes… and then they cancelled it! But wait, it’s stuck in rollback. Yes, kids, rolling-back a transaction can take even longer than the transaction itself, due to rollbacks being single-threaded.
Now, since the user’s hitting of the ‘Cancel’ button (aka “Stop” in SSMS) does not wholly and entirely cause a kill SPID command, I do that for them (since only sysadmin or processadmin can kill sessions). Then I run ye olde kill 64 with statusonly (64 happened to be the SPID) to check on its rollback status. Good news, everyone! It’s at… 0%.
And it stays at 0%. For fifteen minutes.
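If you’d rather not keep re-running the kill command, the same progress information is exposed in a DMV; a quick check might look like this:

```sql
-- percent_complete is populated for certain commands, ROLLBACK included,
-- so you can watch progress without spamming KILL ... WITH STATUSONLY.
SELECT session_id,
       command,
       status,
       percent_complete,
       estimated_completion_time / 1000.0 AS est_seconds_remaining
FROM sys.dm_exec_requests
WHERE session_id = 64;   -- the SPID in question
```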
I move on to other tasks of course, since there’s nothing more I can do (short of pulling the plug on the server, which is never a good idea, unless you’re testing your Disaster Recovery Hope plan!). We also create a ticket to fix this stored-proc so that it doesn’t “take forever”, even if it has to churn through millions of records.
Finally, the statusonly check starts moving. It’s about 45% rolled back, with about “300 seconds remaining”. This, as anybody who’s been a DBA will tell you, is in “Microsoft time”. You know those progress-bars you typically get in Windows while something is happening? They’ll start off fast, make it to around 90-some percent, tell you there’s about 1 minute remaining, then take at least five minutes to “finish that last little bit”? Yeah, that’s a “Microsoft minute”. Patent-pending.
But fortunately for us, it does finish in just about the time it promised, maybe a bit longer. Just in time for the Dev to test what they wanted to test before going home for the day. And just in time for me to have my evening not ruined by a runaway transaction.
PS: Anybody who’s ever worked with SQL, including yours truly, has done this (set off a long transaction and/or tried to rollback a long transaction) at least once in their career. If you ask a seasoned DBA or DB-Dev, and they say “No, I’ve never done that!”, they’re a dirty rotten liar. =D
Are You Not Entertained?
If the life of a DBA sounds like fun to you, get in touch with your local IT recruiter and find out what the job market is like! You’ll hear all sorts of doom & gloom about how “the Cloud is taking over everything” and “the DBA is dead; long live DevSecDataOpsUnicorns!”.
No, I joke. Yes, some of the fundamental responsibilities of the DBA are shifting to the cloud vendors, to IaaS/PaaS services, but at the end of the day, being in IT is about learning tech and working with people. Build a solid CS/IT foundation, build your soft-skills, and then learn the latest & greatest tech stack. Or the mature enterprise stuff. There’s plenty of room for both!
This month’s invite comes once again from our benevolent overlord* community pillar Steve Jones, head of SQLServerCentral.com! If you haven’t been there, stop reading immediately and go check out the helpful forums and ‘Stairways’ articles. Some truly excellent content to be had.
No, don’t stop reading immediately… save it to your favorites/reading-list and look at it in 5 minutes when you’re done here. =)
*Though I’ve not had the pleasure of meeting him in person, I’ve heard Steve is a phenomenally humble and down-to-earth guy, so my silly comment about him is even sillier in that light. ❤
Triggers – Love ’em or Hate ’em
Borrowing a bit of a mini-game from the CodingBlocks guys, the first “Google-feud” (auto-complete result) for “sql server triggers are ” is “sql server triggers are bad”. Well, they can be, if used improperly / to excess. Like most bits of tech, they have their place and their use-cases.
I’ve blogged about a few such use-cases here (a “who did what” audit-trail type of trigger) and here (as part of the Nested Set Model implementation), which you should definitely read. I didn’t condone click-baity titles back then. Alas…
Here are a handful of anti-patterns that I’ve seen with triggers in my time.
Thing 1: Using them as Queues
Repeat after me:
A trigger is not a queue.
Triggers are executed within the same transaction as the query that fires them. Meaning, they’re subject to all that ACID-y goodness of a transactional system. This is a double-edged sword. If all is successful, it guarantees that trigger will do its job when the calling query runs, which is great for audit-ability. On the other hand, if the trigger has a problem, anything and everything that triggers it will also fail.
The fundamental distinction between this and a queue, is that the success of the queued action is not immediately critical to the continued operation of the thing that called (queued) it.
So if your requirement matches the latter behavior, and not the former, do us all a favor and use a real queue. Heck, find one of the few people who know Service Broker. (Hint: one’s a Warewolf, another’s a Poolboy.)
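For the common “do this expensive thing later” case, here’s a minimal queue-table sketch (all object names hypothetical) that keeps the trigger itself cheap while a separate Agent job drains the queue:

```sql
-- The queue: a plain table recording work to be done later.
CREATE TABLE dbo.OrderWorkQueue
(
    QueueId     int IDENTITY PRIMARY KEY,
    OrderId     int NOT NULL,
    EnqueuedAt  datetime2 NOT NULL DEFAULT SYSUTCDATETIME(),
    ProcessedAt datetime2 NULL
);
GO

-- The trigger only records that work exists; it doesn't do the work.
CREATE TRIGGER trg_Order_Enqueue
ON dbo.[Order]
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Cheap, set-based, and safe inside the caller's transaction:
    INSERT INTO dbo.OrderWorkQueue (OrderId)
    SELECT i.OrderId FROM inserted AS i;
END;
```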
Thing 2: Making them WAY TOO BIG
Mostly because of the transactional thing, the rule of thumb with triggers is K.I.S.S. “Keep it Small and Simple.” Even the audit-trail example is a bit much if the table being audited is under significant write load. Typically, if the business requirements are both high-throughput and high audit-ability, you’ll be implementing a much more complicated tech-stack than just plain ol’ SQL Server with triggers.
Some of the offenders I’ve seen include: A trigger that wanted to write to several other tables with IF conditions that branched based on what column was being updated. And a trigger that required near-SA level permissions to do some back-end maintenance-y stuff. Those are both recipes for problem pie.
Thing 3: Doing Nothing Useful
Somewhat opposite of above, there’s no point in introducing the management and performance overhead of triggers if they’re not performing a transaction-critical operation. For instance, something better left to a queue.
Thing 4: Housing Business Logic
There’s always been a holy war about whether “business logic” belongs in the database or in the application code. And, since geeks love sub-classifying things to the Nth degree, there’s a sub-holy-war about what “business logic” actually means. But I won’t go into that here.
If you fall on the 1st side of the fence, and you feel the desperate desire to embed some logic in the data layer, it belongs in stored procedures, not in triggers. Reasons include maintainability, discoverability, performance, and documentability. Not to mention source-control on procs is a helluva lot easier than on triggers.
Thing 5: Too Many of Them
While multiple triggers can be defined for the same action on the same table, that’s not an invitation. You can enforce trigger execution order to an extent (first and last), but any more than that and you’re asking for confusion. Falling back on the KISS principle, if you need more than one trigger on a table & action (insert, update, or delete), you probably need to rethink the underlying design.
Using INSTEAD OF vs. AFTER: it’s fairly self-explanatory, but just be aware of what you’re doing, especially with the former. You’re literally replacing the desired action of, say, an update query with the contents of your instead of update trigger. If this is not obvious to all users of this table, you’re in for some really surprised faces and angry messages.
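If you’re stuck with multiple AFTER triggers on one action anyway, at least pin the order down explicitly; a sketch with hypothetical trigger names:

```sql
-- Only 'First' and 'Last' (and 'None') are available; anything between
-- those two is left to the engine's discretion.
EXEC sp_settriggerorder
     @triggername = N'dbo.trg_Order_Audit',
     @order       = N'First',
     @stmttype    = N'UPDATE';

EXEC sp_settriggerorder
     @triggername = N'dbo.trg_Order_Notify',
     @order       = N'Last',
     @stmttype    = N'UPDATE';
```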
And that’s all I have for today folks! Enjoy triggering stuff. In moderation, as all things. =)
In what will surely be a controversial post, I give my take on some of the major points of this “SQL style guide” that made the rounds on Hacker News / Reddit recently a few years ago. Hey, I never claimed to be a source of breaking news.
Now remember kids, these are opinions — everyone’s entitled to theirs, and it doesn’t mean you’re right or wrong. As the author repeatedly points out, consistency among a team / project / environment is more important than anything else. Both the original article, and my responses, are guidelines, written by people with some experience (guided by others with more experience) to help you start a conversation with your own team about how you can get better at writing code. Because that’s what we’re paid to do, among other things.
I agree with most of the points here:
Consistent descriptive names (I would add “concise” too — autocomplete/intellisense has come a long way, but we’re still typing stuff sometimes)
Some OOP principles should be avoided because they usually lead to poor performance
Hungarian notation is best left in the ’80s where it belongs
Something I don’t abide by:
Sticking to the ‘standard’ and avoiding vendor-specific functions/features
Some brief justification for my rejection:
Database code portability is largely a myth — it sounds great, in theory, but most teams/projects never actually do it. If your data persistence layer is ever slated for migration to a new tech-stack, it’s going to require a massive overhaul anyway, in which those vendor-specific functions/code-bits will be the least of your worries. More likely, you’ll be swapping-out pieces of the data layer in small chunks to move into other storage tech, like a NoSQL store or a DocumentDB or something; and eventually over time, the whole original data layer will have been moved, and the concern over SQL code portability will be moot.
Furthermore, database vendors give you these features/functions for a reason — they’ve found that it greatly enhances their product and the productivity of developers who work with it. So why not take advantage?
Finally, if your application is ‘cutting-edge’ enough that ALL db access is done via ORM or some kind of repository layer in the code-base… guess what? You don’t have this problem in the first place! Because the database is a dumb state storage mechanism, containing little to no actual code whatsoever (stored procs, functions, etc.). So, port away!
Other basic issues:
CamelCase (actually TitleCase) is pretty standard in a lot of DB schemas, and I see nothing wrong with it. Despite my love of underscores (snake_case), it does make for more awkward typing.
Plural or singular entity names should match the convention preferred by your overlaying ORM, if you’re at the point of DB design; most of the time, though, you’re working with a DB that you’ve inherited and you have no control over entity naming anyway, so stop whining about it and get on with life.
Leading vs. trailing commas: I prefer leading, but the arguments against it can sound convincing (not just his, but in general in the tech community) — my experience leans toward being more likely to futz with the middle-to-end of a list than the beginning (1st item), thus making the leading commas more likely to help, but that’s just me. Also, thanks to an awesome member of the Coding Blocks Slack, a point in my favor is that source code comparison (diff) tools will only show the one changed line instead of two, if you’ve had to add to the end of the column list.
Yes, please avoid reserved keywords, replace spaces with underscores (or use TitleCase to avoid having spaces), and use concise yet meaningful table aliases when you’re writing queries. I still remember, when I first started working at my current company, literally gasping and cursing under my breath when I found that some databases actually had a space in the name.
Beyond that, the article goes a bit too deep in the weeds for me, especially the whole “known suffixes” thing — because isn’t that just Hungarian notation on the other end? How about names that make it intuitive, such as IsActive for a bit flag, or LineNumber or RecordSequence for a sequential integer that’s not auto-generated (not an identity value), or @NumMonths as a parameter for a stored-proc that indicates how many months of reporting to fetch? Common sense should rule the day, not arcane prefix/suffix conventions that will never be documented or enforced.
This whole notion of a “river” feels strange and awkward. It’s made worse by the fact that some clauses’ keywords are “too large” for the “standard” river width (which is the width of the SELECT keyword, ish), such as group by and left join. Plus, I’ve yet to see truly excellent tooling support for this kind of style (be it VSCode, Visual Studio, SSMS, SQL Prompt, or other styling tools / auto-formatters). Given that I still largely write my code without continuous automatic re-style-on-the-fly styling assistance, I find this hard to digest.
Side-bar: big kudos to the author for pointing me at this typography article, which challenged my long-ingrained writing preference of double-spacing between sentences. Even now, I do it while I write this post, knowing it’s wrong — I can’t help myself!
For similar reasons, JOINs and sub-queries don’t need to be “on the other side of the river” — since I’m actually saying “there is no river”, what I mean is, don’t indent those JOINs or sub-queries excessively. In the FROM clause, the JOINed tables are just as important as the first one, so I don’t see the need to reduce their importance by putting them so far to the right. And please for the love of all things holy, stop putting the JOIN predicates in-line (on the same line) after the joined table — put the ON conditions on their own line and indent them!
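To make that concrete, here’s roughly the shape I’m advocating (hypothetical tables, no river, ON predicates on their own indented lines):

```sql
SELECT o.OrderId,
       c.CustomerName,
       li.LineTotal
FROM dbo.[Order] AS o
JOIN dbo.Customer AS c
    ON c.CustomerId = o.CustomerId
LEFT JOIN dbo.OrderLine AS li
    ON li.OrderId = o.OrderId;
```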
Special note about the “Preferred formalities” section
BETWEEN is mostly evil. I’m not saying you should never use it; just be very clear about why you’re using it, and only use it with discrete valued types (DATE, INT), NOT with continuous (or conceptually/nearly-continuous) value types (DATETIME, REAL/NUMERIC/DECIMAL).
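A quick illustration of why, with a hypothetical Orders table and a datetime column:

```sql
-- BETWEEN is inclusive on both ends, so with a datetime column this
-- silently drops almost all of the last day:
SELECT * FROM dbo.Orders
WHERE OrderDate BETWEEN '2018-01-01' AND '2018-01-31';
-- misses everything after 2018-01-31 00:00:00.000!

-- Safer half-open range for continuous value types:
SELECT * FROM dbo.Orders
WHERE OrderDate >= '2018-01-01'
  AND OrderDate <  '2018-02-01';
```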
The UNION operator is often misused, usually because UNION ALL is preferred (and is what you really meant anyway), but a blanket statement to “avoid it” misses the point of why it exists in the first place. Likewise, temporary tables (#temptables) are wonderful tools when used properly and in moderation, but flagrant overuse can lead to what I call #tempocalypse (which means you’re hammering your TempDB so hard that its underlying storage system screams for mercy).
Misnamed section “Create syntax”
What he really does here is expound upon table and database design principles. This does not belong in a “Style Guide”; it probably belongs in a “Design Guide”, but because the relational database as a technology is so mature (yes, that means ‘old’) by this point, most of this feels completely unnecessary and redundant. And again, you’re working with inherited designs over 90% of the time, where you don’t get to make these decisions, because they were made for you by your predecessors. If you are so lucky as to be designing a relational model from scratch, look for advice from the tried-and-true architects of the trade.
I do echo and reiterate his advice to look at other data storage tech for things that are not ideally served by an RDBMS, such as EAV models, document storage, key-value storage, text search, etc. There is no shortage of NoSQL tech to meet those needs, and they will do a much better job of it than you or I could in designing a hacked-up SQL model that somewhat does the job but falls over flat when faced with scaling up to a larger size or heavier workload.
As I said at the beginning, these are opinions. I applaud the author’s effort, and the fact that he actually got something together, made it open-source on GitHub, asked for feedback, and actually got traction with several communities discussing it. That’s more than I can say for myself at this point!
I hope that this spurs some discussion on your database development team, and perhaps helps you work toward making a consistent style guide for your own environment. Even if it’s just water-cooler talk, it’s always good to remember why we write code: so that others can read it. (Sure, it’s also to make stuff happen, i.e. “make the computer do things”, but that’s a side-effect — your main goal should always be readability and clarity.)
Do you have some comments, thoughts, disagree? Leave me a comment! I’d love to hear from you. 🙂
I’m talking about a locally stored, locally managed, only-ever-have-one-user, repository of SQL scripts. All it really needs to do is track changes on a daily (ish) basis.
Inspired by a Slack conversation with friend & former colleague, and building off last week’s T-SQL Tuesdaypost: How do I keep track of my SQL scripts?
The answer shouldn’t surprise anybody at this point: version control!
Side-note: I really should try using SSMS Solutions/Projects like Kenneth Fisher advocates, but I just haven’t gotten a chance yet. It would probably help with organization.
Well, perhaps you’re still in for a surprise. Because, you see, as much as I love using GitHub and Gist for my blog code samples, much of that love is lost in the daily grind of fix-it-tickets, BI building blocks, report development, and performance troubleshooting that comprises my business as usual. So forgive me if I haven’t made the leap to Git like all the cool kids are doing. I’m talking about a locally stored (i.e. on my very own hard-drive), locally managed, only-ever-have-one-user, repository of SQL scripts. All it really needs to do is track changes on a daily (ish) basis.
That’s what good ol’ Apache SVN is for! Yes, boys & girls, it’s still around. Consider it one of the many enduring foundational technologies that we tend to take for granted, or forget about, in the ever-present onslaught of the “new and shiny”.
But fortunately for us luddites, there are a couple GUI tools out there that make it feel less like the ’90s green-screen. Enter VisualSVN Server (free edition) and TortoiseSVN (also free).
Eeww, yuck! A non-distributed VCS?? What are you, some sort of wild animal!? The impudence!
Okay, calm down there sparky. Remember my requirements? They’re only about 5 sentences up. Go back and read them again. Breathe. We good? Good.
Another reason it doesn’t need to be distributed or ‘cloudy’ or web-based is that these scripts often contain private company I.P., so data governance demands that they stay within company IT space. And sure, there are private repos and ways to keep the stuff locked-down within a GitHub or similar, but again I ask, why? The first and primary requirement of my VCS is that it stays the heck out of my way when I’m working. I’m sure someone will counterpoint me, and I’d love to hear it. But for now, we’re keepin’ it local.
Getting Set Up
The first step, if it’s not obvious, is to install VisualSVN Server — it already contains the binaries for Apache SVN, so you don’t need to worry about that. It’ll ask you where to stick your repos and your backups; you can see an example of mine here:
Once it’s done, let it start up the manager GUI. Next step is to create a new repo:
I called mine “SQL_Scripts”. Because, as my blog’s tagline says, “why yes, I do like underscores.” When you go thru the steps, it’s easiest to leave everything at the default settings — that way you don’t need to futz with permissions or anything.
One ‘gotcha’ to note: in a corporate domain setting, you should set the server name to your machine’s FQDN, e.g. MYWORKSTATION.company.com. This may be done for you when you install, but I’m not 100% sure, so it’s worth checking — you can right-click on the VisualSVN Server (local) node in the left pane and go to Properties, to the Network tab, and verify it.
Next, install Tortoise SVN, or your favorite SVN client. I chose Tortoise because it’s easy to use, includes some very useful File Explorer options (right-click menu goodies), and supports standard command-line interaction just like vanilla SVN. We’re going to use said CLI interaction in just a bit!
Import (or is it Export?)
I almost always have trouble remembering which option is for use with a non-empty folder of “here’s a bunch of files that I want to dump into the repo to start with”, vs. “here’s an empty folder where I want to pull down the contents of an existing repo”. Fortunately, Tortoise yells at you if you try to do the latter — which is Export — into a non-empty folder. So we want to Import. Assuming you have a folder where all your SQL scripts live already, right-clicky and say “Tortoise SVN .. Import.”
You can verify that it worked by switching back to your VisualSVN Server for a moment, refreshing it, and seeing that the repo’s contents are now.. your happy files!
But wait… the folder I imported into the repo doesn’t have the pretty little icon overlays showing me that it’s in SVN… Why?
Don’t panic. We have one more step to go.
Right-clicky again! On the same folder you imported into SVN. You now want to “SVN Checkout…”, which will essentially mark all your files as “versioned”, because it will see that they all match exactly what’s already in the repo (because you just imported them a few moments ago).
There’s an important ‘gotcha’ here, if you named your repo something other than the folder name that’s serving as your repository root. By default, it will try to pull the repo’s contents into a sub-folder of the same name as the repo. In the example below, that’d be “Workspace\PersonalScripts”, instead of just “Workspace”, which is where I want it to go. This has happened to me more than once. Check it out:
Pull the trigger, and you’ll see all your files getting ‘Versioned’, like so.
Yay? Yay. Now for the cool part, where you can stop thinking about all of this and just let the machines do their work.
Automatic Nightly Check-In
Windows Task Scheduler, specifically. Let’s be honest, if you’re reading this far, you’re not really a CLI jockey. And you’re certainly not a Linux geek — you could have had all this set up and done with about 5 lines of bash, I’m sure. Party on, Wayne. Us Garth’s still need a bit of help.
I’m not going to do a walk-thru screen-shot montage of Task Scheduler; it’s pretty idiot-proof if you read & understand the dialogs. Here’s the key bits, the commands that you’ll actually want to enter as the actions. We have two actions: first “add all new files to the repo”, followed by “commit everything new & changed”. Schedule it however often you’d like; mine happens nightly at 8pm, when I’m about 99.9% sure I won’t be touching work. Oh, make sure to set it to “Run whether user is logged on or not”.
The actions are both “Start a program”, and said program will be the SVN executable, which should be (by default) located at C:\Program Files\TortoiseSVN\bin\svn.exe. Here are the arguments, respectively (1st action, then 2nd), subject to personalization of course!
And that about does it! Now, without any need to remember any command-line syntax, or to touch a right-click menu again, your SQL scripts are being silently versioned and committed every night to your very own local repository. If you ever forget when or what you did last week or last month, or need to “blame” yourself (which is short-hand for “show me all the changes from this part of this file so I can see what led to its current state”)… now you can.
PS: If you’re using a fantastic 3rd party diff/compare tool like I am, be sure to plug it into the TortoiseSVN options for diff-viewing/comparison. While their built-in interface ain’t too bad, BeyondCompare and other similar tools are even better.