Battle 4: GUIDs vs Identities
This is an oldie but a goodie. A) Developers want their apps to manage the record identifiers, but DBAs want the database to do it. B) Developers prefer abstracting the identity values out of sight and out of mind; DBAs know that occasionally (despite your best efforts to avoid it) your eyeballs will have to look at those values and visually connect them with their foreign key relationships while troubleshooting some obscure bug.
But there’s more to it than that. See, none of those arguments really matter, because there are easy answers to those problems. The real core issue lies with the lazy acceptance of GUI/designer defaults, instead of using a bit of brainpower to make a purposeful decision about your Primary Key and your Clustered Index.
Now wait a minute Mr. DBA, aren’t those the same thing?
NO! That’s where this problem comes from!
A good Clustered Index is: narrow (fewer bytes), unique (or at least, highly selective), static (not subject to updates), and ever-increasing (or decreasing, if you really want). NUSE, as some writers have acronym’d it. A GUID fails criteria ‘N’ (it’s 16 bytes, versus 4 for an int) and ‘E’ (NEWID() values are effectively random). However, that’s not to say a GUID isn’t a fine Primary Key! See, your PK really only needs to be ‘U’; and to a lesser extent, ‘S’. See how the two sets of requirements don’t fully overlap? So sure, use those GUIDs, make them your PK. Just don’t let your tool automagically also make that your CX (Clustered indeX). Spend a few minutes making a conscious effort to pick a different column (or couple of columns) that meets more of these requirements.
For example, a datetime column that indicates the age of each record. Chances are, you’re using this column in most of your queries on this table anyway, so clustering on it will speed those up.
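As a sketch of that split, here’s a hypothetical table (the names are mine, not from any real schema) that keeps the GUID as a nonclustered PK while clustering on the datetime:

```sql
-- Hypothetical example: GUID stays the PK, but nonclustered;
-- the clustered index goes on a narrow, static, ever-increasing datetime.
CREATE TABLE dbo.Orders (
    OrderId    uniqueidentifier NOT NULL
        CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED
        CONSTRAINT DF_Orders_OrderId DEFAULT NEWID(),
    CreatedAt  datetime2 NOT NULL
        CONSTRAINT DF_Orders_CreatedAt DEFAULT SYSUTCDATETIME(),
    CustomerId int NOT NULL
);

-- Not unique, but highly selective -- SQL Server adds a hidden
-- "uniquifier" to duplicate key values, so this is fine.
CREATE CLUSTERED INDEX CX_Orders_CreatedAt
    ON dbo.Orders (CreatedAt);
```

The design choice here: range queries like “orders from last week” now read contiguous pages, while the GUID PK still satisfies uniqueness for the app’s identifiers.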
Most of the time, though, if your data model is reasonably normalized and you’re indexing your foreign keys (because you should!), your PKs & CX’s will be the same. There’s nothing wrong with that. Just be mindful of the trade-offs.
Battle 5: CSV vs TAB
Often, we have to deal with data from outside sources that gets exchanged via “flat files”, i.e. text files that represent a single monolithic table of data. Each line is a row, and within each line, each string between delimiting characters is a column value. So the question is, which is easier to deal with as that delimiter: comma, or tab?
String data values often have commas in them, so usually, the file also needs a “quoting character”, i.e. something that surrounds the string values so that the reader/interpreter of the file knows that anything found inside those quotes is all one value, regardless of any commas found within it.
But tabs are bigger… aren’t they? No, they’re still just 1 byte (or 2, in Unicode). So that’s a non-argument. And since literal tab characters almost never appear inside real string data, a tab-delimited file can usually skip the quoting gymnastics entirely. Compatibility? Every program that can read and automatically parse a .csv can just as easily do so with a .tab, even if Windows Explorer’s file icon & default-program handler would lead you to believe otherwise.
I recently encountered an issue with BCP (a SQL command-line utility for bulk copying data into/out of SQL Server), where the csv was just being a pain in the arse. I tried a tab and all was well! I’m sure it was partially my fault, but regardless, it was the path of least resistance.
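If you land in the same spot, BULK INSERT (BCP’s T-SQL cousin) accepts a tab delimiter just as happily; a minimal sketch, with hypothetical table and file names:

```sql
-- Hypothetical load of a tab-delimited extract into a staging table.
BULK INSERT dbo.StagingTable
FROM 'C:\data\extract.tab'
WITH (
    FIELDTERMINATOR = '\t',   -- tab instead of comma
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2       -- skip the header row
);
```

No quoting character needed, because no sane upstream system puts literal tabs inside its string values.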
Battle 6: designers vs scripting
This should be a no-brainer. There is absolutely no excuse for using the table designer or any other wizard-y GUIs for database design and maintenance, unless you’re just learning the ropes. And even then, instead of pressing ‘OK’, use the ‘Script’ option to let SSMS generate a T-SQL script to perform whatever actions you just clicked-thru. Now yes, admittedly those generated scripts are rarely a shining example of clean code, but they get the job done, even with some unnecessary filler and fluff. Learn the critical bits and try to write the script yourself next time; and sure, use the GUI-to-script option to double-check your work, if you still need to.
Confession: I still use the GUI to create new SQL Agent Jobs. It’s not that I don’t know how to script it; it’s just that there are so many non-intuitive parameters to those msdb system sp’s that I usually have to look them up, thereby spending the time I would have otherwise saved.
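For the record, the scripted version boils down to a handful of calls against those msdb procedures; a minimal sketch, with hypothetical job, step, and schedule names:

```sql
-- Hypothetical nightly job, scripted the way SSMS would (minus the fluff).
USE msdb;

EXEC dbo.sp_add_job
    @job_name = N'NightlyIndexMaint';

EXEC dbo.sp_add_jobstep
    @job_name  = N'NightlyIndexMaint',
    @step_name = N'Rebuild indexes',
    @subsystem = N'TSQL',
    @command   = N'EXEC dbo.IndexMaintenance;',  -- hypothetical proc
    @database_name = N'MyDatabase';

EXEC dbo.sp_add_jobschedule
    @job_name = N'NightlyIndexMaint',
    @name     = N'Nightly at 2 AM',
    @freq_type = 4,            -- 4 = daily (see? non-intuitive)
    @freq_interval = 1,        -- every 1 day
    @active_start_time = 020000;  -- HHMMSS

EXEC dbo.sp_add_jobserver
    @job_name = N'NightlyIndexMaint',
    @server_name = N'(local)';
```

Those magic numbers on sp_add_jobschedule are exactly the parameters I end up looking up every time.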
Bonus round: the pronunciation of “Data”
Dah-tuh, or Day-tuh? Or, for the 3 people in the world who can actually read those ridiculous pronunciation glyphs, /ˈdætə/ or /ˈdeɪtə/? It’s a question as old as the industry itself… or maybe not. Anecdotally, it seems like most data professionals, and people in related industries, tend to say “day-tuh”; while those in the media and generally less technical communities tend to say “dah-tuh” (where the first syllable has the same vowel-sound as in “dad” or “cat”). This likely means that the latter is more popular, but the former is more widely accepted within the industry.
In either case, it doesn’t really matter, because at the end of the day, we’re talking about the same thing. So if some dogmatic DBA or pedantic PHB tries to correct your pronunciation, tell ’em to stop being so persnickety and get on with the task at hand!
Until next time…