Clean code, the SQL: Part 2: Electric Boogaloo

After the previous discussion about nested views, encapsulation & abstraction, I’d like to write about duplication specifically: the distinction between literal duplication of lines of code, versus functional duplication. The latter is not OK, but the former is generally acceptable when it’s boilerplate code, or when it’s done, again, in the name of performance and efficiency.

boilerplate with alphabet and stuff
letters and numbers and symbols!

 

So, to expand on last week’s “Encapsulation & Abstraction” segment: the conversation with one of my favorite developers went something like this.

Developer:

While I agree that there’s some over-reliance on nested views, the reason they get implemented a lot is because there’s a particular problem they seem to easily solve: how to encapsulate business-data rules without violating DRY.

Let’s say we have a biz-rule for a “core segment” of data.  For simplicity’s sake, let’s call that rule “Widget A consists of a Widget record and a WidgetSupplement record joined by WidgetID, where Widget.WidgetType is ‘foo’.”  So it seems obvious to create a view WidgetFooComplete, which pulls in the data from both tables and applies the type condition.  This creates a sort of “atomic building block” of data, which can be consumed by apps & data-access methods repeatedly & consistently.
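A minimal sketch of such a view, using the tables named above (any column beyond the keys is invented for illustration):

CREATE VIEW dbo.WidgetFooComplete
AS
SELECT w.WidgetID,
       w.WidgetType,
       w.WidgetName,        --illustrative column
       ws.SupplementInfo    --illustrative column
FROM dbo.Widget w
JOIN dbo.WidgetSupplement ws
    ON ws.WidgetID = w.WidgetID
WHERE w.WidgetType = 'foo';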

Now, most downstream apps will use that WidgetFooComplete data in its entirety (or nearly).  But let’s say there’s a hot new app that needs some more data about the Widgets, and it needs to go out to the WidgetMoarProperties table.  The natural inclination is to incorporate our existing “building block” view, WidgetFooComplete, into a new view for this app & its dependencies, and call it WidgetFooMoarComplete.
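Again as a hedged sketch (the WidgetMoarProperties columns are invented), that nested view might look like this:

CREATE VIEW dbo.WidgetFooMoarComplete
AS
SELECT wfc.WidgetID,
       wfc.WidgetType,
       wfc.WidgetName,
       wfc.SupplementInfo,
       wmp.MoarProperty1    --illustrative column
FROM dbo.WidgetFooComplete wfc
JOIN dbo.WidgetMoarProperties wmp
    ON wmp.WidgetID = wfc.WidgetID;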

But what’s the alternative?  If we re-create the JOIN/WHERE conditions on the base-tables in this new view, it violates DRY and makes possible future refactoring difficult if that biz-rule changes.

Admittedly, most modern data-access technologies make it easier to create these “building blocks” of joined data entities.  And sometimes those biz-rules belong in the app’s lower layers, but this can lead to writing lots of little disparate queries/db-calls for what should have been one atomic operation.  That can be a maintenance headache, as could dozens (hundreds) of tailored stored-procs for every data-access scenario.

So it seems like nested views can have their place, but “deep” nesting is usually troublesome.  And to prevent the “slippery slope” effect, we have to practice diligence.

Me:

That’s pretty spot-on.  DBAs tend to criticize them (nested views) as a practice in general because of the tendency to over-use and over-rely on them, and because of that slippery slope, where “a little” use turns into “a lot”, and leads to troubleshooting headaches.  And generalizations are just that.

To take some examples in-hand: simple entity relationships, especially when biz-critical, should be A) obvious, and B) documented.  Unified views can serve this purpose, but should only be used where appropriate — i.e. to load an object that gets passed around/up the app stack.  They’re great “atomic building blocks” when you actually need the entire block of data.  But when you don’t — say in a stored-proc that’s doing some data flow operation and only needs a small subset of that data block — it’s probably better to get the relationship logic from the view and copy-paste it (but hopefully not all of it!), while omitting the stuff that’s not needed.

The main reason for this is usually index tuning.  If we’ve crafted some indexes to meet certain query patterns in certain troublesome procs, we want those procs to use those indexes, not just do a full table scan because they’re using a nested view which does select *.
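For instance, a contrived sketch: a proc that only needs the IDs can inline the biz-rule and select just the columns a narrow index can cover.

--same biz-rule as WidgetFooComplete, but only the columns this proc needs,
--so the optimizer can use a narrow index instead of scanning for SELECT *
SELECT w.WidgetID
FROM dbo.Widget w
JOIN dbo.WidgetSupplement ws
    ON ws.WidgetID = w.WidgetID
WHERE w.WidgetType = 'foo';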

When we get to more complex business rules, we need to up our diligence game and be more mindful of dependency checking when planning for a rule change.  Proc comment-headers can be helpful here, as can tools that search thru SQL object meta-data and code-bases to produce dependency chains.

The main point is, duplication tends to be OK when it’s not functional duplication, i.e. when the SQL code is more-or-less similar in some places but it’s not exactly the same because the purpose (responsibility) of that module/stored-proc is not the same.

You’re right in that the “31-flavors of tailored procs for data-access” is a big maintenance headache, and sometimes that trumps even the performance concerns.  Again it’s about balance — we have to be mindful of both the biz-rule-maintenance concerns and the performance concerns.

Developer:

I figured.  Sometimes I see DBAs criticize developers’ work without seeming to understand that it doesn’t always come from sloppiness or laziness (although sometimes it does!).  Often, we’re trying to thread that needle of performance vs. maintainability.  In Dev-land, “lazy” is good in the sense of aiming for simplified logic, for ease of both maintenance and understanding.  Painstakingly tailoring each data-access call (stored-proc), while good for performance, is kinda opposite of that.  But, admittedly, we do fall back on SELECT * all too easily.

Mostly, we try to avoid code duplication because it leads to heavier maintenance overhead.  When some modules may perform similar operations, functionally, they will often re-use the same “core” logic, which we in turn encapsulate into its own ‘thing’.  But in SQL modules, as you say, that’s not always performant, so it’s definitely a tightrope-walk.

The “Clean Code” school of thought says, if it’s obvious, it’s “self-documenting”.  I don’t always agree with it, but it comes from maintenance concerns again.  We don’t like situations where someone tweaks the code but doesn’t update the comments, and you end up with misleading comments.  Unfortunately, it does come down to diligence again, and even “good” developers will easily fall back to rarely including comments just to avoid this situation.  Of course, another potential pitfall of supposedly self-documenting code is, what’s “obvious” to one person isn’t necessarily so to everyone else!

(We both enjoy writing, can you tell?)  =P

So basically we agreed to “moderation in all things” and exchanged Buddha statues and sang Kum-Bay-Yah.  I enjoyed the exchange because it really got us both thinking more deeply about topics and areas of our business/app environment where things were in better/worse shape than others.

yay collaboration!

To conclude this part.  You will continue to see DBAs rant and rail against nested views and other “sins against SQL”, but:  Developers, don’t take it personally — we’re just trying to eke the most performance-per-$3k-core-license out of our precious servers, and spend less time chasing the white rabbit down the nested-views-hole.  And DBAs, go easy on your Devs — they still outnumber you, and they can whip out a complete web-app using the hottest JavaScript framework and a cloud-of-the-month service, faster than you can tune a server.  Everybody’s valuable, and everybody works toward the same goal: solving the business’s problems thru technology.

Moving on…

Part 3: Misusing & Abusing Datatypes

Because I’m getting long-winded again, let’s wrap up with a final “Clean SQL Code” topic that’s short & sweet.

Well, not really.  There are entire presentations dedicated to this topic.  But I’ll try to keep it condensed.

A date is not a datetime is not a time is not a time interval.  Okay?  For the third time, stop interchanging them!  Yes I know, SQL Server is a bit behind some other RDBMS platforms when it comes to this stuff.  Sorry, I don’t work for Microsoft.  I just deal with their tech.

Deep breaths…

More to the point, know your data.  Understand that there can be consequences to repeatedly casting types, or losing precision during conversion, sometimes exponentially so.  Yes I know, we all love loosely-typed (sometimes stringly typed) languages like JS & Python.  Those are wonderful tools for certain jobs/problems.  Again, be mindful and know your flows.
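To make that concrete, here’s a contrived sketch; the point is that each type has its own precision and semantics, and casting between them is where the surprises live.

--these are four different things, not interchangeable:
DECLARE @d   date         = '2017-06-07',
        @dt  datetime     = '2017-06-07 13:45:00',
        @dt2 datetime2(3) = SYSDATETIME(),
        @t   time(0)      = '13:45:00';

--precision loss in action: datetime only has ~1/300-second granularity
SELECT CAST('2017-06-07 13:45:00.004' AS datetime);  --comes back as 13:45:00.003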

flow with the chart yo
I’m not sure what’s more disturbing.. the fact that this was the first image search result for “flow meme”, or the fact that it’s actually quite appropriate.

Thanks for reading, as always!

Stupid Defaults

Every big enterprise product has them.  I just happen to pick on SQL because it’s my area of expertise.

This week’s topic was triggered by an amazing quote from Adam Machanic (b|t) in the sqlcommunity Slack group, where he was trying to help someone troubleshoot a slow linked-server query:

That default, I consider to have been checked into the SQL Server codebase directly by Satan.

I imagine he looked something like this… Satan, not Adam.  =D

He’s referring, in this case, to the default option of remote proc transaction promotion, or “Enable Promotion of Distributed Transactions for RPC” in the GUI, which is set to True by default.  Admittedly, linked-servers are a dicey area of the technology and not everybody needs them, but when you do need to deal with them, there are some caveats to consider and a few potholes to avoid.

I won’t go into gory details, but the mile-high gist of it is that you should probably change a couple of the default linked-server settings when you’re going from MSSQL to MSSQL (which most of the time, you are): rpc and rpc out both to true, and the above dude to false.  The reasons behind that are subtle, but it boils down to usage patterns:  If you’re purely running basic SELECT statements against the remote server, you can leave this stuff alone; but if you want to do cool stuff like exec MyLinkedServer.RemoteDB.sys.sp_executesql '--some awesome dynamic-sql'​, you’ll want to change these.  (That last bit comes in especially handy if you’re building a dynamic query string, then running it against the remote server to bring in the results to a #temptable to further massage/mangle/munge said data.)

Even though you probably really shouldn’t be doing that in the database (that’s what web server farms are for!).
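For illustration, here’s how those three settings from above get applied, as a hedged sketch using sp_serveroption (the linked-server name is a placeholder; the option names are the real keywords):

EXEC master.sys.sp_serveroption @server = N'MyLinkedServer', @optname = 'rpc', @optvalue = 'true';
EXEC master.sys.sp_serveroption @server = N'MyLinkedServer', @optname = 'rpc out', @optvalue = 'true';
EXEC master.sys.sp_serveroption @server = N'MyLinkedServer', @optname = 'remote proc transaction promotion', @optvalue = 'false';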

stack of colorful rings with arrow pointing up
push it up the stack!

So, what are some other “stupid defaults” in SQL Server?

And it’s not even just “defaults”; there are some options which shouldn’t even be a thing — they should be completely and irrevocably in love with Edward removed from the product.  Yet, because the RDBMS tech space is infamously conservative and slow-to-change (the accepted euphemism is “mature“), these options and defaults have remained in the product despite our best attempts to convince MS that they’re heinous.

1. Parallelism settings (server/instance level)

Your servers have many-core CPUs, right?  And you want SQL to utilize those cores to the best of its ability, distributing the many users’ workloads fairly amongst them, yes?  Damn right, you paid $3k or more per core in freaking licensing costs!  “OK”, says SQL Server, “I’ll use all available CPUs for any query with a ‘cost’ over ‘5’“.  (To give context here, in case you’re not aware, ‘5’ is a LOW number; most OLTP workload queries are in the double to triple digits).  “But wait!”, you protest, “I have more than 1 user, obviously, and I don’t want their horrible queries bringing all CPUs to their knees and forcing the 50 other user queries to wait their turn!”

well... tough shit
-SQL Server

Hardly seems fair, no?  But those are the defaults, have been for over a decade, despite how much hardware has evolved since.  Just for completeness, I am talking about the following 2 options, which you can view by right-clicking on the instance in SSMS, or by running sys.sp_configure.  They are max degree of parallelism and cost threshold for parallelism, respectively (# cores to use, and how expensive the query should be to “go parallel” — as opposed to serial/single-threaded, which is perfectly fine for those itty-bitty-teeny-tiny queries that are so quick that 1 CPU core is more than enough to handle them).  We commonly abbreviate the former as maxDOP; the latter is less commonly abbreviated, but because I’m a sucker for acronyms, I like to call it CTFP, or maybe costFP if I’m feeling verbose.

Now obviously you can, and should, change these settings.  “To what?” you ask.  Well, the answer, as always my friend, is “It Depends ©®™“… a perennial favorite of DBAs and consultants alike.  But don’t ask me — there are plenty of people much smarter than I with blog posts on the topic.  If you put a gun to my head, for CTFP, I’d say “pick your favorite number between 50 and 100, start there, and test to see if it makes your server CPUs happy or sad”.  And for maxDOP I’d say “divide your # of CPU cores by 2 (if you have 8 or fewer) or 4 (if you have 16 or more)”.

And if you have somewhere between 9 and 15 CPU cores, don’t ask me, because you’re running some.. interesting hardware.  Figure it out yourself, and get that gun out of my face!

OK, I know 12 cores is probably not an unreasonable config, so.. extrapolating my logic above, divide by.. 3?  Yeah let’s go with that.
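In T-SQL terms, a hedged starting point might look like this (using the 16-core “divide by 4” logic above; test and adjust for your workload!):

EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'cost threshold for parallelism', 50;
EXEC sys.sp_configure 'max degree of parallelism', 4;  --e.g. 16 cores / 4
RECONFIGURE;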

squirrel with "ship it" flag
ship it good

2. Auto Close (database option)

It’s at the top of the list in the GUI under Database Properties -> Options.  Yet nobody knows why you would ever enable it.  I once heard a community member theorize that it might have been put into the product back when it was considered a viable “local persistence option for Windows CE/Mobile apps” and it would help w/ resource management on said mobile device by freeing up resources when the DB wasn’t in-use.  Well, we all know how well that product line did in the market (hint: poorly).  There are so many better options for localized data stores in mobile dev, MS isn’t even a blip in the conversation.  (Again, talking local data persistence.)

If we’re talking cloud, MS is a big part of that conversation — Azure is amazing, and a solid competitor to AWS & GCP.

Anyway, if you ever find a SQL DB with the auto_close option enabled, find the person responsible, and slap them with a trout.

stick figure chasing another with purple trout
I will find you, and I will trout-slap you…
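And the fix itself is a one-liner (substitute your actual database name):

ALTER DATABASE [YourDB] SET AUTO_CLOSE OFF;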

3. Server Max Memory

Last one for today.  This is something that’s not completely heinous, but could use a LOT more built-in intelligence during the installation process so that DBAs & SysAdmins didn’t need to think about it so much.  SQL Server, by default, sets its max-memory to some-odd-billion MBs (technically it’s the max value of a 32-bit int, which the more geeky among you have probably memorized), which is of course some-odd-million GBs, which is more than even the most bleeding-edge servers have to date.  Which is fine in theory — you paid a crap-ton of money for this system, it might as well use up all the RAM that it can to perform to its potential, right?

all your ram are belong to us
in AD 2101…

Right.  Until you realize that “Oh wait, it’s running inside an OS” (whether that’s Windows or Linux, thanks to the 2016 & 2017 product versions), and that OS needs some RAM too, to keep itself running!  (Cue the Linux zealots with their “it’s so much more lightweight than Windoze, haha!!1” — yeah, well you still need some memory space, don’a ya?)

Here’s what I’d like to see, in my ideal world.  During SQL Server installation, it would detect how much RAM is on the system, subtract about 10% or 4-8 GB, leave that for the OS, and use the resulting number as its limit.  Boom, done, nobody has to think about configuring it and checking off another checkbox on their setup checklist.
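Until then, we do it ourselves.  As a rough sketch of that math (the setting is in MB, so e.g. a 64 GB box keeping 8 GB back for the OS):

EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'max server memory (MB)', 57344;  --64 GB - 8 GB = 56 GB = 57344 MB
RECONFIGURE;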

But noooo… The vaunted MSSQL engineers can build all sorts of amazing things into the product like QueryStore, Adaptive Query Processing, and The Artist Formerly Known as Hekaton, but heaven forbid we get a little more intelligence in the installer.  It got a lot better with 2016 when it let you configure tempDB reasonably correctly (multiple files, different locations, etc.), but there’s still a LOT that could use some lurv.

two kitties cuddling
because KITTEHS!!

Do you have a favorite “stupid default” or “horrible setting” related to SQL Server or any other technology that you work with?  Share in the comments!

Clean Code, the SQL

As a developer, DBA, or hybrid “DbDev”, you’re often tasked with writing or improving the stored procedures which house that complex logic.  And that’s my topic today: being clean about your SQL code.

Get it? It’s just too punny! … Ok I’m done.

The Coding Blocks guys did a series of episodes about the perennial favorite Clean Code book.  If you haven’t subscribed to their podcast…

do it do it now -Arnold
What are you waiting for?!?!

And it’s a great book, no doubt. But those guidelines for application code are not 100% directly applicable to database code.

wait... what?
Srsly?

Let’s back up a second. Why? That sounds counter-intuitive, no?  Ok, more context. See, the traditional (“legacy”?) app consists of 3 layers, the bottom one being the database. And it’s usually relational, and is usually responsible for far more than simple data access and persistence.  Read: complex business rules and process logic.  Data flow, not just getters and setters.

So that means, as a developer, DBA, or hybrid “DbDev”, you’re often tasked with writing or improving the stored procedures which house that complex logic.  And that’s my topic today: being clean about your SQL code.

Part 1: Comments

There’s a fairly famous quote from the book about comments:

Comments are always failures.

He (the book’s author, Robert C. Martin) is using hyperbole, but for a purpose.  While his views on comments may be extreme, most programmers tend to realize the core essence of that chapter, which is that comments only serve to express something in plain English that the code has failed to express clearly enough to be easily and immediately understood.

With SQL scripts, and in particular with stored-procedures, I’m taking a somewhat opposite stance:

Comments are always appreciated, even if they’re potentially outdated or inaccurate.

There are two types of comments in SQL, the --inline and the /* block */.  Different people have their preferred flavors of block — sometimes it’s just several lines prefaced with the double-dash --.  And that’s fine, whatever floats your comment-boat.

hms commentus comment-boat
I made my own!! (most copied from an example at http://www.asciiworld.com/-Boats-.html)

In particular, I always encourage a comment block at the top of every stored-proc & other user-defined programmable objects (functions, types, etc.).  Just a small example for illustration:

/*
Location: Server.Database
Author: NateTheDBA
Created: 2012-12-21
Description: Gets users who have not logged in since the given date.
Consumers: MyCoolAppName, MyReportServer
Revisions
2015-05-15, Nate: removed archive (never used after archive-date)
2017-06-07, Nate: fixed formatting for blog post
*/
CREATE PROCEDURE GetUsersNotLoggedInSince
    @SinceDate datetime2
AS
BEGIN
    --some clever stuff goes here...
END

“But wait”, you say, “what about source control?”  Yes, all your programmable objects (and even, arguably, your reference data) should be in source control.  There are tool-vendors aplenty to help you with that.  But guess what?  Budgets.  Time & effort.  Oh, did I mention, legacy legacy legacy?  Yes, dear reader, the average business has years (decades) of organically evolved relational databases and processes.  Are you the guy or gal to swoop in on your unicorn and seamlessly convert their entire data tier infrastructure to a beautiful DevOps pipeline with shiny rainbows and kittens for all?  No?  Okay then.  Baby-steps.

Not that I’m bitter or anything…

Yes, my procs are in source control.  It’s called “daily automated script-out-objects-to-files which are then committed to SVN”.  It’s not built-in to SSMS.  Which means that I, or another DBA, or a potential consultant, or a Dev who gets enlisted to help improve a proc that runs for hours when it should only take minutes, would be inconvenienced by the extra trip to a separate tool/system to fetch some change-history just for context.  And really, that’s all this is for — CONTEXT.  We like to know what it is we’re working on when we start to work on it, without having to traverse a change-tree or go bug 3 other people who “might” have touched it last.  I’m not asking for a detailed log of every single time someone touched the thing; just give me the overview, the milestones and significant changes to functionality/features/scope so that I have a jump-off point for troubleshooting/testing/reasoning about it.

“But wait”, you say again, “shouldn’t your name be a sufficient description of what the proc does?”  Sure, in theory.  Until you have dependencies which need that name to stay the same even after an update or logic-change.  Like reports.  Or data-connected Excel workbooks.  Which are used daily by managers, who will come yelling at you if their worksheets suddenly stop functioning.

end rant

Back to comments in general.  The reason they’re helpful (besides documentation-headers for objects) is that they provide context and explain intent.  Half the time, my job as a DBA is improving or fixing someone else’s code.  Therefore, I want to see, in plain English, what it is they’re trying to accomplish, notes about attempts and failures, and the like.  Yes, I could have a discussion with them.  And I will.  But if I’m working on it asynchronously and they’ve moved on to something else, or our hours are different, I want those little nuggets of context and intent to be right there in the script, because that’s where I’m working!

What about queries that get passed-down from the app to the DB?  ORMs don’t support pre-pending a comment to their data calls, do they?  I wish.  Maybe some do, I haven’t researched it, but I know for sure that LINQ doesn’t.  But then again, when I’m using a query-capture tool (like DMVs, Profiler, X-events, or a vendor monitoring tool), ORM queries are so painfully obvious in comparison to hand-crafted SQL that I can usually spot them from a mile away, and go bother the app-devs for some context & conversation.  If you’re one of the poor unfortunate souls who still passes ad-hoc generated SQL statements down thru ODBC to your DB, then sure, a little comment won’t hurt anybody.

you poor unfortunate soul
it’s sad, but true…

So do your DBAs a favor, comment your SQL code, at least in terms of programmable database objects and ad-hoc scripts.  I promise, it’ll make them hate you less.  It might even make you love yourself more, because 3 months down the road when you revisit that proc, and you need to remember what it was for and why you did it that way, you’ll see it right there in your very own writing!  (OK, typing.)

Part 2: SRP, Encapsulation, and Abstraction

A bit of paraphrase of one of the book’s key points:

A reusable module (function, method) should do one thing, and do it well.

Also, the DRY principle:

Don’t repeat yourself.

When building SQL modules, we’re usually concerned with performance and accuracy, over abstraction and composability.  Therefore, repeating oneself is not necessarily a bad thing, when done for the right reasons.  Diligence is a big factor here — if there’s a non-trivial relationship between some entities that you’re repeating in several places, and you know that it could become a maintenance headache if that relationship’s definition has to change later, do as much as possible to mitigate the risk of dependency/consistency-loss.  This can be documentation, comments, and/or building that relationship into a view.

It’s important.

The latter brings up an interesting topic, one which I had a lively discussion about with a colleague recently (he’s a developer, and a dang good one) — nested views.  Because inevitably, the encapsulation of those relationships & business-rules into things like views or ITVFs (inline table-valued functions) can and will lead to nesting those objects into other objects.  And troubleshooting many-level-nested views is a particularly frustrating exercise; in fact they’re what some DBAs call one of the “deadly sins of SQL“.  But there are perfectly valid reasons and uses for them, sometimes, and I really enjoyed the discussion thread we had on it, so I’ll have to expand on that in another post.

Anyway, I’m already getting long-winded and well over 1k words, so I’ll wrap it up for now, and continue this topic next week.

Thanks for reading, stay tuned!

Quickie: SQL DB Role Members

A typical part of a DBA’s work-week might involve the occasional DB user-role-membership management, so I hope this helps the lone-wolf DBAs out there and/or the developers who need to know what to ask for…

Just a brief post on adding/removing users (database level users) to/from roles (database level roles).  It’s relevant because several shops are still stuck supporting at least a few 2008 (or hopefully, 2008R2) instances, and there is a key difference between those and newer (2012 & up) versions in the “preferred” method of doing this security task.

security seal for ur protection
He’s wearing a police hat, he must know what he’s doing…

There are reams of documentation and books and articles written about SQL security in general.  That is beyond the scope of this post (and indeed, beyond the scope of any single blog, unless you’re an SME on the subject!).  But a typical part of a DBA’s work-week might involve the occasional DB user-role-membership management, so I hope this helps the lone-wolf DBAs out there and/or the developers who need to know what to ask for, when they’re planning/deploying a new app against their SQL DB(s).

The “old” method involves calling system stored-procedures, sp_addrolemember and sp_droprolemember, in which you pass the role-name and username.  The “new” method, supported starting with SQL 2012, is to use the command-phrases ALTER ROLE [role] ADD MEMBER [user], and ALTER ROLE [role] DROP MEMBER [user].

The latter is more ‘standard‘, while the former is more ‘Microsoft-y‘.  I couldn’t easily find whether it’s part of the official ANSI standard or not… that’s an exercise for the reader.  What I find very interesting is that Azure’s data warehouse offerings require the old method.  Of course, hopefully in a DW setting you’re not messing with security nearly as much as a typical OLTP system, but… yeah.

Does that mean those Azure services are built on top of older SQL engine versions?  Possibly.  MSFT isn’t too open about the deep internals of such tech, but neither is any other cloud vendor, so we can’t really ask them such a question and expect anything more than a blank-stare.  But it is curious, no?

fry not sure if curious or suspicious
Exactly.

Syntax examples:  Let’s add the user foo to the database Bard, in the db_datareader built-in role.  Then we’ll remove him.  (Or her, I guess; “foo” is a pretty gender-neutral name.)  Creating said user is easy, so I’ll start with that, and it’s the same in all supported versions.  You need a server-level login to link it to; if you don’t have one, I’ll show you how to create it first.

Create server-level login:

--preferably, you create a login for an existing AD/Windows account:
CREATE LOGIN [yourdomain\foo] FROM WINDOWS;
--or, you can just create a SQL login (not connected to domain/Windows/ActiveDirectory; also less secure, as discussed here and here)
CREATE LOGIN [foo] WITH PASSWORD = 'foobar';

Create database-level user:

USE Bard;
--if you made the domain/Windows login:
CREATE USER [foo] FOR LOGIN [yourdomain\foo];
--or, if you just made the SQL login:
CREATE USER [foo] FOR LOGIN [foo];

Now the role-membership.

Old way:

  1. Add user to role:
    • exec Bard.sys.sp_addrolemember
          @rolename = 'db_datareader'
          , @membername = 'foo';
  2. Check that it worked:
    • exec Bard.sys.sp_helprolemember
          @rolename = 'db_datareader'
    • It will show something like this:
      db_datareader with member 'foo'
  3. Remove user from role:
    • exec Bard.sys.sp_droprolemember
          @rolename = 'db_datareader'
          , @membername = 'foo';

New way (step 2, the “check”, is the same)

  1. Add user to role:
    • USE Bard;
      ALTER ROLE db_datareader ADD MEMBER [foo];
  2. Check (see above)
  3. Remove user from role:
    • USE Bard;
      ALTER ROLE db_datareader DROP MEMBER [foo];

Yay.

Notice that, because the “old way” is simply executing sys-sp’s, we can actually run it from any database context, whereas the “new way” requires you to connect to the database in question.

Note: I am in no way shape or form responsible for you screwing up your database or SQL instance, nor for you getting yelled at by your DBA or security admin or any other form of verbal assault you may incur as a result of running these commands.  But since you need server-admin & database-owner equivalent permissions anyway, you’re probably one of those people already, so you’ll just end up yelling at yourself.

no guarantees
No substitutions, exchanges, or refunds.

Cleanup (just so you don’t muddy your instance/DB up with a silly example user):

USE Bard;
DROP USER [foo];
USE master;
DROP LOGIN [foo];

If you have any questions, feel free to reach out to me!

 

DBA Holy Wars Part 2

Battle 4: GUIDs vs Identities

This is an oldie but goody.  A) Developers want their apps to manage the record identifiers, but DBAs want the database to do it.  B) Developers prefer abstracting the identity values out of sight/mind, DBAs know that occasionally (despite your best efforts to avoid it) your eyeballs will have to look at those values and visually connect them with their foreign key relationships while troubleshooting some obscure bug.

there’s ALWAYS more…

But there’s more to it than that.  See, none of those arguments really matter, because there are easy answers to those problems.  The real core issue lies with the lazy acceptance of GUI/designer defaults, instead of using a bit of brainpower to make a purposeful decision about your Primary Key and your Clustered Index.

Now wait a minute Mr. DBA, aren’t those the same thing?

NO!  That’s where this problem comes from!

A good Clustered Index is: narrow (fewer bytes), unique (or at least, highly selective), static (not subject to updates), and ever-increasing (or decreasing, if you really want).  NUSE, as some writers have acronym’d it.  A GUID fails criteria ‘N’ and ‘E’.  However, that’s not to say a GUID isn’t a fine Primary Key!  See, your PK really only needs to be ‘U’, and to a lesser extent, ‘S’.  The two sets of requirements don’t fully overlap.  So sure, use those GUIDs, make them your PK.  Just don’t let your tool automagically also make that your CX (Clustered indeX).  Spend a few minutes making a conscious effort to pick a different column (or couple of columns) that meets more of these requirements.

For example, a datetime column that indicates the age of each record.  Chances are, you’re using this column in most of your queries on this table anyway, so clustering on it will speed those up.
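A hedged sketch of that arrangement (table and columns invented for illustration):

CREATE TABLE dbo.WidgetEvent
(
    WidgetEventGuid uniqueidentifier NOT NULL
        CONSTRAINT PK_WidgetEvent PRIMARY KEY NONCLUSTERED,  --GUID as PK: fine
    CreatedDate datetime2 NOT NULL
        CONSTRAINT DF_WidgetEvent_Created DEFAULT SYSUTCDATETIME(),
    Payload nvarchar(200) NULL
);
--cluster on the narrow, static, ever-increasing column instead of the GUID
CREATE CLUSTERED INDEX CX_WidgetEvent_CreatedDate ON dbo.WidgetEvent (CreatedDate);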

Most of the time, though, if your data model is reasonably normalized and you’re indexing your foreign keys (because you should!), your PKs & CX’s will be the same.  There’s nothing wrong with that.  Just be mindful of the trade-offs.

Battle 5: CSV vs TAB

Who doesn’t love a good format-war?

Often, we have to deal with data from outside sources that gets exchanged via “flat files”, i.e. text files that represent a single monolithic table of data.  Each line is a row, and within each line, each string between each delimiting character is a column value.  So the question is, which is easier to deal with as that delimiter: comma, or tab?

String data values often have commas in them, so usually, the file also needs a “quoting character”, i.e. something that surrounds the string values so that the reader/interpreter of the file knows that anything found inside those quotes is all one value, regardless of any commas found within it.

But tabs are bigger.. aren’t they?  No, they’re still just 1 byte (or 2, in Unicode).  So that’s a non-argument.  Compatibility?  Every program that can read and automatically parse a .csv can just as easily do so with a .tab, even if Windows Explorer’s file icon & default-program handler would lead you to believe otherwise.

I recently encountered an issue with BCP (a SQL command-line utility for bulk copying data into / out of SQL server), where the csv was just being a pain in the arse. I tried a tab and all was well! I’m sure it was partially my fault but regardless, it was the path of least resistance.
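For reference, a hedged example of both flavors (server, database, and paths are placeholders); fun fact: BCP’s character mode actually defaults to tab-delimited:

rem -c (character mode) defaults to tab as the field terminator
bcp Bard.dbo.MyTable out C:\temp\MyTable.tab -S MyServer -T -c
rem -t, overrides it with a comma (no quoting, so beware embedded commas!)
bcp Bard.dbo.MyTable out C:\temp\MyTable.csv -S MyServer -T -c -t,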

Battle 6: designers vs scripting

Wizards are usually good, but in this case, they’re lazy and bad for you…

This should be a no-brainer. There is absolutely no excuse for using the table designer or any other wizardy GUIs for database design and maintenance, unless you’re just learning the ropes. And even then, instead of pressing ‘OK’, use the ‘Script’ option to let SSMS generate a T-SQL script to perform whatever actions you just clicked-thru.  Now yes, admittedly those generated scripts are rarely a shining example of clean code, but they get the job done, even with some unnecessary filler and fluff.  Learn the critical bits and try to write the script yourself next time; and sure, use the GUI-to-script to double-check your work, if you still need to.

Confession: I still use the GUI to create new SQL Agent Jobs. It’s not that I don’t know how to script it, it’s just that there are so many non-intuitive parameters to those msdb system-sp’s that I usually have to look them up, thereby spending the time I would have otherwise saved.

Bonus round: the pronunciation of “Data”

Call me “big Data” one more time…

Dah-tuh, or Day-tuh?  Or, for the 3 people in the world who can actually read those ridiculous pronunciation glyphs, /ˈdeɪtə/ or /ˈdætə/ ?  It’s a question as old as the industry itself… or maybe not.  Anecdotally, it seems like most data professionals, and people in related industries, tend to say “day-tuh”; while those in the media and generally less technical communities tend to say “dah-tuh”.  (Where the first syllable is the same vowel-sound as in “dad” or “cat”.)  This likely means that the latter is more popular, but the former is more industrially accepted.

In either case, it doesn’t really matter, because at the end of the day, we’re talking about the same thing.  So if some dogmatic DBA or pedantic PHB tries to correct your pronunciation, tell ’em to stop being so persnickety and get on with the task at hand!

Until next time…

Little Gotchas

If the caller of our stored-procedure literally passes NULL as the parameter value, we might have a problem!

A large part of most DBA/DBD’s daily job is writing & maintaining stored-procedures.  In SQL Server or other RDBMSs, stored-procs (“SP’s”, “procs”, however you like to abbreviate), serve as one of the building-blocks of your overlaying applications and day-to-day operations, including maintenance and automation.

This is a sprocket, not to be confused with a sproc, which is really just a proc.

Today, something struck me, and I was both shocked and comforted by the fact that this hadn’t really “come back to bite me in the arse“, as the proverbial saying goes.  But first, some context.

When we declare our proc signature with our parameters, we of course give them datatypes, and often default values — the parameter value that is assumed & used upon execution when the caller (operator, application, agent job, etc.) calls said proc without passing a value to that parameter.  So we create our proc like so:

CREATE PROCEDURE dbo.MyProc
    @MyParam BIT = 0
AS
BEGIN
    SET NOCOUNT ON;
END

So that users are allowed to call it like so, and assume some correct default behavior:

EXEC dbo.MyProc;

Coincidentally, that CREATE line is part of a typical “boilerplate” snippet or template which I use to create procs with “create if not exists, else alter” logic and a nice header-comment-block, which I’ll publish on my GitHub or Gist shortly, so I can show it here.  I know that MS recently added DROP IF EXISTS support to the language, but frankly, I like to keep procs intact if they exist because it’s easier not to have to remember/re-apply their metadata, such as security (grants/deny’s, certificate signatures, etc.) and extended properties.  Wake me up when they add true CREATE OR ALTER syntax!  Oh snap, they did… in 2016 SP1.  Anyway.
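Until that template gets published, here’s a minimal sketch of the general “create if not exists, else alter” idea (the real one has the full header block and metadata handling):

--stub it out if it doesn't exist, so the ALTER below always works
--(for pre-2016-SP1; newer versions can just use CREATE OR ALTER)
IF OBJECT_ID('dbo.MyProc', 'P') IS NULL
    EXEC ('CREATE PROCEDURE dbo.MyProc AS BEGIN SET NOCOUNT ON; END');
GO
ALTER PROCEDURE dbo.MyProc
    @MyParam BIT = 0
AS
BEGIN
    SET NOCOUNT ON;
    --real logic goes here...
END
GO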

Now for the “catch”, the gotcha.

In programming/software-dev/IT-systems, “gotcha” has a specific meaning.  Thanks Wikipedia!

If the caller says exec dbo.MyProc, that’s great — they didn’t pass a parameter value, so the execution uses the default value (0) and off we go.  However, if the caller is so malicious as to literally pass NULL, we might have a problem!  Because let’s say that @MyParam value is used in a JOIN predicate or an IN (SELECT...) block, or even a CASE expression.  We won’t get an actual error; SQL Server is smart enough to jump over the syntactical variations required for equivalence checking (i.e. Column1 = 0 vs. Column1 is NULL) when it interprets/compiles the stored-procedure.  But, what we’re probably going to get is unexpected or unknown behavior.


It seemed worth re-using a classic…

And really, it all comes back to those nasty things called assumptions.  See, as the proc author, we’re assuming that our @MyParam will always be a 0 or 1, because it’s a BIT, and we gave it a default value, right?  Sure, maybe in another language, but this is T-SQL!  NULL is a separate and distinct thing, a valid value for any datatype, and must be accounted for and treated as such.  It can get especially dicey when you have a NOT IN (SELECT...) block that ends up as an empty-set, which suddenly morphs the outer query into a “without a WHERE clause” beast, and.. well, you can guess the rest.
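To make that concrete, a contrived sketch (tables and columns invented, inside a proc with our @MyParam BIT parameter): if @MyParam is NULL, the subquery returns an empty set, NOT IN evaluates true for every outer row, and the filter silently disappears.

SELECT c.CustomerID, c.CustomerName
FROM dbo.Customer c
WHERE c.CustomerID NOT IN
(
    SELECT b.CustomerID
    FROM dbo.BlockedCustomer b
    WHERE b.IsActive = @MyParam  --NULL never equals anything, so no rows come back
);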

So what do we do about it?  Well, we can add a “check parameter values” block to the top of our procedure where we either throw an error, or set the NULL value back to a default.

Examples:

IF (@MyParam IS NULL) BEGIN RAISERROR ('@MyParam cannot be NULL; try again.', 15, 1); RETURN; END;  --RAISERROR alone doesn't halt execution
IF (@MyParam IS NULL) SET @MyParam = 0;

We could also work on the internal proc logic to account for NULL values and “eliminate the guesswork” (i.e. prevent unexpected behavior) by actually having logical branches/conditions which “do something” if the parameter is NULL.  Then, at least we know what our proc will do if that infamous caller does exec MyProc @MyParam = NULL.  Yay!  But that sounds like a lot of work.  Maybe.

Or maybe it’s worthwhile because you actually want NULL to be treated differently than all other parameter values, and then, hey, you’ve already spent the time on that logic, so you’re done!

what-if-i-told-you-null-does-not-equal-null
But NULL does not NOT equal NULL, either!  Crap, somebody give me the red pill…

I hope this helps somebody else avoid the same assumptions.

Adventures in SQL Cluster Stacked Instances

Enough with the pitchforks; this is Test/QA. Here, I talk about 3 gotchas.

Today we’re going to talk about SQL Server instance stacking.

Blasphemy of the highest order!

Right, in production.  I’m talking about DEV/TEST environments.

Still, blasphemy!

Settle down.  If your server is set up correctly and has the resources you want it to have, and you divide your resources up per instance in a few very simple ways, it’s fine.  Enough with the pitchforks, the wailing and gnashing of teeth.

Stop scaring the pandas.

Okay, now that that’s out of the way…

Remember our cute little DEV server?  So, the way he’s set up is, he’s got 3 SQL Server instances on him, each with its own dedicated SSD, and another dedicated SSD just for tempdbs.  Ideally, we’d have a separate SSD for each instance’s tempdb, but sadly, motherboards with 3 M.2 or NVMe slots aren’t (weren’t?) in production at the time, at least not for desktop class systems.  But I digress.

This is called instance stacking.  And yes, it’s a big no-no in production.  Mostly because performance troubleshooting is a pain in the arse.  But also because it’s more difficult to divvy-up resources like RAM and I/O & network throughput channels than one would like.  But it’s super simple to set up — you simply run the SQL Server installer 3x, each time creating a unique instance name.  Then, at the end of it, your SQL instances are addressable by MachineName\InstanceName, e.g. SQLDEV\Foo, SQLDEV\Bar, etc.

Now the time came to create a “QA” environment.  Which, like DEV, didn’t need to be very performant (that’s a made-up word that consultants like to use, but it’s generally accepted in our industry so we go with it), and so, since we had some hardware laying around from a recent “up-gration” (upgrade-migration… okay, now I’m being ridiculous), we said “let’s use that thing!”.  It was a 2-node cluster setup with shared DAS storage.  For the uninitiated, DAS is Direct Attached Storage, i.e. an array of disks that you can directly attach to 1 or more servers using whatever interconnect is available on the endpoints (usually SAS, serial-attached SCSI – which is one of the most fun acronyms to pronounce in IT: “scuzzy”).  DAS is not to be confused with a SAN, Storage Area Network, which is a super fancy storage array with performance tiers and snapshot technology and de-duplication and all that hotness.

NAS, SAN, DAS – 3 acronyms, 1 underlying purpose, 3 implementations.

The interesting thing with a cluster is, when you install SQL Server instances, you can’t actually use the same “MachineName” for the 3 different “InstanceName”s.  Because in a cluster, the former is actually the “VirtualServerName”, which must be unique per clustered instance, in order to properly configure cluster resources, storage pools, and networks.

The reason this is interesting, is that it contrasts with stacked instance setup on a standalone server (non-clustered).  So if you compared our DEV and QA setups side-by-side, it’s a bit odd-ball: instead of SQLDEV\Inst1, SQLDEV\Inst2, etc., we have instance names like SQLQA1\Inst1, SQLQA2\Inst2, etc.  That makes the ol’ “find and replace” in config files a bit harder.  But, at the end of the day, it’s all just names!

One of the handiest tools in an engineer’s toolbox!

Another interesting “gotcha” revolves around SQL 2008R2, which I know shouldn’t be on the short-list of versions to spin up, but unfortunately, a legacy ERP system demands it.  Well, it only happened to me with the 2008R2 instance installation, not the 2016’s, but that’s not to say it couldn’t happen with others.  Anyway, after installation, SQL Agent was not working; it wasn’t coming up as a cluster resource.  Basically, exactly what was outlined in this timely & detailed article at mssqltips.  I won’t restate the fix instructions here, just give it a read!  I do want to clarify something though.

In part of the fix, we use the handy-dandy PowerShell cmdlet Add-ClusterResourceDependency.  In its basic form, it requires 2 arguments, Resource and Provider.  To someone who’s not a cluster expert, this terminology might be a bit confusing.  Resource in this case is the SQL Server Agent, while Provider is SQL Server itself.  But we’re adding a Dependency, right?  Which depends on which?  Well, we know that Agent depends on the engine, so, Resource depends on Provider.  Yes, I know, that’s what the article tells you to do — I just like to understand why.

Fry shares my curiosity…
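Anyway, the call itself ends up looking something like this (instance name invented for illustration):

# the Agent resource depends on the engine resource: Resource depends on Provider
Add-ClusterResourceDependency -Resource "SQL Server Agent (INST1)" -Provider "SQL Server (INST1)"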

Finally, there’s the question of divvying-up resources to the stacked clustered instances.  Now, in a standard cluster, you’ve got your active node and your passive node.  But if we’re stacking instances, we might as well split the SQL instances up and take advantage of the compute resources on both nodes.  (Storage is still shared; this is a cluster, after all!)  The CPUs are no problem — however many instances are stacked on a node, they’ll share the CPU cores pretty cooperatively.  Memory is a bit of a different story.  We want to take advantage of all the available RAM in the cluster, but…

As you know, you can configure each SQL instance to use a set amount of max. server memory.  So let’s say each cluster node has 32GB RAM, and we’re stacking 4 SQL instances total (to keep the math easy!).  If we split them up among the nodes at 2 each, each instance can use 16GB.  But if for some reason a node goes down, and all 4 instances move to 1 node, now they’re fighting for that 32GB!  So we should reduce their max-memory settings to 8GB each, instead of 16.  But we don’t want to do this manually!  Fortunately Aaron Bertrand has an excellent blog post on the subject, with some useful ideas about how to do this dynamically & automatically.  The only issue I have with it is that it requires the linked-servers to use a highly privileged account (sysadmin or maybe serveradmin role) to be able to set that max-server-memory setting.  But wait, remember what we said at the beginning?  This ain’t production!  Who cares about security?  (That’s facetious, sort of — in reality, yes, we don’t care as much about security in lower environments, but we should still care a little!)

when-i-forget-my-password-is-incorrect
We all do silly things from time to time..

That concludes this week’s adventure!  Thanks for reading.