Clean code, the SQL: Part 2: Electric Boogaloo



After the previous discussion about nested views, encapsulation & abstraction, I’d like to write about duplication specifically, and the distinction between actual lines of code being duplicated, versus functional duplication. Because the latter is not OK, but the former is generally acceptable when it’s boilerplate code, or done, again, in the name of performance and efficiency.

boilerplate with alphabet and stuff
letters and numbers and symbols!

So, to expand on last week’s “Encapsulation & Abstraction” segment.  The conversation with one of my favorite developers went something like this.

Developer:

While I agree that there’s some over-reliance on nested views, the reason they get implemented a lot is because there’s a particular problem they seem to easily solve: how to encapsulate business-data rules without violating DRY.

Let’s say we have a biz-rule for a “core segment” of data.  For simplicity’s sake, let’s call that rule “Widget A consists of a Widget record and a WidgetSupplement record joined by WidgetID, where Widget.WidgetType is ‘foo’.”  So it seems obvious to create a view WidgetFooComplete, which pulls in the data from both tables and applies the type condition.  This creates a sort of “atomic building block” of data, which can be consumed by apps & data-access methods repeatedly & consistently.

Now, most downstream apps will use that WidgetFooComplete data in its entirety (or nearly).  But let’s say there’s a hot new app that needs some more data about the Widgets, and it needs to go out to the WidgetMoarProperties table.  The natural inclination is to incorporate our existing “building block” view, WidgetFooComplete, into a new view for this app & its dependencies, and call it WidgetFooMoarComplete.

But what’s the alternative?  If we re-create the JOIN/WHERE conditions on the base-tables in this new view, it violates DRY and makes possible future refactoring difficult if that biz-rule changes.
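To make the Widget scenario concrete, here is a minimal sketch of the two views.  It uses Python's built-in sqlite3 purely so it can run anywhere (the conversation is about SQL Server, and the Name, Weight, and Color columns are made up for illustration; only the table names, WidgetID, and WidgetType come from the rule above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Widget (WidgetID INTEGER PRIMARY KEY, WidgetType TEXT, Name TEXT);
CREATE TABLE WidgetSupplement (WidgetID INTEGER PRIMARY KEY, Weight REAL);
CREATE TABLE WidgetMoarProperties (WidgetID INTEGER PRIMARY KEY, Color TEXT);

-- The "atomic building block": the biz-rule lives in exactly one place.
CREATE VIEW WidgetFooComplete AS
SELECT w.WidgetID, w.Name, s.Weight
FROM Widget AS w
JOIN WidgetSupplement AS s ON s.WidgetID = w.WidgetID
WHERE w.WidgetType = 'foo';

-- The nested view: re-uses the block instead of repeating its JOIN/WHERE.
CREATE VIEW WidgetFooMoarComplete AS
SELECT c.WidgetID, c.Name, c.Weight, m.Color
FROM WidgetFooComplete AS c
JOIN WidgetMoarProperties AS m ON m.WidgetID = c.WidgetID;

INSERT INTO Widget VALUES (1, 'foo', 'Sprocket'), (2, 'bar', 'Gizmo');
INSERT INTO WidgetSupplement VALUES (1, 1.5), (2, 2.5);
INSERT INTO WidgetMoarProperties VALUES (1, 'red'), (2, 'blue');
""")

rows = conn.execute(
    "SELECT WidgetID, Name, Weight, Color FROM WidgetFooMoarComplete"
).fetchall()
print(rows)  # only the 'foo' widget comes back
```

If the 'foo' rule ever changes, only WidgetFooComplete has to be touched, which is exactly the appeal (and the start of the slippery slope).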

Admittedly, most modern data-access technologies make it easier to create these “building blocks” of joined data entities.  And sometimes those biz-rules belong in the app’s lower layers, but this can lead to writing lots of little disparate queries/db-calls for what should have been one atomic operation.  That can be a maintenance headache, as could dozens (hundreds) of tailored stored-procs for every data-access scenario.

So it seems like nested views can have their place, but “deep” nesting is usually troublesome.  And to prevent the “slippery slope” effect, we have to practice diligence.

Me:

That’s pretty spot-on.  DBAs tend to criticize them (nested views) as a practice in general because of the tendency to over-use and over-rely on them, and because of that slippery slope, where “a little” use turns into “a lot”, and leads to troubleshooting headaches.  And generalizations are just that.

To take some examples in-hand: simple entity relationships, especially when biz-critical, should be A) obvious, and B) documented.  Unified views can serve this purpose, but should only be used where appropriate — i.e. to load an object that gets passed around/up the app stack.  They’re great “atomic building blocks” when you actually need the entire block of data.  But when you don’t — say in a stored-proc that’s doing some data flow operation and only needs a small subset of that data block — it’s probably better to get the relationship logic from the view and copy-paste it (but hopefully not all of it!), while omitting the stuff that’s not needed.

The main reason for this is usually index tuning.  If we’ve crafted some indexes to meet certain query patterns in certain troublesome procs, we want those procs to use those indexes, not fall back to a full table scan because they’re going through a nested view that does SELECT *.

When we get to more complex business rules, we need to up our diligence game and be more mindful of dependency checking when planning for a rule change.  Proc comment-headers can be helpful here, as can tools that search thru SQL object meta-data and code-bases to produce dependency chains.

The main point is, duplication tends to be OK when it’s not functional duplication, i.e. when the SQL code is more-or-less similar in some places but it’s not exactly the same because the purpose (responsibility) of that module/stored-proc is not the same.

You’re right in that the “31-flavors of tailored procs for data-access” is a big maintenance headache, and sometimes that trumps even the performance concerns.  Again it’s about balance — we have to be mindful of both the biz-rule-maintenance concerns and the performance concerns.

Developer:

I figured.  Sometimes I see DBAs criticize developers’ work without seeming to understand that it doesn’t always come from sloppiness or laziness (although sometimes it does!).  Often, we’re trying to thread that needle of performance vs. maintainability.  In Dev-land, “lazy” is good in the sense of aiming for simplified logic, for ease of both maintenance and understanding.  Painstakingly tailoring each data-access call (stored-proc), while good for performance, is kinda opposite of that.  But, admittedly, we do fall back on SELECT * all too easily.

Mostly, we try to avoid code duplication because it leads to heavier maintenance overhead.  When some modules may perform similar operations, functionally, they will often re-use the same “core” logic, which we in turn encapsulate into its own ‘thing’.  But in SQL modules, as you say, that’s not always performant, so it’s definitely a tightrope-walk.

The “Clean Code” school of thought says, if it’s obvious, it’s “self-documenting”.  I don’t always agree with it, but it comes from maintenance concerns again.  We don’t like situations where someone tweaks the code but doesn’t update the comments, and you end up with misleading comments.  Unfortunately, it does come down to diligence again, and even “good” developers will easily fall back to rarely including comments just to avoid this situation.  Of course, another potential pitfall of supposedly self-documenting code is, what’s “obvious” to one person isn’t necessarily so to everyone else!

(We both enjoy writing, can you tell?)  =P

So basically we agreed to “moderation in all things” and exchanged Buddha statues and sang Kumbaya.  I enjoyed the exchange because it really got us both thinking more deeply about which areas of our business/app landscape are in better/worse shape than others.

developer-dba-harmony-buddha
yay collaboration!

To conclude this part.  You will continue to see DBAs rant and rail against nested views and other “sins against SQL”, but:  Developers, don’t take it personally; we’re just trying to eke the most performance-per-$3k-core-license out of our precious servers, and spend less time chasing the white rabbit down the nested-views-hole.  And DBAs, go easy on your Devs; they still outnumber you, and they can whip out a complete web-app using the hottest JavaScript framework and a cloud-of-the-month service, faster than you can tune a server.  Everybody’s valuable, and everybody works toward the same goal: solving the business’s problems thru technology.

Moving on…

Part 3: Misusing & Abusing Datatypes

Because I’m getting long-winded again, let’s wrap up with a final “Clean SQL Code” topic that’s short & sweet.

Well, not really.  There are entire presentations dedicated to this topic.  But I’ll try to keep it condensed.

A date is not a datetime is not a time is not a time interval.  Okay?  For the third time, stop interchanging them!  Yes I know, SQL Server is a bit behind some other RDBMS platforms when it comes to this stuff.  Sorry, I don’t work for Microsoft.  I just deal with their tech.

Deep breaths…

More to the point, know your data.  Understand that there can be consequences to repeatedly casting types, or losing precision during conversion, sometimes exponentially so.  Yes I know, we all love loosely-typed (sometimes stringly typed) languages like JS & Python.  Those are wonderful tools for certain jobs/problems.  Again, be mindful and know your flows.

flow with the chart yo
I’m not sure what’s more disturbing.. the fact that this was the first image search result for “flow meme”, or the fact that it’s actually quite appropriate.

Thanks for reading, as always!

Clean Code, the SQL


Get it? It’s just too punny! … Ok I’m done.

The Coding Blocks guys did a series of episodes about the perennial favorite Clean Code book.  If you haven’t subscribed to their podcast…

do it do it now -Arnold
What are you waiting for?!?!

And it’s a great book, no doubt. But those guidelines for application code are not 100% directly applicable to database code.

wait... what?
Srsly?

Let’s back up a second. Why? That sounds a bit counter-intuitive, no?  Ok, more context. See, the traditional (“legacy”?) app consists of about 3 layers, the bottom one being the database. And it’s usually relational, and is usually responsible for far more than simple data access and persistence.  Read: complex business rules and process logic.  Data flow, not just getters and setters.

So that means, as a developer, DBA, or hybrid “DbDev”, you’re often tasked with writing or improving the stored procedures which house that complex logic.  And that’s my topic today: being clean about your SQL code.

Part 1: Comments

There’s a fairly famous quote from the book about comments:

Comments are always failures.

He’s using hyperbole, but for a purpose.  While his views on comments may be extreme, most programmers tend to realize the core essence of that chapter, which is that comments only serve to express something in plain English that the code has failed to express clearly enough to be easily and immediately understood.

With SQL scripts, and in particular with stored-procedures, I’m taking a somewhat opposite stance:

Comments are always appreciated, even if they’re potentially outdated or inaccurate.

There are two types of comments in SQL, the --inline and the /* block */.  Different people have their preferred flavors of block — sometimes it’s just several lines prefaced with the double-dash --.  And that’s fine, whatever floats your comment-boat.

hms commentus comment-boat
I made my own!! (mostly copied from an example at http://www.asciiworld.com/-Boats-.html)

In particular, I always encourage a comment block at the top of every stored-proc & other user-defined programmable object (functions, types, etc).  Just a small example for illustration:

/*
Location: Server.Database
Author: NateTheDBA
Created: 2012-12-21
Description: Gets users who have not logged in since the given date.
Consumers: MyCoolAppName, MyReportServer
Revisions
2015-05-15, Nate: removed archive (never used after archive-date)
2017-06-07, Nate: fixed formatting for blog post
*/
CREATE PROCEDURE GetUsersNotLoggedInSince
    @SinceDate datetime2
AS
BEGIN
    --some clever stuff goes here...
END

“But wait”, you say, “what about source control?”  Yes, all your programmable objects (and even, arguably, your reference data) should be in source control.  There are tool-vendors aplenty to help you with that.  But guess what?  Budgets.  Time & effort.  Oh, did I mention, legacy legacy legacy?  Yes, dear reader, the average business has years (decades) of organically evolved relational databases and processes.  Are you the guy or gal to swoop in on your unicorn and seamlessly convert their entire data tier infrastructure to a beautiful DevOps pipeline with shiny rainbows and kittens for all?  No?  Okay then.  Baby-steps.

devops-unicorn-dba-cleaning-up
Not that I’m bitter or anything…

Yes, my procs are in source control.  It’s called “daily automated script-out-objects-to-files which are then committed to SVN”.  It’s not built-in to SSMS.  Which means that I, or another DBA, or a potential consultant, or a Dev who gets enlisted to help improve a proc that runs for hours when it should only take minutes, would be inconvenienced by the extra trip to a separate tool/system to fetch some change-history just for context.  And really, that’s all this is for — CONTEXT.  We like to know what it is we’re working on when we start to work on it, without having to traverse a change-tree or go bug 3 other people who “might” have touched it last.  I’m not asking for a detailed log of every single time someone touched the thing; just give me the overview, the milestones and significant changes to functionality/features/scope so that I have a jump-off point for troubleshooting/testing/reasoning about it.

“But wait”, you say again, “shouldn’t the proc’s name be a sufficient description of what it does?”  Sure, in theory.  Until you have dependencies which need that name to stay the same even after an update or logic-change.  Like reports.  Or data-connected Excel workbooks.  Which are used daily by managers, who will come yelling at you if their worksheets suddenly stop functioning.

end rant

Back to comments in general.  The reason they’re helpful (besides documentation-headers for objects) is that they provide context and explain intent.  Half the time, my job as a DBA is improving or fixing someone else’s code.  Therefore, I want to see, in plain English, what it is they’re trying to accomplish, notes about attempts and failures, and the like.  Yes, I could have a discussion with them.  And I will.  But if I’m working on it asynchronously and they’ve moved on to something else, or our hours are different, I want those little nuggets of context and intent to be right there in the script, because that’s where I’m working!

What about queries that get passed-down from the app to the DB?  ORMs don’t support pre-pending a comment to their data calls, do they?  I wish.  Maybe some do, I haven’t researched it, but I know for sure that LINQ doesn’t.  But then again, when I’m using a query-capture tool (like DMVs, Profiler, X-events, or a vendor monitoring tool), ORM queries are so painfully obvious in comparison to hand-crafted SQL that I can usually spot them from a mile away, and go bother the app-devs for some context & conversation.  If you’re one of the poor unfortunate souls who still passes ad-hoc generated SQL statements down thru ODBC to your DB, then sure, a little comment won’t hurt anybody.
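For the ad-hoc case, here’s a minimal sketch of what “a little comment” could look like.  The helper name `tag_sql` and its parameters are entirely made up for illustration; it’s plain string handling in Python, not any driver’s or ORM’s API.

```python
def tag_sql(sql: str, app: str, purpose: str) -> str:
    """Prepend a block comment saying who sent this statement and why."""
    note = f"{app}: {purpose}"
    # Neutralize comment delimiters inside the note so it can neither close
    # the comment early nor open a nested one.
    note = note.replace("*/", "* /").replace("/*", "/ *")
    return f"/* {note} */\n{sql}"


tagged = tag_sql(
    "SELECT UserID FROM Users WHERE LastLogin < ?",
    "MyCoolAppName",
    "inactive-user report",
)
print(tagged)
```

The comment travels with the statement text, so it shows up in whatever query-capture tool you happen to be staring at.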

you poor unfortunate soul
it’s sad, but true…

So do your DBAs a favor, comment your SQL code, at least in terms of programmable database objects and ad-hoc scripts.  I promise, it’ll make them hate you less.  It might even make you love yourself more, because 3 months down the road when you revisit that proc, and you need to remember what it was for and why you did it that way, you’ll see it right there in your very own writing!  (OK, typing.)

Part 2: SRP, Encapsulation, and Abstraction

A bit of paraphrase of one of the book’s key points:

A reusable module (function, method) should do one thing, and do it well.

Also, the DRY principle:

Don’t repeat yourself.

When building SQL modules, we’re usually concerned with performance and accuracy, over abstraction and composability.  Therefore, repeating oneself is not necessarily a bad thing, when done for the right reasons.  Diligence is a big factor here — if there’s a non-trivial relationship between some entities that you’re repeating in several places, and you know that it could become a maintenance headache if that relationship’s definition has to change later, do as much as possible to mitigate the risk of dependency/consistency-loss.  This can be documentation, comments, and/or building that relationship into a view.

Make-Time-for-Due-Diligence
It’s important.

The latter brings up an interesting topic, one which I had a lively discussion about with a colleague recently (he’s a developer, and a dang good one): nested views.  Because inevitably, the encapsulation of those relationships & business-rules into things like views or ITVFs can and will lead to nesting those objects into other objects.  And troubleshooting many-level-nested views is a particularly frustrating exercise; in fact they’re what some DBAs call one of the “deadly sins of SQL”.  But there are perfectly valid reasons and uses for them, sometimes, and I really enjoyed the discussion thread we had on it, so I’ll have to expand on that in another post.

Anyway, I’m already getting long-winded and well over 1k words, so I’ll wrap it up for now, and continue this topic next week.

Thanks for reading, stay tuned!

Dates, Date-pickers, and the Devil


This is a bit rant-y, but… indulge me.  I’ve been writing/refactoring a lot of old reporting queries.  And, like most reports, they deal with dates and datetimes — as parameters, boundaries, or where/join predicates.  I also got way too intense with a recent SSC post (Sql Server Central), which fueled the fire even more.

fluffy-angry-puppy
I’m so cute and ANGRY!

SQL Server is very good at handling temporal datatypes and calculations against them.  We’ve got functions like dateadd, datediff, and datepart; datatypes like datetime2, datetimeoffset, and datetime; etc.  It supports all sorts of format conversions if you need to display them in various ways.

..even though that should be left to the presentation layer!

Here’s the issue.  Well, there are several issues, but we only have time for a few.

Here’s the first problem

Report users don’t understand the “end of a time period” problem.  I don’t have a good name for it; others might call it the “Day plus one” problem or the “Less than date” problem.  What do I mean by this?  Well, let’s back up a bit, to DBA Commandment #6, “Thou shalt not use between with datetimes.”  In order to first understand the issue, we have to understand why this is a commandment.

When a date range, or time period, is specified in SQL, it’s easiest, clearest, and most concise, to specify it like so: @TheDate >= @StartOfPeriod and @TheDate < @StartOfNextPeriod.  Mathematically speaking, we’re defining the range as “closed on the left, open on the right”.  In other words, Min <= X < Max.

The reason we do this with datetimes is found right there in the name of the datatype — it has (or can have) a time component!

stone-tablets-with-roman-numerals-to-10
There are probably more than 10, but it’s a good starting point…

Let’s talk examples

Say you’d like to report on the month of March 2017.  How do you determine if your data-points (stored as datetime or, hopefully, datetime2) are within that period, that month?  Well sure, you could write where month(MyDateColumn) = 3 and year(myDateColumn) = 2017 

NO.  That is horrible, don’t do that.

It’s not SARGable and renders your index on that column useless.  (You do have an index on it, don’t you? No? Make one!)  Okay, let’s stick with something SARGable.  How about MyDateColumn between '20170301' and '2017-03-31T23:59:59.999'?  (You did read this post about using culture-neutral datetime literals right?)  But wait!  If your data is a datetime, it’s not actually that precise: your literal gets rounded up to 20170401 and you’re now including dates from April 1st (at midnight)!

Oh that’ll never happen… until it does.
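If the rounding sounds abstract, here’s a rough emulation in Python.  It is not SQL Server itself; it just assumes the documented behavior that the legacy datetime type keeps time in 1/300th-of-a-second ticks, and the function name is made up for this sketch.

```python
from datetime import datetime, timedelta


def round_like_sql_datetime(value: datetime) -> datetime:
    """Round to the nearest 1/300th of a second, like the legacy datetime type."""
    whole_second = value.replace(microsecond=0)
    # 300 ticks per second; integer math rounds to the nearest tick
    # (ties go up, an assumption that doesn't matter for these inputs).
    ticks = (value.microsecond * 3 + 5000) // 10000
    # ticks can reach 300, which timedelta carries over into the next second.
    return whole_second + timedelta(microseconds=ticks * 10000 // 3)


upper_bound = round_like_sql_datetime(datetime(2017, 3, 31, 23, 59, 59, 999000))
safe_bound = round_like_sql_datetime(datetime(2017, 3, 31, 23, 59, 59, 997000))
print(upper_bound)  # the .999 literal lands on April 1st at midnight
print(safe_bound)   # the .997 literal stays inside March 31st
```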

Second problem

Many developers and report-writers assume that the values in their data will never be within the typical “1 second before midnight” or “1/300th of a second before midnight” escape window of your “3/31/2017 23:59:59.997” bounding value.  But can you guarantee that?  Didn’t think so.  Worse, if you use the .999 fraction as given in the 2nd example, you’re either “more” or “less” correct, and nobody can actually tell you which way that pendulum swings because it depends on the statistical likelihood of your data having actual literal “midnight” values vs. realistic (millisecond-y, aka “continuous”) values.  Sure, if you’re storing just a date, these things become a lot less complicated and more predictable.

But then why aren’t you storing it as an actual date, not a datetime!?

So what’s the right answer?

As I said, “greater than or equal to ‘Start’, and less than ‘End’”, where ‘End’ is the day after the end of the period, at midnight (no later!).  Hence, MyDateColumn >= '20170301' and MyDateColumn < '20170401'.  Simple, yes?
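Here’s a small runnable comparison of the two approaches.  Big caveat: it uses SQLite (via Python’s sqlite3) as a stand-in, where the timestamps are just ISO-formatted text, so think of the column as behaving like a datetime2 rather than a legacy datetime.  The Events table and its rows are invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Events (EventID INTEGER PRIMARY KEY, MyDateColumn TEXT)")
conn.executemany(
    "INSERT INTO Events (MyDateColumn) VALUES (?)",
    [
        ("2017-03-01 00:00:00.000",),  # first instant of March
        ("2017-03-31 23:59:59.998",),  # late on the last day, past the .997 bound
        ("2017-04-01 00:00:00.000",),  # midnight: belongs to April
    ],
)

# Closed range with a "just before midnight" upper bound.
between_count = conn.execute(
    "SELECT COUNT(*) FROM Events "
    "WHERE MyDateColumn BETWEEN '2017-03-01' AND '2017-03-31 23:59:59.997'"
).fetchone()[0]

# Half-open range: closed on the left, open on the right.
half_open_count = conn.execute(
    "SELECT COUNT(*) FROM Events "
    "WHERE MyDateColumn >= '2017-03-01' AND MyDateColumn < '2017-04-01'"
).fetchone()[0]

print(between_count, half_open_count)
```

The between version quietly drops the row that slipped into the escape window; the half-open version picks up both March rows and leaves April’s midnight row alone.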

keep calm and keep it simple
KCKS

But wait, there’s more!

I mentioned “date-pickers” in the title.  When it comes to UX, date-pickers are a sore subject, and rightly so — it’s difficult to truly “get it right”.  On a “desktop-ish” device (i.e. something with a keyboard), it may be easiest on the user to give them a simple text-box which can handle various formats and interpret them intelligently — this is what SSRS does.  But on mobile devices, you often see those “spinner” controls, which is a pain in the arse when you have to select, say, your birth date and the “Year” spinner starts at 2017.  #StopIt

I mean, I’m not that old, but spinning thru a few decades is still slower than just typing 4 digits on my keyboard — especially if your input-box is smart enough to flip my keyboard into “numeric only” mode.

Another seemingly popular date-picker UX is the “calendar control”.  Oh gawd.  It’s horrible!  Clicking thru pages and pages of months to find and click (tap?) on an itty bitty day box, only to realize “Oh crap, that was the wrong year… ok let me go back.. click, click, tap..” ad nauseam.

stop-it-sign
#StopIt again

The point here is, use the type of date-picker that’s right for the context.  If it’s meant to be a date within a few days/weeks of today, past/future — OK, spinner or calendar is probably fine.  If it’s a birth date or something that could reasonably be several years in the past or future, just give me a damn box.  (Heck, I’ll take a series of 3 boxes, M/D/Y or Y/M/D, as long as they’re labeled and don’t break when I omit the leading-zero from a single-digit month #!)  If there’s extra pre-validation logic that “blocks out” certain dates (think bill-payer calendars or Disneyland annual-pass blackout-days), that probably needs to be a calendar too.

..just make sure it’s responsive on a mobile device.

And in all cases, pass that “ending date” to your SQL queries in a consistent, logical, sensible manner.  For reporting, where the smallest increment of a period is 1 day, that probably means automagically “adding 1 day” to their given end-date, because the end-user tends to think in those terms.  I.e. if I say “show me my bank activity from 1/1/2017 to 1/31/2017”, I really mean “through the end of the month“, i.e. the end of the day of 1/31.  So your query is going to end up wanting the end-date parameter to be 2/1/2017, because it’s using the correct & consistent “greater than or equal to start, and less than start-of-next” logic.
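As a sketch of that “add 1 day” translation in the presentation layer (the function name is hypothetical, and it assumes the smallest reporting increment is one day):

```python
from datetime import date, timedelta
from typing import Tuple


def to_query_range(start: date, end_inclusive: date) -> Tuple[date, date]:
    """Turn the user's 'from X through Y' into a start / start-of-next pair."""
    # The user means "through the end of end_inclusive", so the query's
    # exclusive upper bound is the following day at midnight.
    return start, end_inclusive + timedelta(days=1)


period_start, next_period_start = to_query_range(date(2017, 1, 1), date(2017, 1, 31))
print(period_start, next_period_start)
```

The two values then feed the usual `>= start and < next-start` predicate, whatever the underlying datatype is.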

context-consistency-clarity
The 3 C’s

Final thoughts

I know it’s not easy to explain to business folks, and it’s not easy to implement correctly.  But it’s important.  The >= & < logic is clear, concise, and can be used consistently regardless of underlying datatype.  You just need to adjust your presentation layer (whether that’s SSRS parameters or a .NET date-picker) to convey their intent to the user, whether that’s “show/enter the last day of the month, but translate to the next day to feed to the query/proc.”, or “make them enter the next-day (day after the end of the month/period) and understand the ‘less than’ logic.”  I’m more inclined to the first, but it depends on your audience.

Thanks for reading, and happy date-ing!

DBA Holy Wars Part 2

Battle 4: GUIDs vs Identities

This is an oldie but goody.  A) Developers want their apps to manage the record identifiers, but DBAs want the database to do it.  B) Developers prefer abstracting the identity values out of sight/mind, DBAs know that occasionally (despite your best efforts to avoid it) your eyeballs will have to look at those values and visually connect them with their foreign key relationships while troubleshooting some obscure bug.

but-wait-theres-more-billy-mays
there’s ALWAYS more…

But there’s more to it than that.  See, none of those arguments really matter, because there are easy answers to those problems.  The real core issue lies with the lazy acceptance of GUI/designer defaults, instead of using a bit of brainpower to make a purposeful decision about your Primary Key and your Clustered Index.

Now wait a minute Mr. DBA, aren’t those the same thing?

NO!  That’s where this problem comes from!

A good Clustered Index is: narrow (fewer bytes), unique (or at least, highly selective), static (not subject to updates), and ever-increasing (or decreasing, if you really want).  NUSE, as some writers have acronym’d it.  A GUID fails criteria ‘N’ and ‘E’.  However, that’s not to say a GUID isn’t a fine Primary Key!  See, your PK really only needs to be ‘U’; and to a lesser extent, ‘S’.  See how those don’t overlap each other?  So sure, use those GUIDs, make them your PK.  Just don’t let your tool automagically also make that your CX (Clustered indeX).  Spend a few minutes making a conscious effort to pick a different column (or couple columns) that meet more of these requirements.

For example, a datetime column that indicates the age of each record.  Chances are, you’re using this column in most of your queries on this table anyway, so clustering on it will speed those up.

Most of the time, though, if your data model is reasonably normalized and you’re indexing your foreign keys (because you should!), your PKs & CX’s will be the same.  There’s nothing wrong with that.  Just be mindful of the trade-offs.

Battle 5: CSV vs TAB

bluray-vs-hddvd-fight
Who doesn’t love a good format-war?

Often, we have to deal with data from outside sources that gets exchanged via “flat files”, i.e. text files that represent a single monolithic table of data.  Each line is a row, and within each line, each string between each delimiting character is a column value.  So the question is, which is easier to deal with as that delimiter: comma, or tab?

String data values often have commas in them, so usually, the file also needs a “quoting character”, i.e. something that surrounds the string values so that the reader/interpreter of the file knows that anything found inside those quotes is all one value, regardless of any commas found within it.
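To see the quoting issue in action, here’s a tiny sketch using Python’s standard csv module (the sample row is made up).  The same value needs quotes in a comma-delimited line but not in a tab-delimited one.

```python
import csv
import io


def render(row, delimiter):
    """Write one row with the given delimiter and return the raw line."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter).writerow(row)
    return buffer.getvalue()


row = ["Acme, Inc.", "42"]
comma_line = render(row, ",")
tab_line = render(row, "\t")
print(repr(comma_line))  # the value with a comma in it gets wrapped in quotes
print(repr(tab_line))    # no quoting needed
```

Of course a value with an embedded tab would need the same treatment in a .tab file; it’s just a much rarer thing to find in real-world string data.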

But tabs are bigger.. aren’t they?  No, they’re still just 1 byte (or 2, in a UTF-16 “Unicode” file).  So that’s a non-argument.  Compatibility?  Every program that can read and automatically parse a .csv can just as easily do so with a .tab, even if Windows Explorer’s file icon & default-program handler would lead you to believe otherwise.

I recently encountered an issue with BCP (a SQL command-line utility for bulk copying data into / out of SQL Server), where the csv was just being a pain in the arse. I tried a tab and all was well! I’m sure it was partially my fault but regardless, it was the path of least resistance.

Battle 6: designers vs scripting

no-wizard-allowed
Wizards are usually good, but in this case, they’re lazy and bad for you…

This should be a no-brainer. There is absolutely no excuse for using the table designer or any other wizardy GUIs for database design and maintenance, unless you’re just learning the ropes. And even then, instead of pressing ‘OK’, use the ‘Script’ option to let SSMS generate a T-SQL script to perform whatever actions you just clicked-thru.  Now yes, admittedly those generated scripts are rarely a shining example of clean code, but they get the job done, even with some unnecessary filler and fluff.  Learn the critical bits and try to write the script yourself next time, and sure, use the GUI-to-script to double check your work, if you still need to.

Confession: I still use the GUI to create new SQL Agent Jobs. It’s not that I don’t know how to script it, it’s just that there are so many non-intuitive parameters to those msdb system-sp’s that I usually have to look them up, thereby spending the time I would have otherwise saved.

Bonus round: the pronunciation of “Data”

its-data-not-data
Call me “big Data” one more time…

Dah-tuh, or Day-tuh?  Or, for the 3 people in the world who can actually read those ridiculous pronunciation glyphs, /ˈdeɪtə/ or /ˈdætə/ ?  It’s a question as old as the industry itself… or maybe not.  Anecdotally, it seems like most data professionals, and people in related industries, tend to say “day-tuh”; while those in the media and generally less technical communities tend to say “dah-tuh”.  (Where the first syllable is the same vowel-sound as in “dad” or “cat”.)  This likely means that the latter is more popular, but the former is more industrially accepted.

In either case, it doesn’t really matter, because at the end of the day, we’re talking about the same thing.  So if some dogmatic DBA or pedantic PHB tries to correct your pronunciation, tell ’em to stop being so persnickety and get on with the task at hand!

Until next time…

DBA Holy Wars

On a lighter note than usual, I thought it was time I weighed in on some of the long standing “programmer holy wars”, but with a little DBA-twist (like a twist of lime, only less delicious).  Like any good holy war, this will be full of posturing, pontificating, and political correctness.  And I probably won’t even commit to a particular side on some issues.  But hey, isn’t that the point?

Battle 1: Tabs vs. Spaces

tabs-vs-spaces-run-away
OH SHI…

Text editors and IDEs have long been mature enough to handle “smart tabs” and preference-based tab size.  However, you will occasionally have to copy-paste code into a non-code-oriented environment, such as an email or a document, where of course the tab size is based on inches rather than spaces in a monospace font.  I will admit in those rare instances, tabs are annoying.  But what is more annoying is the inconsistency you can get when spaces are used incorrectly, especially in the midst of lines in a sad attempt to do some kind of vertical alignment.  Plus, if you happen to have a different spacing-size preference than the original code author, you’re now battling that visual discrepancy as you read & maintain said code.

So I prefer tabs.  But I won’t fight my team on it if everybody else prefers spaces — that’s what those settings in the editor/IDE are there for!  I will happily conform with the best of them.  A quick Google says I’m in the minority anyway — which I’m OK with.

Battle 2: The Case for Casing

capslock-cruise-control-for-cool
The original, if somewhat dated.

Certain languages (COBOL, SQL) have a historical bent toward ALLCAPS for their keywords and language constructs.  Some argue that this is archaic, outmoded, etc.  I don’t mind it, working primarily with SQL, but in almost all other languages (C#, Python, JavaScript), I think it makes sense to follow the established conventions, and modern conventions never favor caps.  As I transitioned from C# to SQL, I actually wrote my scripts and stored-procs primarily in lower case for the longest time.  And then I came into an environment where RedGate’s SQL Prompt was in heavy use, and since its default “auto-format” settings are in-line with the SQL language “standard” (however old and dated it may be), it started YELLING all the keywords at me.. and like most people, I just accepted it, eventually letting it become my own “default” style.  (SQL Prompt is a fantastic tool, don’t get me wrong.  I absolutely love it, but its default formatting settings never agreed with me; then again, nor does anybody else’s, as we already discussed!)

But that’s not really what this battle is usually about.  Most often, it’s about your names, i.e. the identifiers for objects/methods/variables/procedures/APIs/etc. that your team and your developers have to come up with on a constant basis.  And usually it comes down to camelCase, TitleCase (apparently better known as PascalCase, which I just learned today, or possibly re-learned after several years; the two are often incorrectly used interchangeably!), or lower_case_with_underscores (which, in another learning moment, I discovered is named snake_case!  How cool is that?).  Rarely, if ever, do people argue for ALLCAPS in these areas; it just feels... obnoxious.

caps-lock-not-always-necessary
Yelling doesn’t always get you what you want…
As with any programmer-y topic, you can dive down the rabbit-hole and dissect layer upon layer of nuance in this battle until you’ve lost all semblance of productivity.  Because casing is, in some languages, important; while in others it’s simply convention-based, dependent on the abstraction level or family of things you’re talking about.  For example, C# Class names are TitleCase, and so typically are Methods, while object instances are usually camelCase; public members can be TitleCase or camelCase, and private members can be _underscore_led, or whatever flavor of each your boiler-plate/template system prefers.  Scoped variableNames are most often camel’d as well, while global constants are typically CAPS_WITH_UNDERSCORES.  And god help you if you ask a team of more than 3 people what their dependency packages’ names should look like.
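Since the casing styles above are really just different ways of gluing the same words together, here's a rough Python sketch that converts between them.  All of the function names (`split_words`, `to_camel`, and friends) are made up for illustration, and the word-splitting regex is one reasonable guess at where word boundaries fall, not an official rule from any style guide.

```python
import re


def split_words(identifier: str) -> list[str]:
    """Break an identifier in any of the common casing styles into lowercase words."""
    words = []
    for chunk in identifier.split("_"):
        # Within an underscore-free chunk, a word is either a run of capitals
        # (an acronym) or an optional capital followed by lowercase letters/digits.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z0-9]+", chunk))
    return [w.lower() for w in words]


def to_camel(words: list[str]) -> str:
    return words[0] + "".join(w.capitalize() for w in words[1:])


def to_pascal(words: list[str]) -> str:
    return "".join(w.capitalize() for w in words)


def to_snake(words: list[str]) -> str:
    return "_".join(words)


def to_caps(words: list[str]) -> str:
    return "_".join(w.upper() for w in words)


words = split_words("WidgetFooComplete")
print(to_camel(words), to_pascal(words), to_snake(words), to_caps(words))
```

The point is that the words are the real content; the casing is just packaging, which is why it's usually painless to conform to whatever the language or team already uses.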

omg-object-and-casing
Shamelessly borrowed from Adam Prescott’s blog, which you should definitely go read.
So in this battle, I have to play Switzerland.  I’m not vehemently opposed to any particular flavor of casing, finding it best to work within the conventions of the language and tool-set at hand.

Side-battle: Spacing in Names

That said, I can’t stand names/identifiers with actual white space in them, but that’s a somewhat different battle.  Most languages don’t even allow that, but most RDBMSs will happily accept your ridiculous My Cool Database and its resident Silly Tables and Happy Column 1/2/etc. as long as you properly “quote” them (surround them with [square-brackets] or `backticks`, depending on the SQL flavor).  If you submit that kind of nonsense to me, I will find you, and I will slap you with a large trout.

trout-slap
Particularly offensive names may warrant a double trout slap.
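If you want to see just how permissive a database can be about this, here's a small sketch using Python's built-in `sqlite3` module.  SQLite is only a stand-in here (I'm assuming you have nothing fancier handy); it happens to accept standard double quotes, SQL Server-style [square-brackets], and MySQL-style `backticks` for compatibility, so one script can show all three.  The table and column names are the silly ones from above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Standard SQL double-quoted identifiers, white space and all.
conn.execute('CREATE TABLE "Silly Tables" ("Happy Column 1" INTEGER, "Happy Column 2" TEXT)')
conn.execute('INSERT INTO "Silly Tables" VALUES (1, \'trout\')')

# SQL Server-style square brackets work in SQLite too.
rows_brackets = conn.execute("SELECT [Happy Column 1] FROM [Silly Tables]").fetchall()

# And so do MySQL-style backticks.
rows_backticks = conn.execute("SELECT `Happy Column 2` FROM `Silly Tables`").fetchall()

print(rows_brackets, rows_backticks)
conn.close()
```

Every single reference to those objects now needs its quoting, forever, in every query anybody writes.  Hence the trout.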

Battle 3: ORM vs Stored-Procs (vs Linq?)

This is that little twist-of-DBA as promised.  I recently read an interesting post related to this topic, and essentially the point was this: Developers have “won” (won what? I thought we were all on the same side!), the ORM is here to stay, and as DBAs/DBDevs, we (you/I) need to build up our understanding of them so that we A) know them even better than our devs, and B) can troubleshoot performance issues with them.

I think there’s some truth to that, and some necessary context as well.  Ideally, yes, I would be an ORM expert on whatever 1 or 2 specific frameworks my colleagues are using (Entity Framework, most likely), and any time there was a potential performance challenge with an app-to-database call, I’d be able to parachute in and sprinkle some magic dust and make it all better.  But I’m also the one DBA (out of approx. 1.3 total), serving 4 teams of 3-6 devs each, so in the immortal words of meme-dom:

Ain’t nobody got time for that!

nobody-got-time-old-fashioned
because sometimes old-fashioned things are funny too…
Now I’m not making excuses.  All I’m saying is, the burden of understanding is on more than just one team member or job-role.  If your dev team is adopting an ORM, said devs need to learn how it works too, at least enough to help with basic performance troubleshooting.  Even if it’s just the ability to extract, from a debug session, the actual T-SQL code that’s being sent to the server, and give me a sample query to analyze for performance bottlenecks.
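To show the general idea of "extract the actual SQL that's being sent", here's a hedged sketch in Python.  It is not Entity Framework and not T-SQL: it uses the built-in `sqlite3` module as a stand-in database, and `find_widgets` is a hypothetical function playing the part of an ORM call that hides its SQL from the caller.  The one real API doing the work is `Connection.set_trace_callback`, which hands every executed statement to a function of your choosing.  Real ORMs generally have their own logging hooks for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE widget (widget_id INTEGER PRIMARY KEY, widget_type TEXT)")
conn.executemany("INSERT INTO widget (widget_type) VALUES (?)", [("foo",), ("bar",), ("foo",)])
conn.commit()


def find_widgets(connection, widget_type):
    """Hypothetical stand-in for an ORM call: the caller never sees the SQL."""
    sql = "SELECT widget_id, widget_type FROM widget WHERE widget_type = ?"
    return connection.execute(sql, (widget_type,)).fetchall()


# Collect every statement the database actually receives from here on.
captured = []
conn.set_trace_callback(captured.append)

foo_widgets = find_widgets(conn, "foo")

# Stop tracing, then hand the captured statements to your friendly DBA.
conn.set_trace_callback(None)
for statement in captured:
    print(statement)
conn.close()
```

That captured statement is exactly the kind of sample query I can take away and analyze, without needing to know anything about the app code that produced it.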

Let’s step back a bit.  It’s all about using the right tool for the job, yes?  ORMs are meant for basic CRUD operations and simple data access patterns, right?  So why try to build complex business logic into them?  Because, like it or not, teams do build complex business logic into the data layer, despite our protests and soapbox sermons to not do it.  And because the vast majority of applications we’re dealing with are not greenfield.  Furthermore, ORMs tend to work best when the data model is well-defined, or the database is modeled well (well-modeled?).  And again, we don’t all get to work with unicorns in utopia.

Put it this way: If you want an efficient, performant module of data-layer business-logic against your SQL database, it’s likely going to be a stored procedure carefully crafted by a DBA/DBDev.  Could you achieve the same results from the app layer, using Linq and/or some mix of ORM and code?  Probably.  Do you have the time and patience to do so?  Maybe not.

i-have-no-strong-feelings-neutral
If I don’t survive this… tell my wife, “hello”.
So once again, I’m Switzerland.  Well, preferably a more pragmatic version — what country would that be?  Norway?  Anyway.  Use the methodology that’s the best compromise between “right tool for the job”, “optimized developer productivity”, and “easiest to troubleshoot”.  It’s a tough call, but that’s why we get paid.

Until next time!