Dates, Date-pickers, and the Devil

When a date range, or time period, is specified in SQL, it’s easiest, clearest, and most concise, to use a “greater-than-or-equal-to Period-Start, and less-than Next-Period-Start” logic. Mathematically speaking, we are defining the range as closed on the left, open on the right.

This is a bit rant-y, but… indulge me.  I’ve been writing/refactoring a lot of old reporting queries.  And, like most reports, they deal with dates and datetimes — as parameters, boundaries, or where/join predicates.  I also got way too intense with a recent SSC post (Sql Server Central), which fueled the fire even more.

fluffy-angry-puppy
I’m so cute and ANGRY!

SQL Server is very good at handling temporal datatypes and calculations against them.  We’ve got functions like dateadd, datediff, dateparts, datatypes datetime2 and datetimeoffset, datetime, etc.  It supports all sorts of format conversions if you need to display them in various ways.

..even though that should be left to the presentation layer!

Here’s the issue.  Well, there are several issues, but we only have time for a few.

Here’s the first problem

Report users don’t understand the “end of a time period” problem.  I don’t have a good name for it; others might call it the “Day plus one” problem or the “Less than date” problem.  What do I mean by this?  Well, let’s back up a bit, to DBA Commandment #6, “Thou shalt not use between with datetimes.”  In order to first understand the issue, we have to understand why this is a commandment.

When a date range, or time period, is specified in SQL, it’s easiest, clearest, and most concise, to specify it like so: @TheDate >= @StartOfPeriod and @TheDate < @StartOfNextPeriod.  Mathematically speaking, we’re defining the range as “closed on the left, open on the right”.  In other words, Min <= X < Max.

The reason we do this with datetimes is found right there in the name of the datatype — it has (or can have) a time component!

stone-tablets-with-roman-numerals-to-10
There are probably more than 10, but it’s a good starting point…

Let’s talk examples

Say you’d like to report on the month of March 2017.  How do you determine if your data-points (stored as datetime or, hopefully, datetime2) are within that period, that month?  Well sure, you could write where month(MyDateColumn) = 3 and year(myDateColumn) = 2017 

NO.  That is horrible, don’t do that.

It’s not SARGable and renders your index on that column useless.  (You do have an index on it, don’t you? No? Make one!)  Okay, let’s stick with something SARGable.  How about MyDateColumn between '20170301' and '2017-03-31T23:59:55.999'?  (You did read this post about using culture-neutral datetime literals right?)  But wait!  If your data is a datetime, it’s not actually that precise — your literal gets rounded up to 20170401 and you’re now including dates from April 1st (at midnight)!

Oh that’ll never happen… until it does.

Second problem

Many developers and report-writers assume that the values in their data will never be within the typical “1 second before midnight” or “1/300th of a second before midnight” escape window of your “3/31/2017 23:59:59.997” bounding value.  But can you guarantee that?  Didn’t think so.  Worse, if you use the .999 fraction as given in the 2nd example, you’re either “more” or “less” correct, and nobody can actually tell you which way that pendulum swings because it depends on the statistical likelihood of your data having actual literal “midnight” values vs. realistic (millisecond-y, aka “continuous”) values.  Sure, if you’re storing just a date, these things become a lot less complicated and more predictable.

But then why aren’t you storing it as an actual date, not a datetime!?

So what’s the right answer?

As I said, “greater than or equal to  ‘Start’, and less than ‘End'”, where ‘End’ is the day after the end of the period, at midnight (no later!).  Hence, MyDateColumn >= '20170301' and MyDateColumn < '20170401'.  Simple, yes?

keep calm and keep it simple
KCKS

But wait, there’s more!

I mentioned “date-pickers” in the title.  When it comes to UX, date-pickers are a sore subject, and rightly so — it’s difficult to truly “get it right”.  On a “desktop-ish” device (i.e. something with a keyboard), it may be easiest on the user to give them a simple text-box which can handle various formats and interpret them intelligently — this is what SSRS does.  But on mobile devices, you often see those “spinner” controls, which is a pain in the arse when you have to select, say, your birth date and the “Year” spinner starts at 2017.  #StopIt

I mean, I’m not that old, but spinning thru a few decades is still slower than just typing 4 digits on my keyboard — especially if your input-box is smart enough to flip my keyboard into “numeric only” mode.

Another seemingly popular date-picker UX is the “calendar control”.  Oh gawd.  It’s horrible!  Clicking thru pages and pages of months to find and click (tap?) on an itty bitty day box, only to realize “Oh crap, that was the wrong year… ok let me go back.. click, click, tap..” ad-nauseum.

stop-it-sign
#StopIt again

The point here is, use the type of date-picker that’s right for the context.  If it’s meant to be a date within a few days/weeks of today, past/future — OK, spinner or calendar is probably fine.  If it’s a birth date or something that could reasonably be several years in the past or future, just give me a damn box.  (Heck, I’ll take a series of 3 boxes, M/D/Y or Y/M/D, as long as they’re labeled and don’t break when I omit the leading-zero from a single-digit month #!)  If there’s extra pre-validation logic that “blocks out” certain dates (think bill-payer calendars or Disneyland annual-pass blackout-days), that probably needs to be a calendar too.

..just make sure it’s responsive on a mobile device.

And in all cases, pass that “ending date” to your SQL queries in a consistent, logical, sensible manner.  For reporting, where the smallest increment of a period is 1 day, that probably means automagically “adding 1 day” to their given end-date, because the end-user tends to think in those terms.  I.e. if I say “show me my bank activity from 1/1/2017 to 1/31/2017”, I really mean “through the end of the month“, i.e. the end of the day of 1/31.  So your query is going to end up wanting the end-date parameter to be 2/1/2017, because it’s using the correct & consistent “greater than or equal to start, and less than start-of-next” logic.

context-consistency-clarity
The 3 C’s

Final thoughts

I know it’s not easy to explain to business folks, and it’s not easy to implement correctly.  But it’s important.  The >= & < logic is clear, concise, and can be used consistently regardless of underlying datatype.  You just need to adjust your presentation layer (whether that’s SSRS parameters or a .NET date-picker) to convey their intent to the user, whether that’s “show/enter the last day of the month, but translate to the next day to feed to the query/proc.”, or “make them enter the next-day (day after the end of the month/period) and understand the ‘less than’ logic.”  I’m more inclined to the first, but it depends on your audience.

Thanks for reading, and happy date-ing!

DBA Holy Wars

On a lighter note than usual, I thought it was time I weighed in on some of the long standing “programmer holy wars”, but with a little DBA-twist (like a twist of lime, only less delicious).  Like any good holy war, this will be full of posturing, pontificating, and political correctness.  And I probably won’t even commit to a particular side on some issues.  But hey, isn’t that the point?

Battle 1: Tabs vs. Spaces

tabs-vs-spaces-run-away
OH SHI…
Text editors and IDEs have long been mature enough to handle “smart tabs” and preference-based tab size.  However, you will occasionally have to copy-paste code into a non-code-oriented environment, such as an email or a document, where of course the tab size is based on inches rather than spaces in a monospace font.  I will admit in those rare instances, tabs are annoying.  But what is more annoying is the inconsistency you can get when spaces are used incorrectly, especially in the midst of lines in a sad attempt to do some kind of vertical alignment.  Plus, if you happen to have a different spacing-size preference than the original code author, you’re now battling that visual discrepancy as you read & maintain said code.

So I prefer tabs.  But I won’t fight my team on it if everybody else prefers spaces — that’s what those settings in the editor/IDE are there for!  I will happily conform with the best of them.  A quick Google says I’m in the minority anyway — which I’m OK with.

Battle 2: The Case for Casing

capslock-cruise-control-for-cool
The original, if somewhat dated.
Certain languages (COBOL, SQL) have a historical bent toward ALLCAPS for their keywords and language constructs.  Some argue that this is archaic, outmoded, etc.  I don’t mind it, working primarily with SQL, but in almost all other languages (C#, Python, JavaScript), I think it makes sense to follow the established conventions, and modern conventions never favor caps.  As I transitioned from C# to SQL, I actually wrote my scripts and stored-procs primarily in lower case for the longest time.  And then I came into an environment where RedGate’s SQL Prompt was in heavy use, and since its default “auto-format” settings are in-line with the SQL language “standard” (however old and dated it may be), it started YELLING all the keywords at me.. and like most people, I just accepted it, eventually letting it become my own “default” style.  (SQL Prompt is a fantastic tool, don’t get me wrong.  I absolutely love it, but its default formatting settings never agreed with me — then again, nor do anybody else’s, as we already discussed!)

But that’s not really what this battle is usually about.  Most often, it’s about your names, i.e. the identifiers for objects/methods/variables/procedures/APIs/etc. that your team and your developers have to come up with on a constant basis.  And usually it comes down to camelCase, TitleCase (which are often incorrectly used interchangeably!  and is apparently better known as PascalCase, which I just learned today, or possibly re-learned after several years), or lower_case_with_underscores (which, in another learning moment, I discovered is named snake_case!  How cool is that?).  Rarely, if ever, do people argue for ALLCAPS in these areas — it just feels.. obnoxious.

caps-lock-not-always-necessary
Yelling doesn’t always get you what you want…
As with any programmer-y topic, you can dive down the rabbit-hole and dissect layer upon layer of nuance in this battle until you’ve lost all semblance of productivity.  Because casing is, in some languages, important; while in others it’s simply convention-based, dependent on the abstraction level or family of things you’re talking about.  For example, C# Class names are TitleCase, and so typically are Methods, while object instances are usually camelCasepublic members can be TitleCase or camelCase, and private members can be _underscore_led, or whatever flavors for each that your boiler-plate/template system prefers.  Scoped variableNames are most often camel’d as well, while global constants are typically CAPS_WITH_UNDERSCORES.  And god help you if you ask a team of more than 3 people what their dependency packages’ names should look like.

omg-object-and-casing
Shamelessly borrowed from Adam Prescott’s blog, which you should definitely go read.
So in this battle, I have to play Switzerland.  I’m not vehemently opposed to any particular flavor of casing, finding it best to work within the conventions of the language and tool-set at hand.

Side-battle: Spacing in Names

That said, I can’t stand names/identifiers with actual white space in them, but that’s a somewhat different battle.  Most languages don’t even allow that, but most RDBMSs will happily accept your ridiculous My Cool Database and its resident Silly Tables and Happy Column 1/2/etc. as long as you properly “quote” them (surround them with [square-brackets] or `backticks`, depending on the SQL flavor).  If you submit that kind of nonsense to me, I will find you, and I will slap you with a large trout.

trout-slap
Particularly offensive names may warrant a double trout slap.

Battle 3: ORM vs Stored-Procs (vs Linq?)

This is that little twist-of-DBA as promised.  I recently read an interesting post related to this topic, and essentially the point was this: Developers have “won” (won what? I thought were all on the same side!), the ORM is here to stay, and as DBAs/DBDevs, we (you/I) need to build up our understanding of them so that we A) know them even better than our devs, and B) can troubleshoot performance issues with them.

I think there’s some truth to that, and some necessary context as well.  Ideally, yes, I would be an ORM expert on whatever 1 or 2 specific frameworks my colleagues are using (Entity Framework, most likely), and any time there was a potential performance challenge with a app-to-database call, I’d be able to parachute-in and sprinkle some magic dust and make it all better.  But I’m also the one DBA (out of approx. 1.3 total), serving 4 teams of 3-6 devs each, so in the immortal words of meme-dom:

Ain’t nobody got time for that!

nobody-got-time-old-fashioned
because sometimes old-fashioned things are funny too…
Now I’m not making excuses.  All I’m saying is, the burden of understanding is on more than just one team member or job-role.  If your dev team is adapting an ORM, said devs need to learn how it works too — at least enough to help with basic performance troubleshooting.  Even if it’s just the ability to extract, from a debug session, the actual T-SQL code that’s being sent to the server, and give me a sample query to analyze for performance bottlenecks.

Let’s step back a bit.  It’s all about using the right tool for the job, yes?  ORMs are meant for basic CRuD operations and simple data access patterns, right?  So why try to build complex business logic into them?  Because, like it not, teams do build complex business logic into the data layer — despite our protests and soapbox sermons to not do it.  And because the vast majority of applications we’re dealing with are not greenfield.  Furthermore, ORMs tend to work best when the data model is well-defined, or the database is modeled well (well-modeled?).  And again, we don’t all get to work with unicorns in utopia.

Put it this way: If you want an efficient, performant module of data-layer business-logic against your SQL database, it’s likely going to be a stored procedure carefully crafted by a DBA/DBDev.  Could you achieve the same results from the app layer, using Linq and/or some mix of ORM and code?  Probably.  Do you have the time and patience to do so?  Maybe not.

i-have-no-strong-feelings-neutral
If I don’t survive this… tell my wife, “hello”.
So once again, I’m Switzerland.  Well, preferably a more pragmatic version — what country would that be?  Norway?  Anyway.  Use the methodology that’s the best compromise between “right tool for the job”, “optimized developer productivity”, and “easiest to troubleshoot”.  It’s a tough call, but that’s why we get paid.

Until next time!

An Open Letter To Management

We need to talk about this stuff – candidly, openly, broadly, deeply.

More accurately, to legacy enterprise management.

Let’s say the following directive comes down from on-high: “Hey, our CEO wants us to provide better financial metrics reports and a dashboard that management can see to show real-time stats about the company.”

*groan*

I mean… Sure!  Yay, digital transformation, modernization, mobile friendly, all that good stuff!!

So, I have some thoughts on this, because I’ve seen the current state of things in small-medium enterprise, and am anxious to help improve that state to provide better value to the business.  To misquote Dennis Miller, I don’t mean to on a rant here, but…

First topic: Reality Check

It starts at the top, with a couple realizations:

  1. Data is ever-growing.
    1. We need to get smarter about managing its growth, including archiving/retention schemes, data warehousing, etc.
    2. This involves compliance regulations and operational resources.
      1. We need to ensure compliance with biz standards and data shelf-life.
      2. We need to automate as much as possible to avoid over-burdening our human resources (and to some extent our servers too).

For example, you can’t expect the same response-time for a query into 10-year-old financial data as you do for 1-year-old data.

  1. Traditional SSRS (SQL Server Reporting Services) is an operational time-sink.
    1. We spend way too much time assigning access, creating redundant “on demand” reports, and making seldom-used email subscriptions.
    2. We’re probably running on an old version, say 2008R2
      1. Vast improvements have come to the MS Data/BI platforms in the last decade and we need to take advantage of them.
      2. It’s not mobile-friendly at all; it’s not even modern-browser friendly, as some of its UX elements are still explicitly functional in Internet Explorer
    3. We tacked-on some 3rd-party application to attempt to bring some data-warehouse functionality into the environment, but only 1 person “knows it intimately” and is comfortable developing new reports with it.
  2. Our ERP system, in its current state/version, is a tangled mess, to the eyes of a DBA & query-writer/report-writer.
    1. We’ve bolted-on so much customization and special-configuration that it’s not suitable for stock/canned reports from the vendor, even if we upgraded to a version of the app that had a decent reporting engine.
    2. We can’t even decide on very basic things like “What is a ‘unit‘ of production?”, or “What are the different areas/groupings we break-out for revenue metrics?”

Ok sure, maybe we can agree on what those groupings are, but we can’t even get a consensus on what we call them!

Second topic: Single Source of Truth

We need to agree on a standard, documented, official set of business rules that answer such questions as “how do we measure revenue?”, “what are our different sub-orgs/departments/groupings for how we report on revenue?”, “what is ‘production output’ and how do we measure it?”, “how do we calculate bonuses for this group of employees?”, etc. More than that, we need to agree on naming things – we need a common, consistent nomenclature and understanding of what it means when someone says “N# Units”, “Department X” or “Order Aging” or “Membership Level” or “Bonus Type Y”.

And even more than that, we need to map those concepts to concrete, documented rule-sets that are manifested in the data somehow (from the simplest example, a “look-up table” or “reference table”, to the complex examples like a “data mart” or “analysis cube” or “ETL process”). This concept is sometimes called a “data dictionary”, which kinda belies its complexity, because it’s really more of a “data encyclopedia” – it needs to document what, how, why, & when.

What our concepts/terms/data-points mean, how they’re used, why they’re useful, & when they should be used.

Third topic: KISS and KPI’s

Management reports need to be simple. Yes, there are power-users who want the detail, and there are auditors who in fact require the detail. But your average C-level (or even P/VP-level) exec doesn’t care about that stuff – they want very simple answers to deceivingly simple (i.e. can be very complex under-the-hood) questions, like “How much money did we make this quarter for department X?”, or “What kind of productivity bonus do I give to group Y?”. But that’s just the beginning – that’s descriptive analytics. What they really want, but are sometimes too afraid to ask, are more powerful questions, like “How much money can I expect to make in market Z or state XX?”, “What are our expected new loyalty program memberships, and how much will they profit us?” — predictive analytics.  (And we’re not even going to touch prescriptive analytics yet, because you’re not ready for that.)

KISS means we need to try our best to hide the nitty-gritty details and “under the hood” logic/calculations from the end-user or report audience. But, that means fully knowing and understanding those details and rules and logic flows so that we can implement them!

KPI is Key Performance Metric. That’s the golden nugget, the one piece of information that the manager/report-viewer ultimately is after, the thing that makes them go “Got it! That’s the answer I was looking for!”, so they can make their business-decision and move on with their day. These aren’t necessarily just single numbers (like an overall revenue figure); they can be pie-charts, bar-graphs, a clear & concise grid, or whatever makes the most sense for the business-problem/business-decision at hand.

This all sounds fantastic, right? So what’s the catch?

Fourth topic: Time & Effort

Time is money, which is resources, which is people, learning, training, developing, implementing, testing, validating… rinse, repeat.  You don’t put that all on the shoulders of a lone DBA; that life-cycle touches many different disciplines and team members – managers, business users, accounting folks, marketing people, analysts, developers, testers, operational leads, and yes, of course, all of IT infrastructure (helpdesk, engineering, DBA).  And you don’t just buy a box off the shelf at your local software retailer and say “look, we’re gonna implement Tableu!”, wave a magical IT wand, and call it day.

Now we, as technologists, are more than willing to learn and educate ourselves, but…

There needs to be a matching dedication from the business to that effort, and to the platform(s) that is/are chosen.

That means, in concrete terms, a few things:

  1. Training budget & resources
    • Conferences, courses thru online training providers, cross-team collaboration.
  2. Product & technology investment
    • Upgrades, net-new products, whatever is needed.
  3. Time allowances & agreements
    • Dedicated scheduling where the “daily grind” operations take a back-seat and we can focus on the new stuff.
  4. Support from SME’s
    • The ability to call-out to a qualified expert when critical questions or roadblocks arise.
    • Can be contractors, consultants, service-providers, or platform-providers. The point is, you only use them if you need them, so you keep the cost relatively low.

That’s if you’re dedicated to in-house team/ability build-up. If you want to outsource, you have a different set of challenges:

  1. Contractors are expensive!
    1. Their requirements are exceedingly rigid.
    2. They’re likely to scoff (yes, even outright laugh) at the quagmire of data & logic & rules that we’ve created and/or want to build into our “magical reporting stack”.
  2. They’ll still require that same product/tech investment.
    1. No contractor is going to accept your old legacy SSRS instance as a baseline for building a modern, responsive, effect reporting system. The first thing they’ll say is “upgrade that, & come back to us.”
    2. Likewise for your legacy ERP system – sure, it’s a little less obsolete, and there are probably plenty of shops running it & developing on it, but good luck getting new-hire contractors to embrace it; at best, they’ll begrudge it; at worst, they’ll charge exorbitant fees for having to work on such an old platform.
  3. Technical debt is their worst enemy.
    1. Like it or not, like most decades-old enterprises, we have technical debt up the wazoo.
    2. Contractors won’t work in a debt-heavy environment; they’ll insist you “fix the debt” and come back to them in a few months/years when it’s all happy & pretty & green.

Technical debt is our enemy, too, but at least we “own” it – i.e. we’re aware of it and we have ideas on how to fix it, if/when we ever get the time.

It’s like our city roads: at least we know where the potholes are, and how to avoid them.

Executive Summary

My point, from this rambling and probably way too lengthy post, is this: We need to talk about this stuff. Yes, Mr. Manager, I know you already said that. Let me embellish:

We need to talk about this stuff, candidly, openly, broadly, deeply, cross-functionally (made-up phrase #2), even company-wide.

Because, while the end-goal is deceptively simple (“We want report dashboards!”), the underlying systems are complex, with lots of moving parts, requiring lots of knowledge (both domain/biz and tech), and lots of management (compliance, governance, automation, visibility/monitoring).

It’s not just a technology challenge. It’s a people challenge. It’s a cultural challenge. It’s an organizational challenge.

It’s a challenge that, when faced, met, and overcome, can lead to spectacular growth and success for all involved!

(And that’s my attempt to end this rant on a positive note. Enjoy!)


PS: No, I’m not happy about WordPress’s inability to understand the ‘style’ attribute of a simple <ol> tag, but I tried… so apologies if the outlines are not intuitive because each level is just another set of numbers, instead of Word-style outlining like 1.. a.. i.. etc.  Grr arg!

SQL Server for the Developer

And now for a brief interlude while I work on my next “real” post… continuing with the same theme… SQL Server for the developer/programmer!  Warning: potentially excessive snark ahead.

caution-sarcasm-ahead

SQL Server is a mystical fairy-tale land where you can store all your application data, and never have to worry about it again!  Sure, it “should” be relational, but you also “should” drive the speed limit.. PSHH!  Let’s throw blobs in there too — HTML, XML, hell, even JSON now that 2016’s got nice happy JSON-centric system functions.  And files?  Heck yes, what else is FILESTREAM for?  Sure, performance may be abysmal, but that’s somebody else’s problem, so why should we care?

It’s the only database platform your company invested in, so it must be good for everything.  Document storage?  Sure, not like there’s anything else better suited for the job, especially for ‘Free’!  Ooh, how about key-value data, and volatile temporal data like .NET Session-State?  Obviously!  Nothing else exists, and Microsoft makes it so simple to set it up in SQL.

sarcasm-sign-everybody-needs-one
Here’s your sign…
It’s that wonderful database system where we send all our Entity Framework queries and data just magically appears!  We don’t need to worry about how nasty and convoluted those queries turn out when they finally get parsed down to actual TSQL code.  It was only 3 nice simple lines of LINQ, why should it be any more complex than that?  Oh, and we can write all sorts of abhorrent ad-hoc queries against it too, nobody will even notice.  Let that code build all sorts of IN SELECTs & sub-queries, chains of unreadable JOINs, and Inception-level nested VIEWs, it’s perfectly fine!  Performance, shmerformance!  That’s what we have DBAs for.  Not that they’ll be able to help you when you refuse to refactor or rewrite your code that’s responsible for these abominable queries.

what has been seen cannot be unseen
Your terrible queries are scaring the dog…
SQL Server is so easy!  Microsoft gives me a free Developer Edition, I can code for all these awesome features that I know we’ll have in production, right?  Oh, we only run Standard Edition?  Woops.  Ooh, I know!  Let’s put everything in Azure.  It can do everything and make us breakfast!  Then we won’t need to worry about feature differences, backups, RTO/RPOs, index maintenance, performance tuning, scaling, or even performance in general!  Wait, no?  Oh well, that’s what we have DBAs for!  Besides, it’s so easy to move everything over, we just take our SQL backups and upload them to blob storage and then restore them on Azure SQL Database, right?  No?

this is why we can't have nice things
Exactly.
Hmm.  Maybe we should hire a DBA.  Or four.

That’s all, see ya next time!  😄

What’s in a Name?

a thing by any other name… is a different thing.

Let’s just start with a universal truth:

Names matter.

What do I mean by that? In technology, specifically. Well, let me explain.

The name you choose for a thing – whether it’s a server, a database, an application, a service, a class, a file, a method, a function, a procedure, ad nauseum – that name is immediately and irrevocably in love with Edward associated with that thing. Even if you miraculously get to change that name somewhere down the road, it will still be (at least for a while) “formerly known as” the original name. Legacy teams will continue to use that old name until they’ve convinced everybody the new name is worth the trouble of switching all dependency references & documentation. Managers will continue to use the old name even longer because they’re not close enough to the tech to be aware of, let alone understand, the full implications of said name change.

you-keep-using-that-name
Enigo Montoya prefers easy and descriptive names, like “Six-Fingered-Man”.

So that’s why names matter. They’re a unique and semi-permanent identifier of something; they encapsulate the essence of that thing in a single word or phrase. And they become embedded in the business jargon and technology team lore. Such lofty expectations of such a simple (hopefully), short (usually), and often under-appreciated element of our field.

It’s unbelievably easy to find bad examples of IT naming schemes, too. Just look:

I don’t know about you, but I like my names to be pronounceable. I don’t want to need dozens of syllables and two extra breaths just to say a couple server names in conversation. It’s not hat hard to create meaningful multi-part names that are still phonetic. We speak these things just as much as we type them, let’s make them speak-able!

Thus, once again,

Names matter. REALLY.

And so, dear reader, choose them wisely, with care and consideration. With lots and lots of context. Maybe even some conventions, or at least conventional wisdom. And consistency. Plan for the future, while being aware of the past. Don’t let legacy remnants stand in the way, but don’t let delusions of grandeur force you into a naming scheme that breeds confusion or resentment.

dilbert-namingconventions

Allow me to illustrate. We have a handful of SQL servers as part of our core IT infrastructure. Let’s call them FooDB, BarDB, and CharDB. Now, they each serve a fairly distinct and semi-isolated set of apps/websites. In other words, they each fill a role. (Of course, there are cross-dependencies and integrations, but nothing extreme.) Now, we want to migrate them to new hardware and plan for scale, assuming the business keeps growing. So what do we name the new servers? We know that SQL Server doesn’t really scale horizontally, at least not very easily & not without major application rewrites. But we don’t want to draw ourselves into a corner by assuming that we’ll “never” scale out. Ok, how about this: DC-FooDB-P1, DC-BarDB-P1, etc. The P stands for production, because, along with the fancy new hardware, we’ve got some additional infrastructure room to build a proper Dev & QA environment.

 

dev_test_prod_white
Seriously. You need tiered environments. That’s a whole ‘nother blog post.

So what have we planned for? Tiered environments, as well as a bit of scale-out room – 9 servers in each “role”, so to speak. What if we grew to the point where we needed 10 servers? Well, as a DBA, I’d argue that we should consider some scale-up at that point; but also, to seriously start looking at a broad infrastructure revamp, because that kind of growth should indicate some serious cash influx and a much higher demand on IT-Dev resources.

will-this-scale
Don’t ask me what happened to rules #1-5. I don’t know, I didn’t read the article.

Why should the breaking point be 10 and not 100? or 1000? Well look, it definitely depends on the technology we’re dealing with. SQL server is not Mongo, or Hadoop, or IIS, or Apache. Certain tech was made to scale-out; other tech is more suited for scale-up. And there’s nothing better or worse about either approach – both have their place, and both are important. Scale-out is arguably more important, but for traditional IT shops, it’s also more difficult.

This is where that big C-word comes in. Context. If we’re talking about a tech that scales-out, sure, leave room for 100 or 1000 of those cookie-cutter instances that you can spin-up and shuffle around like cattle. But if it’s a scale-up tech, don’t kid yourself by thinking you’re going to get to the triple-digits.

star-wars-episode-vi-han-solo
Delusions of grandeur… we all have them!

The point is, if you’re starting at 1, it’s not too difficult to scale-out to a few more “nodes” with simple tricks and tweaks; i.e. without drastically changing your IT-Dev practices and environment. But when you start talking multi-digits, of redundant nodes fulfilling the same role, you need to seriously PLAN for that kind of redundancy and scale. Your apps have to be aware of the node topology and be concurrency-safe, meaning “no assumptions allowed” about where they’re running or how many other instances are running simultaneously. And since we’re talking about database servers, that means replication, multi-node identity management, maybe sharding, data warehousing, availability groups, and other enterprise-level features. We’re talkin’ big $$$. Again, it stands to reason that if the business is seeing the kind of growth that leads in this direction, it can afford to invest in the IT-Dev improvements required to support it.

show-me-the-money

I’d love to know your thoughts! Leave me a comment or two. Until next time!

Header image: Keira loved her little pet penguin… and then later she destroyed it and literally ate its face off.

Drafted with StackEdit, finished with WordPress.

Is Coding Style still a thing?

Make it readable, minimize scrolling.. provide context, clarify complexity.

I can hear it now.. the moans and groans, the wailing and gnashing of teeth.. “Another post about coding style / formatting? REALLY!? Why can’t we all just get along read/write/paste code in our own environment and let our IDE/toolset worry about the format?” I hear you, really I do. But you’re missing the point. The single most important trait of good code is readability. Readability should be platform-independent, i.e. it should not vary (too greatly) depending on your viewing medium or environment.

The ratio of time spent reading (code) versus writing is well over 10 to 1 … (therefore) making it easy to read makes it easier to write.

-Robert C. “Uncle Bob” Martin, Clean Code

And yet, all around the web, I continue to find terribly formatted, nigh-unreadable code samples. Specifically, SQL scripts. That’s me; that’s my focus, so that’s what I pay the most attention to. But I have no doubt that the same applies to most other languages that Devs & IT folks find themselves scouring the tubes for on a daily basis. There is just an excessive amount of bad, inconsistent, language-inappropriate formatting out there. And honestly, it’s not that surprising — after all, there are millions of people writing said code — but it’s still a source of annoyance.

Clean code always looks like it was written by someone who cares.

-Michael Feathers, Clean Code

Don’t you care about your fellow programmers?

Maybe it stems from my early days working in a small shop, where we all had to read each other’s code all the time, and we didn’t have these fancy automated toolsets that take the human eyeballs out of so much of the workflow/SDLC. So we agreed upon a simple set of coding style/layout rules that we could all follow, mostly in-line with Visual Studio’s default settings. And even though we occasionally had little tiffs about whether it was smarter to use tabs or spaces for indentation (protip: it’s tabs!), we all agreed it was helpful for doing code reviews and indoctrinating onboarding new team members.

My plea to you, dear reader, is simple. Be conscious of what you write and what you put out there for public consumption. Be aware of the fact that not everybody will have the exact same environment and tooling that you use, and that your code sample needs to be clean & structured enough to be viewed easily in various editors of various sizes. Not only that, but be mindful of your personal quirks & preferences showing through your code.

That last part is really difficult for me. I’m a bit of an anal-retentive bastard a stickler for SQL formatting, particularly when it doesn’t match “my way”. So that’s not the point of this post – I don’t mean to, nor should I ever, try to impose my own personal style nuances on a wide audience. (Sure, on my direct reports, but that’s why we hire underlings, no? To force our doctrine upon them? Oh, my bad…)

 

All I’m saying is, try to match the most widely accepted & prevalent style guidelines for your language, especially if it’s largely enforced by the IDE/environment-of-choice or platform-of-choice for said language. And I’m not talking about syntax-highlighting, color-coded keywords, and that stuff – those are things given to us by the wonderful IDEs we use. Mostly, I mean the layout stuff – your whitespace, your line breaks, your width vs. height ratio, etc.

For example, Visual Studio. We can all agree that this is the de-facto standard for C# coding projects, yes? Probably for other MS-stack-centric languages as well, but this is just an example. And in its default settings, certain style guides are set: curly braces on their own lines, cascading smart-indents, 4-space tab width, etc. Could we not publish C# code samples to StackOverflow that have 7 or 8 spaced tabs, curly braces all over the place (some on the same line as the header code or the closing statement of the block; some 3 lines down and spaced 17 tabs over because “woops, I forgot a closing paren. in a line above, I fixed it but I was too lazy to go down and fix the indentation on the subsequent lines!”).  GRRR!

Always code as if the [person] who ends up maintaining your code will be a violent psychopath who knows where you live.

John Woods

Now, on to SQL. Specifically, TSQL (MS SQL). Okay, I will concede that the de-facto environment, SSMS, does NOT have what we’d call “great” default style/layout settings. It doesn’t even force you to use what little guidelines there are, because it’s not an IDE, so it doesn’t do smart auto-formatting (or what I’ve come to call “auto-format-fixing, in VS) for you. And sure, there are add-ins like RedGate SQL Prompt, SSMSBoost, Poor Man’s T-SQL Formatter, etc. But even they can’t agree on a nice happy compromised set of defaults. So what do we do? Maybe just apply some good old fashioned common sense. (Also, if you’re using such an add-in, tweak it to match your standards – play with those settings, don’t be shy!)

I can’t tell you how frustrating it is to have to horizontally-scroll to read the tail-end of a long SQL CASE expression, only to encounter a list of 10+ literal values in an IN () expression in the WHERE clause, that are broken-out to 1-per-line! Or even worse, a BETWEEN expression where the AND <end-value> is on a brand new line – REALLY?? You couldn’t break-up the convoluted CASE expression, but you had to line-break in the middle of a BETWEEN? Come on.

youre-killin-me-smalls
You’re killin’ me, Smalls!

Or how about the classic JOIN debate? Do we put the right-hand table on the same line as the left-hand table? Do we also put the ON predicates in the same line? Do we sub-indent everything off the first FROM table? Use liberal tabs/spaces so that all the join-predicates line up way over on the right side of the page? Again, common sense.

Make it readable, minimize scrolling, call-out (emphasize) important elements/lines (tables, predicates, functional blocks, control-flow elements, variables). It’s not revolutionary; just stop putting so much crap on one line that in a non-word-wrapped editor, you’re forced to H-scroll for miles. (FYI, I try to turn word-wrap on in most of my editors, then at least if I have some crazy-wide code, I can still read it all.)

SWYMMWYS. Say what you mean, mean what you say. (“Swim wiz”? “Swimmies”? I don’t know, I usually like to phoneticize my acronyms, but this one may prove troublesome.)

swimmies
swimmies!

Modern languages are expressive enough that you can generally make your structure match your intended purpose and workflow. Make it obvious. And if you can’t, comment. Every language worth its salt supports comments of some kind – comment, meaning, a piece of text that is not compiled/run, but rather just displayed to the editor/viewer of the code. Comments need to provide context, convey intent, and clarify complexity.

So what’s my point? Basically, let’s all try to make our code samples a bit easier to read and digest, by using industry-standard/accepted best-practices in terms of layout & style.

Drafted with StackEdit, finished with WordPress.


PS: Just to eat my own words, here’s an example of my preference to how a somewhat complex FROM clause would look, if I was king of the auto-formatter world:

SQL-sample-dark-code
Hipster dark theme FTW!

Send comments, thumb-ups, and hate-mail to my Contact page. 😛

Header image: our husky Keira at 2.5 months, being a sleepy-head!