Dates, Date-pickers, and the Devil

When a date range, or time period, is specified in SQL, it’s easiest, clearest, and most concise, to use a “greater-than-or-equal-to Period-Start, and less-than Next-Period-Start” logic. Mathematically speaking, we are defining the range as closed on the left, open on the right.

This is a bit rant-y, but… indulge me.  I’ve been writing/refactoring a lot of old reporting queries.  And, like most reports, they deal with dates and datetimes — as parameters, boundaries, or where/join predicates.  I also got way too intense with a recent SSC post (Sql Server Central), which fueled the fire even more.

fluffy-angry-puppy
I’m so cute and ANGRY!

SQL Server is very good at handling temporal datatypes and calculations against them.  We’ve got functions like dateadd, datediff, dateparts, datatypes datetime2 and datetimeoffset, datetime, etc.  It supports all sorts of format conversions if you need to display them in various ways.

..even though that should be left to the presentation layer!

Here’s the issue.  Well, there are several issues, but we only have time for a few.

Here’s the first problem

Report users don’t understand the “end of a time period” problem.  I don’t have a good name for it; others might call it the “Day plus one” problem or the “Less than date” problem.  What do I mean by this?  Well, let’s back up a bit, to DBA Commandment #6, “Thou shalt not use between with datetimes.”  In order to first understand the issue, we have to understand why this is a commandment.

When a date range, or time period, is specified in SQL, it’s easiest, clearest, and most concise, to specify it like so: @TheDate >= @StartOfPeriod and @TheDate < @StartOfNextPeriod.  Mathematically speaking, we’re defining the range as “closed on the left, open on the right”.  In other words, Min <= X < Max.

The reason we do this with datetimes is found right there in the name of the datatype — it has (or can have) a time component!

stone-tablets-with-roman-numerals-to-10
There are probably more than 10, but it’s a good starting point…

Let’s talk examples

Say you’d like to report on the month of March 2017.  How do you determine if your data-points (stored as datetime or, hopefully, datetime2) are within that period, that month?  Well sure, you could write where month(MyDateColumn) = 3 and year(myDateColumn) = 2017 

NO.  That is horrible, don’t do that.

It’s not SARGable and renders your index on that column useless.  (You do have an index on it, don’t you? No? Make one!)  Okay, let’s stick with something SARGable.  How about MyDateColumn between '20170301' and '2017-03-31T23:59:55.999'?  (You did read this post about using culture-neutral datetime literals right?)  But wait!  If your data is a datetime, it’s not actually that precise — your literal gets rounded up to 20170401 and you’re now including dates from April 1st (at midnight)!

Oh that’ll never happen… until it does.

Second problem

Many developers and report-writers assume that the values in their data will never be within the typical “1 second before midnight” or “1/300th of a second before midnight” escape window of your “3/31/2017 23:59:59.997” bounding value.  But can you guarantee that?  Didn’t think so.  Worse, if you use the .999 fraction as given in the 2nd example, you’re either “more” or “less” correct, and nobody can actually tell you which way that pendulum swings because it depends on the statistical likelihood of your data having actual literal “midnight” values vs. realistic (millisecond-y, aka “continuous”) values.  Sure, if you’re storing just a date, these things become a lot less complicated and more predictable.

But then why aren’t you storing it as an actual date, not a datetime!?

So what’s the right answer?

As I said, “greater than or equal to  ‘Start’, and less than ‘End'”, where ‘End’ is the day after the end of the period, at midnight (no later!).  Hence, MyDateColumn >= '20170301' and MyDateColumn < '20170401'.  Simple, yes?

keep calm and keep it simple
KCKS

But wait, there’s more!

I mentioned “date-pickers” in the title.  When it comes to UX, date-pickers are a sore subject, and rightly so — it’s difficult to truly “get it right”.  On a “desktop-ish” device (i.e. something with a keyboard), it may be easiest on the user to give them a simple text-box which can handle various formats and interpret them intelligently — this is what SSRS does.  But on mobile devices, you often see those “spinner” controls, which is a pain in the arse when you have to select, say, your birth date and the “Year” spinner starts at 2017.  #StopIt

I mean, I’m not that old, but spinning thru a few decades is still slower than just typing 4 digits on my keyboard — especially if your input-box is smart enough to flip my keyboard into “numeric only” mode.

Another seemingly popular date-picker UX is the “calendar control”.  Oh gawd.  It’s horrible!  Clicking thru pages and pages of months to find and click (tap?) on an itty bitty day box, only to realize “Oh crap, that was the wrong year… ok let me go back.. click, click, tap..” ad-nauseum.

stop-it-sign
#StopIt again

The point here is, use the type of date-picker that’s right for the context.  If it’s meant to be a date within a few days/weeks of today, past/future — OK, spinner or calendar is probably fine.  If it’s a birth date or something that could reasonably be several years in the past or future, just give me a damn box.  (Heck, I’ll take a series of 3 boxes, M/D/Y or Y/M/D, as long as they’re labeled and don’t break when I omit the leading-zero from a single-digit month #!)  If there’s extra pre-validation logic that “blocks out” certain dates (think bill-payer calendars or Disneyland annual-pass blackout-days), that probably needs to be a calendar too.

..just make sure it’s responsive on a mobile device.

And in all cases, pass that “ending date” to your SQL queries in a consistent, logical, sensible manner.  For reporting, where the smallest increment of a period is 1 day, that probably means automagically “adding 1 day” to their given end-date, because the end-user tends to think in those terms.  I.e. if I say “show me my bank activity from 1/1/2017 to 1/31/2017”, I really mean “through the end of the month“, i.e. the end of the day of 1/31.  So your query is going to end up wanting the end-date parameter to be 2/1/2017, because it’s using the correct & consistent “greater than or equal to start, and less than start-of-next” logic.

context-consistency-clarity
The 3 C’s

Final thoughts

I know it’s not easy to explain to business folks, and it’s not easy to implement correctly.  But it’s important.  The >= & < logic is clear, concise, and can be used consistently regardless of underlying datatype.  You just need to adjust your presentation layer (whether that’s SSRS parameters or a .NET date-picker) to convey their intent to the user, whether that’s “show/enter the last day of the month, but translate to the next day to feed to the query/proc.”, or “make them enter the next-day (day after the end of the month/period) and understand the ‘less than’ logic.”  I’m more inclined to the first, but it depends on your audience.

Thanks for reading, and happy date-ing!

An Open Letter To Management

We need to talk about this stuff – candidly, openly, broadly, deeply.

More accurately, to legacy enterprise management.

Let’s say the following directive comes down from on-high: “Hey, our CEO wants us to provide better financial metrics reports and a dashboard that management can see to show real-time stats about the company.”

*groan*

I mean… Sure!  Yay, digital transformation, modernization, mobile friendly, all that good stuff!!

So, I have some thoughts on this, because I’ve seen the current state of things in small-medium enterprise, and am anxious to help improve that state to provide better value to the business.  To misquote Dennis Miller, I don’t mean to on a rant here, but…

First topic: Reality Check

It starts at the top, with a couple realizations:

  1. Data is ever-growing.
    1. We need to get smarter about managing its growth, including archiving/retention schemes, data warehousing, etc.
    2. This involves compliance regulations and operational resources.
      1. We need to ensure compliance with biz standards and data shelf-life.
      2. We need to automate as much as possible to avoid over-burdening our human resources (and to some extent our servers too).

For example, you can’t expect the same response-time for a query into 10-year-old financial data as you do for 1-year-old data.

  1. Traditional SSRS (SQL Server Reporting Services) is an operational time-sink.
    1. We spend way too much time assigning access, creating redundant “on demand” reports, and making seldom-used email subscriptions.
    2. We’re probably running on an old version, say 2008R2
      1. Vast improvements have come to the MS Data/BI platforms in the last decade and we need to take advantage of them.
      2. It’s not mobile-friendly at all; it’s not even modern-browser friendly, as some of its UX elements are still explicitly functional in Internet Explorer
    3. We tacked-on some 3rd-party application to attempt to bring some data-warehouse functionality into the environment, but only 1 person “knows it intimately” and is comfortable developing new reports with it.
  2. Our ERP system, in its current state/version, is a tangled mess, to the eyes of a DBA & query-writer/report-writer.
    1. We’ve bolted-on so much customization and special-configuration that it’s not suitable for stock/canned reports from the vendor, even if we upgraded to a version of the app that had a decent reporting engine.
    2. We can’t even decide on very basic things like “What is a ‘unit‘ of production?”, or “What are the different areas/groupings we break-out for revenue metrics?”

Ok sure, maybe we can agree on what those groupings are, but we can’t even get a consensus on what we call them!

Second topic: Single Source of Truth

We need to agree on a standard, documented, official set of business rules that answer such questions as “how do we measure revenue?”, “what are our different sub-orgs/departments/groupings for how we report on revenue?”, “what is ‘production output’ and how do we measure it?”, “how do we calculate bonuses for this group of employees?”, etc. More than that, we need to agree on naming things – we need a common, consistent nomenclature and understanding of what it means when someone says “N# Units”, “Department X” or “Order Aging” or “Membership Level” or “Bonus Type Y”.

And even more than that, we need to map those concepts to concrete, documented rule-sets that are manifested in the data somehow (from the simplest example, a “look-up table” or “reference table”, to the complex examples like a “data mart” or “analysis cube” or “ETL process”). This concept is sometimes called a “data dictionary”, which kinda belies its complexity, because it’s really more of a “data encyclopedia” – it needs to document what, how, why, & when.

What our concepts/terms/data-points mean, how they’re used, why they’re useful, & when they should be used.

Third topic: KISS and KPI’s

Management reports need to be simple. Yes, there are power-users who want the detail, and there are auditors who in fact require the detail. But your average C-level (or even P/VP-level) exec doesn’t care about that stuff – they want very simple answers to deceivingly simple (i.e. can be very complex under-the-hood) questions, like “How much money did we make this quarter for department X?”, or “What kind of productivity bonus do I give to group Y?”. But that’s just the beginning – that’s descriptive analytics. What they really want, but are sometimes too afraid to ask, are more powerful questions, like “How much money can I expect to make in market Z or state XX?”, “What are our expected new loyalty program memberships, and how much will they profit us?” — predictive analytics.  (And we’re not even going to touch prescriptive analytics yet, because you’re not ready for that.)

KISS means we need to try our best to hide the nitty-gritty details and “under the hood” logic/calculations from the end-user or report audience. But, that means fully knowing and understanding those details and rules and logic flows so that we can implement them!

KPI is Key Performance Metric. That’s the golden nugget, the one piece of information that the manager/report-viewer ultimately is after, the thing that makes them go “Got it! That’s the answer I was looking for!”, so they can make their business-decision and move on with their day. These aren’t necessarily just single numbers (like an overall revenue figure); they can be pie-charts, bar-graphs, a clear & concise grid, or whatever makes the most sense for the business-problem/business-decision at hand.

This all sounds fantastic, right? So what’s the catch?

Fourth topic: Time & Effort

Time is money, which is resources, which is people, learning, training, developing, implementing, testing, validating… rinse, repeat.  You don’t put that all on the shoulders of a lone DBA; that life-cycle touches many different disciplines and team members – managers, business users, accounting folks, marketing people, analysts, developers, testers, operational leads, and yes, of course, all of IT infrastructure (helpdesk, engineering, DBA).  And you don’t just buy a box off the shelf at your local software retailer and say “look, we’re gonna implement Tableu!”, wave a magical IT wand, and call it day.

Now we, as technologists, are more than willing to learn and educate ourselves, but…

There needs to be a matching dedication from the business to that effort, and to the platform(s) that is/are chosen.

That means, in concrete terms, a few things:

  1. Training budget & resources
    • Conferences, courses thru online training providers, cross-team collaboration.
  2. Product & technology investment
    • Upgrades, net-new products, whatever is needed.
  3. Time allowances & agreements
    • Dedicated scheduling where the “daily grind” operations take a back-seat and we can focus on the new stuff.
  4. Support from SME’s
    • The ability to call-out to a qualified expert when critical questions or roadblocks arise.
    • Can be contractors, consultants, service-providers, or platform-providers. The point is, you only use them if you need them, so you keep the cost relatively low.

That’s if you’re dedicated to in-house team/ability build-up. If you want to outsource, you have a different set of challenges:

  1. Contractors are expensive!
    1. Their requirements are exceedingly rigid.
    2. They’re likely to scoff (yes, even outright laugh) at the quagmire of data & logic & rules that we’ve created and/or want to build into our “magical reporting stack”.
  2. They’ll still require that same product/tech investment.
    1. No contractor is going to accept your old legacy SSRS instance as a baseline for building a modern, responsive, effect reporting system. The first thing they’ll say is “upgrade that, & come back to us.”
    2. Likewise for your legacy ERP system – sure, it’s a little less obsolete, and there are probably plenty of shops running it & developing on it, but good luck getting new-hire contractors to embrace it; at best, they’ll begrudge it; at worst, they’ll charge exorbitant fees for having to work on such an old platform.
  3. Technical debt is their worst enemy.
    1. Like it or not, like most decades-old enterprises, we have technical debt up the wazoo.
    2. Contractors won’t work in a debt-heavy environment; they’ll insist you “fix the debt” and come back to them in a few months/years when it’s all happy & pretty & green.

Technical debt is our enemy, too, but at least we “own” it – i.e. we’re aware of it and we have ideas on how to fix it, if/when we ever get the time.

It’s like our city roads: at least we know where the potholes are, and how to avoid them.

Executive Summary

My point, from this rambling and probably way too lengthy post, is this: We need to talk about this stuff. Yes, Mr. Manager, I know you already said that. Let me embellish:

We need to talk about this stuff, candidly, openly, broadly, deeply, cross-functionally (made-up phrase #2), even company-wide.

Because, while the end-goal is deceptively simple (“We want report dashboards!”), the underlying systems are complex, with lots of moving parts, requiring lots of knowledge (both domain/biz and tech), and lots of management (compliance, governance, automation, visibility/monitoring).

It’s not just a technology challenge. It’s a people challenge. It’s a cultural challenge. It’s an organizational challenge.

It’s a challenge that, when faced, met, and overcome, can lead to spectacular growth and success for all involved!

(And that’s my attempt to end this rant on a positive note. Enjoy!)


PS: No, I’m not happy about WordPress’s inability to understand the ‘style’ attribute of a simple <ol> tag, but I tried… so apologies if the outlines are not intuitive because each level is just another set of numbers, instead of Word-style outlining like 1.. a.. i.. etc.  Grr arg!

Mongos, anybody?

You’ve heard of this thing called MongoDB…

Unless you’ve been living under/in the proverbial rock/hole-in-the-ground, you’ve heard of this thing called MongoDB. It’s one of the biggest flavors of NoSQL database systems out there – specifically, it’s a “document DB”, which means it stores semi-structured data in the form of JSON documents, grouped into collections, grouped into (of course) DBs.

Now, I’m a Windows guy. SQL Server & Windows Server are my comfort zones. I like GUIs, but I’m ok with a command line too. PowerShell has become my friend good acquaintance. But every once in a while, we have to step outside our comfort zones, no? So when I was told “hey, you’ve got to get a handle on managing these MongoDB instances that we’ve got running on servers so-and-so, as part of this ‘Imaging system’ application”… I thought “Hahaha…”, but what came out was “Sure thing!”  Something a bit like this:

hahahayeah

(courtesy of DBA Reactions)

So the first order of business was to decide on a MongoDB “GUI” – a Windows app that could at least connect-to and give me a visual overview of the running MongoDB instances (properly referred to as a mongod, stemming from the Linux term “daemon”, which on the Windows side is basically like a process or service). I tried both the “official” desktop app from the big org, MongoDB Compass, and a neat open-source tool called Robomongo.

And I actually like them both, for different reasons; most of which can probably be attributed to my lack of depth with the technology, but hey. Anyway, Compass is really nice in that it gives you this kind of statistical overview of the collections in your DBs, performing some basic aggregates on a 10% or 1k sample of the collection documents to give you a graphical 40-thousand-foot view of the data. But where it breaks down for me is that little “query” bar, which is just an empty pair of curly-braces. It’s only for “selecting” (finding, querying); no other operations to see here. So we can’t manipulate our data with it, but it’s definitely great for viewing!

 

Whereas with Robomongo, I can right-click on the DBs/collections and do very simple things like “show documents”, “show statistics”, etc. And it actually writes the equivalent mongo shell command for me to poke at; say, to inject more logic to the find to get something specific or to write a new command or two as I read thru the docs and learn things like aggregates, indexes, and whatnot. Being a shell, it allows us to write or update data as well as read it.

But alas, GUIs will only take you so far. This is a tech born & bred on the command-line, so to really dig into it, I needed to let go of the mouse. And the mongo shell was actually pretty straightforward! A couple visits to their docs pages & some visual verification by using the GUI tools to check my work, and things were starting to come together. And even for a complete Javascript noob like me, the command syntax was straightforward enough. Soon I was configuring a replset and mongodump/mongorestore-ing and re-syncing a replica after purging some old stale collections.

Even though it’s a CLI/Linux flavored technology, it works perfectly fine in Windows… Except for one thing.  So, you install it as a service, and you typically start & stop services using net start & net start and as long as your service’s CLI arguments are all correct, you should be good — in theory! Trouble is, the service tends not to stop gracefully. So I found that, instead, the following command was more useful: mongo admin --eval "shutdownServer()". This uses the actual mongo shell to send the native shutdown command to the mongod, instead of relying on the Windows services gymnastics to do it correctly.

It just goes to show, dear reader, that you’ve got to get your hands dirty and try out new technology before dismissing it as “not my job” or “somebody else’s problem”.

PS: Nope, that’s not Compass’s logo or anybody else’s; I made it meself, with good old Paint.NET!

Drafted with StackEdit, finished with WordPress.

No Regrets

Continuing from my previous post…

I loved my first job, no doubt. But slowly over the years, I started to feel pigeon-holed. I was always on-deck for problems that happened in the system due to what I’ll call “bad data”. If you’ve been in the industry, you know what that means – something in your datastore doesn’t “follow the rules”, or “breaks something that normally doesn’t break”, or “is just a little off-kilter”. (Otherwise known as “edge cases”, which I’ll talk about in a later post!) And often, it’s because the application allowed that data, which breaks the rules, into the system – Devs hate to hear this. But just as often, it’s because the data-guy, the DB-dev or analyst, ingested or transformed the data in a way that breaks the rules – so there’s plenty of blame to go around. And we knew that, and we respected that. We didn’t pass the blame, we just buckled down and tried to get it fixed.

But, too often, I was the one called upon to fix it, because I was the closest to the data – and yes, I knew it intimately, no doubt. Because of that, I started to feel left behind the curve. The other devs were pushing new tech stacks, learning new things, while I kept pluggin’ away at good ol’ TSQL (with the occasional smattering of C#). And yes, of course that’s partially my fault; I could have self-asserted, pushed for more learning opportunities, etc. I don’t regret my first job. I learned a ton – basically everything I know about databases, technology, and IT. But it was time for a change.

And that’s how I ended up where I am now!

Written with StackEdit.

Who Am I?

…and why I love what I do.

I started out as a data analyst, first playing in MS Access, and quickly moving to MS SQL Server. I also became a developer in the ASP.NET stack, way back in the 2.0 days (2006-07). Yet I found myself drawn to the data more than the code, for some reason. Perhaps it was the beautiful simplicity of the relational model. Or the myriad of variations that something as simple as an name field could contain & how challenging it could be to massage that “dirty data” into something elegant and consistent for the code to easily manage and present to the user. Or it could have been the thrill of analyzing a workload that may have taken several minutes to run, and then, by applying the right combination of indexes, relations, and set-logic, to bring that workload down to milliseconds.

This job was great. First “real” job out of college; a teeny-tiny software shop, 4 devs & the boss, working with K-12 school districts, making their lives just a bit easier when dealing with the vast swath of standardized testing and government-imposed student performance metrics that ultimately decided how they taught & how they got funded. And we grew, slowly but surely – 5 devs, then 6, then 8. We got to wear so many hats! Developer, Data Analyst, Helpdesk, Tech Support; even Printer, Delivery Crew, Trainer. It was truly a well-rounded experience.

As the years went by, it felt like family. And sure, every family has the occasional episode of dysfunction. But we were in it together, building up the business with passion, commitment, and respect. And those of us that were there since the start, we felt like we knew it all. Because we did; in that little microcosm, we truly were the experts. And each of us had a forte – I was the “data guy”, he was the “self-taught developer”, she was the “tech-evangelist / UX dev”, and he was the “dev architect”. Our best hire came soon after – the “young enthusiastic developer”, a true professional. Then there was our den-mother, bringer of snacks, paycheck-writer and book-keeper extraordinaire. And as always, the father-figure – the big kahuna, the boss man, who was more than willing to get his hands dirty and help with anything we asked him to.

That was a wonderful experience. In the near-decade I spent with that company and those colleagues, many of whom I consider my friends, I learned so much about technology, programming, databases, and business. And we welcomed more and more people into the family – the graphics & print-media design expert, the bright & cheery project manager, the ambitious salesperson, the kind & caring tier 1 support staff, the new data analyst, the veteran DB-dev, the expert architect, the UI/UX dev, the student dev, and even the Russian programmer! (They’re amazing, BTW.) Each & every one of them added value, and taught me something new.

I do not regret a single day of it. It was awesome.

But, as the baby bird has to eventually leave the nest, I needed to venture out, to get outside my comfort zone and see what else was out there, to join a new & different team and environment, to see if I was truly as good as I thought I was. Well, I think we all know the answer to that! (Hint: it’s “nope”.)

Actually, it’s been great – I drank from the proverbial fire-hose, learned a lot in a short time about a completely new line of business & a familiar but larger & much more complex tech stack & environment. So in general, I’m actually enjoying not being “the go-to guy” for every little problem that happens with the database.

And you know what else I discovered? I am pretty darn good at what I do. That’s a good feeling!

But I’m still the DBA. (Technically, one of two, but the other DBA is the company’s old-time-guru-turned-part-time-consultant). So in a sense, being that “go-to guy” hasn’t gone away, it’s just gotten much more manageable – there’s a larger organization of people around me that can help in many different ways. Whether it’s the old-timers with that tribal knowledge of all the hidden rules and nuances of the business, or the SysAdmin who can whip out a PowerShell script in less time than it takes to say “automation”, or the domain expert that can pinpoint exactly why something is broken because they literally “wrote the book” on that problem-space. So it’s really cool.

Plus, it gives me the space and the time to focus on getting really good at DBA stuff – performance tuning, report automation, data integrity, change management, backup/recovery, data security, all that jazz. I’m looking forward to really earning that “Senior DBA” badge – because even though that’s my title, I still recognize that I have a lot to learn. The landscape is constantly evolving – 5 years ago, this whole “Cloud” business was virtually (yes, that is a pun) unheard of; now it’s all anybody talks about. I “grew up” in an on-prem (that’s short for on-premises) Windows + SQL Server + IIS world; but now I’ve got my fingers on AWS instances (both EC2 and RDS), MongoDBs, a SSAS data warehouse, even a sprinkling of MySQL and Azure. Not to mention dozens of SQL Servers spread over 3 offices plus a data-center, and a ridiculously slick converged infrastructure platform called a “VxBlock”. The technology world just keeps getting bigger, better, faster, smarter.

And honestly, that’s why I love what I do.

Written with StackEdit.