The Horror of Conflicting Database Collations

All these queries with all these JOINs on columns with mismatched collations lead to very sad pandas: RBAR operations (row-by-agonizing-row, unable to use index seeks) and high CPU.

It’s not even close to Halloween, but I promised I would get to this someday, and that someday is now. Strap in, grab the popcorn, and turn up the volume.

Oh wait, this is just a textual medium. Forget the volume bit.

If you’re not sure what this collation thing is, you should go back and read my teaser post, and as always there’s the docs. The one-liner recap on why it matters and how it became a target of my ire is this: legacy DBs use the old default SQL latin1 collation, but an upgraded ERP system’s DB now uses the new default Windows latin1 collation; these DBs all talk to each other, including linked-server JOINs, and performance of those queries has gone to shit.

Pardon the French. “Gone to hell in a hand-basket.”

where exactly are we going in this handbasket?
It can’t be any hotter than a California summer!

So why did this happen? Let’s try to find out. But first, let’s get specific about how all this wailing and gnashing of teeth actually manifests in real technical terms.

The Problem

Essentially what happens here is an implicit conversion problem. There are several blog posts from our distinguished community leaders on this topic and its woes. It causes heavy CPU load as the SQL engine tries desperately to match values of different data types. Even though both columns may be, say, nvarchar(20), the fact that one uses collation SQL_Latin1_General_CP1_CI_AS and the other uses Latin1_General_100_CI_AS makes them strangers to each other — they might as well be an INT and a varchar!

Now again, this is my example. We have dozens of critical application queries using their own little sandbox-y databases, joining to the central ERP system DB to fetch Customer or Sales data. This is already a recipe for sadness.

“Use a middle tier or service layer, dammit!”, you’d say.

“You’re preaching to the choir,” I’d reply.

Hell, you’re preaching to the preacher, at that point. But it’s not that simple, as I’ll touch on later.

There’s a subtle difference here versus those many community blog posts, and it bears repeating: the columns are of the same type, just different collations.

And when the collation on the join predicates is different, bad things happen. Let’s take CustomerNumber for example. On the ERP side, it’s an nvarchar(20) collate Latin1_General_100_CI_AS. On the internal & web apps side, it’s a varchar(20) collate SQL_Latin1_General_CP1_CI_AS. As you might imagine, this is a prime field for joining because it’s the main customer identifier throughout all the systems.
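
To make that concrete, here’s a stripped-down sketch of the two sides. The table and schema names are invented for illustration; only the column types and collations come from the real systems.

-- Illustrative only: what the ERP side looks like.
CREATE TABLE dbo.ErpCustomer (
    CustomerNo nvarchar(20) COLLATE Latin1_General_100_CI_AS NOT NULL PRIMARY KEY
);

-- Illustrative only: what a legacy internal/web app's side looks like.
CREATE TABLE dbo.AppCustomerOrder (
    OrderID    int IDENTITY(1,1) PRIMARY KEY,
    CustomerNo varchar(20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
);

-- Any join on CustomerNo between these two has to reconcile both the data type
-- (varchar vs. nvarchar) and the collation before it can compare values,
-- either implicitly or via an explicit COLLATE, and that conversion sits
-- right on top of the join predicate, where it hurts the most.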

Let’s be clear here. This is a numeric value only. Did it need to support Unicode? Absolutely not. Should it have been an int or bigint? Probably. But did the ERP designers choose to make it a Unicode string anyway? Yep.

Premature optimization may be a root of evil in software development, but inappropriate data typing is at least as evil in database development.

Anyway. The point of this post is not to rant and whine about the design of that particular system. I’ll save that for another day. That won’t stop me from complaining about the improper ways it’s used.

this is why we can't have nice things
Exactly.

As stated above, all these queries with all these JOINs on columns with mismatched collations lead to very sad pandas: RBAR operations (row-by-agonizing-row, unable to use index seeks), and high CPU.  Under duress, my 32-core servers ground to a halt with blocked sessions, ASYNC_NETWORK_IO waits, and 99% CPU utilization.  Needless to say, these were not good days.

Side-note: I really enjoyed Bert’s post because it helped put things into very easy-to-understand terms, and is targeted at a wider audience than just the DBA.  Read it, and watch the video too!

Attempt to Treat the Symptoms

Because the ERP system has been upgraded a couple times in the last decade, the team came up with a [very thin] abstraction layer manifested as a series of “integration views” that UNION similar core entities from the older and newer incarnations of the database. Like Sales records. These have permeated many, many areas of the ecosystem, to the point that almost all apps use them instead of the “raw” source tables. Which sounds great, right? Riiiight.

Until you throw in that monkey wrench of conflicting collations. Remember, Devs are lazy (in a good way). Their apps and queries need to not care about such a low-level detail as database collation. So to support that “not caring”, we set up these integration views to apply the older default collation (the one that matches everything else in the environment) to the output columns. That way, no extra work is required to consume them in the same way they’ve been consumed for the last 10+ years.

Basically, we can add the keywords COLLATE DATABASE_DEFAULT after each column declaration of the view, like so (in the form “alias = sourceColumn”): CustomerNo = erp.CustomerNo COLLATE DATABASE_DEFAULT.
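
A stripped-down sketch of what one of those integration views might look like. The object names here are placeholders; the real views UNION the old and new ERP schemas and carry many more columns.

CREATE VIEW dbo.vwIntegration_Sales
AS
SELECT  CustomerNo  = new_erp.CustomerNo COLLATE DATABASE_DEFAULT,
        InvoiceNo   = new_erp.InvoiceNo  COLLATE DATABASE_DEFAULT,
        InvoiceDate = new_erp.InvoiceDate,
        Amount      = new_erp.Amount
FROM    NewERP.dbo.SalesInvoice AS new_erp
UNION ALL
SELECT  CustomerNo  = old_erp.CustomerNo COLLATE DATABASE_DEFAULT,
        InvoiceNo   = old_erp.InvoiceNo  COLLATE DATABASE_DEFAULT,
        InvoiceDate = old_erp.InvoiceDate,
        Amount      = old_erp.Amount
FROM    OldERP.dbo.SalesInvoice AS old_erp;
-- Every character column is forced into this database's default collation,
-- so downstream consumers never notice the difference... but any join or
-- filter on those columns can no longer seek the ERP-side indexes.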

This was a terrible idea.

It made sense at the time. But as the months passed and the complaints of performance degradation mounted, the signs continued to point back at these views which now used this collation-conversion mechanism (prior to the latest upgrade, they did not).

The typical work-arounds involved temp tables and/or going straight to the “raw” source. Neither of these is ideal — the latter breaks that abstraction (however thin it was), while the former risks over-burdening tempdb (sometimes referred to as the “communal toilet” of SQL Server). Generally this was acceptable, and often resulted in orders-of-magnitude improvements in performance. But it continued to rack up that technical debt.

One thing I tried was to remove those collation conversions from all columns in the view except the join predicates, because the consumers fed those values straight into the object or ORM layer, where they all became C# strings anyway; at that point it didn’t matter what collation they’d used, or whether they were ANSI or Unicode. But alas, because the core pitfall of these queries — that implicit conversion — was still very present, performance still suffered.

Treating the Root Cause

Here I re-link the two scary articles that warn of the dangers and gotchas of changing your database and server (instance-level) default collations: StackOverflow answer, and blog post.  Given all that, it’s a daunting task.  But if we’re going to get our performance back up to snuff, it’ll probably have to happen at some point.  As the great Mike Rowe says…

it's a dirty job but somebody's gotta do it

And unlike the previous blogger, I will do my very best to follow up here and post about my journey as we undertake this harrowing trek.

The way I see it, there are three major attack vectors.

  1. We can try converting the ERP database to the old SQL collation.

Pros: smaller effort than #2, less integration/regression testing overall.
Cons: unsupported by ERP vendor, downtime for ERP system, high amount of risk.

  2. We can try converting all other DBs in the environment (across all SQL instances) to the new Windows collation to match that of the ERP DB.

Pros: supported by ERP vendor, future-proof, less tech-debt.
Cons: largest effort, heaviest testing burden, high risk.

  3. We could utilize some kind of data-replication to maintain copies of the required data on the other SQL instances in their default (matching) collation.

Pros: support not an issue, lowest effort & testing burden, lowest risk.
Cons: replication maintenance burden & overhead, loss of “real-time” (added data latency), and some tech-debt.

Since I’m the lone DBA, most if not all of the effort falls to me, so I’m quite inclined toward #3. And we were somewhat already going in this direction with the temp-table workarounds, i.e. pulling in the ERP data (usually via a linked-server) to the target app DB & using that temp-table for joins — essentially, that’s “lightweight replication”.
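
The temp-table flavor of that “lightweight replication” looks roughly like this. The linked-server, database, and column names are placeholders, not my real environment.

-- Pull only the ERP columns we need across the linked server, landing them
-- in tempdb in THIS database's (SQL) collation and as plain varchar.
SELECT  CustomerNo   = CONVERT(varchar(20), erp.CustomerNo) COLLATE DATABASE_DEFAULT,
        CustomerName = erp.CustomerName
INTO    #ErpCustomer
FROM    ERPLINK.ERPDB.dbo.Customer AS erp;

CREATE CLUSTERED INDEX cx_ErpCustomer ON #ErpCustomer (CustomerNo);

-- Now the join is local, same type, same collation: index seeks all around.
SELECT  o.OrderID, c.CustomerName
FROM    dbo.AppOrders AS o
JOIN    #ErpCustomer  AS c ON c.CustomerNo = o.CustomerNo;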

The technical debt we’re incurring here is that we’re leaving all of our legacy DBs (and servers) in the older SQL collation.  At some point, likely the far-future, these will be unsupported, or at least obsolete, in the sense that all new applications & DBs will prefer the newer Windows collation.  Perhaps during the next big “hardware refresh” cycle, i.e. when we have to plan and execute a big SQL Server upgrade/migration, we can consider integrating the collation-change into that project.

But wait, you argue, what about a 4th option?

Oh sure, you mean say, “Screw it, not my problem!”…

  4. Force all ERP DB data access up the stack to the application layers, i.e. the apps pull the data into memory and join/merge it there (or relate to & interact with it however the developers want to, in that layer).

But this has several downsides that the business and the development teams would [justifiably] push back on: dev time & effort, the drastic-ness & unprecedented-ness of the change, the fear of not catching every single place & usage of the ERP in the mysterious myriad of apps that it seems nobody can ever quite get a unified visibility handle on (some of which they can’t even build or deploy anymore without diving down some seriously arcane rabbit-holes of ancient tech).  The risk is just too high.

More than that, and as much as I would love to say “ain’t my problem” and pass it off to a larger group of smart people, the fact is that over 50% of dependencies on the ERP DB are from my own BI/reporting queries.  There’s no shortage of reports that need to examine and relate legacy LOB app data with ERP customer & sales data.  And it’s not just to build a paginated SSRS report — where I could, arguably, do what I said above in faux-option 4: pull the data from 2 separate data-sets, then merge it in the report layer.  It’s heavily customized, painstakingly crafted mini-data-warehouses and data-marts that massage and tailor the data for an assortment of different purposes and reporting needs.  Thus, even with this, most of the burden still falls to me.

some guy with glasses posing under a globe like he's carrying the weight of the world
To be clear, this is NOT me; it’s the 4th pic I found on Google image search for “weight of the world” filtered for “noncommercial reuse” images. 😉  Kinda looks like Landry from Friday Night Lights, no?

 

So, to borrow a closing line from one of my favorite podcasts…

May your server lights blink, your database collations be identical, and your cables be cleanly managed.

How is Database Formed?

Data is digital information. A database is a collection of data. And a DBA manages it all.

Borrowing from an ‘old meme’ a bit.  My wife recently said I should “write something about ‘how to do databases’.”  As amusingly odd as her phrasing was, I figured she was right.

well like duh
Obviously. GAWD.

What is it?

I like to start at the beginning.  As Julie Andrews said, it’s a very good place to start.  What is a database?  That’s a pretty good question.  Here’s the prerequisite question: What is data?  Well, as I’ve said before, data is everything.  But that’s a bit of a cop-out, isn’t it?  That’s my career’s bias showing through.

Data is digital information.  Anything that can be quantified, specified, categorized, searched, sorted, produced, consumed, read, written, measured, and stored digitally in some fashion.  Data is the digital currency of the 21st century.  Data is the very reason that most technology exists — to house and transport data among producers and consumers of information.  It’s the evolutionary culmination of the stone tablet, the papyrus scroll, the bound book, the printing press, the newspaper, the library, the vinyl record, the magnetic tape, the compact disc, the pocket organizer, and the telephone.

So then, what is a database?  Simply put, it’s a collection of data.  The simplest analogy, depending on your age, is either a phone book or your cell phone’s contacts list (which is really just a phone book, in digital form).  Of course, with the latter, it’s not so much an analogy as an example — your phone’s contact list IS a database.

Fun side-note, the phone book also makes a decent discussion prop for some DBA topics like index fragmentation.

Expanding on that example.  You can search and sort your contacts by several data points: first name, last name, phone #, email, notes.  Different database systems have various names for these: fields, columns, properties, criteria, values.  The point is, it’s all data.  Or if you want to get pedantic, each one is a datum, and together they are data.

Pedantic, me?  Never.

This is what a database, or DB for short, is all about: storing data in an organized fashion so that it can be sorted, searched, sliced and diced.  Building on that, a database management system is a set of technology tools, processes, and programs that are used to gather, store, manipulate, copy, move, read, maintain, back up, link together, and operate one or many databases.  This DBMS can come in many flavors.  I happen to specialize in one called SQL Server, a Microsoft product/platform of the ‘relational’ flavor — so if you’re following along with the abbreviation game, that’s an RDBMS.

If you’re hungry for more acronyms, the Wiki article on ‘databases’ has a decent breakdown of the types and history behind them.

But Why?

all your data are belong to us
Non-geeks might not understand this, so Google “all your base” if you’re confused.

The more data you have, the more you can do with it.  Why do you think Facebook, Google, Microsoft, and Amazon are such powerful technological forces?  They purposefully, systematically gather as much data as they can from every possible source, and they have become very good at organizing and managing that data to maximize its value.  Amazon product recommendations are a prime (see what I did there?) example — they are generally appropriate and effective because they have “learned” from your past purchases, i.e. your historical data.  This “learning” – Machine Learning, aka Data Science – is the hot new marketing buzzword of recent years, but it all still comes back to data at the core.

This is not a “bad thing” or a “scary thing” as the old media and tin-foil-hat-wearers would have you believe.  Yes, it can be a little disconcerting, and yes, people and companies can abuse data in malicious ways.  But the vast majority of our digital data stewards actually want to do good.  They want to connect you with more people that you may know and become friends with; they want you to watch movies that you’ll really enjoy; they want you to easily navigate to your destination without being stuck in traffic; they even want to help stop global warming!

As a general business rule, we crave data because it helps us make decisions.  Every time a customer buys a product, we want to measure “the 5 W’s”: who, what, when, where, and how (ok, that’s not a ‘W’, but there’s a reason for it).  Notice I didn’t list “why” there — only the customer knows why, and that information, that data, is stored inside their brain.  And we can’t (yet) access that data.  So it’s a guessing game now — we feed the other 5 data-points into our DBMS and eventually, given some time and analysis, we can guess the Why.  And pretty accurately, at that.  Then, we can make a decision to “Market more aggressively to Customer Type X”, or “Have a flash-sale on Product Y”, or “Move on this hot emerging market demographic.”

So what does that make you?

Well, I’m a Database Administrator – a DBA.  Which means I “administrate databases”.

‘Administrate’, less common form of ‘administer’: manage and be responsible for the running of.

Thanks, dictionary.  So in a nutshell, a DBA manages data.  Deceptively simple sounding, no?  I mean, what can data possibly do; it’s not alive, right?  Actually, if you hang around a DBA for any length of time, you’ll commonly hear the phrase “Where does that data live?” or “That set of data lives over here.”  So clearly we anthropomorphize our data.  Most tech professionals do that to whatever technology they work closely with — it’s human nature.  Software “behaves badly”, machines “throw a fit”, etc.

But anyway, why do databases need to be managed?  What can happen to them?

Developers.  Developers happen.  =D

I joke, as you know, dear reader; I love developers.  Users ‘happen’, too — often more catastrophically.  So it’s fair to say that “people happen”.  But besides that, here are some common reasons that databases, and data, need to be managed.

  • Data can be “wrong”.

Data can either be human-generated or machine-generated.  Fingers on a keyboard, or sensors on a circuit board.  You wouldn’t think the latter could possibly ever be “wrong”, but both kinds are subject to error.  It’s just that the level of “wrongness” is subjective and depends on who’s asking and what’s expected of the system as a whole.

  • Data gets lost.

Humans interact with and manipulate data, and humans make mistakes.  Why do you think the Undo button became such a staple of so many computer applications?

  • Data gets corrupted.

Storage media (magnetic disks, silicon chips, etc.) are not perfect — they have a documented level of fault tolerance and failure rate — so we need to ensure that our data is preserved (by moving it to another area that’s not ‘faulty’, usually) past those failures.  Why?  Because our data is essentially “more valuable” than the hardware on which it’s stored.

  • Data needs to be organized.

This is slightly more subjective than the above; how and why we organize data is highly dependent on the overall intent of the systems that will interact with it.  But fundamentally, if there’s not some form of organization, the data is effectively garbage.  If you ripped out every individual page in the phonebook and scattered them all on the floor, it’s no longer an effective tool to find someone’s phone number; it’s just a mess of papers.

  • Data needs to be useful.

If we can’t do something with the data, what’s the point of having it?  The temperature at the North Pole on January 1st 1989 is, by itself, inconsequential.  But a history of temperatures at the same and similar locations, over a long period of time, gives us some great value — we can see trends, look for anomalies, and even predict the future of what those temperatures might be.

  • Databases need to be available.

Similarly, if we can’t access the data, what good is it?  Databases are a technology, and like most technologies, they occasionally break.  Again, most of that comes back to humans, because humans are the ones writing the code that creates the software that houses the data and runs the database, or that interacts with it.  But of course we still have power failures, network losses, disk failures, and even solar flares.  (Ask your favorite superstitious engineer; they’ll have at least one good story about a system outage that could only be blamed on solar flares or gremlins or the full moon.)

  • Databases need to be maintained.

Every DBMS has some kind of assumed ongoing maintenance requirements to keep it “running smoothly”.  Just like your car needs an oil change every 3 to 8 thousand miles, your databases need periodic attention to retain all of those important qualities discussed above.

And the latest big topic, underscored by the GDPR:

  • Data needs to be governed.

This is a big topic for another conversation, but the gist of it is, data is generally “owned” by someone, and deciding who owns what, where it’s allowed to live, and how it’s allowed to be used, constitutes an entire sub-industry of rules, regulations, policies, tools, security practices, and consequences, much of which we’re only just beginning to shape and understand.

TL;DR: What do you actually do?

who is your DBA and what does he do
Umm… stuff? And thangs?

I currently work at a “small enterprise”, a business that has been around for some decades (as opposed to a Silicon Valley start-up who counts their anniversaries in months, like an infatuated teenager), managing their database systems.  Some of that is financial/accounting, some is customer info, some is internal/operational, and all of it is important to at least one person in the business in their daily decision-making efforts.

Thus, I help ensure that the data is always ready, when it’s needed, in whatever form & shape it’s needed in.  I model, massage, correct, enhance, and move it around.  I help developers write faster queries (that’s a fancy word for “questions” that we ask of our data); I aid analysts with understanding and gleaning more value from the data; I maintain the underlying systems that house the databases and ensure that they perform well and get upgraded when necessary; and I work with business drivers (VP’s, CxO’s) to build reporting solutions that leverage the data to enable better, smarter decisions, and ultimately (hopefully!) increase profit.  (This last part is actually crossing into the BI – Business Intelligence – job role, which tends to happen to most small-shop DBAs, because they’re usually in the best position to make that transition.)

If some of that sounds like a blurb from a résumé, it kinda is.  This job has existed since the 80’s.  But it’s always evolving, like the tech industry in general; so just because we’ve been around a while doesn’t mean we’re all old crusty bearded dudes.  (Although we do have some prolific beards among us!)

So there you have it.  Now you can tell your friends and family what a DBA does.  Or at least, hopefully, I’ve helped my own friends & family understand a bit about what I do.

A SQL “Whodunnit” Trigger

Triggers aren’t bad, if used for the right reasons.  Here we look at an “audit-trail” use-case.

Inspired by a brief conversation in the #CodingBlocks community Slack: a short discussion and example of a “who-dunnit” (“who done it”, a colloquialism for a murder-mystery type thing) trigger, to find out which user is doing deletions against a certain table.

the cast of CSI LV
The original, or nothing.

The Background Check

Let’s name our hypothetical database CSI.  In it, we have a table, dbo.Victims, where data seems to be getting deleted at random times.  As we all know, this is impossible — computers never do anything truly randomly, much less RDBMSes.

Insert witty counter-example here.  You know you have one.

So we want to find out who’s doing these deletions.  One DBA says, “Hey I got an idea… Let’s put an after delete trigger on the table!”  Another DBA says “I abhor triggers; let’s log sp_WhoIsActive every 5 seconds to try to catch the suspect ‘in-the-act’.”

Both approaches have their merits, and neither is that uncommon.  However, the latter is much more regularly blogged about, so I’m going to present the former, because it kinda helped remind me of a few things that I hadn’t used in a while.  I’d also argue that the latter is much less of a “guaranteed capture”, since you’re gambling pretty liberally on the fact that the delete transaction will even last that long; it’s statistically more likely that you’ll miss it.

The Setup

Here’s a SQL snippet that shows a basic after delete trigger created on our dbo.Victims table.  Notice the use of the special Deleted table reference — this is a “temporary, memory-resident” table according to the Docs, and it holds all the records that were/are-about-to-be deleted from the target table.
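
The real snippet lives in the linked repo; here’s a minimal sketch of the same shape, assuming dbo.Victims already exists with a VictimID key and a VictimName column (both names are my stand-ins), plus the aud.Evidence logging table we’ll meet in a moment.

CREATE SCHEMA aud;
GO
CREATE TABLE aud.Evidence (
    EvidenceID    int IDENTITY(1,1) PRIMARY KEY,
    DeletedAt     datetime2     NOT NULL DEFAULT SYSDATETIME(),
    DbUser        sysname       NOT NULL,  -- database-context user
    ServerLogin   sysname       NOT NULL,  -- server-context login
    OriginalLogin sysname       NOT NULL,  -- login before any impersonation
    VictimID      int           NOT NULL,
    VictimName    nvarchar(100) NULL
);
GO
CREATE TRIGGER dbo.trVictims_AfterDelete
ON dbo.Victims
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- The Deleted pseudo-table holds the rows that were just removed.
    INSERT INTO aud.Evidence (DbUser, ServerLogin, OriginalLogin, VictimID, VictimName)
    SELECT USER_NAME(), SUSER_SNAME(), ORIGINAL_LOGIN(), d.VictimID, d.VictimName
    FROM   Deleted AS d;
END;
GO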

I feel like it used to be called a “temporal table”, but that now refers to a new feature in 2016, where SQL keeps a hidden history-tracking copy of your table that you can reference like a time-machine; which, incidentally, almost* negates the need for such things as these triggers we’re talking about, but that’s another topic for another time.

*(The ‘almost’ is because temporal tables don’t tell you “WHO”, which is our primary motivator here.)

The interesting bits are how we identify our suspect, our ‘killer’ if you will.  See, we not only want to know who they are in the database context, we also (and likely, more importantly) want to know who they are at the server level context.  And just in case they’re impersonating another login, we want to check that too.

So we actually have a lot of options here.  There’s CURRENT_USER or USER_NAME(), for the DB context user.  Then we have SUSER_SNAME(), SUSER_NAME(), SYSTEM_USER, and ORIGINAL_LOGIN() for the server context.  If you’re curious, you could also get things like @@SPID (server session id), SUSER_ID() (server login id), and SESSION_USER (database session user).

ORIGINAL_LOGIN() may be the most potentially interesting, especially if we want to write our trigger with elevated (impersonated) permissions to be able to write to the logging table that we’ve set up to capture its detective-work.  I did not need it for this example, but it’s worth keeping in mind.

why dont you take a seat over there
We’ve been watching you…

The Sting

So we’ve got our evidence table, we’ve got our detective trigger, now we just need a suspect.  Thankfully we can test it out first, to make sure our operation will succeed when the real perp comes along.  We can do this, of course, by impersonation.  Or by using different SSMS query-windows with different logins — your choice.

Our faux-suspect’s login name is DummySuspect.  We map him to the db_datawriter and db_datareader roles in our CSI database — we know the real perp at least has write permission on the table dbo.Victims, otherwise he/she wouldn’t be able to delete those poor victim rows!  And we’re probably the db_owner, which is fine.  Let’s call our own login SergeantX.

Now we can pretend to be DummySuspect and execute a DELETE against CSI.dbo.Victims, and make sure it writes to our auditing table, which we called aud.Evidence.
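
The impersonation flavor of that test is roughly this (again, VictimID is my stand-in for the table’s key column):

-- Put on the DummySuspect disguise, do the dirty deed, then drop the disguise.
EXECUTE AS LOGIN = 'DummySuspect';
    DELETE FROM CSI.dbo.Victims
    WHERE  VictimID = 1;
REVERT;

-- Did our detective catch it?
SELECT * FROM CSI.aud.Evidence ORDER BY DeletedAt DESC;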

Yes, in practice, we’d probably want to put our Evidence table in a separate database, to really ensure those pesky Suspects can’t update it or delete from it, i.e. “cover their tracks” — here, I’ve settled for simply using a different schema, to keep the example workable.  Otherwise we’d have to deal with cross-DB permissions and such, which goes beyond the scope of one little blog post.

Ready?  Let’s try it!

The Proof

Go on over to the GitHub repo and check out the code.  There are 3 easy steps – ‘Step 1’, create the objects, including a new DB to house them called CSI.  You’ll see the trigger in there as well.  Then you can try ‘Step 2’, where I impersonate DummySuspect and delete a row from Victims, and then check the Evidence log when done.  And finally, ‘Step 3’ is a similar test, but assumes that you’ve actually connected that SSMS window/tab/query AS DummySuspect instead of impersonating him (or her!).  After you’ve done that, check out aud.Evidence again to make sure it logged the 2nd delete.

And there you have it.  A simple example of how to write and test an after delete trigger that writes the action info to a separate auditing table for investigation.

Hope you enjoyed!  Leave a comment here or on GitHub; I welcome all feedback.

One more thing…

Apparently I’ve been fork‘d!  Someone on GitHub liked my take on the Nested Set Model so much that they decided to pull it into their own library and have a play with it.  Yay!!  😀   Whoever you are, thank you and good luck building with it.  Enjoy!

someone forked my sql nested set model on github
Mr. Murray, thanks and have fun!

T-SQL Tuesday #99: Counting Rows the Less-Hard-Way

We can get our row count, and min & max date values, without ever touching the actual source table!

This month’s invite courtesy of Aaron Bertrand (B | T), whose “bad habits” blog series still inspires many an impassioned debate or engaging discussion on a regular basis among DBAs & Developers alike.

And yes, I’m taking the easier of the two “dealer’s choice” choices — the SQL focused one.  (I’m not big on sharing/blogging personal stuff, at least not here; I may one day start another blog for that, or perhaps just occasionally post more #off-topic stuff, but for now you’ll have to be content with my stories of vehicle troubles and the occasional movie-geekery).

So, without further ado…

By the way, what is ‘ado’ and why should there be no further of it?

art-vandelay-importer-exporter
Accidentally apropos on many levels…

Counting Rows in Really Big Tables

Previously touched on here, tables of unusual size (TOUSes) can be tricky.  You don’t want to lock them up for a long period of time, but you often need to gather information about them (such as row count, size, range of values) to perform some kind of operational maintenance with/on them.  In particular, Aaron’s post on “counting rows the hard way” inspired me to look into this a bit more and try to come up with a clever-ish way of finding out some basic “shape of data” info without actually querying (scanning) the table itself.

To start with, it’s actually really simple to get the total row-count from a few system catalog views — Aaron’s already shown you that, so I won’t repeat.  My interest is more in questions like “How many rows match a where-clause?” or “What are the min & max values for thatColumn?”

For this post, I’ll be focusing on a particular kind of table — the “history” or “transaction” table.  The idea here is that you have a record of “every time some event happens in/to some entity”.  A very common example is audit-trail tables, which I’ve been dealing with a lot lately.  Another common example is a “transaction history” table, such as, in our new favorite MSSQL demo database WideWorldImporters, the table Warehouse.StockItemTransactions.  It’s the 2nd largest table in the db at 260-some-thousand rows.  {The largest is a multi-million-row beast that is actually the system-versioned aka temporal table behind a “normal” table; I might build a phase-2 example around this, but not today.}  So, while our queries won’t be super slow, we’ll get enough of an idea of what’s bad & good from measuring our IO stats (with SET STATISTICS IO ON).

TL;DR: The demo script is available here; the headers below correspond to the comment-lines of the same name, but I’ve left enough commentary in the SQL itself to keep the average reader on-track, so feel free to check it out ahead of time.  But do keep reading at some point!  :o)

A. Gathering Some Intel

First up, we have good ol’ sys.sp_spaceused.  This gives you some sizing info about the table, including its row count.  There’s a “disclaimer” circulating out there that it’s not “up to the millisecond accurate”, i.e. it might not have the most current row count if someone else is in the middle of an insert operation or whatnot.  But for all intents & purposes, you can consider it truth.
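
For reference, that’s just:

USE WideWorldImporters;
GO
EXEC sys.sp_spaceused N'Warehouse.StockItemTransactions';
-- Returns name, rows, reserved, data, index_size, and unused.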

Then you have the “hard ways” that people typically use — and that Aaron, again, covered just fine in his post on the subject, so I won’t spend any more time there.

But what if I want to count rows matching a where clause (a predicate)?  And in dealing with a typical history/transaction table, the predicate is almost always “between such and such dates”.  (Of course we won’t literally use the between operator, as we have been chastised severely; we know best to use >= and < instead!)  Also, I want to know the MIN and MAX of said dates in the table.  Lord knows we’re gonna be doing some table-scanning.
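
Here’s the kind of thing I mean, against the demo table (the date range is arbitrary):

SET STATISTICS IO ON;

-- "How many transactions happened in this window?"
SELECT COUNT(*) AS RowsInRange
FROM   Warehouse.StockItemTransactions
WHERE  TransactionOccurredWhen >= '20150101'
  AND  TransactionOccurredWhen <  '20160101';

-- "What are the oldest and newest transactions on record?"
SELECT MIN(TransactionOccurredWhen) AS OldestTxn,
       MAX(TransactionOccurredWhen) AS NewestTxn
FROM   Warehouse.StockItemTransactions;
-- With no index on TransactionOccurredWhen, both of these scan the table.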

B. Ok, Let’s Try an Index

In their benevolent wisdom, the SQL deities decided not to give us an index on WideWorldImporters.Warehouse.StockItemTransactions.TransactionOccurredWhen.

BTW, how’s that for a verbose column name?  What, TransactionDate not good enough?  I suppose it is datetime2 after all, but still…

So we create an index on it, to see if that helps our poor “count where dates” query.  And behold, it does!  We’ve cut our # of logical reads down by about 90% (from 1900 to 200, if you’re following along in the script).  That’s fantastic, but… we can do better.  Because if the table is, say, 500 million rows instead of 260k, that’s about 400,000 logical reads, which.. could definitely suck.
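
The index itself, for the record (the name is mine):

CREATE NONCLUSTERED INDEX IX_StockItemTransactions_TransactionOccurredWhen
    ON Warehouse.StockItemTransactions (TransactionOccurredWhen);
-- Re-run the date-range count afterwards and compare the logical reads.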

C. The Better Way

Again, the script has an ode to Aaron’s query on sys.partitions/tables to get the row-count from the meta-data.  Then the real fun begins.

There’s a system DMV (or more precisely a ‘DMF’ – dynamic management function) called sys.dm_db_stats_histogram, which takes the table’s object_id and the statistics object’s stats_id (for index statistics, that’s the same as the index_id) as arguments.  It gives you, obviously enough, the statistics histogram of the statistics object corresponding to that index.  We want to store its output in a temp-table (or even a real table — go nuts!) so we can query it some more.

--For example, if our new index is index_id 7:
SELECT * FROM sys.dm_db_stats_histogram(OBJECT_ID('Warehouse.StockItemTransactions'), 7);

So we create our #StatsHist table (“hist” being an abbreviation for “histogram”, not “history”, though in retrospect that’s probably not worth the possible confusion), and we populate it with the meta-data from Warehouse.StockItemTransactions and its new index that we just created (on TransactionOccurredWhen).  Poof!  We have an easy way of showing min/max values in that column!  Well… almost.  We have to convert the variant datatype to an understandable & aggregate-able (probably a made-up word.. aggregable? aggregatable?) type.  So we add a column range_hk_proper of type datetime2 and populate it with the converted values of range_high_key from the stats-output.
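
Condensed, that flow looks something like this; the index name matches the one created above, and the full version with more commentary is in the linked script.

-- Land the histogram for our new index's statistics in a temp table.
SELECT  OBJECT_SCHEMA_NAME(s.object_id) AS SchemaName,
        OBJECT_NAME(s.object_id)        AS TableName,
        h.step_number,
        h.range_high_key,               -- sql_variant
        h.range_rows,
        h.equal_rows
INTO    #StatsHist
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_histogram(s.object_id, s.stats_id) AS h
WHERE   s.object_id = OBJECT_ID('Warehouse.StockItemTransactions')
  AND   s.name      = 'IX_StockItemTransactions_TransactionOccurredWhen';
GO
-- Give the variant high-key a typed twin that we can aggregate.
ALTER TABLE #StatsHist ADD range_hk_proper datetime2 NULL;
GO
UPDATE #StatsHist SET range_hk_proper = CONVERT(datetime2, range_high_key);
GO
-- Min, max, and an approximate row count, without touching the base table.
SELECT  MIN(range_hk_proper)         AS MinDate,
        MAX(range_hk_proper)         AS MaxDate,
        SUM(range_rows + equal_rows) AS ApproxRowCount
FROM    #StatsHist;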

There!  Now we’re cookin’ with gas.  Our min/max/count query, and our “count where date-range” query, run in mere milliseconds, without ever touching the actual source table.  So we don’t lock it up or block anybody else from writing to it, even in the most pessimistic isolation levels.

Except when you created that index we needed on the date column.

Yes, I know.  What we’re hoping is that the tables we deal with in the “real world” already have such an index that we can take advantage of.  If not, well, that’s what maintenance windows are for.  And you better believe you’re gonna need that index sooner or later.

cooking-with-mustard-gas
Been a while since I used a Family Guy meme….

Where To Next?

Conceivably, this whole thing could be turned into a stored-proc so you could run it “on demand” for any table that had a date or datetime column which you wanted to get such information about.  It’d have to do a lot of error-checking, of course — it wouldn’t work if you don’t have such a column, or if there’s no index on it, and probably a myriad of other ‘gotchas’ that I’m not thinking of at the moment.  But I did try to lay the groundwork for improvement. #StatsHist stores schema & table name too, so if you felt like turning it into a mini-data-warehouse holding a BUNCH of stat-histograms for a whole mess of tables, you could definitely do that.  And then you could run some basic analytics on it — min/max/avg, counts by year/month/day, etc.

Sounds like fun, no?  ;o)

The Passing of the Torch

Did I mention documentation?

It’s always hard to say goodbye to a colleague, especially someone who’s so central and ingrained in the company lore and holds so much of the “tribal knowledge”.  Hell, I was that guy just a couple years ago.

calvin-brain-dump
Can you just leave your whole brain right there on the desk? Thanks.

So now I’ve seen a couple such old-hats move on from my current team, and seeing both sides of the proverbial torch-passing is interesting.  There’s definitely some very common, very important things that we should always do.

Documentation, documentation, and more documentation.

Indeed.  Also, finishing critical tasks, handing off in-flight projects, re-assigning tickets, talking to managers, prepping teammates for the work overflow, and cleaning out that huge buildup of clutter that you’ve collected over the years.  Virtual or physical… often both!

Unsurprisingly, where we all seem to differ widely is the human aspects.  Breaking the news, saying goodbyes, doing those last-minute get-togethers and send-offs.  What do those last few weeks and days look like?  For some, it’s just business-as-usual up to the last minute — they’re literally so busy they have little other choice.  That’s how it was with the helpdesk manager we parted with last year.  I used some of the time to put together documentation and thank-you letters, which I hope ended up being helpful.  Database diagrams were printed and taped.  Wikis were written.

But the main thing is to make sure you exchange contact info and stay in touch.  It gives the team a sense of comfort, knowing they can reach back out when those random questions that nobody’s thought about for several months resurface.

keep in touch and stay awesome
KITSA!

I’ve learned a lot from those folks that took the time to pass on their knowledge and made the effort to keep in contact.  And I appreciate them for that!  Today I’ll thank one of my exiting managers; she knows who she is.  She taught me a lot about our internal application stacks, integration and interop, company culture, tribal knowledge, and not standing for anybody’s BS, including my own.  Good luck with consulting, stay in touch, and kick some butt!

That’s all for this week.  I promise I’ll work on that “database collation problems” post soon…  :o)

T-SQL Tuesday #98: Orphaned Users Redux

It’s that time again!  The first #Tsql2sday of 2018.  Thanks to the Blobeater for this month’s invite: “your technical challenges conquered”.

Because I’m already ridiculously late, I have a short one.  This is about orphaned users — you know, when you restore a database and its users aren’t mapped to the server logins that they should be or used to be.

orphan-movie-poster
Not that kind of orphan… pretty decent movie tho!

The typical solution is sp_change_users_login with the auto_fix or update_one option.  But guess what?  Yep, that’s deprecated.  By the way, did you know that it also has a report option?  Apparently that’s got some bugs…ish?  Try it sometime and see — compare the output of sys.sp_helpuser where the ‘LoginName’ column is null, with sp_change_users_login 'report'.  Preferably on a DB you’ve restored from another server.  😉

So what’s the correct solution?  ALTER USER [theUser] WITH LOGIN = [theLogin].  Simple, no?  Let’s get more general.  Could we come up with a half-decent way to apply this kind of fix dynamically?  Well sure, the nice folks at DBATools have already solved that problem.  And that’s great, really.  But just in case that doesn’t work… ^_^

One of the many things I love about SQL Prompt is the right-click option to “Script as INSERT” (from the results grid).  This is a quick & easy way to build a temp-table for the results of an exec statement so you can do the ol’ insert #tmp exec sys.sp_blah!  Then we can query the list of DB users for the null LoginNames and write a little set of queries to fix them!  Sound good?
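
For a rough idea of the end result, here’s a sketch that skips the sp_helpuser-into-temp-table step and goes straight at the catalog views to generate the fix statements. Review before running; it assumes a same-named login exists for each orphan.

-- Find database users whose SID has no matching server login ("orphans")
-- and generate an ALTER USER ... WITH LOGIN statement for each one.
SELECT  dp.name AS OrphanedUser,
        'ALTER USER ' + QUOTENAME(dp.name)
      + ' WITH LOGIN = ' + QUOTENAME(dp.name) + ';' AS FixStatement
FROM    sys.database_principals AS dp
LEFT JOIN sys.server_principals AS sp
        ON sp.sid = dp.sid
WHERE   dp.type IN ('S', 'U')           -- SQL and Windows users
  AND   dp.authentication_type <> 0     -- skip users WITHOUT LOGIN
  AND   dp.principal_id > 4             -- skip dbo, guest, and the system users
  AND   sp.sid IS NULL;
-- Copy the FixStatement column out of the grid and run the ones that make sense.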

UPDATE: Behold the code!

Happy Tuesday!

PS: Coincidentally, today’s (Thursday) SQL Server Central newsletter featured a very similar post by a gentleman over at Madeira Data.  Go check it out, it’s another great solution to this problem!  And while you’re at it, get the SQL Server Radio podcast (created by a couple guys from the same company) – it’s a terrific addition to your iTunes library.

Adventures in Database Renaming

Renaming a Database looks easy, but it’s really not. Unless nobody’s touching it. Let’s find out why!

Databases sometimes need to get a new name.  Or you need to swap names.  It can be a royal PITA if the DB is a mission-critical always-being-used everybody-and-their-mother-touches-it-all-the-time thing.

Errors like “The database must be in a state in which a CHECKPOINT can happen” or “Cannot obtain exclusive access to the database” or “This transaction was chosen as the deadlock victim” abound.  Sometimes it’s better to fall back on our old friends, detach & attach.

Let’s paint a picture!

lets paint a happy little tree
And if you tell ANYONE, that that tree is there, I will come to your house, and I will CUT YOU…

Our ERP database has been chosen by the IT gods to get moved to the shiny new flash storage array, off the old spinning-rust SAN.  This is fantastic news for the business users.  But lo, the executives warn us, “You must do this with no downtime!” (said in my best Brent Ozar PHB-imitation voice).  Of course when we tell them that’s impossible, they say, “OK, you must do this with minimal downtime.”  That’s mo’ betta’.

So what are our typical options for doing a database migration?  Or, more specifically, a data file migration.  See, we’re not moving to a new server, and we’re not moving a bunch of databases together; we’re just moving this one ERP database.  And we’re keeping it on the same SQL instance, we’re just swapping the storage underneath.  Now yes, in a perfect world, both storage arrays (both SANs) would inter-operate and they’d be able to take a snapshot, pass it over, spin it up, swap it out, and our SQL instance would be none-the-wiser.  But alas.

Other options?  Log-shipping, sure; it’s been around forever, it’s very mature, simple, and relatively easy to operate, even if it lacks a bit in terms of automation capability.  But let’s face it, the DBA is going to be sitting in front of his/her screen babysitting this operation no matter what, so that’s no biggie.  Mirroring?  Meh.  Expensive Edition’s still got all the hot action there, and we’re not that fancy.  Availability Groups?  Sure, if we already had them set up & running.  But we don’t.  Sad-trombone.

Back to Basics

So we circle back to the classics, the “backup & restore” and “detach-move-attach” routines.  But there’s a hidden gem of a hybrid here: we can back up, restore as a new name, then switch the names!  The catch is, we need to put the original DB in read_only mode while we do this, to prevent new/updated data from coming in.  Now, when we ask the Dev-managers if their overlaying apps/websites can handle the DB being in read-only mode for an hour, they’re much more inclined to say OK to that, as opposed to the DB being actually ‘down down’.

clinton-depends-on-definition-of-down
are we talking about “down” time or “downtime”?

Now, there’s a way to do this in T-SQL, and there’s a way to do this in PowerShell.  Both are fine!  I’m not arguing for one or the other — pick a flavor and go with it.  This happened to be a day where I was feeling SQL-ish, not PoSh, hence the flavor I chose.

The Walkthru

In my example we have our “slow” storage drives ‘D’ and ‘L’ (for Data and Logs, respectively).  We’re moving our blessed database to “fast” flash storage drives ‘E’ and ‘M’ (you know, because they come after ‘D’ and ‘L’).  The team managing the dependent apps have given their blessing to have a ‘read-only mode’ maintenance window for the database, as long as the actual ‘downtime’ is mere seconds.

Once the DB is in READ_ONLY, we do a backup, preferably to the flash storage so that it’s faster (and the restore, from flash to flash, will be super-fast!).  We then restore that backup to a new DB on the flash storage drives, and, as expected, it’s speedy-gonzales.
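
In T-SQL terms, that phase is roughly this; the database name, logical file names, and paths are placeholders, and only the drive letters come from the story above.

-- 1. Freeze writes on the source DB.
ALTER DATABASE ERP SET READ_ONLY WITH ROLLBACK IMMEDIATE;

-- 2. Back it up, preferably straight to the fast storage.
BACKUP DATABASE ERP
    TO DISK = N'E:\Backup\ERP_move.bak'
    WITH COPY_ONLY, COMPRESSION, STATS = 10;

-- 3. Restore it under a temporary name, with its files on the new drives.
RESTORE DATABASE ERP_new
    FROM DISK = N'E:\Backup\ERP_move.bak'
    WITH MOVE N'ERP_Data' TO N'E:\Data\ERP.mdf',
         MOVE N'ERP_Log'  TO N'M:\Logs\ERP_log.ldf',
         STATS = 10;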

he makes speedy gonzales look like regular gonzales
Flash arrays are awesome, btw. =D

Once our two copies of the database are up, we’re ready to take that minuscule down-time.  We kick the users out of it (with SINGLE_USER mode), detach both DBs, and re-attach them (create database .. on (file), (file) .. for attach) with different names!  Presto-change-oh, the ‘original’ DB is now resident on flash storage, and the old slow files are now a DB called “whatever the heck you want it to be, because it doesn’t matter anymore!”.
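
And the name swap itself, sketched out with the same placeholder file names and paths:

-- Kick everyone out and detach both copies.
ALTER DATABASE ERP     SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE ERP_new SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
EXEC sys.sp_detach_db @dbname = N'ERP';
EXEC sys.sp_detach_db @dbname = N'ERP_new';

-- Re-attach with the names swapped: the flash copy becomes "ERP",
-- and the old spinning-rust copy gets parked under a throwaway name.
CREATE DATABASE ERP
    ON (FILENAME = N'E:\Data\ERP.mdf'),
       (FILENAME = N'M:\Logs\ERP_log.ldf')
    FOR ATTACH;

CREATE DATABASE ERP_old_slow
    ON (FILENAME = N'D:\Data\ERP.mdf'),
       (FILENAME = N'L:\Logs\ERP_log.ldf')
    FOR ATTACH;

-- The restored copy came back READ_ONLY (the backup was taken in that state)
-- and we left it SINGLE_USER, so open it back up.
ALTER DATABASE ERP SET READ_WRITE;
ALTER DATABASE ERP SET MULTI_USER;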

Disclaimer:  I in no way support nor condone the naming of databases with spaces or special characters.  If you actually name a database something like that, no matter how temporarily, I will find you, and throw sporks at your keyboard.

Assuming you didn’t run into any crazy locking problems or sticky-sessions that denied your attempt to force SINGLE_USER mode or detach the DBs, you should be good to go!  I even considered building all this into an Agent Job, but… I’m worried it would get tripped up and leave the DBs in a half-arsed state that wouldn’t make our Dev-managers very happy.  But hey, that’s what we have test environments for, right?!?

RIGHT.

Here’s the code!  Enjoy.  =)