T-SQL Tuesday #104: Code You’d Hate to Live Without

It’s that time of the month again!  Bert’s fantastic invitation complete with YouTube vid is tempting indeed.  There are so many wonderful community scripts and tools for SQL Server DBAs.  But, in an interesting twist, he specifically asks us about something self-written or self-created.  And I’ll admit, I sometimes have trouble being exceptionally ‘proud’ of my home-brewed stuff.  Sure, I’ve blogged and GitHub’d, and hopefully those have helped someone along the way.  Most of the time, though, those are specifically suited to a given use-case.

The more difficult code to share, I think, is the stuff that we keep in our “daily-grind” “Get-Stuff-Done” folders.  Scripts that we’ve tweaked and commented sporadically throughout the years, finding this edge-case or that tweak for a given scenario, most of which ends up being pretty environment-specific.  And that’s where we could use a little encouragement, I think — another DBA saying “Hey, it’s not just me!”, or “I knew there was another guy/gal out there who always has to do this kind of thing!”.  Thus, here I will attempt to show off something along those lines and see what you kind folks think.

Where Does it Hurt?

No, I don’t mean the famous wait-stats queries by Paul Randal (those are awesome).  I mean, what are the top few areas of your SQL environment where you’re always questioning something, or being asked to find some answer that’s not immediately obvious?  For me, there are 3.

Replication

Transactional replication is, in a word, ‘brittle’.  It works well when it’s working, and usually you don’t have to think about it.  But when you do need to check on it, or change anything about it (god forbid), or troubleshoot it (more like shoot it, amirite?), it feels a bit like trying to mess with a half-played Jenga stack.  Meaning, you might be just fine, but you might send the whole thing crashing down and have to rebuild it all.

peanut-brittle-from-the-joy-of-baking
As brittle as this stuff. But not nearly as delicious.

I won’t go into the whole troubleshooting aspect here, that’s too much scope.  But there’s a question that can often come up, especially from Devs or Biz-Analysts, and that is: “Hey, is TableX being replicated?”  And rather than subject my poor eyeballs to the replication properties GUI, I just run a little query, which reads from the distribution database and the actual database that’s being published (the ‘publisher’ db), and tells me a list of tables & columns that are being replicated.

Here it is.  Be sure to replace [dbName] with your actual published DB name.  Like StackOverflow or WideWorldImporters, or AdventureWorks <shudder>.
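In sketch form, it looks something like this, using the standard transactional-replication system tables on the publisher side (the distribution-DB half is left out for brevity):

USE [dbName];

SELECT
    PublicationName = pub.name
    , ArticleName = art.name        -- generally the table name
    , ColumnName = col.name
FROM dbo.syspublications AS pub
JOIN dbo.sysarticles AS art
    ON art.pubid = pub.pubid
JOIN dbo.sysarticlecolumns AS ac
    ON ac.artid = art.artid
JOIN sys.columns AS col
    ON col.object_id = art.objid
    AND col.column_id = ac.colid
ORDER BY art.name, col.column_id;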

Report Subscriptions (SSRS)

Another question I’m often asked is, “Hey, did ReportX run?”  What they really mean is, “Did ReportX‘s email subscription get sent to BigWigUserY?”  We have an unholy amount of SSRS reports with email subscriptions.  And because I don’t care to bloat my inbox by BCC’ing myself with every single one, I rely on the users to speak up when they don’t receive something they’re expecting.

“This is a terrible idea,” you say.  “Never trust the users!”

Yes, well, such is life.  If you have a better, lazier solution, I’m all ears.

So here’s a little script I wrote to query the ReportServer database (which MS will tell you is “officially unsupported”, snore).  If I know some keyword from the report title, or the supposed recipient’s email address, this will tell me if it ran successfully or not — LastRun, LastStatus.  A couple other useful bits: where it lives, ReportPath, and what its SQL Agent Job’s Name is, JobName.
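Something along these lines (a sketch; since the schema is unsupported, verify the table and column names against your version of ReportServer):

USE ReportServer;

SELECT
    ReportName = c.Name
    , ReportPath = c.Path
    , Recipients = s.Description      -- for email subscriptions, this typically holds "Send e-mail to user@..."
    , LastRun = s.LastRunTime
    , LastStatus = s.LastStatus
    , JobName = CAST(rs.ScheduleID AS varchar(40))   -- the SQL Agent job is literally named this GUID
FROM dbo.Subscriptions AS s
JOIN dbo.[Catalog] AS c
    ON c.ItemID = s.Report_OID
JOIN dbo.ReportSchedule AS rs
    ON rs.SubscriptionID = s.SubscriptionID
WHERE c.Name LIKE N'%SomeKeyword%'                   -- placeholder filters, obviously
    OR s.Description LIKE N'%bigwig@company.com%';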

That last bit is peculiar.  The JobName looks like a GUID (because it is, because SSRS just loves GUIDs), but it’s also the actual name of the agent job, which you can use to re-run said job — via exec msdb.dbo.sp_start_job — if the failure wasn’t systemic.  As I often do.
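E.g. (the GUID here is just a placeholder; use the JobName value from the query above):

EXEC msdb.dbo.sp_start_job @job_name = N'3FAE4B2C-0000-0000-0000-000000000000';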

Disk Space

Last but not least, everybody’s favorite topic to try and forget about or pawn-off to the SysAdmins.  “How much space are those databases / data files / log files eating up?”  Well, mister suddenly-cares-about-disk-space-but-is-OK-with-storing-all-domain-users’-iTunes-music-libraries-on-the-central-fileshare-along-with-their-documents-because-that’s-what-we’ve-always-done.  {True story.}  Let me tell you.

keep calm and release the bitter
It’s healthy, I swear!

This script has a lot of comments and commented-out lines because I will tweak it depending on what I need to see.  Sometimes it’s “show me the DBs where the logical filename doesn’t follow my preferred pattern” (the DB name with ‘_data’ or ‘_log’ after it); or perhaps “Only show me files over 5GB with a lot of free space” when I’m itching to shrink something.
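Here’s a trimmed-down sketch of the core of it, run per-database (the commented-out filters are examples of the tweaks mentioned above):

SELECT
    DbName = DB_NAME()
    , LogicalName = df.name
    , FileType = df.type_desc        -- ROWS (data) or LOG
    , PhysicalName = df.physical_name
    , SizeMB = CAST(df.size / 128.0 AS decimal(12,2))   -- size is in 8KB pages
    , UsedMB = CAST(FILEPROPERTY(df.name, 'SpaceUsed') / 128.0 AS decimal(12,2))
    , FreeMB = CAST((df.size - FILEPROPERTY(df.name, 'SpaceUsed')) / 128.0 AS decimal(12,2))
FROM sys.database_files AS df
--WHERE df.name NOT IN (DB_NAME() + N'_data', DB_NAME() + N'_log')       -- logical names that don't follow my pattern
--WHERE df.size > 5 * 1024 * 128                                         -- only files over 5GB...
--    AND (df.size - FILEPROPERTY(df.name, 'SpaceUsed')) > 1024 * 128    -- ...with over 1GB free
ORDER BY df.size DESC;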

“Never shrink your files!  (In production!)”

Yes, thank you, knee-jerk reactionary.  I’m dealing with servers in lower environments, usually, with this one.  😉

What do you think?

I’d love to hear your feedback in the comments!  Happy Tuesday!!  =)

Credit where credit is due.

I did not magically come up with these all by myself.  They’re pieced together from many a StackOverflow answer and/or other community blog post or contribution that I’ve since forgotten.  I usually store a link to the source in these kind of things, when it’s true copy-pasta, but in these cases, I added enough of my own tweaks & style that I no longer tracked the original linkage.  If you see anything that looks familiar, please do tell me where to give kudos and shout-outs!  😀

The Horror of Conflicting Database Collations

It’s not even close to Halloween, but I promised I would get to this someday, and that someday is now. Strap in, grab the popcorn, and turn up the volume.

Oh wait, this is just a textual medium. Forget the volume bit.

If you’re not sure what this collation thing is, you should go back and read my teaser post, and as always there’s the docs. The one-liner recap on why it matters and how it became a target of my ire is this: legacy DBs use the old default SQL latin1 collation, but an upgraded ERP system’s DB now uses the new default Windows latin1 collation; these DBs all talk to each other, including linked-server JOINs, and performance of those queries has gone to shit.

Pardon the French. “Gone to hell in a hand-basket.”

where exactly are we going in this handbasket?
It can’t be any hotter than a California summer!

So why did this happen? Let’s try to find out. But first, let’s get specific about how all this wailing and gnashing of teeth actually manifests in real technical terms.

The Problem

Essentially what happens here is an implicit conversion problem. There are several blog posts from our distinguished community leaders on this topic and its woes. It causes heavy CPU load as the SQL engine tries desperately to match values of different data types. Even though both columns may be, say, nvarchar(20), the fact that one uses collation SQL_Latin1_General_CP1_CI_AS and the other uses Latin1_General_100_CI_AS, makes them somewhat strangers — they might as well be an INT and a varchar!
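A minimal repro, if you want to see it for yourself (table variables, so it cleans up after itself):

DECLARE @erp TABLE (CustomerNo nvarchar(20) COLLATE Latin1_General_100_CI_AS);
DECLARE @app TABLE (CustomerNo nvarchar(20) COLLATE SQL_Latin1_General_CP1_CI_AS);

-- This join won't even compile:
-- Msg 468: Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS"
-- and "Latin1_General_100_CI_AS" in the equal to operation.
--SELECT * FROM @erp AS e JOIN @app AS a ON a.CustomerNo = e.CustomerNo;

-- This compiles, but on real tables that COLLATE cast can defeat index seeks on the converted side:
SELECT *
FROM @erp AS e
JOIN @app AS a
    ON a.CustomerNo = e.CustomerNo COLLATE SQL_Latin1_General_CP1_CI_AS;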

Now again, this is my example. We have dozens of critical application queries using their own little sandbox-y databases, joining to the central ERP system DB to fetch Customer or Sales data. This is already a recipe for sadness.

“Use a middle tier or service layer, dammit!” you’d say.

“You’re preaching to the choir,” I’d reply.

Hell, you’re preaching to the preacher, at that point. But it’s not that simple, as I’ll touch on later.

There’s a subtle difference here, vs. those many community blog posts, which I’ll repeat.  The columns are of the same type.  Just different collations.

And when the collation on the join predicates is different, bad things happen. Let’s take CustomerNumber for example. On the ERP side, it’s a nvarchar(20) collate Latin1_General_100_CI_AS. On the internal & web apps side, it’s a varchar(20) collate SQL_Latin1_General_CP1_CI_AS. As you might imagine, this is a prime field for joining because it’s the main customer identifier throughout all the systems.

Let’s be clear here. This is a numeric value only. Did it need to support Unicode? Absolutely not. Should it have been an int or bigint? Probably. But did the ERP designers choose to make it a Unicode string anyway? Yep.

Premature optimization may be a root of evil in software development, but inappropriate data typing is at least as evil in database development.

Anyway. The point of this post is not to rant and whine about the design of that particular system. I’ll save that for another day. That won’t stop me from complaining about the improper ways it’s used.

this is why we can't have nice things
Exactly.

As stated above, all these queries with all these JOINs on columns with mis-matched collation, lead to very sad pandas: RBAR operations (row-by-agonizing-row, unable to use index-seeks), and high CPU.  Under duress, my 32 core servers ground to a halt with blocked sessions, ASYNC_NETWORK_IO waits, and 99% CPU utilization metrics.  Needless to say, these were not good days.

Side-note: I really enjoyed Bert’s post because it helped put things into very easy-to-understand terms, and is targeted at a wider audience than just the DBA.  Read it, and watch the video too!

Attempt to Treat the Symptoms

Because the ERP system has been upgraded a couple times in the last decade, the team came up with a [very thin] abstraction layer manifested as a series of “integration views” that UNION similar core entities from the older and newer incarnations of the database. Like Sales records. These have permeated many many areas of the ecosystem, to the point that almost all apps use them instead of the “raw” source tables. Which sounds great, right? Riiiight.

Until you throw in that monkey wrench of conflicting collations. Remember, Devs are lazy (in a good way). Their apps and queries need to not care about such a low-level detail as database collation. So to support that “not caring”, we set up these integration views to apply the older default collation (the one that matches everything else in the environment) to the output columns. That way, no extra work is required to consume them in the same way they’ve been consumed for the last 10+ years.

Basically, we can add the COLLATE DATABASE_DEFAULT clause after each column declaration of the view, like so (in the form “alias = sourceColumn”): CustomerNo = erp.CustomerNo COLLATE DATABASE_DEFAULT.
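A stripped-down sketch of one of those integration views, with made-up names:

CREATE VIEW dbo.Customer_Integrated
AS
SELECT CustomerNo = cOld.CustomerNo COLLATE DATABASE_DEFAULT
    , CustomerName = cOld.CustomerName COLLATE DATABASE_DEFAULT
FROM OldERP.dbo.Customer AS cOld
UNION ALL
SELECT CustomerNo = cNew.CustomerNo COLLATE DATABASE_DEFAULT
    , CustomerName = cNew.CustomerName COLLATE DATABASE_DEFAULT
FROM NewERP.dbo.Customer AS cNew;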

This was a terrible idea.

It made sense at the time. But as the months passed and the complaints of performance degradation mounted, the signs continued to point back at these views which now used this collation-conversion mechanism (prior to the latest upgrade, they did not).

The typical work-arounds involved temp tables and/or going straight to the “raw” source.  Neither of these is ideal — the latter breaks that abstraction (however thin it was), while the former risks over-burdening tempdb (sometimes referred to as the “communal toilet” of SQL Server).  Generally this was acceptable, and often resulted in orders-of-magnitude improvements to performance.  But it continued to rack up that technical debt.
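The temp-table flavor of that work-around looks roughly like this (linked-server and table names hypothetical):

-- Pull the remote ERP rows over once, casting to the local collation on the way in...
SELECT CustomerNo = erp.CustomerNo COLLATE DATABASE_DEFAULT
    , SalesAmount = erp.SalesAmount
INTO #erpSales
FROM ERPSERVER.ERPDB.dbo.SalesHistory AS erp
WHERE erp.SaleDate >= '20180101';

-- ...then do the heavy joining locally, where the collations now agree (and you can index the temp table).
SELECT o.OrderID, o.CustomerNo, t.SalesAmount
FROM dbo.AppOrders AS o
JOIN #erpSales AS t
    ON t.CustomerNo = o.CustomerNo;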

One thing I tried was to remove those collation conversions from all columns in the view except the join predicates.  The consumers fed those values straight into the object/ORM layer, where they all became C# strings anyway, so it didn’t matter what collation they’d used or whether they were ANSI or Unicode.  But alas, the core pitfall of these queries — that implicit conversion — was still very present, and performance still suffered.

Treating the Root Cause

Here I re-link the two scary articles that warn of the dangers and gotchas of changing your database and server (instance-level) default collations: StackOverflow answer, and blog post.  Given all that, it’s a daunting task.  But if we’re going to get our performance back up to snuff, it’ll probably have to happen at some point.  As the great Mike Rowe says…

it's a dirty job but somebody's gotta do it

And unlike the previous blogger, I will do my very best to follow up here and post about my journey as we undertake this harrowing trek.

The way I see it, there are three major attack vectors.

  1. We can try converting the ERP database to the old SQL collation.

Pros: smaller effort than #2, less integration/regression testing overall.
Cons: unsupported by ERP vendor, downtime for ERP system, high amount of risk.

  2. We can try converting all other DBs in the environment (across all SQL instances) to the new Windows collation to match that of the ERP DB.

Pros: supported by ERP vendor, future-proof, less tech-debt.
Cons: largest effort, heaviest testing burden, high risk.

  3. We could utilize some kind of data-replication to maintain copies of the required data on the other SQL instances in their default (matching) collation.

Pros: support not an issue, lowest effort & testing burden, lowest risk.
Cons: replication maintenance burden & overhead, loss of “real-time” (added data latency), and some tech-debt.

As the lone DBA, most if not all effort falls to me, so I’m quite inclined toward #3. And we were somewhat already going in this direction with the temp-table workarounds, i.e. pulling in the ERP data (usually via a linked-server) to the target app DB & using that temp-table for joins — essentially, that’s “lightweight replication”.

The technical debt we’re incurring here is that we’re leaving all of our legacy DBs (and servers) in the older SQL collation.  At some point, likely the far-future, these will be unsupported, or at least obsolete, in the sense that all new applications & DBs will prefer the newer Windows collation.  Perhaps during the next big “hardware refresh” cycle, i.e. when we have to plan and execute a big SQL server upgrade/migration, we can consider integrating the collation-change into that project.

But wait, you argue, what about a 4th option?

Oh sure, you mean say, “Screw it, not my problem!”…

  4. Force all ERP DB data access up the stack to the application layers, i.e. the apps pull the data into memory and join/merge it there (or relate to & interact with it however the developers want to, in that layer).

But this has several downsides that the business and the development teams would [justifiably] push back on: dev time & effort, the drastic-ness & unprecedented-ness of the change, the fear of not catching every single place & usage of the ERP in the mysterious myriad of apps that it seems nobody can ever quite get a unified visibility handle on (some of which they can’t even build or deploy anymore without diving down some seriously arcane rabbit-holes of ancient tech).  The risk is just too high.

More than that, and as much as I would love to say “ain’t my problem” and pass it off to a larger group of smart people, the fact is that over 50% of dependencies on the ERP DB are from my own BI/reporting queries.  There’s no shortage of reports that need to examine and relate legacy LOB app data with ERP customer & sales data.  And it’s not just to build a paginated SSRS report — where I could, arguably, do what I said above in faux-option 4: pull the data from 2 separate data-sets, then merge it in the report layer.  It’s heavily customized, painstakingly crafted mini-data-warehouses and data-marts that massage and tailor the data for an assortment of different purposes and reporting needs.  Thus, even with this, most of the burden still falls to me.

some guy with glasses posing under a globe like he's carrying the weight of the world
To be clear, this is NOT me; it’s the 4th pic I found on Google image search for “weight of the world” filtered for “noncommercial reuse” images. 😉  Kinda looks like Landry from Friday Night Lights, no?

 

So, to borrow a closing line from one of my favorite podcasts…

May your server lights blink, your database collations be identical, and your cables be cleanly managed.

How is Database Formed?

Borrowing from an ‘old meme’ a bit.  My wife recently said I should “write something about ‘how to do databases’.”  As amusingly odd as her phrasing was, I figured she was right.

well like duh
Obviously. GAWD.

What is it?

I like to start at the beginning.  As Julie Andrews said, it’s a very good place to start.  What is a database?  That’s a pretty good question.  Here’s the prerequisite question: What is data?  Well, as I’ve said before, data is everything.  But that’s a bit of a cop-out, isn’t it?  That’s my career’s bias showing through.

Data is digital information.  Anything that can be quantified, specified, categorized, searched, sorted, produced, consumed, read, written, measured, and stored digitally in some fashion.  Data is the digital currency of the 21st century.  Data is the very reason that most technology exists — to house and transport data among producers and consumers of information.  It’s the evolutionary culmination of the stone tablet, the papyrus scroll, the bound book, the printing press, the newspaper, the library, the vinyl record, the magnetic tape, the compact disc, the pocket organizer, and the telephone.

So then, what is a database?  Simply put, it’s a collection of data.  The simplest analogy, depending on your age, is either a phone book or your cell phone’s contacts list (which is really just a phone book, in digital form).  Of course, with the latter, it’s not so much an analogy as an example — your phone’s contact list IS a database.

Fun side-note, the phone book also makes a decent discussion prop for some DBA topics like index fragmentation.

Expanding on that example.  You can search and sort your contacts by several data points: first name, last name, phone #, email, notes.  Different database systems have various names for these: fields, columns, properties, criteria, values.  The point is, it’s all data.  Or if you want to get pedantic, each one is a datum, and together they are data.

Pedantic, me?  Never.

This is what a database, or DB for short, is all about: storing data in an organized fashion so that it can be sorted, searched, sliced and diced.  Building on that, a database management system is a set of technology tools, processes and programs, that are used to gather, store, manipulate, copy, move, read, maintain, back up, link together, and operate one or many databases.  This DBMS can come in many flavors.  I happen to specialize in one called SQL Server, a Microsoft product/platform of the ‘relational’ flavor — so if you’re following along with the abbreviation game, that’s an RDBMS.

If you’re hungry for more acronyms, the Wiki article on ‘databases’ has a decent breakdown of the types and history behind them.

But Why?

all your data are belong to us
Non-geeks might not understand this, so Google “all your base” if you’re confused.

The more data you have, the more you can do with it.  Why do you think Facebook, Google, Microsoft, and Amazon are such powerful technological forces?  They purposefully, systematically gather as much data as they can from every possible source, and they have become very good at organizing and managing that data to maximize its value.  Amazon product recommendations are a prime (see what I did there?) example — they are generally appropriate and effective because they have “learned” from your past purchases, i.e. your historical data.  This “learning” – Machine Learning, aka Data Science – is the hot new marketing buzzword of recent years, but it all still comes back to data at the core.

This is not a “bad thing” or a “scary thing” as the old media and tin-foil-hat-wearers would have you believe.  Yes, it can be a little disconcerting, and yes, people and companies can abuse data in malicious ways.  But the vast majority of our digital data stewards actually want to do good.  They want to connect you with more people that you may know and become friends with; they want you to watch movies that you’ll really enjoy; they want you to easily navigate to your destination without being stuck in traffic; they even want to help stop global warming!

As a general business rule, we crave data because it helps us make decisions.  Every time a customer buys a product, we want to measure “the 5 W’s”: who, what, when, where, and how (ok, that’s not a ‘W’, but there’s a reason for it).  Notice I didn’t list “why” there — only the customer knows why, and that information, that data, is stored inside their brain.  And we can’t (yet) access that data.  So it’s a guessing game now — we feed the other 5 data-points into our DBMS and eventually, given some time and analysis, we can guess the Why.  And pretty accurately, at that.  Then, we can make a decision to “Market more aggressively to Customer Type X”, or “Have a flash-sale on Product Y”, or “Move on this hot emerging market demographic.”

So what does that make you?

Well, I’m a Database Administrator – a DBA.  Which means I “administrate databases”.

‘Administrate’, less common form of ‘administer’: manage and be responsible for the running of.

Thanks, dictionary.  So in a nutshell, a DBA manages data.  Deceptively simple sounding, no?  I mean, what can data possibly do; it’s not alive, right?  Actually, if you hang around a DBA for any length of time, you’ll commonly hear the phrase “Where does that data live?” or “That set of data lives over here.”  So clearly we anthropomorphize our data.  Most tech professionals do that to whatever technology they work closely with — it’s human nature.  Software “behaves badly”, machines “throw a fit”, etc.

But anyway, why do databases need to be managed?  What can happen to them?

Developers.  Developers happen.  =D

I joke, as you know, dear reader; I love developers.  Users ‘happen’, too — often more catastrophically.  So it’s fair to say that “people happen”.  But besides that, here are some common reasons that databases, and data, need to be managed.

  • Data can be “wrong”.

Data can either be human-generated or machine-generated.  Fingers on a keyboard, or sensors on a circuit board.  You wouldn’t think the latter could possibly ever be “wrong”, but both kinds are subject to error.  It’s just that the level of “wrongness” is subjective and depends on who’s asking and what’s expected of the system as a whole.

  • Data gets lost.

Humans interact with and manipulate data, and humans make mistakes.  Why do you think the Undo button became such a staple of so many computer applications?

  • Data gets corrupted.

Storage media (magnetic disks, silicon chips, etc.) are not perfect — they have a documented level of fault tolerance and failure rate — so we need to ensure that our data is preserved (by moving it to another area that’s not ‘faulty’, usually) past those failures.  Why?  Because our data is essentially “more valuable” than the hardware on which it’s stored.

  • Data needs to be organized.

This is slightly more subjective than the above; how and why we organize data is highly dependent on the overall intent of the systems that will interact with it.  But fundamentally, if there’s not some form of organization, the data is effectively garbage.  If you ripped out every individual page in the phonebook and scattered them all on the floor, it’s no longer an effective tool to find someone’s phone number; it’s just a mess of papers.

  • Data needs to be useful.

If we can’t do something with the data, what’s the point of having it?  The temperature at the North Pole on January 1st 1989 is, by itself, inconsequential.  But a history of temperatures at the same and similar locations, over a long period of time, gives us some great value — we can see trends, look for anomalies, and even predict the future of what those temperatures might be.

  • Databases need to be available.

Similarly, if we can’t access the data, what good is it?  Databases are a technology, and like most technologies, they occasionally break.  Again, most of that comes back to humans, because humans are the ones writing the code that creates the software that houses the data and runs the database, or that interacts with it.  But of course we still have power failures, network losses, disk failures, and even solar flares.  (Ask your favorite superstitious engineer; they’ll have at least one good story about a system outage that could only be blamed on solar flares or gremlins or the full moon.)

  • Databases need to be maintained.

Every DBMS has some kind of assumed ongoing maintenance requirements to keep it “running smoothly”.  Just like your car needs an oil change every 3 to 8 thousand miles, your databases need periodic attention to retain all of those important qualities discussed above.

And the latest big topic, underscored by the GDPR:

  • Data needs to be governed.

This is a big topic for another conversation, but the gist of it is, data is generally “owned” by someone, and deciding who owns what, where it’s allowed to live, and how it’s allowed to be used, constitutes an entire sub-industry of rules, regulations, policies, tools, security practices, and consequences, much of which we’re only just beginning to shape and understand.

TL;DR: What do you actually do?

who is your DBA and what does he do
Umm… stuff? And thangs?

I currently work at a “small enterprise”, a business that has been around for some decades (as opposed to a Silicon Valley start-up that counts its anniversaries in months, like an infatuated teenager), managing their database systems.  Some of that is financial/accounting, some is customer info, some is internal/operational, and all of it is important to at least one person in the business in their daily decision-making efforts.

Thus, I help ensure that the data is always ready, when it’s needed, in whatever form & shape it’s needed in.  I model, massage, correct, enhance, and move it around.  I help developers write faster queries (that’s a fancy word for “questions” that we ask of our data); I aid analysts with understanding and gleaning more value from the data; I maintain the underlying systems that house the databases and ensure that they perform well and get upgraded when necessary; and I work with business drivers (VP’s, CxO’s) to build reporting solutions that leverage the data to enable better, smarter decisions, and ultimately (hopefully!) increase profit.  (This last part is actually crossing into the BI – Business Intelligence – job role, which tends to happen to most small-shop DBAs, because they’re usually in the best position to make that transition.)

If some of that sounds like a blurb from a résumé, it kinda is.  This job has existed since the 80’s.  But it’s always evolving, like the tech industry in general; so just because we’ve been around a while doesn’t mean we’re all old crusty bearded dudes.  (Although we do have some prolific beards among us!)

So there you have it.  Now you can tell your friends and family what a DBA does.  Or at least, hopefully, I’ve helped my own friends & family understand a bit about what I do.

Quickie: Use of APPLY Operator

There are many great use-cases for the T-SQL APPLY operator, and if you haven’t tried it yet, I encourage you to check out Kevin Feasel’s excellent presentation on it here.  Today, I want to demonstrate a particularly interesting use-case (at least, to me!) that I found for it while tackling a real business problem — for reporting, specifically, which is what I spend a lot of my time dealing with lately.

The Setup

We have a “history of estimated market values” table, which stores, for some arbitrary dates based largely on user input, values (prices) of our proverbial Widgets.  In other words, we can’t guarantee we have every date in there for a given time period.  Over on side B, we have a “sales” table, which tells us when a given Widget was sold and for how much.  The business question, simply enough, is “When a given Widget was sold, how accurate was its ‘estimated market value’ at the time?”  Or, put another way, “How closely did the sale price of that Widget match our estimated market value?”

we don't make widgets
Yeah, yeah, we’re all tired of the Widget business. If only we could come up with a better term for a generic product to use in discussions of business problem samples.
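For reference, the two tables boil down to something like this (hypothetical minimal DDL; the real ones have plenty more columns):

CREATE TABLE dbo.MarketValueHistory (
    WidgetID int NOT NULL
    , EstimateDate date NOT NULL
    , MarketValue decimal(12,2) NOT NULL
    , CONSTRAINT PK_MarketValueHistory PRIMARY KEY (WidgetID, EstimateDate)
);

CREATE TABLE dbo.Sales (
    SaleID int IDENTITY(1,1) NOT NULL PRIMARY KEY
    , WidgetID int NOT NULL
    , WidgetName varchar(50) NOT NULL
    , Price decimal(12,2) NOT NULL
    , SaleDate date NOT NULL
);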

The Tools

I used two interesting bits of T-SQL to accomplish this, the main one being our lovely APPLY operator.  OUTER APPLY, in this case, because I still wanted the rows from “sales” even if we couldn’t find a closely-matching “market value” record.

If you haven’t used this operator yet, think of it conceptually like a JOIN to a sub-query, but with the added value that you can reference the outer table’s columns within it.  CROSS APPLY is like INNER JOIN, and OUTER APPLY is like LEFT JOIN.

Sorry, RIGHT JOIN fans, no love for you here.  You’re just too weird.

My other little trick involved using TOP(1) in the apply’d sub-query, with a non-standard ORDER BY clause.  As I mentioned, within the APPLY‘s body, you can reference the outer table’s columns.  So I’m easily able to compare the SaleDate (outer table) with the EstimateDate (inner query).  I want “the closest EstimateDate to the SaleDate”, which means I want the row where the difference between those two dates is the smallest.  Which means making use of our friend DATEDIFF.  So let’s try:

ORDER BY DATEDIFF(day, Sales.SaleDate, MktValueHist.EstimateDate)

Do you see the problem yet?  DATEDIFF returns a negative value when the EstimateDate is way before my SaleDate, say -100 days, and that most-negative row sorts right to the top.  I don’t want that!  Let’s try our old friend from many many math classes ago, Mr. Absolute Value.

ORDER BY ABS(DATEDIFF(day, Sales.SaleDate, MktValueHist.EstimateDate))

That’s better!  Now I have the top 1 “closest to my sale date” row from MktValueHist.

All Together Now

I’ll put up a Gist with repro/demo code soon.  Here’s the meat of it, as a sample select, for convenience:

SELECT Sales.WidgetID, Sales.WidgetName, Sales.Price, Sales.SaleDate
    , mvh.MarketValue, mvh.EstimateDate
    , Accuracy = (Sales.Price - mvh.MarketValue) / NULLIF(mvh.MarketValue, 0)  -- e.g. relative difference, per the note below
FROM Sales
OUTER APPLY (
    SELECT TOP(1) mValHist.MarketValue, mValHist.EstimateDate
    FROM MarketValueHistory mValHist
    WHERE mValHist.WidgetID = Sales.WidgetID
    ORDER BY ABS(DATEDIFF(day, Sales.SaleDate, mValHist.EstimateDate))
) mvh
WHERE Sales.SaleDate >= '20180101'

There you have it: my completely fabricated yet totally-based-in-reality example of how to get 2018’s Widget Sale Prices with corresponding “closest to the sale-date” Market Value Estimate from our history table.  You could even throw in some fancy math expression for “accuracy”, if you felt like it.  Like maybe “relative difference“.  =)

Note: For simplicity, we’re dealing with “whole dates”, i.e. just date.  Not datetime or datetime2.  If your date values do include times, you’d want to change the datediff interval to something more appropriate, like second or millisecond.

cross apply explained vs inner join and correlated-subquery
Forgive the graininess, it was from a Youtube screen-cap.  In an Oracle presentation, of all things. And I feel like they meant to say “Data exposition“, not “explosion”, given the latter can have negative connotation.  But.. hey, it’s better than a poorly drawn stick-figure or an inappropriate meme, right?  RIGHT?

And that’s how APPLY and ABS() saved my bacon today.  Hooray!

Have an interesting use-case for APPLY?  Or perhaps an alternative approach to a similar problem?  I’d love to hear from you!  Drop me a comment, a tweet, or a LinkedIn msg.  Happy summer!


PS: I’d also recommend checking out Steve Stedman’s useful “Join types poster“, if for no other reason than to have something concrete to use when explaining those concepts to your fellow cube-dwellers.  It has cross/outer apply on the 2nd page in their more commonly used pattern, the TVF (table valued function) relationship.

PPS: Fine, have your meme…

apply yourself, from breaking bad
I’ve never seen the show, so I don’t get it, but it was one of the first Google Image results for “apply yourself meme”. Enjoy that.

Off-Topic: A Short Story

The wifey has been obsessively binge-ing Netflix’s “13 Reasons Why” recently.  It’s a fantastic show that addresses real teen issues in a respectful yet thought-provoking way.  It made me want to reminisce a bit about my own high school years, and really try to think about why and how it wasn’t all that bad.  And don’t get me wrong; I understand that my experience is probably not noteworthy, and I actually count myself fairly lucky to have had, essentially, an unremarkable four years.  It’s not that being unremarkable should be a goal, nor that I even encourage it; it’s just that, for me, it served a purpose of avoiding big drama and simply getting me where I wanted to go — even if I had no idea where that was going to be.

So I’ve put together a sample story – a “chapter”, if you will – from what I hope will eventually become a memoir of sorts, a “story of my life” to one day pass down to our kids.  If you remember high school, and especially if you were a band kid, I hope you’ll get a kick out of it.


Chapter 3 – Band

High school band, specifically marching band, was a great experience, and a suitable alternative to sports.  I was terrible at sports.  My younger brother had proven decent at baseball in little league, but none of that talent made its way to me.  (He didn’t take it any further, either, so I don’t feel bad about it.)  It worked out that, after freshman year, marching band counted as phys-ed. credit, so I never had to take another P.E. class after the first one. I did anyway, but that’s another story.

Trumpet was my instrument. Had been since 5th grade, after my father’s encouragement from having played the French horn back in his day. I’d tried French horn before, but I never quite got the hang of it. It’s a strange instrument, for a brass, in that you actually need to use your 2nd hand to hold and muffle the flared bell to produce subtle tone effects. Trumpet’s a little simpler — you just purse your lips and blow, and press a row of 3 buttons to control note progression.

I wasn’t that great at it — never made “first chair” (which means you’re the best at your particular instrument) or had any solos, but I toed the middle line satisfactorily. Having braces didn’t help; in fact, the position of the mouthpiece on the lips coincided exactly with the brace brackets. But with a combination of inner-lip calluses and sheer will, I made it work.

I always admired and envied the “rock stars” of the band, especially the trumpet players who could hit those super-high notes with such ease.  There were two guys in particular — Jared and Mark. Mark was a junior, a lanky rude-boy (fan of ska & jazz) with spiky hair and a contagiously good attitude.  Jared was a no-nonsense senior who’d seen and done it all, making a great section leader.

Editor’s note: said Wifey should skip the next paragraph.  =P

And then there was our junior leader, Nicole.  Ooh boy let me tell you.  Picture a hot summer morning out on the football field for marching practice; icy water bottles being used to cool off sun-soaked sweat-beaded skin; and a tall tan teenage Cali-girl in short shorts and a rolled up tank top, telling us young’uns what to do and where to go.  Can I get a 2-syllable ‘day-umn’?  Yes, that first year of marching band was quite the eye-popper.

In order to truly appreciate this story, you need a basic understanding of the way high school marching band works.  It’s in the fall, or first semester of school, to coincide with football.  While we support and play at some home-games, our biggest commitments were “tournaments”.  These are competitions hosted by various large high schools where they invite a number of other schools in to display their marching band’s “field show”, which is basically a series of songs played while marching into various formations that look like shapes and figures from above.  Each band is judged on both their musical and visual performance.

The color guard, a small team of girls (usually, at least in those days), performs along with the band, by waving colorful flags and banners and doing some choreographed dancing on & around the field.  Think of them like cheerleaders, but more elegant, and replace the pom-poms with twirlers and the mini-skirts with more flowy dress-like outfits (sometimes… though there were definitely other schools who pushed the sex appeal angle much more with their own color guard).

You also have to understand that, unlike a sports team, the band didn’t have locker rooms.  So essentially, the buses were our locker rooms.  We did probably 5 to 10 events in a given season, only one of which was our own self-hosted tournament, so we were on the road a lot — at least, it seemed like a lot to me.  The bus was our changing room for putting on our uniforms, our break area for chatting and hanging out between the performances and the awards, and our celebration circle (or, in worse times, our den of commiseration).  Different types of people put up varying degrees of protest or privacy — some had to be in the very back with complete coverage and make-shift curtains made from spare shirts or towels, while others were happy to flaunt their undergarments to most of their peers, probably in an effort to tease and woo the opposite sex.  I was somewhere in the middle (as usual); I hid behind the seat-back and kept it quick & subtle, but I also tended to wear a regular tee-shirt underneath the uniform.  The aforementioned Katrina (of my previous chapter) was always around to cast a flirty glance or suggest an extra spray of her favorite cologne to make the stank more bearable.

A small side-note. Our school colors were brown and gold — the Golden Bears — but this made an absolutely horrible color scheme for uniforms. The regular ones were a brown base with gold and white trim, but they never quite got the hue far enough away from ‘shit brown’. The alternate uniforms were a little better, having a white base with gold and brown trim, but of course, they got dirty much faster, so we didn’t wear them as often as I would have liked. I do hope they’ve come to their senses and changed up the color scheme, or at least tweaked the uniforms so that they don’t remind spectators so much of human waste. Thankfully the color guard’s uniform colors were more friendly, being of a teal & fuchsia variety.

Finally, the third key concept here, is that each band is in a “class”, which is like a ranking system based somewhat on your high school’s historical performance, but mostly (read: almost entirely) on your size — the number of band members. Generally, the larger, and richer, high schools — in our area, Rancho Bernardo, Poway, Mount Carmel, and a couple others from the wealthy areas of greater San Diego — had the biggest bands and were thus in the highest class, AAA. We had historically been in AA (just below the top), and had, from what I heard in passing from the seniors, a decent ‘win’ history. Depending on the size and attendees of a given tournament, we could default down to the same class as the others involved; i.e. if nobody else was above ‘A’, we’d compete as ‘A’ too, which would be to our benefit.

Think of it like your weight classes in boxing or wrestling. Just because you’re a heavyweight doesn’t mean you’re more skilled than a featherweight, it just means you weigh more. Sure, the weight (or size) does make some difference in the competition, especially if we use the wrestling metaphor. It’s just not everything. And there can be a hidden motivation to try to “make weight”, i.e. to get into a smaller class so that you have an edge over your opponents.

This being the late 90s at a growing suburban school, our band was growing in number, but not necessarily in skill or in booster dollars. In the wrestling metaphor above, we were basically gaining flab. The class system also hadn’t been updated in a while — basically anything over 150 was AAA, but those big rich bands I mentioned before tended to be in the 300s. So unfortunately, we were basically “forced up” into the AAA class with our larger number, but we were still way outgunned and out-funded by those that had long held the candles in that high hall.

Now, having said all that, my first year in marching band was one of the most exhilarating, and it’s largely due to our first and only “sweeps” win in one of the first tournaments of the year.  A sweeps win is when your band wins the highest trophies in its class and in the tournament.  Looking back, there must have been a perfect storm of coincidences that led to it.  This was a relatively small tournament; none of those big rich bands attended, and we ended up being the largest one there.  I think it was hosted by Orange something-or-other high school.  The bus ride was a bit longer than most, maybe an hour or so.  Our uniforms were freshly pressed, having not been worn yet this season; and we’d barely finished mastering our show (the music and marching steps/positions, i.e. the choreography).

There was something in the air that night.

We arrived in the late afternoon, not too long before our turn was scheduled. We changed on the buses and lined up to take the field.  It was cool and temperate that evening, not too cold, but not warm enough to cause a sweat.  Perfect marching weather.  The emcee called out, “Tuh-MEC-you-la Valley High!”, and we took the grass.  It was well maintained for a small school; no big potholes or divots, clean and even yard-lines.  Our fearless leader, ‘H’ we called him — short for Mr. Hrbacek (her-ba-check) — took the conductor’s stand, counted it down, and the crisp snap of the snare drums meant it was on.

Our set was a big-band/swing theme, including “Moonlight Serenade” and “Sentimental Journey”.  We’d memorized pages upon pages of marching positions and music for this. Practiced dozens of hours — “sectionals” for an hour after school, those sweaty Saturday mornings, and every chance we could get at a field during class — it felt like hundreds.  Our feet were sure, our instruments were on-key and in-tempo, and we pulled it off, all the way to that final high note and conclusive closing drum beat.

The percussionists were always my favorite, even if I’d never admit it.  They were the driving beat that kept us all going, and the catching energy that fueled our desire to win.  Yeah, the brassy solos and deep booms of the tubas were great — hell, you’ve got to be a ridiculously strong dude (or dudette) to lug one of those bad boys around and march in tempo — but those drums made it all mesh together into something more than the sum of its parts.

So we left the field knowing that we’d given it our all.  Yeah, we weren’t perfect, there were a few missteps and a few misplaced notes here and there, but we covered them up and soldiered on.  Thus, we took to the bus-changing-rooms once more, traded our uniforms for our street clothes, and gathered in the bleachers for the award announcements.

This was before the post-millennial days of “everybody’s a winner, everybody deserves a trophy”, but perhaps band culture was a bit ahead of its time, because almost everybody did get some kind of trophy.  Although that may have been due to the smaller size of this tournament, as I mentioned before.  Anyway, as with most competition awards, they worked their way up from the bottom to the top.  I wasn’t aware of this at the time, which made me quite confused as to why my elder band-mates were cheering progressively louder and louder as the announcers didn’t call our name. Obviously (now), it meant that we were toward the top.

The announcer has made his way to the final 3 awards – best musical performance, best visual performance, and the granddaddy of them all, “the tournament award”.  He calls the first.  “Best Musical Performance… Temecula Valley High!”  Loud but muffled cheers from our band as the director and seniors try to shush everybody.  “Best Visual Performance… Temecula Valley High!” Louder cheers from our mates as they struggle to contain themselves.  “And the Tournament Award goes to… Temecul–”

We erupt with elation before he can even finish the word.  Hoots and hollers, whoops and whistles.  Our director walks up to humbly accept the giant trophy, which I’m sure looked a lot bigger to us back then than it really was.  The stands empty of the competing bands as we make our way back to the buses.  The air is absolutely electric; high-fives and kudos abound, even between the flautists and the woodwinds, who are, for those of you unfamiliar with band sub-cliques, the quietest and most reserved of the bunch.  As we settle into our seats and prepare for the drive home, from a boom-box in the back of the bus come those timeless strains of Brian May’s guitar and Freddie Mercury’s piercing vocals.  “Weeeee.. are the chaaaampions, my friennnd. Nooo time for looosers, cuz weee are the chaaampions… of the Woooooorld.”

The adults try to quiet us down, but this kind of celebration isn’t so easily subdued.  A few of the seniors try to explain that we got lucky, that we did ok but we mostly won because we outclassed the other bands.  And we knew, in the back of our minds, that it wasn’t always going to be this way; that jocks would still laugh at us and popularity queens would still snub us; that we’d be coming back on Monday to loads of schoolwork, and to the pressures and insecurities that go with high school life — particularly if you’re a band geek.

But damn if we weren’t gods in that moment.

And then, as the saying goes, it was all downhill from there.  That’s not quite fair, I suppose.  Heck, maybe I don’t give the old coot enough credit; perhaps he carefully planned this strategy of giving us an easy win to hit us with a taste of that sweet drug of victory, so that we’d stick around and keep trying harder, week after week, year after year, to replicate it.  Friggin’ brilliant, perhaps.  It never quite happened, as I said; we were hopelessly outclassed by those infamous high-society bands with their own logo-painted trailers and catered meals and mysteriously shiny pristine instruments that never seemed to fade.  Those top 3 award spots that I mentioned, well – let’s just say we got real tired of hearing the name “Rancho Bernardo”.  Over, and over, and over again.

The tournament that we hosted ourselves came towards the end of the season.  It was a nice break from the competition because, even though we had to perform – twice – we weren’t being judged.  So it gave those rock-star trumpet players time to show off their solo bits in a less subtle way.  In the first performance of the day, Jared actually popped out of line formation and did a half-kneel toward the crowd as he belted out those crisp 4 high notes – but in doing so, he flubbed just a bit, and he got crap for it later from H. and Mark.  Thus, at the night performance, he stayed in position, but absolutely nailed those notes, complete with a little trill-up and doo-wah.  There were a lot of bands here, more than almost any tournament we’d been to, it seemed.  I wondered why, but I’d come to realize later, after learning a bit of regional geography, that we were a convenient mid-way location between Orange and San Diego counties, so it made sense that those bigger schools wanted to come battle each other on the marching field without driving over 2 hours to either one’s hometown.

As the rest of the schoolyears dragged on, I would always look back fondly at that first exhilarating victory.  There was nothing quite like it.  Along with the occasional cleavage-peek on the bus, the weeks of pizza and coke on the road, and that Saturday morning navel-gazing at practice, it was enough to get me hooked for 4 solid seasons.  I even convinced my parents to buy me a Letterman’s jacket with the band letter in junior year. But the biggest adventures were yet to come.

T-SQL Tuesday #102: Giving Back

This month hosted by Mr. Major (scribnasium / @RileyMajor), whose past #tsql2sday post I may have missed but I love it all the same (BeyondCompare FTW!).  It’s about giving back to your community — mostly technical, i.e. the #SQLFamily / MS Data Platform community, but it can be about any community you’re a part of.

For starters, that’s what we blog for, at least partly.  I mostly blog about things I’ve learned or found interesting in my work, so that I don’t forget them later!  Of course, I’m always happy to learn that it helps someone else out there with a similar problem.  But there’s so much more to a community ecosystem than that.

SQL Saturday

SQL Saturdays are the perfect example of this.  I’ve been to 3 so far – 2 at Orange County, and 1 at San Diego.  I have to call out OC’s organizers, Ted Stathakis & .. the other guy.  Crap, I forgot his name.  He’s awesome though.  Srsly.  I talked with Ted a bit after a session at the last one and he truly is as nice as he is busy – 3 boys, all kinds of volunteer work, AND a full time job as the BI Director at Del Taco!  Wow!

I mean, it’s no Taco Bell, but still, Del Taco!  (Sorry, I had to..)  =P

I really want to volunteer to help at one of these.  I’m not sure what I’d do, but I know when the time comes, it’ll be worth it.

Lunch & Learns

To get my feet wet, I hosted a couple “lunch & learns” at my company with my Business Analyst and Development teams.  We talked about some basics, ranted about formatting a bit, tried to say goodbye to old habits from the SQL 2005 days (hello, date type!), and even dived into a few juicy things like performance-testing with IO stats and why you should never use scalar or multi-statement functions.  We also had a couple heart-to-hearts about DevOps and what it means in our environment.  (Hint: we’ve got a LOOONG way to go.)

At some point I’m going to scrub my slides and post them on SlideShare or something.  I just have to remove any company-specific info first.  ;o)

As with most teaching efforts, it helped me learn (and re-learn) some things myself!  I had to read some documentation to get wording exactly right, and I built a few playground DBs on my local machine to try out different concepts & examples.  Plus, it forced me to justify WHY I held certain opinions or notions on things, and in doing so, realize my own mistakes of the past.  Thus, I became a better DBA just by reinforcing some good practices and updating my own assumptions.

the-more-you-know
Yay learning!

Generosity

If there’s one thing I’ve learned from the many tributes to our late #SQLFamily member SQLSoldier, it’s the importance of being generous with your time.  Whether that means answering to the #sqlhelp tag on Twitter, participating in the SQL Community Slack, answering StackOverflow / DBA.StackExchange questions, or just taking a few moments to help someone at your workplace with a tech problem — it all makes a difference.  I need to be better about this, as far as my “online presence” goes.  In-person, I’m always eager to help, but because I work remotely most of the time (yay!), it’s a bit of a moving-target — the in-office days get packed with meetings and critical face-time (not FaceTime), so the peer-to-peer stuff is often postponed or relegated to Slack.

However, there’s a flip-side to this.  In being generous, we can’t forget to prioritize ourselves and our families.  I’m always being asked “Why are you working so much!?” — well, I could try to explain the difference between ‘work work’ and ‘tech community involvement’ and ‘self-betterment / career planning’ but… yeah.

Sorry, honey, be done in a sec!  =)

Anyway..

I encourage you, dear reader, to give back to your own communities, in whatever form that takes — tech/online, real-life, etc.  I promise that it will benefit you in some new & unexpected way.  Plus, whenever you make a real solid connection with someone, that’s worth its weight in gold.

it's not what you know, it's who you know
ORLY? Yes, RLY!

A SQL “Whodunnit” Trigger

Triggers aren’t bad, if used for the right reasons.  Here we look at an “audit-trail” use-case.

Inspired by a brief conversation in the #CodingBlocks community Slack: A short discussion and example of a “who-dunnit” (“who done it”, a colloquialism for a murder-mystery type thing) trigger, to find out what user is doing deletions against a certain table.

the cast of CSI LV
The original, or nothing.

The Background Check

Let’s name our hypothetical database CSI.  In it, we have a table, dbo.Victims, where it seems like data is being randomly deleted at random times.  As we all know, this is impossible — computers never do anything truly randomly, much less RDBMSes.

Insert witty counter-example here.  You know you have one.

So we want to find out who’s doing these deletions.  One DBA says, “Hey I got an idea… Let’s put an after delete trigger on the table!”  Another DBA says “I abhor triggers; let’s log sp_WhoIsActive every 5 seconds to try to catch the suspect ‘in-the-act’.”

Both approaches have their merits, and neither is that uncommon.  However, the latter is much more regularly blogged about, so I’m going to present the former, because it kinda helped remind me of a few things that I hadn’t used in a while.  I’d also argue that the latter is much less of a “guaranteed capture”, since you’re gambling pretty liberally on the fact that the delete transaction will even last that long; it’s statistically more likely that you’ll miss it.

The Setup

Here’s a SQL snippet that shows a basic after delete trigger created on our dbo.Victims table.  Notice the use of the special Deleted table reference — this is a “temporary, memory-resident” table according to the Docs, and it holds all the records that were/are-about-to-be deleted from the target table.
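The full code is in the GitHub repo (see “The Proof” below), but the shape of it is roughly this, assuming an aud.Evidence logging table along the lines described shortly:

CREATE TRIGGER dbo.trg_Victims_WhoDunnit
ON dbo.Victims
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO aud.Evidence (VictimID, DeletedAtUtc, DbUser, LoginName, OriginalLogin)
    SELECT d.VictimID
        , SYSUTCDATETIME()
        , USER_NAME()       -- database context user
        , SUSER_SNAME()     -- server login of the current security context
        , ORIGINAL_LOGIN()  -- the login that actually connected, impersonation or not
    FROM Deleted AS d;      -- the special table holding the rows being deleted
END;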

I feel like it used to be called a “temporal table”, but that now refers to a new feature in 2016, where SQL keeps a hidden history-tracking copy of your table that you can reference like a time-machine; which, incidentally, almost* negates the need for such things as these triggers we’re talking about, but that’s another topic for another time.

*(The ‘almost’ is because temporal tables don’t tell you “WHO”, which is our primary motivator here.)

The interesting bits are how we identify our suspect, our ‘killer’ if you will.  See, we not only want to know who they are in the database context, we also (and likely, more importantly) want to know who they are at the server level context.  And just in case they’re impersonating another login, we want to check that too.

So we actually have a lot of options here.  There’s CURRENT_USER or USER_NAME(), for the DB context user.  Then we have SUSER_SNAME(), SUSER_NAME(), SYSTEM_USER, and ORIGINAL_LOGIN() for the server context.  If you’re curious, you could also get things like @@SPID (server session id), SUSER_ID() (server login id), and SESSION_USER (database session user).

ORIGINAL_LOGIN() may be the most potentially interesting, especially if we want to write our trigger with elevated (impersonated) permissions to be able to write to the logging table that we’ve set up to capture its detective-work.  I did not need it for this example, but it’s worth keeping in mind.

why dont you take a seat over there
We’ve been watching you…

The Sting

So we’ve got our evidence table, we’ve got our detective trigger, now we just need a suspect.  Thankfully we can test it out first, to make sure our operation will succeed when the real perp comes along.  We can do this, of course, by impersonation.  Or by using different SSMS query-windows with different logins — your choice.

Our faux-suspect’s login name is DummySuspect.  We map him to the db_datawriter and db_datareader roles in our CSI database — we know the real perp at least has write permission on the table dbo.Victims, otherwise he/she wouldn’t be able to delete those poor victim rows!  And we’re probably the db_owner, which is fine.  Let’s call our own login SergeantX.

Now we can pretend to be DummySuspect and execute a DELETE against CSI.dbo.Victims, and make sure it writes to our auditing table, which we called aud.Evidence.
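In sketch form, that test looks like this (pick whatever WHERE clause identifies your sacrificial row):

EXECUTE AS LOGIN = N'DummySuspect';              -- pretend to be the suspect
DELETE FROM CSI.dbo.Victims WHERE VictimID = 1;
REVERT;                                          -- back to being SergeantX

SELECT * FROM CSI.aud.Evidence;                  -- did the trigger catch it?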

Yes, in practice, we’d probably want to put our Evidence table in a separate database, to really ensure those pesky Suspects can’t update it or delete from it, i.e. “cover their tracks” — here, I’ve settled for simply using a different schema, to keep the example workable.  Otherwise we’d have to deal with cross-DB permissions and such, which goes beyond the scope of one little blog post.

Ready?  Let’s try it!

The Proof

Go on over to the GitHub repo and check out the code.  There are 3 easy steps – ‘Step 1’, create the objects, including a new DB to house them called CSI.  You’ll see the trigger in there as well.  Then you can try ‘Step 2’, where I impersonate DummySuspect and delete a row from Victims, and then check the Evidence log when done.  And finally, ‘Step 3’ is a similar test, but assumes that you’ve actually connected that SSMS window/tab/query AS DummySuspect instead of impersonating him (or her!).  After you’ve done that, check out aud.Evidence again to make sure it logged the 2nd delete.

And there you have it.  A simple example of how to write and test an after delete trigger that writes the action info to a separate auditing table for investigation.

Hope you enjoyed!  Leave a comment here or on GitHub; I welcome all feedback.

One more thing…

Apparently I’ve been fork’d!  Someone on GitHub liked my take on the Nested Set Model so much that they decided to pull it into their own library and have a play with it.  Yay!!  😀   Whoever you are, thank you and good luck building with it.  Enjoy!

someone forked my sql nested set model on github
Mr. Murray, thanks and have fun!