T-SQL Tuesday #104: Code You’d Hate to Live Without

And that’s where we could use a little encouragement, I think — another DBA saying “Yay, it’s not just me!” makes it all worthwhile.

Advertisements

It’s that time of the month again!  Bert‘s fantastic invitation complete with YouTube vid is tempting indeed.  There are so many wonderful community scripts and tools for SQL Server DBAs.  But, in an interesting twist, he specifically asks us about something self-written or self-created.  And I’ll admit, I sometimes have trouble being exceptionally ‘proud’ of my home-brewed stuff.  Sure, I’ve blogged and GitHub‘d, and hopefully those have helped someone along the way.  Most of the time, though, those are specifically suited to a given use-case.

The more difficult code to share, I think, is the stuff that we keep in our “daily-grind” “Get-Stuff-Done” folders.  Scripts that we’ve tweaked and commented sporadically throughout the years, finding this edge-case or that tweak for a given scenario, most of which ends up being pretty environment-specific.  And that’s where we could use a little encouragement, I think — another DBA saying “Hey, it’s not just me!”, or “I knew there was another guy/gal out there who always has to do this kind of thing!”.  Thus, here I will attempt to show off something along those lines and see what you kind folks think.

Where Does it Hurt?

No, I don’t mean the famous wait-stats queries by Paul Randal (those are awesome).  I mean, what are the top few areas of your SQL environment where you’re always questioning something, or being asked to find some answer that’s not immediately obvious?  For me, there are 3.

Replication

Transactional replication is, in a word, ‘brittle’.  It works well when it’s working, and usually you don’t have to think about it.  But when you do need to check on it, or change anything about it (god forbid), or troubleshoot it (more like shoot it, amirite?), it feels a bit like trying to mess with a half-played Jenga stack.  Meaning, you might be just fine, but you might send the whole thing crashing down and have to rebuild it all.

peanut-brittle-from-the-joy-of-baking
As brittle as this stuff. But not nearly as delicious.

I won’t go into the whole troubleshooting aspect here, that’s too much scope.  But there’s a question that can often come up, especially from Devs or Biz-Analysts, and that is: “Hey, is TableX being replicated?”  And rather than subject my poor eyeballs to the replication properties GUI, I just run a little query, which reads from the distribution database and the actual database that’s being published (the ‘publisher’ db), and tells me a list of tables & columns that are being replicated.

Here it is.  Be sure to replace [dbName] with your actual published DB name.  Like StackOverflow or WideWorldImporters, or AdventureWorks <shudder>.

Report Subscriptions (SSRS)

Another question I’m often asked is, “Hey, did ReportX run?”  What they really mean is, “Did ReportX‘s email subscription get sent to BigWigUserY?”  We have an unholy amount of SSRS reports with email subscriptions.  And because I don’t care to bloat my inbox by BCC’ing myself with every single one, I rely on the users to speak up when they don’t receive something they’re expecting.

“This is a terrible idea.”, you say.  “Never trust the users!”

Yes, well, such is life.  If you have a better lazier solution I’m all ears.

So here’s a little script I wrote to query the ReportServer database (which MS will tell you is “officially unsupported”, snore).  If I know some keyword from the report title, or the supposed recipient’s email address, this will tell me if it ran successfully or not — LastRun, LastStatus.  A couple other useful bits: where it lives, ReportPath, and what its SQL Agent Job’s Name is, JobName.

That last bit is peculiar.  The JobName looks like a GUID (because it is, because SSRS just loves GUIDs), but it’s also the actual name of the agent job, which you can use to re-run said job — via exec msdb.dbo.sp_start_job — if the failure wasn’t systemic.  As I often do.

Disk Space

Last but not least, everybody’s favorite topic to try and forget about or pawn-off to the SysAdmins.  “How much space are those databases / data files / log files eating up?”  Well, mister suddenly-cares-about-disk-space-but-is-OK-with-storing-all-domain-users’-iTunes-music-libraries-on-the-central-fileshare-along-with-their-documents-because-that’s-what-we’ve-always-done.  {True story.}  Let me tell you.

keep calm and release the bitter
It’s healthy, I swear!

This script has a lot of comments and commented-out lines because I will tweak it depending on what I need to see.  Sometimes it’s “show me the DBs where the logical filename doesn’t follow my preferred pattern” (the DB name with ‘_data’ or ‘_log’ after it); or perhaps “Only show me files over 5GB with a lot of free space” when I’m itching to shrink something.

“Never shrink your files!  (In production!)”

Yes, thank you, knee-jerk reactionary.  I’m dealing with servers in lower environments, usually, with this one.  😉

What do you think?

I’d love to hear your feedback in the comments!  Happy Tuesday!!  =)

Credit where credit is due.

I did not magically come up with these all by myself.  They’re pieced together from many a StackOverflow answer and/or other community blog post or contribution that I’ve since forgotten.  I usually store a link to the source in these kind of things, when it’s true copy-pasta, but in these cases, I added enough of my own tweaks & style that I no longer tracked the original linkage.  If you see anything that looks familiar, please do tell me where to give kudos and shout-outs!  😀

Quickie: Use of APPLY Operator

There are many great use-cases for the T-SQL APPLY operator, and if you haven’t tried it yet, I encourage you to check out Kevin Feasel’s excellent presentation on it here.  Today, I want to demonstrate a particularly interesting use-case (at least, to me!) that I found for it while tackling a real business problem — for reporting, specifically, which is what I spend a lot of my time dealing with lately.

The Setup

We have a “history of estimated market values” table, which stores, for some arbitrary dates based largely on user input, values (prices) of our proverbial Widgets.  In other words, we can’t guarantee we have every date in there for a given time period.  Over on side B, we have a “sales” table, which tells us when a given Widget was sold and for how much.  The business question, simply enough, is “When a given Widget was sold, how accurate was its ‘estimated market value’ at the time?”  Or, put another way, “How closely did the sale price of that Widget match our estimated market value?”

we don't make widgets
Yeah, yeah, we’re all tired of the Widget business. If only we could come up with a better term for a generic product to use in discussions of business problem samples.

The Tools

I used two interesting bits of TSQL to accomplish this, the main one being our lovely APPLY operator.  OUTER APPLY, in this case, because I still wanted the rows from “sales” even if we couldn’t find a closely-matching “market value” record.

If you haven’t used this operator yet, think of it conceptually like a JOIN to a sub-query, but with the added value that you can reference the outer tables’ columns within it.  CROSS APPLY is like INNER JOIN, and OUTER APPLY is like LEFT JOIN.

Sorry, RIGHT JOIN fans, no love for you here.  You’re just too weird.

My other little trick involved using TOP(1) in the apply’d sub-query, with a non-standard ORDER BY clause.  As I mentioned, within the APPLY‘s body, you can reference the outer table’s columns.  So I’m easily able to compare the SaleDate (outer table) with the EstimateDate (inner query).  I want “the closest EstimateDate to the SaleDate​”, which means I want the row where the difference between those two dates is the smallest.  Which means making use of our friend DATEDIFF.  So let’s try:

ORDER BY DATEDIFF(day, Sales.SaleDate, MktValueHist.EstimateDate)

Do you see the problem yet?  If I get a negative value, i.e. my SaleDate is way before my EstimateDate, say -100 days, that’s the top 1 row.  I don’t want that!  Let’s try our old friend from many many math classes ago, Mr. Absolute Value.

ORDER BY ABS(DATEDIFF(day, Sales.SaleDate, MktValueHist.EstimateDate)

That’s better!  Now I have the top 1 “closest to my sale date” row from MktValueHist.

All Together Now

I’ll put up a Gist with repro/demo code soon.  Here’s the meat of it, as a sample select, for convenience:

SELECT Sales.WidgetID, Sales.WidgetName, Sales.Price, Sales.SaleDate
, mvh.MarketValue, mvh.EstimateDate
, Accuracy = some_made_up_mathy_thing_here
FROM Sales
OUTER APPLY (
SELECT TOP(1) mValHist.MarketValue, mValHist.EstimateDate
FROM MarketValueHistory mValHist
WHERE mValHist.WidgetID = Sales.WidgetID
ORDER BY ABS(DATEDIFF(day, Sales.SaleDate, mValHist.EstimateDate)
) mvh
WHERE Sales.SaleDate >= '20180101'

There, my completely fabricated yet totally-based-in-reality example of how to get 2018’s Widget Sale Prices with corresponding “closest to the sale-date” Market Value Estimate from our history table.  You could even throw in some fancy math expression for “accuracy”, if you felt like it.  Like maybe “relative difference“.  =)

Note: For simplicity, we’re dealing with “whole dates”, i.e. just date.  Not datetime or datetime2.  If your date values do include times, you’d want to change the datediff interval to something more appropriate, like second or millisecond.

cross apply explained vs inner join and correlated-subquery
Forgive the graininess, it was from a Youtube screen-cap.  In an Oracle presentation, of all things. And I feel like they meant to say “Data exposition“, not “explosion”, given the latter can have negative connotation.  But.. hey, it’s better than a poorly drawn stick-figure or an inappropriate meme, right?  RIGHT?

And that’s how APPLY and ABS() saved my bacon today.  Hooray!

Have an interesting use-case for APPLY?  Or perhaps an alternative approach to a similar problem?  I’d love to hear from you!  Drop me a comment, a tweet, or a LinkedIn msg.  Happy summer!


PS: I’d also recommend checking out Steve Stedman’s useful “Join types poster“, if for no other reason than to have something concrete to use when explaining those concepts to your fellow cube-dwellers.  It has cross/outer apply on the 2nd page in their more commonly used pattern, the TVF (table valued function) relationship.

PPS: Fine, have your meme…

apply yourself, from breaking bad
I’ve never seen the show, so I don’t get it, but it was one of the first Google Image results for “apply yourself meme”. Enjoy that.

A SQL “Whodunnit” Trigger

Triggers aren’t bad, if used for the right reasons.. Here we look at an “audit-trail” use-case.

Inspired by a brief conversation in the #CodingBlocks community Slack: A short discussion and example of a “who-dunnit” (“who done it”, a colloquialism for a murder-mystery type thing) trigger, to find how what user is doing deletions against a certain table.

the cast of CSI LV
The original, or nothing.

The Background Check

Let’s name our hypothetical database CSI.  In it, we have a table, dbo.Victims, where it seems like data is being randomly deleted at random times.  As we all know, this is impossible — computers never do anything truly randomly, much less RDBMSes.

Insert witty counter-example here.  You know you have one.

So we want to find out who’s doing these deletions.  One DBA says, “Hey I got an idea… Let’s put an after delete trigger on the table!”  Another DBA says “I abhor triggers; let’s log sp_WhoIsActive every 5 seconds to try to catch the suspect ‘in-the-act’.”

Both approaches have their merits, and neither is that uncommon.  However, the latter is much more regularly blogged about, so I’m going to present the former, because it kinda helped remind me of a few things that I hadn’t used in a while.  I’d also argue that the latter is much less of a “guaranteed capture”, since you’re gambling pretty liberally on the fact that the delete transaction will even last that long; it’s statistically more likely that you’ll miss it.

The Setup

Here’s a SQL snippet that shows a basic after delete trigger created on our dbo.Victims table.  Notice the use of the special Deleted table reference — this is a “temporary, memory-resident” table according to the Docs, and it holds all the records that were/are-about-to-be deleted from the target table.

I feel like it used be called a “temporal table”, but that now refers to a new feature in 2016, where SQL keeps a hidden history-tracking copy of your table that you can reference like a time-machine; which, incidentally, almost* negates the need for such things as these triggers we’re talking about, but that’s another topic for another time.

*(The ‘almost’ is because temporal tables don’t tell you “WHO”, which is our primary motivator here.)

The interesting bits are how we identify our suspect, our ‘killer’ if you will.  See, we not only want to know who they are in the database context, we also (and likely, more importantly) want to know who they are at the server level context.  And just in case they’re impersonating another login, we want to check that too.

So we actually have a lot of options here.  There’s CURRENT_USER or USER_NAME(), for the DB context user.  Then we have SUSER_SNAME(), SUSER_NAME(), SYSTEM_USER, and ORIGINAL_LOGIN() for the server context.  If you’re curious, you could also get things like @@SPID (server session id), SUSER_ID() (server login id), and SESSION_USER (database session user).

ORIGINAL_LOGIN() may be the most potentially interesting, especially if we want to write our trigger with elevated (impersonated) permissions to be able to write to the logging table that we’ve set up to capture its detective-work.  I did not need it for this example, but it’s worth keeping in mind.

why dont you take a seat over there
We’ve been watching you…

The Sting

So we’ve got our evidence table, we’ve got our detective trigger, now we just need a suspect.  Thankfully we can test it out first, to make sure our operation will succeed when the real perp comes along.  We can do this, of course, by impersonation.  Or by using different SSMS query-windows with different logins — your choice.

Our faux-suspect’s login name is DummySuspect.  We map him to the db_datawriter and db_datareader roles in our CSI database — we know the real perp at least has write permission on the table dbo.Victims, otherwise he/she wouldn’t be able to delete those poor victim rows!  And we’re probably the db_owner, which is fine.  Let’s call our own login SergeantX.

Now we can pretend to be DummySuspect and execute a DELETE against CSI.dbo.Victims , and make sure it writes to our auditing table, which we called  aud.Evidence.

Yes, in practice, we’d probably want to put our Evidence table in a separate database, to really ensure those pesky Suspects can’t update it or delete from it, i.e. “cover their tracks” — here, I’ve settled for simply using a different schema, to keep the example workable.  Otherwise we’d have to deal with cross-DB permissions and such, which goes beyond the scope of one little blog post.

Ready?  Let’s try it!

The Proof

Go on over to the GitHub repo and check out the code.  There are 3 easy steps – ‘Step 1’, create the objects, including a new DB to house them called CSI.  You’ll see the trigger in there as well.  Then you can try ‘Step 2’, where I impersonate DummySuspect and delete a row from Victims, and then check the Evidence log when done.  And finally, ‘Step 3’ is a similar test, but assumes that you’ve actually connected that SSMS window/tab/query AS DummySuspect instead of impersonating him (or her!).  After you’ve done that, check out aud.Evidence again to make sure it logged the 2nd delete.

And there you have it.  A simple example of how to write and test an after delete trigger that writes the action info to a separate auditing table for investigation.

Hope you enjoyed!  Leave a comment here or on GitHub; I welcome all feedback.

One more thing…

Apparently I’ve been fork‘d!  Someone on GitHub liked my take on the Nested Set Model so much that they decided to pull it into their own library and have a play with it.  Yay!!  😀   Whoever you are, thank you and good luck building with it.  Enjoy!

someone forked my sql nested set model on github
Mr. Murray, thanks and have fun!

T-SQL Tuesday #99: Counting Rows the Less-Hard-Way

We can get our row count, and min & max date values, without ever touching the actual source table!

This month’s invite courtesy of Aaron Bertrand (B | T), whose “bad habits” blog series still inspires many an impassioned debate or engaging argument discussion on a regular basis among DBAs & Developers alike.

And yes, I’m taking the easier of the two “dealer’s choice” choices — the SQL focused one.  (I’m not big on sharing/blogging personal stuff, at least not here; I may one day start another blog for that, or perhaps just occasionally post more #off-topic stuff , but for now you’ll have to be content with my stories of vehicle troubles and the occasional movie-geekery).

So, without further ado…

By the way, what is ‘ado’ and why should there be no further of it?

art-vandelay-importer-exporter
Accidentally apropos on many levels…

Counting Rows in Really Big Tables

Previously touched on here, tables of unusual size (TOUSes) can be tricky.  You don’t want to lock them up for a long period of time, but you often need to gather information about them (such as row count, size, range of values) to perform some kind of operational maintenance with/on them.  In particular, Aaron’s post on “counting rows the hard way” inspired me to look into this a bit more and try to come up with a clever-ish way of finding out some basic “shape of data” info without actually querying (scanning) the table itself.

To start with, it’s actually really simple to get the total row-count from a few system catalog views — Aaron’s already shown you that, so I won’t repeat.  My interest is more in questions like “How many rows match a where-clause?” or “What are the min & max values for thatColumn?”

For this post, I’ll be focusing on a particular kind of table — the “history” or “transaction” table.  The idea here is that you have a record of “every time some event happens in/to some entity”.  A very common example is audit-trail tables, which I’ve been dealing a lot with lately.  Another common example is a “transaction history” table, such as, in our new favorite MSSQL demo database WideWorldImporters, the table Warehouse.StockItemTransaction​.  It’s the 2nd largest table in the db at 260-some-thousand rows.  {The largest is a multi-million-row beast that is actually the system-versioned aka temporal table behind a “normal” table; I might build a phase-2 example around this, but not today.}  So, while our queries won’t be super slow, we’ll get enough of an idea of what’s bad & good from measuring our IO stats (with SET STATISTICS IO ON).

TL;DR: The demo script is available here; the headers below correspond to the comment-lines of the same name, but I’ve left enough commentary in the SQL itself to keep the average reader on-track, so feel free to check it out ahead of time.  But do keep reading at some point!  :o)

A. Gathering Some Intel

First up, we have good ol’ sys.sp_spaceused.  This gives you some sizing info about the table, including its row count.  There’s a “disclaimer” circulating around out there that it’s not “up to the millisecond accurate” , i.e. it might not have the most current row count if someone else is in the middle of an insert operation or whatnot.  But for all intents & purposes, you can consider it truth.

Then you have the “hard ways” that people typically use — and that Aaron, again, covered just fine in his post on the subject, so I won’t spend any more time there.

But what if I want to count rows matching a where clause (a predicate)?  And in dealing with a typical history/transaction table, the predicate is almost always “between such and such dates”.  (Of course we won’t literally use the between operator, as we have been chastised severely; we know best to use >= and < !)  Also, I want to know the MIN and MAX of said dates in the table.  Lord knows we’re gonna be doing some table-scanning.

B. Ok, Let’s Try an Index

In their benevolent wisdom, the SQL deities decided not to give us an index on WideWorldImporters.Warehouse.StockItemTransactions.TransactionOccurredWhen.

BTW, how’s that for a verbose column name?  What, TransactionDate not good enough?  I suppose it isdatetime2 after all, but still…

So we create an index on it, to see if that helps our poor “count where dates” query.  And behold, it does!  We’ve cut our # of logical reads down by about 90% (from 1900 to 200, if you’re following along in the script).  That’s fantastic, but… we can do better.  Because if the table is, say, 500 million rows instead of 260k, that’s about 400,000 logical reads, which.. could definitely suck.

C. The Better Way

Again, the script has an ode to Aaron’s query on sys.partitions/tables to get the row-count from the meta-data.  Then the real fun begins.

There’s a system DMV (or probably ‘DMF‘ – dynamic management function) called sys.dm_db_stats_histogram, which takes the table’s object_id and the index’s index_id as arguments.  It gives you, obviously enough, the statistics histogram of the statistics object corresponding to that index.  We want to store its output in a temp-table (or even a real table — go nuts!) so we can query it some more.

--For example, if our new index is index_id 7:
sys.dm_db_stats_histogram(OBJECT_ID('Warehouse.StockItemTransactions'), 7)

So we create our #StatsHist table (“hist” being an abbreviation for “histogram”, not “history”, though in retrospect that’s probably not worth the possible confusion), and we populate it with the meta-data from Warehouse.StockItemTransactions and its new index that we just created (on TransactionOccurredWhen).  Poof!  We have an easy way of showing min/max values in that column!  Well… almost.  We have to convert the variant datatype to an understandable & aggregate-able (probably a made-up word.. aggregable? aggregatable?) type.  So we add a column range_hk_proper of type datetime2 and populate it with the converted values of range_high_key from the stats-output.

There!  Now we’re cookin’ with gas.  Our min/max/count query, and our “count where date-range” query, run in mere milliseconds, without ever touching the actual source table.  So we don’t lock it up or block anybody else from writing to it, even in the most pessimistic isolation levels.

Except when you created that index we needed on the date column.

Yes, I know.  What we’re hoping is that the tables we deal with in the “real world” already have such an index that we can take advantage of.  If not, well, that’s what maintenance windows are for.  And you better believe you’re gonna need that index sooner or later.

cooking-with-mustard-gas
Been a while since I used a Family Guy meme….

Where To Next?

Ostensibly, this whole thing could probably be turned into a stored-proc so you could run it “on demand” for any table that had a date or datetime column which you wanted to get such information about.  It’d have to do a lot of error-checking, of course — it wouldn’t work if you don’t have such a column, and if there’s no index on it, and probably a myriad of other ‘gotchas’ that I’m not thinking of at the moment.  But I did try to lay the groundwork for improvement. #StatsHist stores schema & table name too, so if you felt like turning it into a mini-data-warehouse holding a BUNCH of stat-histograms for a whole mess of tables, you could definitely do that.  And then you could run some basic analytics on it — min/max/avg, counts by year/month/day, etc.

Sounds like fun, no?  ;o)

T-SQL Tuesday #98: Orphaned Users Redux

It’s that time again!  The first #Tsql2sday of 2018.  Thanks to the Blobeater for this month’s invite: “your technical challenges conquered”.

Because I’m already ridiculously late, I have a short one.  This is about orphaned users — you know, when you restore a database and its users aren’t mapped to the server logins that they should be or used to be.

orphan-movie-poster
Not that kind of orphan… pretty decent movie tho!

The typical solution is sp_change_users_login with the auto_fix or update_one option.  But guess what?  Yep, that’s deprecated.  By the way, did you know that it also has a report option?  Apparently that’s got some bugs…ish?  Try it sometime and see — compare the output of sys.sp_helpuser where the ‘LoginName’ column is null, with sp_change_users_login 'report'.  Preferably on a DB you’ve restored from another server.  😉

So what’s the correct solution?  ALTER USER [theUser] WITH LOGIN = [theLogin].  Simple, no?  Let’s get more general.  Could we come up with a half-decent way do apply this kind of fix dynamically?  Well sure, the nice folks at DBATools have already solved that problem.  And that’s great, really.  But just in case that doesn’t work… ^_^

One of the many things I love about SQL Prompt is the right-click option to “Script as INSERT” (from the results grid).  This is a quick & easy way to built a temp-table for the results of an exec statement so you can do the ol’ insert #tmp exec sys.sp_blah !  Then we can query the list of DB users for the null LoginNames and write a little set of queries to fix them!  Sound good?

UPDATE: Behold the code!

Happy Tuesday!

PS: Coincidentally, today’s (Thursday) SQL Server Central newsletter featured a very similar post by a gentleman over at Madeira Data.  Go check it out, it’s another great solution to this problem!  And while you’re at it, get the SQL Server Radio podcast (created by a couple guys from the same company) – it’s a terrific addition to your iTunes library.

Finding Trigger Dependencies

Preamble

In SQL, TRIGGERs are often a subject of contention.  Some data professionals have strong opinions about them, especially on the “they’re bad!” side.  But like anything, they’re a specific kind of tool that has a place in a specific problem.  One of the common scenarios for their usage is what we call an “Audit trail” — a running log of any & all changes made to a table or data-set (or even a database schema, as RedGate DLM Dashboard employs).

Yes, there are newer product features built around this very requirement, such as Change Data Capture (CDC) and Temporal Tables, but they require particular versions and editions of the product, and not every environment is going to be there.  So we need something native and compatible to the lowest common/supportable denominator.

Like most areas of the database environment, triggers are often poorly documented.  So what happens when we need to review all the triggers in a database and show what tables they’re on & what columns they’re “tracking” (sticking with the audit-trail example)?  Or, more generally, you could say, we want to find all table & column dependencies of our triggers.  Well sure, we could use the SSMS Object Explorer GUI — find the table, the triggers, right-click and “Show Dependencies”.. but that’s not good enough, for two reasons: A) it doesn’t get down to the column level, and B) it’s not script-able.

System catalogs and DMVs to the rescue!

dmv-line-of-people
No, not THAT DMV…

Disclaimer: this is heavy on MS Docs links.  There was a StackOverflow Q/A that helped point me in that direction, but sadly I lost the tab and didn’t save the link.

The Goals

In this script, I mostly focus on the UPDATE triggers, because they’re the most heavy-handed in terms of auditing.  Whereas with a typical INSERT or DELETE, the triggered audit action is basically “Hey, a row was inserted/deleted, here’s its primary key” — and sure you can get fancier and log what all the values were in the entire row — those are fairly straightforward.  It’s basically “all or nothing”.  In contrast, with an UPDATE audit, we want to know just what was updated, and from what (old value) to what (new value).

So the goal of documenting the dependencies is to see which columns we care about — because, inevitably, there are some columns that we either “don’t care about” for auditing, or some that just plain don’t change (by their nature — they’re an identity or a primary key, or controlled via other means).  In turn, this helps us understand how to reproduce these audit mechanisms when a better technology comes along — such as CDC or temporal tables, as alluded to earlier.

cdc-centers-for-disease-control
No, not THAT CDC… yeesh.

The Walkthru

We start with the system catalog views sys.tables and sys.triggers.  This establishes our primary relationship — a trigger ‘lives’ on (is attached to) a single table.  Then we immediately step into murky water.  A trigger, like other programmable objects, has a definition script, which you can view in a myriad of ways — Object Explorer, sys.sp_helptext, sys.sql_modules, sys.syscomments (deprecated), or OBJECT_DEFINITION(OBJECT_ID).  So if we really wanted to go spelunking, we could just read the trigger’s definition and see what columns it deals with.  But after about 2 or 3 of those, our eyes are going to glaze over and our fingers will start to twitch.  We need something better, something that can handle hundreds of triggers and output a nice unified result!

Oh wait, what’s this?  Object Explorer has a “View Dependencies” function.. aw, bummer, it doesn’t show columns.  Just tables.  As we know, most of the GUI functionality is driven by existing system meta-data/catalogs/dmv’s, so there’s definitely something that keeps track of these dependencies.  Let’s hazard a guess… sys.sp_depends or sys.sql_expression_dependencies?  The former is deprecated, the latter doesn’t give us a way to get to columns either.  But thankfully, on the docs, the deprecated note points us toward the correct path: sys.dm_sql_referenced_entities and sys.dm_sql_referencing_entities.

Now we’re cooking with gas!  So how do I use these suckers?  Well, there’s a couple intermediary joins I need to go thru, and in fact these are functions, not views, so we need to know the proper arguments.  But fear not, I have done the hard work for you!  Check out the code and let me know what you think.

But wait, there’s more!  There’s a slight monkey-wrench here.  You see, in the column dependencies, we’re getting both the “audited” columns and the “key” columns.  But we’d like those to be distinguished from each other.  When we document and/or replicate the auditing using newer technology, the primary keys aren’t what we need to track (they’re in a sense “given”, known) — we need to know which columns are “audit-able” and which aren’t.  Hence the two queries with a UNION ALL.

union-vs-confederate
That is ALSO inappropriate. Good lord, who’s choosing these pictures today?

Conclusion

So there you have it, documenting some audit-trail triggers.  Or triggers in general.  Complete with the tables & columns on which they depend.  Enjoy!  ‘Til next time folks.

Indexing a HUGE Table

How it lost its original clustering key is a perfect topic for this month’s T-SQL Tuesday!

tsql2sday150x150

As previously referenced, we had a half-billion row table (which we’ll “round up” to a billion, just for the sake of argument) that needed an index.  A clustering key, to be exact.  How it lost its original clustering key isn’t that interesting is a perfect topic for this month’s T-SQL Tuesday (we planned to replace it, before realizing how long it would take, and had to cancel the job after it was removed but before it could finish adding the new one).  Anybody can drop an index; it takes mere milliseconds.  But to create (or rebuild) an index, the SQL engine needs to touch every single row that index will include, and of course for a clustered index, that’s errverybody.

So the “Lessons learned the hard way” are:

When planning an index replacement for a billion-row table, don’t just schedule the job as “drop old index, then create new index“, and expect nothing to go wrong.

..and..

Don’t try to index a billion-row table all at once.  Use a smarter methodology.  Unless you can literally afford to have that table be offline for several hours (depending on your hardware, and assuming you’re on Standard Edition.

Of course, some of you crazy kids with Expensive Edition are scoffing and thinking “Oh that’s nothin’!”, with your billions of rows and online index rebuilds.  Well you can go back to your fancy Always Encrypted data and your terabytes of RAM and just pretend I said “trillion” instead of “billion” — maybe that’ll fit your scale a bit better.  But for the rest of us mere mortals…

one billion records dr evil
it’s yuuge, I’m telling you…

Anyway, since we’re peons, we can’t just go create the billion-row index without taking the table offline.  So we need to come up with a way to minimize that downtime for the table, and avoid causing excessive blocking or I/O overload on the instance.  As I said before, SSIS to the rescue!

The Plan

i love it when a plan comes together
classic… plain and simple.

I call this the “setup, dump, & swap”.  Essentially we need to create an empty copy of the table, with the desired index(es), dump all the data into it, and then swap it in.  There are couple ways you can do this, but it boils down to the same basic premise: It’s “better” (probably not in terms of speed, but definitely in terms of efficiency and overhead) to fill this new copy of the table & its indexes, than it is to build the desired index on the existing table.

So here’s an outline:

  1. Script out the table (SSMS, right-click, script table, create to… or, if you use one of these cool extensions, F12 or similar “get definition” shortcut) — say, if the original is MyTable, script-create & replace MyTable with MyTableCopy
  2. Here’s a little room for choice.  You could create the table in the same schema with a new name; or, you could create the table in a different schema, with the same name or another name.  This will determine how you do the “swap” toward the end.
    • In the first case, we’d use sp_rename
    • In the 2nd, we’d use alter schema transfer
    • Both are essentially meta-data changes, but the latter could be potentially take a hair longer just because it pulls more strings; whereas the former requires taking care of dependencies ahead of time so you don’t break a schema-bound view or orphan a foreign key.
  3. Add the desired index(es), e.g. create clustered index CX_MyTableCopy_Datestamp_ThingName on dbo.MyTableCopy (Datestamp, ThingName)
  4. Build the SSIS task to copy the data from MyTable to MyTableCopy
  5. Schedule said task via SQL Agent.
  6. Use the previous tip to monitor its progress and estimate time to completion!
  7. Once done, prepare to swap!
    • Again, use sp_rename, and if needed, alter schema transfer.
    • Likely, this will involve several renames – the constraints and other indexes can’t be named the same either, so get all that stuff renamed with an _old suffix first, then you can swap the actual tables.
  8. Clean up after yourself and drop the old table once you verify everything’s working well and all dependencies are accounted for.

But Why?

can-you-tell-me-why
Fry shares your curiosity…

Let’s back up a bit.  Why are we doing it this way?  More importantly, what are the general use-cases for this kind of thing?  Well as I said, it’s “better” in certain ways, than simply creating the index on the “live” T.O.U.S.  It avoids locking said live table, and it has the potential to put less stress on the database and less synchronous I/O against its primary files.  Finally, and most importantly, as a bulk insert operation, with configurable batch sizing, the SSIS task will put exponentially less load on the transaction log.  Whereas, with the regular inline index creation, it could easily fill up the Tlog and will definitely cause performance issues.

Furthermore, swapping tables, or in more advanced cases, partitions, is a fantastic way to move massive amounts of data around with minimal impact.  It also happens to be quite helpful when we have to deal with indexing such massive amounts of data.

In my case, this was actually an archive of 2016 audit-trail data, so if I’d have chose, I could have taken it out of the partitioned view it was part of, and dealt with it being offline for several hours.  But because I’m a sucker for a good problem, and because I wanted to be able to semi-accurately monitor the progress to let the boss-man know when it might be done, I chose to take this route instead.  It’ll come in handy again soon, I’m sure — we have several other jumbo-tron tables laying about that may need some index tuning and/or partitioning.

So, happy swapping, and don’t let those T.O.U.S.‘s get the best of you!

rodent of unusual size
Meh, I don’t think they even exist!

 

PS: thanks to @SQLDoubleG for hosting, and thanks in advance to the rest of the community being lenient of my “submission” being a re-purposed existing post with a few extra blurbs thrown in to bring it on-topic!  =D