A SQL “Whodunnit” Trigger

Triggers aren’t bad, if used for the right reasons.. Here we look at an “audit-trail” use-case.

Advertisements

Inspired by a brief conversation in the #CodingBlocks community Slack: A short discussion and example of a “who-dunnit” (“who done it”, a colloquialism for a murder-mystery type thing) trigger, to find how what user is doing deletions against a certain table.

the cast of CSI LV
The original, or nothing.

The Background Check

Let’s name our hypothetical database CSI.  In it, we have a table, dbo.Victims, where it seems like data is being randomly deleted at random times.  As we all know, this is impossible — computers never do anything truly randomly, much less RDBMSes.

Insert witty counter-example here.  You know you have one.

So we want to find out who’s doing these deletions.  One DBA says, “Hey I got an idea… Let’s put an after delete trigger on the table!”  Another DBA says “I abhor triggers; let’s log sp_WhoIsActive every 5 seconds to try to catch the suspect ‘in-the-act’.”

Both approaches have their merits, and neither is that uncommon.  However, the latter is much more regularly blogged about, so I’m going to present the former, because it kinda helped remind me of a few things that I hadn’t used in a while.  I’d also argue that the latter is much less of a “guaranteed capture”, since you’re gambling pretty liberally on the fact that the delete transaction will even last that long; it’s statistically more likely that you’ll miss it.

The Setup

Here’s a SQL snippet that shows a basic after delete trigger created on our dbo.Victims table.  Notice the use of the special Deleted table reference — this is a “temporary, memory-resident” table according to the Docs, and it holds all the records that were/are-about-to-be deleted from the target table.

I feel like it used be called a “temporal table”, but that now refers to a new feature in 2016, where SQL keeps a hidden history-tracking copy of your table that you can reference like a time-machine; which, incidentally, almost* negates the need for such things as these triggers we’re talking about, but that’s another topic for another time.

*(The ‘almost’ is because temporal tables don’t tell you “WHO”, which is our primary motivator here.)

The interesting bits are how we identify our suspect, our ‘killer’ if you will.  See, we not only want to know who they are in the database context, we also (and likely, more importantly) want to know who they are at the server level context.  And just in case they’re impersonating another login, we want to check that too.

So we actually have a lot of options here.  There’s CURRENT_USER or USER_NAME(), for the DB context user.  Then we have SUSER_SNAME(), SUSER_NAME(), SYSTEM_USER, and ORIGINAL_LOGIN() for the server context.  If you’re curious, you could also get things like @@SPID (server session id), SUSER_ID() (server login id), and SESSION_USER (database session user).

ORIGINAL_LOGIN() may be the most potentially interesting, especially if we want to write our trigger with elevated (impersonated) permissions to be able to write to the logging table that we’ve set up to capture its detective-work.  I did not need it for this example, but it’s worth keeping in mind.

why dont you take a seat over there
We’ve been watching you…

The Sting

So we’ve got our evidence table, we’ve got our detective trigger, now we just need a suspect.  Thankfully we can test it out first, to make sure our operation will succeed when the real perp comes along.  We can do this, of course, by impersonation.  Or by using different SSMS query-windows with different logins — your choice.

Our faux-suspect’s login name is DummySuspect.  We map him to the db_datawriter and db_datareader roles in our CSI database — we know the real perp at least has write permission on the table dbo.Victims, otherwise he/she wouldn’t be able to delete those poor victim rows!  And we’re probably the db_owner, which is fine.  Let’s call our own login SergeantX.

Now we can pretend to be DummySuspect and execute a DELETE against CSI.dbo.Victims , and make sure it writes to our auditing table, which we called  aud.Evidence.

Yes, in practice, we’d probably want to put our Evidence table in a separate database, to really ensure those pesky Suspects can’t update it or delete from it, i.e. “cover their tracks” — here, I’ve settled for simply using a different schema, to keep the example workable.  Otherwise we’d have to deal with cross-DB permissions and such, which goes beyond the scope of one little blog post.

Ready?  Let’s try it!

The Proof

Go on over to the GitHub repo and check out the code.  There are 3 easy steps – ‘Step 1’, create the objects, including a new DB to house them called CSI.  You’ll see the trigger in there as well.  Then you can try ‘Step 2’, where I impersonate DummySuspect and delete a row from Victims, and then check the Evidence log when done.  And finally, ‘Step 3’ is a similar test, but assumes that you’ve actually connected that SSMS window/tab/query AS DummySuspect instead of impersonating him (or her!).  After you’ve done that, check out aud.Evidence again to make sure it logged the 2nd delete.

And there you have it.  A simple example of how to write and test an after delete trigger that writes the action info to a separate auditing table for investigation.

Hope you enjoyed!  Leave a comment here or on GitHub; I welcome all feedback.

One more thing…

Apparently I’ve been fork‘d!  Someone on GitHub liked my take on the Nested Set Model so much that they decided to pull it into their own library and have a play with it.  Yay!!  😀   Whoever you are, thank you and good luck building with it.  Enjoy!

someone forked my sql nested set model on github
Mr. Murray, thanks and have fun!

T-SQL Tuesday #99: Counting Rows the Less-Hard-Way

We can get our row count, and min & max date values, without ever touching the actual source table!

This month’s invite courtesy of Aaron Bertrand (B | T), whose “bad habits” blog series still inspires many an impassioned debate or engaging argument discussion on a regular basis among DBAs & Developers alike.

And yes, I’m taking the easier of the two “dealer’s choice” choices — the SQL focused one.  (I’m not big on sharing/blogging personal stuff, at least not here; I may one day start another blog for that, or perhaps just occasionally post more #off-topic stuff , but for now you’ll have to be content with my stories of vehicle troubles and the occasional movie-geekery).

So, without further ado…

By the way, what is ‘ado’ and why should there be no further of it?

art-vandelay-importer-exporter
Accidentally apropos on many levels…

Counting Rows in Really Big Tables

Previously touched on here, tables of unusual size (TOUSes) can be tricky.  You don’t want to lock them up for a long period of time, but you often need to gather information about them (such as row count, size, range of values) to perform some kind of operational maintenance with/on them.  In particular, Aaron’s post on “counting rows the hard way” inspired me to look into this a bit more and try to come up with a clever-ish way of finding out some basic “shape of data” info without actually querying (scanning) the table itself.

To start with, it’s actually really simple to get the total row-count from a few system catalog views — Aaron’s already shown you that, so I won’t repeat.  My interest is more in questions like “How many rows match a where-clause?” or “What are the min & max values for thatColumn?”

For this post, I’ll be focusing on a particular kind of table — the “history” or “transaction” table.  The idea here is that you have a record of “every time some event happens in/to some entity”.  A very common example is audit-trail tables, which I’ve been dealing a lot with lately.  Another common example is a “transaction history” table, such as, in our new favorite MSSQL demo database WideWorldImporters, the table Warehouse.StockItemTransaction​.  It’s the 2nd largest table in the db at 260-some-thousand rows.  {The largest is a multi-million-row beast that is actually the system-versioned aka temporal table behind a “normal” table; I might build a phase-2 example around this, but not today.}  So, while our queries won’t be super slow, we’ll get enough of an idea of what’s bad & good from measuring our IO stats (with SET STATISTICS IO ON).

TL;DR: The demo script is available here; the headers below correspond to the comment-lines of the same name, but I’ve left enough commentary in the SQL itself to keep the average reader on-track, so feel free to check it out ahead of time.  But do keep reading at some point!  :o)

A. Gathering Some Intel

First up, we have good ol’ sys.sp_spaceused.  This gives you some sizing info about the table, including its row count.  There’s a “disclaimer” circulating around out there that it’s not “up to the millisecond accurate” , i.e. it might not have the most current row count if someone else is in the middle of an insert operation or whatnot.  But for all intents & purposes, you can consider it truth.

Then you have the “hard ways” that people typically use — and that Aaron, again, covered just fine in his post on the subject, so I won’t spend any more time there.

But what if I want to count rows matching a where clause (a predicate)?  And in dealing with a typical history/transaction table, the predicate is almost always “between such and such dates”.  (Of course we won’t literally use the between operator, as we have been chastised severely; we know best to use >= and < !)  Also, I want to know the MIN and MAX of said dates in the table.  Lord knows we’re gonna be doing some table-scanning.

B. Ok, Let’s Try an Index

In their benevolent wisdom, the SQL deities decided not to give us an index on WideWorldImporters.Warehouse.StockItemTransactions.TransactionOccurredWhen.

BTW, how’s that for a verbose column name?  What, TransactionDate not good enough?  I suppose it isdatetime2 after all, but still…

So we create an index on it, to see if that helps our poor “count where dates” query.  And behold, it does!  We’ve cut our # of logical reads down by about 90% (from 1900 to 200, if you’re following along in the script).  That’s fantastic, but… we can do better.  Because if the table is, say, 500 million rows instead of 260k, that’s about 400,000 logical reads, which.. could definitely suck.

C. The Better Way

Again, the script has an ode to Aaron’s query on sys.partitions/tables to get the row-count from the meta-data.  Then the real fun begins.

There’s a system DMV (or probably ‘DMF‘ – dynamic management function) called sys.dm_db_stats_histogram, which takes the table’s object_id and the index’s index_id as arguments.  It gives you, obviously enough, the statistics histogram of the statistics object corresponding to that index.  We want to store its output in a temp-table (or even a real table — go nuts!) so we can query it some more.

--For example, if our new index is index_id 7:
sys.dm_db_stats_histogram(OBJECT_ID('Warehouse.StockItemTransactions'), 7)

So we create our #StatsHist table (“hist” being an abbreviation for “histogram”, not “history”, though in retrospect that’s probably not worth the possible confusion), and we populate it with the meta-data from Warehouse.StockItemTransactions and its new index that we just created (on TransactionOccurredWhen).  Poof!  We have an easy way of showing min/max values in that column!  Well… almost.  We have to convert the variant datatype to an understandable & aggregate-able (probably a made-up word.. aggregable? aggregatable?) type.  So we add a column range_hk_proper of type datetime2 and populate it with the converted values of range_high_key from the stats-output.

There!  Now we’re cookin’ with gas.  Our min/max/count query, and our “count where date-range” query, run in mere milliseconds, without ever touching the actual source table.  So we don’t lock it up or block anybody else from writing to it, even in the most pessimistic isolation levels.

Except when you created that index we needed on the date column.

Yes, I know.  What we’re hoping is that the tables we deal with in the “real world” already have such an index that we can take advantage of.  If not, well, that’s what maintenance windows are for.  And you better believe you’re gonna need that index sooner or later.

cooking-with-mustard-gas
Been a while since I used a Family Guy meme….

Where To Next?

Ostensibly, this whole thing could probably be turned into a stored-proc so you could run it “on demand” for any table that had a date or datetime column which you wanted to get such information about.  It’d have to do a lot of error-checking, of course — it wouldn’t work if you don’t have such a column, and if there’s no index on it, and probably a myriad of other ‘gotchas’ that I’m not thinking of at the moment.  But I did try to lay the groundwork for improvement. #StatsHist stores schema & table name too, so if you felt like turning it into a mini-data-warehouse holding a BUNCH of stat-histograms for a whole mess of tables, you could definitely do that.  And then you could run some basic analytics on it — min/max/avg, counts by year/month/day, etc.

Sounds like fun, no?  ;o)

T-SQL Tuesday #98: Orphaned Users Redux

It’s that time again!  The first #Tsql2sday of 2018.  Thanks to the Blobeater for this month’s invite: “your technical challenges conquered”.

Because I’m already ridiculously late, I have a short one.  This is about orphaned users — you know, when you restore a database and its users aren’t mapped to the server logins that they should be or used to be.

orphan-movie-poster
Not that kind of orphan… pretty decent movie tho!

The typical solution is sp_change_users_login with the auto_fix or update_one option.  But guess what?  Yep, that’s deprecated.  By the way, did you know that it also has a report option?  Apparently that’s got some bugs…ish?  Try it sometime and see — compare the output of sys.sp_helpuser where the ‘LoginName’ column is null, with sp_change_users_login 'report'.  Preferably on a DB you’ve restored from another server.  😉

So what’s the correct solution?  ALTER USER [theUser] WITH LOGIN = [theLogin].  Simple, no?  Let’s get more general.  Could we come up with a half-decent way do apply this kind of fix dynamically?  Well sure, the nice folks at DBATools have already solved that problem.  And that’s great, really.  But just in case that doesn’t work… ^_^

One of the many things I love about SQL Prompt is the right-click option to “Script as INSERT” (from the results grid).  This is a quick & easy way to built a temp-table for the results of an exec statement so you can do the ol’ insert #tmp exec sys.sp_blah !  Then we can query the list of DB users for the null LoginNames and write a little set of queries to fix them!  Sound good?

UPDATE: Behold the code!

Happy Tuesday!

PS: Coincidentally, today’s (Thursday) SQL Server Central newsletter featured a very similar post by a gentleman over at Madeira Data.  Go check it out, it’s another great solution to this problem!  And while you’re at it, get the SQL Server Radio podcast (created by a couple guys from the same company) – it’s a terrific addition to your iTunes library.

Finding Trigger Dependencies

Preamble

In SQL, TRIGGERs are often a subject of contention.  Some data professionals have strong opinions about them, especially on the “they’re bad!” side.  But like anything, they’re a specific kind of tool that has a place in a specific problem.  One of the common scenarios for their usage is what we call an “Audit trail” — a running log of any & all changes made to a table or data-set (or even a database schema, as RedGate DLM Dashboard employs).

Yes, there are newer product features built around this very requirement, such as Change Data Capture (CDC) and Temporal Tables, but they require particular versions and editions of the product, and not every environment is going to be there.  So we need something native and compatible to the lowest common/supportable denominator.

Like most areas of the database environment, triggers are often poorly documented.  So what happens when we need to review all the triggers in a database and show what tables they’re on & what columns they’re “tracking” (sticking with the audit-trail example)?  Or, more generally, you could say, we want to find all table & column dependencies of our triggers.  Well sure, we could use the SSMS Object Explorer GUI — find the table, the triggers, right-click and “Show Dependencies”.. but that’s not good enough, for two reasons: A) it doesn’t get down to the column level, and B) it’s not script-able.

System catalogs and DMVs to the rescue!

dmv-line-of-people
No, not THAT DMV…

Disclaimer: this is heavy on MS Docs links.  There was a StackOverflow Q/A that helped point me in that direction, but sadly I lost the tab and didn’t save the link.

The Goals

In this script, I mostly focus on the UPDATE triggers, because they’re the most heavy-handed in terms of auditing.  Whereas with a typical INSERT or DELETE, the triggered audit action is basically “Hey, a row was inserted/deleted, here’s its primary key” — and sure you can get fancier and log what all the values were in the entire row — those are fairly straightforward.  It’s basically “all or nothing”.  In contrast, with an UPDATE audit, we want to know just what was updated, and from what (old value) to what (new value).

So the goal of documenting the dependencies is to see which columns we care about — because, inevitably, there are some columns that we either “don’t care about” for auditing, or some that just plain don’t change (by their nature — they’re an identity or a primary key, or controlled via other means).  In turn, this helps us understand how to reproduce these audit mechanisms when a better technology comes along — such as CDC or temporal tables, as alluded to earlier.

cdc-centers-for-disease-control
No, not THAT CDC… yeesh.

The Walkthru

We start with the system catalog views sys.tables and sys.triggers.  This establishes our primary relationship — a trigger ‘lives’ on (is attached to) a single table.  Then we immediately step into murky water.  A trigger, like other programmable objects, has a definition script, which you can view in a myriad of ways — Object Explorer, sys.sp_helptext, sys.sql_modules, sys.syscomments (deprecated), or OBJECT_DEFINITION(OBJECT_ID).  So if we really wanted to go spelunking, we could just read the trigger’s definition and see what columns it deals with.  But after about 2 or 3 of those, our eyes are going to glaze over and our fingers will start to twitch.  We need something better, something that can handle hundreds of triggers and output a nice unified result!

Oh wait, what’s this?  Object Explorer has a “View Dependencies” function.. aw, bummer, it doesn’t show columns.  Just tables.  As we know, most of the GUI functionality is driven by existing system meta-data/catalogs/dmv’s, so there’s definitely something that keeps track of these dependencies.  Let’s hazard a guess… sys.sp_depends or sys.sql_expression_dependencies?  The former is deprecated, the latter doesn’t give us a way to get to columns either.  But thankfully, on the docs, the deprecated note points us toward the correct path: sys.dm_sql_referenced_entities and sys.dm_sql_referencing_entities.

Now we’re cooking with gas!  So how do I use these suckers?  Well, there’s a couple intermediary joins I need to go thru, and in fact these are functions, not views, so we need to know the proper arguments.  But fear not, I have done the hard work for you!  Check out the code and let me know what you think.

But wait, there’s more!  There’s a slight monkey-wrench here.  You see, in the column dependencies, we’re getting both the “audited” columns and the “key” columns.  But we’d like those to be distinguished from each other.  When we document and/or replicate the auditing using newer technology, the primary keys aren’t what we need to track (they’re in a sense “given”, known) — we need to know which columns are “audit-able” and which aren’t.  Hence the two queries with a UNION ALL.

union-vs-confederate
That is ALSO inappropriate. Good lord, who’s choosing these pictures today?

Conclusion

So there you have it, documenting some audit-trail triggers.  Or triggers in general.  Complete with the tables & columns on which they depend.  Enjoy!  ‘Til next time folks.

Indexing a HUGE Table

How it lost its original clustering key is a perfect topic for this month’s T-SQL Tuesday!

tsql2sday150x150

As previously referenced, we had a half-billion row table (which we’ll “round up” to a billion, just for the sake of argument) that needed an index.  A clustering key, to be exact.  How it lost its original clustering key isn’t that interesting is a perfect topic for this month’s T-SQL Tuesday (we planned to replace it, before realizing how long it would take, and had to cancel the job after it was removed but before it could finish adding the new one).  Anybody can drop an index; it takes mere milliseconds.  But to create (or rebuild) an index, the SQL engine needs to touch every single row that index will include, and of course for a clustered index, that’s errverybody.

So the “Lessons learned the hard way” are:

When planning an index replacement for a billion-row table, don’t just schedule the job as “drop old index, then create new index“, and expect nothing to go wrong.

..and..

Don’t try to index a billion-row table all at once.  Use a smarter methodology.  Unless you can literally afford to have that table be offline for several hours (depending on your hardware, and assuming you’re on Standard Edition.

Of course, some of you crazy kids with Expensive Edition are scoffing and thinking “Oh that’s nothin’!”, with your billions of rows and online index rebuilds.  Well you can go back to your fancy Always Encrypted data and your terabytes of RAM and just pretend I said “trillion” instead of “billion” — maybe that’ll fit your scale a bit better.  But for the rest of us mere mortals…

one billion records dr evil
it’s yuuge, I’m telling you…

Anyway, since we’re peons, we can’t just go create the billion-row index without taking the table offline.  So we need to come up with a way to minimize that downtime for the table, and avoid causing excessive blocking or I/O overload on the instance.  As I said before, SSIS to the rescue!

The Plan

i love it when a plan comes together
classic… plain and simple.

I call this the “setup, dump, & swap”.  Essentially we need to create an empty copy of the table, with the desired index(es), dump all the data into it, and then swap it in.  There are couple ways you can do this, but it boils down to the same basic premise: It’s “better” (probably not in terms of speed, but definitely in terms of efficiency and overhead) to fill this new copy of the table & its indexes, than it is to build the desired index on the existing table.

So here’s an outline:

  1. Script out the table (SSMS, right-click, script table, create to… or, if you use one of these cool extensions, F12 or similar “get definition” shortcut) — say, if the original is MyTable, script-create & replace MyTable with MyTableCopy
  2. Here’s a little room for choice.  You could create the table in the same schema with a new name; or, you could create the table in a different schema, with the same name or another name.  This will determine how you do the “swap” toward the end.
    • In the first case, we’d use sp_rename
    • In the 2nd, we’d use alter schema transfer
    • Both are essentially meta-data changes, but the latter could be potentially take a hair longer just because it pulls more strings; whereas the former requires taking care of dependencies ahead of time so you don’t break a schema-bound view or orphan a foreign key.
  3. Add the desired index(es), e.g. create clustered index CX_MyTableCopy_Datestamp_ThingName on dbo.MyTableCopy (Datestamp, ThingName)
  4. Build the SSIS task to copy the data from MyTable to MyTableCopy
  5. Schedule said task via SQL Agent.
  6. Use the previous tip to monitor its progress and estimate time to completion!
  7. Once done, prepare to swap!
    • Again, use sp_rename, and if needed, alter schema transfer.
    • Likely, this will involve several renames – the constraints and other indexes can’t be named the same either, so get all that stuff renamed with an _old suffix first, then you can swap the actual tables.
  8. Clean up after yourself and drop the old table once you verify everything’s working well and all dependencies are accounted for.

But Why?

can-you-tell-me-why
Fry shares your curiosity…

Let’s back up a bit.  Why are we doing it this way?  More importantly, what are the general use-cases for this kind of thing?  Well as I said, it’s “better” in certain ways, than simply creating the index on the “live” T.O.U.S.  It avoids locking said live table, and it has the potential to put less stress on the database and less synchronous I/O against its primary files.  Finally, and most importantly, as a bulk insert operation, with configurable batch sizing, the SSIS task will put exponentially less load on the transaction log.  Whereas, with the regular inline index creation, it could easily fill up the Tlog and will definitely cause performance issues.

Furthermore, swapping tables, or in more advanced cases, partitions, is a fantastic way to move massive amounts of data around with minimal impact.  It also happens to be quite helpful when we have to deal with indexing such massive amounts of data.

In my case, this was actually an archive of 2016 audit-trail data, so if I’d have chose, I could have taken it out of the partitioned view it was part of, and dealt with it being offline for several hours.  But because I’m a sucker for a good problem, and because I wanted to be able to semi-accurately monitor the progress to let the boss-man know when it might be done, I chose to take this route instead.  It’ll come in handy again soon, I’m sure — we have several other jumbo-tron tables laying about that may need some index tuning and/or partitioning.

So, happy swapping, and don’t let those T.O.U.S.‘s get the best of you!

rodent of unusual size
Meh, I don’t think they even exist!

 

PS: thanks to @SQLDoubleG for hosting, and thanks in advance to the rest of the community being lenient of my “submission” being a re-purposed existing post with a few extra blurbs thrown in to bring it on-topic!  =D

Quickie: Timing a HUGE Data Copy Operation

We think the total operation will take several hours, possibly days.  So we want to be able to check up on it.

While I try to compose my resources and samples for the previous two teasers, I found this to be a particularly interesting problem and thought I’d share my solution.

Scenario

trump it's gonna be yuge
yuuuuge

We have a very large table, half a billion rows.  It needs to be copied (well actually, indexed, but that’s another part of the discussion).  We know that standard TSQL would suck for this, even if we wrote a batch-loop proc for it (sure, it might be less blocking, but it’ll take forever).  Plus, we might be going between two different servers (well, not in my case, but you very well could be, and in fact that’s probably the more likely case, given the implications of such a large operation).  SSIS to the rescue!

Now, again, we’re lazy, so we use the Import/Export Data wizard (a component of SSIS) to build the .dtsx package.  Then we can schedule it via a SQL Agent Job.

Sure, we could build it with Visual Studio and/or BIML and be super-awesome, but where’s the fun lazy in that?

Based on some preliminary estimates of the throughput using SSIS for a small sub-set of the data, we think the total operation will take several hours, possibly days.  So we want to be able to check up on it — to see its status & elapsed time, and get an estimate of time remaining.  Sound good?

Assumptions

assumptions this way
over thar!

Two assumptions going into this, to keep the example simple.

  1. We know the Agent Job’s scheduled start time, and it does start on-schedule.
  2. We’re only dealing with 2 tables — 1 source, 1 destination.  Furthermore, they’re both on the same server instance; or if not, we can query a linked-server connection once to get the “source” table size-stats, because they won’t change.

(Okay that was slightly more than 2, but again, simple.  One can imagine expanding this to multiple tables & multiple instances, but then at some point you’re going to need a “witness” that can talk to all the disparate sources of data and conglomerate those bits & pieces together, and then you have to ask yourself “is it really worth it, or can I just give a SWAG & move on with my day?”)

Before I move on: helpful SO answer that reminded me how surprisingly-not-that-difficult (and-still-pretty-efficient) it is to convert a time interval (seconds, in this case) in a “human friendly format” like Days.hh:mm:ss.  Yay.

I’m sure I’ve done it before, and subsequently forgotten about it.  Because, again, it’s one of those things that should be done in the presentation layer.

The Outline

One key component of any kind of “check-up” or general monitoring solution is, you need it to be light weight, i.e. have low overhead.  We don’t want our method of monitoring the process to add some non-trivial extra load to that process (or the servers doing said processing).  So let’s avoid COUNT(*) shall we?  Unfortunately, the top Google results don’t readily point to this, but MS actually gave us a handy-dandy built-in way of measuring this, sys.sp_spaceused.  Yay again.

Granted, it doesn’t get you “up to the millisecond” accuracy, but it’s truly close enough for what we’re doing; and frankly, it’s your only sane option when asking the question “how many rows?” of a TOUS.

So we’re going to use the output of that system proc, specifically the rows column, to measure how much data has been bulk-copied into our destination table.  We’ll also measure the source table, once, as stated, because (we’re assuming!) that won’t change.

Finally, we’ll use those row-counts, with some simple tricks & nonsense — basic math, dateadd/datediff — to calculate a percent-complete, elapsed time, estimated total time, and ETA (estimated time of arrival completion).

it's all a lot of simple tricks and nonsense
The Force? Hah!

The Code

See Gist.

In the next post, I’ll zoom out a bit and explain why I needed to do this, and in what situations it should come in handy.  Stay tuned!

Hacking DLM Dashboard, Part 2

Another teaser, just to keep you waiting… I promise I’ll finish it soon!

This hack is about change buildup/bloat.  When you ignore a particular DB’s schema drifts for a while, the amount of changes that the “Review” page needs to show gets larger and larger.  Until, eventually, there’s simply too much content for it to handle, and it dies.  This is a bummer, because you want to ack those changes, but you don’t necessarily need to see them all in detail, as long as you have a general idea of what they were about.

So, this will be a tale of how I purged a set of un-acknowledged changes from DLM Dashboard for a specific Server/Database.  RG support, bless their heart, wanted me to go to the document repo (which, if you’re not aware, uses a RavenDB back-end stored on your local file system) and purge everything.  But I said no, I think we can do better than that.

Plus, if you really really care about those changes, you can use the RavenDB Studio (GUI) to dump the documents to CSV — i.e. if this tool is part of your audit/compliance toolbox and you don’t want to lose those changes, but you need to be able to “move on with life” by acking the changes to the DB so that you can resume checking it on a regular basis without having a page-hang/crash.

This will be screenshot-heavy, censorship-heavy (to avoid divulging company secrets!), and meme-free (I know, I know, some of you will thank me and others will be a sad panda).

More to come, stay tuned!