T-SQL Tuesday #99: Counting Rows the Less-Hard-Way

We can get our row count, and min & max date values, without ever touching the actual source table!

Advertisements

This month’s invite courtesy of Aaron Bertrand (B | T), whose “bad habits” blog series still inspires many an impassioned debate or engaging argument discussion on a regular basis among DBAs & Developers alike.

And yes, I’m taking the easier of the two “dealer’s choice” choices — the SQL focused one.  (I’m not big on sharing/blogging personal stuff, at least not here; I may one day start another blog for that, or perhaps just occasionally post more #off-topic stuff , but for now you’ll have to be content with my stories of vehicle troubles and the occasional movie-geekery).

So, without further ado…

By the way, what is ‘ado’ and why should there be no further of it?

art-vandelay-importer-exporter
Accidentally apropos on many levels…

Counting Rows in Really Big Tables

Previously touched on here, tables of unusual size (TOUSes) can be tricky.  You don’t want to lock them up for a long period of time, but you often need to gather information about them (such as row count, size, range of values) to perform some kind of operational maintenance with/on them.  In particular, Aaron’s post on “counting rows the hard way” inspired me to look into this a bit more and try to come up with a clever-ish way of finding out some basic “shape of data” info without actually querying (scanning) the table itself.

To start with, it’s actually really simple to get the total row-count from a few system catalog views — Aaron’s already shown you that, so I won’t repeat.  My interest is more in questions like “How many rows match a where-clause?” or “What are the min & max values for thatColumn?”

For this post, I’ll be focusing on a particular kind of table — the “history” or “transaction” table.  The idea here is that you have a record of “every time some event happens in/to some entity”.  A very common example is audit-trail tables, which I’ve been dealing a lot with lately.  Another common example is a “transaction history” table, such as, in our new favorite MSSQL demo database WideWorldImporters, the table Warehouse.StockItemTransaction​.  It’s the 2nd largest table in the db at 260-some-thousand rows.  {The largest is a multi-million-row beast that is actually the system-versioned aka temporal table behind a “normal” table; I might build a phase-2 example around this, but not today.}  So, while our queries won’t be super slow, we’ll get enough of an idea of what’s bad & good from measuring our IO stats (with SET STATISTICS IO ON).

TL;DR: The demo script is available here; the headers below correspond to the comment-lines of the same name, but I’ve left enough commentary in the SQL itself to keep the average reader on-track, so feel free to check it out ahead of time.  But do keep reading at some point!  :o)

A. Gathering Some Intel

First up, we have good ol’ sys.sp_spaceused.  This gives you some sizing info about the table, including its row count.  There’s a “disclaimer” circulating around out there that it’s not “up to the millisecond accurate” , i.e. it might not have the most current row count if someone else is in the middle of an insert operation or whatnot.  But for all intents & purposes, you can consider it truth.

Then you have the “hard ways” that people typically use — and that Aaron, again, covered just fine in his post on the subject, so I won’t spend any more time there.

But what if I want to count rows matching a where clause (a predicate)?  And in dealing with a typical history/transaction table, the predicate is almost always “between such and such dates”.  (Of course we won’t literally use the between operator, as we have been chastised severely; we know best to use >= and < !)  Also, I want to know the MIN and MAX of said dates in the table.  Lord knows we’re gonna be doing some table-scanning.

B. Ok, Let’s Try an Index

In their benevolent wisdom, the SQL deities decided not to give us an index on WideWorldImporters.Warehouse.StockItemTransactions.TransactionOccurredWhen.

BTW, how’s that for a verbose column name?  What, TransactionDate not good enough?  I suppose it isdatetime2 after all, but still…

So we create an index on it, to see if that helps our poor “count where dates” query.  And behold, it does!  We’ve cut our # of logical reads down by about 90% (from 1900 to 200, if you’re following along in the script).  That’s fantastic, but… we can do better.  Because if the table is, say, 500 million rows instead of 260k, that’s about 400,000 logical reads, which.. could definitely suck.

C. The Better Way

Again, the script has an ode to Aaron’s query on sys.partitions/tables to get the row-count from the meta-data.  Then the real fun begins.

There’s a system DMV (or probably ‘DMF‘ – dynamic management function) called sys.dm_db_stats_histogram, which takes the table’s object_id and the index’s index_id as arguments.  It gives you, obviously enough, the statistics histogram of the statistics object corresponding to that index.  We want to store its output in a temp-table (or even a real table — go nuts!) so we can query it some more.

--For example, if our new index is index_id 7:
sys.dm_db_stats_histogram(OBJECT_ID('Warehouse.StockItemTransactions'), 7)

So we create our #StatsHist table (“hist” being an abbreviation for “histogram”, not “history”, though in retrospect that’s probably not worth the possible confusion), and we populate it with the meta-data from Warehouse.StockItemTransactions and its new index that we just created (on TransactionOccurredWhen).  Poof!  We have an easy way of showing min/max values in that column!  Well… almost.  We have to convert the variant datatype to an understandable & aggregate-able (probably a made-up word.. aggregable? aggregatable?) type.  So we add a column range_hk_proper of type datetime2 and populate it with the converted values of range_high_key from the stats-output.

There!  Now we’re cookin’ with gas.  Our min/max/count query, and our “count where date-range” query, run in mere milliseconds, without ever touching the actual source table.  So we don’t lock it up or block anybody else from writing to it, even in the most pessimistic isolation levels.

Except when you created that index we needed on the date column.

Yes, I know.  What we’re hoping is that the tables we deal with in the “real world” already have such an index that we can take advantage of.  If not, well, that’s what maintenance windows are for.  And you better believe you’re gonna need that index sooner or later.

cooking-with-mustard-gas
Been a while since I used a Family Guy meme….

Where To Next?

Ostensibly, this whole thing could probably be turned into a stored-proc so you could run it “on demand” for any table that had a date or datetime column which you wanted to get such information about.  It’d have to do a lot of error-checking, of course — it wouldn’t work if you don’t have such a column, and if there’s no index on it, and probably a myriad of other ‘gotchas’ that I’m not thinking of at the moment.  But I did try to lay the groundwork for improvement. #StatsHist stores schema & table name too, so if you felt like turning it into a mini-data-warehouse holding a BUNCH of stat-histograms for a whole mess of tables, you could definitely do that.  And then you could run some basic analytics on it — min/max/avg, counts by year/month/day, etc.

Sounds like fun, no?  ;o)

The Passing of the Torch

Did I mention documentation?

It’s always hard to say goodbye to a colleague, especially someone who’s so central and ingrained in the company lore and holds so much of the “tribal knowledge”.  Hell, I was that guy just a couple years ago.

calvin-brain-dump
Can you just leave your whole brain right there on the desk? Thanks.

So now I’ve seen a couple such old-hats move on from my current team, and seeing both sides of the proverbial torch-passing is interesting.  There’s definitely some very common, very important things that we should always do.

Documentation, documentation, and more documentation.

Indeed.  Also, finishing critical tasks, handing off in-flight projects, re-assigning tickets, talking to managers, prepping teammates for the work overflow, and cleaning out that huge buildup of clutter that you’ve collected over the years.  Virtual or physical… often both!

Unsurprisingly, where we all seem to differ widely is the human aspects.  Breaking the news, saying goodbyes, doing those last-minute get-togethers and send-offs.  What do those last few weeks and days look like?  For some, it’s just business-as-usual up to the last minute — they’re literally so busy they have little other choice.  That’s how it was with the helpdesk manager we parted with last year.  I used some of the time to put together documentation and thank-you letters, which I hope ended up being helpful.  Database diagrams were printed and taped.  Wikis were written.

But the main thing is to make sure you exchange contact info and stay in touch.  It gives the team a sense of comfort, knowing they can reach back out when those random questions that nobody’s thought about for several months resurface.

keep in touch and stay awesome
KITSA!

I’ve learned a lot from those folks that took the time to pass on their knowledge and made the effort to keep in contact.  And I appreciate them for that!  Today I’ll thank one of my exiting managers; she knows who she is.  She taught me a lot about our internal application stacks, integration and interop, company culture, tribal knowledge, and not standing for anybody’s BS, including my own.  Good luck with consulting, stay in touch, and kick some butt!

That’s all for this week.  I promise I’ll work on that “database collation problems” post soon…  :o)

T-SQL Tuesday #98: Orphaned Users Redux

It’s that time again!  The first #Tsql2sday of 2018.  Thanks to the Blobeater for this month’s invite: “your technical challenges conquered”.

Because I’m already ridiculously late, I have a short one.  This is about orphaned users — you know, when you restore a database and its users aren’t mapped to the server logins that they should be or used to be.

orphan-movie-poster
Not that kind of orphan… pretty decent movie tho!

The typical solution is sp_change_users_login with the auto_fix or update_one option.  But guess what?  Yep, that’s deprecated.  By the way, did you know that it also has a report option?  Apparently that’s got some bugs…ish?  Try it sometime and see — compare the output of sys.sp_helpuser where the ‘LoginName’ column is null, with sp_change_users_login 'report'.  Preferably on a DB you’ve restored from another server.  😉

So what’s the correct solution?  ALTER USER [theUser] WITH LOGIN = [theLogin].  Simple, no?  Let’s get more general.  Could we come up with a half-decent way do apply this kind of fix dynamically?  Well sure, the nice folks at DBATools have already solved that problem.  And that’s great, really.  But just in case that doesn’t work… ^_^

One of the many things I love about SQL Prompt is the right-click option to “Script as INSERT” (from the results grid).  This is a quick & easy way to built a temp-table for the results of an exec statement so you can do the ol’ insert #tmp exec sys.sp_blah !  Then we can query the list of DB users for the null LoginNames and write a little set of queries to fix them!  Sound good?

UPDATE: Behold the code!

Happy Tuesday!

PS: Coincidentally, today’s (Thursday) SQL Server Central newsletter featured a very similar post by a gentleman over at Madeira Data.  Go check it out, it’s another great solution to this problem!  And while you’re at it, get the SQL Server Radio podcast (created by a couple guys from the same company) – it’s a terrific addition to your iTunes library.

Movie Reviews and the Killer Database Collation

If you have a core database using a different collation than the rest of the DBs around it, BAD THINGS HAPPEN.

And we’re back!  Hi folks, thanks for being patient with my December hiatus.  The holiday season is always a little hectic but this year it felt especially sudden.  And hey, you all have better things to do than read a blog in between the home cooked meals and the family gatherings.. like sleep, shop, and go see all the new movies!

Thanks to both Pitch Perfect 3 and the latest New Year’s Rockin’ Eve, Britney’s “Toxic” is now stuck in my head, so that’s fun.

britney-stewardess-toxic
I think I’m ready now… for 2018.

Some of you may not know this, but I’m a big movie nerd.  Not like the weird “knows a bunch of obscure factoids about all the Tarantino movies” or whatever.  But I do quite enjoy the behind-the-scenes / making-of stuff — what used to be called “bonus features” on DVDs (remember those things??) — whenever the wife will tolerate sitting thru them with me.

Our genre of choice is generally horror.  Now, I’m gonna get nerdy on you for a bit; because there are several sub-types or horror, and I enjoy almost almost all of them.  Campy, creepy, fun, found-footage, gory, spooky, slasher, supernatural, tense, psychological, revenge, deconstruction, possession.  For the uninitiated, “deconstruction” is like 2012’s Cabin in the Woods — it pokes fun at the tropes while building on them in unique ways.  Those are one of my favorite kind; that one in particular is definitely in my top 10 all-time.

So to kick off this year, before diving back into the technical stuff, I’d like to give you a coupe lightning reviews of some horror movies that we’ve watched that are perhaps underrated or you may have missed.

  • The Babysitter (2017) – comedy/deconstruction. A young preteen boy, whose parents are gone a lot, has a great friendship with his older teen babysitter, but one night decides to spy on what she and her friends do after he goes to bed. And well, crap hits the fan.  Lots of fun, eye candy, and slapstick violence. 👍👍
  • Patchwork (2015) – campy/revenge. 3 girls are Frankenstein’d together and have to overcome their mental differences and physical struggles to piece together the perpetrator and hopefully exact some revenge. Superbly acted by the lead lady, plenty of violence and just enough funny bits to keep it going. 👍
  • Happy Death Day (2017) – slasher/deconstruction. Think Groundhog Day but with a college chick being killed by a masked marauder repeatedly.  She must try to find out who it is before it’s too late!  Somewhat predictable but still entertaining and engaging. 👍
  • Incarnate (2016) – possession/supernatural. A somewhat unique twist on the genre, a brain doc frees people from possession by mind-sharing & getting the person back in control of their own consciousness.  Think Inception meets Exorcist.  Very well-acted, convincingly scary demon, and nicely twisted ending. 👍👍
  • Demonic (2015) – creepy/found-footage. Bit of a misnomer, as it has nothing to do with demons; it’s about a ghost-summoning gone horribly wrong resulting in the deaths of all but 1 (ish?) member of the group that originally attempted said ritual.  Frank Grillo is always on-point.  Very engaging. 👍
  • Last Shift (2014) – gory/creepy/demon-y. Rookie cop gets stuck with the last watch in a soon-to-be-shut-down police station, chaos ensues.  Literally, this is some crazy crap; scary and bloody.  Original & vastly under-hyped, has an indie vibe but looks & feels professional-grade. 👍👍

Most of these should be stream-able.  So check ’em out!

Now on to the SQL stuff.

A not equal a
borrowed from the man himself, Pinal Dave =)

Collations are Hard

If you ever have to integrate a vendor database into your existing environment, and the vendor ‘mandates’ their DB use a certain collation (which differs from the rest of your SQL instances / databases), run away screamingSrsly.

Or convince your managers that you know better, and force it into the same collation as everything else you have to integrate with.  Good luck & godspeed.

Let me give you an example.  The ERP system is being upgraded, which of course means a new (upgraded) DB as well.  Part of this upgrade seems to involve supporting case-sensitive searching/matching against name fields.  To this end, the vendor insists that the DB should use a case-sensitive collation, namely ​Latin1_General_100_CS_AS.  Problem is, the rest of your DB environment, in which a lot of stuff touches the ERP database (via joins, linked-server queries, etc.), uses the SQL default collation of SQL_Latin1_General_CP1_CI_AS.

If you follow the vendor’s mandate recommendation, guess what’s going to happen to your queries/views/stored-procedures that touch this & other DBs?  Horrible things.  Terrible performance degradation.  Wailing a gnashing of teeth from the developers, business users, and customers.

Okay, I exaggerate.  Slightly.

But it really does hurt performance, and I don’t feel like it’s talked about enough in the data professional community.  In the next post, I’ll take this problem apart a little more and try to impart some of what I’ve learned from going through the pain of dealing with the aforementioned example.

Happy 2018!

PS: Apparently this is my 50th post!!  Go me!  :o)

50-cent-face-on-50-dollar-bill
fiddy. fiddy posts.

Adventures in Database Renaming

Renaming a Database looks easy, but it’s really not. Unless nobody’s touching it. Let’s find out why!

Databases sometimes need to get a new name.  Or you need to swap names.  It can be a royal PITA if the DB is a mission-critical always-being-used everybody-and-their-mother-touches-it-all-the-time thing.

Errors like “The database must be in a state in which a CHECKPOINT can happen” or “Cannot obtain exclusive access to the database” or “This transaction was chosen as the deadlock victim” abound.  Sometimes it’s better to fall back on our old friends, detach & attach.

Let’s paint a picture!

lets paint a happy little tree
And if you tell ANYONE, that that tree is there, I will come to your house, and I will CUT YOU…

Our ERP database has been chosen by the IT gods to get moved to the shiny new flash storage array, off the old spinning-rust SAN.  This is fantastic news for the business users.  But lo, the executives warn us, “You must do this with no downtime!” (said in my best Brent Ozar PHB-imitation voice).  Of course when we tell them that’s impossible, they say, “OK, you must do this with minimal downtime.”  That’s mo’ betta’.

So what are our typical options for doing a database migration?  Or, more specifically, a data file migration.  See, we’re not moving to a new server, and we’re not moving a bunch of databases together; we’re just moving this one ERP database.  And we’re keeping it on the same SQL instance, we’re just swapping the storage underneath.  Now yes, in a perfect world, both storage arrays (both SANs) would inter-operate and they’d be able to take a snapshot, pass it over, spin it up, swap it out, and our SQL instance would be none-the-wiser.  But alas.

Other options?  Log-shipping, sure; it’s been around forever, it’s very mature, simple, and relatively easy to operate, even if it lacks a bit in terms of automation capability.  But let’s face it, the DBA is going to be sitting in front of his/her screen babysitting this operation no matter what, so that’s no biggie.  Mirroring?  Meh.  Expensive Edition’s still got all the hot action there, and we’re not that fancy.  Availability Groups?  Sure, if we already had them set up & running.  But we don’t.  Sad-trombone.

Back to Basics

So we circle back to the classics, “backup & restore”, “detach-move-attach” routines.  But there’s a hidden gem of a hybrid here: We can backup, restore as a new name, then switch the names!  The catch is, we need to put the original DB in read_only mode while we do this, to prevent new/updated data from coming in.  Now, when we ask the Dev-managers if their overlaying apps/websites can handle the DB being in read-only mode for an hour, they’re much more inclined to say OK to that, as opposed to the DB being actually ‘down down’.

clinton-depends-on-definition-of-down
are we talking about “down” time or “downtime”?

Now, there’s a way to do this in T-SQL, and there’s a way to do this in PowerShell.  Both are fine!  I’m not arguing for one or the other — pick a flavor and go with it.  This happened to be a day where I was feeling SQL-ish, not PoSh, hence the flavor I chose.

The Walkthru

In my example we have our “slow” storage drives ‘D’ and ‘L’ (for Data and Logs, respectively).  We’re moving our blessed database to “fast” flash storage drives ‘E’ and ‘M’ (you know, because they come after ‘D’ and ‘L’).  The team managing the dependent apps have given their blessing to have a ‘read-only mode’ maintenance window for the database, as long as the actual ‘downtime’ is mere seconds.

Once the DB is in READ_ONLY, we do a backup,preferably to the flash storage so that it’s faster (and the restore, from flash to flash, will be super-fast!).  We then restore that backup to a new DB on the flash storage drives, and, as expected, it’s speedy-gonzales.

he makes speedy gonzales look like regular gonzales
Flash arrays are awesome, btw. =D

Once our two copies of the database are up, we’re ready to take that minuscule down-time.  We kick the users out of it (with SINGLE_USER mode), detach both DBs, and re-attach them (create database .. on (file), (file) .. for attach) with different names!  Presto-change-oh, the ‘original’ DB is now resident on flash storage, and the old slow files are now a DB called “whatever the heck you want it to be, because it doesn’t matter anymore!”.

Disclaimer:  I in no way support nor condone the naming of databases with spaces or special characters.  If you actually name a database something like that, no matter how temporarily, I will find you, and throw sporks at your keyboard.

Assuming you didn’t run into any crazy locking problems or sticky-sessions that denied your attempt to force SINGLE_USER mode or detach the DBs, you should be good to go!  I even considered building all this into an Agent Job, but… I’m worried it would get tripped up and leave the DBs in a half-arsed state that wouldn’t make our Dev-managers very happy.  But hey, that’s what we have test environments for, right?!?

RIGHT.

Here’s the code!  Enjoy.  =)

Finding Trigger Dependencies

Preamble

In SQL, TRIGGERs are often a subject of contention.  Some data professionals have strong opinions about them, especially on the “they’re bad!” side.  But like anything, they’re a specific kind of tool that has a place in a specific problem.  One of the common scenarios for their usage is what we call an “Audit trail” — a running log of any & all changes made to a table or data-set (or even a database schema, as RedGate DLM Dashboard employs).

Yes, there are newer product features built around this very requirement, such as Change Data Capture (CDC) and Temporal Tables, but they require particular versions and editions of the product, and not every environment is going to be there.  So we need something native and compatible to the lowest common/supportable denominator.

Like most areas of the database environment, triggers are often poorly documented.  So what happens when we need to review all the triggers in a database and show what tables they’re on & what columns they’re “tracking” (sticking with the audit-trail example)?  Or, more generally, you could say, we want to find all table & column dependencies of our triggers.  Well sure, we could use the SSMS Object Explorer GUI — find the table, the triggers, right-click and “Show Dependencies”.. but that’s not good enough, for two reasons: A) it doesn’t get down to the column level, and B) it’s not script-able.

System catalogs and DMVs to the rescue!

dmv-line-of-people
No, not THAT DMV…

Disclaimer: this is heavy on MS Docs links.  There was a StackOverflow Q/A that helped point me in that direction, but sadly I lost the tab and didn’t save the link.

The Goals

In this script, I mostly focus on the UPDATE triggers, because they’re the most heavy-handed in terms of auditing.  Whereas with a typical INSERT or DELETE, the triggered audit action is basically “Hey, a row was inserted/deleted, here’s its primary key” — and sure you can get fancier and log what all the values were in the entire row — those are fairly straightforward.  It’s basically “all or nothing”.  In contrast, with an UPDATE audit, we want to know just what was updated, and from what (old value) to what (new value).

So the goal of documenting the dependencies is to see which columns we care about — because, inevitably, there are some columns that we either “don’t care about” for auditing, or some that just plain don’t change (by their nature — they’re an identity or a primary key, or controlled via other means).  In turn, this helps us understand how to reproduce these audit mechanisms when a better technology comes along — such as CDC or temporal tables, as alluded to earlier.

cdc-centers-for-disease-control
No, not THAT CDC… yeesh.

The Walkthru

We start with the system catalog views sys.tables and sys.triggers.  This establishes our primary relationship — a trigger ‘lives’ on (is attached to) a single table.  Then we immediately step into murky water.  A trigger, like other programmable objects, has a definition script, which you can view in a myriad of ways — Object Explorer, sys.sp_helptext, sys.sql_modules, sys.syscomments (deprecated), or OBJECT_DEFINITION(OBJECT_ID).  So if we really wanted to go spelunking, we could just read the trigger’s definition and see what columns it deals with.  But after about 2 or 3 of those, our eyes are going to glaze over and our fingers will start to twitch.  We need something better, something that can handle hundreds of triggers and output a nice unified result!

Oh wait, what’s this?  Object Explorer has a “View Dependencies” function.. aw, bummer, it doesn’t show columns.  Just tables.  As we know, most of the GUI functionality is driven by existing system meta-data/catalogs/dmv’s, so there’s definitely something that keeps track of these dependencies.  Let’s hazard a guess… sys.sp_depends or sys.sql_expression_dependencies?  The former is deprecated, the latter doesn’t give us a way to get to columns either.  But thankfully, on the docs, the deprecated note points us toward the correct path: sys.dm_sql_referenced_entities and sys.dm_sql_referencing_entities.

Now we’re cooking with gas!  So how do I use these suckers?  Well, there’s a couple intermediary joins I need to go thru, and in fact these are functions, not views, so we need to know the proper arguments.  But fear not, I have done the hard work for you!  Check out the code and let me know what you think.

But wait, there’s more!  There’s a slight monkey-wrench here.  You see, in the column dependencies, we’re getting both the “audited” columns and the “key” columns.  But we’d like those to be distinguished from each other.  When we document and/or replicate the auditing using newer technology, the primary keys aren’t what we need to track (they’re in a sense “given”, known) — we need to know which columns are “audit-able” and which aren’t.  Hence the two queries with a UNION ALL.

union-vs-confederate
That is ALSO inappropriate. Good lord, who’s choosing these pictures today?

Conclusion

So there you have it, documenting some audit-trail triggers.  Or triggers in general.  Complete with the tables & columns on which they depend.  Enjoy!  ‘Til next time folks.

VS SSRS Project Gotchas

I now present to you, a recent adventure in building & configuring a SSRS 2016 / VS2015 project to replace our aging SSRS 2008R2 / VS2008 setup.  In trying to make things ‘better’, I wanted to centralize the storage of the Data Sources.  I found a tip on StackOverflow with a neat (albeit hacky) idea on how to do this.  So I started diving into it.  Here’s how that went.

The Setup

I have a bunch of published reports on a ‘real’ report server (SSRS instance), which were created off-the-cuff using Report Builder 3.0 (no source-control connection or anything).  So the first thing I had to do was go and download each RDL (thanks, Microsoft… ugh!), to my local VS2015 project folder, so I could bring them into my SSRS project.  I didn’t bother trying to download/copy the Data Sources (they’re basically just connection-strings, and anyway you can’t download them from Report Manager), so I settled for re-creating those in VS.

Then it was time to set up the solution & projects.  I prefer to organize my reports in context-relevant folders, like “Operations”, “Marketing”, etc.  In order to do this sensibly, within the VS solution, you need to create a Project for each major fold you’ll have.  Makes sense; this is how the old solution was constructed too.  Here’s where I tried to improve things:  I created a “Datasources” project, which would house just the shared Data Sources.  Here’s where I created my new data-sources, setting them up with saved SQL auth (login & pwd).  Sure, maybe not the “best practice” but it keeps things simple — my SSRS user only ever has read-only db access, and the pwd is managed well enough to keep auditors happy.

Thus, my plan (in following the SO tip) was to have all the other projects’ Data Sources be pointers to these shared Data Source files (RDS​’s).  Sounds good in theory, right?

what could possibly go wrong
ooh, sparkly!

Starting to Implement

Well.  I started small with just one project, one report, and one data source.  I went into my Nate test project, did “add existing item” into the Reports folder, and browsed to pick the rdl that I had just downloaded from my live SSRS server.  I then went to my Datasources project, copied the rds that I needed, pasted it into Nate test project’s Shared Data Sources folder.

Now at this point there are 2 copies of this file in my solution folder.  We don’t want that.  So I opened up trusty ol’ Notepad++ on the Nate test.rptproj file, and edited the XML node under <DataSources><ProjectItem><Name>My-Source.rds</Name><FullPath>My-Source.rds</FullPath></ProjectItem></DataSources> , changing the FullPath node value to <FullPath>..\Datasources\My-Source.rds</FullPath>.  I then deleted the physical copy of the file at \MySolution\Nate test\My-Source.rds , so that only the shared one physically remains in the filesystem (\MySolution\Datasources\My-Source.rds).

Another way to accomplish this, you may read, is to right-click Shared Data Sources within your project, e.g. Nate test, and say “Add existing item..”, and go select the global shared RDL from your master Datasources project-folder.  However, this still results in a physical copy of the file within Nate test folder!  So in either case you’ll end up doing some editing & deletion.

With me so far?  Okay, here’s where it gets interesting.  I could successfully build and deploy this report to my SSRS server, verifying that it showed my “new thing” (I added a text-box that simply said “This is the new hotness!”).  But I could not preview the report in VS!  Sad panda.  And of course, the most vague generic error message in the world:

An error occurred during local report processing
An error occurred during report processing

sarcasm-smiley
How specific and informative!

More Googling led to a somewhat random tip in the vein of “Go to the report properties, Open the report’s Data Source, and re-select the Shared Data Source from the dropdown”.  Combine with “Edit the Shared Data Source, ensure your saved login/pwd is filled in, and Save it”.  Presto!  Previewing now worked.

But why is this?  Well, as it turns out, it’s not all that surprising.  The RDLs are stored without the saved login/pwd, because of course they’re just plain-text XML files.  So to “protect you”, MSFT doesn’t store your SQL auth credentials, even if (and now matter how many times) you click “Save my password” on the connection-properties window (connection-string builder).

Great.  Well, I did that once, for Nate test project.  Let’s see what happens with another project (aka report folder), say Test again.  So I add an existing RDL, I copy-paste the Data Source from the global shared Datasources project, manually edit the rptproj in Notepad++, delete the copied rdl, etc.  I then try to preview the report… and behold, the same maddeningly vague error message!

“Well duh!” you may think to yourself, “the credentials still aren’t stored anywhere un-encrypted!”.  Yes, dear read, you are correct.  But do I want to right-click on this project’s Data Sources, edit it, fill in the credentials, save, rinse, repeat, for every single project?!  A resounding NO!

nobody-got-time-old-fashioned
ANGTFT

So what should I do?  I could go type in the credentials to the actual connection-strings contained within the ‘master’ rds files.  Of course then they’re in plain-text for all to see… but wait, do I care?  Another no.  Let’s go edit some connection-strings!

As a reminder, here’s what they will look like:

<ConnectString>Data Source=MyServer;Initial Catalog=MyReportDB;User ID=MyReportUser;Password=ThisIs@nAw3s0meP@ssw0rd!;</ConnectString>

Why did MSFT choose to name the XML node ConnectString in their rds schema instead of ConnectionString?  Don’t ask me!

The Gotchas

Here’s the first gotcha:  My reports (RDLs) need to be re-pointed at the correct Shared Datasource, because currently, as they were downloaded from my ‘real’ SSRS server — which had my desired top-level folder layout of “Datasources”, “Nate test”, “Marketing”, etc. — their Data Sources xml node points at \Datasources\MyDatasource.rdl.  This was the correct path on the SSRS server and still will be the correct path when all is said & done.  BUT, to support local debugging/previewing, the build output has to make a copy of the rds in the Project’s bin\Debug folder, which it will happily do, and then just as happily proclaim Could not find a part of the path 'C:\Users\Nate\Documents\Visual Studio 2015\Projects\MyReportSolution\Nate test\bin\Debug\Datasources\MyDatasource.rdl'. — because DUH, the build system didn’t make you a ‘Datasources’ sub-folder under the Debug folder, why would it!?  So by either manually editing the rdl file to remove the ​​\Datasources\ path from the node, or by clicking on the Report’s Data Sources node in the Report Data pane and re-pointing it at the local Shard Data Source (which again, is itself a pointer to the globally shared Datasources!), you can fix this issue.

In the rdl XML, that looks like this (using strikethru to show what you remove):

<DataSources>
<DataSource Name=”MyReport_DataSource”>
<DataSourceReference>\Datasources\MyDatasource</DataSourceReference>
<rd:SecurityType>None</rd:SecurityType>
<rd:DataSourceID>gobbledegook-guid</rd:DataSourceID>
</DataSource>
</DataSources>

And there’s another gotcha:  After we do all this, if we open the rds from within VS solution-explorer, it still won’t “think” it saved the authentication!  But the connection-string will be clearly visible with the plain password.  Here’s a couple pictures of what that looks like.

shared-datasource-properties-ssrs-vs2015
Data Source properties
datasource-connection-properties-ssrs-vs2015
Connection properties of the Data Source

Oh, by the way, did I mention how many times VS crashed on me while doing these gymnastics?  At least a few.

Lessons Learned

So, what have we learned?  Well, for one, this is a crappy situation born of poor in-product support.  I should be able to configure Solution-level shared Data Sources, use them in as many Projects (within said Solution) as I want, and have VS configuration management support them; bonus points for doing so with saved & encrypted credentials.  Ideally, when we check this into source-control, we’d check in the “DEV” environment flavor connection-configs.  Then, when the reports get deployed to the “PROD” SSRS server, the same globally shared Data Sources are already present (and they don’t get over-written, thankfully by default!), configured by the DBA with prod credentials, and nobody in the development pipeline needs to know said credentials.  Yay?

But alas.  We live in an imperfect world.  I also live in a world where I’m both the Report Developer, the Report Manager, and the Production DBA.  So guess what I get to do?  That’s right, not care about the connection-string security!

Okay, yes, I care a little.  Depending on my mood.  But not enough to turn my whole dang world upside-down trying to design and implement a robust multi-tiered-environment solution for one measly SSRS repository.  Kudos to you if you do — you’re a better man than I.

Or “better woman”.  But I’m not a woman.  So I guess the correct phrasing would be “You’re a better woman than I am a man.”  No, that’s not right.  “You’re a better person than I am?”  Sure, I guess that works.

In Review

To recap:

  1. Create the solution & projects
  2. Create the global shared Datasources project, and create your Data Sources (RDS‘s)
  3. Import your Reports (RDL‘s) to each requisite report-project
  4. Manually edit your rptproj to point the data-sources at the central shared path
  5. Manually edit your RDL‘s to remove previously established paths to the data-sources
  6. Configure your rptproj‘s to output to the correct folders on the report server — this should be done by default, i.e. VS has the “intelligence” to guess these for you, but just double-check.
  7. Build, Preview, and Deploy!
  8. Commit it all to source-control and let the collaboration begin.

And now to go automate all this with PowerShell… right after this beer.

Anyway.  Until next time, folks!  If I’ve made any mistakes, missteps, or otherwise offending your technical sensibilities, and/or if you know of a better way to accomplish what I’m after, I’d love to hear from you in the comments!  =)