Renaming a Database looks easy, but it’s really not. Unless nobody’s touching it. Let’s find out why!
Databases sometimes need to get a new name. Or you need to swap names. It can be a royal PITA if the DB is a mission-critical always-being-used everybody-and-their-mother-touches-it-all-the-time thing.
Errors like “The database must be in a state in which a CHECKPOINT can happen” or “Cannot obtain exclusive access to the database” or “This transaction was chosen as the deadlock victim” abound. Sometimes it’s better to fall back on our old friends, detach & attach.
Let’s paint a picture!
Our ERP database has been chosen by the IT gods to get moved to the shiny new flash storage array, off the old spinning-rust SAN. This is fantastic news for the business users. But lo, the executives warn us, “You must do this with no downtime!” (said in my best Brent Ozar PHB-imitation voice). Of course when we tell them that’s impossible, they say, “OK, you must do this with minimal downtime.” That’s mo’ betta’.
So what are our typical options for doing a database migration? Or, more specifically, a data file migration. See, we’re not moving to a new server, and we’re not moving a bunch of databases together; we’re just moving this one ERP database. And we’re keeping it on the same SQL instance, we’re just swapping the storage underneath. Now yes, in a perfect world, both storage arrays (both SANs) would inter-operate and they’d be able to take a snapshot, pass it over, spin it up, swap it out, and our SQL instance would be none-the-wiser. But alas.
Other options? Log-shipping, sure; it’s been around forever, it’s very mature, simple, and relatively easy to operate, even if it lacks a bit in terms of automation capability. But let’s face it, the DBA is going to be sitting in front of his/her screen babysitting this operation no matter what, so that’s no biggie. Mirroring? Meh. Expensive Edition’s still got all the hot action there, and we’re not that fancy. Availability Groups? Sure, if we already had them set up & running. But we don’t. Sad-trombone.
Back to Basics
So we circle back to the classics, “backup & restore”, “detach-move-attach” routines. But there’s a hidden gem of a hybrid here: We can backup, restore as a new name, then switch the names! The catch is, we need to put the original DB in read_only mode while we do this, to prevent new/updated data from coming in. Now, when we ask the Dev-managers if their overlaying apps/websites can handle the DB being in read-only mode for an hour, they’re much more inclined to say OK to that, as opposed to the DB being actually ‘down down’.
Now, there’s a way to do this in T-SQL, and there’s a way to do this in PowerShell. Both are fine! I’m not arguing for one or the other — pick a flavor and go with it. This happened to be a day where I was feeling SQL-ish, not PoSh, hence the flavor I chose.
In my example we have our “slow” storage drives ‘D’ and ‘L’ (for Data and Logs, respectively). We’re moving our blessed database to “fast” flash storage drives ‘E’ and ‘M’ (you know, because they come after ‘D’ and ‘L’). The team managing the dependent apps have given their blessing to have a ‘read-only mode’ maintenance window for the database, as long as the actual ‘downtime’ is mere seconds.
Once the DB is in READ_ONLY, we do a backup, preferably to the flash storage so that it’s faster (and the restore, from flash to flash, will be super-fast!). We then restore that backup to a new DB on the flash storage drives, and, as expected, it’s speedy-gonzales.
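For illustration, here's a bare-bones T-SQL sketch of that stage. The database name, logical file names, and paths are all invented, so adjust them to your own environment:

-- freeze the original: no new or changed data while we copy it
alter database ERP set read_only with rollback immediate;

-- backup to the fast storage so both the backup and the restore are quick
backup database ERP
    to disk = 'E:\Backup\ERP_migration.bak'
    with compression, init;

-- restore as a new DB, moving the files onto the flash drives
restore database ERP_new
    from disk = 'E:\Backup\ERP_migration.bak'
    with move 'ERP_Data' to 'E:\Data\ERP.mdf',
         move 'ERP_Log' to 'M:\Logs\ERP_log.ldf';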
Once our two copies of the database are up, we’re ready to take that minuscule down-time. We kick the users out of it (with SINGLE_USER mode), detach both DBs, and re-attach them (create database .. on (file), (file) .. for attach) with different names! Presto-change-oh, the ‘original’ DB is now resident on flash storage, and the old slow files are now a DB called “whatever the heck you want it to be, because it doesn’t matter anymore!”.
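And a rough sketch of the swap itself, again with invented names, glossing over the details (the read-only flag, file permissions, stubborn connections) that a real run has to account for:

-- kick everyone out of both copies
alter database ERP set single_user with rollback immediate;
alter database ERP_new set single_user with rollback immediate;

-- detach both
exec sp_detach_db @dbname = 'ERP';
exec sp_detach_db @dbname = 'ERP_new';

-- re-attach with the names swapped: the flash copy takes over the original name
create database ERP
    on (filename = 'E:\Data\ERP.mdf'), (filename = 'M:\Logs\ERP_log.ldf')
    for attach;

create database ERP_old_slow
    on (filename = 'D:\Data\ERP.mdf'), (filename = 'L:\Logs\ERP_log.ldf')
    for attach;

-- the restored copy likely carried the read_only flag with it, so flip it back
alter database ERP set read_write;
alter database ERP set multi_user;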
Disclaimer: I in no way support nor condone the naming of databases with spaces or special characters. If you actually name a database something like that, no matter how temporarily, I will find you, and throw sporks at your keyboard.
Assuming you didn’t run into any crazy locking problems or sticky-sessions that denied your attempt to force SINGLE_USER mode or detach the DBs, you should be good to go! I even considered building all this into an Agent Job, but… I’m worried it would get tripped up and leave the DBs in a half-arsed state that wouldn’t make our Dev-managers very happy. But hey, that’s what we have test environments for, right?!?
I now present to you, a recent adventure in building & configuring a SSRS 2016 / VS2015 project to replace our aging SSRS 2008R2 / VS2008 setup. In trying to make things ‘better’, I wanted to centralize the storage of the Data Sources. I found a tip on StackOverflow with a neat (albeit hacky) idea on how to do this. So I started diving into it. Here’s how that went.
I have a bunch of published reports on a ‘real’ report server (SSRS instance), which were created off-the-cuff using Report Builder 3.0 (no source-control connection or anything). So the first thing I had to do was go and download each RDL (thanks, Microsoft… ugh!), to my local VS2015 project folder, so I could bring them into my SSRS project. I didn’t bother trying to download/copy the Data Sources (they’re basically just connection-strings, and anyway you can’t download them from Report Manager), so I settled for re-creating those in VS.
Then it was time to set up the solution & projects. I prefer to organize my reports in context-relevant folders, like “Operations”, “Marketing”, etc. In order to do this sensibly, within the VS solution, you need to create a Project for each major folder you’ll have. Makes sense; this is how the old solution was constructed too. Here’s where I tried to improve things: I created a “Datasources” project, which would house just the shared Data Sources. Here’s where I created my new data-sources, setting them up with saved SQL auth (login & pwd). Sure, maybe not the “best practice” but it keeps things simple — my SSRS user only ever has read-only db access, and the pwd is managed well enough to keep auditors happy.
Thus, my plan (in following the SO tip) was to have all the other projects’ Data Sources be pointers to these shared Data Source files (RDS’s). Sounds good in theory, right?
Starting to Implement
Well. I started small with just one project, one report, and one data source. I went into my Nate test project, did “add existing item” into the Reports folder, and browsed to pick the rdl that I had just downloaded from my live SSRS server. I then went to my Datasources project, copied the rds that I needed, pasted it into Nate test project’s Shared Data Sources folder.
Now at this point there are 2 copies of this file in my solution folder. We don’t want that. So I opened up trusty ol’ Notepad++ on the Nate test.rptproj file, and edited the XML node under <DataSources><ProjectItem><Name>My-Source.rds</Name><FullPath>My-Source.rds</FullPath></ProjectItem></DataSources> , changing the FullPath node value to <FullPath>..\Datasources\My-Source.rds</FullPath>. I then deleted the physical copy of the file at \MySolution\Nate test\My-Source.rds , so that only the shared one physically remains in the filesystem (\MySolution\Datasources\My-Source.rds).
Another way to accomplish this, you may read, is to right-click Shared Data Sources within your project, e.g. Nate test, and say “Add existing item..”, and go select the global shared RDS from your master Datasources project-folder. However, this still results in a physical copy of the file within Nate test folder! So in either case you’ll end up doing some editing & deletion.
With me so far? Okay, here’s where it gets interesting. I could successfully build and deploy this report to my SSRS server, verifying that it showed my “new thing” (I added a text-box that simply said “This is the new hotness!”). But I could not preview the report in VS! Sad panda. And of course, the most vague generic error message in the world:
An error occurred during local report processing
An error occurred during report processing
More Googling led to a somewhat random tip in the vein of “Go to the report properties, Open the report’s Data Source, and re-select the Shared Data Source from the dropdown”. Combine with “Edit the Shared Data Source, ensure your saved login/pwd is filled in, and Save it”. Presto! Previewing now worked.
But why is this? Well, as it turns out, it’s not all that surprising. The RDS files are stored without the saved login/pwd, because of course they’re just plain-text XML files. So to “protect you”, MSFT doesn’t store your SQL auth credentials, even if (and no matter how many times) you click “Save my password” on the connection-properties window (connection-string builder).
Great. Well, I did that once, for Nate test project. Let’s see what happens with another project (aka report folder), say Test again. So I add an existing RDL, I copy-paste the Data Source from the global shared Datasources project, manually edit the rptproj in Notepad++, delete the copied rds, etc. I then try to preview the report… and behold, the same maddeningly vague error message!
“Well duh!” you may think to yourself, “the credentials still aren’t stored anywhere un-encrypted!”. Yes, dear reader, you are correct. But do I want to right-click on this project’s Data Sources, edit it, fill in the credentials, save, rinse, repeat, for every single project?! A resounding NO!
So what should I do? I could go type in the credentials to the actual connection-strings contained within the ‘master’ rds files. Of course then they’re in plain-text for all to see… but wait, do I care? Another no. Let’s go edit some connection-strings!
Why did MSFT choose to name the XML node ConnectString in their rds schema instead of ConnectionString? Don’t ask me!
Here’s the first gotcha: My reports (RDLs) need to be re-pointed at the correct Shared Datasource, because currently, as they were downloaded from my ‘real’ SSRS server — which had my desired top-level folder layout of “Datasources”, “Nate test”, “Marketing”, etc. — their Data Sources xml node points at \Datasources\MyDatasource.rdl. This was the correct path on the SSRS server and still will be the correct path when all is said & done.
BUT, to support local debugging/previewing, the build output has to make a copy of the rds in the Project’s bin\Debug folder, which it will happily do, and then just as happily proclaim Could not find a part of the path 'C:\Users\Nate\Documents\Visual Studio 2015\Projects\MyReportSolution\Nate test\bin\Debug\Datasources\MyDatasource.rdl'. — because DUH, the build system didn’t make you a ‘Datasources’ sub-folder under the Debug folder, why would it!? So by either manually editing the rdl file to remove the \Datasources\ path from the node, or by clicking on the Report’s Data Sources node in the Report Data pane and re-pointing it at the local Shared Data Source (which again, is itself a pointer to the globally shared Datasources!), you can fix this issue.
In the rdl XML, that looks like this (using strikethru to show what you remove):
And there’s another gotcha: After we do all this, if we open the rds from within VS solution-explorer, it still won’t “think” it saved the authentication! But the connection-string will be clearly visible with the plain password. Here’s a couple pictures of what that looks like.
Oh, by the way, did I mention how many times VS crashed on me while doing these gymnastics? At least a few.
So, what have we learned? Well, for one, this is a crappy situation born of poor in-product support. I should be able to configure Solution-level shared Data Sources, use them in as many Projects (within said Solution) as I want, and have VS configuration management support them; bonus points for doing so with saved & encrypted credentials. Ideally, when we check this into source-control, we’d check in the “DEV” environment flavor connection-configs. Then, when the reports get deployed to the “PROD” SSRS server, the same globally shared Data Sources are already present (and they don’t get over-written, thankfully by default!), configured by the DBA with prod credentials, and nobody in the development pipeline needs to know said credentials. Yay?
But alas. We live in an imperfect world. I also live in a world where I’m the Report Developer, the Report Manager, and the Production DBA, all rolled into one. So guess what I get to do? That’s right, not care about the connection-string security!
Okay, yes, I care a little. Depending on my mood. But not enough to turn my whole dang world upside-down trying to design and implement a robust multi-tiered-environment solution for one measly SSRS repository. Kudos to you if you do — you’re a better man than I.
Or “better woman”. But I’m not a woman. So I guess the correct phrasing would be “You’re a better woman than I am a man.” No, that’s not right. “You’re a better person than I am?” Sure, I guess that works.
To recap, here’s the basic recipe:
Create the solution & projects
Create the global shared Datasources project, and create your Data Sources (RDS's)
Import your Reports (RDL's) to each requisite report-project
Manually edit your rptproj to point the data-sources at the central shared path
Manually edit your RDL's to remove previously established paths to the data-sources
Configure your rptproj's to output to the correct folders on the report server — this should be done by default, i.e. VS has the “intelligence” to guess these for you, but just double-check.
And now to go automate all this with PowerShell… right after this beer.
Anyway. Until next time, folks! If I’ve made any mistakes, missteps, or otherwise offended your technical sensibilities, and/or if you know of a better way to accomplish what I’m after, I’d love to hear from you in the comments! =)
Originally posted on David Peter Hansen:
I don’t normally reblog. But when I do, it’s something awesome. =D
Back in the days, I used to collect a lot of different scripts, tools, and other goodies for troubleshooting SQL Server performance issues. These days, however, I tend to use what is publicly and freely available (as well as some internal stuff), and keep a list of those in my head.
I’ve meant to write that list down for a while, and today Chrissy asked:
Anyone have a list of apps, modules, tools, etc that Microsoft PFE's use? If not, can a PFE write a blog post? 😁 (Partic. interestd in SQL)
A quick tip for passing default values to an SSRS multi-value parameter during report drill-thru.
Inspired by some StackOverflow-ing and this particular answer.
Aka: “How do I pass/assign a default set of values to a multi-value parameter in SSRS?”
We often have an SSRS report that needs to “drill thru” to another report, usually to go from a “high level view” to a “lower level” or more detailed view, or sometimes just to lead the user down the path we’ve chosen as the head analyst / BI architect. And part of that report navigation involves initializing (set to defaults) the value(s) of the sub-report’s parameters, including multi-value parameters (which I’ll now refer to as mvp, even though, I know, it’s a ridiculously overused acronym). These are basically arrays, but are often represented as simply comma-delimited strings, so it’s easy to forget their true nature.
Let’s fabricate an example. In our Sales Summary report, we have an element (a textbox, image, placeholder, or whatnot) that, when clicked, should drill-thru to Sales by Person. Said next report requires a multi-select parameter (another term for the mvp) to have 1 or more values selected, otherwise it doesn’t render. We’ll call this parameter SelectedNames, with value-label pairings 1=Bob, 2=Alice, 3=Mary. When we drill-thru to this by-Person report, we want it to initially show (have selected by default) all available people.
So how do we do this? In the properties of the “clickable” element on Sales Summary, say it’s a text-box named GoToDetails, we go to the Action tab. We choose the “Go to report” action, select/specify the target report, Sales by Person, and then add the parameters we want to pass to it. For comparison, I’m going to pass a “regular” (single value) parameter called @ReportDate, as well as the mvp SelectedNames. Here’s what that all looks like, in picture form.
The single parameter pass-thru is, as you’d expect, very simple. But for our mvp, we need to use the expression-builder, that little fx button stylized to look like your old high school math class days. The function in question boils down to a Split over the comma-delimited list of values, something along the lines of =Split("1,2,3", ",") for our three people. Here’s what that function looks like:
And presto!, we have converted a comma-delimited list into an array to pass into our Sales by Person report’s SelectedNames multi-value parameter. Now it will initially render with all 3 selected people as desired.
So there you have it, a quick tip for passing default values to an SSRS multi-value parameter during report drill-thru. But what if you wanted to pass the selected values of one mvp down to another? Or do some clever on-the-fly mapping (conversion) from one to the next? Well, stay tuned! I’ll write about that next time. =)
Thanks for reading! For a lot more on SSRS and multi-value parameters, check out these articles: @sqlchick, @mssqltips, and @msdn.
What could possibly go wrong? As it turns out, plenty.
Faithful reader(s), it’s been a while! I’ve been busy preparing for some big transitions. I’m also getting better at MDX queries, tweaking SSAS-based reports to more accurately reflect the business rules. But enough about that, on with the post!
In which we doubt the SAN
A storage area network (SAN) is a management & administration solution, not a performance solution.
-someone wiser than me
SANs are wonderful technology. They inspire all kinds of geekery and are purported to solve all your storage woes. But there’s a catch: they’re expensive. Not just as a capital expense, but in maintenance and licensing costs. And if you ever want to upgrade it, like add some more drives to a particular tier/pool — fuhgeddaboudit.
So what do we do with SQL on a SAN? Well, it has tiers, right? Slower storage with huge capacity, faster storage with less, etc. We put the data files (heavy random read workload, typically) on the pool optimized for that kind of I/O pattern. We put the TLog files (heavy sequential write workload) on the pool best suited for that. And what about good ol’ TempDB? Its access pattern is fairly unique — random writes and reads, and frequent overwrites, which means it could potentially wear out your typical prosumer SSD relatively quickly. But we’re not complete cheapskates, we’ll buy enterprise class SSDs, no?
So we go read some stuff and figure, hey, sounds like a great idea, right? Put TempDB on a local SSD, or better yet, a pair of SSDs in RAID-0 for pure performance (because this is a cluster, we’ve got HA already). We’ll reduce the load on the SAN I/O channels and make our overworked TempDB happier with lower latency and better throughput. Right?
In which we discover what could possibly go wrong.
Once the new drive(s) is(are) installed and “presented” to Windows (that’s my SysAdmin’s term), it’s fairly trivial to do the SQL configuration change — it does of course require a SQL service restart (or cluster failover). Code example, assuming your new drive is ‘T’:
alter database tempdb
modify file (name=tempdev, filename='T:\tempdb.mdf')
alter database tempdb
modify file (name=tempdb2, filename='T:\tempdb2.ndf')
You do of course have multiple TempDB data files, yes? Good.
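A quick sanity-check query, by the way, to confirm the file count and where TempDB's files are (or will be) pointed:

-- paths come from the system catalog; for tempdb, new paths take effect at the next service start
select name, physical_name, type_desc
from sys.master_files
where database_id = db_id('tempdb');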
Should we put templog (TempDB’s transaction log) on the same drive as the TempDB data files, or put it on the same storage pool as the regular DBs’ TLogs? As usual, “it depends” — ask your favorite SQL gurus and do some testing.
Back on topic
We’ve made the change, we’ve done the cluster failover. TempDB is now running on our spankin’ new SSD. So we start monitoring performance metrics. Things like file I/O stats (from SQL DMV sys.dm_io_virtual_file_stats), latency and waits (from our monitoring tools), and good ol’ PerfMon.
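For the curious, a minimal way to eyeball those file stats; the counters are cumulative since startup, so you really want before/after snapshots rather than a single reading:

-- rough per-file read/write latency for TempDB, derived from the cumulative stall counters
select
    mf.name as logical_name,
    mf.physical_name,
    vfs.num_of_reads,
    vfs.num_of_writes,
    vfs.io_stall_read_ms / nullif(vfs.num_of_reads, 0) as avg_read_latency_ms,
    vfs.io_stall_write_ms / nullif(vfs.num_of_writes, 0) as avg_write_latency_ms
from sys.dm_io_virtual_file_stats(db_id('tempdb'), null) as vfs
join sys.master_files as mf
    on mf.database_id = vfs.database_id
    and mf.file_id = vfs.file_id;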
But wait, what’s this? I/O stalls are higher? Write latency is higher?!? Perfmon agrees?
Write latency on the TempDB files was over 10x higher than it was when they were on the SAN (the performance tier, to be clear). The file_stats DMV showed large increases in I/O stalls. Sad-trombone.
In which we have several theories
Then ensued various conversations and brainstorms among my colleagues.
Someone check the firmware/drivers!
Maybe it’s got the wrong block-size.
Well, it’s only 6Gbps SAS… maybe we should’ve sprung for the 12Gbps.
The write latencies went up by a factor of 10. I don’t think an improvement by a factor of 2 is going to win you any trophies.
Why didn’t we get an NVMe or M.2 one?
Because the damn blades don’t have those slots, goober.
Another interesting observation, and potentially the silver lining. Overall instance waits (wait stats), according to our monitoring tool, went down. That’s good news, right? Maybe. Does application performance & user experience corroborate it? Possibly! We’ll be observing the patient for another week or so.
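If you want to sanity-check the monitoring tool against the raw DMV, something like this shows the top waits since the counters were last cleared (the excluded wait types are just a token few benign ones; the community wait-stats scripts filter a much longer list):

select top 10 wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
from sys.dm_os_wait_stats
where wait_type not in ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'CHECKPOINT_QUEUE', 'XE_TIMER_EVENT', 'BROKER_TASK_STOP')
order by wait_time_ms desc;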
Let’s turn to the community again to see what others have experienced.
Oh wait, it might not be such a fantastic idea after all.
And by “we” I mean “me”. Being the DBA and the primary proponent of the SSD addition, because I knew our workloads were very TempDB-heavy, I had to tuck tail and admit that the SAN gods won this round.
But wait, what about the fact that our wait stats are down? What about app/user experience? Valid arguments, I agree. That’s why we’re still observing. But I’m not optimistic, given the follow-up links above. We may utilize local SSDs for something else (index filegroups?) — but if those write latencies don’t improve, I’m concerned that it won’t help anybody.
In which I ask for your help
Yes, you! If you have ideas on what we did wrong, what we’re missing, or any other advice about getting the most “bang for the buck” out of a direct attached SSD on a converged-infrastructure Cisco UCS blade server platform with a VNX SAN, by all means, drop me a line. I’m all ears.
..while there are likely even better ways to do this in the long-run, this quick & easy approach was sufficient to save me time and effort..
At this point in my career, I’m not managing massive environments, so I don’t feel the need (nor have the expertise) to use a large scale solution like DSC or SCCM. But I’ve had to install SQL Server a few times, so I figured it’s worth at least scripting out a standard pre-configured installation, so that A) I don’t need to click through a GUI ‘wizard’ hearkening back to the ’90s, and B) the SysAdmins can “fire and forget” (read: stop bugging me about it).
Thus, I’m attempting to K.I.S.S., while making it configurable & repeatable. There are some limitations of this approach, as alluded above. It’s not “massively scalable” (scaleable? scale-able?) because:
The PoSh script still needs to be deployed locally to the server in question
The installer config (.ini) also lives locally (though it probably could be a UNC path, it’s just a file after all)
The script prompts you for the service account (SQL engine, Agent) credentials and the sa password using the Read-Host -AsSecureString cmdlet, so some meatbag still has to type those in. This is because we don’t have an enterprise pwd/secret-management system where I could, say, ask it for a service account credential set and tell it to embed that securely in a script without it actually being visible to me. So, while yes, they’re kept in a “vault”, it’s not query-able by anything else, so an admin still needs to copy & paste them into whatever configuration screen he’s working with at the time. Not ideal, I know, but we work with what we’ve got.
PS: Yeah, yeah, “don’t use sa, rename it or disable it; or use Windows Auth only!”. Rage, howl, fire & brimstone. I’m not going to argue about it; we can save that for another post. This environment dictates that it’s used during setup and then disabled later, so that’s beyond the scope of the installer config.
So yes, while there are likely even better ways to do this in the long-run, this quick & easy approach was sufficient to save me time and effort for the occasions when a new SQL box/VM needs to be spun-up.
A primer on SQL cmd-prompt installation & its arguments
And finally, two things that I attempted to understand but ultimately failed to implement, because (apparently, at least to me), PowerShell remote-ing is a P.I.T.A.
First we need an .ini file to work with. You could either create it from scratch, or take it from an existing SQL box’s “Setup Bootstrap” folder. Example path C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log\20170801_073414\ConfigurationFile.ini — indicating this was an install done on 8/1/2017 at 7:34am. Right above that, at simply C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log\, you’ll see a Summary.txt file, which can actually come in handy while you’re testing these unattended installs and wanting to see why it failed.
The first link above, from MSFT Docs, does a pretty nice job of telling you all the things that make up this config file. You get to bypass the TOS prompt, enter service account details, specify drive letters (paths) for default data/log file locations & tempdb, slipstream update packages (UpdateSource), and even more advanced stuff like AG settings and whatnot. My example will be a simple standalone instance using the default name, so I’ll be sticking with the basics.
We can then use this file in the ConfigurationFile argument of setup.exe from the SQL Server install media. To put a little more color on that: the .ini file is really just a collection of command-line arguments to setup.exe; you could also list them all out in-line, but that would be tedious and silly. Here are a couple of major selling points of creating your own config file (a trimmed sample follows the list):
Slipstream updates (SP’s, CU’s), instead of having it go out to MSFT update servers (or *aghast* sticking with the original RTM bits, you heathen you!)
Specify drive letters / default file locations: sure, this may be considered old-hat if you’re running super slick storage, but I still find it makes management a bit easier if I know where my MDFs, LDFs, TempDB, & backups will always be.
Take advantage of 2016’s better TempDB setup options (# files, size & growth)
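For reference, a heavily trimmed example of what such a ConfigurationFile.ini might contain. The paths, accounts, and update share below are placeholders, not a complete working file, and the password-type arguments are deliberately left out of it (more on that in a second):

[OPTIONS]
ACTION="Install"
FEATURES=SQLENGINE
INSTANCENAME="MSSQLSERVER"
UpdateEnabled="True"
UpdateSource="\\FileShare\SQLUpdates"
SECURITYMODE="SQL"
SQLSYSADMINACCOUNTS="DOMAIN\SQL-Admins"
SQLUSERDBDIR="D:\SQLData"
SQLUSERDBLOGDIR="L:\SQLLogs"
SQLBACKUPDIR="D:\SQLBackups"
SQLTEMPDBDIR="T:\TempDB"
SQLTEMPDBFILECOUNT="4"
SQLTEMPDBFILESIZE="1024"
SQLTEMPDBFILEGROWTH="512"
; license acceptance, quiet switches, service accounts, and passwords stay on the command line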
We will, however, keep a couple arguments out of the .ini file and instead throw them into the ArgumentList from the calling PowerShell script. Speaking of, here’s what the PowerShell script needs to do:
Prompt the operator (SysAdmin or DBA) for the SQL & Agent service account credentials, and (optionally) the sa pwd (if using it).
Fetch our install media from the central network share where we store such things (server & office ISOs, for example).
Mount said ISO to our virtual disc drive.
Run its setup.exe with the following arguments:
The config .ini file
The service & sa accounts
After it’s done, un-mount (dismount) the ISO.
Then the DBA can connect to the brand-spankin’-new running SQL instance and do other post-setup configurations as desired (e.g. set max-memory, maxDOP/CTFP, etc). And sure, those could also be done in PowerShell (thanks in no small part to the awesome team at DbaTools), but I chose not to do so in this case.
As the bloggers say, “that’s left as an exercise to the reader”.
Plus, they’re never quite as deterministic as we’d like them to be — they depend on the server’s compute resources, i.e. memory size & CPU cores, as well as estimated workload & environment tier, so it’s often a gamble in “how correct” your initial settings will be anyway. Still, anything is better than the defaults, so configure-away!
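Still, for completeness, those post-setup tweaks look something like this in plain T-SQL. The numbers are placeholders only (size them for the actual box, not from my example), and the sa bit ties back to the earlier disclaimer:

exec sp_configure 'show advanced options', 1;
reconfigure;
exec sp_configure 'max server memory (MB)', 24576;        -- leave headroom for the OS
exec sp_configure 'max degree of parallelism', 4;         -- size to cores & workload
exec sp_configure 'cost threshold for parallelism', 50;   -- anything beats the default of 5
reconfigure;

alter login sa disable;    -- per the earlier note: used during setup, then turned off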
Here are the Gists I’ve created to go along with this post. If I’ve made a mistake, or if you, dear reader, have a suggestion, we can incorporate them into the gist without me having to go back and edit the blog post!
I’d love to get feedback on how you would improve this, what you might do differently, etc. Drop me a comment or a tweet!
How it lost its original clustering key is a perfect topic for this month’s T-SQL Tuesday!
As previously referenced, we had a half-billion row table (which we’ll “round up” to a billion, just for the sake of argument) that needed an index. A clustering key, to be exact. How it lost its original clustering key is a perfect topic for this month’s T-SQL Tuesday (we planned to replace it, before realizing how long it would take, and had to cancel the job after it was removed but before it could finish adding the new one). Anybody can drop an index; it takes mere milliseconds. But to create (or rebuild) an index, the SQL engine needs to touch every single row that index will include, and of course for a clustered index, that’s errverybody.
When planning an index replacement for a billion-row table, don’t just schedule the job as “drop old index, then create new index“, and expect nothing to go wrong.
Don’t try to index a billion-row table all at once. Use a smarter methodology. Unless you can literally afford to have that table be offline for several hours (depending on your hardware, and assuming you’re on Standard Edition).
Of course, some of you crazy kids with Expensive Edition are scoffing and thinking “Oh that’s nothin’!”, with your billions of rows and online index rebuilds. Well you can go back to your fancy Always Encrypted data and your terabytes of RAM and just pretend I said “trillion” instead of “billion” — maybe that’ll fit your scale a bit better. But for the rest of us mere mortals…
Anyway, since we’re peons, we can’t just go create the billion-row index without taking the table offline. So we need to come up with a way to minimize that downtime for the table, and avoid causing excessive blocking or I/O overload on the instance. As I said before, SSIS to the rescue!
I call this the “setup, dump, & swap”. Essentially we need to create an empty copy of the table, with the desired index(es), dump all the data into it, and then swap it in. There are a couple of ways you can do this, but it boils down to the same basic premise: It’s “better” (probably not in terms of speed, but definitely in terms of efficiency and overhead) to fill this new copy of the table & its indexes, than it is to build the desired index on the existing table.
So here’s an outline (a rough T-SQL sketch of the setup & swap steps follows the list):
Script out the table (SSMS, right-click, script table, create to… or, if you use one of these cool extensions, F12 or similar “get definition” shortcut) — say, if the original is MyTable, script-create & replace MyTable with MyTableCopy
Here’s a little room for choice. You could create the table in the same schema with a new name; or, you could create the table in a different schema, with the same name or another name. This will determine how you do the “swap” toward the end.
In the first case, we’d use sp_rename
In the 2nd, we’d use alter schema transfer
Both are essentially meta-data changes, but the latter could potentially take a hair longer just because it pulls more strings; whereas the former requires taking care of dependencies ahead of time so you don’t break a schema-bound view or orphan a foreign key.
Add the desired index(es), e.g. create clustered index CX_MyTableCopy_Datestamp_ThingName on dbo.MyTableCopy (Datestamp, ThingName)
Build the SSIS task to copy the data from MyTable to MyTableCopy
Schedule said task via SQL Agent.
Use the previous tip to monitor its progress and estimate time to completion!
Once done, prepare to swap!
Again, use sp_rename, and if needed, alter schema transfer.
Likely, this will involve several renames – the constraints and other indexes can’t be named the same either, so get all that stuff renamed with an _old suffix first, then you can swap the actual tables.
Clean up after yourself and drop the old table once you verify everything’s working well and all dependencies are accounted for.
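To make that outline a little more concrete, here's a rough T-SQL sketch of the bookend steps. The table definition, index, and names are all invented for illustration, and the SSIS dump happens between the setup and the swap:

-- 1. setup: an empty copy of the table (from the scripted-out definition), plus the index we actually want
create table dbo.MyTableCopy
(
    Datestamp datetime2 not null,
    ThingName varchar(100) not null,
    ThingValue decimal(18,2) null
);

create clustered index CX_MyTableCopy_Datestamp_ThingName
    on dbo.MyTableCopy (Datestamp, ThingName);

-- 2. dump: the SSIS package bulk-loads dbo.MyTable into dbo.MyTableCopy (not shown here)

-- 3. swap: rename so the freshly-indexed copy takes over
exec sp_rename 'dbo.MyTable', 'MyTable_old';
exec sp_rename 'dbo.MyTableCopy', 'MyTable';

-- or, if the copy lived in a separate schema instead:
-- alter schema dbo transfer staging.MyTable;

-- 4. clean up once everything checks out
-- drop table dbo.MyTable_old;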
Let’s back up a bit. Why are we doing it this way? More importantly, what are the general use-cases for this kind of thing? Well as I said, it’s “better” in certain ways, than simply creating the index on the “live” T.O.U.S. It avoids locking said live table, and it has the potential to put less stress on the database and less synchronous I/O against its primary files. Finally, and most importantly, as a bulk insert operation, with configurable batch sizing, the SSIS task will put exponentially less load on the transaction log. Whereas, with the regular inline index creation, it could easily fill up the Tlog and will definitely cause performance issues.
Furthermore, swapping tables, or in more advanced cases, partitions, is a fantastic way to move massive amounts of data around with minimal impact. It also happens to be quite helpful when we have to deal with indexing such massive amounts of data.
In my case, this was actually an archive of 2016 audit-trail data, so if I’d chosen to, I could have taken it out of the partitioned view it was part of, and dealt with it being offline for several hours. But because I’m a sucker for a good problem, and because I wanted to be able to semi-accurately monitor the progress to let the boss-man know when it might be done, I chose to take this route instead. It’ll come in handy again soon, I’m sure — we have several other jumbo-tron tables laying about that may need some index tuning and/or partitioning.
So, happy swapping, and don’t let those T.O.U.S.’s get the best of you!
PS: thanks to @SQLDoubleG for hosting, and thanks in advance to the rest of the community being lenient of my “submission” being a re-purposed existing post with a few extra blurbs thrown in to bring it on-topic! =D