TSQL Tuesday 95: Big Data

This month’s party brought to you by Mr. Hammer (b|t).

mc-hammer
No, not THAT one…

I apologize in advance for all the hammertime memes.  It was just too good to pass up.  Surely he must be used to this.  Or at least not surprised by it.  =D

So, Big Data.  What is it?  Well, in simple terms, it’s the realization and acceptance of the fact that data is multi-model, multi-faceted, multi-sourced, and constantly growing.  It’s the fact that the traditional RDBMS is no longer the be-all end-all source of truth and valuable information.  It’s part of a larger ecosystem involving JSON document stores, CSV files, streaming volatile bits of data coming from random devices and user activity that loses its meaning and potential impact almost as quickly as it can be gathered and sifted and stored.

But what do we actually get out of it?  As a small-medium enterprise NOT in the software business, I have to say, not as much as the hype would have us believe.  And look, I’m not so jaded and crusty that I refuse to adapt new tech.  I Just haven’t seen a meaningful transformative business use-case for it.  Sure, we have Google Analytics telling us how our websites are doing, and someone in marketing knows something about trending our social media traffic.  Does it really help us make more money?  Heck if I know.

cease thy actions, my timepiece has indicated the necessity of mallets
Old-timey colonials can even dig it…

Here’s what I’d like to see from the thought leaders.  Give me something I can chew on — a real-world, non-hypothetical, non-frivolous, impactful use-case for adopting and implementing something like Hadoop/Spark or Azure Data Lake.  Show me how my business can realistically journey down the path of predictive analytics and what it’s going to take from our Devs, IT staff, and management to actually get there.

Because they don’t get it yet.  I have managers still worrying about how much we’re spending on a dinky little flash storage array to support the growing needs of our on-prem converged infrastructure stack.  Meanwhile the AWS bill continues to baffle, and Devs want to play with Docker and Lambda.  But we can’t seem to convince the higher-ups that they’re short-staffed on the internal-apps team, even after a minor version upgrade takes 4 hours of Ops time and half a dozen end-users doing post-mortem testing just to be sure we didn’t break anything unexpected.

I’m not here to complain.  Really.  I do want to see something amazing, something inspiring, something that shows me what Big Data truly brings to the table.  And sure, I’ve see the vendor demos; they’re all just a bit outlandish, no?  I mean, they look really cool, sure — who doesn’t want to see a chord diagram of who’s killed who is GoT? — but does that really help my business improve sales and productivity?

My point is, there’s a gap.  A chasm of misunderstanding and mis-matched expectations between what management thinks Big Data is/means, and what it takes to actually implement.  They see the pretty pictures and the fancy demos, but they don’t see the toil and sweat (or at least, in the cloud, gobs of cash) that go into building & operating the underpinnings and pipelines that drive those nice graphics.  Not to mention the fundamental issues of data quality and governance.

continue not, time for hammer it is
OK OK, last one, I swear…

So do us a favor, Big Data pundits.  Show us something real, something that “the little guy” can use to up his/her game in the market.  Something that makes a positive impact on small non-startup non-software businesses with understaffed IT & Dev teams.  But more importantly, stop glossing over the effort and resources that it takes to “do Big Data right“.  Managers and executives need to understand that it’s not magic.  And IT practitioners need to understand that it’s actually worth-while.  Because I believe you — really — that the payoff in the end is there, and is good.  But you need to convince the whole stack.


PS: I know this is a fully day late for T-SQL Tuesday, and as such, I wasn’t going to post a ping-back in the comments of the invite, but then I saw there were only 8 others, so I felt it would benefit the event if I did add my late contribution.  I’ll tweet with a modified hash-tag instead of the standard #tsql2sday, to reflect my late-ness.  Hopefully that’s a fair compromise to the community & the event’s intentions.  =)

Advertisements

SQL Server Performance Troubleshooting Free Scripts and Tools List

Originally posted on David Peter Hansen:
Back in the days, I used to collect a lot of different scripts, tools, and other goodies for troubleshooting SQL Server performance issues. These days, however, I tend to use what is publicly and freely available (as well as some internal stuff), and keep a list of those in my…

I don’t normally reblog.  But when I do, it’s something awesome. =D

David Peter Hansen

Back in the days, I used to collect a lot of different scripts, tools, and other goodies for troubleshooting SQL Server performance issues. These days, however, I tend to use what is publicly and freely available (as well as some internal stuff), and keep a list of those in my head.

I’ve meant to write that list down for a while, and today Chrissy asked:

So here it is…

Disclaimer: While I do work as a Premier Field Engineer for Microsoft, this is my list – this is not an official list from my employer. Others likely have a list that differs from mine.

Free scripts and tools from Microsoft
These are scripts and tools provided…

View original post 892 more words

Quickie: SSRS Multi-Value Parameter Defaults

a quick tip for passing default values to an SSRS multi-value parameter during report drill-thru.

Inspired by some StackOverflow-ing and this particular answer.

Aka: “How do I pass/assign a default set of values to a multi-value parameter in SSRS?”

We often have an SSRS report that needs to “drill thru” to another report, usually to go from a “high level view” to a “lower level” or more detailed view, or sometimes just to lead the user down the path we’ve chosen as the head analyst / BI architect.  And part of that report navigation involves initializing (set to defaults) the value(s) of the sub-report’s parameters, including multi-value parameters (which I’ll now refer to as mvp, even though, I know, it’s a ridiculously overused acronym).  These are basically arrays, but are often represented as simply comma-delimited strings, so it’s easy to forget their true nature.

beware the array in string's clothing
I see you there!

Let’s fabricate an example.  In our Sales Summary report, we have an element (a textbox, image, placeholder, or whatnot) that, when clicked, should drill-thru to Sales by Person.  Said next report requires a multi-select parameter (another term for the mvp) to have 1 or more values selected, otherwise it doesn’t render.  We’ll call this parameter SelectedNames, with value-label pairings 1=Bob, 2=Alice, 3=Mary.  When we drill-thru to this by-Person report, we want it to initially show (have selected by default) all available people.

So how do we do this?  In the properties of the “clickable” element on Sales Summary, say it’s a text-box named GoToDetails, we go to the Action tab.  We choose the “Go to report” action, select/specify the target report, Sales by Person, and then add the parameters we want to pass to it.  For comparison, I’m going to pass a “regular” (single value) parameter called @ReportDate, as well as the mvp SelectedNames.  Here’s what that all looks like, in picture form.

text box properties > actions > go to report > specify report > add parameters > enter expression
step by step

The single parameter pass-thru is, as you’d expect, very simple.  But for our mvp, we need to use the expression-builder, that little fx button stylized to look like your old high school math class days.  Here’s what that function looks like:

=Split("1,2,3", ",")

And presto!, we have converted a comma-delimited list into an array to pass into our Sales by Person report’s SelectedNames multi-value parameter.  Now it will initially render with all 3 selected people as desired.

So there you have it, a quick tip for passing default values to an SSRS multi-value parameter during report drill-thru.  But what if you wanted to pass the selected values of one mvp down to another?  Or do some clever on-the-fly mapping (conversion) from one to the next?  Well, stay tuned!  I’ll write about that next time.  =)

Thanks for reading!  For a lot more on SSRS and multi-value parameters, check out these articles: @sqlchick, @mssqltips, and @msdn.

you the real mvp meme
Well, not really, see, because MVP is now completely overused and diluted to the point that it’s nearly lost all meaning, so… but yeah, you’re cool. And stuff.

PowerShell and BITS

this is about using PowerShell and the BITS framework to copy very large files across servers…

Welcome back!  This month’s topic is PowerShell — thanks to one of our prominently bearded community members.  PowerShell is a fantastic tool in the IT professional’s toolbelt.  I was first introduced to it somewhere in 2014 or 2015 by a colleague, and started making much heavier use of it when my career took me to a new & bigger environment.

Actually, funny side-story.  I remember seeing one of the very early incarnations of PowerShell, or what would eventually evolve into it, in college.  A graphics programming course, of all things, had a MS partner come in to show us this “cool new” Windows command-shell thing (different and separate from the DOS-style CMD, obviously), where he demonstrated fetching some data from the filesystem, feeding it into a CSV, and doing some kind of super-basic analysis on it to show in a “report” (which was really just another text file).  This was 2005-2006, so I couldn’t say what it was specifically, though I seem to remember something about the word “Longhorn”.  Although, reading up on some of the Wiki-history, it seems more likely that it was a Monad beta.

that is so four score and seven years ago
Don’t ask me why hes wearing horn-rims. I don’t know.

Preamble

Anyway, back on topic.  Today’s post is pretty simplistic in comparison to what most people may be writing about.  But I’ve already blogged about doing hands-off SQL installation with PowerShell & CLI, and this was another thing kicking-around the back of my mind.  So this is about using PowerShell and the BITS framework (*-BitsTransfer cmdlets) to copy very large files across servers.  Specifically, database backups.  Because let’s face it, they can be really large.  And if you’re faced with fetching them off a PROD box, you want to minimize the impact on that box’s resources.

Now sure, there are other ways – xcopy or robocopy with the /J flag (un-buffered IO), or fancy GUI tools.  And in an ideal world your backups would be written to a network share that’s not a local drive on the PROD SQL server, right?  Right…

Oh, and one more thing.  You need to enable BITS via the Windows Features console — search “features” in your Start menu and it should come up as Turn Windows features on or off (Control Panel) .  On a server, it’s under the Server Role “Web Server (IIS)”, feature “Background Intelligent Transfer Service (BITS)”.  Underneath there are 2 sub-feature choices, “IIS Server Extension” and “Compact Server”.  Honestly I don’t know which is preferable, but I left it with the default selection, the first (former).  It should go without saying, but don’t do this in production (unless you have the blessing of your SysAdmins).

But Why?

Why BITS?  Well, as per the Docs, it has the following 3 key features (emphasis mine):

  • Asynchronously transfer files in the foreground or background.
  • Preserve the responsiveness of other network applications.
  • Automatically resume file transfers after network disconnects and computer restarts.

Wow, nifty!  So it doesn’t hog the network, and it’s resumable (resume-able?) in case of connectivity hiccups.  Pretty sweet, no?  Also, it can run asynchronously in the background, which means it won’t hog your storage bandwidth or compute resources.

async all the things!
Because we can.

Let’s See an Example

Most of the guts and inspiration for this came from this article over on “Windows OS Hub” (woshub, a somewhat unfortunate sounding acronym, but certainly not as bad as some!).  The datePattern nonsense is just to make it “dynamic” in the sense that, if you have a backup scheme like me, with Sunday FULLs, daily DIFFs, and obviously TLogs in some every-X-minutes fashion, you’ll usually want the latest set of FULLs and DIFFs.  But you could easily tweak this, make it more point-in-time aware or whatever, as needed.

So, here’s a bit of a talk-thru outline, and then I’ll just link the Gist.

  1. Get the list of files we want to copy, from “source”
  2. For each file:
    1. Make sure it doesn’t exist in the “destination”
    2. If not, start a BITS transfer job (saving said job to a variable for checking/finishing later)
    3. While said BITS job is pending, print a progress message and sleep for some seconds
    4. Finish (“complete”) said job and move on to the next file
  3. Conclude with a message about how much work we just did!
  4. Repeat steps 1-3 for another “set of files” (list) if desired

And without further ado, the code.

The Catch

There are some downsides here.  First, you cannot use BITS in a non-interactive mode, i.e. inside a Scheduled Task as a User that’s not logged-in.  This is because it’s a “desktop”-oriented feature, not a “server” one.  Second, I’ve never been able to get multiple transfers going at once — or at least, multiple PoSh scripts which use BITS transfers.  This could very well be my fault, but it does seem like the BITS jobs are “serial” in nature, i.e. one must finish before the next one can start.  Again, not the expert, just observing what I found during my experiments.

parallel vs serial ports on old computer
Obviously, serial won out in the end (specifically, his superstar protege, USB), but you gotta hand it to parallel, he had a good run.

Conclusion

BITS transfer is an interesting method for copying extra-large files around your environment with low overhead.  PowerShell makes it easily accessible and lets you wrap it up in loops and checks so you can effectively build a progress-indicative, predictable and reproducible method for copying a whole SQL server’s set of backups from one place to another.

What cool little things have you discovered using PowerShell?  Let me know!  Thanks for reading.

Quickie: TempDB on local SSD

What could possibly go wrong? As it turns out, plenty.

Faithful reader(s), it’s been a while!  I’ve been busy preparing for some big transitions.  I’m also getting better at MDX queries, tweaking SSAS-based reports to more accurately reflect the business rules.  But enough about that, on with the post!

In which we doubt the SAN

A storage area network (SAN) is a management & administration solution, not a performance solution.

-someone wiser than me

SANs are wonderful technology.  They inspire all kinds of geekery and are purported to solve all your storage woes.  But there’s a catch: they’re expensive.  Not just as a capital expense, but in maintenance and licensing costs.  And if you ever want to upgrade it, like add some more drives to a particular tier/pool — fuhgeddaboudit.

So what do we do with SQL on a SAN?  Well, it has tiers, right?  Slower storage with huge capacity, faster storage with less, etc.  We put the data files (heavy random read workload, typically) on the pool optimized for that kind of I/O pattern.  We put the TLog files (heavy sequential write workload) on the pool best suited for that.  And what about good ol’ TempDB?  Its access pattern is fairly unique — random writes and reads, and frequent overwrites, which means it could potentially wear out your typical prosumer SSD relatively quickly.  But we’re not complete cheapskates, we’ll buy enterprise class SSDs, no?

So we go read some stuff and figure, hey, sounds like a great idea, right?  Put TempDB on a local SSD, or better yet, a pair of SSDs in RAID-0 for pure performance (because this is a cluster, we’ve got HA already).  We’ll reduce the load on the SAN I/O channels and make our overworked TempDB happier with lower latency and better throughput.  Right?

what could possibly go wrong
ooh, sparkly!

 

In which we discover what could possibly go wrong.

Once the new drive(s) is(are) installed and “presented” to Windows (that’s my SysAdmin’s term), it’s fairly trivial to do the SQL configuration change — it does of course require a SQL service restart (or cluster failover).  Code example, assuming your new drive is ‘T’:

use master;
alter database tempdb
modify file (name=tempdev, filename='T:\tempdb.mdf')
alter database tempdb
modify file (name=tempdb2, filename='T:\tempdb2.ndf')
--etc...

You do of course have multiple TempDB data files, yes?  Good.

Side-debate

Should we put templog (TempDB’s transaction log) on the same drive as the TempDB data files, or put it on the same storage pool as the regular DBs’ TLogs?  As usual, “it depends” — ask your favorite SQL gurus and do some testing.

Back on topic

We’ve made the change, we’ve done the cluster failover.  TempDB is now running on our spankin’ new SSD.  So we start monitoring performance metrics.  Things like file I/O stats (from SQL DMV sys.dm_io_virtual_file_stats), latency and waits (from our monitoring tools), and good ol’ PerfMon.

But wait, what’s this?  I/O stalls are higher?  Write latency is higher?!?  Perfmon agrees?

nooooooooooooo-vader
why god why?!?

Write latency on the TempDB files was over 10x higher than it was when they were on the SAN (the performance tier, to be clear).  The file_stats DMV showed large increases in I/O stalls.  Sad-trombone.

 

In which we have several theories

Then ensued various conversations and brainstorms among my colleagues.

Someone check the firmware/drivers!

It’s up-to-date.

Maybe it’s got the wrong block-size.

Nope, 64k.

Well, it’s only 6Gbps SAS… maybe we should’ve sprung for the 12Gbps.

The write latencies went up by a factor of 10.  I don’t think an improvement by a factor of 2 is going to win you any trophies.

Why didn’t we get an NVMe or M.2 one?

Because the damn blades don’t have those slots, goober.

Another interesting observation, and potentially the silver lining.  Overall instance waits (wait stats), according to our monitoring tool, went down.  That’s good news, right?  Maybe.  Does application performance & user experience corroborate it?  Possibly!  We’ll be observing the patient for another week or so.

Let’s turn to the community again to see what others have experience.

In which we eat some crow

And by “we” I mean “me”.  Being the DBA and the primary proponent of the SSD addition, because I knew our workloads were very TempDB-heavy, I had to hang-tail and admit that the SAN gods won this round.

Maybe.

But wait, what about the fact that our wait stats are down?  What about app/user experience?  Valid arguments, I agree.  That’s why we’re still observing.  But I’m not optimistic, given the follow-up links above.  We may utilize local SSDs for something else (index filegroups?) — but if those write latencies don’t improve, I’m concerned that it won’t help anybody.

keep calm because only time will tell
And beer. Lots of beer.

In which I ask for your help

Yes, you!  If you have ideas on what we did wrong, what we’re missing, or any other advice about getting the most “bang for the buck” out of a direct attached SSD on a converged-infrastructure Cisco UCS blade server platform with a VNX SAN, by all means, drop me a line.  I’m all ears.

Automating SQL Installation

..while there are likely even better ways to do this in the long-run, this quick & easy approach was sufficient to save me time and effort..

At this point in my career, I’m not managing massive environments, so I don’t feel the need (nor have the expertise) to use a large scale solution like DSC or SCCM.  But I’ve had to install SQL Server a few times, so I figured it’s worth at least scripting out a standard pre-configured installation, so that A) I don’t need click through a GUI ‘wizard’ hearkening back to the ’90s, and B) the SysAdmins can “fire and forget” (read: stop bugging me about it).

keep it simple stupid
the patented one-eyebrow-raise..

The Disclaimer

Thus, I’m attempting to K.I.S.S., while making it configurable & repeatable.  There are some limitations of this approach, as alluded above.  It’s not “massively scalable” (scaleable? scale-able?) because:

  1. The PoSh script still needs to be deployed locally to the server in question
  2. The installer config (.ini) also lives locally (though it probably could be a UNC path, it’s just a file after all)
  3. The script prompts you for the service account (SQL engine, Agent) credentials and the sa password using the Read-Host -AsSecureString method cmdlet, so some meatbag still has to type those in.  This is because we don’t have an enterprise pwd/secret-management system where I could, say, ask it for a service account credential set and tell it to embed that securely in a script without it actually being visible to me.  So, while yes, they’re kept in a “vault”, it’s not query-able by anything else, so an admin still needs to copy & paste them into whatever configuration screen he’s working with at the time.  Not ideal, I know, but we work with what we’ve got.

PS:  Yeah, yeah, “don’t use sa, rename it or disable it; or use Windows Auth only!”.  Rage, howl, fire & brimstone.  I’m not going to argue about it; we can save that for another post.  This environment dictates that its used during setup and then disabled later, so that’s beyond the scope of the installer config.

So yes, while there are likely even better ways to do this in the long-run, this quick & easy approach was sufficient to save me time and effort for the occasions when a new SQL box/VM needs to be spun-up.

Useful links

  1. A primer on SQL cmd-prompt installation & its arguments
  2. A couple community articles on the subject (the latter about slipstreaming updates)
  3. A technet article & couple Q&A threads (technet, stackoverflow) that helped me figure out how to securely get & put the credentials
  4. An example for mounting an ISO in PowerShell
  5. And finally, two things that I attempted to understand but ultimately failed to implement, because (apparently, at least to me), PowerShell remote-ing is a P.I.T.A.
config.ini, to command prompt, to PowerShell
3 steps toward a better workflow

The Outline

First we need an .ini file to work with.  You could either create it from scratch, or take it from an existing SQL box’s “Setup Bootstrap” folder.  Example path C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log\20170801_073414\ConfigurationFile.ini​  — indicating this was an install done on 8/1/2017 at 7:34am.  Right above that, at simply C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log\, you’ll see a Summary.txt file, which can actually come in handy while you’re testing these unattended installs and wanting to see why it failed.

The first link above, from MSFT Docs, does a pretty nice job of telling you all the things that make up this config file.  You get to bypass the TOS prompt, enter service account details, specify drive letters (paths) for default data/log file locations & tempdb, slipstream update packages (UpdateSource​), and even more advanced stuff like AG settings and whatnot.  My example will be a simple standalone instance using the default name, so I’ll be sticking with the basics.

We can then use this file in the ConfigurationFile argument of setup.exe from the SQL Server install media.  To put a little more color on that: the .ini file is really just a collection of command-line arguments to setup.exe​; you could also list them all out in-line, but that would be tedious and silly.  Here’s a couple major selling points of creating your own config file:

  1. Slipstream updates (SP’s, CU’s), instead of having it go out to MSFT update servers (or *aghast* sticking with the original RTM bits, you heathen you!)
  2. Specify drive letters / default file locations: sure, this may be considered old-hat if you’re running super slick storage, but I still find it makes management a bit easier if I know where my MDFs, LDFs, TempDB, & backups will always be.
  3. Take advantage of 2016’s better TempDB setup options (# files, size & growth)

We will, however, keep a couple arguments out of the .ini file and instead throw them into the ArgumentList from the calling PowerShell script.  Speaking of, here’s what the PowerShell script needs to do:

  1. Prompt the operator (SysAdmin or DBA) for the SQL & Agent service account credentials, and (optionally) the sa pwd (if using it).
  2. Fetch our install media from the central network share where we store such things (server & office ISO​s, for example).
  3. Mount said ISO to our virtual disc drive.
  4. Run its setup.exe with the following arguments:
    1. The config .ini file
    2. The service & sa accounts
  5. After it’s done, un-mount (dismount) the ISO.

Then the DBA can connect to the brand-spankin’-new running SQL instance and do other post-setup configurations as desired (i.e. set max-memory, maxDOP/CTFP, etc).  And sure, those could also be done in PowerShell (thanks in no small part to the awesome team at DbaTools), I chose not to do so in this case.

As the bloggers say, “that’s left as an exercise to the reader”.

Plus, they’re never quite as deterministic as we’d like them to be — they depend on the server’s compute resources, i.e. memory size & CPU cores, as well as estimated workload & environment tier, so it’s often a gamble in “how correct” your initial settings will be anyway.  Still, anything is better than the defaults, so configure-away!

husky puppies sharing
because sharing is caring!

The Code

Here are the Gists I’ve created to go along with this post.  If I’ve made a mistake, or if you, dear reader, have a suggestion, we can incorporate them into the gist without me having to go back and edit the blog post!

Yay technology!

I’d love to get feedback on how you would improve this, what you might do differently, etc.  Drop me a comment or a tweet!

Config/INI file:

PowerShell install script:

Indexing a HUGE Table

How it lost its original clustering key is a perfect topic for this month’s T-SQL Tuesday!

tsql2sday150x150

As previously referenced, we had a half-billion row table (which we’ll “round up” to a billion, just for the sake of argument) that needed an index.  A clustering key, to be exact.  How it lost its original clustering key isn’t that interesting is a perfect topic for this month’s T-SQL Tuesday (we planned to replace it, before realizing how long it would take, and had to cancel the job after it was removed but before it could finish adding the new one).  Anybody can drop an index; it takes mere milliseconds.  But to create (or rebuild) an index, the SQL engine needs to touch every single row that index will include, and of course for a clustered index, that’s errverybody.

So the “Lessons learned the hard way” are:

When planning an index replacement for a billion-row table, don’t just schedule the job as “drop old index, then create new index“, and expect nothing to go wrong.

..and..

Don’t try to index a billion-row table all at once.  Use a smarter methodology.  Unless you can literally afford to have that table be offline for several hours (depending on your hardware, and assuming you’re on Standard Edition.

Of course, some of you crazy kids with Expensive Edition are scoffing and thinking “Oh that’s nothin’!”, with your billions of rows and online index rebuilds.  Well you can go back to your fancy Always Encrypted data and your terabytes of RAM and just pretend I said “trillion” instead of “billion” — maybe that’ll fit your scale a bit better.  But for the rest of us mere mortals…

one billion records dr evil
it’s yuuge, I’m telling you…

Anyway, since we’re peons, we can’t just go create the billion-row index without taking the table offline.  So we need to come up with a way to minimize that downtime for the table, and avoid causing excessive blocking or I/O overload on the instance.  As I said before, SSIS to the rescue!

The Plan

i love it when a plan comes together
classic… plain and simple.

I call this the “setup, dump, & swap”.  Essentially we need to create an empty copy of the table, with the desired index(es), dump all the data into it, and then swap it in.  There are couple ways you can do this, but it boils down to the same basic premise: It’s “better” (probably not in terms of speed, but definitely in terms of efficiency and overhead) to fill this new copy of the table & its indexes, than it is to build the desired index on the existing table.

So here’s an outline:

  1. Script out the table (SSMS, right-click, script table, create to… or, if you use one of these cool extensions, F12 or similar “get definition” shortcut) — say, if the original is MyTable, script-create & replace MyTable with MyTableCopy
  2. Here’s a little room for choice.  You could create the table in the same schema with a new name; or, you could create the table in a different schema, with the same name or another name.  This will determine how you do the “swap” toward the end.
    • In the first case, we’d use sp_rename
    • In the 2nd, we’d use alter schema transfer
    • Both are essentially meta-data changes, but the latter could be potentially take a hair longer just because it pulls more strings; whereas the former requires taking care of dependencies ahead of time so you don’t break a schema-bound view or orphan a foreign key.
  3. Add the desired index(es), e.g. create clustered index CX_MyTableCopy_Datestamp_ThingName on dbo.MyTableCopy (Datestamp, ThingName)
  4. Build the SSIS task to copy the data from MyTable to MyTableCopy
  5. Schedule said task via SQL Agent.
  6. Use the previous tip to monitor its progress and estimate time to completion!
  7. Once done, prepare to swap!
    • Again, use sp_rename, and if needed, alter schema transfer.
    • Likely, this will involve several renames – the constraints and other indexes can’t be named the same either, so get all that stuff renamed with an _old suffix first, then you can swap the actual tables.
  8. Clean up after yourself and drop the old table once you verify everything’s working well and all dependencies are accounted for.

But Why?

can-you-tell-me-why
Fry shares your curiosity…

Let’s back up a bit.  Why are we doing it this way?  More importantly, what are the general use-cases for this kind of thing?  Well as I said, it’s “better” in certain ways, than simply creating the index on the “live” T.O.U.S.  It avoids locking said live table, and it has the potential to put less stress on the database and less synchronous I/O against its primary files.  Finally, and most importantly, as a bulk insert operation, with configurable batch sizing, the SSIS task will put exponentially less load on the transaction log.  Whereas, with the regular inline index creation, it could easily fill up the Tlog and will definitely cause performance issues.

Furthermore, swapping tables, or in more advanced cases, partitions, is a fantastic way to move massive amounts of data around with minimal impact.  It also happens to be quite helpful when we have to deal with indexing such massive amounts of data.

In my case, this was actually an archive of 2016 audit-trail data, so if I’d have chose, I could have taken it out of the partitioned view it was part of, and dealt with it being offline for several hours.  But because I’m a sucker for a good problem, and because I wanted to be able to semi-accurately monitor the progress to let the boss-man know when it might be done, I chose to take this route instead.  It’ll come in handy again soon, I’m sure — we have several other jumbo-tron tables laying about that may need some index tuning and/or partitioning.

So, happy swapping, and don’t let those T.O.U.S.‘s get the best of you!

rodent of unusual size
Meh, I don’t think they even exist!

 

PS: thanks to @SQLDoubleG for hosting, and thanks in advance to the rest of the community being lenient of my “submission” being a re-purposed existing post with a few extra blurbs thrown in to bring it on-topic!  =D