This month’s invitation is brought to you by Ewald Cress (blog, twitter), who I already like based on his tagline —
finds joy in minutiae..
Yes, my friend, don’t we all.
The topic at hand is fairly non-technical, but still important: folks who have made a positive contribution to your career or professional development. So it’s time for a shout-out! About a year ago, I wrote about my first major career move. There were several great influences in my first job, from the developers that taught me how to code, to the DBA who taught me how to keep cool & calm in the face of outages, to the boss who taught me the importance of time management and breadth of knowledge.
Since I am way too late in posting this, and I don’t feel like waxing poetic, I’ll just say a general “thank you” to all those who’ve helped me along in my career so far, with special acknowledgement to my former boss, my current boss, the SQL family, and my own family. Happy belated Thanksgiving and have a safe & pleasant holiday season! I’ll have a real post again quite soon, diving back into the tech stuff.
This month’s party brought to you by Mr. Hammer (b|t).
I apologize in advance for all the hammertime memes. It was just too good to pass up. Surely he must be used to this. Or at least not surprised by it. =D
So, Big Data. What is it? Well, in simple terms, it’s the realization and acceptance of the fact that data is multi-model, multi-faceted, multi-sourced, and constantly growing. It’s the fact that the traditional RDBMS is no longer the be-all end-all source of truth and valuable information. It’s part of a larger ecosystem involving JSON document stores, CSV files, streaming volatile bits of data coming from random devices and user activity that loses its meaning and potential impact almost as quickly as it can be gathered and sifted and stored.
But what do we actually get out of it? As a small-medium enterprise NOT in the software business, I have to say, not as much as the hype would have us believe. And look, I’m not so jaded and crusty that I refuse to adapt new tech. I Just haven’t seen a meaningful transformative business use-case for it. Sure, we have Google Analytics telling us how our websites are doing, and someone in marketing knows something about trending our social media traffic. Does it really help us make more money? Heck if I know.
Here’s what I’d like to see from the thought leaders. Give me something I can chew on — a real-world, non-hypothetical, non-frivolous, impactful use-case for adopting and implementing something like Hadoop/Spark or Azure Data Lake. Show me how my business can realistically journey down the path of predictive analytics and what it’s going to take from our Devs, IT staff, and management to actually get there.
Because they don’t get it yet. I have managers still worrying about how much we’re spending on a dinky little flash storage array to support the growing needs of our on-prem converged infrastructure stack. Meanwhile the AWS bill continues to baffle, and Devs want to play with Docker and Lambda. But we can’t seem to convince the higher-ups that they’re short-staffed on the internal-apps team, even after a minor version upgrade takes 4 hours of Ops time and half a dozen end-users doing post-mortem testing just to be sure we didn’t break anything unexpected.
I’m not here to complain. Really. I do want to see something amazing, something inspiring, something that shows me what Big Data truly brings to the table. And sure, I’ve see the vendor demos; they’re all just a bit outlandish, no? I mean, they look really cool, sure — who doesn’t want to see a chord diagram of who’s killed who is GoT? — but does that really help my business improve sales and productivity?
My point is, there’s a gap. A chasm of misunderstanding and mis-matched expectations between what management thinks Big Data is/means, and what it takes to actually implement. They see the pretty pictures and the fancy demos, but they don’t see the toil and sweat (or at least, in the cloud, gobs of cash) that go into building & operating the underpinnings and pipelines that drive those nice graphics. Not to mention the fundamental issues of data quality and governance.
So do us a favor, Big Data pundits. Show us something real, something that “the little guy” can use to up his/her game in the market. Something that makes a positive impact on small non-startup non-software businesses with understaffed IT & Dev teams. But more importantly, stop glossing over the effort and resources that it takes to “do Big Data right“. Managers and executives need to understand that it’s not magic. And IT practitioners need to understand that it’s actually worth-while. Because I believe you — really — that the payoff in the end is there, and is good. But you need to convince the whole stack.
PS: I know this is a fully day late for T-SQL Tuesday, and as such, I wasn’t going to post a ping-back in the comments of the invite, but then I saw there were only 8 others, so I felt it would benefit the event if I did add my late contribution. I’ll tweet with a modified hash-tag instead of the standard #tsql2sday, to reflect my late-ness. Hopefully that’s a fair compromise to the community & the event’s intentions. =)
Originally posted on David Peter Hansen: Back in the days, I used to collect a lot of different scripts, tools, and other goodies for troubleshooting SQL Server performance issues. These days, however, I tend to use what is publicly and freely available (as well as some internal stuff), and keep a list of those in my…
I don’t normally reblog. But when I do, it’s something awesome. =D
Back in the days, I used to collect a lot of different scripts, tools, and other goodies for troubleshooting SQL Server performance issues. These days, however, I tend to use what is publicly and freely available (as well as some internal stuff), and keep a list of those in my head.
I’ve meant to write that list down for a while, and today Chrissy asked:
Anyone have a list of apps, modules, tools, etc that Microsoft PFE's use? If not, can a PFE write a blog post? 😁 (Partic. interestd in SQL)
It’s time for a more thought-y, less tech-y post. Which is mostly my excuse for not wanting to write a bunch of code at the moment. But that’s how I started this blog, with mostly opinion pieces, trying to offer some critical thinking on how DBAs and Developers work together. So y’all better like it!
Today’s title is brought to you by Don Henley’s tune of the same name, which is now stuck in my head, thankyouverymuch.
This is about data quality. When you have “dirty data”, just like dirty laundry, and you let it sit unattended, it starts to smell. In software, this means the “badness” seeps into other areas of the environment, affecting systems and business processes that should otherwise function smoothly.
A code smell is a surface indication that usually corresponds to a deeper problem in the system.
And, more aptly:
Data quality is corporate America’s dirty little secret.
But what is dirty data? Generally, it’s anything that doesn’t quite fit the ideal data model — that perfect vision of how all the bits of information in the system fit together, the shape of each data entity and how they relate to each other. Mostly, dirty data is what happens when you allow users to type things into text-boxes, and you write those text-box contents straight into the database without any layers of validation or cleansing. (Coincidentally, that’s also how SQL injection happens, but most of us have been scared-straight by enough years of security bloggers hammering at our thick skulls — and our favorite XKCD — that we at least sanitize our inputs before dumping them to an INSERT statement.)
Let me take a recent example from my experience. We have an ERP system that doubles as our CRM system (which is already a pair of bad idea jeans). How do you think customer information gets into the database? Customer Service Reps, typing stuff. Usually by copying from a paper form. Or the customers themselves, using an online form. But guess what doesn’t happen in either case? If you said “USPS address validation“, give yourself a hand!
Now, being that this system is our “source of truth” for customer info, it stands to reason that lots of other business functions & processes depend on it. For example, let’s say we send a promotional calendar to our customers of a certain “subscription level” on a yearly basis. We’re not in the publishing business, so we contract this out to another company. But guess what they need from us in order to complete the job and mail out those calendars? Addresses! So what happens when there’s a bad address in our database? A calendar gets returned, wasted cost and materials. Multiply that by a couple thousand and you start to turn a few heads in the C-suite.
Later, around the Marketing table, someone has a brilliant idea that they need to run a mail-merge to send out a gift-package to the top 100 customers. So they ask the DBA for a list of said customers. “Sure! Here ya go, here’s a report.” And then the complaints start coming in.
“These customers aren’t active anymore.”
Then tell your CS reps to mark them as inactive in the system. But no, we don’t do that, we just write “inactive” in the FirstName field.
“These ones are employees.”
Fine, figure out a special indicator to add for that, so I can exclude them from the report. But no, of course, we can’t do that either; we just put “deactivated” in the FirstName field.
“This guys is dead.”
Yeah, not even kidding. Apparently the powers-that-be decided to keep his info in the system, but type in “deceased” to the “Address 2” line (in the US, this is customarily the apartment/suite/unit number).
But mostly, the biggest complaint is that we’re getting un-deliverable/return-to-sender when we try shipping out to some of these addresses. And why? Because they’re not subject to any external validation and quality-control.
So what’s the data professional’s responsibility in this? In my opinion, it’s to advocate for data quality. There are obviously big vendors out there like Melissa Data who will sell you a service to help get you there. APIs abound, from USPS and other official sources, so building it isn’t out of the question.
One potential roadblock is, as usual, conservatism. The business’s ERP system is its life-blood, highly sensitive to change and very guarded by over-protective management and finicky executives. But the smelly dirty data-laundry continues to cause problems and has real-money impacts on corp. efficiency and profit. Unfortunately, many people tend to take the ostrich approach.
So, my good people, start “doing your laundry”. Have those conversations with your teams and managers about the current state of your data quality, and what it’s going to look like moving forward. Make some plans, have a road-map, and understand that it’s going to involve a lot of collaboration between key players. And good luck!
What could possibly go wrong? As it turns out, plenty.
Faithful reader(s), it’s been a while! I’ve been busy preparing for some big transitions. I’m also getting better at MDX queries, tweaking SSAS-based reports to more accurately reflect the business rules. But enough about that, on with the post!
In which we doubt the SAN
A storage area network (SAN) is a management & administration solution, not a performance solution.
-someone wiser than me
SANs are wonderful technology. They inspire all kinds of geekery and are purported to solve all your storage woes. But there’s a catch: they’re expensive. Not just as a capital expense, but in maintenance and licensing costs. And if you ever want to upgrade it, like add some more drives to a particular tier/pool — fuhgeddaboudit.
So what do we do with SQL on a SAN? Well, it has tiers, right? Slower storage with huge capacity, faster storage with less, etc. We put the data files (heavy random read workload, typically) on the pool optimized for that kind of I/O pattern. We put the TLog files (heavy sequential write workload) on the pool best suited for that. And what about good ol’ TempDB? Its access pattern is fairly unique — random writes and reads, and frequent overwrites, which means it could potentially wear out your typical prosumer SSD relatively quickly. But we’re not complete cheapskates, we’ll buy enterprise class SSDs, no?
So we go read some stuff and figure, hey, sounds like a great idea, right? Put TempDB on a local SSD, or better yet, a pair of SSDs in RAID-0 for pure performance (because this is a cluster, we’ve got HA already). We’ll reduce the load on the SAN I/O channels and make our overworked TempDB happier with lower latency and better throughput. Right?
In which we discover what could possibly go wrong.
Once the new drive(s) is(are) installed and “presented” to Windows (that’s my SysAdmin’s term), it’s fairly trivial to do the SQL configuration change — it does of course require a SQL service restart (or cluster failover). Code example, assuming your new drive is ‘T’:
alter database tempdb
modify file (name=tempdev, filename='T:\tempdb.mdf')
alter database tempdb
modify file (name=tempdb2, filename='T:\tempdb2.ndf')
You do of course have multiple TempDB data files, yes? Good.
Should we put templog (TempDB’s transaction log) on the same drive as the TempDB data files, or put it on the same storage pool as the regular DBs’ TLogs? As usual, “it depends” — ask your favorite SQL gurus and do some testing.
Back on topic
We’ve made the change, we’ve done the cluster failover. TempDB is now running on our spankin’ new SSD. So we start monitoring performance metrics. Things like file I/O stats (from SQL DMV sys.dm_io_virtual_file_stats), latency and waits (from our monitoring tools), and good ol’ PerfMon.
But wait, what’s this? I/O stalls are higher? Write latency is higher?!? Perfmon agrees?
Write latency on the TempDB files was over 10x higher than it was when they were on the SAN (the performance tier, to be clear). The file_stats DMV showed large increases in I/O stalls. Sad-trombone.
In which we have several theories
Then ensued various conversations and brainstorms among my colleagues.
Someone check the firmware/drivers!
Maybe it’s got the wrong block-size.
Well, it’s only 6Gbps SAS… maybe we should’ve sprung for the 12Gbps.
The write latencies went up by a factor of 10. I don’t think an improvement by a factor of 2 is going to win you any trophies.
Why didn’t we get an NVMe or M.2 one?
Because the damn blades don’t have those slots, goober.
Another interesting observation, and potentially the silver lining. Overall instance waits (wait stats), according to our monitoring tool, went down. That’s good news, right? Maybe. Does application performance & user experience corroborate it? Possibly! We’ll be observing the patient for another week or so.
Let’s turn to the community again to see what others have experience.
Oh wait, it might not be such a fantastic idea after all.
And by “we” I mean “me”. Being the DBA and the primary proponent of the SSD addition, because I knew our workloads were very TempDB-heavy, I had to hang-tail and admit that the SAN gods won this round.
But wait, what about the fact that our wait stats are down? What about app/user experience? Valid arguments, I agree. That’s why we’re still observing. But I’m not optimistic, given the follow-up links above. We may utilize local SSDs for something else (index filegroups?) — but if those write latencies don’t improve, I’m concerned that it won’t help anybody.
In which I ask for your help
Yes, you! If you have ideas on what we did wrong, what we’re missing, or any other advice about getting the most “bang for the buck” out of a direct attached SSD on a converged-infrastructure Cisco UCS blade server platform with a VNX SAN, by all means, drop me a line. I’m all ears.
The problem isn’t so much that the role is vaguely defined. Although, depending on the size of the IT org and the tech stack, it can vary widely from a jack-of-all (DB Dev, report writer, production ops, the works) to a highly specialized performance tuner who works with 7 other teammates, each of whom has a unique surgical specialty in the data platform. But that’s not the problem — well, not the main problem. It is a problem in the sense that the business folks, especially HR, are notoriously and astonishingly ignorant of what a DBA or related role actually involves. But you get past that once you start talking to the tech leads and IT directors.
No, the problem is that, like most higher level technical roles, you don’t really know how a candidate is going to function in it (the role) without actually seeing him or her.. IN it. Do they keep a cool head when production goes down? Do they have a solid plan of attack for the dreaded “everything is slow!” complaint-storm? Do they have a good handle on HA & DR architecture & implementation? Can you rely on them to actually practice and follow thru with those strategies? Can they be a continuous learner and keep abreast of new developments while still tempering that with wisdom & maturity, applying the correct tools to the proper problems? Do try add value to the team and organization by both teaching and learning from others?
These are truly difficult, complex questions that are nearly impossible to deeply assess and fully answer during an interview process. Largely because the only real evidence of their answers lies in actual experience. Sure, a cert here or an MVP there definitely helps your case. But at any rate, we try our best to chip away at the boulder.
Pivoting to a more positive note, I’ll share some of the better questions that I’ve experienced during my career so far.
Some good examples.
How would you design and build a data copy/sync process across/between tiered environments, say DEV-QA-PROD?
Really great question. This is a common problem is small-to-medium enterprises with legacy systems where DevOps hasn’t quite reached down to the depths of the internal application stacks and people are still dealing with “refresh cycles” on the order of months, quarters, or even years. You can approach it purely from a tooling perspective, but that’s not the whole picture. Thus, it calls for some thought and team-culture ideas as well as “knowing the nerd-knobs”.
We have a complex process flow that involves a lot of stored procedures, say 50 total. Some of these are non-sequential, meaning they can be executed in arbitrary order, while others need to be sequenced with each other in “blocks”. This is a vendor product, so ultimately, the customer gets to decide the schedule and order of execution of this flow. The solution needs to be maintainable by field engineers. How would you handle this?
Woah. Talk about diving down a rabbit-hole. This is interesting in the sense that it exposes a bit of the architecture and some of the potential pain-points that the team is hoping to solve, while leaving enough room for improvement and experimentation by the hopeful candidate. More to the point, it’s just an example of a more general technique, which to me is very effective: taking an architectural problem that actually comes from the “real world” (the company/team that’s interviewing) and asking for the candidate’s ideas on how to solve it. You don’t need to get in-the-weeds super-detailed about it, but outlining your ideas helps indicate how you think about complex challenges and shows what kind of value-add you would bring to the team.
And finally, a perennial favorite:
Tell me about a time you broke production, and more importantly, how you addressed and resolved it.
So many stories from the trenches involve downtime and mistakes, it’s good to ‘bond’ over them. It helps bring the egos back down to earth, and reminds us that we’re all just meatbags, making technology to do our bidding, occasionally to our own regret. It shows the candidate’s “pressure cooker” mentality, or at least, what they tell you about it.
If you’re a DBA, Dev, or IT pro, help your managers better understand your team’s needs when it comes to hiring. Get involved in the job description write-ups and screening process questionnaires. Barge your way into those ivory towers, if you have to — or you’ll regret the time you waste on candidates who really belong in a different role than the one you’re after.
If you’re a manager, PLEASE LISTEN to your reports and tech leads. They know what makes a good team member, they’ve been doing it for a long time. Don’t dismiss their advice or block them from being part of the hiring process — yes, they are busy, and yes, they can be crotchety, but their input is highly valuable toward bringing in effective & productive talent.
That’s all folks!
PS: I know I missed the “deadline” by about an hour..ish. I blame DST. Heck, it’s still Tuesday for the majority of the Western hemisphere. I’m not biased, but I write in English, so… ya know. Take it as you will. Now excuse me while I go hide from the blog-police in my ASCII-bunker.
..while there are likely even better ways to do this in the long-run, this quick & easy approach was sufficient to save me time and effort..
At this point in my career, I’m not managing massive environments, so I don’t feel the need (nor have the expertise) to use a large scale solution like DSC or SCCM. But I’ve had to install SQL Server a few times, so I figured it’s worth at least scripting out a standard pre-configured installation, so that A) I don’t need click through a GUI ‘wizard’ hearkening back to the ’90s, and B) the SysAdmins can “fire and forget” (read: stop bugging me about it).
Thus, I’m attempting to K.I.S.S., while making it configurable & repeatable. There are some limitations of this approach, as alluded above. It’s not “massively scalable” (scaleable? scale-able?) because:
The PoSh script still needs to be deployed locally to the server in question
The installer config (.ini) also lives locally (though it probably could be a UNC path, it’s just a file after all)
The script prompts you for the service account (SQL engine, Agent) credentials and the sa password using the Read-Host -AsSecureStringmethod cmdlet, so some meatbag still has to type those in. This is because we don’t have an enterprise pwd/secret-management system where I could, say, ask it for a service account credential set and tell it to embed that securely in a script without it actually being visible to me. So, while yes, they’re kept in a “vault”, it’s not query-able by anything else, so an admin still needs to copy & paste them into whatever configuration screen he’s working with at the time. Not ideal, I know, but we work with what we’ve got.
PS: Yeah, yeah, “don’t use sa, rename it or disable it; or use Windows Auth only!”. Rage, howl, fire & brimstone. I’m not going to argue about it; we can save that for another post. This environment dictates that its used during setup and then disabled later, so that’s beyond the scope of the installer config.
So yes, while there are likely even better ways to do this in the long-run, this quick & easy approach was sufficient to save me time and effort for the occasions when a new SQL box/VM needs to be spun-up.
A primer on SQL cmd-prompt installation & its arguments
And finally, twothings that I attempted to understand but ultimately failed to implement, because (apparently, at least to me), PowerShell remote-ing is a P.I.T.A.
First we need an .ini file to work with. You could either create it from scratch, or take it from an existing SQL box’s “Setup Bootstrap” folder. Example path C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log\20170801_073414\ConfigurationFile.ini — indicating this was an install done on 8/1/2017 at 7:34am. Right above that, at simply C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log\, you’ll see a Summary.txt file, which can actually come in handy while you’re testing these unattended installs and wanting to see why it failed.
The first link above, from MSFT Docs, does a pretty nice job of telling you all the things that make up this config file. You get to bypass the TOS prompt, enter service account details, specify drive letters (paths) for default data/log file locations & tempdb, slipstream update packages (UpdateSource), and even more advanced stuff like AG settings and whatnot. My example will be a simple standalone instance using the default name, so I’ll be sticking with the basics.
We can then use this file in the ConfigurationFile argument of setup.exe from the SQL Server install media. To put a little more color on that: the .ini file is really just a collection of command-line arguments to setup.exe; you could also list them all out in-line, but that would be tedious and silly. Here’s a couple major selling points of creating your own config file:
Slipstream updates (SP’s, CU’s), instead of having it go out to MSFT update servers (or *aghast* sticking with the original RTM bits, you heathen you!)
Specify drive letters / default file locations: sure, this may be considered old-hat if you’re running super slick storage, but I still find it makes management a bit easier if I know where my MDFs, LDFs, TempDB, & backups will always be.
Take advantage of 2016’s better TempDB setup options (# files, size & growth)
We will, however, keep a couple arguments out of the .ini file and instead throw them into the ArgumentList from the calling PowerShell script. Speaking of, here’s what the PowerShell script needs to do:
Prompt the operator (SysAdmin or DBA) for the SQL & Agent service account credentials, and (optionally) the sa pwd (if using it).
Fetch our install media from the central network share where we store such things (server & office ISOs, for example).
Mount said ISO to our virtual disc drive.
Run its setup.exe with the following arguments:
The config .ini file
The service & sa accounts
After it’s done, un-mount (dismount) the ISO.
Then the DBA can connect to the brand-spankin’-new running SQL instance and do other post-setup configurations as desired (i.e. set max-memory, maxDOP/CTFP, etc). And sure, those could also be done in PowerShell (thanks in no small part to the awesometeam at DbaTools), I chose not to do so in this case.
As the bloggers say, “that’s left as an exercise to the reader”.
Plus, they’re never quite as deterministic as we’d like them to be — they depend on the server’s compute resources, i.e. memory size & CPU cores, as well as estimated workload & environment tier, so it’s often a gamble in “how correct” your initial settings will be anyway. Still, anything is better than the defaults, so configure-away!
Here are the Gists I’ve created to go along with this post. If I’ve made a mistake, or if you, dear reader, have a suggestion, we can incorporate them into the gist without me having to go back and edit the blog post!
I’d love to get feedback on how you would improve this, what you might do differently, etc. Drop me a comment or a tweet!