Career Day: SQL DBA

Build a solid CS/IT foundation, build your soft-skills, and then learn the latest & greatest tech stack. Or the mature enterprise stuff. There’s plenty of room for both!

Advertisements

Our benevolent overlord¬†prompted us (months ago; I know, I’m always late to the party) to write about what we do in our day-to-day jobs as SQL Server professionals, aka #SQLCareer. The actual idea is to do 4 posts in a row, for 4 working days (either adjacently or the same day each week for 4 weeks).

Sticklers, us? Nope, never.

Without any kind of future guarantees, here is my first.

Morning Mash-up

Emails, tickets, and alerts, oh my! Today was actually pretty quiet on the monitoring front — no major performance issues, no interruptions, and no job failures. That one that occasionally takes 8+ hours did okay last night. More often than not, I have to kill it if I find it’s been running into the work-day. That should be automated. /backlogged

Re-wrote some analyst’s data change script to be more “friendly” and set-based, instead relying on identity values — which is important when they can differ between Dev & Production environments. Deployed a few change requests, including that one.

virtualize all the things
Even SQL? Yes! Especially SQL. Just know the caveats.

On the side of less tedium, I fleshed-out more of the plan to virtualize our remaining bare-metal SQL servers. The usual ‘gotchas’ have to be discussed with the Server Admins — reserving compute resources, preventing the ‘noisy neighbor’ problem, and having enough storage to do the migration.

Yes, that means about 2x the storage as they’re currently using. Got a problem with that? No? Good. ūüėČ

Finally, I worked on some code for querying the SSRS ReportServer¬†database to obtain report metadata (including stored-proc usage, parameters, and user activity). The core concepts came from Ted Stathakis‘s SQLSaturday¬†session on SSRS 2016. Hopefully, some of that code will be coming to GitHub soon.

Afternoon Delight

“Delight” being sarcastic. No really, I love my job, but sometimes there are just some things that make you go ‘grr arg’.

First up, developing and finalizing some “data cleanup effort” reports. These are things that show the business users “what’s wrong with the data” and give them some guidance on how to “fix it”. Now, because there is no easy “management GUI” for this data, and because actual changes need to go through change-control, it’s not going to be a cake-walk. But given the right inputs and some domain knowledge, they can make the decisions as to what those changes should be, and then hand them off to the BA’s (business analysts). Usually in Excel form.

Excel spreadsheet with woman screaming in front of it
What? It’s under 1,048,576 rows! Barely…

Next we have the ‘grr’ part. This is where I spent about 40 minutes trying to explain to the 3rd-party software vendor exactly where and how to integrate with our core data systems. Most of the info should not be ‘new’ to them, but since I never know which support tech will be assigned to the issue, it’s best to err on the side of verbosity. To make things more confusing, the folks on my end, who have been interfacing with the vendor thus far, aren’t intimately familiar with the underlying technology bits, so the conversation gets a little dicey before we all arrive on the same page.

Fortunately, since I had their attention, I was able to sneak in a suggestion to upgrade the back-end database for the software (which we self-host, but is essentially managed by them & their software) to something from this decade. MySQL 5.0 is old, guys, mmkay?

Wrapping Up

A developer needs a new table & some initial data-load to production; it’s passed QA. Great! They wrote the query a little weird, but nothing ol’ SQLPrompt can’t fix.

Commit outstanding stored-proc code & RDL files (SSRS reports) to source control. Even if I’m the only one working on them, I’d rather they be safe & track-able.

Ruh-roh.¬†A developer set off some proc in a Dev db, it’s been running for over 30 minutes… and then they cancelled it! But wait, it’s stuck in rollback. Yes, kids, rolling-back a transaction can take even longer than the transaction itself, due to rollbacks being single-threaded.

Now, since the user’s hitting of the ‘Cancel’ button (aka “Stop” in SSMS) does not wholly and entirely cause a kill SPID¬†command, I do that for them (since only sysadmin¬†or processadmin can kill sessions). Then I run ye olde kill 64 with statusonly¬†(64 happened to be the SPID) to check on its rollback status. Good news, everyone! It’s at… 0%.

bad news nobody
You heard me.

And it stays at 0%. For fifteen minutes.

I move on to other tasks of course, since there’s nothing more I can do (short of pulling the plug on the server, which is never¬†a good idea (unless you’re testing your Disaster Recovery Hope plan!). We also create a ticket to fix this stored-proc so that it doesn’t “take forever”, even if it has to churn through millions of records.

Finally, the statusonly¬†check starts moving. It’s about 45% rolled back, with about “300 seconds remaining”. This, as anybody who’s been a DBA will tell you, is in “Microsoft time”. You know those progress-bars you typically get in Windows while something is happening? They’ll start off fast, make it to around 90-some percent, tell you there’s about 1 minute remaining, then take at least five minutes to “finish that last little bit”? Yeah, that’s a “Microsoft minute”. Patent-pending.

But fortunately for us, it does finish in just about the time it promised, maybe a bit longer. Just in time for the Dev to test what they wanted to test before going home for the day. And just in time for me to have my evening not ruined by a runaway transaction.

PS: Anybody who’s ever worked with SQL, including yours truly, has done this (set off a long transaction and/or tried to rollback a long transaction) at least once in their career. If you ask a seasoned DBA or DB-Dev, and they say “No, I’ve never done that!”, they’re a dirty rotten liar.¬† =D

Are You Not Entertained?

If the life of a DBA sounds like fun to you, get in touch with your local IT recruiter and find out what the job market is like! You’ll hear all sorts of doom & gloom about how “the Cloud is taking over everything” and “the DBA is dead; long live DevSecDataOpsUnicorns!”.

No, I joke. Yes, some of the fundamental responsibilities of the DBA are shifting to the cloud vendors, to IaaS/PaaS services, but at the end of the day, being in IT is about learning tech and working with people. Build a solid CS/IT foundation, build your soft-skills, and then learn the latest & greatest tech stack. Or the mature enterprise stuff. There’s plenty of room for both!

tee-shirt that says "tough enough to be a dba, crazy enough to love it"
We end on a happy note.

How is Database Formed?

Data is digital information. A database is a collection of data. And a DBA manages it all.

Borrowing from an ‘old meme’ a bit.¬† My wife recently said I should “write something about ‘how to do databases’.”¬† As amusingly odd as her phrasing was, I figured she was right.

well like duh
Obviously. GAWD.

What is it?

I like to start at beginning.¬† As Julie Andrews said, it’s a very good place to start.¬† What is a database?¬† That’s a pretty good question.¬† Here’s the prerequisite question: What is data?¬† Well, as I’ve said before, data is everything.¬† But that’s a bit of a cop-out, isn’t it?¬† That’s my career’s bias showing through.

Data is digital information.¬† Anything that can be quantified, specified, categorized, searched, sorted, produced, consumed, read, written, measured, and stored digitally in some fashion.¬† Data is the digital currency of the 21st century.¬† Data is the very reason that most technology exists — to house and transport data among producers and consumers of information.¬† It’s the evolutionary culmination of the stone tablet, the papyrus scroll, the bound book, the printing press, the newspaper, the library, the vinyl record, the magnetic tape, the compact disc, the pocket organizer, and the telephone.

So then, what is a database?¬† Simply put, it’s a collection of data.¬† The simplest analogy, depending on your age, is either a phone book or your cell phone’s contacts list (which is really just a phone book, in digital form).¬† Of course, with the latter, it’s not so much an analogy as an example — you phone’s contact list IS a database.

Fun side-note, the phone book also makes a decent discussion prop for some DBA topics like index fragmentation.

Expanding on that example.¬† You can search and sort your contacts by several¬†data points: first name, last name, phone #, email, notes.¬† Different database systems have various names for these: fields, columns, properties, criteria, values.¬† The point is, it’s all data.¬† Or if you want to get pedantic, each one is a datum, and together they¬†are data.

Pedantic, me?  Never.

This is what a database, or DB¬†for short, is all about: storing data in an organized fashion so that it can be sorted, searched, sliced and diced.¬† Building on that, a¬†database management system is a set of technology tools, processes and programs, that are used to gather, store, manipulate, copy, move, read, maintain, back up, link together, and operate one or many databases.¬† This DBMS¬†can come in many flavors.¬† I happen to specialize in one called SQL Server, a Microsoft product/platform of the ‘relational‘ flavor — so if you’re following along with the abbreviation game, that’s an RDBMS.

If you’re hungry for more acronyms, the Wiki article on ‘databases‘ has a decent breakdown of the types and history behind them.

But Why?

all your data are belong to us
Non-geeks might not understand this, so Google “all your base” if you’re confused.

The more data you have, the more you can do with it.¬† Why do you think Facebook, Google, Microsoft, and Amazon are such powerful technological forces?¬† They purposefully, systematically gather as much data as they can from every possible source, and they have become very good at organizing and managing that data to maximize its value.¬† Amazon product recommendations are a prime (see what I did there?) example — they are generally appropriate and effective because they have “learned” from your past purchases, i.e. your historical data.¬† This “learning” – Machine Learning, aka Data Science – is the hot new marketing buzzword of recent years, but it all still comes back to data at the core.

This is not a “bad thing” or a “scary thing” as the old media and tin-foil-hat-wearers would have you believe.¬† Yes, it can be a little disconcerting, and yes, people and companies can abuse data in malicious ways.¬† But the vast majority of our digital data stewards actually want to¬†do good.¬† They want to connect you with more people that you may know and become friends with; they want you to watch movies that you’ll really enjoy; they want you to easily navigate to your destination without being stuck in traffic; they even want to help stop global warming!

As a general business rule, we crave data because it helps us make decisions.¬† Every time a customer buys a product, we want to measure “the 5 W’s”: who what when where and how (ok, that’s not a ‘W’, but there’s a reason for it).¬† Notice I didn’t list “why” there — only the customer knows why, and that information, that data, is stored inside their brain.¬† And we can’t (yet) access¬†that data.¬† So it’s a guessing game now — we feed the other 5 data-points into our DBMS¬†and eventually, given some time and analysis, we can guess the Why.¬† And pretty accurately, at that.¬† Then, we can make a decision to “Market more aggressively to Customer Type X”, or “Have a flash-sale on Product Y”, or “Move on this hot emerging market demographic.”

So what does that make you?

Well, I’m a Database Administrator – a DBA.¬† Which means I “administrate databases”.

‘Administrate’, less common form of ‘administer’: manage and be responsible for the running of.

Thanks, dictionary.¬† So in a nutshell, a DBA manages data.¬† Deceptively simple sounding, no?¬† I mean, what can data possibly do;¬†it’s not alive, right?¬† Actually, if you hang around a DBA for any length of time, you’ll commonly hear the phrase “Where does that data live?” or “That set of data lives over here.”¬† So clearly we anthropomorphize our data.¬† Most tech professionals do that to whatever technology they work closely with — it’s human nature.¬† Software “behaves badly”, machines “throw a fit”, etc.

But anyway, why do databases need to be managed?  What can happen to them?

Developers.  Developers happen.  =D

I joke, as you know, dear reader; I love developers.¬† Users ‘happen’, too — often more catastrophically.¬† So it’s fair to say that “people happen”.¬† But besides that, here are some common reasons that databases, and data, need to be managed.

  • Data can be “wrong”.

Data can either be human-generated or machine-generated.¬† Fingers on a keyboard, or sensors on a circuit board.¬† You wouldn’t think the latter could possibly ever be “wrong”, but both kinds are subject to error.¬† It’s just that the level of “wrongness” is subjective and depends on who’s asking and what’s expected of the system as a whole.

  • Data gets lost.

Humans interact with and manipulate data, and humans make mistakes.  Why do you think the Undo button became such a staple of so many computer applications?

  • Data gets corrupted.

Storage media (magnetic disks, silicon chips, etc.) are not perfect — they have a documented level of fault tolerance and failure rate — so we need to ensure that our data is preserved (by moving it to another area that’s not ‘faulty’, usually) past those failures.¬† Why?¬† Because our data is essentially “more valuable” than the hardware on which it’s stored.

  • Data needs to be organized.

This is slightly more subjective than the above; how and why we organize data is highly dependent on the overall intent of the systems that will interact with it.¬† But fundamentally, if there’s not some form of organization, the data is effectively garbage.¬† If you ripped out every individual page in the phonebook and scattered them all on the floor, it’s no longer an effective tool to find someone’s phone number; it’s just a mess of papers.

  • Data needs to be useful.

If we can’t¬†do something with the data, what’s the point of having it?¬† The temperature at the North Pole on January 1st 1989 is, by itself, inconsequential.¬† But a history of temperatures at the same and similar locations, over a long period of time, gives us some great value — we can see trends, look for anomalies, and even predict the future of what those temperatures might be.

  • Databases need to be available.

Similarly, if we can’t¬†access the data, what good is it?¬† Databases are a technology, and like most technologies, they occasionally break.¬† Again, most of that comes back to humans, because humans are the ones writing the code that creates the software that houses the data and runs the database, or that interacts with it.¬† But of course we still have power failures, network losses, disk failures, and even solar flares.¬† (Ask your favorite superstitious engineer; they’ll have at least one good story about a system outage that could only be blamed on solar flares or gremlins or the full moon.)

  • Databases need to be maintained.

Every DBMS¬†has some kind of assumed ongoing maintenance requirements to keep it “running smoothly”.¬† Just like your car needs an oil change every 3 to 8 thousand miles, your databases need periodic attention to retain all of those important qualities discussed above.

And the latest big topic, underscored by the GDPR:

  • Data needs to be governed.

This is a big topic for another conversation, but the gist of it is, data is generally “owned” by someone, and deciding who owns what, where it’s allowed to live, and how it’s allowed to be used, constitutes an entire sub-industry of rules, regulations, policies, tools, security practices, and consequences, much of which we’re only just beginning to shape and understand.

TL;DR: What do you actually do?

who is your DBA and what does he do
Umm… stuff? And thangs?

I currently work at a “small enterprise”, a business that has been around for some decades (as opposed to a Silicon Valley start-up who counts their anniversaries in months, like an infatuated teenager), managing their database systems.¬† Some of that is financial/accounting, some is customer info, some is internal/operational, and all of it is important to at least one person in the business in their daily decision-making efforts.

Thus, I help ensure that the data is always ready, when it’s needed, in whatever form & shape it’s needed in.¬† I model, massage, correct, enhance, and move it around.¬† I help developers write faster queries (that’s a fancy word for “questions” that we ask of our data); I aide analysts with understanding and gleaning more value from the data; I maintain the underlying systems that house the databases and ensure that they perform well and get upgraded when necessary; and I work with business drivers (VP’s, CxO’s) to build reporting solutions that leverage the data to enable better, smarter decisions, and ultimately (hopefully!) increase profit.¬† (This last part is actually crossing into the BI¬†– Business Intelligence – job role, which tends to happen to most small-shop DBAs, because they’re usually in the best position to make that transition.)

If some of that sounds like a blurb from a r√©sum√©, it kinda is.¬† This job has existed since the 80’s.¬† But it’s always evolving, like the tech industry in general; so just because we’ve been around a while doesn’t mean we’re all old crusty bearded dudes.¬† (Although we do have some prolific beards among us!)

So there you have it.¬† Now you can tell your friends and family what a DBA does.¬† Or at least, hopefully, I’ve helped my own friends & family understand a bit about what I do.