Borrowing from an ‘old meme’ a bit. Somebody recently told me I should “write something about ‘how to do databases’.” As amusingly odd as that phrasing was, I figured they were right.
What is it?
I like to start at the beginning. As Julie Andrews said, it’s a very good place to start. What is a database? That’s a pretty good question. Here’s the prerequisite question: What is data? Well, as I’ve said before, data is everything. But that’s a bit of a cop-out, isn’t it? That’s my career’s bias showing through.
Data is digital information. Anything that can be quantified, specified, categorized, searched, sorted, produced, consumed, read, written, measured, and stored digitally in some fashion. Data is the digital currency of the 21st century. Data is the very reason that most technology exists — to house and transport data among producers and consumers of information. It’s the evolutionary culmination of the stone tablet, the papyrus scroll, the bound book, the printing press, the newspaper, the library, the vinyl record, the magnetic tape, the compact disc, the pocket organizer, and the telephone.
So then, what is a database? Simply put, it’s a collection of data. The simplest analogy, depending on your age, is either a phone book or your cell phone’s contacts list (which is really just a phone book, in digital form). Of course, with the latter, it’s not so much an analogy as an example: your phone’s contact list IS a database.
Fun side-note, the phone book also makes a decent discussion prop for some DBA topics like index fragmentation.
To expand on that example, you can search and sort your contacts by several data points: first name, last name, phone number, email, notes. Different database systems have various names for these: fields, columns, properties, criteria, values. The point is, it’s all data. Or if you want to get pedantic, each one is a datum, and together they are data.
Pedantic, me? Never.
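If you want to see that contacts list in miniature, here’s a rough sketch using Python’s built-in sqlite3 module. SQLite is just a convenient stand-in here (it is not SQL Server), and the table name, columns, and contact entries are all made up for illustration.

```python
import sqlite3

# Hypothetical contacts "phone book"; every name and number is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE contacts (
        first_name TEXT,
        last_name  TEXT,
        phone      TEXT,
        email      TEXT,
        notes      TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?, ?, ?)",
    [
        ("Maria", "von Trapp", "555-0101", "maria@example.com", "sings"),
        ("Georg", "von Trapp", "555-0102", "georg@example.com", "captain"),
        ("Max", "Detweiler", "555-0103", "max@example.com", "impresario"),
    ],
)

# Search: everyone whose last name is "von Trapp".
# Sort: alphabetically by first name.
rows = conn.execute(
    "SELECT first_name, phone FROM contacts "
    "WHERE last_name = ? ORDER BY first_name",
    ("von Trapp",),
).fetchall()

for first_name, phone in rows:
    print(first_name, phone)
```

Run it and you get the two von Trapps, sorted by first name. It’s the phone book, basically, minus the paper.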
This is what a database, or DB for short, is all about: storing data in an organized fashion so that it can be sorted, searched, sliced and diced. Building on that, a database management system is a set of tools, processes, and programs used to gather, store, manipulate, copy, move, read, maintain, back up, link together, and operate one or many databases. This DBMS can come in many flavors. I happen to specialize in one called SQL Server, a Microsoft product/platform of the ‘relational’ flavor. So if you’re following along with the abbreviation game, that’s an RDBMS.
If you’re hungry for more acronyms, the Wikipedia article on ‘databases’ has a decent breakdown of the types and history behind them.
But Why?
The more data you have, the more you can do with it. Why do you think Facebook, Google, Microsoft, and Amazon are such powerful technological forces? They purposefully, systematically gather as much data as they can from every possible source, and they have become very good at organizing and managing that data to maximize its value. Amazon product recommendations are a prime (see what I did there?) example — they are generally appropriate and effective because they have “learned” from your past purchases, i.e. your historical data. This “learning” – Machine Learning, aka Data Science – is the hot new marketing buzzword of recent years, but it all still comes back to data at the core.
This is not a “bad thing” or a “scary thing” as the old media and tin-foil-hat-wearers would have you believe. Yes, it can be a little disconcerting, and yes, people and companies can abuse data in malicious ways. But the vast majority of our digital data stewards actually want to do good. They want to connect you with more people that you may know and become friends with; they want you to watch movies that you’ll really enjoy; they want you to easily navigate to your destination without being stuck in traffic; they even want to help stop global warming!
As a general business rule, we crave data because it helps us make decisions. Every time a customer buys a product, we want to measure “the 5 W’s”: who, what, when, where, and how (ok, that last one’s not a ‘W’, but there’s a reason for it). Notice I didn’t list “why” there; only the customer knows why, and that information, that data, is stored inside their brain. And we can’t (yet) access that data. So it’s a guessing game now: we feed the other 5 data-points into our DBMS and eventually, given some time and analysis, we can guess the Why. And pretty accurately, at that. Then, we can make a decision to “Market more aggressively to Customer Type X”, or “Have a flash-sale on Product Y”, or “Move on this hot emerging market demographic.”
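To make that a little more concrete, here’s a toy sketch of the idea in Python with the built-in sqlite3 module. The sales table, its columns, and every row in it are invented for illustration; a real system would have far more data and far more nuance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sales table: one column per data point, plus an amount.
conn.execute(
    """
    CREATE TABLE sales (
        customer_type TEXT,  -- who
        product       TEXT,  -- what
        sold_on       TEXT,  -- when
        store         TEXT,  -- where
        channel       TEXT,  -- how
        amount        REAL
    )
    """
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("X", "Widget", "2024-01-05", "Downtown", "online", 20.0),
        ("X", "Widget", "2024-01-06", "Downtown", "in-store", 30.0),
        ("X", "Gadget", "2024-01-07", "Uptown", "online", 50.0),
        ("Z", "Widget", "2024-01-07", "Uptown", "in-store", 10.0),
    ],
)

# Count purchases and total revenue per customer type, biggest spenders first.
summary = conn.execute(
    """
    SELECT customer_type, COUNT(*), SUM(amount)
    FROM sales
    GROUP BY customer_type
    ORDER BY SUM(amount) DESC
    """
).fetchall()

for customer_type, purchases, revenue in summary:
    print(customer_type, purchases, revenue)
```

With counts and totals per customer type in hand, a decision like “Market more aggressively to Customer Type X” at least has some numbers behind it.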
So what does that make you?
Well, I’m a Database Administrator – a DBA. Which means I “administrate databases”.
‘Administrate’, less common form of ‘administer’: manage and be responsible for the running of.
Thanks, dictionary. So in a nutshell, a DBA manages data. Deceptively simple sounding, no? I mean, what can data possibly do; it’s not alive, right? Actually, if you hang around a DBA for any length of time, you’ll commonly hear the phrase “Where does that data live?” or “That set of data lives over here.” So clearly we anthropomorphize our data. Most tech professionals do that to whatever technology they work closely with — it’s human nature. Software “behaves badly”, machines “throw a fit”, etc.
But anyway, why do databases need to be managed? What can happen to them?
Developers. Developers happen. =D
I joke, as you know, dear reader; I love developers. Users ‘happen’, too — often more catastrophically. So it’s fair to say that “people happen”. But besides that, here are some common reasons that databases, and data, need to be managed.
- Data can be “wrong”.
Data can either be human-generated or machine-generated. Fingers on a keyboard, or sensors on a circuit board. You wouldn’t think the latter could possibly ever be “wrong”, but both kinds are subject to error. It’s just that the level of “wrongness” is subjective and depends on who’s asking and what’s expected of the system as a whole.
- Data gets lost.
Humans interact with and manipulate data, and humans make mistakes. Why do you think the Undo button became such a staple of so many computer applications?
- Data gets corrupted.
Storage media (magnetic disks, silicon chips, etc.) are not perfect — they have a documented level of fault tolerance and failure rate — so we need to ensure that our data is preserved (by moving it to another area that’s not ‘faulty’, usually) past those failures. Why? Because our data is essentially “more valuable” than the hardware on which it’s stored.
- Data needs to be organized.
This is slightly more subjective than the above; how and why we organize data is highly dependent on the overall intent of the systems that will interact with it. But fundamentally, if there’s not some form of organization, the data is effectively garbage. If you ripped out every individual page in the phonebook and scattered them all on the floor, it’s no longer an effective tool to find someone’s phone number; it’s just a mess of papers.
- Data needs to be useful.
If we can’t do something with the data, what’s the point of having it? The temperature at the North Pole on January 1st 1989 is, by itself, inconsequential. But a history of temperatures at the same and similar locations, over a long period of time, gives us some great value — we can see trends, look for anomalies, and even predict the future of what those temperatures might be.
- Databases need to be available.
Similarly, if we can’t access the data, what good is it? Databases are a technology, and like most technologies, they occasionally break. Again, most of that comes back to humans, because humans are the ones writing the code that creates the software that houses the data and runs the database, or that interacts with it. But of course we still have power failures, network losses, disk failures, and even solar flares. (Ask your favorite superstitious engineer; they’ll have at least one good story about a system outage that could only be blamed on solar flares or gremlins or the full moon.)
- Databases need to be maintained.
Every DBMS has some kind of assumed ongoing maintenance requirements to keep it “running smoothly”. Just like your car needs an oil change every 3 to 8 thousand miles, your databases need periodic attention to retain all of those important qualities discussed above.
And the latest big topic, underscored by the GDPR:
- Data needs to be governed.
This is a big topic for another conversation, but the gist of it is, data is generally “owned” by someone, and deciding who owns what, where it’s allowed to live, and how it’s allowed to be used, constitutes an entire sub-industry of rules, regulations, policies, tools, security practices, and consequences, much of which we’re only just beginning to shape and understand.
TL;DR: What do you actually do?
I currently work at a “small enterprise”, a business that has been around for some decades (as opposed to a Silicon Valley start-up that counts its anniversaries in months, like an infatuated teenager), managing their database systems. Some of that is financial/accounting, some is customer info, some is internal/operational, and all of it is important to at least one person in the business in their daily decision-making efforts.
Thus, I help ensure that the data is always ready, when it’s needed, in whatever form & shape it’s needed in. I model, massage, correct, enhance, and move it around. I help developers write faster queries (that’s a fancy word for “questions” that we ask of our data); I aid analysts with understanding and gleaning more value from the data; I maintain the underlying systems that house the databases and ensure that they perform well and get upgraded when necessary; and I work with business drivers (VPs, CxOs) to build reporting solutions that leverage the data to enable better, smarter decisions, and ultimately (hopefully!) increase profit. (This last part is actually crossing into the BI – Business Intelligence – job role, which tends to happen to most small-shop DBAs, because they’re usually in the best position to make that transition.)
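For a small taste of what “faster queries” can mean, here’s a sketch that assumes SQLite (via Python’s built-in sqlite3 module) as a stand-in for a real DBMS. The orders table, its data, and the index name are all hypothetical. It asks the database how it plans to answer a question, before and after an index is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical orders table with 1,000 made-up rows.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"customer-{i % 100}", float(i)) for i in range(1000)],
)

# The "question" we ask of our data.
query = "SELECT id, total FROM orders WHERE customer = ?"


def plan(sql):
    """Return SQLite's description of how it intends to run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("customer-7",)).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


before = plan(query)  # no index yet, so the whole table gets scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(query)   # now the planner can look customers up via the index

print("before:", before)
print("after: ", after)
```

The exact wording of the plan varies by SQLite version, but the gist is the phone book again: without an index you read every page, and with one you flip straight to the right name.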
If some of that sounds like a blurb from a résumé, it kinda is. This job has existed since the 80’s. But it’s always evolving, like the tech industry in general; so just because we’ve been around a while doesn’t mean we’re all old crusty bearded dudes. (Although we do have some prolific beards among us!)
So there you have it. Now you can tell your friends and family what a DBA does. Or at least, hopefully, I’ve helped my own friends & family understand a bit about what I do.