And now for a brief interlude while I work on my next “real” post… continuing with the same theme… SQL Server for the developer/programmer! Warning: potentially excessive snark ahead.
SQL Server is a mystical fairy-tale land where you can store all your application data, and never have to worry about it again! Sure, it “should” be relational, but you also “should” drive the speed limit.. PSHH! Let’s throw blobs in there too — HTML, XML, hell, even JSON now that 2016’s got nice happy JSON-centric system functions. And files? Heck yes, what else is FILESTREAM for? Sure, performance may be abysmal, but that’s somebody else’s problem, so why should we care?
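(For the curious: those 2016 JSON functions are real. Here’s a tiny sketch of what devs are now happily dumping into the engine – the table and column names are made up for illustration:)

```sql
-- Hypothetical table that stores raw JSON in a plain text column
CREATE TABLE dbo.EventLog (
    EventId INT IDENTITY PRIMARY KEY,
    Payload NVARCHAR(MAX)   -- no native JSON type; it's just text
);

-- SQL Server 2016+ system functions let you validate & query it
SELECT EventId,
       JSON_VALUE(Payload, '$.user.name') AS UserName
FROM   dbo.EventLog
WHERE  ISJSON(Payload) = 1;
```

Note that there’s no native JSON data type – it’s just text in an NVARCHAR column – which is precisely why “performance may be abysmal” isn’t entirely a joke.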
It’s the only database platform your company invested in, so it must be good for everything. Document storage? Sure, not like there’s anything else better suited for the job, especially for ‘Free’! Ooh, how about key-value data, and volatile temporal data like .NET Session-State? Obviously! Nothing else exists, and Microsoft makes it so simple to set it up in SQL.
It’s that wonderful database system where we send all our Entity Framework queries and data just magically appears! We don’t need to worry about how nasty and convoluted those queries turn out when they finally get parsed down to actual TSQL code. It was only 3 nice simple lines of LINQ, why should it be any more complex than that? Oh, and we can write all sorts of abhorrent ad-hoc queries against it too, nobody will even notice. Let that code build all sorts of IN SELECTs & sub-queries, chains of unreadable JOINs, and Inception-level nested VIEWs, it’s perfectly fine! Performance, shmerformance! That’s what we have DBAs for. Not that they’ll be able to help you when you refuse to refactor or rewrite your code that’s responsible for these abominable queries.
SQL Server is so easy! Microsoft gives me a free Developer Edition, I can code for all these awesome features that I know we’ll have in production, right? Oh, we only run Standard Edition? Woops. Ooh, I know! Let’s put everything in Azure. It can do everything and make us breakfast! Then we won’t need to worry about feature differences, backups, RTO/RPOs, index maintenance, performance tuning, scaling, or even performance in general! Wait, no? Oh well, that’s what we have DBAs for! Besides, it’s so easy to move everything over, we just take our SQL backups and upload them to blob storage and then restore them on Azure SQL Database, right? No?
Part 2, as promised. We were talking about the components of SQL Server. So far we hit “the big ones”: the database engine (sqlservr.exe), the data files (mdf, ldf), and the agent (sqlagent.exe).
Thirdly, less of a “component” and more of a “tool-set”, the venerated SSMS – SQL Server Management Studio. (Formerly known as SQL Server Enterprise Manager, if you’ve been around for a long time.) SSMS is the “stock” application interface for SQL Server, the DBA’s basic swiss army knife – all of the essential functionality, a couple odd-ball modules here & there, and no “fluff”. It’s not the prettiest, nor the sexiest, but it gets the job done. It even has some basic monitoring capability; as an infrastructure person, you would most likely use it to check up on DB statuses, Agent Job runs, and perhaps the occasional “everything is slow!” panic moment by peeking at the Activity Monitor (right-click on a server instance).
SSMS is powerful, and with great power comes great responsibility. Your DBA should be practicing good security principles and granting limited access to all but the most trusted, experienced team members, because that top tier access – the sysadmin role, classically reached via the sa login – is literally capable of destroying the server (maybe not physically, but logically!). SSMS is a client-side application, primarily – meaning that yes, while you do “get it for free” when you install SQL Server on your server (at least, until SQL Server 2016, when they finally broke it out to a standalone installer, praise the gods), you should really install it on your own workstation and use it from there.
Key takeaway: ask your DBA to install SSMS on your workstation, IF it’s appropriate (and trust me, they will tell you if it’s not); often, there are better monitoring tools and infrastructure-focused apps that can “talk to” the SQL servers without needing you to dive down deep into SSMS-land. But, in a pinch, it helps to know a few basic things about what it can do.
Finally, and less commonly, we have a collection of supplemental services that work within and around those two main components to provide additional functionality and support for things like bulk data interchange, reporting, data warehousing, and scale-out options. Again, the curious reader can easily find more detailed info at his/her favorite reference sites, so I’ll define these briefly and move on. They all start with “SS”, which of course stands for SQL Server. So, here we go. SSIS – Integration Services – bulk data import/export, ETL (extract/transform/load), and similar tasks. SSRS – Reporting Services – lets us build & manage reports, and provides a basic no-frills web interface from which any business user can view said reports to which they’re granted access. SSAS – Analysis Services – data warehousing, cubes, “analytics” – basically, we’re filling up disk-space with pre-defined roll-ups and drill-downs of our data for faster reporting queries & data analysis. So as you can imagine, often those last two go hand-in-hand, in a mature environment that’s cognizant of its “business intelligence” (BI) needs. But that’s another topic way outside the scope of this post!
Key takeaway: if any of those SS*S components are a part of your environment, monitor them at the service/process level, and, just like with the Agent, have a procedure in place to address incidents.
So what have we learned? A SQL server machine runs one (or more, but hopefully/usually just one!) instances of SQL Server. We should primarily be aware of the engine & agent service processes, the locations of the mdf & ldf files, and the agent job schedules; and secondarily, if applicable, the other components (SS-whatever-S).
Oh, but wait! What about performance concerns?!? Glad you asked.
SQL Server EATS disk for breakfast, memory for lunch, and CPU for dinner. In that order.
So your dedicated SQL server machines better have FAST disks, LOTS of RAM, and a decently fast, multi-socket multi-core CPU. The disk speed helps keep transaction log throughput & data file read/write activity up-to-snuff, so faster == better. Memory is for holding the “hottest” data for fastest access so it doesn’t always have to go to disk for everything, so more == better. And CPU cores help it serve 100’s (or even 1000’s) of concurrent requests in a sane, efficient manner.
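If you want to see what the engine itself thinks it has to work with, a couple of quick read-only queries help (the DMV columns shown are for SQL Server 2012 and later; older versions expose physical_memory_in_bytes instead):

```sql
-- What hardware the engine sees
SELECT cpu_count,
       hyperthread_ratio,
       physical_memory_kb / 1024 AS PhysicalMemoryMB
FROM   sys.dm_os_sys_info;

-- And how much of that RAM it's actually allowed to use
SELECT name, value_in_use
FROM   sys.configurations
WHERE  name IN ('max server memory (MB)', 'min server memory (MB)');
```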
In reality, SQL Server performance is a huge topic; there are entire consulting companies dedicated to it, and tons of excellent blogs about it. You want a good DBA on your team who knows how to troubleshoot and address performance issues with your SQL servers. But when you’re tasked with helping spec-out a new physical box, or sizing a new VM, it doesn’t hurt to know what kind of demands you’ll need to meet.
Which brings us to our last key point…
Key takeaway: SQL Server is primarily a scale-UP technology, not scale-out. If you’re not OK with that, you might want to look into other DBMS’s.
That’s not to say you can’t scale-out with it; obviously, large corporations with mature ecosystems often have 100’s of instances in some resource-groups, but as I discussed earlier in What’s in a Name?, the applications which depend on those SQL servers are also architected with that scaling in mind, and it’s not like an “easy switch” you can make to an existing monolithic app.
Anyway, I hope this helps IT Helpdesk and SysAdmins/Engineers get a little more visibility into the SQL DBA’s world, and a bit more understanding of how things work behind-the-scenes.
Know which servers are running SQL Server, monitor the services, and have an escalation procedure for failures.
Today’s post will be much less philosophical and (hopefully) much more practical. This is a topic that I’ll actually be presenting at work to my fellow IT team members, who are mostly all Helpdesk and Infrastructure people, with no Dev or database background. My goal is to explain the basics of SQL Server from an IT infrastructure perspective, and I hope that others find it useful as well as my team! So let’s get to it. Fair warning: lots of abbreviations ahead! But fear not, I will define them – I don’t want to assume any prior knowledge, other than basic IT skills & Windows familiarity.
I also decided to break this up into 2 posts, because it got pretty lengthy. So, here goes part 1!
SQL Server is, obviously, Microsoft’s relational database management system, or RDBMS. “Relational” means it’s based on the concept of relationships between entities (or objects, if you will; though mostly we mean tables). In layman’s terms, the relations can be expressed as “is a..”, “has a..”, or “has many..”; for example, a Customer object “has a” Contact-Info object, which probably “has many” Phone#s, Emails, etc. SQL Server is a transactional, fully ACID compliant database system – Atomicity, Consistency, Isolation, Durability. I won’t bore you with the computer science theory stuff; you can read all about it on Wikipedia or your favorite reference site, so let’s move on.
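To make those “has a” / “has many” relations concrete, here’s a minimal sketch in T-SQL – the table and column names are invented purely for illustration:

```sql
CREATE TABLE dbo.Customer (
    CustomerId INT IDENTITY PRIMARY KEY,
    Name       NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.ContactInfo (
    ContactInfoId INT IDENTITY PRIMARY KEY,
    CustomerId    INT NOT NULL
        REFERENCES dbo.Customer (CustomerId),     -- Customer "has a" ContactInfo
    Email         NVARCHAR(255)
);

CREATE TABLE dbo.PhoneNumber (
    PhoneNumberId INT IDENTITY PRIMARY KEY,
    ContactInfoId INT NOT NULL
        REFERENCES dbo.ContactInfo (ContactInfoId), -- ContactInfo "has many" Phone#s
    Number        VARCHAR(20) NOT NULL
);
```

Those REFERENCES clauses (foreign keys) are what the engine uses to enforce the “referential integrity” mentioned below.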
There are a handful of components that make up SQL Server, and I’ll list them in what I think is the “most to least important” order, or at least most to least commonly used. Infrastructure-wise, at its core, SQL Server is basically an “application” and persistent data storage mechanism. It’s a set of processes (services) that run on your server and operate on specific sets/types of files, which we’ll cover in a bit.
First and foremost of said core components is the database engine. This is the DBA’s bread & butter – it’s the service process, sqlservr.exe, which handles all the queries and scripts, manipulates the data files & transaction logs, ensures referential integrity and all that other ACID-y goodness. Like any good service, it is best run under a dedicated service account, especially in a Windows domain environment. Otherwise, it will run as a built-in system account, like NT Service\MSSQLSERVER. As a service, it’s set to “Automatic” run-mode and it accepts a number of startup parameters which can help govern and tweak its behavior. Most of those details are outside the scope of this post, so go ask your friendly (or surly, depending on the day) DBA if you’re curious.
A bit of key terminology: each running database engine is what we call a SQL Server instance. Typically you dedicate one physical (or virtual) server to each instance; however, in rare cases (usually in the lower tier environments like dev/test), you will see multiple instances on one server – this is called instance stacking, and it’s not usually recommended by DBAs due to resource governance complications. You typically reference a SQL Server instance simply by the server name that it’s running on, because the “default instance”, while technically named MSSQLSERVER, does not require any special referencing. So if my SQL server box was named FooBar, I would connect to the SQL instance with the name FooBar. If you get into stacked instances, you have to start naming the other instances uniquely, and your connections now have to include them with a backslash, e.g. FooBar\SQL2, FooBar\SQL3, etc.
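If you’re ever unsure which instance you’ve landed on, the engine will tell you – these are standard built-in functions:

```sql
-- Run from SSMS or sqlcmd to identify the instance you're connected to
SELECT SERVERPROPERTY('MachineName')  AS MachineName,
       SERVERPROPERTY('InstanceName') AS InstanceName,  -- NULL on a default instance
       @@SERVERNAME                   AS FullServerName; -- e.g. FooBar or FooBar\SQL2
```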
As I said, the engine manages the data files. We’re talking about a permanent data storage system, so that data has to “live” somewhere – namely, mdf and ldf files (and sometimes ndf too). The mdf (and ndf) files hold the “actual data”, i.e. the tables, rows, indexes, & other objects in each database. The ldf files represent the transaction logs, or “T-logs” for short, which, again, are a fundamental part of the RDBMS. They allow for and enforce those fancy ACID properties. All of these files are exclusively locked by the sqlservr.exe process, meaning, nothing else is allowed to touch or modify those files but SQL Server itself, while SQL Server is running – not your antivirus, not your backup software (SQL has its own backup methodology), not even Windows itself. The only way to interact with and manipulate those files, or rather, the data within those files, is through the engine.
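You can ask the engine exactly where those files live – handy for the AV-exclusion and backup conversations with your DBA:

```sql
-- List every database's data & log files and their on-disk locations
SELECT DB_NAME(database_id) AS DatabaseName,
       type_desc,                   -- ROWS (mdf/ndf) or LOG (ldf)
       physical_name,
       size * 8 / 1024 AS SizeMB    -- size is stored in 8 KB pages
FROM   sys.master_files
ORDER BY DatabaseName;
```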
Key takeaway: know which servers are running SQL Server, monitor the sqlservr.exe service(s), and work with your DBA to ensure that your AV software excludes the locations of those data files, and that your backup strategy includes & works in conjunction with SQL-based backups.
In a close second, we have the SQL Server Agent, sometimes just “agent” for short. Think of this like Windows Task Scheduler on steroids, but specifically focused within the realm of SQL Server. Like the engine, it runs as a service process – sqlagent.exe – again, typically set to automatic startup and running under a dedicated service account. (There’s a small debate on whether it’s best to use the same service account as the engine, or a separate one; in the end it doesn’t really matter much, so it’s mostly at the discretion of your DBA team and/or the Active Directory admin.)
SQL Server Agent is basically free automation – we can use it to schedule all kinds of tedious operations and database maintenance tasks, anything from “run this query every day at 12pm because Mary in accounting runs her big report after lunch and it needs to have data from X, Y, & Z”, to running the hallowed “good care & feeding of indexes & stats” that any DBA will tell you is not only essential to decent database performance, but essential to life itself (or at least, your SQL server’s life). These sets of work are called Agent Jobs (or just “jobs”), which consist of one or more Job Steps. A job has one or more Schedules, which dictate when it runs – such as, again, daily 12pm, or “every Saturday & Sunday at 5pm”, or “every 15 minutes between 8am and 9pm”. Ideally, jobs should be configured to alert the DBA (or, in some cases, the broader IT team) when they fail, so that appropriate action can be taken.
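If you’re curious what the Agent has been up to, its job history lives in the msdb system database – a quick, read-only peek, assuming you’ve been granted access:

```sql
-- Recent Agent job outcomes; run_date/run_time are integers (YYYYMMDD / HHMMSS)
SELECT j.name AS JobName,
       h.run_date,
       h.run_time,
       CASE h.run_status
            WHEN 0 THEN 'Failed'
            WHEN 1 THEN 'Succeeded'
            WHEN 2 THEN 'Retry'
            WHEN 3 THEN 'Canceled'
       END AS Outcome
FROM   msdb.dbo.sysjobs        AS j
JOIN   msdb.dbo.sysjobhistory  AS h ON h.job_id = j.job_id
WHERE  h.step_id = 0            -- step 0 is the job-level outcome row
ORDER BY h.run_date DESC, h.run_time DESC;
```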
Key takeaway: monitor the service sqlagent.exe and have an alert & escalation procedure in place for handling failed Agent Jobs.
A quick sidebar about transaction logs. SQL Server uses what’s called a “write-ahead log” methodology, which basically means that operations are literally written to the T-logs before they’re written to the actual data files (MDFs/NDFs). As stated so eloquently in this blog:
[When] neglected, it can easily become a bottleneck to our SQL Server environment.
It’s the DBA’s job to ensure the T-logs are being properly backed up, maintained, and experiencing good throughput. And I’ll have another blurb about performance at the end of part 2. However, as an infrastructure person, it’s worth knowing two things: A) these aren’t your “traditional” logs, like error or event logs; these are a fundamental, core part of SQL Server. And B) —
Key takeaway: The disks which house the transaction logs should be FAST. Like, stupid-fast. Srsly. At least at sequential writes.
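Two quick ways to sanity-check log health and whether those disks are keeping up – both are standard commands/DMVs, no add-ons required:

```sql
-- Log size & percent-used per database
DBCC SQLPERF (LOGSPACE);

-- Cumulative write-stall stats per log file since the instance started;
-- high io_stall_write_ms relative to num_of_writes points at slow log disks
SELECT DB_NAME(vfs.database_id) AS DatabaseName,
       mf.physical_name,
       vfs.num_of_writes,
       vfs.io_stall_write_ms
FROM   sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN   sys.master_files AS mf
       ON  mf.database_id = vfs.database_id
       AND mf.file_id     = vfs.file_id
WHERE  mf.type_desc = 'LOG'
ORDER BY vfs.io_stall_write_ms DESC;
```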
I have a confession. I enjoy edge cases. More than I probably should. I take an unhealthy amount of pleasure in finding those unexpected oddities and poking holes in the assumptions and “soft rules” that business users and even developers make about their processes, applications, and most importantly, their data.
But let’s back up a minute. What exactly is an edge case? Well, I’ll quote Wikipedia’s definition at you, then I’ll put my own spin on it.
An edge case is a problem or situation that occurs only at an extreme (maximum or minimum) operating parameter.
In my mind, there are really two types of edge cases. The results and treatment are largely similar, so it’s not a terribly important distinction; but for conversation’s sake, here they are.
Known: while we can (and usually do) identify these cases, they are typically thought of as ultra-rare and/or “too difficult to reproduce”; thus, the plan for handling them involves tedious one-off or ad-hoc procedures, which are often completely undocumented.
Unknown: the saying goes, “You don’t know what you don’t know” – these cases don’t even cross our minds as a possibility, either due to ignorance or because they are [supposedly] contrary to our business rules or our [assumed/remembered] application logic.
Again, the end result is the same: panic, discord, technical debt, and wasted hours of remediation. So why do we do this to ourselves? Well, one common justification tends to be “Oh, that’ll never happen!”, and we sweep it under the rug. Then there’s the laziness, busy-ness / lack of time, pressure to deliver, gap in abilities or tool-sets, passing the buck, etc. We’re all guilty of at least two of these in any given week.
So let’s move on to the important part: What can we do about it? Largely, we can simply take the excuses & reasons and simply turn them on their heads. Take ownership, commit to learning, communicate with management, make time for planning and documentation, and once a path is laid out for remediation, actually do the work. It’s often not easy or pretty, but a little pain now beats a lot of pain later.
I know, easier said than done, right? :o)
Let’s say our sales offices are closed on Sunday – this is our “operating assumption”. Therefore, no orders will be processed on Sundays – this is our “business rule”. So because of this, we’ve decided to do some ETL processing and produce a revenue report for the previous week. Now, we’re using some antiquated tooling, so our first batch of ETL, which takes the orders from the sales system and loads them into the bookkeeping system, runs from, say, 2am to about 5am. Then we have a second batch, which moves that bookkeeping data into the report-staging area, from 6am to about 7am. We need those hours of “buffer zones” because the ETL times are unpredictable. And finally, our reporting engine churns & burns at 8am. But along comes Overachieving Oliver, on a Sunday at 5:30am, and he’s processed a couple orders from the other day (so perhaps he’s really Underachieving Oliver, but he’s trying to make up for it, because he enjoys alliteration almost as much as I do).
Woah nelly! What happened? Oliver’s sales didn’t make it into the report! Not only that, but they don’t even exist according to bookkeeping. But come Monday, if he tries to re-process those orders, he’s probably going to get an error because they’re already in the sales system. So now he’s gotta get IT involved – probably an analyst, a developer, and maybe even a DBA. Damn, that’s a lot of resources expended because an assumption and a rule were broken!
Here’s another one. An order consists of 1 or more order-lines, which each contain 1 or more of a given item (product). Let’s say we store these in our database in a table called OrderLines, and each line has a LineNumber. Now, we have an assumption that those LineNumbers are always sequential. It’s even a rule in our applications – maybe not all parts or all modules, but at least some, and enough to cause a fuss if there’s a gap in that sequence or if a line is moved around somehow without proper data dependency updates (which, by the way, are application-controlled, not database-controlled). Plus there’s some manager who depends on this old reporting metric that also breaks when those line numbers are out-of-whack. But this should never happen, right?
The operative word there being “should”. But apparently there was a bug in an “update order” routine that left a gap in the sequence. Or maybe the DBA was asked to delete something from an order post-mortem, and there’s no way within the app’s ecosystem to do it, so he had to write some queries to make it work. And guess what? Because the Dev team is super busy working on the hot new feature that everybody wants, it will be 2 weeks before they can circle back around to address that update-bug or add that utility function for line-deletion. So now the DBA’s writing a stored-proc to wrap in a scheduled job to “fix” those order-line sequences every night, to prevent that one app module from breaking and keep those management reports accurate. And, to quote my very first post ever, “the DBA waxe[s] wroth”.
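For illustration, here’s roughly what that gap-detection and nightly “fix-up” logic might look like, using the OrderLines/LineNumber names from the example above (a sketch, not production code – it assumes line numbers are supposed to run 1, 2, 3… per order):

```sql
-- Find orders whose line numbers have gaps or duplicates:
-- if numbering is truly sequential from 1, MAX(LineNumber) = COUNT(*)
SELECT OrderId
FROM   dbo.OrderLines
GROUP BY OrderId
HAVING MAX(LineNumber) <> COUNT(*)
    OR COUNT(DISTINCT LineNumber) <> COUNT(*);

-- One way the "fix-up" job might renumber them, preserving order:
WITH Renumbered AS (
    SELECT LineNumber,
           ROW_NUMBER() OVER (PARTITION BY OrderId
                              ORDER BY LineNumber) AS NewLineNumber
    FROM dbo.OrderLines
)
UPDATE Renumbered
SET    LineNumber = NewLineNumber
WHERE  LineNumber <> NewLineNumber;
```

(Updating through a CTE like this is a standard T-SQL pattern for renumbering rows in place.)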
So, prevention? Well, I’d probably start by locking down the order entry system during that off-limits window. We should also wire-up those big 3 processes so that there’s no need for indeterminate buffer-zones and inconsistent timing. And yeah, it could be a big change needing lots of team buy-in and approvals, but it’s worth the investment. Right? Right!
What do I mean by that? In technology, specifically. Well, let me explain.
The name you choose for a thing – whether it’s a server, a database, an application, a service, a class, a file, a method, a function, a procedure, ad nauseam – that name is immediately and irrevocably associated with that thing. Even if you miraculously get to change that name somewhere down the road, it will still be (at least for a while) “formerly known as” the original name. Legacy teams will continue to use that old name until they’ve convinced everybody the new name is worth the trouble of switching all dependency references & documentation. Managers will continue to use the old name even longer because they’re not close enough to the tech to be aware of, let alone understand, the full implications of said name change.
So that’s why names matter. They’re a unique and semi-permanent identifier of something; they encapsulate the essence of that thing in a single word or phrase. And they become embedded in the business jargon and technology team lore. Such lofty expectations of such a simple (hopefully), short (usually), and often under-appreciated element of our field.
It’s unbelievably easy to find bad examples of IT naming schemes, too. Just look:
I don’t know about you, but I like my names to be pronounceable. I don’t want to need dozens of syllables and two extra breaths just to say a couple server names in conversation. It’s not that hard to create meaningful multi-part names that are still phonetic. We speak these things just as much as we type them, let’s make them speak-able!
Thus, once again,
Names matter. REALLY.
And so, dear reader, choose them wisely, with care and consideration. With lots and lots of context. Maybe even some conventions, or at least conventional wisdom. And consistency. Plan for the future, while being aware of the past. Don’t let legacy remnants stand in the way, but don’t let delusions of grandeur force you into a naming scheme that breeds confusion or resentment.
Allow me to illustrate. We have a handful of SQL servers as part of our core IT infrastructure. Let’s call them FooDB, BarDB, and CharDB. Now, they each serve a fairly distinct and semi-isolated set of apps/websites. In other words, they each fill a role. (Of course, there are cross-dependencies and integrations, but nothing extreme.) Now, we want to migrate them to new hardware and plan for scale, assuming the business keeps growing. So what do we name the new servers? We know that SQL Server doesn’t really scale horizontally, at least not very easily & not without major application rewrites. But we don’t want to paint ourselves into a corner by assuming that we’ll “never” scale out. Ok, how about this: DC-FooDB-P1, DC-BarDB-P1, etc. The P stands for production, because, along with the fancy new hardware, we’ve got some additional infrastructure room to build a proper Dev & QA environment.
So what have we planned for? Tiered environments, as well as a bit of scale-out room – 9 servers in each “role”, so to speak. What if we grew to the point where we needed 10 servers? Well, as a DBA, I’d argue that we should consider some scale-up at that point; but also, to seriously start looking at a broad infrastructure revamp, because that kind of growth should indicate some serious cash influx and a much higher demand on IT-Dev resources.
Why should the breaking point be 10 and not 100? or 1000? Well look, it definitely depends on the technology we’re dealing with. SQL server is not Mongo, or Hadoop, or IIS, or Apache. Certain tech was made to scale-out; other tech is more suited for scale-up. And there’s nothing better or worse about either approach – both have their place, and both are important. Scale-out is arguably more important, but for traditional IT shops, it’s also more difficult.
This is where that big C-word comes in. Context. If we’re talking about a tech that scales-out, sure, leave room for 100 or 1000 of those cookie-cutter instances that you can spin-up and shuffle around like cattle. But if it’s a scale-up tech, don’t kid yourself by thinking you’re going to get to the triple-digits.
The point is, if you’re starting at 1, it’s not too difficult to scale-out to a few more “nodes” with simple tricks and tweaks; i.e. without drastically changing your IT-Dev practices and environment. But when you start talking multi-digits, of redundant nodes fulfilling the same role, you need to seriously PLAN for that kind of redundancy and scale. Your apps have to be aware of the node topology and be concurrency-safe, meaning “no assumptions allowed” about where they’re running or how many other instances are running simultaneously. And since we’re talking about database servers, that means replication, multi-node identity management, maybe sharding, data warehousing, availability groups, and other enterprise-level features. We’re talkin’ big $$$. Again, it stands to reason that if the business is seeing the kind of growth that leads in this direction, it can afford to invest in the IT-Dev improvements required to support it.
I’d love to know your thoughts! Leave me a comment or two. Until next time!
Header image: Keira loved her little pet penguin… and then later she destroyed it and literally ate its face off.
Make it readable, minimize scrolling.. provide context, clarify complexity.
I can hear it now.. the moans and groans, the wailing and gnashing of teeth.. “Another post about coding style / formatting? REALLY!? Why can’t we all just read/write/paste code in our own environment and let our IDE/toolset worry about the format?” I hear you, really I do. But you’re missing the point. The single most important trait of good code is readability. Readability should be platform-independent, i.e. it should not vary (too greatly) depending on your viewing medium or environment.
The ratio of time spent reading (code) versus writing is well over 10 to 1 … (therefore) making it easy to read makes it easier to write.
And yet, all around the web, I continue to find terribly formatted, nigh-unreadable code samples. Specifically, SQL scripts. That’s me; that’s my focus, so that’s what I pay the most attention to. But I have no doubt that the same applies to most other languages that Devs & IT folks find themselves scouring the tubes for on a daily basis. There is just an excessive amount of bad, inconsistent, language-inappropriate formatting out there. And honestly, it’s not that surprising — after all, there are millions of people writing said code — but it’s still a source of annoyance.
Clean code always looks like it was written by someone who cares.
Maybe it stems from my early days working in a small shop, where we all had to read each other’s code all the time, and we didn’t have these fancy automated toolsets that take the human eyeballs out of so much of the workflow/SDLC. So we agreed upon a simple set of coding style/layout rules that we could all follow, mostly in-line with Visual Studio’s default settings. And even though we occasionally had little tiffs about whether it was smarter to use tabs or spaces for indentation (protip: it’s tabs!), we all agreed it was helpful for doing code reviews and onboarding new team members.
My plea to you, dear reader, is simple. Be conscious of what you write and what you put out there for public consumption. Be aware of the fact that not everybody will have the exact same environment and tooling that you use, and that your code sample needs to be clean & structured enough to be viewed easily in various editors of various sizes. Not only that, but be mindful of your personal quirks & preferences showing through your code.
That last part is really difficult for me. I’m a bit of a stickler for SQL formatting, particularly when it doesn’t match “my way”. But that’s not the point of this post – I don’t mean to, nor should I ever, try to impose my own personal style nuances on a wide audience. (Sure, on my direct reports, but that’s why we hire underlings, no? To force our doctrine upon them? Oh, my bad…)
All I’m saying is, try to match the most widely accepted & prevalent style guidelines for your language, especially if it’s largely enforced by the IDE/environment-of-choice or platform-of-choice for said language. And I’m not talking about syntax-highlighting, color-coded keywords, and that stuff – those are things given to us by the wonderful IDEs we use. Mostly, I mean the layout stuff – your whitespace, your line breaks, your width vs. height ratio, etc.
For example, Visual Studio. We can all agree that this is the de-facto standard for C# coding projects, yes? Probably for other MS-stack-centric languages as well, but this is just an example. And in its default settings, certain style guides are set: curly braces on their own lines, cascading smart-indents, 4-space tab width, etc. Could we please not publish C# code samples to StackOverflow that have 7- or 8-space tabs, and curly braces all over the place (some on the same line as the header code or the closing statement of the block; some 3 lines down and indented 17 tabs over because “woops, I forgot a closing paren on a line above – I fixed it, but I was too lazy to go down and fix the indentation on the subsequent lines!”)? GRRR!
Always code as if the [person] who ends up maintaining your code will be a violent psychopath who knows where you live.
Now, on to SQL. Specifically, TSQL (MS SQL). Okay, I will concede that the de-facto environment, SSMS, does NOT have what we’d call “great” default style/layout settings. It doesn’t even force you to use what little guidelines there are, because it’s not an IDE, so it doesn’t do smart auto-formatting (or what I’ve come to call “auto-format-fixing” in VS) for you. And sure, there are add-ins like RedGate SQL Prompt, SSMSBoost, Poor Man’s T-SQL Formatter, etc. But even they can’t agree on a nice happy compromised set of defaults. So what do we do? Maybe just apply some good old fashioned common sense. (Also, if you’re using such an add-in, tweak it to match your standards – play with those settings, don’t be shy!)
I can’t tell you how frustrating it is to have to horizontally-scroll to read the tail-end of a long SQL CASE expression, only to encounter a list of 10+ literal values in an IN () expression in the WHERE clause, that are broken-out to 1-per-line! Or even worse, a BETWEEN expression where the AND <end-value> is on a brand new line – REALLY?? You couldn’t break-up the convoluted CASE expression, but you had to line-break in the middle of a BETWEEN? Come on.
Or how about the classic JOIN debate? Do we put the right-hand table on the same line as the left-hand table? Do we also put the ON predicates in the same line? Do we sub-indent everything off the first FROM table? Use liberal tabs/spaces so that all the join-predicates line up way over on the right side of the page? Again, common sense.
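For what it’s worth, here’s one layout that answers those questions sanely – not “the” standard, just a reasonable compromise (table names invented):

```sql
-- Each table on its own line, aliased; ON predicates indented
-- under their JOIN so the join logic is impossible to miss
SELECT    c.Name,
          o.OrderDate,
          ol.LineNumber,
          ol.Quantity
FROM      dbo.Customer   AS c
JOIN      dbo.Orders     AS o
          ON o.CustomerId = c.CustomerId
LEFT JOIN dbo.OrderLines AS ol
          ON ol.OrderId = o.OrderId
WHERE     o.OrderDate >= '2017-01-01'
ORDER BY  o.OrderDate, ol.LineNumber;
```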
Make it readable, minimize scrolling, call-out (emphasize) important elements/lines (tables, predicates, functional blocks, control-flow elements, variables). It’s not revolutionary; just stop putting so much crap on one line that in a non-word-wrapped editor, you’re forced to H-scroll for miles. (FYI, I try to turn word-wrap on in most of my editors, then at least if I have some crazy-wide code, I can still read it all.)
SWYMMWYS. Say what you mean, mean what you say. (“Swim wiz”? “Swimmies”? I don’t know, I usually like to phoneticize my acronyms, but this one may prove troublesome.)
Modern languages are expressive enough that you can generally make your structure match your intended purpose and workflow. Make it obvious. And if you can’t, comment. Every language worth its salt supports comments of some kind – comment, meaning, a piece of text that is not compiled/run, but rather just displayed to the editor/viewer of the code. Comments need to provide context, convey intent, and clarify complexity.
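In TSQL terms, that means comments that answer “why”, not ones that narrate “what”. A contrived sketch (tables invented):

```sql
-- Context/why: orders older than the retention window get moved to a separate
-- archive table, so querying dbo.Orders alone would silently miss history.
SELECT OrderID, OrderDate FROM dbo.Orders
UNION ALL
SELECT OrderID, OrderDate FROM dbo.OrdersArchive;   /* intent: full history */
```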
So what’s my point? Basically, let’s all try to make our code samples a bit easier to read and digest, by using industry-standard/accepted best-practices in terms of layout & style.
Unless you’ve been living under/in the proverbial rock/hole-in-the-ground, you’ve heard of this thing called MongoDB. It’s one of the biggest flavors of NoSQL database systems out there – specifically, it’s a “document DB”, which means it stores semi-structured data in the form of JSON documents, grouped into collections, grouped into (of course) DBs.
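To make that hierarchy concrete, here’s a made-up example in mongo shell syntax – a DB contains collections, a collection contains JSON-ish (technically BSON) documents, and documents in the same collection don’t have to share a schema:

```javascript
use imaging                                     // the DB
db.scans.insertOne({                            // "scans" is a collection
    patientId: "P-1001",
    modality:  "MRI",
    takenAt:   ISODate("2017-03-15T09:30:00Z"),
    tags:      ["head", "contrast"],            // arrays nest naturally
    device:    { vendor: "Acme", model: "X9" }  // and so do sub-documents
});
```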
Now, I’m a Windows guy. SQL Server & Windows Server are my comfort zones. I like GUIs, but I’m ok with a command line too. PowerShell has become my friend… okay, good acquaintance. But every once in a while, we have to step outside our comfort zones, no? So when I was told “hey, you’ve got to get a handle on managing these MongoDB instances that we’ve got running on servers so-and-so, as part of this ‘Imaging system’ application”… I thought “Hahaha…”, but what came out was “Sure thing!” Something a bit like this:
So the first order of business was to decide on a MongoDB “GUI” – a Windows app that could at least connect-to and give me a visual overview of the running MongoDB instances (properly referred to as a mongod, stemming from the Linux term “daemon”, which on the Windows side is basically like a process or service). I tried both the “official” desktop app from the big org, MongoDB Compass, and a neat open-source tool called Robomongo.
And I actually like them both, for different reasons, most of which can probably be attributed to my lack of depth with the technology, but hey. Anyway, Compass is really nice in that it gives you this kind of statistical overview of the collections in your DBs, performing some basic aggregates on a 10% or 1k sample of the collection documents to give you a graphical 40,000-foot view of the data. But where it breaks down for me is that little “query” bar, which is just an empty pair of curly braces. It’s only for “selecting” (finding, querying); no other operations to see here. So we can’t manipulate our data with it, but it’s definitely great for viewing!
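That empty pair of curly braces is simply asking for the filter document you’d pass to find() – for example (field names invented):

```javascript
{ modality: "MRI", takenAt: { $gte: ISODate("2017-01-01") } }
```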
Whereas with Robomongo, I can right-click on the DBs/collections and do very simple things like “show documents”, “show statistics”, etc. And it actually writes the equivalent mongo shell command for me to poke at; say, to inject more logic into the find to get something specific, or to write a new command or two as I read through the docs and learn things like aggregates, indexes, and whatnot. Being a shell, it allows us to write or update data as well as read it.
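For instance, starting from the find() that Robomongo scaffolds, you might layer on something like this (collection and field names hypothetical):

```javascript
// Filter, sort, and cap the result set:
db.scans.find({ modality: "MRI" })
        .sort({ takenAt: -1 })      // newest first
        .limit(10);

// A first taste of aggregates: document counts per modality.
db.scans.aggregate([
    { $group: { _id: "$modality", total: { $sum: 1 } } }
]);

// And an index to support that find()/sort():
db.scans.createIndex({ modality: 1, takenAt: -1 });
```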
Even though it’s a CLI/Linux-flavored technology, it works perfectly fine in Windows… except for one thing. So, you install it as a service, and you typically start & stop services using net start & net stop, and as long as your service’s CLI arguments are all correct, you should be good — in theory! Trouble is, the service tends not to stop gracefully. So I found that, instead, the following command was more useful: mongo admin --eval "db.shutdownServer()". This uses the actual mongo shell to send the native shutdown command to the mongod, instead of relying on the Windows services gymnastics to do it correctly.
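Side by side (assuming your service is named “MongoDB” and mongod is listening on the default port), the two routes look like this at a Windows command prompt:

```
:: The service route — fine for starting, flaky for stopping:
net start MongoDB

:: The graceful route — let the mongo shell tell mongod to shut itself down:
mongo admin --eval "db.shutdownServer()"
```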
It just goes to show, dear reader, that you’ve got to get your hands dirty and try out new technology before dismissing it as “not my job” or “somebody else’s problem”.
PS: Nope, that’s not Compass’s logo or anybody else’s; I made it meself, with good old Paint.NET!