This month’s T-SQL Tuesday topic, thanks to Wayne Sheffield, is a doozy. It’s difficult to admit failures, even if we eventually overcame them. Mostly, it’s tough to fess up to those hours upon hours of what feels like wasted time preceding the hopefully inevitable breakthrough.
Let me tell you a tale of square pegs and round holes.
Get your mind out of the gutter.
And strap in. This puppy went over 2k words. I didn’t intend it that way, but, c’est la vie.
A Short History Lesson
I used to work for a software company that made K-12 educational assessment & reporting products. A large part of K12ED is dealing with learning standards: those historically fraught, named concepts that students are supposed to have “mastered” during their time in the US public school system. Ever heard pop culture or the media refer to “teaching to the test” or “state standardized testing”? That’s what they’re talking about.
In the late ’90s, a bunch of states finalized and centralized their Standards into a “list” of sorts (almost a database, though not formally one; let’s face it, this was still the early days of what we know as modern IT). California’s 1997 edition came to be unofficially known as the “CA ’97 Standards”. For over a decade, these learning goals dictated what was taught, and when, in our schools. Government funding incentives, based on state standardized test scores, drove instruction to the point that teachers had little if any time to dynamically mold their students’ minds into critically thinking, multi-faceted, creative individuals.
But this article isn’t about my bias. This is about the broad, sweeping shift in the K12ED landscape called “Common Core”. As the technology sector grew more mature and started permeating more “traditional” industries, a vocal minority of thought-leaders had what they deemed an epiphany.
Hey, what if we took the old-guard educational bureaucracy, with all those disparate state standards, and turned the whole thing into a unified, technology-driven learning ecosystem?
Sounds great in theory, right? I mean, surely their intentions were good? Well, you know what they say about the road to Hell and how it’s paved…
Gosh, there goes my bias again. Sorry, let me just tuck that back in its pocket.
Anyway. These new Core Standards introduced some new ways of thinking. For example, that some learning concepts are inter-related and “cross-cutting” (a fancy way of saying that sometimes a Math concept requires a fundamental Reading-Literacy knowledge-point to fully grasp). This had some very interesting impacts on the underlying technology systems which relied on, and housed, said Standards. Systems which, I might add, had in many cases existed for over a decade at this point.
Bringing it Back to Data
Our company’s system was one such. Our key partner’s system, from which we quite literally inherited the traditional relational database structure housing the CA ’97 Standards, was another. You see, back before RESTful APIs ran the world, software systems relied heavily on what we call “local data stores”. In order to show the teachers and administrators who primarily used our system which Standards their students were performing well (or poorly) in, we had to relate those Standards to the test questions those students answered month after month, year after year. And, like so many other small businesses of the late ’90s / early ’00s, we had a trusty ol’ SQL Server lying around, just waiting to be loaded with all our precious data.
This was fine, for the most part. The legacy Standards conformed reasonably well to a relational data model, even though we had to throw in a bit of hierarchy (using the good ol’ adjacency-list scheme). There wasn’t a complicated set of relationships between Standards in different subjects (Math, Science, History), and people didn’t care to do much in-depth analysis beyond “are my students getting well-prepared for the state tests?”.
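For the unfamiliar, an adjacency list just means each row stores a pointer to its parent row. Here’s a minimal sketch of what such a structure might look like (the table and column names are hypothetical, not our actual schema):

```sql
-- Hypothetical adjacency-list table: each Standard points at its parent.
CREATE TABLE dbo.Standard
(
    StandardId  int           NOT NULL PRIMARY KEY,
    ParentId    int           NULL
        REFERENCES dbo.Standard (StandardId),   -- NULL = top of the hierarchy
    Subject     varchar(20)   NOT NULL,         -- 'Math', 'Science', 'History', ...
    Description nvarchar(400) NOT NULL
);

-- Roll-up / drill-down is a recursive CTE away:
WITH Tree AS
(
    SELECT StandardId, ParentId, Description, 0 AS Depth
    FROM   dbo.Standard
    WHERE  ParentId IS NULL
    UNION ALL
    SELECT s.StandardId, s.ParentId, s.Description, t.Depth + 1
    FROM   dbo.Standard AS s
           INNER JOIN Tree AS t ON s.ParentId = t.StandardId
)
SELECT * FROM Tree ORDER BY Depth;
```

Simple, cheap, and perfectly adequate, as long as the hierarchy stays strict and shallow.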
Enter Common Core
You parents thought these things were complicated and convoluted — just Google “common core math problem” and you’ll find no shortage of critical satire. Well, the underlying data structures required to store these new Standards were going to be significantly more complex as well.
One of my main jobs, for about a year or two, was to oversee the importation of said Core Standards into our SQL database system. On the surface, it seemed reasonable: we had a hierarchy concept already, and we had a roll-up & drill-down mechanism for the handful of different “levels” of said hierarchy. But it was all very static. What that means, for us tech-heads, is that it was not easy to change or extend; not adaptable to new and expanded requirements. The older Standards adhered to a fairly strict hierarchy, and each level of roll-up had a distinct name. With Common Core, they broke out of that old mold, while simultaneously keeping some of the old names but changing their meaning depending on context.
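To make “static” concrete, imagine the level names baked into a lookup table, with each depth owning exactly one name. Again, this is a hypothetical sketch rather than our real schema, but it captures the rigidity:

```sql
-- Hypothetical lookup: each hierarchy depth owns exactly one name.
CREATE TABLE dbo.StandardLevel
(
    LevelId   tinyint     NOT NULL PRIMARY KEY,  -- 0 = Subject, 1 = Strand, ...
    LevelName varchar(30) NOT NULL UNIQUE        -- one name, one meaning
);

-- Bolting it onto the Standard table (fine while the table is still empty;
-- a populated table would need a DEFAULT):
ALTER TABLE dbo.Standard
    ADD LevelId tinyint NOT NULL
        REFERENCES dbo.StandardLevel (LevelId);

-- Common Core re-used the same level names with context-dependent meanings,
-- which this one-name-per-depth model simply cannot express.
```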
Think of it this way. A group of cattle is called a herd. A group of sheep is also called a herd. And a group of seagulls is called a flock.
And I ra-a-an.. I ran so far a-wa-a-ay…
Sorry, where was I? Right, group names. So what if the government suddenly decided for us that a group of sheep will from now on be called a ‘gaggle’. But only if they’re all female. If the group contains any male sheep, it’s still called a ‘herd’. And groups of cattle will still be called herds, unless it’s purely a group of bulls being raised for beef, in which case we call it a ‘meatsock’.
Have I lost you yet? Of course I have! This is pure nonsense, right? Language does not work this way. Moreover, hierarchies of groups of things do not work this way.
But I digress. There was, despite my jest, a method to the madness of the new Common Core hierarchy groupings. And I did learn it and understand it, for the most part. The problem was that it threw our existing legacy data model to the wind.
Enter Academic Benchmarks
As usual with a legacy software system in a small company, the resources and buy-in for a data-layer overhaul were nil. So it fell to us intrepid DB-Devs to shove that snowflake-shaped peg into the very square hole of the old relational model. We sketched out plenty of ERDs and brainstormed how to tack onto our existing structures, but nothing short of a miracle would make this new data conform to those old static norms.
Thankfully, the “geniuses” (and yes, that is used sarcastically) over at Academic Benchmarks, or AB for short (at least for the purposes of this post), had already done this. And we paid them, thousands of dollars per year, for the convenience of using their GUIDs to identify Standards across different systems and vendors. Never mind that they were just perpetuating the bad model of yesteryear; never mind that they provided zero support for data quality feedback loops. We could happily take their Excel files or CSVs and load them nearly straight into our database.
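The loading itself really was the easy part. Something along these lines (the file path and columns are invented for illustration) got a vendor CSV into a staging table:

```sql
-- Hypothetical staging table for the AB files:
CREATE TABLE dbo.AB_StandardStage
(
    AB_GUID     uniqueidentifier NOT NULL,  -- AB's cross-vendor identifier
    ParentGUID  uniqueidentifier NULL,
    [Number]    varchar(50)      NULL,      -- e.g. 'CCSS.Math.Content.3.OA.A.1'
    [Statement] nvarchar(1000)   NULL
);

-- Straightforward load (quoted fields would need extra handling):
BULK INSERT dbo.AB_StandardStage
FROM 'C:\imports\ab_standards.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);
```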
Enter, and Exit, ASN
While I was searching for the words to express how insufficient our data model was, I came across this little gem from the Gates Foundation: Achievement Standards Network, or ASN. (ASN stands for many other things, so as with all acronyms, it’s all about context; just fair warning for the Google-happy.) The architects here had understood that learning standards needed a better and more flexible model, not only in terms of storage, but also in terms of data interchange format. This new kid on the block called JSON had been making waves for a while, and was subsequently widely adopted by the tech industry in general, so it stood to reason that this would be the preferred format for publishing and serving the Standards from ASN.
Bonus: it was FREE. Yes, really. What a wonderful thought, I said to my team, to imagine never having to pay those crooks at AB another red cent! Eventually. After years of transition. But alas, it was not to be. See, AB had been around the industry for a long time. They had their hooks in almost every learning content publisher and assessment vendor. So as tempting as this shiny new source of academic truth may have been, sadly, it was not practical.
Enter the Contractor
Somewhere around the same time, we took on a promising new developer who not only had a very strong background in Math and CS fundamentals, but had also proven his worth on real-world applications with actual “in the wild” deployments and users. He was a bit arrogant, actually. One could argue that he’d earned it, perhaps, but we didn’t appreciate constantly being told everything we were doing wrong, to the point that it was hard to hear the more important bits of wisdom and insight into how we could improve.
Sadly, that last part came in too little, too late.
Side-note: riding the crest of the micro-services wave, he diagrammed and proposed an entire system re-write of our core software product, using a micro-services architecture and small, purpose-dedicated data stores. It looked real purty, but it was completely impractical for a company of our size. If we were a “true startup” with millions in VC funding coming out our ears, and could afford to throw a brand-new, young & hungry “two-pizza team” at the product, then sure. But that was not reality.
The Brick Wall
No two ways about it: we had to support these new Standards. So we soldiered on with our relational database tables, tacking on additional entities and relationships, logic and code, to make it “almost” do what the Common Core Standards needed. Effectively, we were trying to shove that pretty, round, sparkly peg of graph data into the dingy old square hole of a 15-year-old SQL table structure.
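Our “tack-on” approach amounted to bolting edge tables onto the old hierarchy. A hypothetical sketch of the kind of thing we added:

```sql
-- Hypothetical edge table: any Standard can now relate to any other,
-- with the relationship type stored as data.
CREATE TABLE dbo.StandardRelation
(
    FromStandardId int         NOT NULL REFERENCES dbo.Standard (StandardId),
    ToStandardId   int         NOT NULL REFERENCES dbo.Standard (StandardId),
    RelationType   varchar(30) NOT NULL,  -- 'prerequisite', 'cross-cutting', ...
    PRIMARY KEY (FromStandardId, ToStandardId, RelationType)
);
-- It works, but every new relationship type means more joins and more
-- special-case code; a real graph model treats these edges as first-class.
```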
Somewhere along the way, AB caught up with the times and started offering other, less “flat” formats, for their Standards data. So even though ASN was off the table, not all of our work toward JSON ingestion/conversion went to waste. Consuming the data exports from the vendors wasn’t a problem — we’d already been doing this. That was not the issue.
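For what it’s worth, on SQL Server 2016 and later, this kind of JSON shredding is nearly free with OPENJSON. A sketch, reusing the hypothetical staging table from earlier (the JSON property names are made up):

```sql
-- Hypothetical JSON export from a vendor (property names invented):
DECLARE @json nvarchar(max) = N'[
  {"guid":"1F0E6C3A-9A3D-4E61-B9F0-2D1C5A7B8E90",
   "number":"3.OA.A.1",
   "statement":"Interpret products of whole numbers."}
]';

-- Requires SQL Server 2016+ (compatibility level 130) for OPENJSON:
INSERT INTO dbo.AB_StandardStage (AB_GUID, [Number], [Statement])
SELECT j.[guid], j.[number], j.[statement]
FROM OPENJSON(@json)
     WITH ([guid]      uniqueidentifier '$.guid',
           [number]    varchar(50)      '$.number',
           [statement] nvarchar(1000)   '$.statement') AS j;
```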
The issue, the brick wall against which we continually banged our heads, was the fact that we just plain couldn’t support the advanced & complex relationships and groupings (categorizations) of the new Standards. Which turned out, in retrospect, not to be the end of the world, because honestly it would take years, if not decades, for the educational system’s old-guard mentality to even comprehend such relationships and categorizations, let alone how they could help shape classroom instruction toward better outcomes for their students.
Good lord, that sounded like a bunch of jargon.
What I mean, in plainer English, is that we spent a lot of time worrying and arguing about something that did not matter as much as we thought it did. The consumers of our system, the teachers and principals and such, honestly didn’t care about all that. They just wanted to know if their kids were on-track to be “Proficient” on the state tests so their funding would remain steady. (I don’t give them enough credit, I know; teachers themselves also needed to know, on a regular basis, which kids needed special attention in what areas of learning, but that’s beyond the scope of most generalized reporting tools.)
Hindsight is Always 20/20
So what was the lesson here? I don’t want to overlook the fact that, fundamentally, we were still using the wrong data storage technology for the problem. Or at least, the wrong data model. But we live in a real world where software needs to be delivered on-time and in-budget, by less-than-perfect teams with less experience and expertise than they feel they should have. So instead of butting our heads against that brick wall, let’s try to remember to be pragmatic: to be realistic about what’s feasible, for whom, and when; and to adapt the parts of our application infrastructure, appropriately and efficiently, to the business problems and use-cases at hand. Over-engineering may be foolish, but under-engineering is just as perilous.