The EAVil Cycle, Part 2

In which we discuss more about the EAV model and some of its merits and pitfalls.

Advertisements

continued from last week…

The Ugly (aka the “Wow really?!?”)

You’ll see this ‘creep’ even in product-catalog systems as mature as Amazon’s. If you search for (geeky as we are) graphics cards, and use the product attribute filters in the left pane to narrow it down, you’ll find that some correctly have their memory type (GDDR5, etc.) listed, while others may not. If you’re really unfortunate, there will be two semi-redundant attribute-sets that you’ll have to “juggle” between to really get at what you want. TVs, for example, may see both an “HDR support” (yes/no) and an “HDR type” (standard, ultra, etc.) — I’m kinda pulling those out of my arse for example’s sake, but you get the point.

Why does this happen? Because at some level, humans are still responsible for ‘tagging’ those products (for lack of better word). And as much encouragement and guidance as the ‘admin software’ may give them, they can (and do) still decide at times to side-step that guidance and say “Nope, I know better; make me this new thing!”

But isn’t that a problem with nearly all data-driven systems? Sure, of course it is. Yet with a model as flexible as EAV, the problem is intensely magnified by the fact that it’s made so easy to do — to ‘extend’.

so preoccupied with whether or not you could, you didn't stop to think if you should
It’s probably not the exact quote, for you pedants, but it’s close enough for government blog work.

And unfortunately, the biggest contributor to this problem is the lack of development-time and forethought given to the administration, or management, of the data. You see, this problem could be largely assuaged if the admin-toolset were the FIRST thought and priority in the roadmap. But so often, that thought comes LAST, if at all. So sure, your product feature tagging system looks great, it’s flexible and your customers love it. But you’re throwing tickets over the wall to your data team every time a requirement or use-case changes, or when you need to fix a data-quality problem caused by the users not knowing there was already a “Widget Type” before creating their new “Widget Kind” tag, or misspelling “Is Wierd” because English is weird and has more exceptions to the “I before E” rule than not.

Does this problem go away with a dedicated search-index or NoSQL technology like Elasticsearch or (shudder) MongoDB? Of course not! If anything, it may be worse. Maybe. But wait, those systems make it easier to de-dupe and manage redundancy & data quality, don’t they? Probably. I can’t speak from experience myself, but I’ve heard good things. Once again, it all comes down to the effort you’re willing to invest in the system. If you make data quality a priority, you’ll be happier with the experience. If you don’t, well you’re just another amateur data scientist complaining about dirty non-standardized/non-validated address fields, aren’t ya?  =P

I joke with the data scientists, of course. What they do is awesome. They just tend to often re-invent the wheel of data-cleansing/data-wrangling that we DBAs have been doing for a few decades, because they didn’t know the right questions to ask or the right place to look. We need to get better at working together WITH them, not ‘for’ or ‘against’ them.

ninja cat riding a unicorn with laser-eyes
How the data scientists see themselves…

The Why or When (aka “Is it a decent model for this?”)

The long-story-short version is, consider your business and your data. Try to plan for the future, and anticipate potential changes and growth. It’s not easy, and we never “get it right the first time”. But we can try.

When your attributes are fairly static, and you know that you can tightly control them, you might consider a more rigid model. Something with a handful of lookup tables referenced by the main product entity. This is advantageous for performance and management, at the expense of scalability and extensibility.

When you literally need to support on-the-fly extension, and you’re absolutely married to SQL (i.e. not ready to venture out into NoSQL land just yet), the EAV model may fit the bill. Aaron’s article, and the comments therein, present some fairly valid and reasonable implementation suggestions to make it a little more palatable. Just beware the date — that was written back in 2009. Before we had such things as Elasticsearch and its ilk. I’d heavily encourage the consideration of purpose-built data-stores for this sort of thing, if you have any hope of scaling-out.

Other tools in your toolbox can help with this, too. For example, consider an in-memory data-grid for super-fast reads. The vast majority of data-access to these attributes & values is going to be reading, using it to filter & slice & dice a data-set. You can pay the small performance cost (e.g. write to the underlying SQL database) on the rare occasion when a write/change needs to occur.

In Conclusion

Proving the age-old rule of “Just because you CAN, doesn’t mean you SHOULD”, the EAV model is sometimes okay and sometimes not. You need to understand your business and your data to make that call. And you need to consider the magnitude of effort that may be involved in pivoting from one model to another. Unfortunately, in many cases, that part overshadows the rest, and the show business must go on.

queen the show must go on
You’re welcome again, ears.

Still, I encourage you to at least think about it, and be ready with that knowledge of pros/cons when the time is right to discuss it with stakeholders.

The EAVil Cycle

In which we discuss the EAV model and some of its merits and pitfalls.

EAV, or Entity-Attribute-Value, is an data model that’s been around the block. It’s typically injected into a relational database at some point during the overall application/architecture life-cycle, somewhere between when the team realizes that they’ve got way too many lookup tables for a “main business entity” thing, and when they finally make the shift into polyglot data stores.

Wake me up when that actually happens, successfully.

I’m not going to rehash aging internet arguments here, nor bore you with replicated diagrams that you could just as easily look up on Google Images. No, instead, I’m going to tell you why this model is not great, why it’s not bad, and how you should go about deciding if it’s not wrong for you.

Yes, I did negate all those adjectives on purpose. Mostly because nobody really enjoys working with these structures, regardless of how they’re implemented; customers and business stakeholders are ALWAYS CHANGING the requirements and the attributes in question. But, we press on.

Refs:

PS: Postgres is looking really freakin’ cool these days.

the good the bad and the ugly
Another Clint Eastwood pic… apologies if you’re not a fan. =P

The Good (aka “Not Bad”)

Proponents tell us that this model is easily searchable and easy to administer. The “searchable” bit is true; especially when you have the attribute-values pre-defined and don’t rely on end-user text-entry. But that’s true of basically any data model. The key here is that all attribute-values are effectively in one “search index”. But wait, don’t we have purpose-built search indexes nowadays? (Hint: see Elasticsearch.) This will come up again later.

Administerable? Administrable? Administratable? Damn you English! Anyway. Yes, again, if you’re fairly confident in your business users’ ability to effectively track and de-dupe (de-duplicate) incoming requirements/requests using their own brains/eyeballs and the admin tool-set that you build for them.

Oh, right, did I mention that? You have to build the admin app. Because you do NOT want to be writing ad-hoc SQL queries every time a new attribute requirement comes in. (Still, it’s better than making schema changes for new req’s, as I’ll discuss in a bit.)

Mainly, though, the biggest ‘pro’ of this model is that your business requirements, i.e. your attributes and the values they’re allowed to contain, can be flexible. The model allows a theoretically infinite amount of customization to suit your needs; though in practice, as Allen writes in the CodingBlocks article, you do run up against some pretty hard scalability hurdles right-quick. So in practice, you might want to consider more horizontally-scalable data stores, or (God help you) try scaling-out your SQL databases. (Spoiler-alert: big money big money!)

shut up and take my money
Millions of query-bucks…

The Bad (aka the “Not Great”)

Which brings me to the first ‘con’. Performance. If you’re modeling this in a relational DB, and you expect it to scale well, you’re probably overly optimistic. Or very small. (If the latter, great! But you don’t always want to be small, right? Right!)

Don’t get me wrong; you can make it work half-decent with good indexing and sufficient layers of abstraction (i.e. don’t write a “kitchen-sink view” that’s responsible for pivoting/piecing-together all the attributes for a product straight outta SQL). But again, is it really the right tool for the job?

Momentary digression. SQL Server, or more generally, the relational database, has long been touted as the “Swiss army knife” of IT; we’ve thrown it at so many problems of different size and shape, that we’ve almost lost track of what it’s actually very GOOD at. Hint: it’s about relationships and normalization.

Another argument against it seems to be data integrity and enforcement. I can understand that, but again, with some clever software overlay and user-guidance, this can become almost a non-issue. But remember, your developers are the ones building said software. So that effort needs to be considered.

The Ugly (to be continued…)

The biggest problem, and quite a legit one, is ‘creep’ — both scope and feature. See, the inherent flexibility in the model will almost encourage data managers to be less careful and considerate when deciding when to add an attribute or value-set, and how to govern the data-set as a whole.

creep wish i was special
No, not THAT creep. But, you’re welcome ears.

Stay tuned for more…

T-SQL Tuesday #113: Personal-Use Databases

So when I dived down the rabbit-hole of the Nested Set Model, of course I created a sample database to write & test the code against.

tsql2sday150x150It’s that time again! This month, Todd Kleinhans (b/t) asks us how we use databases in our day to day life, i.e. personal use or “outside of our work / day-job”. Actually, the question is kinda vague — because if you think about it, we all use TONS of databases in our daily lives. Your phone’s contact list, your calendar, online shopping, banking.. the list goes on. As I’ve said before, Data is everything.

But what I think he meant, and the way most of the community has interpreted it, is “How do you manage/administrate/build/work-with/develop databases in your day-to-day life outside of work?”. So we’ll go with that.

Now this may out me as “not a real DBA” or some such nonsense, but honestly.. I don’t spend much of my time creating silly playground databases. Not that anybody else’s are ‘silly’ — just look at some of the fantastic posts for this month! Such neat ideas brought to life.

Special shout-out to Kenneth Fisher, who, if you look closely at his screenshot (and it’s not even related to this post), committed the abhorrent sin of creating a database name of pure emojis — FOR SHAME sir! But also you’re awesome. ❤

Me, I’m more of a quick-n-dirty spreadsheet guy. If I need, say, an inventory of my computer parts & gadgets so I know what I can & can’t repair, what materials I have to work with as I tinker, etc.. well, I create a Google Sheet. And it serves my needs well enough. (Spoiler alert: yes, you can view that one; I shared it. But it’s fairly outdated since I moved in March and haven’t had time to re-do inventory since.. last autumn.)

But for blogging in the tech field, you gotta get your hands dirty. So when I dived down the rabbit-hole of the Nested Set Modelof course I created a sample database to write & test the code against. And there have been some additional bits & pieces for blog demos and GitHub samples.

Most of the time, I’m creating databases / entities on SQL 2016 Developer Edition. Of course by now, that’s 2 major versions ‘behind’, but since I don’t run Linux personally (yet?), and I’m not a conference speaker (yet??), I don’t feel a burning need to upgrade. It’s FAR superior to Express Edition, though, so please for the love of all that is holy, if you find yourself using Express for personal/playground use, save yourself the headache and go grab Developer.

Containers/Docker? Meh. If you want to start playing with those, definitely look at 2017 or higher. It sounds appealing in theory — “just spin it up when you need it, spin it down when you don’t!” — and I’m sure that’s great if you’re starved for resources on whatever laptop you’re working with, but if you’ve done your due diligence and set your local SQL instance to appropriate resource limitations (hello, ‘max server memory’ and file-growths!), I’ve found that its impact is quite tolerable.

But come now. Surely this isn’t just a “shameless self-promotion” post or a grumpy-old-DBA “get off my lawn” post. Right?? Right!

To you folks out there creating your own nifty little databases for personal projects, learning/development, or even hopes & dreams of building a killer app on top of it one day — you’re amazing! Keep doing what you do, and write about it, because I love reading about it. Heck, go try to create the same model in PostgreSQL or MariaDB and see how it goes. We could all use a little cross-stack exposure once in a while.

That’s all I have for this month; short & sweet. I need to finalize plans for virtualizing our main SQL instances (which is really just a migration off bare-metal & onto VMs) within the coming weeks. Yes, we’re that far behind the curve. Now get off my lawn!

=P

clint eastwood frowning angrily
I’m old and racist! But I’m still adorable for some reason!

T-SQL Tuesday #112: Cookies!!

..this analogy of “dipping into the cookie jar”. What events or accomplishments can I take sustenance from, draw strength from, during these times?

tsql2sday150x150

Hi folks. It’s been a minute. Frankly it’s been a rough 5 months. From losing my wife, to dealing with the holidays and her birthday, to moving houses again. On the career front, I’m faced with the challenge of virtualizing our core business-critical SQL instances with minimal downtime. And obviously, because of all that personal/life stuff, it’s been difficult to stay focused and productive.

So this #tsql2sday‘s topic is poignant, I suppose — this analogy of “dipping into the cookie jar”. What events or accomplishments can I take sustenance from, draw strength from, during these times?

As usual, we must thank our host, Shane O’Neill (b|t), and remind readers to check out tsqltuesday.com.

Encouragement

Back when I first took this current job, I was worried sick about doing the commute every day. One and a half to two hours in traffic each way. Even if I used toll-roads, it might save 10-15 minutes, but it was still super stressful. But my wife never stopped encouraging me, telling me it would pay off. She put up with the crazy hours, she checked on me periodically, she stayed on the phone to keep me awake sometimes. She reminded me often, when the time was right, to have the telecommute/remote-work conversation with management.

And of course, to nobody’s surprise, she was right. I now work from home 4 days a week, take a vanpool the 5th day, and am much happier and more productive (in general!), much less stressed, and a markedly better driver. More importantly, because of her unwavering support, I can still look back and draw renewed energy from those memories and from her still-present spirit that stays with me in my heart.

the wife is always right
GOTO 1

Accomplishment

One of the first big projects on my plate was upgrading & migrating the SQL 2005 instances to new hardware and SQL 2016. We didn’t use anything super fancy for the actual migration, just backup shipping, essentially. DbaTools came in very handy for copying over the logins/users without having to hash/unhash passwords. The main hiccups were around Agent Jobs and Replication. Jobs were turned off at the old server before they were enabled on the new, but due to lack of documentation and understanding of some, it became difficult to see which ones were critically needed “now” vs. which could wait a while, and which may have dependencies on other SQL instances (via linked-servers) that may or may not have moved.

And replication is just a PITA in general. Fortunately, I took this as a ‘growth opportunity’ to more fully document and understand the replication environment we were dealing with. So at the end of the project, we had a full list of replication pub-subs with their articles, a sense of how long they each took to re-initialize, and a decision-process to consult when adding-to or updating articles within them.

Continuous Learning

A similar upgrade-migration project came to fruition in 2017: the ERP system upgrade. This was a delicious combo meal of both the database instance upgrade (SQL 2008R2 to 2016) and the application & related services/middleware (Dynamics NAV 2009 to 2017). And as I blogged about, it came with a super sticky horrible side-effect that we’re still dealing with over a year later: a different collation than the rest of our DB environment.

Which reminds me, I need to do some follow-up posts to that. Briefly, the “best” band-aids so far have been thus:

  1. If only dealing with the ERP entities, without joins to other DBs, just leave collations alone. The presentation layer won’t care (whether that SSRS or a .NET app).
  2. If relating (joining) ERP entities to other DB entities, I’ll load the ERP data into temp-tables that use the other DB’s collation, then join to those temp-tables. The “conversion” is essentially ‘free on write’, meaning when we load the weird-collation data into a temp-table that’s using the normal-collation, there’s no penalty for it.

As I said, I’ll try to dive into this more in a future post. It’s been a life-saver for performance problems that cropped up as a result of the upgrade & the different collations.

But the point here is that, even though this project didn’t end up as wildly successful as we’d hoped, it’s still a success, because we learned some key lessons and were able to pivot effectively to address the problems in a reasonable way. And frankly, there was no going back anyway; it’s not like the business would have said “Oh, never mind, we’ll just stick with the old versions of everything” (even though some reticent managers would probably have enjoyed that!). So even when things seem bleak, there’s always a way around.

when the going gets tough, the tough take a coffee break
Wait a minute…

Conclusion

I’m still trying to figure out what this new chapter of my life looks like, without her. I’m still trying to be the very best DBA & BI-Dev I can, supporting the dozens of requests & projects that the business throws at us. Fortunately, with the incredible SQL Community, a wonderfully supportive family, and a fantastic team of colleagues, I remember how far I’ve come, on whose shoulders I stand, and how much promise the future holds.

Even though the one person I was meant to share it all with is gone; she still smiles down and encourages me in subtle ways.

…with love & light ❤

What, ANOTHER blog?

Head on over to natethewriter.home.blog (yes, it’s still Wordpress, because I’m cheap!)

Yes, dear readers, it’s high time I drew a clear line between the tech/professional content and the personal. Also, I realized that I need to write more, because it’s been helping me through the grieving process. And a lot of that writing is going to be very… non-technical.

Don’t get me wrong; I need to, and will, resume my database-related blogging too. At some point. I promise! But I’m trying to honor my wife’s memory and take to heart her words of encouragement to me, when she said “You know, people always told me I should be a writer. But you should try it too; you’re pretty good at it.” I feel like she’s passed along some of her soul and spirit to me. If nothing else, at least I’ll get to show the world a little glimpse into her imagination, as I gather the little bits of sparkle that she left with me in her journals and scrapbooks. And hopefully, honor that soul and spirit by exposing some of my own.

So, head on over to natethewriter.home.blog (because I wasn’t feeling terribly creative — I know, not a good start for a so-called “writer” — but I decided to stick with the pattern)… even though I have absolutely NO delusions that I’m in any sense a “real writer”, just an amateur with the desire to express thoughts and ideas in concrete form. And follow, subscribe, stalk, etc.! Hope to see you there in the comments. ❤

keira looking at camera
She’s so excited!

What Marvel Movies Do I Need to Watch?

Welcome to the first post of the new year. I’ll be keeping things a little on the lighter side for now. I’m still very into my work and learning lots of share-worthy things in the data world. But for now, movies!

I also want to take a moment to appreciate those who reached out to us after our devastating loss. Thank you for your thoughts and prayers. Please continue to remember our family as we struggle to find a sense of normalcy.

So, some of my elder moviegoers asked me the question that many people have been asking over the last year or two: “What Marvel movies do I really need to watch before Infinity War?”, or more recently, “before End Game?”. More generally, which ones are worthwhile viewing to a casual non-geek, to someone who doesn’t need to obsess over every little minutiae, someone who is not by nature a “comic book movie lover”. It’s a completely fair question, and honestly it needs more.. less nerdy answers.

Hence, this post! Go read it on the new blog!

In Loving Memory

Do not waste moments. Put forth light and love into this world. Emanate joy and kindness. And cherish each day, for our time here is never guaranteed.

may your wings be strong
-her favorite quote, from the movie Maleficent.

Dear readers. It is with a heavy heart that I write this. 2018 has seen some significant losses of meaningful lives, whether it be in #SQLFamily or elsewhere. Sadly, that toll increases again.

In October this year, my wife of 10 years passed away unexpectedly, at home. No chronically debilitating illnesses being fought off, no sudden acute onset of horrible afflictions, just here one day, gone the next. Words could not express the deep sorrow and profound shock we experienced that night, and in the days and weeks that followed.

I count myself incredibly blessed to have been loved by her. She loved so deeply and so fiercely, pouring out her generous heart to those who would take the time to truly know her. The day we said “I do!” was the happiest day of my life. She had a contagious laugh and a smile that could light up the room, even when she was hurting, sometimes more than anybody knew. It gives me some small comfort that she no longer feels any pain, but is surrounded instead by joy, light, and love.

She was a sensitive soul, leaving a little bit of herself in every life she touched. She was an empath, which means she would often know how you were feeling before you did. She felt the emotions of others, magnified in herself, and was an amazing listener, who could always be counted on for comfort and support.

The holidays were her favorite time of year, specifically Thanksgiving and Christmas. So today, on the heels of our day of thanks and the toes of the season of giving, I encourage you with two things.

First, be thankful for your loved ones, your family and friends, and most of all your children, if you are so blessed to have them. Life is so precious. Spend it with those you love, and do not waste any moments.

Second, give of yourself unto others, be it time, money, helping hands, a listening ear, or a friendly voice. Put forth light and love into this world, not darkness or hate. Emanate joy and kindness, not malice or bitterness.

And above all, cherish each day, for our time here is never guaranteed.

May you ever walk the beaches of paradise,
May you dance along the sunsets,
And sing until the end of eternity.
You are forever loved, and always in my heart.