The EAVil Cycle, Part 2

In which we discuss more about the EAV model and some of its merits and pitfalls.

Advertisements

continued from last week…

The Ugly (aka the “Wow really?!?”)

You’ll see this ‘creep’ even in product-catalog systems as mature as Amazon’s. If you search for (geeky as we are) graphics cards, and use the product attribute filters in the left pane to narrow it down, you’ll find that some correctly have their memory type (GDDR5, etc.) listed, while others may not. If you’re really unfortunate, there will be two semi-redundant attribute-sets that you’ll have to “juggle” between to really get at what you want. TVs, for example, may see both an “HDR support” (yes/no) and an “HDR type” (standard, ultra, etc.) — I’m kinda pulling those out of my arse for example’s sake, but you get the point.

Why does this happen? Because at some level, humans are still responsible for ‘tagging’ those products (for lack of better word). And as much encouragement and guidance as the ‘admin software’ may give them, they can (and do) still decide at times to side-step that guidance and say “Nope, I know better; make me this new thing!”

But isn’t that a problem with nearly all data-driven systems? Sure, of course it is. Yet with a model as flexible as EAV, the problem is intensely magnified by the fact that it’s made so easy to do — to ‘extend’.

so preoccupied with whether or not you could, you didn't stop to think if you should
It’s probably not the exact quote, for you pedants, but it’s close enough for government blog work.

And unfortunately, the biggest contributor to this problem is the lack of development-time and forethought given to the administration, or management, of the data. You see, this problem could be largely assuaged if the admin-toolset were the FIRST thought and priority in the roadmap. But so often, that thought comes LAST, if at all. So sure, your product feature tagging system looks great, it’s flexible and your customers love it. But you’re throwing tickets over the wall to your data team every time a requirement or use-case changes, or when you need to fix a data-quality problem caused by the users not knowing there was already a “Widget Type” before creating their new “Widget Kind” tag, or misspelling “Is Wierd” because English is weird and has more exceptions to the “I before E” rule than not.

Does this problem go away with a dedicated search-index or NoSQL technology like Elasticsearch or (shudder) MongoDB? Of course not! If anything, it may be worse. Maybe. But wait, those systems make it easier to de-dupe and manage redundancy & data quality, don’t they? Probably. I can’t speak from experience myself, but I’ve heard good things. Once again, it all comes down to the effort you’re willing to invest in the system. If you make data quality a priority, you’ll be happier with the experience. If you don’t, well you’re just another amateur data scientist complaining about dirty non-standardized/non-validated address fields, aren’t ya?  =P

I joke with the data scientists, of course. What they do is awesome. They just tend to often re-invent the wheel of data-cleansing/data-wrangling that we DBAs have been doing for a few decades, because they didn’t know the right questions to ask or the right place to look. We need to get better at working together WITH them, not ‘for’ or ‘against’ them.

ninja cat riding a unicorn with laser-eyes
How the data scientists see themselves…

The Why or When (aka “Is it a decent model for this?”)

The long-story-short version is, consider your business and your data. Try to plan for the future, and anticipate potential changes and growth. It’s not easy, and we never “get it right the first time”. But we can try.

When your attributes are fairly static, and you know that you can tightly control them, you might consider a more rigid model. Something with a handful of lookup tables referenced by the main product entity. This is advantageous for performance and management, at the expense of scalability and extensibility.

When you literally need to support on-the-fly extension, and you’re absolutely married to SQL (i.e. not ready to venture out into NoSQL land just yet), the EAV model may fit the bill. Aaron’s article, and the comments therein, present some fairly valid and reasonable implementation suggestions to make it a little more palatable. Just beware the date — that was written back in 2009. Before we had such things as Elasticsearch and its ilk. I’d heavily encourage the consideration of purpose-built data-stores for this sort of thing, if you have any hope of scaling-out.

Other tools in your toolbox can help with this, too. For example, consider an in-memory data-grid for super-fast reads. The vast majority of data-access to these attributes & values is going to be reading, using it to filter & slice & dice a data-set. You can pay the small performance cost (e.g. write to the underlying SQL database) on the rare occasion when a write/change needs to occur.

In Conclusion

Proving the age-old rule of “Just because you CAN, doesn’t mean you SHOULD”, the EAV model is sometimes okay and sometimes not. You need to understand your business and your data to make that call. And you need to consider the magnitude of effort that may be involved in pivoting from one model to another. Unfortunately, in many cases, that part overshadows the rest, and the show business must go on.

queen the show must go on
You’re welcome again, ears.

Still, I encourage you to at least think about it, and be ready with that knowledge of pros/cons when the time is right to discuss it with stakeholders.

Author: natethedba

I'm a SQL Server DBA, family man, and all-around computer geek.

2 thoughts on “The EAVil Cycle, Part 2”

  1. Nice illustration of the point with the graphics card thing! I think that’s something anybody who’s every been buying computer components has been annoyed by, but I’ve never seen it pointed out before.

    On the EAV side of things, I’d also say that deciding whether it’s appropriate can be an opportunity to highlight potential improvements to your development process, and the way the business (or your client) sees tech in general. If I see EAV already in place, or somebody trying to angle toward something that can be used to sidestep development or “empower the users”, I’ll point out the disadvantages of EAV and advantages of a well-maintained/constrained relational solution, but usually in order to ask the people involved why they think the “proper” dev model isn’t suitable for their needs. It could be that the dev methodology isn’t suitable anymore, was never suitable, or doesn’t exist. It could be that the previous DBA or DB developer wasn’t very good. It could be that somebody that wields power has had a bad experience interfacing with techies in the past, and doesn’t trust them. It’s a good opportunity to learn more about deeper-rooted problems and where improvements can be made to processes and collaboration.

    I admit that I do still lean toward the “kill it with fire” kneejerk though :p

    Liked by 1 person

    1. Thanks for the awesome feedback Simon! I really like your angle on it— the opportunity to discover and discuss deeper issues around the software and the people involved. Very nice. 👌

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s