DBA does not mean “database archaeologist”, even though sometimes that’s what you end up doing…
Aka “Dammit Jim, I’m a DBA, not a data researcher!” Or, as I stated on Twitter:
DBA != Database Archaeologist
Today I spent more hours than I care to admit, tracking down some obscure data from two disparate systems in an attempt to reconcile what were supposed to be matching records. Part of that is my own fault — I’m a sucker for interesting problems and edge cases, as I’ve blogged about before…
mostly just for the smug satisfaction of proving to the business that “your assumptions about how your data works are invalid“.
But mostly it’s because, the further back in time you go, the less reliable the data becomes. Especially (exponentially) when that data originates from human free-form text input.
Let’s contrive an example. We have our core business product system, WidgetMaster, which tracks Widgets we ship out by WidgetNumber. Our partner associate business runs an online widget exchange where people can buy and sell their Widgets in a sort of second-hand/after-market fashion. PartnerExchange listings are supposed to include the WidgetNumber for ease of tracking and associating data between the two systems, even though they’re officially run by different companies (or in my case, different departments of the same company — yeah, think about that for a second).
Now, ideally, theoretically, those WidgetNumbers should always match up. But unfortunately, up until late 2014, our WidgetMaster system didn’t have an API that PartnerExchange could call to obtain a widget by its number; and even if we did, they have to at some level rely on the customer (or a worker) to read and enter that WidgetNumber into the exchange listing. But wait, we started doing bar-codes back in 2010, so “most” of them are actually scanned from the bar-code, but not every customer has that capability, so there’s still a lot of hand entered data.
So we have some dirty data. Let’s complicate things a bit. Over time, those widgets can come back to WidgetMaster for update/upgrade and then ship back out. Again, WidgetNumber should remain consistent throughout that process. Now, when PartnerExchange sells certain particular widgets, sometimes they’re part of a SuperSpecialCollection. This collection spans many years, maybe even a decade or more. WidgetMaster got wind of this SuperSpecialCollection, being bought-up by Mr. HighRollerCustomer, so we started marking the incoming/outgoing records with a new property.
But it’s text.
It’s entered by the receiver, based on looking at the Widget’s buy/sell history in PartnerExchange. And yes, the HighRollerCustomer who last bought the widget is aware that it’s part of their SuperSpecialCollection, but they aren’t guaranteed to specify that when they send the widget back in to WidgetMaster for upgrade.
Do we see the problem yet?
See, about 5 years ago, there was a reorg, and the dev team for WidgetMaster completely revamped the way in which “collection membership” for incoming widgets is designated/tracked. So now it’s over in some property table. To make matters worse, PartnerExchange renamedSuperSpecialCollection to AwesomeCltn a few years ago because they were tired of typing so many letters (and apparently fans of cryptic abbreviations).
Fortunately, PartnerExchange has done a decent job of at least storing the correct WidgetType and WidgetQuality in their listings, despite WidgetNumbers being fairly sparse. But again, because over in WidgetMaster, we’re supposed to associate each WidgetNumber with the AwesomeCollection, we now have this secondary task of mapping unmatched WidgetNumbers across systems, by using Type and Quality from one side (partner) combined with Collection-membership from the other side (master), by assuming that the partner’s designation of SuperSpecial/AwesomeCollection is correct.
If your head’s not spinning yet, give yourself a round of applause. While rubbing your tummy and tapping your foot.
Needless to say, this is hard. We’ll probably get the majority of records matched (mapped?) eventually by using a couple string LIKE predicates and some clever try/pass/retry flow, but it’s tedious at best. Another bit of frustration will come up when we do a couple ad-hoc searches thru each system to attempt to apply reason and logic, i.e. find a pattern; and because we’ve already done the work, we might as well put that info into our results, even if it doesn’t show us a useful pattern by itself.
So how do we approach this? We’ll as I said, I spent what I felt was too much time on it, but essentially I did an initial “majority rules” mapping attempt (first pass), followed by a few reconciliation attempts for the remainders. Those consisted of fairly identifiable patterns with a couple outliers. With those outliers, as with the rest of the unmapped records at the end of the day, I had to tell the business, basically, “Here’s the majority of the results. You can assign a research specialist or analyst to the rest, if you feel it’s that important.”
I may have done this with slightly more attitude than necessary.
How could we improve this? The bigger geeks in the room may pipe up with “machine learning!” Ok sparky, do you have that infrastructure ready to go? No? How long for implementation? mumble mumble something about Azure mumble… Okay, sure, how about training the model so you can feed it your data? Cool, well enjoy the incredulous laugh you’ll get when you tell the manager that.
How about other tool sets? Sure, we could check out Python or R, write a C# app maybe? Well guess what, we still need to look at the data to understand what patterns (or lack thereof) there are to work with. And again, lead time. Unfamiliar tools means longer development cycles. And they’re really not guaranteed to produce any better results (more matches) at the end of the day, are they?
Because your data models and your analyses are only as good asthe data itself.
And with that, I’ll call it a day. Thanks for reading!
This is a story. And it’s longer than my usual post, so get comfy. I may be stretching my own rules, but I swear, I’ll tie it back to databases… somehow! Let’s get started.
Back in December 2016, I was in an accident in my 2011 Mazda 3. Ironically, I was driving home from filling up with gas, plus I’d just had some maintenance done the month before. These things are ironic because the car was a total loss. “Totaled”, in layman’s terms. It means the damage was such that the insurance company would rather pay off the market value of the car, than pay for the repairs. Or, put another way, it means that the cost of repairs would be within nominal range of the vehicle’s value. Short version, I’m not getting the car back. Oh, and that gas fill-up and mechanic bill? Money down the drain.
FYI, I was just fine – so the car and its safety-system did its job, protecting me from harm. Safety is an important part of choosing a car, kids… remember that.
Anyway, the car gets towed off to a local yard, I have a day or so to collect my possessions from it, and then they tow it somewhere else to the “salvage yard”, where it becomes somebody else’s property and problem. Insurance sends me the check for the value of the car. I was actually worried that it wouldn’t be enough to cover the loan balance, because I was still making payments AND I’d refinanced midway thru the original loan term, lowering the interest rate and payment but also extending the term. But, thankfully, the car’s value was above the loan’s balance, so I was able to fully pay it off and still pocket some cash.
Unfortunately, I didn’t have “loss of use coverage” on the policy – meaning, no free rental car. So I pay for it, for a little while. Fortunately, around the same time, my parents were getting ready to dump their old 2000 Honda Accord for a newer one, so we started talking and they offered to us as a gift. (Did I ever mention how awesome my parents are? They are!) Excellent.
Actually, this is the same car that I used to drive around in high school, so it’s got some memories.
Including, of all things, my very first accident! You’ll see why this is ironic in a few paragraphs.
Now, this car is what we call a “beater”. It’s 17 years old, it’s been through the wringer, it’s got 190k miles on it; but hey, it’s a freakin’ Honda. It’ll last another 50k at least, if maintained properly. And it has been – faithful oil changes, scheduled maintenance and beyond. But it’s not the safest vehicle on the road; the e-brake light comes on sporadically, and the airbag warning light is always on, so we don’t actually know if the airbags (particularly the passenger side) will work. So we need to start shopping for a new vehicle, at least for the wife.
I won’t go into car-shopping here, it’s a pain the arse unless you use courtesy buying services like those offered thru your credit-union or your some kind of “club membership” or whatever. Long story short, we got a 2017 Hyundai Elantra, in black, not brand-new but used with only 3k miles on it (apparently they had buyer’s remorse).
I will digress just for a second about black cars. They look pretty slick, even though they do show dirt a bit more than gray/silver (which is what the Mazda was). But you know what makes them look super-duper slick? Those “legacy” CA gold-on-black license plates. I convinced myself that I had to get those. Until I went to the DMV and found out that they’re $40 initially plus an extra $50/year on your registration fees. Jesus H… I get that they need to make money, but that’s ridiculous. Srsly. Ding the people that want those silly vanity plates, because I understand it adds a lot of processing/tracking overhead and makes the data (see? I told you I’d tie it back!) more complex. But this is the same old metal made by the same old prison inmates with a different coat of paint. Don’t pretend it actually costs anything extra for you to make them and pass them out. Anyway. Back to the story.
So we get the car, the wife’s driving it home. She’s heat-sensitive, and it’s been a pretty long, warm day. She ends up passing out for a second and veering off the road to the right shoulder, which is a small dirt embankment into some bushes and trees. Fortunately enough, she realizes what is happening and she’s able to bring the car to a gentle stop without actually hitting anything. So the car’s only real damage is some scrapes & bruises on the front end, a bit of scratching on the sides, and some dings to the under-carriage-panel. Now, me being the savvy consumer that I am, I’ve already added it to our insurance policy and added rental coverage. The new car goes into the shop before we’ve even had it a day, but we get a free rental while it’s there. (Ford Fusion – I like it alright, but the wife hates it; she has a bit of an anti-Ford bias.) Insurance covers about 1.5k damage for the bodywork, it gets done, we get it back in less than 2 weeks. Yay.
Alright, here’s where it gets fun interesting.
That was all back in December/January.
March rolls around, and I’m driving the Honda home from work. There’s a sudden pile-up of stopping cars in my lane and I can’t stop in time, so I run into the SUV in front of me. Fairly low speed, nobody panics, we pull over and start exchanging info and pictures. Now, my bumper is nearly detached, and my hood is quite scrunched in at the point of impact. This is because I hit his tow hitch, which stuck out from the rest of his rear body quite a bit. So even though he literally has a 1-inch scratch on his bumper, I’m looking at significant damage. But it seems mostly superficial, so I figure, well, I might not even need insurance, and he certainly doesn’t care enough to report it unless I do, so he leaves it up to me.
He helps me rope-up the bumper so it doesn’t fall off (he was such a nice guy, no joke!), and I start driving the rest of the way home. Quite a distance, mind you (I have a 60 mile total commute). After a little while, I start seeing smoke coming from under the hood. Fortunately it’s white, not black, so I know I’m not in terribly immediate danger. But I pull off to a gas station and take a look. Well, I can’t actually open the hood due to the scrunchy-ness, but I peer inside and see that there’s a significant bit of frame damage, and the radiator looks hurt. Sure enough, it would turn out that that was the biggest problem – the radiator (and compressor) would need to be replaced, and the frame around it needed repairing/re-welding.
This is not a small job. I take it to a body shop first, but as they look inside and see what I saw, they know that it’s beyond their scope, so they send me next-door to a full-service mechanic & repair shop. Next day, he gives me the estimate: $2.7k. Now, about this time, I’m talking with the insurance reps. I know they’re going to want to total this car – it’s KBB value is literally just over $2k, and these repairs are significantly more than that. What I was trying to ask them, and never got a straight answer, was whether we could file the claim for a lower amount, by asking for the mechanical repairs only. Remember, this car is a “beater”. We don’t really care how it looks, we just need it to run. And the shop was kind enough to provide that “bare-bones” estimate as well – only about $750.
But then my insurance adjuster did two things that were very insightful & much appreciated.
I have to give a shout-out to Safeco here, because throughout all of this, they’ve been immensely helpful and easy to deal with. (Even though I mentioned not answering my question in the paragraph above, as you’ll see, that was really my own fault for not understanding the process, and it was a moot point anyway!) So if you’re in the market for a new insurance policy, definitely check ‘em out.
First, because this shop was not an official “authorized partner”, she couldn’t accept their estimates as gospel; but, she could offer this newer “pilot” program whereby any shop (or even the customer) could submit pictures of the vehicle and the damage, and, provided enough detail and the right angles, a 3rd party estimator could assess the damage and estimate the cost. Great!
Second, she heard me out as I explained the concern with totaling the car, and understood that I really wanted to keep the car after simply getting it mechanically sound. But, she clarified, because I had collision coverage on this car, they (the insurance company) literally “owed me” the full cost of those repairs or the vehicle value, whichever is lower. So in fact, I would be doing myself a disservice and actually losing money if I tried to simply file the claim for the lower “bare-bones” amount, just to avoid the total-loss.
Instead, she explained, what you can do is keep the vehicle, even after it’s been declared “totaled“.
There’s a process and paperwork to this, and it involves the DMV, obviously. But because the insurance policy will still pay me the value of the vehicle, I should have more than enough to get the minimum repairs done and pocket the rest. Yay!
Now, the process. The CA DMV has done a fairly decent job of documenting this, but it’s still unclear (at least to me) what the order of operations is. There are 5 things you need:
Salvage title (which is different than the regular title, aka pinkslip)
DMV form REG 343, which you fill out yourself
REG 488c, which you also fill out yourself
Owner retention of salvage vehicle
REG 481, which your insurance company completes & sends to the DMV
Brake & light inspection (to make sure it meets road safety standards)
Certificates are printed & given to you by the inspecting shop
Full vehicle inspection (again, safety & compliance)
REG 31, which is completed by DMV personnel only
Number 4 can be done by many authorized 3rd-party shops, most of which also do smog tests and such things, so they’re not hard to find. The rest are DMV forms, as noted above. (#5 can be done either by the DMV or by CHP; but, CHP has quite a narrow list of “accepted” vehicles which they’ll inspect for this purpose, and honestly their appointment “system” for trying to get them done is horrendous, so it’s easiest to let the DMV do it.) But again, what’s the order in which I should do these things? Well, let me tell you!
First, you get that payout check from your insurance, and you get the repairs done. Then you take the car to a brake/light inspection place (#4 above), and get that “certificate” (much like a smog certificate, it’s an “official” record that says your vehicle passed this test). Actually, if the vehicle hasn’t been smog-checked recently, you probably need that too. Mine was just done in 2016 so it wasn’t necessary.
Ooh! Another database tie-in. Okay, we all know a car’s VIN is like the primary key of the DMV’s vehicle database, right? Plate#s you can change, but the VIN is etched in stone steel. But they’re largely sequential – so two 2000 Honda Accords are going to have mostly the same characters in their VINs, up to the last, say, 2-6 numbers (ish.. I’m nowhere near knowledgeable enough about the system, I’m just guessing based on my observation of what happened to me). So when the paperwork comes back from the insurance, it ends up with the wrong VIN, off by 3 #s at the end. But I don’t realize this until I check with the DMV as I’m filing the accident report. Also, you can use online services to look up a VIN and find the basic info about it, but again, because I had such similar VINs (my correct one, and the insurance’s one from the papers), both turned up the same descriptions, down to the body style and trim level (4 door sedan, LX, if you’re curious). The only way we actually found the mistake was that the DMV was looking up “ownership” info based on the VIN, and when the agent read me the name on file, I was like “whodat?”, since it wasn’t me or my father, and then I went back to my pinkslip and checked it there, as well as on the car’s door-panel.
The lesson here is, always double-check your VIN when filing paperwork, especially with the DMV. Moving on.
Before you go further, you need to actually make sure that the insurance and/or the salvage yard has officially notified the DMV of the vehicle being a “total loss”. (See #3 in the list.) In my case, they hadn’t – it had only been a month (between the actual payout and the first time I went to the DMV). So I have to check again before I go back.
But, since I was there, I made the DMV agent answer all my questions and specify exactly what I needed to do to complete this process, and the order in which to do it. Which is why I’m now writing this and sharing with you!
Once that notification is done, the DMV will have record of the vehicle being a “total loss”, or “salvage”. Then you can make a new DMV appointment, go in, and get #5 and #1-2 done all at the same time, in that order. I.e., go to the “inspection” or “inspector” side first, have them do the inspection and fill out the form (REG 31). Then go to the appointment line and take all your paperwork to the agent that calls you. So that’s your “inspection-passed” form (REG 31), your salvage title form (REG 343), your salvage certificate form (REG 488c), and your brake & light certificates. If you have a copy of the insurance co’s REG 481, might as well bring that too! You also need your license plates – you have to “surrender” them, which means turn them in and get new ones (not that same day, obviously – I think they still mail them to the DMV and you have to go pick them up… but I’ll find out soon).
Finally, to add a little icing on this crap-cake. I was driving the Hyundai to work, literally the next day, and I got rear-ended by another driver who wasn’t paying attention at a red-light. Again, super low speed, minor damage, but, another visit to the body shop for that poor black Elantra, and another week with a rental car. (Hyundai Santa Fe this time, which is actually quite nice, and if we need a small/mid SUV in the future, I’d definitely consider it; but due to my commute, we swapped for another Ford Fusion, this time the hybrid model, which again, I enjoyed, but the wife did not. Hey, you win some, you lose some.)
So that’s the story of how we totaled two cars (and damaged one car twice) in less than 4 months.
And that’s the reason I’m now taking a van-pool at least 2 days a week.
I’d always been a fairly safe & cautious driver, but I’ll admit, this long commute had turned me into a bit of a road-rager. Impatient would be the polite term. After all this, I’m back to my old cautious slow & steady ways… for the most part. I still get little flashes of panic when I go by the intersection where the Mazda wreck happened, and I’m always reminding the wife to stay cool and drink her water. She’s never had that happen before, and never felt like it since, so I’m sure it was a one-time fluke, but still.. the DMV wants her to re-test to get her license back, even after her doctors cleared her to drive. That’s a whole other topic, for another time. I will note that none of these incidents were due to cell-phone use, so at least we’re not guilty of that particular vice.