Finding Trigger Dependencies

Preamble

In SQL, TRIGGERs are often a subject of contention.  Some data professionals have strong opinions about them, especially on the “they’re bad!” side.  But like anything, they’re a specific kind of tool that has a place in a specific problem.  One of the common scenarios for their usage is what we call an “Audit trail” — a running log of any & all changes made to a table or data-set (or even a database schema, as RedGate DLM Dashboard employs).

Yes, there are newer product features built around this very requirement, such as Change Data Capture (CDC) and Temporal Tables, but they require particular versions and editions of the product, and not every environment is going to be there.  So we need something native and compatible to the lowest common/supportable denominator.

Like most areas of the database environment, triggers are often poorly documented.  So what happens when we need to review all the triggers in a database and show what tables they’re on & what columns they’re “tracking” (sticking with the audit-trail example)?  Or, more generally, you could say, we want to find all table & column dependencies of our triggers.  Well sure, we could use the SSMS Object Explorer GUI — find the table, the triggers, right-click and “Show Dependencies”.. but that’s not good enough, for two reasons: A) it doesn’t get down to the column level, and B) it’s not script-able.

System catalogs and DMVs to the rescue!

dmv-line-of-people
No, not THAT DMV…

Disclaimer: this is heavy on MS Docs links.  There was a StackOverflow Q/A that helped point me in that direction, but sadly I lost the tab and didn’t save the link.

The Goals

In this script, I mostly focus on the UPDATE triggers, because they’re the most heavy-handed in terms of auditing.  Whereas with a typical INSERT or DELETE, the triggered audit action is basically “Hey, a row was inserted/deleted, here’s its primary key” — and sure you can get fancier and log what all the values were in the entire row — those are fairly straightforward.  It’s basically “all or nothing”.  In contrast, with an UPDATE audit, we want to know just what was updated, and from what (old value) to what (new value).

So the goal of documenting the dependencies is to see which columns we care about — because, inevitably, there are some columns that we either “don’t care about” for auditing, or some that just plain don’t change (by their nature — they’re an identity or a primary key, or controlled via other means).  In turn, this helps us understand how to reproduce these audit mechanisms when a better technology comes along — such as CDC or temporal tables, as alluded to earlier.

cdc-centers-for-disease-control
No, not THAT CDC… yeesh.

The Walkthru

We start with the system catalog views sys.tables and sys.triggers.  This establishes our primary relationship — a trigger ‘lives’ on (is attached to) a single table.  Then we immediately step into murky water.  A trigger, like other programmable objects, has a definition script, which you can view in a myriad of ways — Object Explorer, sys.sp_helptext, sys.sql_modules, sys.syscomments (deprecated), or OBJECT_DEFINITION(OBJECT_ID).  So if we really wanted to go spelunking, we could just read the trigger’s definition and see what columns it deals with.  But after about 2 or 3 of those, our eyes are going to glaze over and our fingers will start to twitch.  We need something better, something that can handle hundreds of triggers and output a nice unified result!

Oh wait, what’s this?  Object Explorer has a “View Dependencies” function.. aw, bummer, it doesn’t show columns.  Just tables.  As we know, most of the GUI functionality is driven by existing system meta-data/catalogs/dmv’s, so there’s definitely something that keeps track of these dependencies.  Let’s hazard a guess… sys.sp_depends or sys.sql_expression_dependencies?  The former is deprecated, the latter doesn’t give us a way to get to columns either.  But thankfully, on the docs, the deprecated note points us toward the correct path: sys.dm_sql_referenced_entities and sys.dm_sql_referencing_entities.

Now we’re cooking with gas!  So how do I use these suckers?  Well, there’s a couple intermediary joins I need to go thru, and in fact these are functions, not views, so we need to know the proper arguments.  But fear not, I have done the hard work for you!  Check out the code and let me know what you think.

But wait, there’s more!  There’s a slight monkey-wrench here.  You see, in the column dependencies, we’re getting both the “audited” columns and the “key” columns.  But we’d like those to be distinguished from each other.  When we document and/or replicate the auditing using newer technology, the primary keys aren’t what we need to track (they’re in a sense “given”, known) — we need to know which columns are “audit-able” and which aren’t.  Hence the two queries with a UNION ALL.

union-vs-confederate
That is ALSO inappropriate. Good lord, who’s choosing these pictures today?

Conclusion

So there you have it, documenting some audit-trail triggers.  Or triggers in general.  Complete with the tables & columns on which they depend.  Enjoy!  ‘Til next time folks.

Advertisements

VS SSRS Project Gotchas

I now present to you, a recent adventure in building & configuring a SSRS 2016 / VS2015 project to replace our aging SSRS 2008R2 / VS2008 setup.  In trying to make things ‘better’, I wanted to centralize the storage of the Data Sources.  I found a tip on StackOverflow with a neat (albeit hacky) idea on how to do this.  So I started diving into it.  Here’s how that went.

The Setup

I have a bunch of published reports on a ‘real’ report server (SSRS instance), which were created off-the-cuff using Report Builder 3.0 (no source-control connection or anything).  So the first thing I had to do was go and download each RDL (thanks, Microsoft… ugh!), to my local VS2015 project folder, so I could bring them into my SSRS project.  I didn’t bother trying to download/copy the Data Sources (they’re basically just connection-strings, and anyway you can’t download them from Report Manager), so I settled for re-creating those in VS.

Then it was time to set up the solution & projects.  I prefer to organize my reports in context-relevant folders, like “Operations”, “Marketing”, etc.  In order to do this sensibly, within the VS solution, you need to create a Project for each major fold you’ll have.  Makes sense; this is how the old solution was constructed too.  Here’s where I tried to improve things:  I created a “Datasources” project, which would house just the shared Data Sources.  Here’s where I created my new data-sources, setting them up with saved SQL auth (login & pwd).  Sure, maybe not the “best practice” but it keeps things simple — my SSRS user only ever has read-only db access, and the pwd is managed well enough to keep auditors happy.

Thus, my plan (in following the SO tip) was to have all the other projects’ Data Sources be pointers to these shared Data Source files (RDS​’s).  Sounds good in theory, right?

what could possibly go wrong
ooh, sparkly!

Starting to Implement

Well.  I started small with just one project, one report, and one data source.  I went into my Nate test project, did “add existing item” into the Reports folder, and browsed to pick the rdl that I had just downloaded from my live SSRS server.  I then went to my Datasources project, copied the rds that I needed, pasted it into Nate test project’s Shared Data Sources folder.

Now at this point there are 2 copies of this file in my solution folder.  We don’t want that.  So I opened up trusty ol’ Notepad++ on the Nate test.rptproj file, and edited the XML node under <DataSources><ProjectItem><Name>My-Source.rds</Name><FullPath>My-Source.rds</FullPath></ProjectItem></DataSources> , changing the FullPath node value to <FullPath>..\Datasources\My-Source.rds</FullPath>.  I then deleted the physical copy of the file at \MySolution\Nate test\My-Source.rds , so that only the shared one physically remains in the filesystem (\MySolution\Datasources\My-Source.rds).

Another way to accomplish this, you may read, is to right-click Shared Data Sources within your project, e.g. Nate test, and say “Add existing item..”, and go select the global shared RDL from your master Datasources project-folder.  However, this still results in a physical copy of the file within Nate test folder!  So in either case you’ll end up doing some editing & deletion.

With me so far?  Okay, here’s where it gets interesting.  I could successfully build and deploy this report to my SSRS server, verifying that it showed my “new thing” (I added a text-box that simply said “This is the new hotness!”).  But I could not preview the report in VS!  Sad panda.  And of course, the most vague generic error message in the world:

An error occurred during local report processing
An error occurred during report processing

sarcasm-smiley
How specific and informative!

More Googling led to a somewhat random tip in the vein of “Go to the report properties, Open the report’s Data Source, and re-select the Shared Data Source from the dropdown”.  Combine with “Edit the Shared Data Source, ensure your saved login/pwd is filled in, and Save it”.  Presto!  Previewing now worked.

But why is this?  Well, as it turns out, it’s not all that surprising.  The RDLs are stored without the saved login/pwd, because of course they’re just plain-text XML files.  So to “protect you”, MSFT doesn’t store your SQL auth credentials, even if (and now matter how many times) you click “Save my password” on the connection-properties window (connection-string builder).

Great.  Well, I did that once, for Nate test project.  Let’s see what happens with another project (aka report folder), say Test again.  So I add an existing RDL, I copy-paste the Data Source from the global shared Datasources project, manually edit the rptproj in Notepad++, delete the copied rdl, etc.  I then try to preview the report… and behold, the same maddeningly vague error message!

“Well duh!” you may think to yourself, “the credentials still aren’t stored anywhere un-encrypted!”.  Yes, dear read, you are correct.  But do I want to right-click on this project’s Data Sources, edit it, fill in the credentials, save, rinse, repeat, for every single project?!  A resounding NO!

nobody-got-time-old-fashioned
ANGTFT

So what should I do?  I could go type in the credentials to the actual connection-strings contained within the ‘master’ rds files.  Of course then they’re in plain-text for all to see… but wait, do I care?  Another no.  Let’s go edit some connection-strings!

As a reminder, here’s what they will look like:

<ConnectString>Data Source=MyServer;Initial Catalog=MyReportDB;User ID=MyReportUser;Password=ThisIs@nAw3s0meP@ssw0rd!;</ConnectString>

Why did MSFT choose to name the XML node ConnectString in their rds schema instead of ConnectionString?  Don’t ask me!

The Gotchas

Here’s the first gotcha:  My reports (RDLs) need to be re-pointed at the correct Shared Datasource, because currently, as they were downloaded from my ‘real’ SSRS server — which had my desired top-level folder layout of “Datasources”, “Nate test”, “Marketing”, etc. — their Data Sources xml node points at \Datasources\MyDatasource.rdl.  This was the correct path on the SSRS server and still will be the correct path when all is said & done.  BUT, to support local debugging/previewing, the build output has to make a copy of the rds in the Project’s bin\Debug folder, which it will happily do, and then just as happily proclaim Could not find a part of the path 'C:\Users\Nate\Documents\Visual Studio 2015\Projects\MyReportSolution\Nate test\bin\Debug\Datasources\MyDatasource.rdl'. — because DUH, the build system didn’t make you a ‘Datasources’ sub-folder under the Debug folder, why would it!?  So by either manually editing the rdl file to remove the ​​\Datasources\ path from the node, or by clicking on the Report’s Data Sources node in the Report Data pane and re-pointing it at the local Shard Data Source (which again, is itself a pointer to the globally shared Datasources!), you can fix this issue.

In the rdl XML, that looks like this (using strikethru to show what you remove):

<DataSources>
<DataSource Name=”MyReport_DataSource”>
<DataSourceReference>\Datasources\MyDatasource</DataSourceReference>
<rd:SecurityType>None</rd:SecurityType>
<rd:DataSourceID>gobbledegook-guid</rd:DataSourceID>
</DataSource>
</DataSources>

And there’s another gotcha:  After we do all this, if we open the rds from within VS solution-explorer, it still won’t “think” it saved the authentication!  But the connection-string will be clearly visible with the plain password.  Here’s a couple pictures of what that looks like.

shared-datasource-properties-ssrs-vs2015
Data Source properties
datasource-connection-properties-ssrs-vs2015
Connection properties of the Data Source

Oh, by the way, did I mention how many times VS crashed on me while doing these gymnastics?  At least a few.

Lessons Learned

So, what have we learned?  Well, for one, this is a crappy situation born of poor in-product support.  I should be able to configure Solution-level shared Data Sources, use them in as many Projects (within said Solution) as I want, and have VS configuration management support them; bonus points for doing so with saved & encrypted credentials.  Ideally, when we check this into source-control, we’d check in the “DEV” environment flavor connection-configs.  Then, when the reports get deployed to the “PROD” SSRS server, the same globally shared Data Sources are already present (and they don’t get over-written, thankfully by default!), configured by the DBA with prod credentials, and nobody in the development pipeline needs to know said credentials.  Yay?

But alas.  We live in an imperfect world.  I also live in a world where I’m both the Report Developer, the Report Manager, and the Production DBA.  So guess what I get to do?  That’s right, not care about the connection-string security!

Okay, yes, I care a little.  Depending on my mood.  But not enough to turn my whole dang world upside-down trying to design and implement a robust multi-tiered-environment solution for one measly SSRS repository.  Kudos to you if you do — you’re a better man than I.

Or “better woman”.  But I’m not a woman.  So I guess the correct phrasing would be “You’re a better woman than I am a man.”  No, that’s not right.  “You’re a better person than I am?”  Sure, I guess that works.

In Review

To recap:

  1. Create the solution & projects
  2. Create the global shared Datasources project, and create your Data Sources (RDS‘s)
  3. Import your Reports (RDL‘s) to each requisite report-project
  4. Manually edit your rptproj to point the data-sources at the central shared path
  5. Manually edit your RDL‘s to remove previously established paths to the data-sources
  6. Configure your rptproj‘s to output to the correct folders on the report server — this should be done by default, i.e. VS has the “intelligence” to guess these for you, but just double-check.
  7. Build, Preview, and Deploy!
  8. Commit it all to source-control and let the collaboration begin.

And now to go automate all this with PowerShell… right after this beer.

Anyway.  Until next time, folks!  If I’ve made any mistakes, missteps, or otherwise offending your technical sensibilities, and/or if you know of a better way to accomplish what I’m after, I’d love to hear from you in the comments!  =)

TSQL Tuesday #96: Good Influences

This month’s invitation is brought to you by Ewald Cress (blog, twitter), who I already like based on his tagline —

finds joy in minutiae..

Yes, my friend, don’t we all.

elaine-curious-why-didnt-use-exclamation-point
I can’t spend the rest of my life coming into this stinking apartment every 10 minutes to pore of the excruciating minutiae of every single daily event!

The topic at hand is fairly non-technical, but still important: folks who have made a positive contribution to your career or professional development.  So it’s time for a shout-out!  About a year ago, I wrote about my first major career move.  There were several great influences in my first job, from the developers that taught me how to code, to the DBA who taught me how to keep cool & calm in the face of outages, to the boss who taught me the importance of time management and breadth of knowledge.

Post-mortem

Since I am way too late in posting this, and I don’t feel like waxing poetic, I’ll just say a general “thank you” to all those who’ve helped me along in my career so far, with special acknowledgement to my former boss, my current boss, the SQL family, and my own family.  Happy belated Thanksgiving and have a safe & pleasant holiday season!  I’ll have a real post again quite soon, diving back into the tech stuff.

food-coma-happy-belated-thanksgiving
hope they don’t mind me borrowing their image… =D

Totaling Cars for Fun & Profit Part 2

In a previous (and very long-winded) post, I talked about my experience with wrecking a couple cars.

I don’t recommend it.

But, I did promise a follow-up, in which I provide excruciating detail on how we “retain a salvage vehicle” for further use.  Or, in layman’s terms, “Dude, I wanna keep my car!“.

Let’s go to the DMV!

Said no one, ever.

A review of the steps and the corresponding DMV forms.

  1. Make sure you have your title, aka “pink slip”.
  2. Get your brake & light inspections done at a certified auto shop — use a site like brakeandlightinspectionlocation or dmv.com to find one.
  3. Wait for the insurance co. to send your copy of REG 481, “Salvage Vehicle Notice of Retention by Owner”.  They submit this to the DMV for you as well — but it helps to have a copy on-hand when you go in.
  4. Get form REG 343, “Application for Title or Registration”.  Fill out sections 1, 2, 4, and 9 (at least; others if applicable).
  5. Get form REG 488c, “Application for Salvage Certificate or Nonrepairable Vehicle Certificate”.  Fill out section 1 with your info (applicant) & your insurance co’s info.’
  6. Make the DMV appointment.  Bring all of the above.  The receptionist will be impressed that you’ve made it this far.  =)
    • Technically, the only things you actually need are the title & inspection certs.  The DMV receptionist can give you all the rest, assuming they’ve gotten the insurance notice (481) on file.  As I said, it doesn’t hurt to bring a copy.  The receptionist can also help you if you’re unsure of what sections to fill on the forms.

  7. The receptionist will give you REG 156 for your license plate exchange.  You can just fill this out while you wait for the vehicle inspection, or to be seen by the next agent.
  8. They’ll do the vehicle inspection, and the inspector will fill out REG 31.
  9. With all these papers in hand, you’re finally ready to perform the transaction!  You’ll pay the salvage title fee and the inspection fee, exchange your plates for new ones, and get a new registration card & stickers.
  10. Congratulations, you now own your P.O.S. / clunker / beater / whatever term of endearment you choose to call your beat-up-yet-still-running car!

Here are some fun sample pictures of the paperwork.

As it turned out, some of the forms that I’d filled out ahead of time were completely unnecessary, while others were redundant or replaced.  The thing that took the longest was waiting for the DMV to be notified that the vehicle was a salvage; apparently they’re a bit backlogged.

you don't say?
Shocking!

Here’s another little bit-o’-fun.  The front license plate on the Honda (remember, I said part of the process is giving the plates over to the DMV in exchange for new ones?) is a biatch to remove without proper tools.  I borrowed a standard pair of pliers from the nice young man behind the desk and struggled out there with the hex-nuts for nearly 15 minutes before he came out and said “Dude, don’t worry about it, we’ll call it destroyed”.  FYI, the proper tool is a socket set with both SAE & metric, somewhere between 3/8 inch and 11mm.  Apparently whoever installed this plate couldn’t decide between the two measurements systems so he/she used some of each.

ANYWAY.

end rant

Keeping your salvage vehicle does cost a bit, and is a small hassle.  But in the end, it can be worth the trouble, IF:

  • You are able to get it repaired for a small portion of the total-loss offer (what your insurance pays you)
  • You don’t care about how it looks (because that’s usually what makes the repair job much cheaper — not caring about the body work!)
  • You don’t ever plan on selling it again (because that’s what the DMV make sure of when they register it as a salvage)

Thanks for reading, and drive safe!

safety-first

TSQL Tuesday 95: Big Data

This month’s party brought to you by Mr. Hammer (b|t).

mc-hammer
No, not THAT one…

I apologize in advance for all the hammertime memes.  It was just too good to pass up.  Surely he must be used to this.  Or at least not surprised by it.  =D

So, Big Data.  What is it?  Well, in simple terms, it’s the realization and acceptance of the fact that data is multi-model, multi-faceted, multi-sourced, and constantly growing.  It’s the fact that the traditional RDBMS is no longer the be-all end-all source of truth and valuable information.  It’s part of a larger ecosystem involving JSON document stores, CSV files, streaming volatile bits of data coming from random devices and user activity that loses its meaning and potential impact almost as quickly as it can be gathered and sifted and stored.

But what do we actually get out of it?  As a small-medium enterprise NOT in the software business, I have to say, not as much as the hype would have us believe.  And look, I’m not so jaded and crusty that I refuse to adapt new tech.  I Just haven’t seen a meaningful transformative business use-case for it.  Sure, we have Google Analytics telling us how our websites are doing, and someone in marketing knows something about trending our social media traffic.  Does it really help us make more money?  Heck if I know.

cease thy actions, my timepiece has indicated the necessity of mallets
Old-timey colonials can even dig it…

Here’s what I’d like to see from the thought leaders.  Give me something I can chew on — a real-world, non-hypothetical, non-frivolous, impactful use-case for adopting and implementing something like Hadoop/Spark or Azure Data Lake.  Show me how my business can realistically journey down the path of predictive analytics and what it’s going to take from our Devs, IT staff, and management to actually get there.

Because they don’t get it yet.  I have managers still worrying about how much we’re spending on a dinky little flash storage array to support the growing needs of our on-prem converged infrastructure stack.  Meanwhile the AWS bill continues to baffle, and Devs want to play with Docker and Lambda.  But we can’t seem to convince the higher-ups that they’re short-staffed on the internal-apps team, even after a minor version upgrade takes 4 hours of Ops time and half a dozen end-users doing post-mortem testing just to be sure we didn’t break anything unexpected.

I’m not here to complain.  Really.  I do want to see something amazing, something inspiring, something that shows me what Big Data truly brings to the table.  And sure, I’ve see the vendor demos; they’re all just a bit outlandish, no?  I mean, they look really cool, sure — who doesn’t want to see a chord diagram of who’s killed who is GoT? — but does that really help my business improve sales and productivity?

My point is, there’s a gap.  A chasm of misunderstanding and mis-matched expectations between what management thinks Big Data is/means, and what it takes to actually implement.  They see the pretty pictures and the fancy demos, but they don’t see the toil and sweat (or at least, in the cloud, gobs of cash) that go into building & operating the underpinnings and pipelines that drive those nice graphics.  Not to mention the fundamental issues of data quality and governance.

continue not, time for hammer it is
OK OK, last one, I swear…

So do us a favor, Big Data pundits.  Show us something real, something that “the little guy” can use to up his/her game in the market.  Something that makes a positive impact on small non-startup non-software businesses with understaffed IT & Dev teams.  But more importantly, stop glossing over the effort and resources that it takes to “do Big Data right“.  Managers and executives need to understand that it’s not magic.  And IT practitioners need to understand that it’s actually worth-while.  Because I believe you — really — that the payoff in the end is there, and is good.  But you need to convince the whole stack.


PS: I know this is a fully day late for T-SQL Tuesday, and as such, I wasn’t going to post a ping-back in the comments of the invite, but then I saw there were only 8 others, so I felt it would benefit the event if I did add my late contribution.  I’ll tweet with a modified hash-tag instead of the standard #tsql2sday, to reflect my late-ness.  Hopefully that’s a fair compromise to the community & the event’s intentions.  =)

SQL Server Performance Troubleshooting Free Scripts and Tools List

Originally posted on David Peter Hansen:
Back in the days, I used to collect a lot of different scripts, tools, and other goodies for troubleshooting SQL Server performance issues. These days, however, I tend to use what is publicly and freely available (as well as some internal stuff), and keep a list of those in my…

I don’t normally reblog.  But when I do, it’s something awesome. =D

David Peter Hansen

Back in the days, I used to collect a lot of different scripts, tools, and other goodies for troubleshooting SQL Server performance issues. These days, however, I tend to use what is publicly and freely available (as well as some internal stuff), and keep a list of those in my head.

I’ve meant to write that list down for a while, and today Chrissy asked:

So here it is…

Disclaimer: While I do work as a Premier Field Engineer for Microsoft, this is my list – this is not an official list from my employer. Others likely have a list that differs from mine.

Free scripts and tools from Microsoft
These are scripts and tools provided…

View original post 892 more words

Dirty Laundry

It’s time for a more thought-y, less tech-y post.  Which is mostly my excuse for not wanting to write a bunch of code at the moment.  But that’s how I started this blog, with mostly opinion pieces, trying to offer some critical thinking on how DBAs and Developers work together.  So y’all better like it!

Today’s title is brought to you by Don Henley’s tune of the same name, which is now stuck in my head, thankyouverymuch.

dirty laundry goes in a basket not in a database
Paint.net is my friend… =D

This is about data quality.  When you have “dirty data”, just like dirty laundry, and you let it sit unattended, it starts to smell.  In software, this means the “badness” seeps into other areas of the environment, affecting systems and business processes that should otherwise function smoothly.

code smell is a surface indication that usually corresponds to a deeper problem in the system.

-Martin Fowler

And, more aptly:

Data quality is corporate America’s dirty little secret.

-Paul Gillen

But what is dirty data?  Generally, it’s anything that doesn’t quite fit the ideal data model — that perfect vision of how all the bits of information in the system fit together, the shape of each data entity and how they relate to each other.  Mostly, dirty data is what happens when you allow users to type things into text-boxes, and you write those text-box contents straight into the database without any layers of validation or cleansing.  (Coincidentally, that’s also how SQL injection happens, but most of us have been scared-straight by enough years of security bloggers hammering at our thick skulls — and our favorite XKCD — that we at least sanitize our inputs before dumping them to an INSERT statement.)

Let me take a recent example from my experience.  We have an ERP system that doubles as our CRM system (which is already a pair of bad idea jeans).  How do you think customer information gets into the database?  Customer Service Reps, typing stuff.  Usually by copying from a paper form.  Or the customers themselves, using an online form.  But guess what doesn’t happen in either case?  If you said “USPS address validation“, give yourself a hand!

joker give yourself a clap
Oh goooood for youuuuuu…. </Christian Bale>

Now, being that this system is our “source of truth” for customer info, it stands to reason that lots of other business functions & processes depend on it.  For example, let’s say we send a promotional calendar to our customers of a certain “subscription level” on a yearly basis.  We’re not in the publishing business, so we contract this out to another company.  But guess what they need from us in order to complete the job and mail out those calendars?  Addresses!  So what happens when there’s a bad address in our database?  A calendar gets returned, wasted cost and materials.  Multiply that by a couple thousand and you start to turn a few heads in the C-suite.

Later, around the Marketing table, someone has a brilliant idea that they need to run a mail-merge to send out a gift-package to the top 100 customers.  So they ask the DBA for a list of said customers.  “Sure!  Here ya go, here’s a report.”  And then the complaints start coming in.

“These customers aren’t active anymore.”

Then tell your CS reps to mark them as inactive in the system.  But no, we don’t do that, we just write “inactive” in the FirstName field.

“These ones are employees.”

Fine, figure out a special indicator to add for that, so I can exclude them from the report.  But no, of course, we can’t do that either; we just put “deactivated” in the FirstName field.

“This guys is dead.”

Yeah, not even kidding.  Apparently the powers-that-be decided to keep his info in the system, but type in “deceased” to the “Address 2” line (in the US, this is customarily the apartment/suite/unit number).

he's dead jim
Let’s beam him back up but write “deceased” on his badge, that’ll be sufficient.

But mostly, the biggest complaint is that we’re getting un-deliverable/return-to-sender when we try shipping out to some of these addresses.  And why?  Because they’re not subject to any external validation and quality-control.

So what’s the data professional’s responsibility in this?  In my opinion, it’s to advocate for data quality.  There are obviously big vendors out there like Melissa Data who will sell you a service to help get you there.  APIs abound, from USPS and other official sources, so building it isn’t out of the question.

One potential roadblock is, as usual, conservatism.  The business’s ERP system is its life-blood, highly sensitive to change and very guarded by over-protective management and finicky executives.  But the smelly dirty data-laundry continues to cause problems and has real-money impacts on corp. efficiency and profit.  Unfortunately, many people tend to take the ostrich approach.

if you bury your head in the sand your ass will get burnt
No idea who this Bennett person is, but they sound smart.

So, my good people, start “doing your laundry”.  Have those conversations with your teams and managers about the current state of your data quality, and what it’s going to look like moving forward.  Make some plans, have a road-map, and understand that it’s going to involve a lot of collaboration between key players.  And good luck!