T-SQL Tuesday #99: Counting Rows the Less-Hard-Way

We can get our row count, and min & max date values, without ever touching the actual source table!


This month’s invite comes courtesy of Aaron Bertrand (B | T), whose “bad habits” blog series still inspires many an impassioned debate or engaging discussion on a regular basis among DBAs & Developers alike.

And yes, I’m taking the easier of the two “dealer’s choice” options — the SQL-focused one.  (I’m not big on sharing/blogging personal stuff, at least not here; I may one day start another blog for that, or perhaps just occasionally post more #off-topic stuff, but for now you’ll have to be content with my stories of vehicle troubles and the occasional movie-geekery.)

So, without further ado…

By the way, what is ‘ado’ and why should there be no further of it?

[Image: Art Vandelay, importer/exporter] Accidentally apropos on many levels…

Counting Rows in Really Big Tables

Previously touched on here, tables of unusual size (TOUSes) can be tricky.  You don’t want to lock them up for a long period of time, but you often need to gather information about them (such as row count, size, range of values) to perform some kind of operational maintenance with/on them.  In particular, Aaron’s post on “counting rows the hard way” inspired me to look into this a bit more and try to come up with a clever-ish way of finding out some basic “shape of data” info without actually querying (scanning) the table itself.

To start with, it’s actually really simple to get the total row-count from a few system catalog views — Aaron’s already shown you that, so I won’t repeat.  My interest is more in questions like “How many rows match a where-clause?” or “What are the min & max values for thatColumn?”

For this post, I’ll be focusing on a particular kind of table — the “history” or “transaction” table.  The idea here is that you have a record of “every time some event happens in/to some entity”.  A very common example is audit-trail tables, which I’ve been dealing with a lot lately.  Another common example is a “transaction history” table, such as, in our new favorite MSSQL demo database WideWorldImporters, the table Warehouse.StockItemTransactions.  It’s the 2nd largest table in the db at 260-some-thousand rows.  {The largest is a multi-million-row beast that is actually the system-versioned aka temporal table behind a “normal” table; I might build a phase-2 example around this, but not today.}  So, while our queries won’t be super slow, we’ll get enough of an idea of what’s bad & good from measuring our IO stats (with SET STATISTICS IO ON).

TL;DR: The demo script is available here; the headers below correspond to the comment-lines of the same name, but I’ve left enough commentary in the SQL itself to keep the average reader on-track, so feel free to check it out ahead of time.  But do keep reading at some point!  :o)

A. Gathering Some Intel

First up, we have good ol’ sys.sp_spaceused.  This gives you some sizing info about the table, including its row count.  There’s a “disclaimer” circulating out there that it’s not “up-to-the-millisecond accurate”, i.e. it might not have the most current row count if someone else is in the middle of an insert operation or whatnot.  But for all intents & purposes, you can consider it truth.
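For example, something as simple as this (run in the WideWorldImporters database) gets you the reserved/used sizes and the row count without ever scanning the table:

-- Row count & size info from the catalog, no table scan required:
EXEC sys.sp_spaceused N'Warehouse.StockItemTransactions';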

Then you have the “hard ways” that people typically use — and that Aaron, again, covered just fine in his post on the subject, so I won’t spend any more time there.

But what if I want to count rows matching a where clause (a predicate)?  And in dealing with a typical history/transaction table, the predicate is almost always “between such and such dates”.  (Of course we won’t literally use the between operator, as we have been chastised severely; we know better and use >= and < instead!)  Also, I want to know the MIN and MAX of said dates in the table.  Lord knows we’re gonna be doing some table-scanning.
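For reference, here’s roughly the kind of query we’re trying to avoid running against a TOUS (the dates are just illustrative):

SET STATISTICS IO ON;

-- The “hard way”: scan the table to count rows in a date range...
SELECT COUNT(*)
FROM Warehouse.StockItemTransactions
WHERE TransactionOccurredWhen >= '2015-01-01'
  AND TransactionOccurredWhen <  '2016-01-01';

-- ...and to get the min & max dates.
SELECT MIN(TransactionOccurredWhen) AS MinDate,
       MAX(TransactionOccurredWhen) AS MaxDate
FROM Warehouse.StockItemTransactions;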

B. Ok, Let’s Try an Index

In their benevolent wisdom, the SQL deities decided not to give us an index on WideWorldImporters.Warehouse.StockItemTransactions.TransactionOccurredWhen.

BTW, how’s that for a verbose column name?  What, TransactionDate not good enough?  I suppose it is datetime2 after all, but still…

So we create an index on it, to see if that helps our poor “count where dates” query.  And behold, it does!  We’ve cut our # of logical reads down by about 90% (from 1900 to 200, if you’re following along in the script).  That’s fantastic, but… we can do better.  Because if the table is, say, 500 million rows instead of 260k, that’s about 400,000 logical reads, which… could definitely suck.
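The index itself is nothing fancy; something along these lines (the name is my own choosing, not gospel from the demo script):

-- A plain nonclustered index on the date column:
CREATE NONCLUSTERED INDEX IX_StockItemTransactions_TransactionOccurredWhen
    ON Warehouse.StockItemTransactions (TransactionOccurredWhen);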

C. The Better Way

Again, the script has an ode to Aaron’s query on sys.partitions/tables to get the row-count from the meta-data.  Then the real fun begins.

There’s a system DMV (well, technically a ‘DMF’ – dynamic management function) called sys.dm_db_stats_histogram, which takes the table’s object_id and the stats_id of the statistics object as arguments (for index statistics, the stats_id is the same as the index_id).  It gives you, obviously enough, the statistics histogram of the statistics object corresponding to that index.  We want to store its output in a temp-table (or even a real table — go nuts!) so we can query it some more.

--For example, if our new index is index_id 7:
SELECT *
FROM sys.dm_db_stats_histogram(OBJECT_ID('Warehouse.StockItemTransactions'), 7);

So we create our #StatsHist table (“hist” being an abbreviation for “histogram”, not “history”, though in retrospect that’s probably not worth the possible confusion), and we populate it with the histogram meta-data from Warehouse.StockItemTransactions and its new index that we just created (on TransactionOccurredWhen).  Poof!  We have an easy way of showing min/max values in that column!  Well… almost.  We have to convert the variant datatype to an understandable & aggregate-able (probably a made-up word… aggregable? aggregatable?) type.  So we add a column range_hk_proper of type datetime2 and populate it with the converted values of range_high_key from the stats-output.
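Here’s a stripped-down sketch of the idea (the real demo script does a bit more; I’m assuming index_id 7 from above, and the column names come straight from sys.dm_db_stats_histogram):

CREATE TABLE #StatsHist (
    schema_name         sysname,
    table_name          sysname,
    step_number         int,
    range_high_key      sql_variant,
    range_rows          float,
    equal_rows          float,
    distinct_range_rows float,
    average_range_rows  float,
    range_hk_proper     datetime2 NULL
);

INSERT INTO #StatsHist (schema_name, table_name, step_number, range_high_key,
                        range_rows, equal_rows, distinct_range_rows, average_range_rows)
SELECT 'Warehouse', 'StockItemTransactions', sh.step_number, sh.range_high_key,
       sh.range_rows, sh.equal_rows, sh.distinct_range_rows, sh.average_range_rows
FROM sys.dm_db_stats_histogram(OBJECT_ID('Warehouse.StockItemTransactions'), 7) AS sh;

-- Convert the sql_variant high keys into a type we can actually aggregate:
UPDATE #StatsHist
SET range_hk_proper = CONVERT(datetime2, range_high_key);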

There!  Now we’re cookin’ with gas.  Our min/max/count query, and our “count where date-range” query, run in mere milliseconds, without ever touching the actual source table.  So we don’t lock it up or block anybody else from writing to it, even in the most pessimistic isolation levels.
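The queries against #StatsHist look something like this (note that the counts are histogram-based estimates, which is usually plenty good for “shape of data” purposes):

-- Min/max dates and an approximate total row count, no base-table access:
SELECT MIN(range_hk_proper)         AS MinDate,
       MAX(range_hk_proper)         AS MaxDate,
       SUM(range_rows + equal_rows) AS ApproxRowCount
FROM #StatsHist
WHERE table_name = 'StockItemTransactions';

-- Approximate “count where date-range”, again without touching the source table:
SELECT SUM(range_rows + equal_rows) AS ApproxRowsInRange
FROM #StatsHist
WHERE table_name = 'StockItemTransactions'
  AND range_hk_proper >= '2015-01-01'
  AND range_hk_proper <  '2016-01-01';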

Except for when you created that index we needed on the date column.

Yes, I know.  What we’re hoping is that the tables we deal with in the “real world” already have such an index that we can take advantage of.  If not, well, that’s what maintenance windows are for.  And you better believe you’re gonna need that index sooner or later.

[Image: cooking with mustard gas] Been a while since I used a Family Guy meme…

Where To Next?

Ostensibly, this whole thing could probably be turned into a stored-proc so you could run it “on demand” for any table that had a date or datetime column which you wanted to get such information about.  It’d have to do a lot of error-checking, of course — it wouldn’t work if you don’t have such a column, or if there’s no index on it, and probably a myriad of other ‘gotchas’ that I’m not thinking of at the moment.  But I did try to lay the groundwork for improvement.  #StatsHist stores schema & table name too, so if you felt like turning it into a mini-data-warehouse holding a BUNCH of stat-histograms for a whole mess of tables, you could definitely do that.  And then you could run some basic analytics on it — min/max/avg, counts by year/month/day, etc.
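For instance, a quick “rows per year” rollup against that mini-warehouse might look like this (again, just a sketch against the #StatsHist structure above):

-- Approximate rows per year, per table, straight from the stored histograms:
SELECT schema_name,
       table_name,
       YEAR(range_hk_proper)        AS StatYear,
       SUM(range_rows + equal_rows) AS ApproxRows
FROM #StatsHist
GROUP BY schema_name, table_name, YEAR(range_hk_proper)
ORDER BY schema_name, table_name, StatYear;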

Sounds like fun, no?  ;o)

The Passing of the Torch

Did I mention documentation?

It’s always hard to say goodbye to a colleague, especially someone who’s so central and ingrained in the company lore and holds so much of the “tribal knowledge”.  Hell, I was that guy just a couple years ago.

[Image: Calvin brain dump] Can you just leave your whole brain right there on the desk? Thanks.

So now I’ve seen a couple such old-hats move on from my current team, and seeing both sides of the proverbial torch-passing is interesting.  There are definitely some very common, very important things that we should always do.

Documentation, documentation, and more documentation.

Indeed.  Also, finishing critical tasks, handing off in-flight projects, re-assigning tickets, talking to managers, prepping teammates for the work overflow, and cleaning out that huge buildup of clutter that you’ve collected over the years.  Virtual or physical… often both!

Unsurprisingly, where we all seem to differ widely is the human aspects.  Breaking the news, saying goodbyes, doing those last-minute get-togethers and send-offs.  What do those last few weeks and days look like?  For some, it’s just business-as-usual up to the last minute — they’re literally so busy they have little other choice.  That’s how it was with the helpdesk manager we parted with last year.  I used some of the time to put together documentation and thank-you letters, which I hope ended up being helpful.  Database diagrams were printed and taped.  Wikis were written.

But the main thing is to make sure you exchange contact info and stay in touch.  It gives the team a sense of comfort, knowing they can reach back out when those random questions that nobody’s thought about for several months resurface.

[Image: keep in touch and stay awesome] KITSA!

I’ve learned a lot from those folks that took the time to pass on their knowledge and made the effort to keep in contact.  And I appreciate them for that!  Today I’ll thank one of my exiting managers; she knows who she is.  She taught me a lot about our internal application stacks, integration and interop, company culture, tribal knowledge, and not standing for anybody’s BS, including my own.  Good luck with consulting, stay in touch, and kick some butt!

That’s all for this week.  I promise I’ll work on that “database collation problems” post soon…  :o)

T-SQL Tuesday #98: Orphaned Users Redux

It’s that time again!  The first #Tsql2sday of 2018.  Thanks to the Blobeater for this month’s invite: “your technical challenges conquered”.

Because I’m already ridiculously late, I have a short one.  This is about orphaned users — you know, when you restore a database and its users aren’t mapped to the server logins that they should be or used to be.

[Image: Orphan movie poster] Not that kind of orphan… pretty decent movie tho!

The typical solution is sp_change_users_login with the auto_fix or update_one option.  But guess what?  Yep, that’s deprecated.  By the way, did you know that it also has a report option?  Apparently that’s got some bugs…ish?  Try it sometime and see — compare the output of sys.sp_helpuser where the ‘LoginName’ column is null, with sp_change_users_login 'report'.  Preferably on a DB you’ve restored from another server.  😉
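Something like this, roughly (the report action is documented, even if the results can be a bit buggy-ish):

-- List the orphans the old, deprecated way...
EXEC sys.sp_change_users_login 'report';

-- ...and compare with sp_helpuser, looking for rows where LoginName is NULL:
EXEC sys.sp_helpuser;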

So what’s the correct solution?  ALTER USER [theUser] WITH LOGIN = [theLogin].  Simple, no?  Let’s get more general.  Could we come up with a half-decent way to apply this kind of fix dynamically?  Well sure, the nice folks at DBATools have already solved that problem.  And that’s great, really.  But just in case that doesn’t work… ^_^

One of the many things I love about SQL Prompt is the right-click option to “Script as INSERT” (from the results grid).  This is a quick & easy way to build a temp-table for the results of an exec statement so you can do the ol’ insert #tmp exec sys.sp_blah!  Then we can query the list of DB users for the null LoginNames and write a little set of queries to fix them!  Sound good?
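A rough sketch of the idea (not the exact script from the post; review the generated statements before running anything, obviously):

-- Capture sp_helpuser output so we can filter it:
IF OBJECT_ID('tempdb..#Users') IS NOT NULL DROP TABLE #Users;

CREATE TABLE #Users (
    UserName      sysname,
    RoleName      sysname       NULL,
    LoginName     sysname       NULL,
    DefDBName     sysname       NULL,
    DefSchemaName sysname       NULL,
    UserID        int           NULL,
    [SID]         varbinary(85) NULL
);

INSERT INTO #Users
EXEC sys.sp_helpuser;

-- Generate ALTER USER fixes for orphaned users that have a same-named server login:
SELECT DISTINCT
       'ALTER USER ' + QUOTENAME(u.UserName)
     + ' WITH LOGIN = ' + QUOTENAME(u.UserName) + ';' AS FixStatement
FROM #Users AS u
JOIN sys.server_principals AS sp
  ON sp.name = u.UserName
WHERE u.LoginName IS NULL;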

UPDATE: Behold the code!

Happy Tuesday!

PS: Coincidentally, today’s (Thursday) SQL Server Central newsletter featured a very similar post by a gentleman over at Madeira Data.  Go check it out, it’s another great solution to this problem!  And while you’re at it, get the SQL Server Radio podcast (created by a couple guys from the same company) – it’s a terrific addition to your iTunes library.

Movie Reviews and the Killer Database Collation

If you have a core database using a different collation than the rest of the DBs around it, BAD THINGS HAPPEN.

And we’re back!  Hi folks, thanks for being patient with my December hiatus.  The holiday season is always a little hectic, but this year it felt especially sudden.  And hey, you all have better things to do than read a blog in between the home-cooked meals and the family gatherings… like sleep, shop, and go see all the new movies!

Thanks to both Pitch Perfect 3 and the latest New Year’s Rockin’ Eve, Britney’s “Toxic” is now stuck in my head, so that’s fun.

[Image: Britney stewardess, “Toxic”] I think I’m ready now… for 2018.

Some of you may not know this, but I’m a big movie nerd.  Not the “knows a bunch of obscure factoids about all the Tarantino movies” kind of nerd, or whatever.  But I do quite enjoy the behind-the-scenes / making-of stuff — what used to be called “bonus features” on DVDs (remember those things??) — whenever the wife will tolerate sitting thru them with me.

Our genre of choice is generally horror.  Now, I’m gonna get nerdy on you for a bit, because there are several sub-types of horror, and I enjoy almost all of them.  Campy, creepy, fun, found-footage, gory, spooky, slasher, supernatural, tense, psychological, revenge, deconstruction, possession.  For the uninitiated, “deconstruction” is like 2012’s Cabin in the Woods — it pokes fun at the tropes while building on them in unique ways.  That’s one of my favorite kinds; that movie in particular is definitely in my top 10 all-time.

So to kick off this year, before diving back into the technical stuff, I’d like to give you a couple lightning reviews of some horror movies we’ve watched that are perhaps underrated, or that you may have missed.

  • The Babysitter (2017) – comedy/deconstruction. A young preteen boy, whose parents are gone a lot, has a great friendship with his older teen babysitter, but one night decides to spy on what she and her friends do after he goes to bed. And well, crap hits the fan.  Lots of fun, eye candy, and slapstick violence. 👍👍
  • Patchwork (2015) – campy/revenge. 3 girls are Frankenstein’d together and have to overcome their mental differences and physical struggles to piece together the perpetrator and hopefully exact some revenge. Superbly acted by the lead lady, plenty of violence and just enough funny bits to keep it going. 👍
  • Happy Death Day (2017) – slasher/deconstruction. Think Groundhog Day but with a college chick being killed by a masked marauder repeatedly.  She must try to find out who it is before it’s too late!  Somewhat predictable but still entertaining and engaging. 👍
  • Incarnate (2016) – possession/supernatural. A somewhat unique twist on the genre, a brain doc frees people from possession by mind-sharing & getting the person back in control of their own consciousness.  Think Inception meets Exorcist.  Very well-acted, convincingly scary demon, and nicely twisted ending. 👍👍
  • Demonic (2015) – creepy/found-footage. Bit of a misnomer, as it has nothing to do with demons; it’s about a ghost-summoning gone horribly wrong resulting in the deaths of all but 1 (ish?) member of the group that originally attempted said ritual.  Frank Grillo is always on-point.  Very engaging. 👍
  • Last Shift (2014) – gory/creepy/demon-y. Rookie cop gets stuck with the last watch in a soon-to-be-shut-down police station, chaos ensues.  Literally, this is some crazy crap; scary and bloody.  Original & vastly under-hyped, has an indie vibe but looks & feels professional-grade. 👍👍

Most of these should be stream-able.  So check ’em out!

Now on to the SQL stuff.

[Image: ‘A’ not equal ‘a’] borrowed from the man himself, Pinal Dave =)

Collations are Hard

If you ever have to integrate a vendor database into your existing environment, and the vendor ‘mandates’ that their DB use a certain collation (which differs from the rest of your SQL instances / databases), run away screaming.  Srsly.

Or convince your managers that you know better, and force it into the same collation as everything else you have to integrate with.  Good luck & godspeed.

Let me give you an example.  The ERP system is being upgraded, which of course means a new (upgraded) DB as well.  Part of this upgrade seems to involve supporting case-sensitive searching/matching against name fields.  To this end, the vendor insists that the DB should use a case-sensitive collation, namely Latin1_General_100_CS_AS.  Problem is, the rest of your DB environment, in which a lot of stuff touches the ERP database (via joins, linked-server queries, etc.), uses the SQL default collation of SQL_Latin1_General_CP1_CI_AS.

If you follow the vendor’s mandate (er, recommendation), guess what’s going to happen to your queries/views/stored-procedures that touch this & other DBs?  Horrible things.  Terrible performance degradation.  Wailing and gnashing of teeth from the developers, business users, and customers.
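To make it concrete, here’s a contrived example (the table and column names are hypothetical, not from the real ERP):

-- Joining across the two databases fails outright with the classic collation-conflict error
-- ("Cannot resolve the collation conflict ... in the equal to operation"):
SELECT c.CustomerName, a.AccountNumber
FROM OurDb.dbo.Customers AS c
JOIN ErpDb.dbo.Accounts  AS a
  ON a.CustomerName = c.CustomerName;

-- ...and the usual band-aid, a COLLATE clause, forces a conversion on one side,
-- which typically prevents an index seek on that column:
SELECT c.CustomerName, a.AccountNumber
FROM OurDb.dbo.Customers AS c
JOIN ErpDb.dbo.Accounts  AS a
  ON a.CustomerName COLLATE SQL_Latin1_General_CP1_CI_AS = c.CustomerName;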

Okay, I exaggerate.  Slightly.

But it really does hurt performance, and I don’t feel like it’s talked about enough in the data professional community.  In the next post, I’ll take this problem apart a little more and try to impart some of what I’ve learned from going through the pain of dealing with the aforementioned example.

Happy 2018!

PS: Apparently this is my 50th post!!  Go me!  :o)

[Image: 50 Cent’s face on a $50 bill] fiddy. fiddy posts.

TSQL Tuesday #96: Good Influences

This month’s invitation is brought to you by Ewald Cress (blog, twitter), who I already like based on his tagline —

finds joy in minutiae..

Yes, my friend, don’t we all.

[Image: Elaine, wondering why he didn’t use an exclamation point] “I can’t spend the rest of my life coming into this stinking apartment every 10 minutes to pore over the excruciating minutiae of every single daily event!”

The topic at hand is fairly non-technical, but still important: folks who have made a positive contribution to your career or professional development.  So it’s time for a shout-out!  About a year ago, I wrote about my first major career move.  There were several great influences in my first job, from the developers that taught me how to code, to the DBA who taught me how to keep cool & calm in the face of outages, to the boss who taught me the importance of time management and breadth of knowledge.

Post-mortem

Since I am way too late in posting this, and I don’t feel like waxing poetic, I’ll just say a general “thank you” to all those who’ve helped me along in my career so far, with special acknowledgement to my former boss, my current boss, the SQL family, and my own family.  Happy belated Thanksgiving and have a safe & pleasant holiday season!  I’ll have a real post again quite soon, diving back into the tech stuff.

[Image: food coma, happy belated Thanksgiving] hope they don’t mind me borrowing their image… =D

TSQL Tuesday 95: Big Data

This month’s party brought to you by Mr. Hammer (b|t).

[Image: MC Hammer] No, not THAT one…

I apologize in advance for all the hammertime memes.  It was just too good to pass up.  Surely he must be used to this.  Or at least not surprised by it.  =D

So, Big Data.  What is it?  Well, in simple terms, it’s the realization and acceptance of the fact that data is multi-model, multi-faceted, multi-sourced, and constantly growing.  It’s the fact that the traditional RDBMS is no longer the be-all end-all source of truth and valuable information.  It’s part of a larger ecosystem involving JSON document stores, CSV files, streaming volatile bits of data coming from random devices and user activity that loses its meaning and potential impact almost as quickly as it can be gathered and sifted and stored.

But what do we actually get out of it?  As a small-medium enterprise NOT in the software business, I have to say, not as much as the hype would have us believe.  And look, I’m not so jaded and crusty that I refuse to adopt new tech.  I just haven’t seen a meaningful, transformative business use-case for it.  Sure, we have Google Analytics telling us how our websites are doing, and someone in marketing knows something about trending our social media traffic.  Does it really help us make more money?  Heck if I know.

[Image: “Cease thy actions, my timepiece has indicated the necessity of mallets”] Old-timey colonials can even dig it…

Here’s what I’d like to see from the thought leaders.  Give me something I can chew on — a real-world, non-hypothetical, non-frivolous, impactful use-case for adopting and implementing something like Hadoop/Spark or Azure Data Lake.  Show me how my business can realistically journey down the path of predictive analytics and what it’s going to take from our Devs, IT staff, and management to actually get there.

Because they don’t get it yet.  I have managers still worrying about how much we’re spending on a dinky little flash storage array to support the growing needs of our on-prem converged infrastructure stack.  Meanwhile the AWS bill continues to baffle, and Devs want to play with Docker and Lambda.  But we can’t seem to convince the higher-ups that they’re short-staffed on the internal-apps team, even after a minor version upgrade takes 4 hours of Ops time and half a dozen end-users doing post-mortem testing just to be sure we didn’t break anything unexpected.

I’m not here to complain.  Really.  I do want to see something amazing, something inspiring, something that shows me what Big Data truly brings to the table.  And sure, I’ve seen the vendor demos; they’re all just a bit outlandish, no?  I mean, they look really cool, sure — who doesn’t want to see a chord diagram of who’s killed whom in GoT? — but does that really help my business improve sales and productivity?

My point is, there’s a gap.  A chasm of misunderstanding and mismatched expectations between what management thinks Big Data is/means, and what it actually takes to implement.  They see the pretty pictures and the fancy demos, but they don’t see the toil and sweat (or at least, in the cloud, gobs of cash) that go into building & operating the underpinnings and pipelines that drive those nice graphics.  Not to mention the fundamental issues of data quality and governance.

[Image: “Continue not, time for hammer it is”] OK OK, last one, I swear…

So do us a favor, Big Data pundits.  Show us something real, something that “the little guy” can use to up his/her game in the market.  Something that makes a positive impact on small non-startup, non-software businesses with understaffed IT & Dev teams.  But more importantly, stop glossing over the effort and resources that it takes to “do Big Data right”.  Managers and executives need to understand that it’s not magic.  And IT practitioners need to understand that it’s actually worthwhile.  Because I believe you — really — that the payoff in the end is there, and is good.  But you need to convince the whole stack.


PS: I know this is a full day late for T-SQL Tuesday, and as such, I wasn’t going to post a ping-back in the comments of the invite, but then I saw there were only 8 others, so I felt it would benefit the event if I did add my late contribution.  I’ll tweet with a modified hash-tag instead of the standard #tsql2sday, to reflect my late-ness.  Hopefully that’s a fair compromise to the community & the event’s intentions.  =)

SQL Server Performance Troubleshooting Free Scripts and Tools List


I don’t normally reblog.  But when I do, it’s something awesome. =D

David Peter Hansen

Back in the days, I used to collect a lot of different scripts, tools, and other goodies for troubleshooting SQL Server performance issues. These days, however, I tend to use what is publicly and freely available (as well as some internal stuff), and keep a list of those in my head.

I’ve meant to write that list down for a while, and today Chrissy asked:

So here it is…

Disclaimer: While I do work as a Premier Field Engineer for Microsoft, this is my list – this is not an official list from my employer. Others likely have a list that differs from mine.

Free scripts and tools from Microsoft
These are scripts and tools provided…

View original post 892 more words