There are many great use-cases for the T-SQL
APPLY operator, and if you haven’t tried it yet, I encourage you to check out Kevin Feasel’s excellent presentation on it here. Today, I want to demonstrate a particularly interesting use-case (at least, to me!) that I found for it while tackling a real business problem — for reporting, specifically, which is what I spend a lot of my time dealing with lately.
We have a “history of estimated market values” table, which stores, for some arbitrary dates based largely on user input, values (prices) of our proverbial Widgets. In other words, we can’t guarantee we have every date in there for a given time period. Over on side B, we have a “sales” table, which tells us when a given Widget was sold and for how much. The business question, simply enough, is “When a given Widget was sold, how accurate was its ‘estimated market value’ at the time?” Or, put another way, “How closely did the sale price of that Widget match our estimated market value?”
I used two interesting bits of TSQL to accomplish this, the main one being our lovely
OUTER APPLY, in this case, because I still wanted the rows from “sales” even if we couldn’t find a closely-matching “market value” record.
If you haven’t used this operator yet, think of it conceptually like a
JOIN to a sub-query, but with the added value that you can reference the outer tables’ columns within it.
CROSS APPLY is like
INNER JOIN, and
OUTER APPLY is like
RIGHT JOINfans, no love for you here. You’re just too weird.
My other little trick involved using
TOP(1) in the apply’d sub-query, with a non-standard
ORDER BY clause. As I mentioned, within the
APPLY‘s body, you can reference the outer table’s columns. So I’m easily able to compare the
SaleDate (outer table) with the
EstimateDate (inner query). I want “the closest
EstimateDate to the
SaleDate”, which means I want the row where the difference between those two dates is the smallest. Which means making use of our friend
DATEDIFF. So let’s try:
ORDER BY DATEDIFF(day, Sales.SaleDate, MktValueHist.EstimateDate)
Do you see the problem yet? If I get a negative value, i.e. my SaleDate is way before my EstimateDate, say -100 days, that’s the top 1 row. I don’t want that! Let’s try our old friend from many many math classes ago, Mr. Absolute Value.
ORDER BY ABS(DATEDIFF(day, Sales.SaleDate, MktValueHist.EstimateDate)
That’s better! Now I have the top 1 “closest to my sale date” row from
All Together Now
SELECT Sales.WidgetID, Sales.WidgetName, Sales.Price, Sales.SaleDate
, mvh.MarketValue, mvh.EstimateDate
, Accuracy = some_made_up_mathy_thing_here
OUTER APPLY (
SELECT TOP(1) mValHist.MarketValue, mValHist.EstimateDate
FROM MarketValueHistory mValHist
WHERE mValHist.WidgetID = Sales.WidgetID
ORDER BY ABS(DATEDIFF(day, Sales.SaleDate, mValHist.EstimateDate)
WHERE Sales.SaleDate >= '20180101'
There, my completely fabricated yet totally-based-in-reality example of how to get 2018’s Widget Sale Prices with corresponding “closest to the sale-date” Market Value Estimate from our history table. You could even throw in some fancy math expression for “accuracy”, if you felt like it. Like maybe “relative difference“. =)
Note: For simplicity, we’re dealing with “whole dates”, i.e. just
datetime2. If your date values do include times, you’d want to change the
datediff interval to something more appropriate, like
And that’s how
ABS() saved my bacon today. Hooray!
Have an interesting use-case for
APPLY? Or perhaps an alternative approach to a similar problem? I’d love to hear from you! Drop me a comment, a tweet, or a LinkedIn msg. Happy summer!
PS: I’d also recommend checking out Steve Stedman’s useful “Join types poster“, if for no other reason than to have something concrete to use when explaining those concepts to your fellow cube-dwellers. It has cross/outer apply on the 2nd page in their more commonly used pattern, the TVF (table valued function) relationship.
PPS: Fine, have your meme…