Writing Diagnostic Queries is Hard Because SQL Server Still Has (Tiny) Bugs. - Brent Ozar UnlimitedĀ®


When I first got started working with SQL Server, I assumed it was flawless because it came in a sealed box from a ginormous corporation. I figured they’d tested every edge case, and that there was no way things were gonna go sideways.

These days, even though I read CU KB articles and giggle about their contents over at SQLServerUpdates.com, I’m still really happy with the quality of SQL Server. There are bugs, sure, but there are a lot of people doing really crazy stuff with databases and expecting it to work – even when there’s no way someone could possibly have thought to test that particular combination of craziness.

So when I run into a problem, I start by assuming it’s a problem with my code, because let’s be honest: it usually is. Almost always. 99.999% of the time, probably. (Hey, whaddya know, I’m achieving five nines.)

But every now and then, it’s actually SQL Server.

And when it is, I try to boil the problem down to the simplest possible code that anyone can walk through in a matter of seconds on their own machine in order to see the problem. I want someone at Microsoft to be able to highlight the code, copy it, paste it in, and go, “Oh whoa, wait a minute, he’s right.”

Here are two examples.

sys.dm_db_persisted_sku_features ignores isolation level requests.

This one bit me in sp_Blitz: I had a client that was rebuilding large columnstore indexes, and whenever they did, their job to log sp_Blitz to table would get blocked, and cause cascading blocking issues in other sessions. At first I thought c’mon, no way – I am really, really religious about using dirty reads in our procs because this ain’t financial data, and I’m fine with inaccurate results here under heavy contention. But blocking? That’s bad.

Here’s how it happens. In one session, create a clustered columnstore index:

Then in another session, in the same database, try to query that DMV:

I know, those two isolation level hints are redundant – I’m just proving the point that it’s really a bug. It gets blocked, which shouldn’t happen. Other DMVs don’t work this way.

Once I had it boiled down to that simple repro script, I filed a bug report with Microsoft, and added code in sp_BlitzIndex to simply skip that DMV by default. (You can change that by setting @SkipBlockingChecks = 0.) I wasn’t sure how quickly Microsoft would fix the bug, or how far back they’d backport the fix.

It’s been a month now without a word from Microsoft on the bug report, so I don’t hold out a lot of hope for that one. (It doesn’t help that I accidentally flagged the idea as inappropriate just now, and I have no way to unflag it, HAHAHA. Man, I *really* hate the user interface for feedback.azure.com.)

sys.identity_columns has the same problem.

Similar repro:

Then in another window:

That gets blocked, and I filed a bug report for that one too, and a Github issue for sp_BlitzIndex that I’ll work on at some point, but that one’s less urgent since you can’t (currently) set up Agent jobs to log sp_BlitzIndex to table. This one’s going to be a lot more challenging though, as you can read about in the Github issue – the workaround’s going to have to be a lot more ambitious, I think.

Are these big bugs? Not at all – but if you’re writing diagnostic queries like sp_Blitz and friends, they can turn into hours of troubleshooting trying to figure out what’s going wrong – especially given how widespread these scripts are these days.