High Performance Techniques For Microsoft SQL Server
Editor-in-chief:
Aaron Bertrand
Authors:
Aaron Bertrand
Erin Stellato
Glenn Berry
Jason Hall
Joe Sack
Jonathan Kehayias
Kevin Kline
Paul Randal
Paul White
eBook Lead:
Eric Smith
Project Lead:
Kevin Kline
Foreword
It is with great pleasure that I present to you our first eBook, a collection of blog posts from
SQLPerformance.com. In the pages ahead you will find several useful, hand-picked articles that will help
give you insight into some of your most vexing performance problems. These articles were written by
several of the SQL Server industry's leading experts, including Paul Randal, Jonathan Kehayias, and Paul
White.
I want to thank SQL Sentry for making me Editor-in-Chief of the site, my esteemed colleague Kevin Kline
for helping assemble this eBook, our technical editor, Eric Smith, and all of our authors who have helped
make our content top-notch, and of course our readers who keep us motivated to keep producing
quality material. Thank you.
Aaron Bertrand
Table of Contents
Best Approach for Running Totals
Split Strings the Right Way
Split Strings: Now with less T-SQL
My Perspective: The Top 5 Most Common SQL Server Performance Problems
Performance impact of different error handling techniques
Using named instances? Test your DAC connection!
What is the fastest way to calculate the median?
T-SQL Tuesday #33 : Trick Shots : Schema Switch-A-Roo
Conditional Order By
Splitting Strings : A Follow-Up
When the DRY principle doesn't apply
Hit-Highlighting in Full-Text Search
What impact can different cursor options have?
How much impact can a data type choice have?
What is the most efficient way to trim time from datetime?
Beware misleading data from SET STATISTICS IO
Trimming time from datetime : a follow-up
Measuring Observer Overhead of SQL Trace vs. Extended Events
The Zombie PerfMon Counters That Never Die!
Is the sp_ prefix still a no-no?
Configuring a Dedicated Network for Availability Group Communication
Checking if a non-LOB column needs to be updated
Minimizing the impact of DBCC CHECKDB : DOs and DON'Ts
Performance Problems with SQL Server 2012 Enterprise Edition Under CAL Licensing
The Benefits of Indexing Foreign Keys
Quick Tip : Speed Up a Slow Restore from the Transaction Log
Ten Common Threats to Execution Plan Quality
Bad cardinality estimates coming from SSMS execution plans
An important change to Extended Events in SQL Server 2012
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
CREATE TABLE dbo.SpeedingTickets
(
[Date] DATE NOT NULL, -- the opening of this script was truncated; the [Date] column definition is inferred from how the column is used later in this article
TicketCount INT
);
GO
ALTER TABLE dbo.SpeedingTickets ADD CONSTRAINT pk PRIMARY KEY CLUSTERED ([Date]);
GO
;WITH x(d,h) AS
(
SELECT TOP (250)
ROW_NUMBER() OVER (ORDER BY [object_id]),
CONVERT(INT, RIGHT([object_id], 2))
FROM sys.all_objects
ORDER BY [object_id]
)
INSERT dbo.SpeedingTickets([Date], TicketCount)
SELECT TOP (10000)
d = DATEADD(DAY, x2.d + ((x.d-1)*250), '19831231'),
x2.h
FROM x CROSS JOIN x AS x2
ORDER BY d;
GO
SELECT [Date], TicketCount
FROM dbo.SpeedingTickets
ORDER BY [Date];
GO
Abridged results:
So again, 10,000 rows of pretty simple data: small INT values and a series of dates from 1984 through May
of 2011.
The Approaches
Now my assignment is relatively simple and typical of many applications: return a resultset that has all 10,000
dates, along with the cumulative total of all speeding tickets up to and including that date. Most people
would first try something like this (we'll call this the "inner join" method):
SELECT
st1.[Date],
st1.TicketCount,
RunningTotal = SUM(st2.TicketCount)
FROM
dbo.SpeedingTickets AS st1
INNER JOIN
dbo.SpeedingTickets AS st2
ON st2.[Date] <= st1.[Date]
GROUP BY st1.[Date], st1.TicketCount
ORDER BY st1.[Date];
and be shocked to discover that it takes nearly 10 seconds to run. Let's quickly examine why by viewing the
graphical execution plan, using SQL Sentry Plan Explorer:
The big fat arrows should give an immediate indication of what is going on: the nested loop reads one
row for the first aggregation, two rows for the second, three rows for the third, and on and on through
the entire set of 10,000 rows. This means we should see roughly ((10000 * (10000 + 1)) / 2) rows
processed once the entire set is traversed, and that seems to match with the number of rows shown in
the plan.
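For reference, that arithmetic works out as follows:
SELECT (10000 * (10000 + 1)) / 2 AS rows_processed; -- 1 + 2 + ... + 10,000 = 50,005,000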
Note that running the query without parallelism (using the OPTION (MAXDOP 1) query hint) makes the
plan shape a little simpler, but does not help at all in either execution time or I/O; as shown in the plan,
duration actually almost doubles, and reads only decrease by a very small percentage. Comparing to the
previous plan:
There are plenty of other approaches that people have tried to get efficient running totals. One example
is the "subquery" method, which just uses a correlated subquery in much the same way as the inner join
method described above:
SELECT
[Date],
TicketCount,
RunningTotal = TicketCount + COALESCE(
(
SELECT SUM(TicketCount)
FROM dbo.SpeedingTickets AS s
WHERE s.[Date] < o.[Date]), 0
)
FROM dbo.SpeedingTickets AS o
ORDER BY [Date];
Comparing those two plans:
So while the subquery method appears to have a more efficient overall plan, it is worse where it
matters: duration and I/O. We can see what contributes to this by digging into the plans a little deeper.
By moving to the Top Operations tab, we can see that in the inner join method, the clustered index seek
is executed 10,000 times, and all other operations are only executed a few times. However, several
operations are executed 9,999 or 10,000 times in the subquery method:
So, the subquery approach seems to be worse, not better. The next method we'll try I'll call the "quirky
update" method. This is not exactly guaranteed to work, and I would never recommend it for production
code, but I'm including it for completeness. Basically the quirky update takes advantage of the fact that
during an update you can redirect assignment and math so that the variable increments behind the scenes as
each row is updated.
DECLARE @st TABLE
(
[Date] DATE PRIMARY KEY,
TicketCount INT,
RunningTotal INT
);
DECLARE @RunningTotal INT = 0;
INSERT @st([Date], TicketCount, RunningTotal)
SELECT [Date], TicketCount, RunningTotal = 0
FROM dbo.SpeedingTickets
ORDER BY [Date];
UPDATE @st
SET @RunningTotal = RunningTotal = @RunningTotal + TicketCount
FROM @st;
SELECT [Date], TicketCount, RunningTotal
FROM @st
ORDER BY [Date];
I'll re-state that I don't believe this approach is safe for production, regardless of the testimony you'll hear
from people indicating that it never fails. Unless behavior is documented and guaranteed, I try to stay away
from assumptions based on observed behavior. You never know when some change to the optimizer's
decision path (based on a statistics change, data change, service pack, trace flag, query hint, what have you)
will drastically alter the plan and potentially lead to a different order. If you really like this unintuitive
approach, you can make yourself feel a little better by using the query option FORCE ORDER (and this will try
to use an ordered scan of the PK, since that's the only eligible index on the table variable):
UPDATE @st
SET @RunningTotal = RunningTotal = @RunningTotal + TicketCount
FROM @st
OPTION (FORCE ORDER);
For a little more confidence at a slightly higher I/O cost, you can bring the original table back into play, and
ensure that the PK on the base table is used:
UPDATE st
SET @RunningTotal = st.RunningTotal = @RunningTotal + t.TicketCount
FROM dbo.SpeedingTickets AS t WITH (INDEX = pk)
INNER JOIN @st AS st
ON t.[Date] = st.[Date]
OPTION (FORCE ORDER);
Personally I don't think it's that much more guaranteed, since the SET part of the operation could potentially
influence the optimizer independent of the rest of the query. Again, I'm not recommending this approach,
I'm just including the comparison for completeness. Here is the plan from this query:
Based on the number of executions we see in the Top Operations tab (I'll spare you the screen shot; it's
1 for every operation), it is clear that even if we perform a join in order to feel better about ordering,
the quirky update allows the running totals to be calculated in a single pass of the data. Comparing it to
the previous queries, it is much more efficient, even though it first dumps data into a table variable and
is separated out into multiple operations:
This brings us to a recursive CTE method. This method uses the date value, and relies on the assumption
that there are no gaps. Since we populated this data above, we know that it is a fully contiguous series, but in
a lot of scenarios you can't make that assumption. So, while I've included it for completeness, this approach
isn't always going to be valid. In any case, this uses a recursive CTE with the first (known) date in the table as
the anchor, and the recursive portion determined by adding one day (adding the MAXRECURSION option
since we know exactly how many rows we have):
;WITH x AS
(
SELECT [Date], TicketCount, RunningTotal = TicketCount
FROM dbo.SpeedingTickets
WHERE [Date] = '19840101'
UNION ALL
SELECT y.[Date], y.TicketCount, x.RunningTotal + y.TicketCount
FROM x INNER JOIN dbo.SpeedingTickets AS y
ON y.[Date] = DATEADD(DAY, 1, x.[Date])
)
SELECT [Date], TicketCount, RunningTotal
FROM x
ORDER BY [Date]
OPTION (MAXRECURSION 10000);
This query works about as efficiently as the quirky update method. We can compare it against the
subquery and inner join methods:
Like the quirky update method, I would not recommend this CTE approach in production unless you can
absolutely guarantee that your key column has no gaps. If you may have gaps in your data, you can
construct something similar using ROW_NUMBER(), but it is not going to be any more efficient than the
self-join method above.
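For reference, a rough sketch of that ROW_NUMBER() variation (this is not from the original article; it simply manufactures a contiguous key so the recursion no longer depends on gap-free dates):
;WITH s AS
(
  SELECT [Date], TicketCount,
    rn = ROW_NUMBER() OVER (ORDER BY [Date])
  FROM dbo.SpeedingTickets
),
x AS
(
  -- anchor: the first row by date
  SELECT rn, [Date], TicketCount, RunningTotal = TicketCount
  FROM s
  WHERE rn = 1
  UNION ALL
  -- recursive member: walk to the next row number, regardless of date gaps
  SELECT s.rn, s.[Date], s.TicketCount, x.RunningTotal + s.TicketCount
  FROM x INNER JOIN s ON s.rn = x.rn + 1
)
SELECT [Date], TicketCount, RunningTotal
FROM x
ORDER BY [Date]
OPTION (MAXRECURSION 10000);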
And then we have the cursor approach:
DECLARE @st TABLE
(
[Date] DATE PRIMARY KEY,
TicketCount INT,
RunningTotal INT
);
DECLARE
@Date DATE,
@TicketCount INT,
@RunningTotal INT = 0;
DECLARE c CURSOR
LOCAL STATIC FORWARD_ONLY READ_ONLY
FOR
SELECT [Date], TicketCount
FROM dbo.SpeedingTickets
ORDER BY [Date];
OPEN c;
FETCH NEXT FROM c INTO @Date, @TicketCount;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RunningTotal = @RunningTotal + @TicketCount;
INSERT @st([Date], TicketCount, RunningTotal)
SELECT @Date, @TicketCount, @RunningTotal;
FETCH NEXT FROM c INTO @Date, @TicketCount;
END
CLOSE c;
DEALLOCATE c;
SELECT [Date], TicketCount, RunningTotal
FROM @st
ORDER BY [Date];
which is a lot more code, but contrary to what popular opinion might suggest, returns in 1 second. We
can see why from some of the plan details above: most of the other approaches end up reading the
same data over and over again, whereas the cursor approach reads every row once and keeps the
running total in a variable instead of calculating the sum over and over again. We can see this by looking
at the statements captured by generating an actual plan in Plan Explorer:
We can see that over 20,000 statements have been collected, but if we sort by Estimated or Actual Rows
descending, we find that there are only two operations that handle more than one row, which is a far
cry from a few of the above methods whose reads grow quadratically because they read the same previous
rows over and over again for each new row.
Now, let's take a look at the new windowing enhancements in SQL Server 2012. In particular, we can
now calculate SUM OVER() and specify a set of rows relative to the current row. So, for example:
SELECT
[Date],
TicketCount,
SUM(TicketCount) OVER (ORDER BY [Date] RANGE UNBOUNDED PRECEDING)
FROM dbo.SpeedingTickets
ORDER BY [Date];
SELECT
[Date],
TicketCount,
SUM(TicketCount) OVER (ORDER BY [Date] ROWS UNBOUNDED PRECEDING)
FROM dbo.SpeedingTickets
ORDER BY [Date];
These two queries happen to give the same answer, with correct running totals. But do they work
exactly the same? The plans suggest that they don't. The version with ROWS has an additional operator,
a 10,000-row sequence project:
And that's about the extent of the difference in the graphical plan. But if you look a little closer at actual
runtime metrics, you see minor differences in duration and CPU, and a huge difference in reads. Why is
this? Well, this is because RANGE uses an on-disk spool, while ROWS uses an in-memory spool. With
small sets the difference is probably negligible, but the cost of the on-disk spool can certainly become
more apparent as sets get larger. I don't want to spoil the ending, but you might suspect that one of
these solutions will perform better than the other in a more thorough test.
As an aside, the following version of the query yields the same results, but works like the slower RANGE
version above:
SELECT
[Date],
TicketCount,
SUM(TicketCount) OVER (ORDER BY [Date])
FROM dbo.SpeedingTickets
ORDER BY [Date];
So as you're playing with the new windowing functions, you'll want to keep little tidbits like this in mind:
the abbreviated version of a query, or the one that you happen to have written first, is not necessarily
the one you want to push to production.
The Actual Tests
In order to conduct fair tests, I created a stored procedure for each approach, and measured the results
by capturing statements on a server where I was already monitoring with SQL Sentry Performance
Advisor (if you are not using our tool, you can collect SQL:BatchCompleted events in a similar way using
SQL Server Profiler).
By fair tests I mean that, for example, the quirky update method requires an actual update to static
data, which means changing the underlying schema or using a temp table / table variable. So I
structured the stored procedures to each create their own table variable, and either store the results
there, or store the raw data there and then update the result. The other issue I wanted to eliminate was
returning the data to the client so the procedures each have a debug parameter specifying whether to
return no results (the default), top/bottom 5, or all. In the performance tests I set it to return no results,
but of course validated each to ensure that they were returning the right results.
The stored procedures are all modeled this way (Ive attached a script that creates the database and the
stored procedures, so Im just including a template here for brevity):
CREATE PROCEDURE [dbo].[RunningTotals_]
@debug TINYINT = 0
-- @debug = 1 : show top/bottom 3
-- @debug = 2 : show all 10,000 rows
AS
BEGIN
SET NOCOUNT ON;
DECLARE @st TABLE
(
[Date] DATE PRIMARY KEY,
TicketCount INT,
RunningTotal INT
);
INSERT @st([Date], TicketCount, RunningTotal)
-- one of seven approaches used to populate @st
IF @debug = 1 -- show top 3 and last 3 to verify results
BEGIN
;WITH d AS
(
SELECT [Date], TicketCount, RunningTotal,
rn = ROW_NUMBER() OVER (ORDER BY [Date])
FROM @st
)
SELECT [Date], TicketCount, RunningTotal
FROM d
WHERE rn < 4 OR rn > 9997
ORDER BY [Date];
END
IF @debug = 2 -- show all
BEGIN
SELECT [Date], TicketCount, RunningTotal
FROM @st
ORDER BY [Date];
END
END
GO
And I called them in a batch as follows:
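(The exact batch isn't reproduced here; the sketch below follows the RunningTotals_<method> naming template above, so most of the procedure names are assumptions, with Windowed_Rows being the one named in the text.)
EXEC dbo.RunningTotals_InnerJoin;
GO
EXEC dbo.RunningTotals_Subquery;
GO
EXEC dbo.RunningTotals_QuirkyUpdate;
GO
EXEC dbo.RunningTotals_RecursiveCTE;
GO
EXEC dbo.RunningTotals_Cursor;
GO
EXEC dbo.RunningTotals_Windowed_Range;
GO
EXEC dbo.RunningTotals_Windowed_Rows;
WAITFOR DELAY '00:00:01'; -- pad the batch so it exceeds the Top SQL collection threshold
GO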
You can see the extra second I added to the Windowed_Rows batch; it wasn't getting caught by the Top
SQL threshold because it completed in only 40 milliseconds! This is clearly our best performer and, if we
have SQL Server 2012 available, it should be the method we use. The cursor is not half-bad, either, given
the performance or other issues with the remaining solutions. Plotting the duration on a graph is
pretty meaningless: two high points and five indistinguishable low points. But if I/O is your bottleneck,
you might find the visualization of reads interesting:
Conclusion
From these results we can draw a few conclusions:
1. Windowed aggregates in SQL Server 2012 make running totals computations (and many other
"next row(s)" / "previous row(s)" problems) dramatically more efficient. When I saw the low number of
reads I thought for sure there was some kind of mistake, that I must have forgotten to actually
perform any work. But no, you get the same number of reads if your stored procedure just performs
an ordinary SELECT from the SpeedingTickets table. (Feel free to test this yourself with STATISTICS IO.)
2. The issues I pointed out earlier about RANGE vs. ROWS yield slightly different runtimes (a duration
difference of about 6x; remember to ignore the second I added with WAITFOR), but read differences
are astronomical due to the on-disk spool. If your windowed aggregate can be solved using ROWS,
avoid RANGE, but you should test that both give the same result (or at least that ROWS gives the right
answer). You should also note that if you are using a similar query and you specify neither RANGE nor
ROWS, the plan will operate as if you had specified RANGE.
3. The subquery and inner join methods are relatively abysmal. 35 seconds to a minute to generate
these running totals? And this was on a single, skinny table without returning results to the
client. These comparisons can be used to show people why a purely set-based solution is not
always the best answer.
4. Of the faster approaches, assuming you are not yet ready for SQL Server 2012, and assuming
you discard both the quirky update method (unsupported) and the CTE date method (can't
guarantee a contiguous sequence), only the cursor performs acceptably. It has the highest
duration of the faster solutions, but the least amount of reads.
I hope these tests help give a better appreciation for the windowing enhancements that Microsoft has
added to SQL Server 2012. Please be sure to thank Itzik if you see him online or in person, since he was
the driving force behind these changes. In addition, I hope this helps open some minds out there that a
cursor may not always be the evil and dreaded solution it is often depicted to be.
(As an addendum, I did test the CLR function offered by Pavel Pawlowski, and the performance
characteristics were nearly identical to the SQL Server 2012 solution using ROWS. Reads were identical,
CPU was 78 vs. 47, and overall duration was 73 instead of 40. So if you won't be moving to SQL Server
2012 in the near future, you may want to add Pavel's solution to your tests.)
Attachments: RunningTotals_Demo.sql.zip (2kb)
currently using a CLR function and this is not it, I strongly recommend you deploy it and compare; I tested it
against a much simpler, VB-based CLR routine that was functionally equivalent, but performed about three
times worse.
So I took Adam's function, compiled the code to a DLL (using csc), and deployed just that file to the server.
Then I added the following assembly and function to my database:
because you may have a case where you can trust the input; for example, it is possible to use it for
comma-separated lists of integers or GUIDs.
Numbers table
This solution uses a Numbers table, which you must build and populate yourself. (We've been requesting a
built-in version for ages.) The Numbers table should contain enough rows to exceed the length of the longest
string you'll be splitting. In this case we'll use 1,000,000 rows:
is 50,000 characters long, and so on up to 1 row of 500,000 characters. I did this both to compare the same
amount of overall data being processed by the functions, as well as to try to keep my testing times somewhat
predictable.
I use a #temp table so that I can simply use GO <constant> to execute each batch a specific number of times:
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
DECLARE @string_type INT = <string_type>; -- 1-5 from above
After the hyperbolic 40-second performance for the Numbers table against 10 rows of 50,000
characters, I dropped it from the running for the last test. To better show the relative performance of
the four best methods in this test, I've dropped the Numbers results from the graph altogether:
Next, let's compare when we perform a search against the comma-separated value (e.g. return the rows
where one of the strings is "foo"). Again we'll use the five functions above, but we'll also compare the
result against a search performed at runtime using LIKE instead of bothering with splitting.
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
DECLARE @i INT = <string_type>, @search NVARCHAR(32) = N'foo';
;WITH s(st, sv) AS
(
SELECT string_type, string_value
FROM dbo.strings AS s
WHERE string_type = @i
)
SELECT s.string_type, s.string_value FROM s
CROSS APPLY dbo.SplitStrings_<method>(s.sv, ',') AS t
WHERE t.Item = @search;
SELECT s.string_type
FROM dbo.strings
WHERE string_type = @i
AND ',' + string_value + ',' LIKE '%,' + @search + ',%';
These results show that, for small strings, CLR was actually the slowest, and that the best solution is
going to be performing a scan using LIKE, without bothering to split the data up at all. Again I dropped
the Numbers table solution from the 5th approach, when it was clear that its duration would increase
exponentially as the size of the string went up:
And to better demonstrate the patterns for the top 4 results, I've eliminated the Numbers and XML
solutions from the graph:
Next, let's look at replicating the use case from the beginning of this post, where we're trying to find all
the rows in one table that exist in the list being passed in. As with the data in the table we created
above, we're going to create strings varying in length from 50 to 500,000 characters, store them in a
variable, and then check a common catalog view for rows existing in the list.
DECLARE
These results show that, for this pattern, several methods see their duration increase exponentially as
the size of the string goes up. At the lower end, XML keeps good pace with CLR, but this quickly
deteriorates as well. CLR is consistently the clear winner here:
And again without the methods that explode upward in terms of duration:
Finally, let's compare the cost of retrieving the data from a single variable of varying length, ignoring the
cost of reading data from a table. Again we'll generate strings of varying length, from 50 to 500,000
characters, and then just return the values as a set:
DECLARE
@i INT = <num>, -- value 1-5, yielding strings 50 - 500,000 characters
@x NVARCHAR(MAX) = N'a,id,xyz,abcd,abcde,sa,foo,bar,mort,splunge,bacon,';
SET @x = REPLICATE(@x, POWER(10, @i-1));
SET @x = SUBSTRING(@x, 1, LEN(@x)-1) + 'x';
SELECT Item FROM dbo.SplitStrings_<method>(@x, N',');
These results also show that CLR is fairly flat-lined in terms of duration, all the way up to 110,000 items
in the set, while the other methods keep decent pace until some time after 11,000 items:
Conclusion
In almost all cases, the CLR solution clearly out-performs the other approaches; in some cases it's a
landslide victory, especially as string sizes increase, and in a few others it's a photo finish that could fall
either way. In the first test we saw that XML and CTE out-performed CLR at the low end, so if this is a
typical use case *and* you are sure that your strings are in the 1 to 10,000 character range, one of those
approaches might be a better option. If your string sizes are less predictable than that, CLR is probably
still your best bet overall: you lose a few milliseconds at the low end, but you gain a whole lot at the
high end. Here are the choices I would make, depending on the task, with second place highlighted for
cases where CLR is not an option. Note that XML is my preferred method only if I know the input is
XML-safe; these may not necessarily be your best alternatives if you have less faith in your input.
The only real exception where CLR is not my choice across the board is the case where you're actually
storing comma-separated lists in a table, and then finding rows where a defined entity is in that list. In
that specific case, I would probably first recommend redesigning and properly normalizing the schema,
so that those values are stored separately, rather than using it as an excuse to not use CLR for splitting.
If you can't use CLR for other reasons, there isn't a clear-cut second place revealed by these tests; my
answers above were based on overall scale and not at any specific string size. Every solution here was
runner-up in at least one scenario, so while CLR is clearly the choice when you can use it, what you
should use when you cannot is more of an "it depends" answer; you'll need to judge based on your use
case(s) and the tests above (or by constructing your own tests) which alternative is better for you.
Addendum : An alternative to splitting in the first place
The above approaches require no changes to your existing application(s), assuming they are already
assembling a comma-separated string and throwing it at the database to deal with. One option you
should consider, if CLR is not an option and/or you can modify the application(s), is using Table-Valued
Parameters (TVPs). Here is a quick example of how to utilize a TVP in the above context. First,
create a table type with a single string column:
CREATE TYPE dbo.Items AS TABLE
(
Item NVARCHAR(4000)
);
Then the stored procedure can take this TVP as input, and join on the content (or use it in other ways; this is
just one example):
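(The procedure itself isn't shown above; a hypothetical version, using the dbo.Items type and the names referenced by the C# snippet below, might look like the following, where dbo.Teams and its columns are purely illustrative.)
CREATE PROCEDURE dbo.UpdateProfile
  @UserID    INT,
  @TeamNames dbo.Items READONLY
AS
BEGIN
  SET NOCOUNT ON;
  -- join directly against the TVP contents; no string splitting required
  SELECT t.TeamID, t.Name
  FROM dbo.Teams AS t
  INNER JOIN @TeamNames AS tn
    ON tn.Item = t.Name;
END
GO
The C# fragment that follows assumes a DataTable named tvp, with a single string column matching the table type, has already been built and populated.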
tvp.Rows.Add(someThing.someValue);
using (connectionObject)
{
SqlCommand cmd
= new SqlCommand("dbo.UpdateProfile", connectionObject);
cmd.CommandType = CommandType.StoredProcedure;
SqlParameter tvparam = cmd.Parameters.AddWithValue("@TeamNames", tvp);
tvparam.SqlDbType = SqlDbType.Structured;
// other parameters, e.g. userId
cmd.ExecuteNonQuery();
}
You might consider this to be a prequel to a follow-up post.
Of course this doesn't play well with JSON and other APIs, which are quite often the reason a
comma-separated string is being passed to SQL Server in the first place.
Then we need a couple of stored procedures to accept the lists from C#. For simplicity, again, we'll just
take a count so that we can be sure to perform a complete scan, and we'll ignore the count in the
application:
CREATE PROCEDURE dbo.SplitTest_UsingCLR
@list NVARCHAR(MAX)
AS
BEGIN
SET NOCOUNT ON;
SELECT c = COUNT(*)
FROM dbo.VersionStrings AS v
INNER JOIN dbo.SplitStrings_CLR(@list, N',') AS s
ON s.Item BETWEEN v.left_post AND v.right_post;
END
GO
CREATE PROCEDURE dbo.SplitTest_UsingTVP
@list dbo.VersionStringsTVP READONLY
AS
BEGIN
SET NOCOUNT ON;
SELECT c = COUNT(*)
FROM dbo.VersionStrings AS v
INNER JOIN @list AS l
ON l.VersionString BETWEEN v.left_post AND v.right_post;
END
GO
Note that a TVP passed into a stored procedure must be marked as READONLY; there is currently no
way to perform DML on the data like you would for a table variable or temp table. However, Erland has
submitted a very popular request that Microsoft make these parameters more flexible (and he provides
plenty of deeper insight behind his argument here).
The beauty here is that SQL Server no longer has to deal with splitting a string at all, neither in T-SQL
nor in handing it off to CLR, since the data is already in a set structure where SQL Server excels.
Next, a C# console application that does the following:
Accepts a number as an argument to indicate how many string elements should be defined
Builds a CSV string of those elements, using StringBuilder, to pass to the CLR stored procedure
Builds a DataTable with the same elements to pass to the TVP stored procedure
Also tests the overhead of converting a CSV string to a DataTable and vice-versa before calling
the appropriate stored procedures
The code for the C# app is found at the end of the article. I can spell C#, but I am by no means a guru; I
am sure there are inefficiencies you can spot there that may make the code perform a bit better. But
any such changes should affect the entire set of tests in a similar way.
I ran the application 10 times using 100, 1,000, 2,500 and 5,000 elements. The results were as follows
(this is showing average duration, in seconds, across the 10 tests):
Performance Aside
In addition to the clear performance difference, TVPs have another advantage: table types are much
simpler to deploy than CLR assemblies, especially in environments where CLR has been forbidden for
other reasons. I am hoping that barriers to CLR are gradually disappearing, and new tools are making
deployment and maintenance less painful, but I doubt deploying CLR will ever be easier than native
approaches.
On the other hand, on top of the read-only limitation, table types are like alias types in that they are
difficult to modify after the fact. If you want to change the size of a column or add a column, there is no
ALTER TYPE command, and in order to DROP the type and re-create it, you must first remove references
to the type from all procedures that are using it. So, for example, in the above case, if we needed to
increase the VersionString column to NVARCHAR(32), we'd have to create a dummy type and alter the
stored procedure (and any other procedure that is using it):
CREATE TYPE dbo.VersionStringsTVPCopy AS TABLE (VersionString NVARCHAR(32));
GO
ALTER PROCEDURE dbo.SplitTest_UsingTVP
@list dbo.VersionStringsTVPCopy READONLY
AS
...
GO
DROP TYPE dbo.VersionStringsTVP;
GO
CREATE TYPE dbo.VersionStringsTVP AS TABLE (VersionString NVARCHAR(32));
GO
ALTER PROCEDURE dbo.SplitTest_UsingTVP
@list dbo.VersionStringsTVP READONLY
AS
...
GO
DROP TYPE dbo.VersionStringsTVPCopy;
GO
(Or alternatively, drop the procedure, drop the type, re-create the type, and re-create the procedure.)
Conclusion
The TVP method consistently outperformed the CLR splitting method, and by a greater percentage as
the number of elements increased. Even adding in the overhead of converting an existing CSV string to a
DataTable yielded much better end-to-end performance. So I hope that, if I hadn't already convinced
you to abandon your T-SQL string splitting techniques in favor of CLR, I have urged you to give
table-valued parameters a shot. It should be easy to test out even if you're not currently using a
DataTable (or some equivalent).
Once I suspect this to be the case, I will usually jump to the Disk Activity tab in Performance
Advisor to see how tempdb is configured. Most times I actually see the same thing: a busy tempdb with
a single data file defined. From here I'll usually recommend reconfiguring tempdb, and direct them to a
resource like Jonathan's article for more information.
Number 4: Expecting Auto Update Statistics to Keep Statistics Updated
The problem here is that the thresholds for triggering auto statistics updates end up being the same in
most cases, even for a very large table. Without going into a very deep explanation, the threshold is
~20% of the rows in the table. So on a really big table it takes a lot of data change to trigger an update.
Kevin Kline has a nice, easy to follow explanation of this here as well.
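If you want to see how much change has accumulated against a table's statistics, later builds (SQL Server 2008 R2 SP2 and SQL Server 2012 SP1 onward) expose this through sys.dm_db_stats_properties; a quick check might look like this, where dbo.BigTable is just a placeholder:
SELECT s.name,
  sp.last_updated,
  sp.rows,
  sp.modification_counter,
  percent_modified = 100.0 * sp.modification_counter / NULLIF(sp.rows, 0)
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) AS sp
WHERE s.[object_id] = OBJECT_ID(N'dbo.BigTable');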
The reason this makes the list is that DBAs seem really surprised to find out that the auto update isn't
taking care of things the way the name implies. There are also many DBAs who believe it should be
handled by their maintenance job. Then, after looking at the maintenance, it turns out they are doing index
reorgs most of the time, and that won't update the statistics either (though a rebuild will). I also want to note
here that if you are using the Fragmentation Manager feature in SQL Sentry 7.0 and higher, you can
have a running history of when your indexes were reorganized rather than rebuilt. This can help you
decide whether the problem you're seeing could be related to auto update not happening.
The lesson here is really to keep an eye on statistics, and make sure they're updated regularly, especially
on large tables, which are becoming more and more common as time goes on. Another option here can
be to use trace flag 2371 to actually change the formula used to trigger the update. The nicest
explanation of this option I have found is at this blog post by Juergen Thomas.
Number 3: The CXPACKET Wait Type
I would say that this is hands down the single most common wait type I see on larger SQL Server systems
when someone asks me to look into query performance with them.
There is a lot of information out there on how to deal with this, but sadly I still see a lot of people make
the initial assumption that the problem should be solved by having either the query or the entire server
set MAXDOP to 1. More often than not the problem can be handled by proper indexing or statistics
maintenance. It could also be that the plan cached for this query is just not optimal, and you can mark it
for recompile using sp_recompile, set recompile at the query level, or just evict the plan using DBCC
FREEPROCCACHE with a plan handle. It is best to exhaust these options before deciding to change
MAXDOP to 1 because you could be throwing away a lot of processing power without realizing it.
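For reference, evicting a single plan rather than the whole cache looks roughly like this; the text filter is only an illustration, and you would substitute the handle returned by the first query:
-- locate the plan handle for the query in question
SELECT cp.plan_handle, st.[text]
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.[text] LIKE N'%some identifiable fragment of the query%';

-- then evict just that plan instead of clearing the entire cache:
-- DBCC FREEPROCCACHE (<plan_handle from the query above>);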
Paul Randal (blog|@PaulRandal) has a great survey on his blog here that seems to support what I'm
used to seeing as well. In fact, he's the one who first taught me that MAXDOP 1 is not necessarily the
answer to this.
Number 2: Misunderstood "Timeout Expired Prior to the Completion of..."
This one is huge. Outside of some very edge case behavior, there are two basic types of timeouts you
*might* deal with for SQL Server. These are connection timeouts and operation (or query) timeouts. In
both cases these are values set by the client connecting to the SQL Server. On the server side, there is a
remote query timeout setting, but this is the very edge case I mentioned and will leave that for another
discussion.
I'm going to focus on operation timeouts, since they are the most common. Operation timeout errors
from various software tools might be the most misunderstood situation I come across. The cause of
these really boils down to one simple thing, though: the client executing the command has set a
maximum amount of time that it will wait for the command to complete. If this maximum is reached
prior to completion, the command is aborted and an error is raised from the client.
Many times the timeout error will induce a panic mode, because the error can look a bit intimidating.
The reality is, though, that this is not much different than hitting the stop button in SSMS because the
query was taking too long. In fact, it will show up exactly the same in a profiler trace with Error = 2
(Aborted).
So, what does a timeout like this really tell us? It tells us that queries are taking longer than expected.
We should go into "performance tuning" mode rather than "something is broken" mode. The error
information from the client is really just some good information on where you might start to focus your
tuning efforts.
If you receive timeout errors from the SQL Sentry monitoring service, and one of the servers you are
monitoring is the source, this is not telling you that SQL Sentry is broken. This is SQL Sentry telling you
that this server is experiencing performance issues. Again, it is time for performance tuning mode.
These errors could be easily consumed internally, and retried later, but this would be doing our
customers a huge disservice. We believe that you should know about *any* potential problem on your
monitored server, even if it is SQL Sentry encountering the problem.
Incidentally, this is true for SQL Sentry, just as it is for any other system that uses an RDBMS for a
repository: your SQL Sentry database needs some TLC now and again. Without it you may indeed
experience some timeouts from your SQL Sentry client. We spend a lot of time tuning our queries for
performance before they ever make it out the door, but proper maintenance will ensure they continue
to run as intended.
Number 1: Memory Pressure
This is the big one. As soon as Kevin mentioned wanting this list, it's the first thing that popped into my
head. Not only because I see it so often, but also because it is so often mistaken for poor disk
performance.
There are lots of caches in SQL Server, but the most well-known is the data cache (aka buffer pool). The
easiest way to describe the data cache is that it is the data stored in memory, rather than persisted to
disk. Being able to store lots of data in memory long term is desirable because working with data in
memory is generally much quicker than having to perform physical IOs.
I could turn this post into a very long discussion on memory pressure in SQL Server at this point, but I
promise I will try to avoid that. There is already a ton of information available on this subject, and that is
not really the intent of this post. What I will say is that, usually, memory pressure manifests as a few
different symptoms. When viewed individually, some of these symptoms can lead you to incorrect, and
sometimes costly, conclusions.
The two misleading symptoms are that you may start to see higher than normal latency across the disk
subsystem, and you may start to see abnormally high waits related to disk activity. If you look at nothing
but these two symptoms, you may come to the conclusion that you need to work on your disk system.
This is why being presented with all relevant metrics on one dashboard is so important. You have to look
at the bigger picture, and having the memory-related data available along with the disk activity and
waits helps to paint a clearer picture of what is really going on.
Typically what I'll see (along with the disk waits and disk latency) is a PLE (Page Life Expectancy) that
is fairly low for this server. I describe it this way because what is good or bad for this value really
depends. The larger your buffer cache is, the higher your critical threshold will be for PLE. The more
data there is to churn in and out of the buffer, the worse off you will be when the churn actually
happens. Another consideration is NUMA. The way the PLE counter is calculated can cause this value
alone to be very misleading when multiple NUMA nodes are involved, as described by Paul Randal in a
blog post about why Page Life Expectancy isn't what you think. Luckily, in SQL Sentry 7.0 and higher, you can
actually see where PLE is for the individual NUMA nodes in history mode, which makes this a bit less of a
problem.
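If you want to check this outside of the tool, the per-node values are exposed alongside the instance-wide counter; for example:
SELECT [object_name], instance_name, cntr_value AS page_life_expectancy
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Page life expectancy'
AND ([object_name] LIKE N'%Buffer Manager%' OR [object_name] LIKE N'%Buffer Node%');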
I'll usually also see consistently higher lazy writer activity, and SQL Server page faults (SQL Server going
to disk). Sometimes I'll see what I call "buffer tearing." It's basically when the data buffer is up and down
frequently, creating a jagged (or torn) edge on the history chart in Performance Advisor. Finally, I may
also see an abnormally large plan cache reducing available memory for the data cache.
All of these things together spell memory pressure, and there are various ways to deal with them, but
the important thing to note is that this is not a disk issue. That's not to say your disk system is
necessarily wonderful, but I wouldn't call up your SAN guy and order a bunch of new
hardware based on this situation. Once you get the memory pressure situation under control, SQL
Server will not need to go to disk as much, and the few symptoms related to disk may disappear
entirely!
The moral here is really to always consider the full picture of performance, because looking at one thing
out of context could severely limit your options for a solution.
Honorable Mention: SQL Server Agent History Retention Settings Unlimited
We see this enough to include it in this list, and I think anyone that uses SQL Server Agent should be
aware of it.
In SQL Server Agent Properties, under History, you can adjust retention settings.
For some reason, I've seen quite a few people set this to unlimited by unchecking both checkboxes. If
you do this, and you use Agent jobs frequently, eventually you're going to run into problems with job
history in msdb, because these tables aren't really indexed very well. The settings I'm using above are
generally fine for most cases, and if you're using SQL Sentry Event Manager, you're keeping this
information in the SQL Sentry database anyway, so retaining it here is just redundant.
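If the history tables have already ballooned, you can trim them manually before tightening the retention settings; for example (the 30-day cutoff here is arbitrary):
DECLARE @cutoff DATETIME = DATEADD(DAY, -30, GETDATE());
EXEC msdb.dbo.sp_purge_jobhistory @oldest_date = @cutoff;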
Conclusion
So there are my (current) top 5 most common SQL Server performance issues/topics. For #4 and #5, I
actually had to run some numbers to find out what they were, but for the top three, I knew without
having to consider it much at all. Thanks for reading!
Just let the engine handle it, and bubble any exception back to the caller.
Use TRY/CATCH with ROLLBACK in the CATCH block (SQL Server 2005+).
And many take the approach that they should check if they're going to incur the violation first, since it
seems cleaner to handle the duplicate yourself than to force the engine to do it. My theory is that you
should trust but verify; for example, consider this approach (mostly pseudo-code):
IF NOT EXISTS ([row that would incur a violation])
BEGIN
BEGIN TRY
BEGIN TRANSACTION;
INSERT ()...
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
-- well, we incurred a violation anyway;
-- I guess a new row was inserted or
-- updated since we performed the check
ROLLBACK TRANSACTION;
END CATCH
END
We know that the IF NOT EXISTS check does not guarantee that someone else won't have inserted the
row by the time we get to the INSERT (unless we place aggressive locks on the table and/or
use SERIALIZABLE), but the outer check does prevent us from trying to commit a failure and then having
to roll back. We stay out of the entire TRY/CATCH structure if we already know that the INSERT will fail,
and it would be logical to assume that at least in some cases this will be more efficient than entering
the TRY/CATCH structure unconditionally. This makes little sense in a single INSERT scenario, but
imagine a case where there is more going on in that TRY block (and more potential violations that you
could check for in advance, meaning even more work that you might otherwise have to perform and
then roll back should a later violation occur).
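One way to make the check itself safe under concurrency (sketched here rather than taken from the tests) is to hold a range lock on the key being checked for the duration of the transaction:
DECLARE @Name NVARCHAR(255) = N'some_new_name';

BEGIN TRANSACTION;

IF NOT EXISTS
(
  SELECT 1
  FROM dbo.[Objects] WITH (UPDLOCK, HOLDLOCK) -- serializes access to this key range
  WHERE Name = @Name
)
BEGIN
  INSERT dbo.[Objects](Name) VALUES (@Name);
END

COMMIT TRANSACTION;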
Now, it would be interesting to see what would happen if you used a non-default isolation level
(something I'll treat in a future post), particularly with concurrency. For this post, though, I wanted to
start slowly, and test these aspects with a single user. I created a table called dbo.[Objects], a very
simplistic table:
CREATE TABLE dbo.[Objects]
(
ObjectID INT IDENTITY(1,1),
Name NVARCHAR(255) PRIMARY KEY
);
GO
I wanted to populate this table with 100,000 rows of sample data. To make the values in the name
column unique (since the PK is the constraint I wanted to violate), I created a helper function that takes
a number of rows and a minimum string. The minimum string would be used to make sure that either (a)
the set started off beyond the maximum value in the Objects table, or (b) the set started at the
minimum value in the Objects table. (I will specify these manually during the tests, verified simply by
inspecting the data, though I probably could have built that check into the function.)
CREATE FUNCTION dbo.GenerateRows(@n INT, @minString NVARCHAR(32))
RETURNS TABLE
AS
RETURN
(
SELECT TOP (@n) name = name + '_' + RTRIM(rn)
FROM
(
SELECT a.name, rn = ROW_NUMBER() OVER
(PARTITION BY a.name ORDER BY a.name)
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b
WHERE a.name >= @minString
AND b.name >= @minString
) AS x
);
GO
This applies a CROSS JOIN of sys.all_objects onto itself, appending a unique row_number to each name,
so the first 10 results would look like this:
INSERT dbo.[Objects](name)
SELECT name FROM dbo.GenerateRows(100000, N'')
ORDER BY name;
GO
Now, since we are going to be inserting new unique values into the table, I created a procedure to
perform some cleanup at the beginning and end of each test: in addition to deleting any new rows
we've added, it will also clean up the cache and buffers. Not something you want to code into a
procedure on your production system, of course, but quite fine for local performance testing.
CREATE PROCEDURE dbo.EH_Cleanup
-- P.S. "EH" stands for Error Handling, not "Eh?"
AS
BEGIN
SET NOCOUNT ON;
DELETE dbo.[Objects] WHERE ObjectID > 100000;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
END
GO
I also created a log table to keep track of the start and end times for each test:
CREATE TABLE dbo.RunTimeLog
(
LogID               INT IDENTITY(1,1),
Spid                INT,
InsertType          VARCHAR(255),
ErrorHandlingMethod VARCHAR(255),
StartDate           DATETIME2(7) NOT NULL DEFAULT SYSUTCDATETIME(),
EndDate             DATETIME2(7)
);
GO
Finally, the testing stored procedure handles a variety of things. We have three different error handling
methods, as described in the bullets above: JustInsert, Rollback, and TryCatch; we also have three
different insert types: (1) all inserts succeed (all rows are unique), (2) all inserts fail (all rows are duplicates),
and (3) half inserts succeed (half the rows are unique, and half the rows are duplicates). Coupled with this are
two different approaches: check for the violation before attempting the insert, or just go ahead and let the
engine determine if it is valid. I thought this would give a good comparison of the different error handling
techniques combined with different likelihoods of collisions to see whether a high or low collision percentage
would significantly impact the results.
For these tests I picked 40,000 rows as my total number of insert attempts, and in the procedure I perform a
union of 20,000 unique or non-unique rows with 20,000 other unique or non-unique rows. You can see that I
hard-coded the cutoff strings in the procedure; please note that on your system these cutoffs will almost
certainly occur in a different place.
CREATE PROCEDURE dbo.EH_Insert
@ErrorHandlingMethod VARCHAR(255),
@InsertType          VARCHAR(255),
@RowSplit            INT = 20000
AS
BEGIN
SET NOCOUNT ON;
-- clean up any new rows and drop buffers/clear proc cache
EXEC dbo.EH_Cleanup;
DECLARE
@CutoffString1 NVARCHAR(255),
@CutoffString2 NVARCHAR(255),
@Name NVARCHAR(255),
@Continue BIT = 1,
@LogID INT;
IF @InsertType = 'AllSuccess'
SELECT @CutoffString1 = N'database_audit_specifications_1000',
@CutoffString2 = N'dm_clr_properties_1398';
-- if we want them all to fail, then it's easy, we can just
-- union two sets that start at the same place as the initial
-- population:
IF @InsertType = 'AllFail'
SELECT @CutoffString1 = N'', @CutoffString2 = N'';
-- and if we want half to succeed, we need 20,000 unique
-- values, and 20,000 duplicates:
IF @InsertType = 'HalfSuccess'
SELECT @CutoffString1 = N'database_audit_specifications_1000',
@CutoffString2 = N'';
DECLARE c CURSOR
LOCAL STATIC FORWARD_ONLY READ_ONLY
FOR
SELECT name FROM dbo.GenerateRows(@RowSplit, @CutoffString1)
UNION ALL
SELECT name FROM dbo.GenerateRows(@RowSplit, @CutoffString2);
OPEN c;
FETCH NEXT FROM c INTO @Name;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @Continue = 1;
-- only enter the primary code block if the duplicate check
-- comes back empty (in other words, don't even try if we
-- already know we have a duplicate)
The graph that plots all of the durations at once shows a couple of serious outliers:
You can see that, in cases where we expect a high rate of failure (in this test, 100%), beginning a
transaction and rolling back is by far the least attractive approach (3.59 milliseconds per attempt), while
just letting the engine raise an error is about half as bad (1.785 milliseconds per attempt). The next
worst performer was the case where we begin a transaction then roll it back, in a scenario where we
expect about half of the attempts to fail (averaging 1.625 milliseconds per attempt). The 9 cases on the
left side of the graph, where we are checking for the violation first, did not venture above 0.515
milliseconds per attempt.
Having said that, the individual graphs for each scenario (high % of success, high % of failure, and 50-50)
really drive home the impact of each method.
Where all the inserts succeed
In this case we see that the overhead of checking for the violation first is negligible, with an average
difference of 0.7 seconds across the batch (or 125 microseconds per insert attempt):
What does this tell us? If we think we are going to have a high rate of failure, or have no idea what our
potential failure rate will be, then checking first to avoid violations in the engine is going to be
tremendously worth our while. Even in the case where we have a successful insert every time, the cost
of checking first is marginal and easily justified by the potential cost of handling errors later (unless your
anticipated failure rate is exactly 0%).
So for now I think I will stick to my theory that, in simple cases, it makes sense to check for a potential
violation before telling SQL Server to go ahead and insert anyway. In a future post, I will look at the
performance impact of various isolation levels, concurrency, and maybe even a few other error handling
techniques.
[As an aside, I wrote a condensed version of this post as a tip for mssqltips.com back in February.]
This is local, so of course the following server-level setting to allow remote admin connections has no
effect in this specific scenario:
EXEC sp_configure 'remote admin connections', 1;
GO
RECONFIGURE;
GO
I found that I could connect if I enabled trace flag 7806, even though that trace flag is meant for SQL
Server Express (as documented here). But I knew the problem had to be deeper than this; Microsoft
couldn't have totally broken this feature, right?
It turns out that this symptom only affects *named* instances. I was talking about this with Jonathan
Kehayias, who had a default instance, and could connect fine. However, he couldn't connect if he
explicitly specified the port number, which led him to discover that TCP/IP was disabled.
While this affects named instances of Developer Edition specifically because the TCP/IP protocol is
disabled by default, there are other scenarios where this can hurt you if you have named instances and
To resolve this, make sure that TCP/IP is enabled via the SQL Server Configuration Manager > Network
Protocols for <named instance> and make sure that the SQL Server Browser Service is running. You will
need to restart SQL Server.
Now, when you are able to connect via the DAC, if you try to connect within Management Studio, you
will get this error message:
This error message is benign (and I believe comes from the background IntelliSense connection). You can
see from your status bar that you are connected, and you can verify your connection is the DAC
connection by dismissing this error message and running a query.
In any case, confirming that you are able to connect via the DAC is an absolutely essential step in your
disaster recovery plan. If you can't connect to the DAC, you should plan for one or both of the following
actions during your next maintenance window (or earlier, if you can afford a service restart):
enable TCP/IP
In either case, ensure the SQL Server Browser Service is running. Also be sure the server setting to
enable remote connections is enabled, since you never know where you might be when you need to
access an unresponsive server.
Kendra Little wrote a great blog post about the DAC last year. It's fun to root around and see what you
can do with the DAC, and it's really nice to know it's there, but it's also important to know how it might
not be able to help you in the event of actual server hardship.
Just from casual observance, we can see that the median for the table with odd rows should be 6, and
for the even table it should be 7.5 ((6+9)/2). So now let's see some solutions that have been used over
the years:
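The scripts that created those two tables aren't shown above; a minimal stand-in that yields the same medians (the values, and the OddRows name, are illustrative assumptions) would be:
CREATE TABLE dbo.OddRows  (id INT IDENTITY(1,1) PRIMARY KEY, val DECIMAL(12, 2));
CREATE TABLE dbo.EvenRows (id INT IDENTITY(1,1) PRIMARY KEY, val DECIMAL(12, 2));

INSERT dbo.OddRows(val)  VALUES (1), (2), (6), (9), (10);       -- median = 6
INSERT dbo.EvenRows(val) VALUES (1), (2), (6), (9), (10), (11); -- median = (6 + 9) / 2 = 7.5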
SQL Server 2000
In SQL Server 2000, we were constrained to a very limited T-SQL dialect. I'm investigating these options
for comparison because some people out there are still running SQL Server 2000, and others may have
upgraded but, since their median calculations were written back in the day, the code might still look
like this today.
2000_A: max of one half, min of the other
This approach takes the highest value from the first 50 percent, the lowest value from the last 50
percent, then divides them by two. This works for even or odd rows because, in the even case, the two
values are the two middle rows, and in the odd case, the two values are actually from the same row.
SELECT @Median = (
(SELECT MAX(val) FROM
(SELECT TOP 50 PERCENT val
FROM dbo.EvenRows ORDER BY val, id) AS t)
+ (SELECT MIN(val) FROM
(SELECT TOP 50 PERCENT val
FROM dbo.EvenRows ORDER BY val DESC, id DESC) AS b)
) / 2.0;
2000_B: #temp table
This example first creates a #temp table, and using the same type of math as above, determines the two
middle rows with assistance from a contiguous IDENTITY column ordered by the val column. (The
order of assignment of IDENTITY values can only be relied upon because of the MAXDOP setting.)
CREATE TABLE #x
(
i INT IDENTITY(1,1),
val DECIMAL(12, 2)
);
CREATE CLUSTERED INDEX v ON #x(val);
INSERT #x(val)
SELECT val
FROM dbo.EvenRows
ORDER BY val OPTION (MAXDOP 1);
SELECT @Median = AVG(val)
FROM #x AS x
WHERE EXISTS
(
SELECT 1
FROM #x
WHERE x.i - (SELECT MAX(i) / 2.0 FROM #x) IN (0, 0.5, 1)
);
SQL Server 2005, 2008, 2008 R2
SQL Server 2005 introduced some interesting new window functions, such as ROW_NUMBER(), which
can help solve statistical problems like median a little easier than we could in SQL Server 2000. These
approaches all work in SQL Server 2005 and above:
2005_A: dueling row numbers
This example uses ROW_NUMBER() to walk up and down the values once in each direction, then finds
the middle one or two rows based on that calculation. This is quite similar to the first example above,
with easier syntax:
SELECT @Median = AVG(1.0 * val)
FROM
(
SELECT val,
ra = ROW_NUMBER() OVER (ORDER BY val, id),
rd = ROW_NUMBER() OVER (ORDER BY val DESC, id DESC)
FROM dbo.EvenRows
) AS x
WHERE ra BETWEEN rd - 1 AND rd + 1;
2005_B: row number + count
This one is quite similar to the above, using a single calculation of ROW_NUMBER() and then using the
total COUNT() to find the middle one or two rows:
SELECT @Median = AVG(1.0 * Val)
FROM
(
SELECT val,
c = COUNT(*) OVER (),
rn = ROW_NUMBER() OVER (ORDER BY val)
FROM dbo.EvenRows
) AS x
WHERE rn IN ((c + 1)/2, (c + 2)/2);
2005_C: variation on row number + count
Fellow MVP Itzik Ben-Gan showed me this method, which achieves the same answer as the above two
methods, but in a very slightly different way:
SELECT @Median = AVG(1.0 * val)
FROM
(
And these metrics don't change much at all if we operate against a heap instead. The biggest percentage
change was the method that still ended up being the fastest: the paging trick using OFFSET / FETCH:
Here is a graphical representation of the results. To make it more clear, I highlighted the slowest
performer in red and the fastest approach in green.
I was surprised to see that, in both cases, PERCENTILE_CONT(), which was designed for this type of
calculation, is actually worse than all of the other earlier solutions. I guess it just goes to show that
while sometimes newer syntax might make our coding easier, it doesn't always guarantee that
performance will improve. I was also surprised to see OFFSET / FETCH prove to be so useful in scenarios
that usually wouldn't seem to fit its purpose: pagination.
In any case, I hope I have demonstrated which approach you should use, depending on your version of
SQL Server (and that the choice should be the same whether or not you have a supporting index for the
calculation).
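Since the queries for those two newer approaches don't appear above, here is a rough sketch of each against the same table (not necessarily the exact form used in the tests):
-- the OFFSET / FETCH "paging trick" (SQL Server 2012)
DECLARE @c BIGINT = (SELECT COUNT(*) FROM dbo.EvenRows);

SELECT Median = AVG(1.0 * val)
FROM
(
  SELECT val
  FROM dbo.EvenRows
  ORDER BY val
  OFFSET (@c - 1) / 2 ROWS
  FETCH NEXT 2 - @c % 2 ROWS ONLY
) AS x;

-- PERCENTILE_CONT() (SQL Server 2012)
SELECT DISTINCT
  Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY val) OVER ()
FROM dbo.EvenRows;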
COMMIT TRANSACTION;
-- if successful:
EXEC sp_rename N'dbo.Lookup_Fake', N'dbo.Lookup_Shadow';
The downside to this initial approach was that sp_rename has a non-suppressible output message
warning you about the dangers of renaming objects. In our case we performed this task through SQL
Server Agent jobs, and we handled a lot of metadata and other cache tables, so the job history was
flooded with all these useless messages and actually caused real errors to be truncated from the history
details. (I complained about this in 2007, but my suggestion was ultimately dismissed and closed as
"Won't Fix.")
A Better Solution : Schemas
Once we upgraded to SQL Server 2005, I discovered this fantastic command called CREATE SCHEMA. It
was trivial to implement the same type of solution using schemas instead of renaming tables, and now
the Agent history wouldn't be polluted with all of these unhelpful messages. Basically I created two new
schemas:
CREATE SCHEMA fake AUTHORIZATION dbo;
CREATE SCHEMA shadow AUTHORIZATION dbo;
Then I moved the Lookup_Shadow table into the shadow schema, and renamed it:
ALTER SCHEMA shadow TRANSFER dbo.Lookup_Shadow;
EXEC sp_rename N'shadow.Lookup_Shadow', N'Lookup';
(If you are just implementing this solution, you'd be creating a new copy of the table in the schema, not
moving the existing table there and renaming it.)
With those two schemas in place, and a copy of the Lookup table in the shadow schema, my three-way
rename became a three-way schema transfer:
TRUNCATE TABLE shadow.Lookup;
INSERT shadow.Lookup([cols])
SELECT [cols] FROM [source];
-- perhaps an explicit statistics update here
BEGIN TRANSACTION;
ALTER SCHEMA fake TRANSFER dbo.Lookup;
ALTER SCHEMA dbo TRANSFER shadow.Lookup;
COMMIT TRANSACTION;
ALTER SCHEMA shadow TRANSFER fake.Lookup;
At this point you can of course empty out the shadow copy of the table; however, in some cases I found it
useful to leave the old copy of the data around for troubleshooting purposes.
Foreign Keys
This wont work out of the box if the lookup table is referenced by foreign keys. In our case we
didnt point any constraints at these cache tables, but if you do, you may have to stick with
intrusive methods such as MERGE. Or use append-only methods and disable or drop the foreign
keys before performing any data modifications (then re-create or re-enable them afterward). If
you stick with MERGE / UPSERT techniques and you're doing this between servers or, worse yet,
from a remote system, I highly recommend getting the raw data locally rather than trying to use
these methods between servers.
Statistics
Switching the tables (using rename or schema transfer) will lead to statistics flipping back and
forth between the two copies of the table, and this can obviously be an issue for plans. So you
may consider adding explicit statistics updates as part of this process.
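For example, a minimal sketch of such an explicit update, run right after the transfer completes (assuming a full scan is affordable in your maintenance window):

UPDATE STATISTICS dbo.Lookup WITH FULLSCAN;  -- FULLSCAN is an assumption; sample as appropriate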
Other Approaches
There are of course other ways to do this that I simply haven't had the occasion to try. Partition
switching and using a view + synonym are two approaches I may investigate in the future for a
more thorough treatment of the topic. I'd be interested to hear your experiences and how
you've solved this problem in your environment. And yes, I realize that this problem is largely
solved by Availability Groups and readable secondaries in SQL Server 2012, but I consider it a
trick shot if you can solve the problem without throwing high-end licenses at the problem, or
replicating an entire database to make a few tables redundant. :-)
Conclusion
If you can live with the limitations here, this approach may well be a better performer than a scenario
where you essentially take a table offline using SSIS or your own MERGE / UPSERT routine, but please be
sure to test both techniques. The most significant point is that the end user accessing the table should
have the exact same experience, any time of the day, even if they hit the table in the middle of your
periodic update.
Conditional Order By
By Aaron Bertrand
A common scenario in many client-server applications is allowing the end user to dictate the sort order of
results. Some people want to see the lowest priced items first, some want to see the newest items first, and
some want to see them alphabetically. This is a complex thing to achieve in Transact-SQL because you can't
just say:
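To illustrate, a minimal sketch of the kind of naive attempt I mean (the table and parameter names mirror the examples used later in this article):

DECLARE @SortColumn SYSNAME = N'name';

SELECT key_col, [object_id], name, type_desc, modify_date
FROM dbo.sys_objects
ORDER BY @SortColumn;  -- fails: variables are only allowed when ordering by an
                       -- expression referencing a column name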
(And when the error message says "an expression referencing a column name," you might find it ambiguous,
and I agree. But I can assure you that this does not mean a variable is a suitable expression.)
If you try to append @SortDirection, the error message is a little more opaque:
Msg 102, Level 15, State 1, Line x
Incorrect syntax near '@SortDirection'.
There are a few ways around this, and your first instinct might be to use dynamic SQL, or to introduce the
CASE expression. But as with most things, there are complications that can force you down one path or
another. So which one should you use? Let's explore how these solutions might work, and compare the
impacts on performance for a few different approaches.
Sample Data
Using a catalog view we all probably understand quite well, sys.all_objects, I created the following table
based on a cross join, limiting the table to 100,000 rows (I wanted data that filled many pages but that didn't
take significant time to query and test):
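The table definition itself isn't reproduced in this excerpt, but a minimal sketch of how such a table might be built (the column names come from the queries below; everything else, including the clustered index, is an assumption):

SELECT TOP (100000)
    key_col = ROW_NUMBER() OVER (ORDER BY s1.[object_id]),
    s1.[object_id],
    s1.name,
    s1.type_desc,
    s1.modify_date
INTO dbo.sys_objects
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2;

CREATE UNIQUE CLUSTERED INDEX pk_sys_objects ON dbo.sys_objects(key_col);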
We'll leave the key_col ordering as the default because it should be the most efficient if the user doesn't
have a preference; since the key_col is an arbitrary surrogate that should mean nothing to the user (and may
not even be exposed to them), there is no reason to allow reverse sorting on that column.
Approaches That Don't Work
The most common approach I see when someone first starts to tackle this problem is introducing control-of-flow logic to the query. They expect to be able to do this:
SELECT key_col, [object_id], name, type_desc, modify_date
FROM dbo.sys_objects
ORDER BY
IF @SortColumn = 'key_col'
key_col
IF @SortColumn = 'object_id'
[object_id]
IF @SortColumn = 'name'
name
...
IF @SortDirection = 'ASC'
ASC
ELSE
DESC;
This obviously doesn't work. Next I see CASE being introduced incorrectly, using similar syntax:
SELECT key_col, [object_id], name, type_desc, modify_date
FROM dbo.sys_objects
ORDER BY CASE @SortColumn
WHEN 'key_col' THEN key_col
WHEN 'object_id' THEN [object_id]
WHEN 'name' THEN name
...
END CASE @SortDirection WHEN 'ASC' THEN ASC ELSE DESC END;
This is closer, but it fails for two reasons. One is that CASE is an expression that returns exactly one value of a
specific data type; this merges data types that are incompatible and therefore will break the CASE expression.
The other is that there is no way to conditionally apply the sort direction this way without using dynamic SQL.
Approaches That Do Work
The three primary approaches I've seen are as follows:
Group compatible types and directions together
In order to use CASE with ORDER BY, there must be a distinct expression for each combination of compatible
types and directions. In this case we would have to use something like this:
CREATE PROCEDURE dbo.Sort_CaseExpanded
    -- the parameter names come from the body below; the data types and defaults
    -- are assumptions added here so the fragment compiles
    @SortColumn SYSNAME = N'key_col',
    @SortDirection VARCHAR(4) = 'ASC'
AS
BEGIN
    SELECT key_col, [object_id], name, type_desc, modify_date
    FROM dbo.sys_objects
    ORDER BY
CASE WHEN @SortDirection = 'ASC' THEN
CASE @SortColumn
WHEN 'key_col' THEN key_col
WHEN 'object_id' THEN [object_id]
END
END,
CASE WHEN @SortDirection = 'DESC' THEN
CASE @SortColumn
WHEN 'key_col' THEN key_col
WHEN 'object_id' THEN [object_id]
END
END DESC,
CASE WHEN @SortDirection = 'ASC' THEN
CASE @SortColumn
WHEN 'name' THEN name
WHEN 'type_desc' THEN type_desc
END
END,
CASE WHEN @SortDirection = 'DESC' THEN
CASE @SortColumn
WHEN 'name' THEN name
WHEN 'type_desc' THEN type_desc
END
END DESC,
CASE WHEN @SortColumn = 'modify_date'
AND @SortDirection = 'ASC' THEN modify_date
END,
CASE WHEN @SortColumn = 'modify_date'
AND @SortDirection = 'DESC' THEN modify_date
END DESC;
END
You might say, "wow, that's an ugly bit of code," and I would agree with you. I think this is why a lot of folks
cache their data on the front end and let the presentation tier deal with juggling it around in different orders.
:-)
You can collapse this logic a little bit further by converting all the non-string types into strings that will sort
correctly, e.g.
CREATE PROCEDURE dbo.Sort_CaseCollapsed
    -- the parameter names come from the body below; the data types and defaults
    -- are assumptions added here so the fragment compiles
    @SortColumn SYSNAME = N'key_col',
    @SortDirection VARCHAR(4) = 'ASC'
AS
BEGIN
    SELECT key_col, [object_id], name, type_desc, modify_date
    FROM dbo.sys_objects
    ORDER BY
CASE WHEN @SortDirection = 'ASC' THEN
CASE @SortColumn
WHEN 'key_col' THEN RIGHT('000000000000' + RTRIM(key_col), 12)
WHEN 'object_id' THEN
RIGHT(COALESCE(NULLIF(LEFT(RTRIM([object_id]),1),'-'),'0')
+ REPLICATE('0', 23) + RTRIM([object_id]), 24)
WHEN 'name'
THEN name
WHEN 'type_desc' THEN type_desc
WHEN 'modify_date' THEN CONVERT(CHAR(19), modify_date, 120)
END
END,
CASE WHEN @SortDirection = 'DESC' THEN
CASE @SortColumn
WHEN 'key_col' THEN RIGHT('000000000000' + RTRIM(key_col), 12)
WHEN 'object_id' THEN
RIGHT(COALESCE(NULLIF(LEFT(RTRIM([object_id]),1),'-'),'0')
+ REPLICATE('0', 23) + RTRIM([object_id]), 24)
WHEN 'name' THEN name
WHEN 'type_desc' THEN type_desc
WHEN 'modify_date' THEN CONVERT(CHAR(19), modify_date, 120)
END
END DESC;
END
Still, it's a pretty ugly mess, and you have to repeat the expressions twice to deal with the different sort
directions. I would also suspect that using OPTION RECOMPILE on that query would prevent you from being
stung by parameter sniffing. Except in the default case, it's not like the majority of the work being done here
is going to be compilation.
Apply a rank using window functions
I discovered this neat trick from AndriyM, though it is most useful in cases where all of the potential ordering
columns are of compatible types, otherwise the expression used for ROW_NUMBER() is equally complex. The
most clever part is that in order to switch between ascending and descending order, we simply multiply the
ROW_NUMBER() by 1 or -1. We can apply it in this situation as follows:
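The full code isn't reproduced in this excerpt, but a minimal sketch of the idea (abbreviated to two sort columns, and reusing the string-collapsing expressions shown earlier) looks like this:

SELECT key_col, [object_id], name, type_desc, modify_date
FROM
(
    SELECT key_col, [object_id], name, type_desc, modify_date,
        rn = ROW_NUMBER() OVER
        (
            ORDER BY CASE @SortColumn
                WHEN 'key_col' THEN RIGHT('000000000000' + RTRIM(key_col), 12)
                WHEN 'name'    THEN name
            END
        ) * CASE @SortDirection WHEN 'ASC' THEN 1 ELSE -1 END
    FROM dbo.sys_objects
) AS x
ORDER BY rn;  -- negative rank values reverse the order for DESC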
Conclusion
For outright performance, dynamic SQL wins every time (though only by a small margin on this data set). The
ROW_NUMBER() approach, while clever, was the loser in each test (sorry AndriyM).
It gets even more fun when you want to introduce a WHERE clause, never mind paging. These three are like
the perfect storm for introducing complexity to what starts out as a simple search query. The more
permutations your query has, the more likely you'll want to throw readability out the window and use
dynamic SQL in combination with the optimize for ad hoc workloads setting to minimize the impact of
single-use plans in your plan cache.
So here are my slightly revised preferred methods, for each type of task:
You'll notice that CLR has remained my method of choice, except in the one case where splitting doesn't
make sense. And in cases where CLR is not an option, the XML and CTE methods are generally more
efficient, except in the case of single variable splitting, where Jeff's function may very well be the best
option. But given that I might need to support more than 4,000 characters, the Numbers table solution
just might make it back onto my list in specific situations where I'm not allowed to use CLR.
I promise that my next post involving lists will not talk about splitting at all, via T-SQL or CLR, and will
demonstrate how to simplify this problem regardless of data type.
As an aside, I noticed this comment in one of the versions of Jeff's functions that was posted in the
comments:
I also thank whoever wrote the first article I ever saw on numbers tables which is located at the
following URL and to Adam Machanic for leading me to it many years ago.
http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html
That article was written by me in 2004. So whoever added the comment to the function, you're
welcome. :-)
0 = stock wheels
1 = 17" wheels
2 = 18" wheels
4 = upgraded tires
So possible combinations are:
0 = no upgrade
1 = upgrade to 17" wheels only
2 = upgrade to 18" wheels only
4 = upgrade tires only
5 = 1 + 4 = upgrade to 17" wheels and better tires
6 = 2 + 4 = upgrade to 18" wheels and better tires
Let's set aside arguments, at least for now, about whether this should be packed into a single TINYINT in
the first place, or stored as separate columns, or use an EAV model; fixing the design is a separate issue.
This is about working with what you have.
To make the examples useful, let's fill this table up with a bunch of random data. (And we'll assume, for
simplicity, that this table contains only orders that haven't yet shipped.) This will insert 50,000 rows of
roughly equal distribution between the six option combinations:
;WITH n AS
(
SELECT n,Flag FROM (VALUES(1,0),(2,1),(3,2),(4,4),(5,5),(6,6)) AS n(n,Flag)
)
INSERT dbo.CarOrders
(
OrderID,
WheelFlag,
OrderDate
)
SELECT x.rn, n.Flag, DATEADD(DAY, x.rn/100, '20100101')
FROM n
INNER JOIN
(
SELECT TOP (50000)
n = (ABS(s1.[object_id]) % 6) + 1,
rn = ROW_NUMBER() OVER (ORDER BY s2.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
) AS x
ON n.n = x.n;
If we look at the breakdown, we can see this distribution. Note that your results may differ slightly from
mine depending on the objects in your system:
SELECT WheelFlag, [Count] = COUNT(*)
FROM dbo.CarOrders
GROUP BY WheelFlag;
Results:
WheelFlag   Count
---------   -----
0           7654
1           8061
2           8757
4           8682
5           8305
6           8541
Now let's say it's Tuesday, and we just got a shipment of 18" wheels, which were previously out of stock. This
means we are able to satisfy all of the orders that require 18" wheels: both those that upgraded tires (6),
and those that did not (2). So we *could* write a query like the following:
SELECT OrderID
FROM dbo.CarOrders
WHERE WheelFlag IN (2,6);
In real life, of course, you can't really do that; what if more options are added later, like wheel locks, lifetime
wheel warranty, or multiple tire options? You don't want to have to write a series of IN() values for every
possible combination. Instead we can write a BITWISE AND operation, to find all the rows where the 2nd bit
is set, such as:
DECLARE @Flag TINYINT = 2;
SELECT OrderID
FROM dbo.CarOrders
WHERE WheelFlag & @Flag = @Flag;
This gets me the same results as the IN() query, but if I compare them using SQL Sentry Plan Explorer,
the performance is quite different:
It's easy to see why. The first uses an index seek to isolate the rows that satisfy the query, with a filter
on the WheelFlag column:
The second uses a scan, coupled with an implicit convert, and terribly inaccurate statistics. All due to the
BITWISE AND operator:
So what does this mean? At the heart of it, this tells us that the BITWISE AND operation is not sargable.
But all hope is not lost.
If we ignore the DRY principle for a moment, we can write a slightly more efficient query by being a bit
redundant in order to take advantage of the index on the WheelFlag column. Assuming that we're after
any WheelFlag option above 0 (no upgrade at all), we can re-write the query this way, telling SQL Server
that the WheelFlag value must be at least the same value as the flag (which eliminates 0 and 1), and then
adding the supplemental information that it also must contain that flag (thus eliminating 4 and 5).
SELECT OrderID
FROM dbo.CarOrders
WHERE WheelFlag >= @Flag
AND WheelFlag & @Flag = @Flag;
The >= portion of this clause is obviously covered by the BITWISE portion, so this is where we violate
DRY. But because this clause we've added is sargable, relegating the BITWISE AND operation to a
secondary search condition still yields the same result, and the overall query yields better performance.
We see a similar index seek to the hard-coded version of the query above, and while the estimates are
even further off (something that may be addressed as a separate issue), reads are still lower than with
the BITWISE AND operation alone:
We can also see that a filter is used against the index, which we didn't see when using the BITWISE AND
operation alone:
Conclusion
Don't be afraid to repeat yourself. There are times when this information can help the optimizer; even
though it may not be entirely intuitive to *add* criteria in order to improve performance, it's important
to understand when additional clauses help whittle the data down for the end result, rather than making
it easy for the optimizer to find the exact rows on its own.
[As an aside, I find two things amusing here: (1) that Bing favors Microsoft properties a lot more than
Google does, and (2) that Bing bothers returning 2.2 million results, many of which are likely irrelevant.]
These excerpts are commonly called "snippets" or "query-biased summarizations." We've been asking
for this functionality in SQL Server for some time, but have yet to hear any good news from Microsoft:
Connect #722324 : Would be nice if SQL Full Text Search provided snippet / highlighting support
Will Sql Server 2012 FTS have native support for hit highlighting?
There are some partial solutions. This script from Mike Kramar, for example, will produce a hit-highlighted
extract, but does not apply the same logic (such as language-specific word breakers) to the
document itself. It also uses an absolute character count, so the excerpt can begin and end with partial
words (as I will demonstrate shortly). The latter is pretty easy to fix, but another issue is that it loads the
entire document into memory, rather than performing any kind of streaming. I suspect that in full-text
indexes with large document sizes, this will be a noticeable performance hit. For now I'll focus on a
relatively small average document size (35 KB).
A simple example
So let's say we have a very simple table, with a full-text index defined:
CREATE FULLTEXT CATALOG [FTSDemo];
GO
CREATE TABLE [dbo].[Document]
(
[ID] INT IDENTITY(1001,1) NOT NULL,
[Url] NVARCHAR(200) NOT NULL,
[Date] DATE NOT NULL,
[Title] NVARCHAR(200) NOT NULL,
[Content] NVARCHAR(MAX) NOT NULL,
CONSTRAINT PK_DOCUMENT PRIMARY KEY(ID)
);
GO
CREATE FULLTEXT INDEX ON [dbo].[Document]
(
[Content] LANGUAGE [English],
[Title] LANGUAGE [English]
)
KEY INDEX [PK_Document] ON ([FTSDemo]);
This table is populated with a few documents (specifically, 7), such as the Declaration of Independence, and
Nelson Mandela's "I am prepared to die" speech. A typical full-text search against this table might be:
SELECT d.Title, d.[Content]
FROM dbo.[Document] AS d
INNER JOIN CONTAINSTABLE(dbo.[Document], *, N'states') AS t
ON d.ID = t.[KEY]
ORDER BY [RANK] DESC;
The result returns 4 rows out of 7:
SELECT d.Title,
Excerpt = dbo.HighLightSearch(d.[Content], N'states', 'font-weight:bold', 80)
FROM dbo.[Document] AS d
INNER JOIN CONTAINSTABLE(dbo.[Document], *, N'states') AS t
ON d.ID = t.[KEY]
ORDER BY [RANK] DESC;
The results show how the excerpt works: a <SPAN> tag is injected at the first keyword, and the excerpt is
carved out based on an offset from that position (with no consideration for using complete words):
FROM dbo.[Document] AS d
INNER JOIN CONTAINSTABLE(dbo.[Document], *, N'states') AS t
ON d.ID = t.[KEY]
ORDER BY t.[RANK] DESC;
The results show how the most relevant keywords are highlighted, and an excerpt is derived from that based
on full words and an offset from the term being highlighted:
Some additional advantages that I haven't demonstrated here include the ability to choose different
summarization strategies, controlling the presentation of each keyword (rather than all) using unique
CSS, as well as support for multiple languages and even documents in binary format (most IFilters are
supported).
Performance results
Initially I tested the runtime metrics for the three queries using SQL Sentry Plan Explorer, against the 7-row table. The results were:
Next I wanted to see how they would compare on a much larger data size. I inserted the table into itself until
I was at 4,000 rows, then ran the following query:
While both hit-highlighting options incur a significant penalty over not highlighting at all, the
ThinkHighlight solution, with more flexible options, represents a very marginal incremental cost in
terms of duration (~1%), while using significantly less memory (36%) than the UDF variant.
Conclusion
It should not come as a surprise that hit-highlighting is an expensive operation, and based on the
complexity of what has to be supported (think multiple languages), that very few solutions exist out
there. I think Mike Kramar has done an excellent job producing a baseline UDF that gets you a good way
toward solving the problem, but I was pleasantly surprised to find a more robust commercial offering
and found it to be very stable, even in beta form. I do plan to perform more thorough tests using a wider
range of document sizes and types. In the meantime, if hit-highlighting is a part of your application
requirements, you should try out Mike Kramar's UDF and consider taking ThinkHighlight for a test drive.
DECLARE @i INT = 1;
DECLARE c CURSOR
-- LOCAL
-- LOCAL STATIC
-- LOCAL FAST_FORWARD
-- LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR
SELECT c1.[object_id]
FROM sys.objects AS c1
CROSS JOIN (SELECT TOP 500 name FROM sys.objects) AS c2
ORDER BY c1.[object_id];
OPEN c;
FETCH c INTO @i;
WHILE (@@FETCH_STATUS = 0)
BEGIN
SET @i += 1; -- meaningless operation
FETCH c INTO @i;
END
CLOSE c;
DEALLOCATE c;
Results
Duration
Quite arguably the most important and common measure is, how long did it take? Well, it took almost
five times as long to run a cursor with the default options (or with only LOCAL specified), compared to
specifying either STATIC or FAST_FORWARD:
Memory
I also wanted to measure the additional memory that SQL Server would request when fulfilling each
cursor type. So I simply restarted before each cold cache test, measuring the performance counter Total
Server Memory (KB) before and after each test. The best combination here was LOCAL FAST_FORWARD:
tempdb usage
This result was surprising to me. Since the definition of a static cursor means that it copies the entire
result to tempdb, and it is actually expressed in sys.dm_exec_cursors as SNAPSHOT, I expected the hit
on tempdb pages to be higher with all static variants of the cursor. This was not the case; again we see a
roughly 5X hit on tempdb usage with the default cursor and the one with only LOCAL specified:
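As an aside, you can confirm how SQL Server actually implements a given cursor by inspecting sys.dm_exec_cursors while the cursor is open; a minimal sketch (not part of the original tests):

SELECT session_id, name, properties, creation_time
FROM sys.dm_exec_cursors(0);  -- 0 = cursors for all sessions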
Conclusion
For years I have been stressing that the following option should always be specified for your cursors:
LOCAL STATIC READ_ONLY FORWARD_ONLY
From this point on, until I have a chance to test further permutations or find any cases where it is not the
fastest option, I will be recommending the following:
LOCAL FAST_FORWARD
(As an aside, I also ran tests omitting the LOCAL option, and the differences were negligible.)
That said, this is not necessarily true for *all* cursors. In this case, I am talking solely about cursors
where you're only reading data from the cursor, in a forward direction only, and you aren't updating the
underlying data (either by the key or using WHERE CURRENT OF). Those are tests for another day.
Data that would be inserted roughly sequentially in real-time (e.g. events that are happening
right now);
I started with 2 tables like the following, then created 4 more (2 for SMALLDATETIME, 2 for DATE):
CREATE TABLE dbo.BirthDatesRandom_Datetime
(
ID INT IDENTITY(1,1) PRIMARY KEY,
dt DATETIME NOT NULL
);
CREATE TABLE dbo.EventsSequential_Datetime
(
ID INT IDENTITY(1,1) PRIMARY KEY,
dt DATETIME NOT NULL
);
CREATE INDEX d ON dbo.BirthDatesRandom_Datetime(dt);
CREATE INDEX d ON dbo.EventsSequential_Datetime(dt);
Sample Data
To generate some sample data, I used one of my handy techniques for generating something meaningful
from something that is not: the catalog views. On my system this returned 971 distinct date/time values
(1,000,000 rows altogether) in about 12 seconds:
;WITH y AS
(
SELECT TOP (1000000) d = DATEADD(SECOND, x, DATEADD(DAY, DATEDIFF(DAY, x, 0),
'20120101'))
FROM
(
SELECT s1.[object_id] % 1000
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
) AS x(x) ORDER BY NEWID()
)
SELECT DISTINCT d FROM y;
I put these million rows into a table so I could simulate sequential/random inserts using different access
methods for the exact same data from three different session windows:
This process took a little bit longer to complete (20 seconds). Then I created a second table to store the same
data but distributed randomly (so that I could repeat the same distribution across all inserts).
These results were not all that surprising to me: inserting in random order led to longer runtimes than
inserting sequentially, something we can all take back to our roots of understanding how indexes in SQL
Server work and how more bad page splits can happen in this scenario (I didn't monitor specifically for
page splits in this exercise, but it is something I will consider in future tests).
I noticed that, on the random side, the implicit conversions on the incoming data might have had an
impact on timings, since they seemed a little bit higher than the native DATETIME -> DATETIME inserts.
So I decided to build two new tables containing source data: one using DATE and one
using SMALLDATETIME. This would simulate, to some degree, converting your data type properly before
passing it to the insert statement, such that an implicit conversion is not required during the insert. Here
are the new tables and how they were populated:
CREATE TABLE dbo.Staging_Random_SmallDatetime
(
ID INT IDENTITY(1,1) PRIMARY KEY,
source_date SMALLDATETIME NOT NULL
);
CREATE TABLE dbo.Staging_Random_Date
(
ID INT IDENTITY(1,1) PRIMARY KEY,
source_date DATE NOT NULL
);
INSERT dbo.Staging_Random_SmallDatetime(source_date)
SELECT CONVERT(SMALLDATETIME, source_date)
FROM dbo.Staging_Random ORDER BY ID;
INSERT dbo.Staging_Random_Date(source_date)
SELECT CONVERT(DATE, source_date)
FROM dbo.Staging_Random ORDER BY ID;
This did not have the effect I was hoping for; timings were similar in all cases. So that was a wild goose
chase.
No rocket science here; use a smaller data type, and you should use fewer pages. Switching
from DATETIME to DATE consistently yielded a 25% reduction in the number of pages used,
while SMALLDATETIME reduced the requirement by 13-20%.
Now for fragmentation and page density on the non-clustered indexes (there was very little difference for the
clustered indexes):
SELECT '{table_name}',
    index_id,
    avg_page_space_used_in_percent,
    avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats
(
DB_ID(), OBJECT_ID('{table_name}'),
NULL, NULL, 'DETAILED'
)
WHERE index_level = 0 AND index_id = 2;
Results:
I was quite surprised to see the ordered data become almost completely fragmented, while the data that was
inserted randomly actually ended up with slightly better page usage. I've made a note that this warrants
further investigation outside the scope of these specific tests, but it may be something you'll want to check
on if you have non-clustered indexes that are relying on largely sequential inserts.
[An online rebuild of the non-clustered indexes on all 6 tables ran in 7 seconds, putting page density back up
to the 99.5% range, and bringing fragmentation down to under 1%. But I didn't run that until performing the
query tests below...]
Range Query Test
Finally, I wanted to see the impact on runtimes for simple date range queries against the different indexes,
both with the inherent fragmentation caused by OLTP-type write activity, and on a clean index that is rebuilt.
The query itself is pretty simple:
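It isn't reproduced in this excerpt, but it was of this general shape (the specific date boundaries and the table chosen here are assumptions):

SELECT COUNT(*)
FROM dbo.EventsSequential_Datetime
WHERE dt >= '20120201'
AND dt < '20120301';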
Essentially we see slightly higher duration and reads for the DATETIME versions, but very little difference in
CPU. And the differences between SMALLDATETIME and DATE are negligible in comparison. All of the queries
had simplistic query plans like this:
Conclusion
While admittedly these tests are quite fabricated and could have benefited from more permutations,
they do show roughly what I expected to see: the biggest impacts on this specific choice are on space
occupied by the non-clustered index (where choosing a skinnier data type will certainly benefit), and on
the time required to perform inserts in arbitrary, rather than sequential, order (where DATETIME only
has a marginal edge).
I'd love to hear your ideas on how to put data type choices like these through more thorough and
punishing tests. I do plan to go into more details in future posts.
But I wanted to compare the performance of some of the more common approaches I see out there.
I've always used open-ended ranges, and since SQL Server 2008 we've been able to
use CONVERT(DATE) and still utilize an index on that column, which is quite powerful.
SELECT CONVERT(CHAR(8), CURRENT_TIMESTAMP, 112);
SELECT CONVERT(CHAR(10), CURRENT_TIMESTAMP, 120);
ON PRIMARY
(
NAME = N'Datetime_Testing_Data',
FILENAME = N'D:\DATA\Datetime_Testing.mdf',
SIZE = 20480000KB , MAXSIZE = UNLIMITED, FILEGROWTH = 102400KB
)
LOG ON
(
NAME = N'Datetime_Testing_Log',
FILENAME = N'E:\LOGS\Datetime_Testing_log.ldf',
SIZE = 3000000KB , MAXSIZE = UNLIMITED, FILEGROWTH = 20480KB );
Next, I created 12 tables:
AS
SELECT TOP (10000000) d = DATEADD(MINUTE, ROW_NUMBER() OVER
(ORDER BY s1.[object_id]), '19700101')
FROM sys.all_columns AS s1
CROSS JOIN sys.all_objects AS s2
ORDER BY s1.[object_id];
This allowed me to populate the tables this way:
INSERT /* dt_comp_clus */ dbo.datetime_compression_clustered(dt)
SELECT
[table] = OBJECT_NAME([object_id]),
row_count,
page_count = reserved_page_count,
reserved_size_MB = reserved_page_count * 8/1024
FROM sys.dm_db_partition_stats
WHERE OBJECT_NAME([object_id]) LIKE '%datetime%';
Counting the rows for a specific day, using the above seven approaches, as well as the open-ended date range
Converting all 10,000,000 rows using the above seven approaches, as well as just returning the
raw data (since formatting on the client side may be better)
[With the exception of the FLOAT methods and the DATETIME2 column, since this conversion is not
legal.]
For the first question, the queries look like this (repeated for each table type):
SELECT /* C_CHAR10 - dt_comp_clus */ COUNT(*)
FROM dbo.datetime_compression_clustered
WHERE CONVERT(CHAR(10), dt, 120) = '19860301';
SELECT /* C_CHAR8 - dt_comp_clus */ COUNT(*)
FROM dbo.datetime_compression_clustered
WHERE CONVERT(CHAR(8), dt, 112) = '19860301';
Here we see that the convert to date and the open-ended range using an index are the best performers.
However, against a heap, the convert to date actually takes some time, making the open-ended range
the optimal choice (click to enlarge):
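For reference, a sketch of the open-ended range version of the same count (the comment tag follows the pattern of the other queries; the exact tag used in the tests is an assumption):

SELECT /* open-ended range - dt_comp_clus */ COUNT(*)
FROM dbo.datetime_compression_clustered
WHERE dt >= '19860301'
AND dt < '19860302';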
And here are the second set of queries (again, repeating for each table type):
SELECT /* C_CHAR10 - dt_comp_clus */ dt = CONVERT(CHAR(10), dt, 120)
FROM dbo.datetime_compression_clustered;
SELECT /* C_CHAR8 - dt_comp_clus */ dt = CONVERT(CHAR(8), dt, 112)
FROM dbo.datetime_compression_clustered;
SELECT /* C_FLOOR_FLOAT - dt_comp_clus */ dt = CONVERT(DATETIME,
FLOOR(CONVERT(FLOAT, dt)))
FROM dbo.datetime_compression_clustered;
SELECT /* C_DATETIME - dt_comp_clus */ dt = CONVERT(DATETIME, DATEDIFF(DAY,
'19000101', dt))
FROM dbo.datetime_compression_clustered;
SELECT /* C_DATE - dt_comp_clus */ dt = CONVERT(DATE, dt)
FROM dbo.datetime_compression_clustered;
SELECT /* C_INT_FLOAT - dt_comp_clus */ dt = CONVERT(DATETIME, CONVERT(INT,
CONVERT(FLOAT, dt)))
FROM dbo.datetime_compression_clustered;
(For this set of queries, the heap showed very similar results; practically indistinguishable.)
Conclusion
In case you wanted to skip to the punchline, these results show that conversions in memory are not
important, but if you are converting data on the way out of a table (or as part of a search predicate), the
method you choose can have a dramatic impact on performance. Converting to a DATE (for a single day)
or using an open-ended date range in any case will yield the best performance, while the most popular
method out there, converting to a string, is absolutely abysmal.
We also see that compression can have a decent effect on storage space, with very minor impact on
query performance. The effect on insert performance seems to depend more on whether or not the
table has a clustered index than on whether or not compression is enabled. However, with a
clustered index in place, there was a noticeable bump in the duration it took to insert 10 million rows.
Something to keep in mind and to balance with disk space savings.
Clearly there could be a lot more testing involved, with more substantial and varied workloads, which I
may explore further in a future post.
We're returning 500,000 rows, and it takes about 10 seconds. I immediately know that something is
wrong with the logical reads number. Even if I didn't already know about the underlying data, I can tell
from the grid results in Management Studio that this is pulling more than 23 pages of data, whether
they are from memory or disk, and this should be reflected somewhere in STATISTICS IO. Looking at
the plan
we see parallelism is in there, and that we've scanned the entire table. So how is it possible that there are
only 23 logical reads?
We have a slightly less complex plan, and without the parallelism (for obvious reasons), STATISTICS IO is
showing us much more believable numbers for logical read counts.
What is the truth?
It's not hard to see that one of these queries is not telling the whole truth. While STATISTICS IO might
not tell us the whole story, maybe trace will. If we retrieve runtime metrics by generating an actual
execution plan in Plan Explorer, we see that the magical low-read query is, in fact, pulling the data from
memory or disk, and not from a cloud of magic pixie dust. In fact it has *more* reads than the other
version:
So it is clear that reads are happening; they're just not appearing correctly in the STATISTICS IO output.
What is the problem?
Well, I'll be quite honest: I don't know, other than the fact that parallelism is definitely playing a role,
and it seems to be some kind of race condition. STATISTICS IO (and, since that's where we get the data,
our Table I/O tab) shows a very misleading number of reads. It's clear that the query returns all of the
data we're looking for, and it's clear from the trace results that it uses reads and not osmosis to do so. I
asked Paul White (blog | @SQL_Kiwi) about it and he suggested that only some of the per-thread I/O
counts are being included in the total (and agrees that this is a bug).
If you want to try this out at home, all you need is AdventureWorks (this should repro against 2008,
2008 R2 and 2012 versions), and the following query:
SET STATISTICS IO ON;
DBCC SETCPUWEIGHT(1000) WITH NO_INFOMSGS;
GO
SELECT TOP (15000) *
FROM Sales.SalesOrderHeader
WHERE OrderDate < (SELECT '20080101');
SELECT TOP (15000) *
FROM Sales.SalesOrderHeader
WHERE OrderDate < (SELECT '20080101')
OPTION (MAXDOP 1);
DBCC SETCPUWEIGHT(1) WITH NO_INFOMSGS;
(Note that SETCPUWEIGHT is only used to coax parallelism. For more info, see Paul White's blog post on
Plan Costing.)
Results:
Table 'TransactionHistory'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'TransactionHistory'. Scan count 1, logical reads 110, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
So it seems that we can easily reproduce this at will with a TOP operator and a low enough DOP. I've
filed a bug:
And Paul has filed two other somewhat-related bugs involving parallelism, the first as a result of our
conversation:
Cardinality Estimation Error With Pushed Predicate on a Lookup [ related blog post ]
(For the nostalgic, here are six other parallelism bugs I pointed out a few years ago.)
What is the lesson?
Be careful about trusting a single source. If you look solely at STATISTICS IO after changing a query like
this, you may be tempted to focus on the miraculous drop in reads instead of the increase in duration.
At which point you may pat yourself on the back, leave work early and enjoy your weekend, thinking you
have just made a tremendous performance impact on your query. When of course nothing could be
further from the truth.
The Test
I created a loop where I would run each conversion 1,000,000 times, and then repeat the process for all
18 conversion methods 10 times. This would provide metrics for 10,000,000 conversions for each
method, eliminating any significant statistical skew.
CREATE TABLE #s(j INT, ms INT);
GO
IF @j = 18
SET @d = CAST(@ds AS DATE);
SET @x += 1;
END
INSERT #s SELECT @j, DATEDIFF(MILLISECOND, @t, SYSDATETIME());
SET @j += 1;
END
GO 10
SELECT
j, method = CASE ... END,
MIN(ms), MAX(ms), AVG(ms)
FROM #s
GROUP BY j ORDER BY j;
The Results
I ran this on a Windows 8 VM, with 8 GB RAM and 4 vCPUs, running SQL Server 2012 (11.0.2376). Here
are tabular results, sorted by average duration, fastest first:
If we removed the outlier (which uses SQL Server 2012's new FORMAT function, an obvious dog for this
purpose), we'd have a real hard time picking a true loser here. Of the remaining 17 methods, to perform
this conversion a million times, the slowest method is only three seconds slower than the fastest
method, on average. The evidence still supports my earlier assertion that using CAST / CONVERT natively
is about as efficient as you can get, but the improvement over the other approaches is only marginal,
and it wasn't even the winner in every single run.
<Server>SQL2K12-SVR1</Server>
<SequencingMode>stress</SequencingMode>
<ConnectTimeScale>1</ConnectTimeScale>
<ThinkTimeScale>1</ThinkTimeScale>
<HealthmonInterval>60</HealthmonInterval>
<QueryTimeout>3600</QueryTimeout>
<ThreadsPerClient>255</ThreadsPerClient>
<EnableConnectionPooling>Yes</EnableConnectionPooling>
<StressScaleGranularity>spid</StressScaleGranularity>
</ReplayOptions>
<OutputOptions>
<ResultTrace>
<RecordRowCount>No</RecordRowCount>
<RecordResultSet>No</RecordResultSet>
</ResultTrace>
</OutputOptions>
</Options>
During each of the replay operations, performance counters were collected in five second intervals for
the following counters:
These counters will be used to measure the overall server load, and the throughput characteristics of
each of the tests for comparison.
Test configurations
A total of seven different configurations were tested with Distributed Replay:
Baseline
Server-side Trace
Profiler on server
Profiler remotely
Extended Events event_file target
Extended Events ring_buffer target
Extended Events streaming provider
Each test was repeated three times to ensure that the results were consistent across different tests and
to provide an average set of results for comparison. For the initial baseline tests, no additional data
collection was configured for the SQL Server instance, but the default data collections that ship with SQL
Server 2012 were left enabled: the default trace and the system_health event session. This reflects the
general configuration of most SQL Servers, since it is not generally recommended that the default trace
or system_health session be disabled due to the benefits they provide to database administrators. This
test was used to determine the overall baseline for comparison with the tests where additional data
collection was being performed. The remaining tests are based on the TSQL_SPs template that ships
with SQL Server Profiler and collects the following events:
Sessions\ExistingConnection
Stored Procedures\RPC:Starting
Stored Procedures\SP:Completed
Stored Procedures\SP:Starting
Stored Procedures\SP:StmtStarting
TSQL\SQL:BatchStarting
This template was selected based on the workload used for the tests, which is primarily SQL batches
that are captured by the SQL:BatchStarting event, and then a number of events using the various
methods of hierarchyid, which are captured by the SP:Starting, SP:StmtStarting,
and SP:Completed events. A server-side trace script was generated from the template using the export
functionality in SQL Server Profiler, and the only changes made to the script were to set
the maxfilesize parameter to 500MB, enable trace file rollover, and provide a filename to which the
trace was written.
The third and fourth tests used SQL Server Profiler to collect the same events as the server-side trace to
measure the performance overhead of tracing using the Profiler application. These tests were run using
SQL Profiler locally on the SQL Server and remotely from a separate client to ascertain whether there
was a difference in overhead by having Profiler running locally or remotely.
The final tests used Extended Events to collect the same events, and the same columns, based on an
event session created using my Trace to Extended Events conversion script for SQL Server 2012. The
tests included evaluating the event_file, ring_buffer, and new streaming provider in SQL Server 2012
separately to determine the overhead that each target might impose on the performance of the
server. Additionally, the event session was configured with the default memory buffer options, but was
changed to specify NO_EVENT_LOSS for the EVENT_RETENTION_MODE option for the event_file and
ring_buffer tests to match the behavior of server-side Trace to a file, which also guarantees no event
loss.
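The generated session itself is not reproduced here, but a minimal sketch of the kind of event session used for the event_file tests (the session name and file path are assumptions, and the column list is omitted for brevity) would look like this:

CREATE EVENT SESSION [TraceOverhead] ON SERVER
ADD EVENT sqlserver.rpc_starting,
ADD EVENT sqlserver.module_start,
ADD EVENT sqlserver.module_end,
ADD EVENT sqlserver.sp_statement_starting,
ADD EVENT sqlserver.sql_batch_starting
ADD TARGET package0.event_file
    (SET filename = N'C:\Temp\TraceOverhead.xel', max_file_size = 500)
WITH (EVENT_RETENTION_MODE = NO_EVENT_LOSS);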
Results
With one exception, the results of the tests were not surprising. The baseline test was able to perform
the replay workload in thirteen minutes and thirty-five seconds, and averaged 2345 batch requests per
second during the tests. With the server-side Trace running, the replay operation completed in 16
minutes and 40 seconds, which is an 18.1% degradation in performance. The Profiler traces were the
worst performers overall, and required 149 minutes when Profiler was run locally on the server, and 123
minutes and 20 seconds when Profiler was run remotely, yielding 90.8% and 87.6% degradation in
performance respectively. The Extended Events tests were the best performers, taking 15 minutes and
15 seconds for the event_file and 15 minutes and 40 seconds for the ring_buffer target, resulting in a
10.4% and 11.6% degradation in performance. The average results for all tests are displayed in Table 1
and charted in Figure 2:
The Extended Events streaming test is not quite a fair result in the context of the tests that were run and
requires a bit more explanation to understand the result. From the table results we can see that the
streaming tests for Extended Events completed in sixteen minutes and thirty-five seconds, equating to
34.1% degradation in performance. However, if we zoom into the chart and change its scale, as shown in
Figure 3, we'll see that the streaming had a much greater impact on performance initially and then
began to perform in a manner similar to the other Extended Events tests:
An exception occurred during event enumeration. Examine the inner exception for more
information.
(Microsoft.SqlServer.XEvent.Linq)
Error 25726, severity 17, state 0 was raised, but no message with that error number
was found in sys.messages. If error is larger than 50000, make sure the user-defined
message is added using sp_addmessage.
(Microsoft SQL Server, Error: 18054)
Conclusions
All of the methods of collecting diagnostics data from SQL Server have observer overhead associated
with them and can impact the performance of a workload under heavy load. For systems running on SQL
Server 2012, Extended Events provide the least amount of overhead and provide similar capabilities for
events and columns as SQL Trace (some events in SQL Trace are rolled up into other events in Extended
Events). Should SQL Trace be necessary for capturing event data (which may be the case until third-party
tools are recoded to leverage Extended Events data), a server-side Trace to a file will yield the
least amount of performance overhead. SQL Server Profiler is a tool to be avoided on busy production
servers, as shown by the tenfold increase in duration and significant reduction in throughput for the
replay.
While the results would seem to favor running SQL Server Profiler remotely when Profiler must be used,
this conclusion cannot be definitively drawn based on the specific tests that were run in this scenario.
Additional testing and data collection would have to be performed to determine if the remote Profiler
results were the result of lower context switching on the SQL Server instance, or if networking between
VMs played a factor in the lower performance impact to the remote collection. The point in these tests
was to show the significant overhead that Profiler incurs, regardless of where Profiler was being run.
Finally, the live event stream in Extended Events also has a high overhead when it is actually connected
and collecting data, but as shown in the tests, the Database Engine will disconnect a live stream if it falls
behind on the events to prevent severely impacting the performance of the server.
So heres an example. The % Disk Time and Disk Queue Length PerfMon counters were heavily
recommended as key performance indicators for I/O performance. SQL Server throws a lot of I/O at the
disks using scatter/gather to maximize the utilization of the disk-based I/O subsystem. This approach
leads to short bursts of long queue depths during checkpoints and read-aheads for an instance of SQL
Server. Sometimes the server workload is such that your disk cant keep up with the I/O shoved at it
and, when that happens, youll see long queue lengths too. The short burst scenario isnt a
problem. The lengthening queue length scenario usually is a problem. So is that a good practice?
In a word, not-so-much.
Those counters can still be of some use on an instance of SQL Server which only has one hard disk
(though that's exceedingly rare these days). Why?
The PerfMon counter % Disk Time is a bogus performance metric for several reasons. It does not take
into account asynchronous I/O requests. It can't tell what the real performance profile for an
underlying RAID set may be, since they contain multiple disk drives. The PerfMon counter Disk Queue
Length is also mostly useless, except on SQL Servers with a single physical disk, because the hard disk
controller cache obfuscates how many I/O operations are actually pending on the queue or not. In fact,
some hard disks even have tiny write caches as well, which further muddies the water as to whether
the I/O is truly queued, in a cache somewhere between the operating system and the disk, or has finally
made it all the way to the CMOS on the disk.
Better I/O PerfMon Counters
Instead of using those PerfMon counters, use the Avg Disk Reads/sec, Avg Disk Writes/sec, and Avg
Disk Transfers/sec to track the performance of disk subsystems. These counters track the average
number of read I/Os, write I/Os, and combined read and write I/Os that occurred in the last
second. Occasionally, I like to track the same metrics by volume of data rather than the rate of I/O
operations. So, to get that data, you may wish to give these volume-specific PerfMon counters a try: Avg
Disk Transfer Bytes/sec, Avg Disk Read Bytes/sec, and Avg Disk Write Bytes/sec.
For SQL Server I/O Performance, Use Dynamic Management Views (DMV)
And unless you've been living in a cave, you should make sure to use SQL Server's Dynamic Management
Views (DMVs) to check on I/O performance for recent versions of SQL Server. Some of my favorite DMVs
for I/O include (a sample query follows the list below):
sys.dm_os_wait_stats
sys.dm_os_waiting_tasks
sys.dm_os_performance_counters
sys.dm_io_virtual_file_stats
sys.dm_io_pending_io_requests
sys.dm_db_index_operational_stats
sys.dm_db_index_usage_stats
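For example, a quick sketch (not from the original post) that turns sys.dm_io_virtual_file_stats into per-file read and write latencies:

SELECT DB_NAME(vfs.database_id) AS database_name,
    vfs.[file_id],
    vfs.num_of_reads,
    vfs.num_of_writes,
    read_latency_ms  = vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0),
    write_latency_ms = vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0)
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs  -- NULL, NULL = all databases, all files
ORDER BY database_name, vfs.[file_id];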
So how are you tracking I/O performance metrics? Which ones are you using?
GO
Results:
(0 row(s) affected)
sp_test
foo
(1 row(s) affected)
The performance issue comes from the fact that master might be checked for an equivalent stored
procedure, depending on whether there is a local version of the procedure, and whether there is in fact
an equivalent object in master. This can lead to extra metadata overhead as well as an
additional SP:CacheMiss event. The question is whether this overhead is tangible.
So let's consider a very simple procedure in a test database:
CREATE DATABASE sp_prefix;
GO
USE sp_prefix;
GO
CREATE PROCEDURE dbo.sp_something
AS
BEGIN
SELECT 'sp_prefix', DB_NAME();
END
GO
And equivalent procedures in master:
USE master;
GO
CREATE PROCEDURE dbo.sp_something
AS
BEGIN
SELECT 'master', DB_NAME();
END
GO
EXEC sp_MS_marksystemobject N'sp_something';
CacheMiss : Fact or Fiction?
If we run a quick test from our test database, we see that executing these stored procedures will never
actually invoke the versions from master, regardless of whether we properly database- or schema-qualify the procedure (a common misconception) or if we mark the master version as a system object:
USE sp_prefix;
GO
EXEC sp_prefix.dbo.sp_something;
GO
EXEC dbo.sp_something;
GO
EXEC sp_something;
Results:
sp_prefix    sp_prefix
sp_prefix    sp_prefix
sp_prefix    sp_prefix
Let's also run a Quick Trace using SQL Sentry Performance Advisor to observe whether there are
any SP:CacheMiss events:
We see CacheMiss events for the ad hoc batch that calls the stored procedure (since SQL Server
generally won't bother caching a batch that consists primarily of procedure calls), but not for the stored
procedure itself. Both with and without the sp_something procedure existing in master (and when it
exists, both with and without it being marked as a system object), the calls to sp_something in the user
database never accidentally call the procedure in master, and never generate any CacheMiss events
for the procedure.
This was on SQL Server 2012. I repeated the same tests above on SQL Server 2008 R2, and found slightly
different results:
So on SQL Server 2008 R2 we see an additional CacheMiss event that does not occur in SQL Server 2012.
This occurs in all scenarios (no equivalent object master, an object in master marked as a system object,
and an object in master not marked as a system object). Immediately I was curious whether this
additional event would have any noticeable impact on performance.
Performance Issue: Fact or Fiction?
I made an additional procedure without the sp_ prefix to compare raw performance, CacheMiss aside:
USE sp_prefix;
GO
CREATE PROCEDURE dbo.proc_something
AS
BEGIN
SELECT 'sp_prefix', DB_NAME();
END
GO
So the only difference between sp_something and proc_something is the name. I then created wrapper procedures
to execute them 1000 times each, using EXEC sp_prefix.dbo.<procname>, EXEC
dbo.<procname>, and EXEC <procname> syntax, with equivalent stored procedures living in master and
marked as a system object, living in master but not marked as a system object, and not living in master
at all.
USE sp_prefix;
GO
CREATE PROCEDURE dbo.wrap_sp_3part
AS
BEGIN
DECLARE @i INT = 1;
WHILE @i <= 1000
BEGIN
EXEC sp_prefix.dbo.sp_something;
SET @i += 1;
END
END
GO
CREATE PROCEDURE dbo.wrap_sp_2part
AS
BEGIN
DECLARE @i INT = 1;
WHILE @i <= 1000
BEGIN
EXEC dbo.sp_something;
SET @i += 1;
END
END
GO
CREATE PROCEDURE dbo.wrap_sp_1part
AS
BEGIN
DECLARE @i INT = 1;
WHILE @i <= 1000
BEGIN
EXEC sp_something;
SET @i += 1;
END
END
GO
-- repeat for proc_something
Measuring runtime duration of each wrapper procedure with SQL Sentry Plan Explorer, the results show
that using the sp_ prefix has a significant impact on duration in almost all cases (and certainly
on average):
We also see that the performance of SQL Server 2012 trends much better than the performance on SQL
Server 2008 R2; no other variables are different. Both instances are on the same host, and neither is
under memory or other pressure of any kind. This could be a combination of the
additional CacheMiss event and those transparent improvements you get from enhancements made to
the database engine between versions.
Another side effect : Ambiguity
If you create a stored procedure that references an object you created, say dbo.sp_helptext, and you
didn't realize (or didn't care) that this name collides with a system procedure name, then there is
potential ambiguity when someone is reviewing your stored procedure. They will most likely assume
you meant the system procedure, not a different procedure you created that happens to share its name.
Another interesting thing happens when you create a stored procedure that references a stored
procedure prefixed with sp_ that just happens to also exist in master. Let's pick an existing procedure
that you might not be immediately familiar with (and therefore might be a more likely representative of
the scenario I'm describing): sp_resyncuniquetable.
CREATE PROCEDURE dbo.test1
AS
BEGIN
EXEC dbo.sp_resyncuniquetable;
END
GO
In Management Studio, IntelliSense doesn't underline the stored procedure name as invalid, because
there is a valid procedure with that name in master. So without seeing a squiggly line underneath, you
might assume the procedure is already there (and assuming the procedure in master can be executed
without error, this might pass QA/testing as well). If you choose a different name for your resync
procedure, let's say proc_resyncuniquetable, there is absolutely no chance for this ambiguity (unless
someone manually created that procedure in master, which I guess could happen). If the procedure
doesn't exist yet, the caller will still be created successfully (due to deferred name resolution), but you
will receive this warning:
The module 'test1' depends on the missing object 'dbo.proc_resyncuniquetable'.
The module will still be created; however, it cannot run successfully until the
object exists.
One more source of ambiguity can occur in this scenario. The following sequence of events is entirely
plausible:
1. You create the initial version of a procedure, say, sp_foo.
2. The deployer accidentally creates a version in master (and maybe notices, or maybe doesn't, but
in either case doesn't clean up).
3. The deployer (or someone else) creates the procedure, this time in the right database.
4. Over time, you make multiple modifications to your_database.dbo.sp_foo.
5. You replace sp_foo with sp_superfoo, and delete sp_foo from the user database.
6. When updating the application(s) to reference the new stored procedure, you might miss a
replacement or two for various reasons.
So in this scenario, the application is still calling sp_foo, and it's not failing even though you've deleted
the local copy, since it finds what it thinks is an equivalent in master. Not only is this stored procedure
in master not equivalent to sp_superfoo, it's not even equivalent to the latest version of sp_foo.
"Procedure not found" is a much easier problem to troubleshoot than "Procedure doesn't exist but
code calling it works, and doesn't quite return the expected results."
Conclusion
I still think that, even though the behavior has changed slightly in SQL Server 2012, you shouldn't be
using the sp_ prefix at any time, unless your intention is to create a stored procedure in master *and*
mark it as a system object. Otherwise you are exposed to these performance issues as well as potential
ambiguity on multiple fronts.
And personally, I don't think stored procedures need to have any prefix at all, but I have less tangible
evidence to convince you of that, other than asking you: what other type of object could it possibly be?
You can't execute a view, or a function, or a table...
As I suggest often, I don't really care what your naming convention is, as long as you're consistent. But I
think you should avoid potentially harmful prefixes like sp_.
SQL2K12-SVR1    192.168.20.31
SQL2K12-SVR2    192.168.20.32
SQL2K12-SVR3    192.168.20.33
SQL2K12-SVR4    192.168.20.34
SQL2K12-SVR5    192.168.20.35
Setting up an availability group using a dedicated NIC is almost identical to the shared NIC process, except
that in order to bind the availability group to a specific NIC, I first have to designate the LISTENER_IP argument
in the CREATE ENDPOINT command, using the aforementioned IP addresses for my dedicated NICs.
Below is the creation of each endpoint across the five WSFC nodes:
:CONNECT SQL2K12-SVR1
USE [master];
GO
CREATE ENDPOINT [Hadr_endpoint]
AS TCP (LISTENER_PORT = 5022, LISTENER_IP = (192.168.20.31))
FOR DATA_MIRRORING (ROLE = ALL, ENCRYPTION = REQUIRED ALGORITHM AES);
GO
IF (SELECT state FROM sys.endpoints WHERE name = N'Hadr_endpoint') <> 0
BEGIN
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
END
GO
USE [master];
GO
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [SQLSKILLSDEMOS\SQLServiceAcct];
GO
:CONNECT SQL2K12-SVR2
-- ...repeat for other 4 nodes...
After creating these endpoints associated with the dedicated NIC, the rest of my steps in setting up the
availability group topology are no different than in a shared NIC scenario.
After creating my availability group, if I start driving data modification load against the primary replica
availability databases, I can quickly see that the availability group communication traffic is flowing on
the dedicated NIC using Task Manager on the networking tab (the first section is the throughput for the
dedicated availability group NIC):
And I can also track the stats using various performance counters. In the image below, the Intel[R]
PRO_1000 MT Network Connection _2 is my dedicated availability group NIC and has the majority of NIC
traffic compared to the two other NICs:
Now having a dedicated NIC for availability group traffic can be a way to isolate activity and theoretically
improve performance, but if your dedicated NIC has insufficient bandwidth, as you might expect,
performance will suffer and the health of the availability group topology will degrade.
For example, I changed the dedicated availability group NIC on the primary replica to a 28.8 Kbps
outgoing transfer bandwidth to see what would happen. Needless to say, it wasn't good. The availability
group NIC throughput dropped significantly:
Within a few seconds, the health of the various replicas degraded, with a couple of the replicas moving
to a not synchronizing state:
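You can watch this from the DMVs as well; a minimal sketch (not part of the original test harness) for checking replica synchronization health:

SELECT ar.replica_server_name,
    ars.role_desc,
    ars.connected_state_desc,
    ars.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states AS ars
INNER JOIN sys.availability_replicas AS ar
    ON ars.replica_id = ar.replica_id;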
I increased the dedicated NIC on the primary replica to 64 Kbps and after a few seconds there was an
initial catch-up spike as well:
While things improved, I did witness periodic disconnects and health warnings at this lower NIC
throughput setting:
We now see a new top wait type, HADR_NOTIFICATION_DEQUEUE. This is one of those "internal use
only" wait types as defined by Books Online, representing a background task that processes WSFC
notifications. What's interesting is that this wait type doesn't point directly to an issue, and yet the tests
show this wait type rise to the top in association with degraded availability group messaging throughput.
So the bottom line is that isolating your availability group activity to a dedicated NIC can be beneficial if
you're providing network throughput with sufficient bandwidth. However, if you can't guarantee good
bandwidth even using a dedicated network, the health of your availability group topology will suffer.
Then I wanted to see what happens in these three cases when any value might be changed, when
particular values might be changed, when no values would be changed, and when all values will be
changed. I could affect this by changing the stored procedure to insert constants into particular
columns, or by changing the way variables were assigned.
-- to show when any value might change in a row, the procedure uses the full cross join:
SELECT TOP (50000) x1.d, x2.d, x3.d, x4.d, x5.d, x6.d

-- to show when particular values will change on many rows, we can hard-code constants:
-- two values exempt:
SELECT TOP (50000) N'abc', N'def', x3.d, x4.d, x5.d, x6.d

-- four values exempt:
SELECT TOP (50000) N'abc', N'def', N'ghi', N'jkl', x5.d, x6.d

-- to show when no values will change, we hard-code all six values:
SELECT TOP (50000) N'abc', N'def', N'ghi', N'jkl', N'mno', N'pqr'

-- and to show when all values will change, a different variable assignment would take place:
DECLARE
@v1 NVARCHAR(50) = N'zzz',
@v2 NVARCHAR(50) = N'zzz',
@v3 NVARCHAR(50) = N'zzz',
@v4 NVARCHAR(50) = N'zzz',
@v5 NVARCHAR(50) = N'zzz',
@v6 NVARCHAR(50) = N'zzz';
Results
After running these tests, the blind update won in every single scenario. Now, you're thinking, what's
a couple hundred milliseconds? Extrapolate. If you're performing a lot of updates in your system, this
can really start to take a toll.
Detailed results in Plan Explorer: Any change | 2 values exempt | 4 values exempt | All values exempt | All change
Based on feedback from Roji, I decided to test this with a few indexes as well:
CREATE INDEX x1 ON dbo.whatever(v1);
CREATE INDEX x2 ON dbo.whatever(v2);
CREATE INDEX x3 ON dbo.whatever(v3) INCLUDE(v4,v5,v6);
Durations were substantially increased with these indexes:
Detailed results in Plan Explorer: Any change | 2 values exempt | 4 values exempt | All values exempt | All change
Conclusion
From this test, it seems to me that it is usually not worth checking if a value should be updated. If your
UPDATE statement affects multiple columns, it is almost always cheaper for you to scan all of the
columns where any value might have changed rather than check each column individually. In a future
post, I will investigate whether this scenario is paralleled for LOB columns.
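To make the comparison concrete, here is a minimal sketch (the key column and the check-every-column predicate are illustrative, ignoring NULL handling for brevity) of the two shapes being compared, using the dbo.whatever table from the tests above:

-- blind update: always write all six columns
UPDATE dbo.whatever
  SET v1 = @v1, v2 = @v2, v3 = @v3, v4 = @v4, v5 = @v5, v6 = @v6
  WHERE [key] = @key;

-- "check first" update: only touch the row if at least one value differs
UPDATE dbo.whatever
  SET v1 = @v1, v2 = @v2, v3 = @v3, v4 = @v4, v5 = @v5, v6 = @v6
  WHERE [key] = @key
  AND (v1 <> @v1 OR v2 <> @v2 OR v3 <> @v3
    OR v4 <> @v4 OR v5 <> @v5 OR v6 <> @v6);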
If you look closer, there is very little you can do to control how the task operates. Even the quite
expansive Properties panel exposes a whole lot of settings for the maintenance subplan, but virtually
nothing about the DBCC commands it will run. Personally, I think you should take a much more proactive
and controlled approach to how you perform your CHECKDB operations in production environments, by
creating your own jobs and manually hand-crafting your DBCC commands. You might tailor your
schedule or the commands themselves to different databases; for example, the ASP.NET membership
database is probably not as crucial as your sales database, and could tolerate less frequent and/or less
thorough checks.
But for your crucial databases, I thought I would put together a post to detail some of the things I would
investigate in order to minimize the disruption DBCC commands may cause, and what myths and
marketing hoopla you should be wary of. And I want to thank Paul "Mr. DBCC" Randal
(blog | @PaulRandal) for providing valuable input, not only to this specific post, but also his endless
advice on his blog, #sqlhelp and in SQLskills Immersion training.
Please take all of these ideas with a grain of salt, and do your best to perform adequate testing in your
environment; not all of these suggestions will yield better performance in all environments. But you
owe it to yourself, your users and your stakeholders to at least consider the impact that
your CHECKDB operations might have, and take steps to mitigate those effects where feasible, without
introducing unnecessary risk by not checking the right things.
Reduce the noise and consume all errors
No matter where you are running CHECKDB, always use the WITH NO_INFOMSGS option. This simply
suppresses all the irrelevant output that just tells you how many rows are in each table; if you're
interested in that information, you can get it from simple queries against DMVs and not while DBCC is
running. Suppressing the output makes it far less likely that you'll miss a critical message buried in all
that happy output.
Similarly, you should always use the WITH ALL_ERRORMSGS option, but especially if you are running SQL
Server 2008 RTM or SQL Server 2005 (in those cases, you may see the list of per-object errors truncated
to 200). For any CHECKDB operations other than quick ad-hoc checks, you should consider directing
output to a file. Management Studio is limited to 1000 lines of output from DBCC CHECKDB, so you
might miss out on some errors if you exceed this figure.
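For example, a routine check using both options might look like this (the database name is illustrative):

DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;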
While not strictly a performance issue, using these options will prevent you from having to run the
process again. This is particularly critical if you're in the middle of disaster recovery.
Offload logical checks where possible
In most cases, CHECKDB spends the majority of its time performing logical checks of the data. If you
have the ability to perform these checks on a true copy of the data, you can focus your efforts on the
physical structure of your production systems, and use the secondary server to handle all of the logical
checks and alleviate that load from the primary. By secondary server, I mean only the following:
The place where you test your full restores (because you test your restores, right?)
Other folks (most notably the behemoth marketing force that is Microsoft) might have convinced you
that other forms of secondary servers are suitable for DBCC checks. For example:
SAN mirroring;
or other variations
Unfortunately, this is not the case, and none of these secondaries are valid, reliable places to perform
your checks as an alternative to the primary. Only a one-for-one backup can serve as a true copy;
anything else that relies on things like the application of log backups to get to a consistent state is not
going to reliably reflect integrity problems on the primary.
So rather than try to offload your logical checks to a secondary and never perform them on the primary,
here is what I suggest:
1. Make sure you are frequently testing the restores of your full backups. And no, this does not
include COPY_ONLY backups from an AG secondary, for the same reasons as above; that
would only be valid in the case where you have just initiated the secondary with a full restore.
2. Run DBCC CHECKDB often against the full restore, before doing anything else. Again, replaying
log records at this point will invalidate this database as a true copy of the source.
3. Run DBCC CHECKDB against your primary, perhaps broken up in ways that Paul Randal suggests,
and/or on a less frequent schedule, and/or using PHYSICAL_ONLY more often than not. This can
depend on how often and reliably you are performing (2).
4. Never assume that checks against the secondary are enough. Even with an exact replica of your
primary database, there are still physical issues that can occur on the I/O subsystem of your
primary that will never propagate to the secondary.
5. Always analyze DBCC output. Just running it and ignoring it, to check it off some list, is as helpful
as running backups and claiming success without ever testing that you can actually restore that
backup when needed.
Experiment with trace flags 2549, 2562, and 2566
Ive done some thorough testing of two trace flags (2549 and 2562) and have found that they can yield
substantial performance improvements. These two trace flags are described in a lot more detail in KB
#2634571, but basically:
Trace flag 2549 optimizes the checkdb process by treating each individual database file as residing
on a unique underlying disk. This is okay to use if your database has a single data file, or
if you know that each database file is, in fact, on a separate drive. If your database has
multiple files and they share a single, direct-attached spindle, you should be wary of this
trace flag, as it may do more harm than good.
Trace flag 2562 treats the entire checkdb process as a single batch, at the cost of higher
tempdb utilization (up to 5% of the database size). It also uses a better algorithm to determine
how to read pages from the database, reducing latch contention (specifically on the
DBCC_MULTIOBJECT_SCANNER latch), which can help avoid timeout errors waiting on that latch.
Note that this specific improvement is in the SQL Server 2012 code path, so you will benefit from it even
without the trace flag.
The above two trace flags are available in the builds listed in KB #2634571.
Trace flag 2566: if you are still using SQL Server 2005, this trace flag, introduced in 2005 SP2 CU#9
(9.00.3282) (though not documented in that Cumulative Update's Knowledge Base
article, KB #953752), attempts to correct poor performance of DATA_PURITY checks on
x64-based systems. You can see more details in KB #945770. This trace flag should not
be necessary in more modern versions of SQL Server, as the problem in the query
processor has been fixed.
If you're going to use any of these trace flags, I highly recommend setting them at the session level
using DBCC TRACEON rather than as a startup trace flag. Not only does it enable you to turn them off
without having to cycle SQL Server, but it also allows you to implement them only when performing
certain CHECKDB commands, as opposed to operations using any type of repair.
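As a rough sketch of what that looks like inside a job step, assuming you have tested the flags and decided to use them (the database name is illustrative):

DBCC TRACEON (2549, 2562);   -- session scope, not -T startup flags
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;
DBCC TRACEOFF (2549, 2562);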
Reduce I/O impact: optimize tempdb
DBCC CHECKDB can make heavy use of tempdb, so make sure you plan for resource utilization there.
This is usually a good thing to do in any case. For CHECKDB you'll want to properly allocate space to
tempdb; the last thing you want is for CHECKDB progress (and any other concurrent operations) to have
to wait for an autogrow. You can get an idea of the requirements using WITH ESTIMATEONLY, as
Paul explains here. Just be aware that the estimate can be quite low due to a bug in SQL Server 2008 R2.
Also, if you are using trace flag 2562, be sure to account for the additional space requirements.
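A quick way to get that estimate (again, the database name is illustrative):

DBCC CHECKDB (N'YourDatabase') WITH ESTIMATEONLY;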
And of course, all of the typical advice for optimizing tempdb on just about any system is appropriate
here as well: make sure tempdb is on its own set of fast spindles, make sure it is sized to accommodate
all other concurrent activity without having to grow, make sure you are using an optimal number of data
files, etc. A few other resources you might consider:
A SQL Server DBA myth a day: (12/30) tempdb should always have one data file per processor
core
There are some tips here for reducing the risk of these errors during CHECKDB operations, and reducing
their impact in general with several fixes available, depending on your operating system and SQL
Server version:
http://blogs.msdn.com/b/psssql/archive/2009/03/04/workarounds.aspx
http://blogs.msdn.com/b/psssql/archive/2008/07/10/retries.aspx
Reduce CPU impact
DBCC CHECKDB is multi-threaded by default (but only in Enterprise Edition). If your system is CPU-bound,
or you just want CHECKDB to use less CPU at the cost of running longer, you can consider
reducing parallelism in a couple of different ways:
1. Use Resource Governor on 2008 and above, as long as you are running Enterprise Edition. To
target just DBCC commands for a particular resource pool or workload group, you'll have to
write a classifier function that can identify the sessions that will be performing this work (e.g. a
specific login or a job_id); see the sketch after this list.
2. Use trace flag 2528 to turn off parallelism for DBCC CHECKDB (as well
as CHECKFILEGROUP and CHECKTABLE). Trace flag 2528 is described here. Of course this is only
valid in Enterprise Edition, because in spite of what Books Online currently says, the truth is
that CHECKDB does not go parallel in Standard Edition.
3. While the DBCC command itself does not support OPTION (MAXDOP n), it does respect the
global setting max degree of parallelism. Probably not something I would do in production
unless I had no other options, but this is one overarching way to control
certain DBCC commands if you can't target them more explicitly.
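Here is a minimal sketch of option (1), assuming a dedicated login (the name DBCCUser is hypothetical) is used to run your CHECKDB jobs; the pool settings are illustrative only:

USE [master];
GO
-- constrain the CPU available to the sessions that run DBCC
CREATE RESOURCE POOL DBCCPool WITH (MAX_CPU_PERCENT = 25);
CREATE WORKLOAD GROUP DBCCGroup USING DBCCPool;
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO
-- classifier routes sessions from the dedicated login into the constrained group
CREATE FUNCTION dbo.DBCC_Classifier()
RETURNS SYSNAME
WITH SCHEMABINDING
AS
BEGIN
  RETURN (CASE WHEN SUSER_NAME() = N'DBCCUser'
               THEN N'DBCCGroup' ELSE N'default' END);
END
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.DBCC_Classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;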
We've been asking for better control over the number of CPUs that DBCC CHECKDB uses, but these
requests have been repeatedly denied. For example, Ola Hallengren asked for the ability to add MAXDOP
to the command to limit the number of CPUs used on a multi-core system: Connect #468694 : MAXDOP option
in DBCC CHECKDB. And Chirag Roy made a similar request (or radically different, depending on your
point of view) to enable CHECKDB to override the server-level setting and use *more* CPUs: Connect
#538754 : Introduce Setting to Force DBCC CHECKDB to run Multi Threaded when MAXDOP = 1.
My Findings
I wanted to demonstrate a few of these techniques in an environment I could control. I
installed AdventureWorks2012, then expanded it using the AW enlarger script written by Jonathan
Kehayias (blog | @SQLPoolBoy), which grew the database to about 7 GB. Then I ran a series
of CHECKDB commands against it, and timed them. I used a plain vanilla DBCC CHECKDB on its own, then
all other commands used WITH NO_INFOMSGS, ALL_ERRORMSGS. Then four tests with (a) no trace
flags, (b) 2549, (c) 2562, and (d) both 2549 and 2562. Then I repeated those four tests, but added
the PHYSICAL_ONLY option, which bypasses all of the logical checks. The results (averaged over 10 test
runs) are telling:
Then I expanded the database some more, making many copies of the two enlarged tables, leading to a
database size just north of 70 GB, and ran the tests again. The results, again averaged over 10 test runs:
In these two scenarios, I have learned the following (again, keeping in mind that your mileage may vary,
and that you will need to perform your own tests to draw any meaningful conclusions):
At small database sizes, the NO_INFOMSGS option can cut processing time significantly
when the checks are run in SSMS. On larger databases, however, this benefit diminishes,
as the time and work spent relaying the information becomes such an insignificant
portion of the overall duration. 21 seconds out of 2 minutes is substantial; 88 seconds
out of 35 minutes, not so much.
The two trace flags I tested had a significant impact on performance, representing a
runtime reduction of 40-60% when both were used together.
When I can push logical checks to a secondary server (again, assuming that I am performing
logical checks elsewhere against a true copy), the trace flags had very little impact on duration
when performing PHYSICAL_ONLY checks in my scenario.
Of course, and I can't stress this enough, these are relatively small databases and only used so that I
could perform repeated, measured tests in a reasonable amount of time. This was also a fairly beefy
server (80 logical CPUs, 128 GB RAM) and I was the only user. Duration and interaction with other
workloads on the system may skew these results quite a bit. Here is a quick glimpse of typical CPU
usage, using SQL Sentry Performance Advisor, during one of the CHECKDB operations (and none of the
options really changed the overall impact on CPU, just duration):
And here is another view, showing similar CPU profiles for three different sample CHECKDB operations
in historical mode (I've overlaid a description of the three tests sampled in this range):
On even larger databases, hosted on busier servers, you may see different effects, and your mileage is
quite likely to vary. So please perform your due diligence and test out these options and trace flags
during a typical concurrent workload before deciding how you want to approach CHECKDB.
Conclusion
DBCC CHECKDB is a very important but often undervalued part of your responsibility as a DBA or
architect, and crucial to the protection of your company's data. Do not take this responsibility lightly,
and do your best to ensure that you do not sacrifice anything in the interest of reducing impact on your
production instances. Most importantly: look beyond the marketing data sheets to be sure you fully
understand how valid those promises are and whether you are willing to bet your company's data on
them. Skimping out on some checks or offloading them to invalid secondary locations could be a disaster
waiting to happen.
Finally, if you have an unresolved question about DBCC CHECKDB, post it to the #sqlhelp hash tag on
Twitter. Paul checks that tag often and, since his picture should appear in the main Books Online article,
it's likely that if anyone can answer it, he can. If it's too complex for 140 characters, you can ask here
(and I will make sure Paul sees it at some point), or post to a forum site such as answers.sqlsentry.net or dba.stackexchange.com.
With the default configuration that SQL Server applies under the 20 logical processor limitation using
Server+CAL, the first 20 schedulers are VISIBLE ONLINE and any remaining schedulers are VISIBLE
OFFLINE. As a result, performance problems can occur for the instance, due to NUMA node scheduler
imbalances. To demonstrate this I created a VM on our Dell R720 test server which has two sockets and
Intel Xeon E5-2670 processors installed, each with 8 cores and Hyperthreading enabled, providing a
total of 32 logical processors available under Windows Server 2012 Datacenter Edition. The VM was
configured to have 32 virtual CPUs with 16 virtual processors allocated in two vNUMA nodes.
In SQL Server under the Enterprise Server+CAL licensing model, this results in a scheduler configuration
that is similar to the following:
SELECT
parent_node_id,
[status],
scheduler_id,
[cpu_id],
is_idle,
current_tasks_count,
runnable_tasks_count,
active_workers_count,
load_factor
FROM sys.dm_os_schedulers
As you can see, all 16 of the logical processors in the first NUMA node and only four of the logical
processors in the second NUMA node are used by the instance. This results in a significant imbalance of
schedulers between the two NUMA nodes that can lead to significant performance problems under
load. To demonstrate this, I spun up 300 connections running the AdventureWorks Books Online
workload against the instance and then captured the scheduler information for the VISIBLE ONLINE
schedulers in the instance using the following query:
SELECT
parent_node_id,
scheduler_id,
[cpu_id],
is_idle,
current_tasks_count,
runnable_tasks_count,
active_workers_count,
load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE';
An example output of this query under load is shown in Figure 3 below.
You can also see this symptom visually in monitoring tools such as SQL Sentry Performance Advisor:
This information shows a significant imbalance and performance is going to be affected as a result. This
is clearly evident in the runnable tasks counts for the four schedulers in the second NUMA node, which
are three to four times the size of those for the schedulers in the first NUMA node. So what exactly is
the problem and why does this occur?
At first glance you might think that this is a bug in SQL Server, but it isn't. This is something that occurs
by design, though I doubt that this scenario was expected when the 20 logical processor limitation was
originally implemented. On NUMA-based systems, new connections are assigned to the NUMA nodes in
a round-robin fashion, and then inside of the NUMA node the connection is assigned to a scheduler
based on load. If we change the way that we are looking at this data and aggregate the data based on
parent_node_id, we'll see that the tasks are actually being balanced across the NUMA nodes. To do this
we'll use the following query, the output of which is shown in Figure 5.
SELECT
parent_node_id,
SUM(current_tasks_count) AS current_tasks_count,
SUM(runnable_tasks_count) AS runnable_tasks_count,
SUM(active_workers_count) AS active_workers_count,
AVG(load_factor) AS avg_load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE'
GROUP BY parent_node_id;
This behavior is documented in Books Online for SQL Server (http://msdn.microsoft.com/en-us/library/ms180954(v=sql.105).aspx). Knowing what I know about SQLOS, SQL Server, and hardware,
this makes sense. Prior to the 20 logical processor limitation in SQL Server 2012 Enterprise Edition with
Server+CAL licensing, it was a rare scenario that SQL Server would have a scheduler imbalance between
NUMA nodes in a production server. One of the problems in this specific case is the way that the virtual
NUMA was passed through to the VM. Performing the exact same installation on the physical hardware
allows all of the schedulers to be VISIBLE ONLINE, since SQL Server can distinguish the additional logical
processors presented by hyperthreading, and they do not count against the license limit.
In other words, the 20-logical processor limit actually results in 40 schedulers ONLINE if (a) it is not a
virtual machine, (b) the processors are Intel, and (c) hyper-threading is enabled.
So we see this message in the error log:
Date:    11/14/2012 10:36:18 PM
Log:     SQL Server (Current 11/14/2012 10:36:00 PM)
Source:  Server
Message: SQL Server detected 2 sockets with 8 cores per socket and 16 logical processors per
socket, 32 total logical processors; using 32 logical processors based on SQL Server
licensing. This is an informational message; no user action is required.
And the same query as above results in all 32 processors being VISIBLE ONLINE:
SELECT
parent_node_id,
[status],
scheduler_id,
[cpu_id],
is_idle,
current_tasks_count,
runnable_tasks_count,
active_workers_count,
load_factor
FROM sys.dm_os_schedulers
WHERE [status] = N'VISIBLE ONLINE';
In this case, since there are only 32 logical processors, the 20 (well, 40) core limit does not impact us at
all, and work is distributed evenly across all of the cores.
In scenarios where the 20 processor limitation affects the NUMA balance of schedulers it is possible to
manually change the server configuration to balance the number of VISIBLE ONLINE schedulers in each
of the NUMA nodes through the use of ALTER SERVER CONFIGURATION. In the VM example provided,
the following command will configure the first 10 logical processors in each NUMA node to VISIBLE
ONLINE.
ALTER SERVER CONFIGURATION
SET PROCESS AFFINITY CPU = 0 TO 9, 16 TO 25;
With this new configuration, running the same workload of 300 sessions and the AdventureWorks Books
Online workload, we can see that the load imbalance no longer occurs.
And again using SQL Sentry Performance Advisor we can see the CPU load distributed more evenly
across both NUMA nodes:
This problem is not strictly limited to VMs and the way that virtual CPUs are presented to the OS. It is
also possible to run into this problem with physical hardware. For example, an older Dell R910 with four
sockets and eight cores per socket, or even an AMD Opteron 6200 Interlagos based server with two
sockets and 16 cores per socket, which presents itself as four NUMA nodes with eight cores each. Under
either of these configurations, the process imbalance can also result in one of the NUMA nodes being
set offline entirely. Consequently, other side effects such as memory from that node being distributed
across the online nodes leading to foreign memory access issues can also degrade performance.
Summary
The default configuration of SQL Server 2012 using the Enterprise Edition licensing for Server+CAL is not
ideal under all NUMA configurations that might exist for SQL Server. Whenever Enterprise Server+CAL
licensing is being used, the NUMA configuration and scheduler statuses per NUMA node needs to be
reviewed to ensure that SQL Server is configured for optimum performance. This problem does not
occur under core-based licensing since all of the schedulers are licensed and VISIBLE ONLINE.
foreign key, and they can impact performance when the primary key value is updated, or if the row is
deleted.
In the AdventureWorks2012 database, there is one table, SalesOrderDetail, with SalesOrderID as a
foreign key. For the SalesOrderDetail table, SalesOrderID and SalesOrderDetailID combine to form the
primary key, supported by a clustered index. If the SalesOrderDetail table did not have an index on
the SalesOrderID column, then when a row is deleted from SalesOrderHeader, SQL Server would have to
verify that no rows for the same SalesOrderID value exist. Without any indexes that contain
the SalesOrderID column, SQL Server would need to perform a full table scan of SalesOrderDetail. As
you can imagine, the larger the referenced table, the longer the delete will take.
An Example
We can see this in the following example, which uses copies of the aforementioned tables from
the AdventureWorks2012 database that have been expanded using a script which can be found here.
The script was developed by Jonathan Kehayias (blog | @SQLPoolBoy) and creates
a SalesOrderHeaderEnlarged table with 1,258,600 rows, and a SalesOrderDetailEnlarged table with
4,852,680 rows. After the script was run, the foreign key constraint was added using the statements
below. Note that the constraint is created with the ON DELETE CASCADE option. With this option, when
an update or delete is issued against the SalesOrderHeaderEnlarged table, rows in the corresponding
table(s) in this case just SalesOrderDetailEnlarged are updated or deleted.
In addition, the default clustered index for SalesOrderDetailEnlarged was dropped and recreated to
have just SalesOrderDetailID as the primary key, as it represents a typical design.
USE [AdventureWorks2012];
GO
/* remove original clustered index */
ALTER TABLE [Sales].[SalesOrderDetailEnlarged]
DROP CONSTRAINT [PK_SalesOrderDetailEnlarged_SalesOrderID_SalesOrderDetailID];
GO
/* re-create clustered index with SalesOrderDetailID only */
ALTER TABLE [Sales].[SalesOrderDetailEnlarged]
ADD CONSTRAINT [PK_SalesOrderDetailEnlarged_SalesOrderDetailID] PRIMARY KEY CLUSTERED
(
[SalesOrderDetailID] ASC
)
WITH
(
PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY];
GO
/* add foreign key constraint for SalesOrderID */
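Based on that description, the constraint would look something like this sketch (the constraint name is illustrative):

ALTER TABLE [Sales].[SalesOrderDetailEnlarged] WITH CHECK
ADD CONSTRAINT [FK_SalesOrderDetailEnlarged_SalesOrderHeaderEnlarged]
FOREIGN KEY ([SalesOrderID])
REFERENCES [Sales].[SalesOrderHeaderEnlarged] ([SalesOrderID])
ON DELETE CASCADE;
GO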
Using SQL Sentry Plan Explorer, the execution plan for the delete shows a clustered index scan
against SalesOrderDetailEnlarged, as there is no index on SalesOrderID:
The nonclustered index to support the foreign key on SalesOrderDetailEnlarged was then created using
the following statement:
USE [AdventureWorks2012];
GO
/* create nonclustered index */
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetailEnlarged_SalesOrderID] ON
[Sales].[SalesOrderDetailEnlarged]
(
[SalesOrderID] ASC
)
WITH
(
PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
)
ON [PRIMARY];
GO
Another delete was executed for a SalesOrderID that affected one row in SalesOrderHeaderEnlarged and
72 rows in SalesOrderDetailEnlarged:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
USE [AdventureWorks2012];
GO
DELETE FROM [Sales].[SalesOrderHeaderEnlarged] WHERE [SalesOrderID] = 697505;
The statistics IO and timing information showed a dramatic improvement:
And the query plan showed an index seek of the nonclustered index on SalesOrderID, as expected:
The query execution time dropped from 1898 ms to 27 ms, a 98.58% reduction, and reads for
the SalesOrderDetailEnlarged table decreased from 50647 to 48, a 99.9% improvement. Percentages
aside, consider the I/O alone generated by the delete. The SalesOrderDetailEnlarged table is only 500
MB in this example, and for a system with 256 GB of available memory, a table taking up 500 MB in the
buffer cache doesn't seem like a terrible situation. But a table of 5 million rows is relatively small; most
large OLTP systems have tables with hundreds of millions of rows. In addition, it is not uncommon for
multiple foreign key references to exist for a primary key, where a delete of the primary key requires
deletes from multiple related tables. In that case, it is possible to see extended durations for deletes,
which is not only a performance issue, but a blocking issue as well, depending on isolation level.
Conclusion
It is generally recommended to create an index which leads on the foreign key column(s), to support not
only joins between the primary and foreign keys, but also updates and deletes. Note that this is a
general recommendation, as there are edge case scenarios where the additional index on the foreign
key was not used due to extremely small table size, and the additional index updates actually negatively
impacted performance. As with any schema modifications, index additions should be tested and
monitored after implementation. It is important to ensure that the additional indexes produce the
desired effects and do not negatively impact solution performance. It is also worth noting how much
additional space is required by the indexes for the foreign keys. This is essential to consider before
creating the indexes, and if they do provide a benefit, must be considered for capacity planning going
forward.
Based on the query logic, is the following plan shape what you would expect to see?
And what about this alternative plan, where instead of a nested loop we have a hash match?
The correct answer is dependent on a few other factors, but one major factor is the number of rows
in each of the tables. In some cases, one physical join algorithm is more appropriate than the other,
and if the initial cardinality estimate assumptions aren't correct, your query may be using a non-optimal
approach.
Identifying cardinality estimate issues is relatively straightforward. If you have an actual execution plan,
you can compare the estimated versus actual row count values for operators and look for skews. SQL
Sentry Plan Explorer simplifies this task by allowing you to see actual versus estimated rows for all
operators in a single plan tree tab versus having to hover over the individual operators in the graphical
plan:
Now, skews don't always result in poor quality plans, but if you are having performance issues with a
query and you see such skews in the plan, this is one area that is then worthy of further investigation.
Identification of cardinality estimate issues is relatively straightforward, but the resolution often isn't.
There are a number of root causes as to why cardinality estimate issues can occur, and I'll cover ten of
the more common reasons in this post.
Missing or Stale Statistics
Of all the reasons for cardinality estimate issues, this is the one that you hope to see, as it is often
easiest to address. In this scenario, your statistics are either missing or out-of-date. You may have
database options for automatic statistics creation and updates disabled, the NORECOMPUTE option
enabled for specific statistics, or have tables large enough that your automatic statistics updates simply
aren't happening frequently enough.
Sampling Issues
It may be that the precision of the statistics histogram is inadequate; for example, if you have a very
large table with significant and/or frequent data skews. You may need to change your sampling from the
default, or, if even that doesn't help, investigate using separate tables, filtered statistics, or filtered
indexes.
Hidden Column Correlations
The query optimizer assumes that columns within the same table are independent. For example, if you
have a city and state column, we may intuitively know that these two columns are correlated, but SQL
Server does not understand this unless we help it out with an associated multi-column index, or with
manually-created multi-column statistics. Without helping the optimizer with correlation, the selectivity
of your predicates may be exaggerated.
Below is an example of two correlated predicates:
SELECT
lastname,
firstname
FROM dbo.member
WHERE city = 'Minneapolis'
AND state_prov = 'MN';
I happen to know that 10% of our 10,000 row member table qualify for this combination, but the query
optimizer is guessing that it is 1% of the 10,000 rows:
Now contrast this with the appropriate estimate that I see after adding multi-column statistics:
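The multi-column statistics in question could be created with something like this (a sketch; the statistics name is illustrative):

CREATE STATISTICS [stats_member_city_state]
ON dbo.member (city, state_prov);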
If this is your root cause, you would be well-advised to explore alternatives like temporary tables and/or
permanent staging tables where possible.
Scalar and MSTV UDFs
Similar to table variables, multi-statement table-valued and scalar functions are a black box from a
cardinality estimation perspective. If you're encountering plan quality issues due to them, consider
inline table-valued functions as an alternative, or even pulling out the function reference entirely and just
referencing objects directly.
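As a rough illustration of that rewrite (the function and parameter names are hypothetical), an inline table-valued function keeps the underlying query visible to the optimizer, unlike its multi-statement counterpart:

CREATE FUNCTION dbo.MembersByState_Inline (@state CHAR(2))
RETURNS TABLE
AS RETURN
(
  SELECT lastname, firstname, city, state_prov
  FROM dbo.member
  WHERE state_prov = @state
);
GO
-- callers can still get reasonable cardinality estimates:
SELECT lastname, firstname
FROM dbo.MembersByState_Inline('MN');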
Below shows an estimated versus actual plan when using a multi-statement table-valued function:
A lack of visibility to remote statistics due to insufficient permissions may be the source of your
cardinality estimation issues.
And there are others
There are other reasons why cardinality estimates can be skewed, but I believe I've covered the most
common ones. The key point is to pay attention to the skews in association with known, poorly
performing queries. Don't assume that the plan was generated based on accurate row count conditions.
If these numbers are skewed, you need to try to troubleshoot this first.
In Plan Explorer, we tried to be helpful by multiplying the estimated number of rows (rounded to 17) by
the number of executions (45), and came up with 765:
For most operators, this approach yields the right data, but due to this bug in SQL Server, it is not
correct for key/RID lookups. We've adjusted for that, and released the appropriate fix in 7.2.42.0
(download it now!). The graphical plan now properly shows correct row counts for both estimated:
And actual:
I'll repeat Paul's warning: "Watch out for poor cardinality estimates when a predicate is applied as part
of a lookup."
There were some more complex problems caused by these misleading estimates, which we have also
addressed. I will blog about a few of those in a follow-up post; for this post I just wanted to
demonstrate that we quickly resolved the specific issue Paul highlighted in his post.
So, if an object is modified, say, 20 times, I would expect to pull 40 events. And this is exactly what
happens in SQL Server 2008, 2008 R2 and 2012. The challenge comes when more than 500 modifications
happen (leading to more than 1,000 events). In SQL Server 2008 and 2008 R2, we still capture all events.
But SQL Server 2012 will drop some due to a change in the ring_buffer target. To demonstrate, let's
build a quick, sample event session that trades performance for prevention of losing events (note that
this is not the set of options I would prescribe for any production system):
USE master;
GO
CREATE EVENT SESSION [XE_Alter] ON SERVER
ADD EVENT sqlserver.object_altered
(
ACTION (sqlserver.server_principal_name)
WHERE (sqlserver.session_id = 78) -- change 78 to your current spid
)
ADD TARGET package0.ring_buffer (SET MAX_MEMORY = 4096)
WITH (EVENT_RETENTION_MODE = NO_EVENT_LOSS, MAX_DISPATCH_LATENCY = 5 SECONDS);
ALTER EVENT SESSION [XE_Alter] ON SERVER STATE = START;
GO
With the session started, in the same window, run the following script, which creates two procedures,
and alters them in a loop.
CREATE PROCEDURE dbo.foo_x AS SELECT 1;
GO
CREATE PROCEDURE dbo.foo_y AS SELECT 1;
GO
ALTER PROCEDURE dbo.foo_x AS SELECT 2;
GO 275
ALTER PROCEDURE dbo.foo_y AS SELECT 2;
GO 275
DROP PROCEDURE dbo.foo_x, dbo.foo_y;
GO
Now, let's pull the object name, and how many times each object was modified, from the target, and
drop the event session (be patient; on my system, this consistently takes about 40 seconds):
;WITH raw_data(t) AS
(
SELECT CONVERT(XML, target_data)
FROM sys.dm_xe_sessions AS s
INNER JOIN sys.dm_xe_session_targets AS st
ON s.[address] = st.event_session_address
WHERE s.name = 'XE_Alter'
AND st.target_name = 'ring_buffer'
),
xml_data (ed) AS
(
SELECT e.query('.')
FROM raw_data
CROSS APPLY t.nodes('RingBufferTarget/event') AS x(e)
)
SELECT [object_name] = obj, event_count = COUNT(*)
FROM
(
SELECT
    -- [login] = ed.value('(event/action[@name="server_principal_name"]/value)[1]', 'nvarchar(128)'),
    obj   = ed.value('(event/data[@name="object_name"]/value)[1]', 'nvarchar(128)'),
    phase = ed.value('(event/data[@name="ddl_phase"]/text)[1]', 'nvarchar(128)')
FROM xml_data
) AS x
WHERE phase = 'Commit'
GROUP BY obj;
GO
DROP EVENT SESSION [XE_Alter] ON SERVER;
GO
Results (which ignore exactly half of the 1,000 captured events, focusing on Commit events only):

object_name   event_count
===========   ===========
foo_x         225
foo_y         275
This shows that 50 commit events (100 events total) were dropped for foo_x, and exactly 1,000 total
events have been collected ((225 + 275) * 2). SQL Server seems to arbitrarily decide which events to
drop; in theory, if it were collecting 1,000 events and then stopping, I should have 275 events for foo_x,
and 225 for foo_y, since I altered foo_x first, and I shouldn't have hit the cap until after that loop was
completed. But obviously there are some other mechanics at play here in how XEvents decides which
events to keep and which events to throw away.
In any case, you can get around this by specifying a different value for MAX_EVENTS_LIMIT in the ADD
TARGET portion of the code:
-- ...
ADD TARGET package0.ring_buffer (SET MAX_MEMORY = 4096, MAX_EVENTS_LIMIT = 0)
------------------------------------------------------^^^^^^^^^^^^^^^^^^^^^^
-- ...
Note that 0 = unlimited, but you can specify any integer value. When we run our test above with the
new setting, we see more accurate results, since no events were dropped:

object_name   event_count
===========   ===========
foo_x         275
foo_y         275
As mentioned above, if you attempt to use this property when creating an event session against SQL
Server 2008 / 2008 R2, you will get this error:
So if you are doing any kind of code generation and want consistent behavior across versions, you'll have to
check the version first, and only include the attribute for 2012 and above.
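A minimal sketch of that kind of version check (the PRINT statements stand in for whatever your code generator actually emits):

DECLARE @major INT = CONVERT(INT, PARSENAME(CONVERT(NVARCHAR(32),
  SERVERPROPERTY('ProductVersion')), 4));

IF @major >= 11  -- SQL Server 2012 and above
  PRINT 'include , MAX_EVENTS_LIMIT = 0 in the ADD TARGET clause';
ELSE
  PRINT 'omit the MAX_EVENTS_LIMIT attribute for 2008 / 2008 R2';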
Conclusion
If you are upgrading from SQL Server 2008 / 2008 R2 to 2012, or have written Extended Events code that
targets multiple versions, you should be aware of this behavior change and code accordingly. Otherwise
you risk dropping events, even in situations where you would assume (and where previous behavior
would imply) that dropped events were not possible. This isn't something tools like the Upgrade
Advisor or Best Practices Analyzer are going to point out for you.
The underlying mechanics surrounding this problem are described in detail in this bug report and this
blog post.
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT
EXISTS?
By Aaron Bertrand
A pattern I see quite a bit, and wish that I didn't, is NOT IN. Let's say you want to find all the patients
who have never had a flu shot. Or, in AdventureWorks2012, a similar question might be, show me all of
the customers who have never placed an order. Expressed using NOT IN, a pattern I see all too often,
that would look something like this (note that I'm using the enlarged header and detail tables from this
script by Jonathan Kehayias (blog | @SQLPoolBoy)):
SELECT CustomerID
FROM Sales.Customer
WHERE CustomerID NOT IN
(
SELECT CustomerID
FROM Sales.SalesOrderHeaderEnlarged
);
When I see this pattern, I cringe. But not for performance reasons; after all, it creates a decent enough
plan in this case:
The main problem is that the results can be surprising if the target column is NULLable (SQL Server
processes this as a left anti semi join, but can't reliably tell you if a NULL on the right side is equal to, or
not equal to, the reference on the left side). Also, optimization can behave differently if the column is
NULLable, even if it doesn't actually contain any NULL values (Gail Shaw talked about this back in 2010).
In this case, the target column is not nullable, but I wanted to mention those potential issues with NOT
IN; I may investigate these issues more thoroughly in a future post.
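For anyone who hasn't been bitten by this before, here is a minimal, self-contained illustration (the table variables are hypothetical) of why a NULL on the right side makes NOT IN return nothing, while NOT EXISTS behaves as most people expect:

DECLARE @a TABLE (id INT);
DECLARE @b TABLE (id INT);

INSERT @a VALUES (1), (2);
INSERT @b VALUES (1), (NULL);

-- NOT IN: id <> 1 AND id <> NULL evaluates to UNKNOWN, so no rows come back
SELECT id FROM @a WHERE id NOT IN (SELECT id FROM @b);

-- NOT EXISTS: returns 2
SELECT a.id FROM @a AS a
WHERE NOT EXISTS (SELECT 1 FROM @b AS b WHERE b.id = a.id);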
TL;DR version
Instead of NOT IN, use a correlated NOT EXISTS for this query pattern. Always. Other methods may rival
it in terms of performance, when all other variables are the same, but all of the other methods
introduce either performance problems or other challenges.
Alternatives
So what other ways can we write this query?
OUTER APPLY
One way we can express this result is using a correlated OUTER APPLY.
SELECT c.CustomerID
FROM Sales.Customer AS c
OUTER APPLY
(
SELECT CustomerID
FROM Sales.SalesOrderHeaderEnlarged
WHERE CustomerID = c.CustomerID
) AS h
WHERE h.CustomerID IS NULL;
Logically, this is also a left anti semi join, but the resulting plan is missing the left anti semi join operator,
and seems to be quite a bit more expensive than the NOT IN equivalent. This is because it is no longer a
left anti semi join; it is actually processed in a different way: an outer join brings in all matching and non-matching rows, and *then* a filter is applied to eliminate the matches:
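LEFT OUTER JOIN
Another way to express this is a left outer join with a NULL check; a sketch of that form:

SELECT c.CustomerID
FROM Sales.Customer AS c
LEFT OUTER JOIN Sales.SalesOrderHeaderEnlarged AS h
ON h.CustomerID = c.CustomerID
WHERE h.CustomerID IS NULL;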
You need to be careful, though, about what column you check for NULL. In this case CustomerID is the
logical choice because it is the joining column; it also happens to be indexed. I could have
picked SalesOrderID, which is the clustering key, so it is also in the index on CustomerID. But I could
have picked another column that is not in (or that later gets removed from) the index used for the join,
leading to a different plan. Or even a NULLable column, leading to incorrect (or at least unexpected)
results, since there is no way to differentiate between a row that doesn't exist and a row that does exist
but where that column is NULL. And it may not be obvious to the reader / developer / troubleshooter
that this is the case. So I will also test these three WHERE clauses:
WHERE h.SalesOrderID IS NULL; -- clustered, so part of index
WHERE h.SubTotal IS NULL; -- not nullable, not part of the index
WHERE h.Comment IS NULL; -- nullable, not part of the index
The first variation produces the same plan as above. The other two choose a hash join instead of a
merge join, and a narrower index in the Customer table, even though the query ultimately ends up
reading the exact same number of pages and amount of data. However, while the h.SubTotal variation
produces the correct results:
The h.Comment variation does not, since it includes all of the rows where h.Comment IS NULL, as well as
all of the rows that did not exist for any customer. I've highlighted the subtle difference in the number
of rows in the output after the filter is applied:
In addition to needing to be careful about column selection in the filter, the other problem I have with
the LEFT OUTER JOIN form is that it is not self-documenting, in the same way that an inner join in the
old-style form of FROM dbo.table_a, dbo.table_b WHERE ... is not self-documenting. By that I mean it
is easy to forget the join criteria when it is pushed to the WHERE clause, or for it to get mixed in with
other filter criteria. I realize this is quite subjective, but there it is.
EXCEPT
If all we are interested in is the join column (which by definition is in both tables), we can use EXCEPT,
an alternative that doesn't seem to come up much in these conversations (probably because usually
you need to extend the query in order to include columns you're not comparing):
SELECT CustomerID
FROM Sales.Customer AS c
EXCEPT
SELECT CustomerID
FROM Sales.SalesOrderHeaderEnlarged;
This comes up with the exact same plan as the NOT IN variation above:
One thing to keep in mind is that EXCEPT includes an implicit DISTINCT, so if you have cases where you
want multiple rows having the same value in the left table, this form will eliminate those duplicates.
Not an issue in this specific case, just something to keep in mind, just like UNION versus UNION ALL.
NOT EXISTS
My preference for this pattern is definitely NOT EXISTS:
SELECT CustomerID
FROM Sales.Customer AS c
WHERE NOT EXISTS
(
SELECT 1
FROM Sales.SalesOrderHeaderEnlarged
WHERE CustomerID = c.CustomerID
);
(And yes, I use SELECT 1 instead of SELECT * not for performance reasons, but simply to clarify intent:
this subquery does not return any data.)
Its performance is similar to NOT IN and EXCEPT, and it produces an identical plan, but is not prone to
the potential issues caused by NULLs or duplicates:
Performance Tests
I ran a multitude of tests, with both a cold and warm cache, to validate that my long-standing
perception about NOT EXISTS being the right choice remained true. The typical output looked like this:
I'll take the incorrect result out of the mix when showing the average performance of 20 runs on a graph
(I only included it to demonstrate how wrong the results are), and I did execute the queries in different
order across tests to make sure that one query was not consistently benefitting from the work of a
previous query. Focusing on duration, here are the results:
If we look at duration and ignore reads, NOT EXISTS is your winner, but not by much. EXCEPT and NOT IN
aren't far behind, but again you need to look at more than performance to determine whether these
options are valid, and test in your scenario.
What if there is no supporting index?
The queries above benefit, of course, from the index on Sales.SalesOrderHeaderEnlarged.CustomerID.
How do these results change if we drop this index? I ran the same set of tests again, after dropping the
index:
DROP INDEX [IX_SalesOrderHeaderEnlarged_CustomerID]
ON [Sales].[SalesOrderHeaderEnlarged];
This time there was much less deviation in terms of performance between the different methods. First
I'll show the plans for each method (most of which, not surprisingly, indicate the usefulness of the
missing index we just dropped). Then I'll show a new graph depicting the performance profile both with
a cold cache and a warm cache.
NOT IN, EXCEPT, NOT EXISTS (all three were identical)
OUTER APPLY
Performance Results
We can immediately see how useful the index is when we look at these new results. In all but one case
(the left outer join that goes outside the index anyway), the results are clearly worse when we've
dropped the index:
So we can see that, while there is less noticeable impact, NOT EXISTS is still your marginal winner in
terms of duration. And in situations where the other approaches are susceptible to schema volatility, it
is your safest choice, too.
Conclusion
This was just a really long-winded way of telling you that, for the pattern of finding all rows in table A
where some condition does not exist in table B, NOT EXISTS is typically going to be your best choice. But,
as always, you need to test these patterns in your own environment, using your schema, data and
hardware, and mixed in with your own workloads.
Unused indexes come from a variety of sources such as someone mistakenly creating an index per table
column, someone creating every index suggested by the missing index DMVs, or someone creating all
indexes suggested by the Database Tuning Advisor. It could also be that the workload characteristics
have changed and so what used to be useful indexes are no longer being used.
Wherever they came from, unused indexes should be removed to reduce their overhead. You can
determine which indexes are unused using the sys.dm_db_index_usage_stats DMV, and I recommend
you read posts by my colleagues Kimberly L. Tripp (here), and Joe Sack (here and here), as they explain
how to use the DMV correctly.
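As a starting point only (and with the caveats those posts describe, such as the counters resetting when the instance restarts), a sketch of the kind of query involved:

-- nonclustered indexes with no seeks, scans, or lookups recorded in this database;
-- review unique and constraint-backing indexes before dropping anything
SELECT OBJECT_NAME(i.[object_id]) AS table_name, i.name AS index_name
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
  ON us.[object_id] = i.[object_id]
  AND us.index_id = i.index_id
  AND us.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.[object_id], 'IsUserTable') = 1
  AND i.type_desc = N'NONCLUSTERED'
  AND COALESCE(us.user_seeks + us.user_scans + us.user_lookups, 0) = 0
ORDER BY table_name, index_name;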
Index Fragmentation
Most people think of index fragmentation as a problem that affects queries that have to read large
amounts of data. While this is one of the problems that fragmentation can cause, fragmentation is also a
problem because of how it occurs.
Fragmentation is caused by an operation called a page split. The simplest cause of a page split is when
an index record must be inserted on a particular page (because of its key value) and the page does not
have enough free space. In this scenario, the following operations will take place:
Some of the records from the full page are moved to the new page, thus creating free space in
the required page
All of these operations generate log records, and as you might imagine, this can be significantly more
than is required to insert a new record on a page that does not require a page split. Back in 2009
I blogged an analysis of page split cost in terms of the transaction log and found some cases where a
page split generated over 40 times more transaction log than a regular insert!
The first step in reducing the extra cost is to remove unused indexes, as I described above, so that
they're not generating page splits. The second step is to identify remaining indexes that are becoming
fragmented (and so must be suffering page splits) using the sys.dm_db_index_physical_stats DMV (or
SQL Sentry's new Fragmentation Manager) and proactively creating free space in them using an index
fillfactor. A fillfactor instructs SQL Server to leave empty space on index pages when the index is built,
rebuilt, or reorganized so that there is space to allow new records to be inserted without requiring a
page split, hence cutting down on the extra log records generated.
Of course, nothing comes for free: the trade-off when using fillfactors is that you are proactively
provisioning extra space in the indexes to prevent more log records being generated, but that's usually
a good trade-off to make. Choosing a fillfactor is relatively easy and I blogged about that here.
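A sketch of what that looks like for a single index (the table and index names are hypothetical; the fill factor value is something you would derive from your own page split analysis):

ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
REBUILD WITH (FILLFACTOR = 90);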
Summary
Reducing the write latency of a transaction log file does not always mean moving to a faster I/O
subsystem, or segregating the file into its own portion of the I/O subsystem. With some simple analysis
of the indexes in your database, you may be able to significantly reduce the amount of transaction log
records being generated, leading to a commensurate reduction in write latency.
There are other, more subtle issues that can affect transaction log performance, and I'll explore those in
a future post.
The deprecated sp_getbindtoken and sp_bindsession system stored procedures used to handle
bound connections
Distributed transactions
My goal was to test each technology and see if it influenced the TRANSACTION_MUTEX wait type.
The first test I performed used the deprecated sp_getbindtoken and sp_bindsession stored procedures.
sp_getbindtoken returns a transaction identifier which can then be used by sp_bindsession to bind
multiple sessions together on the same transaction.
Before each test scenario, I made sure to clear my test SQL Server instance's wait statistics:
DBCC SQLPERF('waitstats', CLEAR);
GO
My test SQL Server instance was running SQL Server 2012 SP1 Developer Edition (11.0.3000). I used
the Credit sample database, although you could use any other kind of sample database like
AdventureWorks if you wanted to, as the schema and data distribution isn't directly relevant to the
subject of this article and wasn't necessary in order to drive the TRANSACTION_MUTEX wait time.
sp_getbindtoken / sp_bindsession
In the first session window of SQL Server Management Studio, I executed the following code to begin a
transaction and output the bind token for enlistment by the other planned sessions:
USE Credit;
GO
BEGIN TRANSACTION;
DECLARE @out_token varchar(255);
EXECUTE sp_getbindtoken @out_token OUTPUT;
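In a second session, the token returned above (selected or printed from @out_token in the first session) is then used to bind to the same transaction; a sketch, with a placeholder for the actual token value:

USE Credit;
GO
DECLARE @token varchar(255) = '<token copied from the first session>';
EXEC sp_bindsession @token;
-- work performed here now participates in the first session's transaction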
wait_type           waiting_tasks_count   wait_time_ms   max_wait_time_ms
TRANSACTION_MUTEX   2                     181732         93899
So I see that there were two waiting tasks (the two sessions that were simultaneously trying to update
the same table via the loop). Since I hadn't executed SET NOCOUNT ON, I was able to see that only the
first executed UPDATE loop got changes in. I tried this same technique using a few different variations
(for example, four overall sessions, with three waiting) and the TRANSACTION_MUTEX incrementing
showed similar behavior. I also saw the TRANSACTION_MUTEX accumulation when simultaneously
updating a different table for each session, so modifications against the same object in a loop weren't
required in order to reproduce the TRANSACTION_MUTEX wait time accumulation.
Distributed transactions
My next test involved seeing if TRANSACTION_MUTEX wait time was incremented for distributed
transactions. For this test, I used two SQL Server instances and a linked server connected between the
two of them. MS DTC was running and properly configured, and I executed the following code that
performed a local DELETE and a remote DELETE via the linked server and then rolled back the changes:
USE Credit;
GO
SET XACT_ABORT ON;
-- Assumes MS DTC service is available, running, properly configured
BEGIN DISTRIBUTED TRANSACTION;
DELETE [dbo].[charge] WHERE charge_no = 1;
DELETE [JOSEPHSACK-PC\AUGUSTUS].[Credit].[dbo].[charge] WHERE charge_no = 1;
ROLLBACK TRANSACTION;
The TRANSACTION_MUTEX showed no activity on the local server:
wait_type           signal_wait_time_ms   waiting_tasks_count   wait_time_ms   max_wait_time_ms
TRANSACTION_MUTEX   0                     0                     0              0
However the waiting tasks count was incremented on the remote server:
wait_type           signal_wait_time_ms   waiting_tasks_count   wait_time_ms   max_wait_time_ms
TRANSACTION_MUTEX
So my expectation was confirmed, given that we have one distributed transaction with more than one
session involved in some way with the same transaction.
MARS (Multiple Active Result Sets)
What about the use of Multiple Active Result Sets (MARS)? Would we also expect to
see TRANSACTION_MUTEX accumulate when associated with MARS usage?
For this, I used the following C# console application code, tested from Microsoft Visual Studio against my
SQL Server 2012 instance and the Credit database. The logic of what I'm actually doing isn't very useful
(it returns one row from each table), but the data readers are on the same connection and the connection
attribute MultipleActiveResultSets is set to true, so it was enough to verify whether MARS could indeed
drive TRANSACTION_MUTEX accumulation as well:
string ConnString =
    @"Server=.;Database=Credit;Trusted_Connection=True;MultipleActiveResultSets=true;";
SqlConnection MARSCon = new SqlConnection(ConnString);
MARSCon.Open();
SqlCommand MARSCmd1 = new SqlCommand("SELECT payment_no FROM dbo.payment;", MARSCon);
SqlCommand MARSCmd2 = new SqlCommand("SELECT charge_no FROM dbo.charge;", MARSCon);
SqlDataReader MARSReader1 = MARSCmd1.ExecuteReader();
SqlDataReader MARSReader2 = MARSCmd2.ExecuteReader();
MARSReader1.Read();
MARSReader2.Read();
Console.WriteLine("\t{0}", MARSReader1[0]);
Console.WriteLine("\t{0}", MARSReader2[0]);
After executing this code, I saw the following accumulation for TRANSACTION_MUTEX:
wait_type           signal_wait_time_ms   waiting_tasks_count   wait_time_ms   max_wait_time_ms
TRANSACTION_MUTEX
So as you can see, the MARS activity (however trivially implemented) caused an uptick in
the TRANSACTION_MUTEX wait type accumulation. And the connection string attribute itself doesn't
drive this; the actual implementation does. For example, I removed the second reader implementation
and just maintained a single reader with MultipleActiveResultSets=true, and as expected, there was
no TRANSACTION_MUTEX wait time accumulation.
Conclusion
If you are seeing high TRANSACTION_MUTEX waits in your environment, I hope this post gives you some
insight into three avenues to explore - to determine both where these waits are coming from, and
whether or not they are necessary.
A transaction abort log record is generated at the end of a transaction roll back.
60KB of log records have been generated since the previous log flush.
The smallest log flush possible is a single 512-byte log block. If all transactions in a workload are very
small (e.g. inserting a single, small table row) then there will be lots of minimally-sized log flushes
occurring. Log flushes are performed asynchronously, to allow decent transaction log throughput, but
there is a fixed limit of 32 concurrent log-flush I/Os at any one time.
There are two possible effects this may have:
1. On a slow-performing I/O subsystem, the volume of tiny transaction log writes could overwhelm
the I/O subsystem leading to high-latency writes and subsequent transaction log throughput
degradation. This situation can be identified by high-write latencies for the transaction log file in
the output of sys.dm_io_virtual_file_stats (see the demo links at the top of the previous post)
2. On a high-performing I/O subsystem, the writes may complete extremely quickly, but the limit
of 32 concurrent log-flush I/Os creates a bottleneck when trying to make the log records durable
on disk. This situation can be identified by low write latencies and a near-constant number of
outstanding transaction log writes near to 32 in the aggregated output of
sys.dm_io_pending_io_requests (see the same demo links).
In both cases, making transactions longer (which is very counter-intuitive!) can reduce the frequency of
transaction log flushes and increase performance. Additionally, in case #1, moving to a higher-performing
I/O subsystem may help, but may lead to case #2. With case #2, if the transactions cannot
be made longer, the only alternative is to split the workload over multiple databases to get around the
fixed limit of 32 concurrent log-flush I/Os.
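A sketch of the kind of check described in case #1, averaging write latency per transaction log file:

SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_writes,
       CASE WHEN vfs.num_of_writes = 0 THEN 0
            ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
  AND mf.[file_id] = vfs.[file_id]
WHERE mf.type_desc = N'LOG';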
Transaction Log Auto-Growth
Whenever new space is added to the transaction log it must be zero-initialized (writing out zeroes to
overwrite the previous use of that portion of the disk), no matter whether the instant file initialization
feature is enabled or not. This applies to creation, manual growth, and auto-growth of the transaction
log. While the zero initialization is taking place, log records cannot be flushed to the log, so auto-growth
during a workload that is changing data can lead to a noticeable drop in throughput, especially if
the auto-growth size is set to be large (say gigabytes, or left at the default 10%).
Auto-growth should therefore be avoided, if at all possible, by allowing the transaction log to clear so there is
always free space that can be reused for new log records. Transaction log clearing (also known as
transaction log truncation, not to be confused with transaction log shrinking) is performed by
transaction log backups when using the Full or Bulk-Logged recovery models, and by checkpoints when
using the Simple recovery model.
Log clearing can only occur if nothing requires the log records in the section of transaction log being
cleared. One common problem that prevents log clearing is having long-running transactions. Until a
transaction commits or rolls back, all the log records from the beginning of the transaction onwards are
required in case the transaction rolls back, including all the log records from other transactions that are
interspersed with those from the long-running transaction. Long-running transactions could be caused by
poor design, code that is waiting for human input, or improper use of nested transactions, for
example. All of these can be avoided once they are identified as a problem.
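To see whether a long-running transaction is the culprit, a quick sketch (YourDatabase is a placeholder) is to check why the log cannot clear and then look at the oldest active transaction:
-- Sketch: why can't the log clear, and what is the oldest active transaction?
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'YourDatabase';

DBCC OPENTRAN (N'YourDatabase');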
You can read more about this here and here.
High-Availability Features
Some high-availability features can also delay transaction log clearing:
Database mirroring and availability groups, when running asynchronously, can build up a queue
of log records that have not yet been sent to the redundant database copy. These log records
must be kept around until they're sent, delaying transaction log clearing.
Transactional replication (and also Change Data Capture) relies on a Log Reader Agent job to
periodically scan the transaction log for transactions that modify a table contained in a
replication publication. If the Log Reader Agent falls behind for any reason, or is purposefully
made to run infrequently, all the log records that have not been scanned by the job must be
kept around, delaying transaction log clearing.
When running in synchronous mode, database mirroring and availability groups can cause other
problems with the logging mechanism. Using synchronous database mirroring as an example, any
transaction that commits on the principal cannot actually return to the user or application until all the log
records it generated have successfully been sent to the mirror server, adding a commit delay on the
principal. If the average transaction is long and the delay is short, this may not be noticeable, but if
the average transaction is very short and the delay is quite long, this can have a noticeable effect on
workload throughput. In that case, either the performance goals of the workload need to be changed,
the high-availability technology switched to asynchronous mode, or the network bandwidth and speed
between the principal and redundant databases must be increased.
Incidentally, the same kind of issue can occur if the I/O subsystem itself is synchronously mirrored with
a potential delay for all writes that SQL Server performs.
Summary
As you can see, transaction log performance is not just about extra transaction log records being
generated; there are many environmental factors that can have a profound effect too.
The bottom line is that transaction log health and performance are of paramount importance for
maintaining overall workload performance. In these two posts I've detailed the major causes of
transaction log performance problems, so hopefully you'll be able to identify and remediate any that you
have.
If you want to learn a whole lot more about transaction log operations and performance tuning, I
recommend that you check out my 7.5 hour online training course on logging, recovery, and the
transaction log, available through Pluralsight.
The problem with this query is that, if there are no orders on a certain day, there will be no row for that
day. This can lead to confusion, misleading data, or even incorrect calculations (think daily averages) for
the downstream consumers of the data.
So there is a need to fill those gaps with the dates that are not present in the data. And sometimes
people will stuff their data into a #temp table and use a WHILE loop or a cursor to fill in the missing
dates one-by-one. I won't show that code here because I don't want to advocate its use, but I've seen it
all over the place.
Before we get too deep into dates, though, let's first talk about numbers, since you can always use a
sequence of numbers to derive a sequence of dates.
Numbers table
I've long been an advocate of storing an auxiliary numbers table on disk (and, for that matter, a
calendar table as well).
Here is one way to generate a simple numbers table with 1,000,000 values:
SELECT TOP (1000000) n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
INTO dbo.Numbers
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);
CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers(n)
-- WITH (DATA_COMPRESSION = PAGE)
;
Why MAXDOP 1? See Paul White's blog post and his Connect item relating to row goals.
However, many people are opposed to the auxiliary table approach. Their argument: why store all that
data on disk (and in memory) when they can generate the data on the fly? My counter is to be realistic
and think about what you're optimizing; computation can be expensive, and are you sure that
calculating a range of numbers on the fly is always going to be cheaper? As for space, the Numbers
table only takes up about 11 MB compressed, and 17 MB uncompressed. And if the table is referenced
frequently enough, it should always be in memory, making access fast.
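If you want to verify that footprint on your own system, a quick check against the dbo.Numbers table created above is:
EXEC sys.sp_spaceused @objname = N'dbo.Numbers';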
Let's take a look at a few examples, and some of the more common approaches used to satisfy them. I
hope we can all agree that, even at 1,000 values, we don't want to solve these problems using a loop or
a cursor.
Generating a sequence of 1,000 numbers
Starting simple, let's generate a set of numbers from 1 through 1,000.
Numbers table
Of course with a numbers table this task is pretty simple:
SELECT TOP (1000) n FROM dbo.Numbers ORDER BY n;
spt_values
This is a table that is used by internal stored procedures for various purposes. Its use online seems to be
quite prevalent, even though it is undocumented and unsupported, it may disappear one day, and it
only contains a finite, non-unique, and non-contiguous set of values. There are 2,164 unique and 2,508
total values in SQL Server 2008 R2; in 2012 there are 2,167 unique and 2,515 total. This includes
duplicates and negative values, and even when using DISTINCT, there are plenty of gaps once you get beyond
the number 2,048. So the workaround is to use ROW_NUMBER() to generate a contiguous sequence, starting at 1,
based on the values in the table.
SELECT TOP (1000) n = ROW_NUMBER() OVER (ORDER BY number)
FROM [master]..spt_values ORDER BY n;
Plan:
That said, for only 1,000 values, you could write a slightly simpler query to generate the same sequence:
SELECT DISTINCT n = number FROM master..[spt_values] WHERE number BETWEEN 1 AND 1000;
This leads to a simpler plan, of course, but breaks down pretty quickly (once your sequence has to be
more than 2,048 rows):
In any case, I do not recommend the use of this table; I'm including it for comparison purposes only,
because I know how much of this is out there, and how tempting it might be to just re-use code you
come across.
sys.all_objects
Another approach that has been one of my favorites over the years is to use sys.all_objects.
Like spt_values, there is no reliable way to generate a contiguous sequence directly, and we have the
same issues dealing with a finite set (just under 2,000 rows in SQL Server 2008 R2, and just over 2,000
rows in SQL Server 2012), but for 1,000 rows we can use the same ROW_NUMBER() trick. The reason I
like this approach is that (a) there is less concern that this view will disappear anytime soon, (b) the view
itself is documented and supported, and (c) it will run on any database on any version since SQL Server
2005 without having to cross database boundaries (including contained databases).
SELECT TOP (1000) n = ROW_NUMBER() OVER (ORDER BY [object_id]) FROM sys.all_objects
ORDER BY n;
Plan:
Stacked CTEs
I believe Itzik Ben-Gan deserves the ultimate credit for this approach; basically you construct a CTE with
a small set of values, then you create the Cartesian product against itself in order to generate the
number of rows you need. And again, instead of trying to generate a contiguous set as part of the
underlying query, we can just apply ROW_NUMBER() to the final result.
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b), -- 10*10
e3(n) AS (SELECT 1 FROM e1 CROSS JOIN e2) -- 10*100
SELECT n = ROW_NUMBER() OVER (ORDER BY n) FROM e3 ORDER BY n;
Plan:
Recursive CTE
Finally, we have a recursive CTE, which uses 1 as the anchor, and adds 1 until we hit the maximum. For
safety I specify the maximum in both the WHERE clause of the recursive portion, and in
the MAXRECURSION setting. Depending on how many numbers you need, you may have to
set MAXRECURSION to 0.
;WITH n(n) AS
(
SELECT 1
UNION ALL
SELECT n+1 FROM n WHERE n < 1000
)
SELECT n FROM n ORDER BY n
OPTION (MAXRECURSION 1000);
Performance
Of course with 1,000 values the differences in performance are negligible, but it can be useful to see how
these different options perform:
I ran each query 20 times and took average runtimes. I also tested the dbo.Numbers table, in both
compressed and uncompressed formats, and with both a cold cache and a warm cache. With a warm
cache it very closely rivals the other fastest options out there (spt_values, not recommended, and
stacked CTEs), but the first hit is relatively expensive (though I almost laugh calling it that).
To Be Continued
If this is your typical use case, and you won't venture far beyond 1,000 rows, then I hope I have shown
the fastest ways to generate those numbers. If your use case is a larger number, or if you are looking for
solutions to generate sequences of dates, stay tuned. Later in this series, I will explore generating
sequences of 50,000 and 1,000,000 numbers, and of date ranges ranging from a week to a year.
Plan:
spt_values
Since there are only ~2,500 rows in spt_values, we need to be a little more creative if we want to use it
as the source of our set generator. One way to simulate a larger table is to CROSS JOIN it against itself. If
we did that raw, we'd end up with ~2,500 rows squared (over 6 million). Needing only 50,000 rows, we
need about 224 rows squared. So we can do this:
;WITH x AS
(
SELECT TOP (224) number FROM [master]..spt_values
)
SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY x.number)
FROM x CROSS JOIN x AS y
ORDER BY n;
Note that this is equivalent to, but more concise than, this variation:
SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY x.number)
FROM (SELECT TOP (224) number FROM [master]..spt_values) AS x
CROSS JOIN (SELECT TOP (224) number FROM [master]..spt_values) AS y
ORDER BY n;
sys.all_objects
Like spt_values, sys.all_objects does not quite satisfy our 50,000 row requirement on its own, so we will
need to perform a similar CROSS JOIN.
;WITH x AS
(
SELECT TOP (224) [object_id] FROM sys.all_objects
)
SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY x.[object_id])
FROM x CROSS JOIN x AS y
ORDER BY n;
Plan:
Stacked CTEs
We only need to make a minor adjustment to our stacked CTEs in order to get exactly 50,000 rows:
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
Plan:
Recursive CTEs
An even less substantial change is required to get 50,000 rows out of our recursive CTE: change
the WHERE clause to 50,000 and change the MAXRECURSION option to zero.
;WITH n(n) AS
(
SELECT 1
UNION ALL
SELECT n+1 FROM n WHERE n < 50000
)
SELECT n FROM n ORDER BY n
OPTION (MAXRECURSION 0);
Plan:
Performance
As with the last set of tests, we'll compare each technique, including the Numbers table with both a cold
and warm cache, and both compressed and uncompressed:
The TOP isn't strictly necessary, but that's only because we know that our Numbers table and our
desired output have the same number of rows. The plan is still quite similar to previous tests:
spt_values
To get a CROSS JOIN that yields 1,000,000 rows, we need to take 1,000 rows squared:
;WITH x AS
(
SELECT TOP (1000) number FROM [master]..spt_values
)
SELECT n = ROW_NUMBER() OVER (ORDER BY x.number)
FROM x CROSS JOIN x AS y ORDER BY n;
Plan:
sys.all_objects
Again, we need the cross product of 1,000 rows:
;WITH x AS
(
SELECT TOP (1000) [object_id] FROM sys.all_objects
)
SELECT n = ROW_NUMBER() OVER (ORDER BY x.[object_id])
FROM x CROSS JOIN x AS y ORDER BY n;
Plan:
Stacked CTEs
For the stacked CTE, we just need a slightly different combination of CROSS JOINs to get to 1,000,000
rows:
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b), -- 10*10
e3(n) AS (SELECT 1 FROM e1 CROSS JOIN e2 AS b), -- 10*100
e4(n) AS (SELECT 1 FROM e3 CROSS JOIN e3 AS b) -- 1000*1000
SELECT n = ROW_NUMBER() OVER (ORDER BY n) FROM e4 ORDER BY n;
Plan:
At this row size, you can see that the stacked CTE solution goes parallel. So I also ran a version
with MAXDOP 1 to get a similar plan shape as before, and to see if parallelism really helps:
Recursive CTE
The recursive CTE again has just a minor change; only the WHERE clause needs to change:
;WITH n(n) AS
(
SELECT 1
UNION ALL
SELECT n+1 FROM n WHERE n < 1000000
)
SELECT n FROM n ORDER BY n
OPTION (MAXRECURSION 0);
Plan:
Performance
Once again we see the performance of the recursive CTE is abysmal:
Now, if we start with just the simple series generator, it may look like this. I'm going to add an ORDER BY here
as well, just to be safe, since we can never rely on assumptions we make about order.
;WITH n(n) AS (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4)
SELECT n FROM n ORDER BY n;
-- result:
n
----
1
2
3
4
To convert that into a series of dates, we can simply apply DATEADD() from the start date:
DECLARE @s DATE = '2012-01-01';
;WITH n(n) AS (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4)
SELECT DATEADD(DAY, n, @s) FROM n ORDER BY n;
-- result:
----------
2012-01-02
2012-01-03
2012-01-04
2012-01-05
This still isn't quite right, since our range starts on the 2nd instead of the 1st. So in order to use our start
date as the base, we need to convert our set from 1-based to 0-based. We can do that by subtracting 1:
;WITH n(n) AS (SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4)
SELECT DATEADD(DAY, n-1, @s) FROM n ORDER BY n;
d.OrderDate,
OrderCount = COUNT(o.SalesOrderID)
FROM d
LEFT OUTER JOIN Sales.SalesOrderHeader AS o
ON o.OrderDate >= d.OrderDate
AND o.OrderDate < DATEADD(DAY, 1, d.OrderDate)
GROUP BY d.OrderDate
ORDER BY d.OrderDate;
(Note that we can no longer say COUNT(*), since that would count the row from the date list, which is always present, yielding at least 1 even for days with no orders.)
Another way to write this would be:
;WITH d(OrderDate) AS
(
SELECT TOP (DATEDIFF(DAY, @s, @e) + 1) DATEADD(DAY, n-1, @s)
FROM
(
SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4
) AS n(n) ORDER BY n
)
SELECT
d.OrderDate,
OrderCount = COUNT(o.SalesOrderID)
FROM d
LEFT OUTER JOIN Sales.SalesOrderHeader AS o
ON o.OrderDate >= d.OrderDate
AND o.OrderDate < DATEADD(DAY, 1, d.OrderDate)
GROUP BY d.OrderDate
ORDER BY d.OrderDate;
This should make it easier to envision how you would replace the leading CTE with the generation of a
date sequence from any source you choose. We'll go through those (with the exception of the recursive
CTE approach, which only served to skew the graphs), using AdventureWorks2012, but we'll use
the SalesOrderHeaderEnlarged table I created from this script by Jonathan Kehayias. I added an index to
help with this specific query:
CREATE INDEX d_so ON Sales.SalesOrderHeaderEnlarged(OrderDate);
Also note that I'm choosing an arbitrary date range that I know exists in the table.
Numbers table
DECLARE @s DATE = '2006-10-23', @e DATE = '2006-10-29';
;WITH d(OrderDate) AS
(
SELECT TOP (DATEDIFF(DAY, @s, @e) + 1) DATEADD(DAY, n-1, @s)
FROM dbo.Numbers ORDER BY n
)
SELECT
d.OrderDate,
OrderCount = COUNT(s.SalesOrderID)
FROM d
LEFT OUTER JOIN Sales.SalesOrderHeaderEnlarged AS s
ON s.OrderDate >= @s AND s.OrderDate <= @e
AND CONVERT(DATE, s.OrderDate) = d.OrderDate
WHERE d.OrderDate >= @s AND d.OrderDate <= @e
GROUP BY d.OrderDate
ORDER BY d.OrderDate;
spt_values
DECLARE @s DATE = '2006-10-23', @e DATE = '2006-10-29';
;WITH d(OrderDate) AS
(
SELECT DATEADD(DAY, n-1, @s)
FROM (SELECT TOP (DATEDIFF(DAY, @s, @e) + 1)
ROW_NUMBER() OVER (ORDER BY Number) FROM master..spt_values) AS x(n)
)
SELECT
d.OrderDate,
OrderCount = COUNT(s.SalesOrderID)
FROM d
LEFT OUTER JOIN Sales.SalesOrderHeaderEnlarged AS s
ON s.OrderDate >= @s AND s.OrderDate <= @e
AND CONVERT(DATE, s.OrderDate) = d.OrderDate
WHERE d.OrderDate >= @s AND d.OrderDate <= @e
GROUP BY d.OrderDate
ORDER BY d.OrderDate;
sys.all_objects
DECLARE @s DATE = '2006-10-23', @e DATE = '2006-10-29';
;WITH d(OrderDate) AS
(
SELECT DATEADD(DAY, n-1, @s)
FROM (SELECT TOP (DATEDIFF(DAY, @s, @e) + 1)
ROW_NUMBER() OVER (ORDER BY [object_id]) FROM sys.all_objects) AS x(n)
)
SELECT
d.OrderDate,
OrderCount = COUNT(s.SalesOrderID)
FROM d
LEFT OUTER JOIN Sales.SalesOrderHeaderEnlarged AS s
ON s.OrderDate >= @s AND s.OrderDate <= @e
AND CONVERT(DATE, s.OrderDate) = d.OrderDate
WHERE d.OrderDate >= @s AND d.OrderDate <= @e
GROUP BY d.OrderDate
ORDER BY d.OrderDate;
Stacked CTEs
DECLARE @s DATE = '2006-10-23', @e DATE = '2006-10-29';
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b),
d(OrderDate) AS
(
SELECT TOP (DATEDIFF(DAY, @s, @e) + 1)
d = DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY n)-1, @s)
FROM e2
)
SELECT
d.OrderDate,
OrderCount = COUNT(s.SalesOrderID)
FROM d LEFT OUTER JOIN Sales.SalesOrderHeaderEnlarged AS s
ON s.OrderDate >= @s AND s.OrderDate <= @e
AND d.OrderDate = CONVERT(DATE, s.OrderDate)
WHERE d.OrderDate >= @s AND d.OrderDate <= @e
GROUP BY d.OrderDate
ORDER BY d.OrderDate;
Now, for a year-long range, this won't cut it, since it only produces 100 rows. For a year we'd need to
cover 366 rows (to account for potential leap years), so it would look like this:
DECLARE @s DATE = '2006-10-23', @e DATE = '2007-10-22';
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b),
e3(n) AS (SELECT 1 FROM e2 CROSS JOIN (SELECT TOP (37) n FROM e2) AS b),
d(OrderDate) AS
(
SELECT TOP (DATEDIFF(DAY, @s, @e) + 1)
d = DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY N)-1, @s)
FROM e3
)
SELECT
d.OrderDate,
OrderCount = COUNT(s.SalesOrderID)
FROM d LEFT OUTER JOIN Sales.SalesOrderHeaderEnlarged AS s
ON s.OrderDate >= @s AND s.OrderDate <= @e
AND d.OrderDate = CONVERT(DATE, s.OrderDate)
WHERE d.OrderDate >= @s AND d.OrderDate <= @e
GROUP BY d.OrderDate
ORDER BY d.OrderDate;
Calendar table
This is a new one that we didn't talk much about in the previous two posts. If you are using date series
for a lot of queries, then you should consider having both a Numbers table and a Calendar table. The
same argument holds about how much space is really required and how fast access will be when the
table is queried frequently. For example, to store 30 years of dates, it requires less than 11,000 rows
(the exact number depends on how many leap years you span), and takes up a mere 200 KB. Yes, you read
that right: 200 kilobytes. (And compressed, it's only 136 KB.)
To generate a Calendar table with 30 years of data, assuming you've already been convinced that having
a Numbers table is a good thing, we can do this:
DECLARE @s DATE = '2005-07-01'; -- earliest year in SalesOrderHeader
DECLARE @e DATE = DATEADD(DAY, -1, DATEADD(YEAR, 30, @s));
SELECT TOP (DATEDIFF(DAY, @s, @e) + 1)
d = CONVERT(DATE, DATEADD(DAY, n-1, @s))
INTO dbo.Calendar
FROM dbo.Numbers ORDER BY n;
CREATE UNIQUE CLUSTERED INDEX d ON dbo.Calendar(d);
Now to use that Calendar table in our sales report query, we can write a much simpler query:
DECLARE @s DATE = '2006-10-23', @e DATE = '2006-10-29';
SELECT
OrderDate = c.d,
OrderCount = COUNT(s.SalesOrderID)
FROM dbo.Calendar AS c
LEFT OUTER JOIN Sales.SalesOrderHeaderEnlarged AS s
ON s.OrderDate >= @s AND s.OrderDate <= @e
AND CONVERT(DATE, s.OrderDate) = c.d
WHERE c.d >= @s AND c.d <= @e
GROUP BY c.d
ORDER BY c.d;
Performance
I created both compressed and uncompressed copies of the Numbers and Calendar tables, and tested a
one week range, a one month range, and a one year range. I also ran queries with cold cache and warm
cache, but that turned out to be largely inconsequential.
Paul White (blog | @SQL_Kiwi) pointed out that you can coerce the Numbers table to produce a much
more efficient plan using the following query:
SELECT
OrderDate = DATEADD(DAY, n, 0),
OrderCount = COUNT(s.SalesOrderID)
FROM dbo.Numbers AS n
LEFT OUTER JOIN Sales.SalesOrderHeader AS s
ON s.OrderDate >= CONVERT(DATETIME, @s)
AND s.OrderDate < DATEADD(DAY, 1, CONVERT(DATETIME, @e))
AND DATEDIFF(DAY, 0, OrderDate) = n
WHERE
n.n >= DATEDIFF(DAY, 0, @s)
AND n.n <= DATEDIFF(DAY, 0, @e)
GROUP BY n
ORDER BY n;
At this point I'm not going to re-run all of the performance tests (exercise for the reader!), but I will
assume that it would generate better or similar timings. Still, I think a Calendar table is a useful thing to
have even if it isn't strictly necessary.
Conclusion
The results speak for themselves. For generating a series of numbers, the Numbers table approach wins
out, but only marginally, even at 1,000,000 rows. And for a series of dates, at the lower end, you will
not see much difference between the various techniques. However, it is quite clear that as your date
range gets larger, particularly when you're dealing with a large source table, the Calendar table really
demonstrates its worth, especially given its low memory footprint. Even with Canada's wacky metric
system, 60 milliseconds is way better than about 10 *seconds* when it only incurred 200 KB on disk.
I hope you've enjoyed this little series; it's a topic I've been meaning to revisit for ages.
Usually this database does not need to be in full recovery; especially since, if you are in disaster recovery
mode and restoring your database, the last thing you should be worrying about is trying to maintain
sessions for users in your web app, who are likely to be long gone by the time you've restored. I don't
think I've ever come across a situation where point-in-time recovery was a necessity for a transient
database like ASPState.
Minimize / isolate I/O
When setting up ASPState initially, you can use the -sstype c and -d arguments to store session state in a
custom database that is already on a different drive (just like you would with tempdb). Or, if your
tempdb database is already optimized, you can use the -sstype t argument. These are explained in detail
in the Session-State Modes and ASP.NET SQL Server Registration Tool documents on MSDN.
If you've already installed ASPState, and you've determined that you would benefit from moving it to its
own (or at least a different) volume, then you can schedule or wait for a brief maintenance period and
follow these steps:
ALTER DATABASE ASPState SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE ASPState SET OFFLINE;
ALTER DATABASE ASPState MODIFY FILE (NAME = ASPState, FILENAME = '{new
path}\ASPState.mdf');
ALTER DATABASE ASPState MODIFY FILE (NAME = ASPState_log, FILENAME = '{new
path}\ASPState_log.ldf');
At this point you will need to manually move the files to {new path}, and then you can bring the
database back online:
ALTER DATABASE ASPState SET ONLINE;
Isolate applications
It is possible to point more than one application at the same session state database. I recommend
against this. You may want to point applications at different databases, perhaps even on different
instances, to better isolate resource usage and provide utmost flexibility for all of your web properties.
If you already have multiple applications using the same database, that's okay, but you'll want to keep
track of the impact each application might be having. Microsoft's Rex Tang published a useful query to
see space consumed by each session; here is a modification that will summarize the number of sessions and
total/average session size per application:
SELECT
a.AppName,
SessionCount = COUNT(s.SessionId),
TotalSessionSize = SUM(DATALENGTH(s.SessionItemLong)),
AvgSessionSize = AVG(DATALENGTH(s.SessionItemLong))
FROM
dbo.ASPStateTempSessions AS s
LEFT OUTER JOIN
dbo.ASPStateTempApplications AS a
ON SUBSTRING(s.SessionId, 25, 8) =
SUBSTRING(sys.fn_varbintohexstr(CONVERT(VARBINARY(8), a.AppId)), 3, 8)
GROUP BY a.AppName
ORDER BY TotalSessionSize DESC;
If you find that you have a lopsided distribution here, you can set up another ASPState database
elsewhere, and point one or more applications at that database instead.
Make more friendly deletes
The code for dbo.DeleteExpiredSessions uses a cursor, replacing a single DELETE in earlier
implementations. (This, I think, was based largely on this post by Greg Low.)
Originally the code was:
CREATE PROCEDURE DeleteExpiredSessions
AS
DECLARE @now DATETIME
SET @now = GETUTCDATE()
DELETE ASPState..ASPStateTempSessions
WHERE Expires < @now
RETURN 0
GO
(And it may still be, depending on where you downloaded the source, or how long ago you installed
ASPState. There are many outdated scripts out there for creating the database, though you really should
be using aspnet_regsql.exe.)
Currently (as of .NET 4.5), the code looks like this (anyone know when Microsoft will start using semicolons?).
My idea is to have a happy medium here: don't try to delete ALL rows in one fell swoop, but don't play
one-by-one whack-a-mole, either. Instead, delete n rows at a time in separate transactions, reducing
the length of blocking and also minimizing the impact to the log:
ALTER PROCEDURE dbo.DeleteExpiredSessions
@top INT = 1000
AS
BEGIN
SET NOCOUNT ON;
DECLARE @now DATETIME, @c INT;
SELECT @now = GETUTCDATE(), @c = 1;
BEGIN TRANSACTION;
WHILE @c <> 0
BEGIN
;WITH x AS
(
SELECT TOP (@top) SessionId
FROM dbo.ASPStateTempSessions
WHERE Expires < @now
ORDER BY SessionId
)
DELETE x;
SET @c = @@ROWCOUNT;
IF @@TRANCOUNT = 1
BEGIN
COMMIT TRANSACTION;
BEGIN TRANSACTION;
END
END
IF @@TRANCOUNT = 1
BEGIN
COMMIT TRANSACTION;
END
END
GO
You will want to experiment with TOP depending on how busy your server is and what impact it has on
duration and locking. You may also want to consider implementing snapshot isolation (as shown in the
sketch below); this will force some impact to tempdb, but may reduce or eliminate blocking seen from the app.
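As a sketch of that suggestion, either (or both) of the following enables row versioning for the ASPState database; which one you pick depends on whether you can change the application's isolation level or would rather alter the default read committed behavior:
ALTER DATABASE ASPState SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE ASPState SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;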
Also, by default, the job ASPState_Job_DeleteExpiredSessions runs every minute. Consider dialing that
back a bit; reduce the schedule to maybe every 5 minutes (and again, a lot of this will come down to
how busy your applications are and testing the impact of the change). And on the flip side, make sure it
is enabled, otherwise your sessions table will grow and grow, unchecked.
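A sketch of that schedule change using the msdb scheduling procedures (the schedule name below is an assumption; confirm it first with sp_help_jobschedule):
EXEC msdb.dbo.sp_help_jobschedule @job_name = N'ASPState_Job_DeleteExpiredSessions';

EXEC msdb.dbo.sp_update_schedule
    @name = N'ASPState_Job_DeleteExpiredSessions_Schedule', -- placeholder; use the name returned above
    @freq_subday_type = 4,     -- units = minutes
    @freq_subday_interval = 5; -- every 5 minutes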
Touch sessions less often
Every time a page is loaded (and, if the web app hasn't been created correctly, possibly multiple times
per page load), the stored procedure dbo.TempResetTimeout is called, ensuring that the timeout for
that particular session is extended as long as the user continues to generate activity. On a busy web site, this
can cause a very high volume of update activity against the table dbo.ASPStateTempSessions. Here is the
current code for dbo.TempResetTimeout:
ALTER PROCEDURE [dbo].[TempResetTimeout]
@id tSessionId
AS
UPDATE [ASPState].dbo.ASPStateTempSessions
SET Expires = DATEADD(n, Timeout, GETUTCDATE())
WHERE SessionId = @id
RETURN 0
Now, imagine you have a web site with 500 or 5,000 users, and they are all madly clicking from page to
page. This is probably one of the most frequently called operations in any ASPState implementation, and
while the table is keyed on SessionId, so the impact of any individual statement should be minimal, in
aggregate this can be substantially wasteful, including on the log. If your session timeout is 30 minutes
and you update the timeout for a session every 10 seconds because of the nature of the web app, what
is the point of doing it again 10 seconds later? As long as that session is asynchronously updated at some
point before the 30 minutes are up, there is no net difference to the user or the application. So I thought
that you could implement a more scalable way to touch sessions to update their timeout values.
One idea I had was to implement a Service Broker queue so that the application does not have to wait
on the actual write to happen; it calls the dbo.TempResetTimeout stored procedure, and then the
activation procedure takes over asynchronously. But this still leads to a lot more updates (and log
activity) than is truly necessary.
A better idea, IMHO, is to implement a queue table that you only insert to, and on a schedule (such that
the process completes a full cycle in some time shorter than the timeout), it would only update the
timeout for any session it sees once, no matter how many times they *tried* to update their timeout
within that span. So a simple table might look like this:
CREATE TABLE dbo.SessionStack
(
SessionId tSessionId, -- nvarchar(88) - of course they had to use alias types
EventTime DATETIME,
Processed BIT NOT NULL DEFAULT 0
);
CREATE CLUSTERED INDEX et ON dbo.SessionStack(EventTime);
GO
And then we would change the stock procedure to push session activity onto this stack instead of
touching the sessions table directly:
ALTER PROCEDURE dbo.TempResetTimeout
@id tSessionId
AS
BEGIN
SET NOCOUNT ON;
INSERT INTO dbo.SessionStack(SessionId, EventTime)
SELECT @id, CURRENT_TIMESTAMP;
END
GO
The clustered index is on the EventTime column to prevent page splits (at the potential cost of a hot
page), since the event time for a session touch will always be monotonically increasing.
Then we'll need a background process to periodically summarize new rows in dbo.SessionStack and
update dbo.ASPStateTempSessions accordingly.
CREATE PROCEDURE dbo.SessionStack_Process
AS
BEGIN
SET NOCOUNT ON;
-- unless you want to add tSessionId to model or manually to tempdb
-- after every restart, we'll have to use the base type here:
CREATE TABLE #s(SessionId NVARCHAR(88), EventTime SMALLDATETIME);
-- the stack is now your hotspot, so get in & out quickly:
UPDATE dbo.SessionStack SET Processed = 1
OUTPUT inserted.SessionId, inserted.EventTime INTO #s
WHERE Processed IN (0,1) -- in case any failed last time
you should be able to put these in individual jobs and run them concurrently, since in theory the
DML should be affecting completely different sets of pages.
Conclusion
Those are my ideas so far. I'd love to hear about your experiences with ASPState: What kind of scale
have you achieved? What kind of bottlenecks have you observed? What have you done to mitigate
them?
capacity, and your total required I/O capacity (which is related to the number and type of PCI-E
expansion slots in the server).
One common misconception is that bigger Intel-based servers (in terms of socket counts) are faster
servers. This is simply not true, for a number of reasons. The sales volume and market share of two-socket
servers is much higher than it is for four-socket and larger servers. There is also less engineering
and validation work required for two-socket capable Intel processors compared to four-socket capable
Intel processors. Because of these factors, Intel releases new processor architectures more frequently
and earlier for lower socket count servers. Currently, Intel's single-socket E3 family is using the 22nm Ivy
Bridge and the two-socket E5 family is using the 32nm Sandy Bridge-EP, while the Intel E7 family is using the
older 32nm Westmere-EX microarchitecture.
Another reason is that you do not get linear scaling as you increase your socket count, even with
non-uniform memory access (NUMA) architecture processors, which scale much better than the older
symmetrical multiprocessing (SMP) architecture. This means that a four-socket server will not have
twice the processor performance or capacity of a two-socket server with the same model processor.
This can be confirmed by comparing the TPC-E OLTP benchmark results of two-socket systems with Intel
Xeon E7-2870 processors to four-socket systems with Intel Xeon E7-4870 processors to eight-socket
systems with Intel Xeon E7-8870 processors. Even though these are essentially the same processor with
the same individual performance characteristics, the TPC-E benchmark score does not double as you
double the socket count, as you can see in Table 1.
Processor      Socket Count   TPC-E Score   Total Cores   TPC-E Score/Core
Xeon E7-2870   2              1560.70       20            78.04
Xeon E7-4870   4              2862.61       40            71.57
Xeon E7-8870   8              4614.22       80            57.68
When I think about comparing single-socket to two-socket, to four and eight-socket processors, I like to
use a car and truck analogy. A single-socket server is like a Formula-1 race car, being extremely fast but
having very little cargo capacity. A two-socket server is like a Tesla Model S, being very fast and having
pretty decent cargo capacity. A four-socket server is like a large SUV, being slower but having more
cargo capacity than a Tesla Model S. Finally, an eight-socket server is like a Mack truck, able to haul a
huge load at a much slower rate than an SUV.
Processor      Socket Count   TPC-E Score   Total Cores   TPC-E Score/Core
Xeon E5-2690   2              1881.76       16            117.61
Xeon E5-4650   4              2651.27       32            82.85
Comparing Table 1 to Table 2, we can see that the Intel Xeon E5 family does quite a bit better on TPC-E
than the Intel Xeon E7 family does, which is no surprise, since we are comparing the newer Sandy
Bridge-EP to the older Westmere-EX microarchitecture. From a performance perspective, the two-socket
Xeon E5-2690 does much better than the two-socket Xeon E7-2870. In my opinion, you really
should not be using the two-socket Xeon E7-2870 for SQL Server 2012 because of its lower single-threaded
performance and higher physical core count (which means a higher SQL Server 2012 licensing
cost).
Currently, my favorite Intel server processor is the Intel Xeon E5-2690. It will give you excellent
single-threaded performance and relatively affordable SQL Server 2012 licensing costs. If you need to step up
to a four-socket server, then I would choose an Intel Xeon E5-4650 processor instead of an Intel
Xeon E7-4870 processor, since you will get better single-threaded performance and lower SQL Server
2012 license costs. Using TPC-E benchmark scores is an excellent way to compare the performance and
SQL Server 2012 license efficiency of different processor families.
Once the constraints exist, we can compare the resource usage for DBCC CHECKCONSTRAINTS for a
single constraint, a table, and the entire database using Extended Events. First we'll create a session
that simply captures sp_statement_completed events, includes the sql_text action, and sends the output
to the ring_buffer:
CREATE EVENT SESSION [Constraint_Performance] ON SERVER
ADD EVENT sqlserver.sp_statement_completed
(
ACTION(sqlserver.database_id,sqlserver.sql_text)
)
ADD TARGET package0.ring_buffer
(
SET max_events_limit=(5000)
)
WITH
(
MAX_MEMORY=32768 KB, EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY=30 SECONDS, MAX_EVENT_SIZE=0 KB,
MEMORY_PARTITION_MODE=NONE, TRACK_CAUSALITY=OFF, STARTUP_STATE=OFF
);
GO
Next we'll start the session and run each of the DBCC CHECKCONSTRAINTS commands, then output the
ring buffer to a temp table to manipulate. Note that DBCC DROPCLEANBUFFERS executes before each
check so that each one starts from cold cache, keeping a level playing field.
ALTER EVENT SESSION [Constraint_Performance]
ON SERVER
STATE=START;
GO
USE [AdventureWorks2012];
GO
DBCC DROPCLEANBUFFERS;
GO
DBCC CHECKCONSTRAINTS ('[Sales].[CK_SalesOrderDetailEnlarged_OrderQty]') WITH
NO_INFOMSGS;
GO
DBCC DROPCLEANBUFFERS;
GO
DBCC CHECKCONSTRAINTS ('[Sales].[FK_SalesOrderDetailEnlarged_SalesOrderHeaderEnlarged_SalesOrderID]') WITH NO_INFOMSGS;
GO
DBCC DROPCLEANBUFFERS;
GO
DBCC CHECKCONSTRAINTS ('[Sales].[SalesOrderDetailEnlarged]') WITH NO_INFOMSGS;
GO
DBCC DROPCLEANBUFFERS;
GO
DBCC CHECKCONSTRAINTS WITH ALL_CONSTRAINTS, NO_INFOMSGS;
GO
DECLARE @target_data XML;
SELECT @target_data = CAST(target_data AS XML)
FROM sys.dm_xe_sessions AS s
INNER JOIN sys.dm_xe_session_targets AS t
ON t.event_session_address = s.[address]
WHERE s.name = N'Constraint_Performance'
AND t.target_name = N'ring_buffer';
SELECT
n.value('(@name)[1]', 'varchar(50)') AS event_name,
DATEADD(HOUR, DATEDIFF(HOUR, SYSUTCDATETIME(),
SYSDATETIME()), n.value('(@timestamp)[1]', 'datetime2')) AS [timestamp],
n.value('(data[@name="physical_reads"]/value)[1]', 'bigint') AS [physical_reads],
n.value('(action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS [sql_text]
INTO #EventData
FROM @target_data.nodes('RingBufferTarget/event') AS q(n);
Using Extended Events to dig into the inner workings of CHECKCONSTRAINTS is an interesting task, but
what we're really interested in here is resource consumption, specifically I/O. We can aggregate
the physical_reads for each check command to compare the I/O:
SELECT [sql_text], SUM([physical_reads]) AS [Total Reads]
FROM #EventData
WHERE [sql_text] LIKE 'DBCC%'
GROUP BY [sql_text];
1MB to 64MB: 4 new VLFs
64MB to 1GB: 8 new VLFs
Larger than 1GB: 16 new VLFs
For example, if you create a transaction log to be 8GB youll get 16 VLFs where each is roughly 512MB. If
you then grow the log by another 4GB, youll get an additional 16 VLFs with each being roughly 256MB,
for a total of 32 VLFs.
A general best practice is to set the log auto-growth to something other than the default 10%, so that
you can control the pause that's required when zero-initializing new transaction log space. Let's say you
create a 256MB transaction log and set the auto-growth to 32MB, and then the log grows to a steady-state
size of 16GB. Given the formula above, this will result in your transaction log having more than
4,000 VLFs.
This many VLFs will likely result in some performance issues for operations that process the transaction
log (e.g. crash recovery, log clearing, log backups, transactional replication, database restores). This
situation is called having VLF fragmentation. Generally any number of VLFs more than a thousand or so
is going to be problematic and needs to be addressed (the most I've ever heard of is 1.54 million VLFs in
a transaction log that was more than 1TB in size!).
The way to tell how many VLFs you have is to use the undocumented (and completely safe) DBCC
LOGINFO command. The number of rows of output is the number of VLFs in your transaction log. If you
think you have too many, the way to reduce them is:
1. Allow the log to clear
2. Manually shrink the log
3. Repeat steps 1 and 2 until the log reaches a small size (which may be tricky on a busy production
system)
4. Manually grow the log to the size it should be, in up to 8GB steps so each VLF is no larger than
about 0.5GB
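Putting those steps together, a minimal sketch for a database called YourDatabase (logical file name, backup path, and sizes are all assumptions) might look like this:
-- Step 1: allow the log to clear (e.g. a log backup in the Full recovery model)
BACKUP LOG YourDatabase TO DISK = N'D:\Backups\YourDatabase_log.trn';
-- Step 2: shrink the log file to a small size
DBCC SHRINKFILE (YourDatabase_log, 128);
-- Check progress: one row per VLF
DBCC LOGINFO;
-- Step 4: grow the log back to its required size in steps of 8GB or less,
-- so each new VLF is no larger than about 0.5GB
ALTER DATABASE YourDatabase MODIFY FILE (NAME = YourDatabase_log, SIZE = 8GB);
ALTER DATABASE YourDatabase MODIFY FILE (NAME = YourDatabase_log, SIZE = 16GB);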
You can read more about VLF fragmentation issues and the process to fix them at:
Tempdb
Tempdb needs to have its transaction log configured just like any other database, and it may grow just
like any other database. But it also has some insidious behavior that can cause you problems.
When a SQL Server instance restarts for any reason, tempdb's data and log files will revert to the size
they were most recently set to. This is different from all other databases, which remain at their current
size after an instance restart.
This behavior means that if the tempdb transaction log has grown to accommodate the normal
workload, you must perform an ALTER DATABASE to set the log file size; otherwise its size will drop after
an instance restart and it will have to grow again. Every time a log file grows or auto-grows, the new
space must be zero-initialized, and logging activity pauses while that is done. So if you do not manage
your tempdb log file size correctly, you'll pay a performance penalty as it grows after each instance
restart.
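A sketch of that fix (templog is the default logical name of the tempdb log file; the 4GB figure is just an example, size it to your workload):
ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, SIZE = 4GB);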
Regular Log File Shrinking
Quite often I hear people saying that they usually shrink a database's transaction log after it grows from
a regular operation (e.g. a weekly data import). This is not a good thing to do.
Just as I explained above, whenever the transaction log grows or auto-grows, there's a pause while the
new portion of the log file is zero-initialized. If you're regularly shrinking the transaction log because it
grows to size X, that means you're regularly suffering performance problems as the transaction log
auto-grows back to size X again.
If your transaction log keeps growing to size X, leave it alone! Proactively set it to size X, managing your
VLFs as I explained above, and accept size X as the size that's required for your normal workload. A
larger transaction log is not a problem.
Multiple Log Files
There is no performance gain from creating multiple log files for a database. Adding a second log file
may be necessary, however, if the existing log file runs out of space and you're unwilling to force the
transaction log to clear by switching to the simple recovery model and performing a checkpoint (as this
breaks the log backup chain).
I'm often asked whether there is any pressing reason to remove the second log file or whether it's okay to
leave it in place. The answer is that you should remove it as soon as you can.
Although the second log file doesn't cause performance problems for your workload, it does affect
disaster recovery. If your database is destroyed for some reason, you'll need to restore it from scratch.
The first phase of any restore sequence is to create the data and log files if they don't exist.
You can make the data file creation almost instantaneous by enabling instant file initialization, which
skips the zero-initialization, but that doesn't apply to log files. This means that the restore has to create
all log files that existed when the full backup was taken (or are created during the period of time
covered by a transaction log backup) and zero-initialize them. If you created a second log file and forgot to
drop it again, zero-initializing it during a disaster recovery operation is going to add to the total
downtime. This isn't a workload performance problem, but it affects the availability of the server as a
whole.
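If you do find yourself having to add a second log file in an emergency, the add and the later cleanup might look something like this sketch (database name, file name, path, and size are all assumptions):
-- Add an emergency second log file on a volume that still has space
ALTER DATABASE YourDatabase
ADD LOG FILE (NAME = N'YourDatabase_log2',
              FILENAME = N'E:\SQLLogs\YourDatabase_log2.ldf',
              SIZE = 4GB);

-- Later, once the log has cleared and the second file is no longer needed, remove it
ALTER DATABASE YourDatabase REMOVE FILE YourDatabase_log2;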
Reverting from a Database Snapshot
The final issue in my list is actually a bug in SQL Server. If you use a database snapshot as a way to
quickly recover back to a known point in time without having to restore backups (known as reverting
from the snapshot) then you can save a lot of time. However, there is a big downside.
When the database reverts from the database snapshot, the transaction log is recreated with two
0.25MB VLFs. This means you will have to grow your transaction log back to its optimal size and number
of VLFs (or it will auto-grow itself), with all the zero-initialization and workload pauses I've discussed
previously. Clearly not the desired behavior.
Summary
As you can see from this post and my previous two posts, there are many things that can lead to poor
transaction log performance, which then has a knock-on effect on the performance of your overall
workload.
If you can take care of all these things, you'll have healthy transaction logs. But it doesn't end there, as
you need to make sure you're monitoring your transaction logs so you're alerted for things like
auto-growth and excessive read and write I/O latencies. I'll cover how to do that in a future post.
As is usual for SQL Server's demand-driven pipeline, execution starts at the leftmost operator,
the UPDATE in this case. It requests a row from the Table Update, which asks for a row from the
Compute Scalar, and down the chain to the Table Scan:
The Table Scan operator reads rows one at a time from the storage engine, until it finds one that
satisfies the Salary predicate. The output list in the graphic above shows the Table Scan operator
returning a row identifier and the current value of the Salary column for this row. A single row
containing references to these two pieces of information is passed up to the Compute Scalar:
The Compute Scalar defines an expression that applies the salary raise to the current row. It returns a
row containing references to the row identifier and the modified salary to the Table Update, which
invokes the storage engine to perform the data modification. This iterative process continues until the
Table Scan runs out of rows. The same basic process is followed if the table has a clustered index:
The main difference is that the clustered index key(s) and uniquifier (if present) are used as the row
identifier instead of a heap RID.
The Problem
Changing from the logical three-phase operation defined in the SQL standard to the physical iterative
execution model has introduced a number of subtle changes, only one of which we are going to look at
today. A problem can occur in our running example if there is a nonclustered index on the Salary
column, which the query optimizer decides to use to find rows that qualify (Salary < $25,000):
CREATE NONCLUSTERED INDEX nc1
ON dbo.Employees (Salary);
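The update in question, sketched here under the assumption of a dbo.Employees table with a Salary column, gives a 10% raise to every employee currently earning less than $25,000 (which matches the salary figures traced below):
-- Sketch of the running example's update (table definition assumed)
UPDATE dbo.Employees
SET Salary = Salary * 1.10
WHERE Salary < 25000;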
The row-by-row execution model can now produce incorrect results, or even get into an infinite loop.
Consider an (imaginary) iterative execution plan that seeks the Salary index, returning a row at a time to
the Compute Scalar, and ultimately on to the Update operator:
There are a couple of extra Compute Scalars in this plan due to an optimization that skips nonclustered
index maintenance if the Salary value has not changed (only possible for a zero salary in this case).
Ignoring that, the important feature of this plan is that we now have an ordered partial index scan
passing a row at a time to an operator that modifies the same index (the green highlight in the Plan
Explorer graphic above makes it clear the Clustered Index Update operator maintains both the base
table and the nonclustered index).
Anyway, the problem is that by processing one row at a time, the Update can move the current row
ahead of the scan position used by the Index Seek to locate rows to change. Working through the
example should make that statement a bit clearer:
The nonclustered index is keyed, and sorted ascending, on the salary value. The index also contains a
pointer to the parent row in the base table (either a heap RID or the clustered index keys plus uniquifier
if necessary). To make the example easier to follow, assume the base table now has a unique clustered
index on the Name column, so the nonclustered index contents at the start of update processing are:
The first row returned by the Index Seek is the $21,000 salary for Smith. This value is updated to
$23,100 in the base table and the nonclustered index by the Clustered Index Update operator. The
nonclustered index now contains:
The next row returned by the Index Seek will be the $22,000 entry for Brown which is updated to
$24,200:
Now the Index Seek finds the $23,100 value for Smith, which is updated again, to $25,410. This process
continues until all employees have a salary of at least $25,000, which is not a correct result for the
given UPDATE query. The same effect in other circumstances can lead to a runaway update which only
terminates when the server runs out of log space or an overflow error occurs (it could occur in this case
if someone had a zero salary). This is the Halloween Problem as it applies to updates.
Avoiding the Halloween Problem for Updates
Eagle-eyed readers will have noticed that the estimated cost percentages in the imaginary Index Seek
plan did not add up to 100%. This is not a problem with Plan Explorer; I deliberately removed a key
operator from the plan:
The query optimizer recognizes that this pipelined update plan is vulnerable to the Halloween Problem,
and introduces an Eager Table Spool to prevent it from occurring. There is no hint or trace flag to
prevent inclusion of the spool in this execution plan because it is required for correctness.
As its name suggests, the spool eagerly consumes all rows from its child operator (the Index Seek)
before returning a row to its parent Compute Scalar. The effect of this is to introduce complete phase
separation: all qualifying rows are read and saved into temporary storage before any updates are
performed.
This brings us closer to the three-phase logical semantic of the SQL standard, though please note plan
execution is still fundamentally iterative, with operators to the right of the spool forming the read
cursor, and operators to the left forming the write cursor. The contents of the spool are still read and
processed row by row (it is not passed en masse as the comparison with the SQL standard might
otherwise lead you to believe).
The drawbacks of the phase separation are the same as mentioned earlier. The Table Spool
consumes tempdb space (pages in the buffer pool) and may require physical reads and writes to disk
under memory pressure. The query optimizer assigns an estimated cost to the spool (subject to all the
usual caveats about estimations) and will choose between plans that require protection against the
Halloween Problem versus those that don't on the basis of estimated cost, as normal. Naturally, the
optimizer may incorrectly choose between the options for any of the normal reasons.
In this case, the trade-off is between the efficiency gained by seeking directly to qualifying records
(those with a salary < $25,000) versus the estimated cost of the spool required to avoid the Halloween
Problem. An alternative plan (in this specific case) is a full scan of the clustered index (or heap). This
strategy does not require the same Halloween Protection because the keys of the clustered index are
not modified:
Because the index keys are stable, rows cannot move position in the index between iterations, avoiding
the Halloween Problem in the present case. Depending on the runtime cost of the Clustered Index Scan
compared with the Index Seek plus Eager Table Spool combination seen previously, one plan may
execute faster than the other. Another consideration is that the plan with Halloween Protection will
acquire more locks than the fully pipelined plan, and the locks will be held for longer.
Final Thoughts
Understanding the Halloween Problem and the effects it can have on data modification query plans will
help you analyse data-changing execution plans, and can offer opportunities to avoid the costs and
side-effects of unnecessary protection where an alternative is available.
There are several forms of the Halloween Problem, not all of which are caused by reading and writing to
the keys of a common index. The Halloween Problem is also not limited to UPDATE queries. The query
optimizer has more tricks up its sleeve to avoid the Halloween Problem aside from brute-force phase
separation using an Eager Table Spool. These points (and more) will be explored in the next instalments
of this series.
CREATE TABLE dbo.Staging
(
SomeKey integer NOT NULL
);
-- Sample data
INSERT dbo.Staging
(SomeKey)
VALUES
(1234),
(1234);
-- Test query
INSERT dbo.Demo
SELECT s.SomeKey
FROM dbo.Staging AS s
WHERE NOT EXISTS
(
SELECT 1
FROM dbo.Demo AS d
WHERE d.SomeKey = s.SomeKey
);
The execution plan is:
The problem in this case is subtly different, though still an example of the same core issue. There is no
value 1234 in the target Demo table, but the Staging table contains two such entries. Without phase
separation, the first 1234 value encountered would be inserted successfully, but the second check
would find that the value 1234 now exists and would not attempt to insert it again. The statement as a
whole would complete successfully.
This might produce a desirable outcome in this particular case (and might even seem intuitively correct)
but it is not a correct implementation. The SQL standard requires that data modification queries execute
as if the three phases of reading, writing and checking constraints occur completely separately (see part
one).
Searching for all rows to insert as a single operation, we should select both 1234 rows from the Staging
table, since this value does not exist in the target yet. The execution plan should therefore try to
insert both 1234 rows from the Staging table, resulting in a primary key violation:
Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PK_Demo'.
Cannot insert duplicate key in object 'dbo.Demo'.
The duplicate key value is (1234).
The statement has been terminated.
The phase separation provided by the Table Spool ensures that all checks for existence are completed
before any changes are made to the target table. If you run the query in SQL Server with the sample
data above, you will receive the (correct) error message.
Halloween Protection is required for INSERT statements where the target table is also referenced in the
SELECT clause.
Delete Statements
We might expect the Halloween Problem not to apply to DELETE statements, since it shouldn't really
matter if we try to delete a row multiple times. We can modify our staging table example to remove rows
from the Demo table that do not exist in Staging:
TRUNCATE TABLE dbo.Demo;
TRUNCATE TABLE dbo.Staging;
INSERT dbo.Demo (SomeKey) VALUES (1234);
DELETE dbo.Demo
WHERE NOT EXISTS
(
SELECT 1
FROM dbo.Staging AS s
WHERE s.SomeKey = dbo.Demo.SomeKey
);
This test seems to validate our intuition because there is no Table Spool in the execution plan:
This type of DELETE does not require phase separation because each row has a unique identifier (an RID
if the table is a heap, clustered index key(s) and possibly a uniquifier otherwise). This unique row locator
is a stable key there is no mechanism by which it can change during execution of this plan, so the
Halloween Problem does not arise.
DELETE Halloween Protection
Nevertheless, there is at least one case where a DELETE requires Halloween protection: when the plan
references a row in the table other than the one which is being deleted. This requires a self-join,
commonly found when hierarchical relationships are modelled. A simplified example is shown below:
CREATE TABLE dbo.Test
(
pk char(1) NOT NULL,
ref char(1) NULL,
CONSTRAINT PK_Test
PRIMARY KEY (pk)
);
INSERT dbo.Test
(pk, ref)
VALUES
('B', 'A'),
('C', 'B'),
('D', 'C');
There really ought to be a same-table foreign key reference defined here, but let's ignore that design
failing for a moment; the structure and data are nonetheless valid (and it is sadly quite common to find
foreign keys omitted in the real world). Anyway, the task at hand is to delete any row where
the ref column points to a non-existent pk value. The natural DELETE query matching this requirement
is:
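Using the same NOT EXISTS pattern as the staging example above, one way to write that delete (a sketch, not necessarily the exact query behind the plan shown next) is:
DELETE dbo.Test
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Test AS t2
    WHERE t2.pk = dbo.Test.ref
);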
The query plan is:
Notice this plan now features a costly Eager Table Spool. Phase separation is required here because
otherwise results could depend on the order in which rows are processed:
If the execution engine starts with the row where pk = B, it would find no matching row (ref = A and
there is no row where pk = A). If execution then moves on to the row where pk = C, it would also be
deleted because we just removed row B, which is pointed to by its ref column. The end result would be that
iterative processing in this order would delete all the rows from the table, which is clearly incorrect.
On the other hand, if the execution engine processed the row with pk = D first, it would find a matching
row (ref = C). Assuming execution continued in reverse pk order, the only row deleted from the table
would be the one where pk = B. This is the correct result (remember the query should execute as if the
read, write, and validation phases had occurred sequentially and without overlaps).
Phase separation for constraint validation
As an aside, we can see another example of phase separation if we add a same-table foreign key
constraint to the previous example:
DROP TABLE dbo.Test;
CREATE TABLE dbo.Test
(
pk char(1) NOT NULL,
ref char(1) NULL,
CONSTRAINT PK_Test
PRIMARY KEY (pk),
CONSTRAINT FK_ref_pk
FOREIGN KEY (ref)
REFERENCES dbo.Test (pk)
);
INSERT dbo.Test
(pk, ref)
VALUES
('B', NULL),
('C', 'B'),
('D', 'C');
The execution plan for the INSERT is:
The insert itself does not require Halloween protection since the plan does not read from the same table
(the data source is an in-memory virtual table represented by the Constant Scan operator). The SQL
standard does however require that phase 3 (constraint checking) occurs after the writing phase is
complete. For this reason, a phase separation Eager Table Spool is added to the plan after the Clustered
Index Insert, and just before each row is checked to make sure the foreign key constraint remains valid.
If you are starting to think that translating a set-based declarative SQL modification query to a robust
iterative physical execution plan is a tricky business, you are beginning to see why update processing (of
which Halloween Protection is but a very small part) is the most complex part of the Query Processor.
DELETE statements require Halloween Protection where a self-join of the target table is present.
Summary
Halloween Protection can be an expensive (but necessary) feature in execution plans that change data
(where change includes all SQL syntax that adds, changes or removes rows). Halloween Protection is
required for UPDATE plans where a common index structure's keys are both read and modified,
for INSERT plans where the target table is referenced on the reading side of the plan, and
for DELETE plans where a self-join on the target table is performed.
The next part in this series will cover some special Halloween Problem optimizations that apply only
to MERGE statements.
consequently fail with a PRIMARY KEY violation. Without phase separation, the INSERT would incorrectly
add one value, completing without an error being thrown.
The INSERT execution plan
The code above has one difference from that used in part two; a nonclustered index on the Staging table
has been added. The INSERT execution plan still requires Halloween Protection though:
Notice the lack of an Eager Table Spool in this plan. Despite that, the query still produces the correct
error message. It seems SQL Server has found a way to execute the MERGE plan iteratively while
respecting the logical phase separation required by the SQL standard.
The only difference there is the multiplication by one in the VALUES clause, something which does not
change the logic of the query, but which is enough to prevent the hole-filling optimization being applied.
Hole-filling with Nested Loops
In the previous example, the optimizer chose to join the tables using a Merge join. The hole-filling
optimization can also be applied where a Nested Loops join is chosen, but this requires an extra
uniqueness guarantee on the source table, and an index seek on the inner side of the join. To see this in
action, we can clear out the existing staging data, add uniqueness to the nonclustered index, and try
the MERGE again:
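The script itself is not reproduced here. A self-contained sketch of the pattern, assuming a staging table
named dbo.Staging with a single pk column feeding the dbo.Test target, might look like this:
-- Sketch only: table, index, and column names are assumptions, not the article's originals
CREATE TABLE dbo.Staging (pk char(1) NOT NULL);

-- The unique index supplies the extra uniqueness guarantee on the source
CREATE UNIQUE NONCLUSTERED INDEX ix_Staging_pk ON dbo.Staging (pk);

INSERT dbo.Staging (pk) VALUES ('A'), ('D');

-- Hole-filling MERGE: insert only source rows that are missing from the target
MERGE dbo.Test AS t
USING dbo.Staging AS s
    ON t.pk = s.pk
WHEN NOT MATCHED THEN
    INSERT (pk) VALUES (s.pk);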
The resulting execution plan again uses the hole-filling optimization to avoid Halloween Protection,
using a nested loops join and an inner-side seek into the target table:
reuse is an important consideration. It is also helpful to ensure that pages have sufficient free space to
accommodate new rows, avoiding page splits. This is typically achieved through normal index
maintenance and the assignment of a suitable FILLFACTOR.
I mention OLTP workloads, which typically feature a large number of relatively small changes, because
the MERGE optimizations may not be a good choice where a large number of rows are processed per
statement. Other optimizations like minimally-logged INSERTs cannot currently be combined with
hole-filling. As always, the performance characteristics should be benchmarked to ensure the expected
benefits are realized.
The hole-filling optimization for MERGE inserts may be combined with updates and deletes using
additional MERGE clauses; each data-changing operation is assessed separately for the Halloween
Problem.
Avoiding the join
The final optimization we will look at can be applied where the MERGE statement contains update and
delete operations as well as a hole-filling insert, and the target table has a unique clustered index. The
following example shows a common MERGE pattern where unmatched rows are inserted, and matching
rows are updated or deleted depending on an additional condition:
CREATE TABLE #T
(
col1 integer NOT NULL,
col2 integer NOT NULL,
CONSTRAINT PK_T
PRIMARY KEY (col1)
);
CREATE TABLE #S
(
col1 integer NOT NULL,
col2 integer NOT NULL,
CONSTRAINT PK_S
PRIMARY KEY (col1)
);
INSERT #T
(col1, col2)
VALUES
(1, 50),
(3, 90);
INSERT #S
(col1, col2)
VALUES
(1, 40),
(2, 80),
(3, 90);
The MERGE statement required to make all the required changes is remarkably compact:
MERGE #T AS t
USING #S AS s ON t.col1 = s.col1
WHEN NOT MATCHED THEN INSERT VALUES (s.col1, s.col2)
WHEN MATCHED AND t.col2 - s.col2 = 0 THEN DELETE
WHEN MATCHED THEN UPDATE SET t.col2 -= s.col2;
No Halloween Protection, no join between the source and target tables, and it's not often you will see a
Clustered Index Insert operator followed by a Clustered Index Merge to the same table. This is another
optimization targeted at OLTP workloads with high plan reuse and suitable indexing.
The idea is to read a row from the source table and immediately try to insert it into the target. If a key
violation results, the error is suppressed, the Insert operator outputs the conflicting row it found, and
that row is then processed for an update or delete operation using the Merge plan operator as normal.
If the original insert succeeds (without a key violation) processing continues with the next row from the
source (the Merge operator only processes updates and deletes). This optimization primarily
benefits MERGE queries where most source rows result in an insert. Again, careful benchmarking is
required to ensure performance is better than using separate statements.
Summary
The MERGE statement provides several unique optimization opportunities. In the right circumstances, it
can avoid the need to add explicit Halloween Protection compared with an equivalent INSERT operation,
or perhaps even a combination of INSERT, UPDATE, and DELETE statements. Additional MERGE-specific
optimizations can avoid the index b-tree traversal that is usually needed to locate the insert position for
a new row, and may also avoid the need to join the source and target tables completely.
In the final part of this series, we will look at how the query optimizer reasons about the need for
Halloween protection, and identify some more tricks it can employ to avoid the need to add Eager Table
Spools to execution plans that change data.
The spool reads all rows from its input and stores them in a hidden tempdb work table. The pages of this
work table may remain in memory, or they might require physical disk space if the set of rows is large,
or if the server is under memory pressure.
Full phase separation can be less than ideal because we generally want to run as much of the plan as
possible as a pipeline, where each row is fully processed before moving on to the next. Pipelining has
many advantages including avoiding the need for temporary storage, and only touching each row once.
The SQL Server Optimizer
SQL Server goes much further than the two techniques described so far, though it does of course include
both as options. The SQL Server query optimizer detects queries that require Halloween Protection,
determines how much protection is required, and uses cost-based analysis to find the cheapest method
of providing that protection.
The easiest way to understand this aspect of the Halloween Problem is to look at some examples. In the
following sections, the task is to add a range of numbers to an existing table but only numbers that do
not already exist:
CREATE TABLE dbo.Test
(
pk integer NOT NULL,
CONSTRAINT PK_Test
PRIMARY KEY CLUSTERED (pk)
);
5 rows
The first example processes a range of numbers from 1 to 5 inclusive:
INSERT dbo.Test (pk)
SELECT Num.n
FROM dbo.Numbers AS Num
WHERE
Num.n BETWEEN 1 AND 5
AND NOT EXISTS
(
SELECT NULL
FROM dbo.Test AS t
WHERE t.pk = Num.n
);
Since this query reads from and writes to the keys of the same index on the Test table, the execution
plan requires Halloween Protection. In this case, the optimizer uses full phase separation using an Eager
Table Spool:
50 rows
With five rows now in the Test table, we run the same query again, changing the WHERE clause to
process the numbers from 1 to 50 inclusive:
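The statement is the same as before, with only the range in the WHERE clause changed (the later 500-row
step simply raises the upper bound again):
INSERT dbo.Test (pk)
SELECT Num.n
FROM dbo.Numbers AS Num
WHERE
    Num.n BETWEEN 1 AND 50
    AND NOT EXISTS
    (
        SELECT NULL
        FROM dbo.Test AS t
        WHERE t.pk = Num.n
    );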
This plan provides correct protection against the Halloween Problem, but it does not feature an Eager
Table Spool. The optimizer recognizes that the Hash Match join operator is blocking on its build input; all
rows are read into a hash table before the operator starts the matching process using rows from the
probe input. As a consequence, this plan naturally provides phase separation (for the Test table only)
without the need for a spool.
The optimizer chose a Hash Match join plan over the Nested Loops join seen in the 5-row plan for
cost-based reasons. The 50-row Hash Match plan has a total estimated cost of 0.0347345 units. We can force
the Nested Loops plan used previously with a hint to see why the optimizer did not choose nested loops:
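The article's exact hint is not shown here; one way to force the join type (an assumption on my part) is
the LOOP JOIN query hint:
INSERT dbo.Test (pk)
SELECT Num.n
FROM dbo.Numbers AS Num
WHERE
    Num.n BETWEEN 1 AND 50
    AND NOT EXISTS
    (
        SELECT NULL
        FROM dbo.Test AS t
        WHERE t.pk = Num.n
    )
OPTION (LOOP JOIN); -- force nested loops; the original may have used a different hint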
This plan has an estimated cost of 0.0379063 units including the spool, a bit more than the Hash Match
plan.
500 rows
With 50 rows now in the Test table, we further increase the range of numbers to 500:
This time, the optimizer chooses a Merge Join, and again there is no Eager Table Spool. The Sort
operator provides the necessary phase separation in this plan. It fully consumes its input before
returning the first row (the sort cannot know which row sorts first until all rows have been seen). The
optimizer decided that sorting 50 rows from the Test table would be cheaper than eager-spooling 450 rows just before the update operator.
The Sort plus Merge Join plan has an estimated cost of 0.0362708 units. The Hash Match and Nested
Loops plan alternatives come out at 0.0385677 units and 0.112433 units respectively.
Something odd about the Sort
If you have been running these examples for yourself, you might have noticed something odd about that
last example, particularly if you looked at the Plan Explorer tool tips for the Test table Seek and the Sort:
The Seek produces an ordered stream of pk values, so what is the point of sorting on the same column
immediately afterward? To answer that (very reasonable) question, we start by looking at just
the SELECT portion of the INSERT query:
SELECT Num.n
FROM dbo.Numbers AS Num
WHERE
Num.n BETWEEN 1 AND 500
AND NOT EXISTS
(
SELECT 1
FROM dbo.Test AS t
WHERE t.pk = Num.n
)
ORDER BY
Num.n;
This query produces the execution plan below (with or without the ORDER BY I added to address certain
technical objections you might have):
Notice the lack of a Sort operator. So why did the INSERT plan include a Sort? Simply to avoid the
Halloween Problem. The optimizer considered that performing a redundant sort (with its built-in phase
separation) was the cheapest way to execute the query and guarantee correct results. Clever.
Halloween Protection Levels and Properties
The SQL Server optimizer has specific features that allow it to reason about the level of Halloween
Protection (HP) required at each point in the query plan, and the detailed effect each operator has.
These extra features are incorporated into the same property framework the optimizer uses to keep
track of hundreds of other important bits of information during its search activities.
Each operator has a required HP property and a delivered HP property. The required property indicates
the level of HP needed at that point in the tree for correct results. The delivered property reflects the
HP provided by the current operator and the cumulative HP effects provided by its subtree.
The optimizer contains logic to determine how each physical operator (for example, a Compute Scalar)
affects the HP level. By exploring a wide range of plan alternatives and rejecting plans where the
delivered HP is less than the required HP at the update operator, the optimizer has a flexible way to find
correct, efficient plans that do not always require an Eager Table Spool.
Plan changes for Halloween Protection
We saw the optimizer add a redundant sort for Halloween Protection in the previous Merge Join
example. How can we be sure this is more efficient than a simple Eager Table Spool? And how can we
know which features of an update plan are only there for Halloween Protection?
Both questions can be answered (in a test environment, naturally) using undocumented trace flag 8692,
which forces the optimizer to use an Eager Table Spool for Halloween Protection. Recall that the Merge
Join plan with the redundant sort had an estimated cost of 0.0362708 magic optimizer units. We can
compare that to the Eager Table Spool alternative by recompiling the query with trace flag 8692
enabled:
INSERT dbo.Test (pk)
SELECT Num.n
FROM dbo.Numbers AS Num
WHERE
Num.n BETWEEN 1 AND 500
AND NOT EXISTS
(
SELECT 1
FROM dbo.Test AS t
WHERE t.pk = Num.n
)
OPTION (QUERYTRACEON 8692);
The Eager Spool plan has an estimated cost of 0.0378719 units (up from 0.0362708 with the redundant
sort). The cost differences shown here are not very significant due to the trivial nature of the task and
the small size of the rows. Real-world update queries with complex trees and larger row counts often
produce plans that are much more efficient thanks to the SQL Server optimizer's ability to think deeply
about Halloween Protection.
Other non-spool options
Positioning a blocking operator optimally within a plan is not the only strategy open to the optimizer to
minimize the cost of providing protection against the Halloween Problem. It can also reason about the
range of values being processed, as the following example demonstrates:
CREATE TABLE #Test
(
pk integer IDENTITY PRIMARY KEY,
some_value integer
);
CREATE INDEX i ON #Test (some_value);
-- Pretend the table has lots of data in it
UPDATE STATISTICS #Test
WITH ROWCOUNT = 123456, PAGECOUNT = 1234;
UPDATE #Test
SET some_value = 10
WHERE some_value = 5;
The execution plan shows no need for Halloween Protection, despite the fact we are reading from and
updating the keys of a common index:
The optimizer can see that changing some_value from 5 to 10 could never cause an updated row to be
seen a second time by the Index Seek (which is only looking for rows where some_value is 5). This
reasoning is only possible where literal values are used in the query, or where the query
specifies OPTION (RECOMPILE), allowing the optimizer to sniff the values of the parameters for a one-off
execution plan.
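As a sketch of that OPTION (RECOMPILE) case (my illustration, not taken from the original article), the
same reasoning applies when the values arrive in variables but the optimizer is allowed to sniff them:
-- Sketch only: variables instead of literals, sniffed at compile time thanks to RECOMPILE
DECLARE @old integer = 5, @new integer = 10;

UPDATE #Test
SET some_value = @new
WHERE some_value = @old
OPTION (RECOMPILE);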
Even with literal values in the query, the optimizer may be prevented from applying this logic if the
database option FORCED PARAMETERIZATION is ON. In that case, the literal values in the query are
replaced by parameters, and the optimizer can no longer be sure that Halloween Protection is not
required (or will not be required when the plan is reused with different parameter values):
In case you are wondering what happens if FORCED PARAMETERIZATION is enabled and the query
specifies OPTION (RECOMPILE), the answer is that the optimizer compiles a plan for the sniffed values,
and so can apply the optimization. As always with OPTION (RECOMPILE), the specific-value query plan is
not cached for reuse.
Top
This last example shows how the Top operator can remove the need for Halloween Protection:
UPDATE TOP (1) t
SET some_value += 1
FROM #Test AS t
WHERE some_value <= 10;
No protection is required because we are only updating one row. The updated value cannot be
encountered by the Index Seek, because the processing pipeline stops as soon as the first row is
updated. Again, this optimization can only be applied if a constant literal value is used in the TOP, or if a
variable returning the value 1 is sniffed using OPTION (RECOMPILE).
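A sketch of that variable case (my illustration, not the article's listing):
-- Sketch only: the TOP value comes from a variable and is sniffed via RECOMPILE
DECLARE @n bigint = 1;

UPDATE TOP (@n) t
SET some_value += 1
FROM #Test AS t
WHERE some_value <= 10
OPTION (RECOMPILE);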
If we change the TOP (1) in the query to a TOP (2), the optimizer chooses a Clustered Index Scan instead
of the Index Seek:
We are not updating the keys of the clustered index, so this plan does not require Halloween Protection.
Forcing the use of the nonclustered index with a hint in the TOP (2) query makes the cost of the
protection apparent:
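The hinted query is not reproduced here; assuming the nonclustered index i created earlier, it would be
along these lines:
-- Sketch only: force the nonclustered index with a table hint
UPDATE TOP (2) t
SET some_value += 1
FROM #Test AS t WITH (INDEX(i))
WHERE some_value <= 10;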
The optimizer estimated the Clustered Index Scan would be cheaper than this plan (with its extra
Halloween Protection).
Odds and Ends
There are a couple of other points I want to make about Halloween Protection that have not found a
natural place in the series before now. The first is the question of Halloween Protection when a row-versioning isolation level is in use.
Row Versioning
SQL Server provides two isolation levels, READ COMMITTED SNAPSHOT and SNAPSHOT ISOLATION, that
use a version store in tempdb to provide a statement- or transaction-level consistent view of the
database. SQL Server could avoid Halloween Protection completely under these isolation levels, since
the version store can provide data unaffected by any changes the currently executing statement might
have made so far. This idea is currently not implemented in a released version of SQL Server, though
Microsoft has filed a patent describing how this would work, so perhaps a future version will incorporate
this technology.
Heaps and Forwarded Records
If you are familiar with the internals of heap structures, you might be wondering if a particular
Halloween Problem might occur when forwarded records are generated in a heap table. In case this is
new to you, a heap record will be forwarded if an existing row is updated such that it no longer fits on
the original data page. The engine leaves behind a forwarding stub, and moves the expanded record to
another page.
A problem could occur if a plan containing a heap scan updates a record such that it is forwarded. The
heap scan might encounter the row again when the scan position reaches the page with the forwarded
record. In SQL Server, this issue is avoided because the Storage Engine guarantees to always follow
forwarding pointers immediately. If the scan encounters a record that has been forwarded, it ignores it.
With this safeguard in place, the query optimizer does not have to worry about this scenario.
SCHEMABINDING and T-SQL Scalar Functions
There are very few occasions when using a T-SQL scalar function is a good idea, but if you must use one
you should be aware of an important effect it can have regarding Halloween Protection. Unless a scalar
function is declared with the SCHEMABINDING option, SQL Server assumes the function accesses tables.
To illustrate, consider the simple T-SQL scalar function below:
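(The original listing does not appear here; judging from the SCHEMABINDING version shown shortly, it was
presumably equivalent to the following, with the test INSERT routed through the function:)
-- Reconstruction: identical to the later ALTER, minus WITH SCHEMABINDING
CREATE FUNCTION dbo.ReturnInput
(
    @value integer
)
RETURNS integer
AS
BEGIN
    RETURN @value;
END;
GO
DECLARE @T AS TABLE (ProductID int PRIMARY KEY);

INSERT @T (ProductID)
SELECT dbo.ReturnInput(p.ProductID)
FROM AdventureWorks2012.Production.Product AS p;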
The execution plan now includes an Eager Table Spool for Halloween Protection. SQL Server assumes
the function accesses data, which might include reading from the Product table again. As you may recall,
an INSERT plan that contains a reference to the target table on the reading side of the plan requires full
Halloween Protection, and as far as the optimizer knows, that might be the case here.
Adding the SCHEMABINDING option to the function definition means SQL Server examines the body of
the function to determine which tables it accesses. It finds no such access, and so does not add any
Halloween Protection:
ALTER FUNCTION dbo.ReturnInput
(
@value integer
)
RETURNS integer
WITH SCHEMABINDING
AS
BEGIN
RETURN @value;
END;
GO
DECLARE @T AS TABLE (ProductID int PRIMARY KEY);
INSERT @T (ProductID)
SELECT dbo.ReturnInput(p.ProductID)
FROM AdventureWorks2012.Production.Product AS p;
This issue with T-SQL scalar functions affects all update queries: INSERT, UPDATE, DELETE, and MERGE.
Knowing when you are hitting this problem is made more difficult because unnecessary Halloween
Protection will not always show up as an extra Eager Table Spool, and scalar function calls may be
hidden in views or computed column definitions, for example.
This would delete 456,960 rows (about 10% of the table), spread across many orders. This isn't a
realistic modification in this context, since it will mess with pre-calculated order totals, and you can't
really remove a product from an order that has already shipped. But using a database we all know and
love, it is analogous to, say, deleting a user from a forum site, and also deleting all of their messages - a
real scenario I have seen in the wild.
So one test would be to perform the following, one-shot delete:
DELETE dbo.SalesOrderDetailEnlarged WHERE ProductID IN (712, 870, 873);
I know this is going to require a massive scan and take a huge toll on the transaction log. That's kind of
the point. :-)
While that was running, I put together a different script that will perform this delete in chunks: 25,000,
50,000, 75,000 and 100,000 rows at a time. Each chunk will be committed in its own transaction (so that
if you need to stop the script, you can, and all previous chunks will already be committed, instead of
having to start over), and depending on the recovery model, will be followed by either a CHECKPOINT or
a BACKUP LOG to minimize the ongoing impact on the transaction log. (I will also test without these
operations.) It will look something like this (I am not going to bother with error handling and other
niceties for this test, but you shouldn't be as cavalier):
SET NOCOUNT ON;
DECLARE @r INT;
SET @r = 1;
WHILE @r > 0
BEGIN
BEGIN TRANSACTION;
DELETE TOP (100000) -- this will change
dbo.SalesOrderDetailEnlarged
WHERE ProductID IN (712, 870, 873);
SET @r = @@ROWCOUNT;
COMMIT TRANSACTION;
-- CHECKPOINT; -- if simple recovery
-- BACKUP LOG ... -- if full
END
Of course, after each test, I would restore the original backup of the database WITH REPLACE,
RECOVERY, set the recovery model accordingly, and run the next test.
The Results
The outcome of the first test was not very surprising at all. To perform the delete in a single statement,
it took 42 seconds in full, and 43 seconds in simple. In both cases this grew the log to 579 MB.
The next set of tests had a couple of surprises for me. One is that, while these chunking methods did
significantly reduce impact to the log file, only a couple of combinations came close in duration, and
none were actually faster. Another is that, in general, chunking in full recovery (without performing a log
backup between steps) performed better than equivalent operations in simple recovery. Here are the
results for duration and log impact:
Log size, in MB, after various delete operations removing 457K rows
Again, in general, while log size is significantly reduced, duration is increased. You can use this type of
scale to determine whether it's more important to reduce the impact to disk space or to minimize the
amount of time spent. For a small hit in duration (and after all, most of these processes are run in the
background), you can have a significant savings (up to 94%, in these tests) in log space usage.
Note that I did not try any of these tests with compression enabled (possibly a future test!), and I left
the log autogrow settings at the terrible defaults (10%), partly out of laziness and partly because many
environments out there have retained this awful setting.
But what if I have more data?
Next I thought I should test this on a slightly larger database. So I made another database and created a
new, larger copy of dbo.SalesOrderDetailEnlarged. Roughly ten times larger, in fact. This time instead of
a primary key on SalesOrderID, SalesOrderDetailID, I just made it a clustered index (to allow for
duplicates), and populated it this way:
SELECT c.*
INTO dbo.SalesOrderDetailReallyReallyEnlarged
FROM AdventureWorks2012.Sales.SalesOrderDetailEnlarged AS c
CROSS JOIN
(
SELECT TOP 10 Number FROM master..spt_values
) AS x;
CREATE CLUSTERED INDEX so ON
dbo.SalesOrderDetailReallyReallyEnlarged(SalesOrderID,SalesOrderDetailID);
-- I also made this index non-unique:
CREATE NONCLUSTERED INDEX rg ON
dbo.SalesOrderDetailReallyReallyEnlarged(rowguid);
CREATE NONCLUSTERED INDEX p ON
dbo.SalesOrderDetailReallyReallyEnlarged(ProductID);
Due to disk space limitations, I had to move off of my laptop's VM for this test (and chose a 40-core box,
with 128 GB of RAM, that just happened to be sitting around quasi-idle :-)), and still it was not a quick
process by any means. Population of the table and creation of the indexes took ~24 minutes.
The table has 48.5 million rows and takes up 7.9 GB on disk (4.9 GB in data, and 2.9 GB in index).
This time, my query to determine a good set of candidate ProductID values to delete:
SELECT TOP (3)
ProductID, ProductCount = COUNT(*)
FROM dbo.SalesOrderDetailReallyReallyEnlarged
GROUP BY ProductID
ORDER BY ProductCount DESC;
Yielded the following results:
ProductID   ProductCount
---------   ------------
870         1828320
712         1318980
873         1308060
So we are going to delete 4,455,360 rows, a little under 10% of the table. Following a similar pattern to
the above test, we're going to delete all in one shot, then in chunks of 500,000, 250,000 and 100,000
rows.
Results:
Log size, in MB, after various delete operations removing 4.5MM rows
So again, we see a significant reduction in log file size (over 97% in cases with the smallest chunk size of
100K); however, at this scale, we see a few cases where we also accomplish the delete in less time, even
with all the autogrow events that must have occurred. That sounds an awful lot like win-win to me!
This time with a bigger log
Now, I was curious how these different deletes would compare with a log file pre-sized to accommodate
such large operations. Sticking with our larger database, I pre-expanded the log file to 6 GB, backed it
up, then ran the tests again:
ALTER DATABASE delete_test MODIFY FILE
(NAME=delete_test_log, SIZE=6000MB);
Results, comparing duration with a fixed log file to the case where the file had to autogrow
continuously:
Duration, in seconds, of various delete operations removing 4.5MM rows, comparing fixed log size and
autogrow
Again we see that the methods that chunk deletes into batches, and do *not* perform a log backup or a
checkpoint after each step, rival the equivalent single operation in terms of duration. In fact, we see that
most actually complete in less overall time, with the added bonus that other transactions will be able to
get in and out between steps. Which is a good thing unless you want this delete operation to block all
unrelated transactions.
Conclusion
It is clear that there is no single, correct answer to this problem - there are a lot of inherent "it depends"
variables. It may take some experimenting to find your magic number, as there will be a balance
between the overhead it takes to back up the log and how much work and time you save at different
chunk sizes. But if you are planning to delete or archive a large number of rows, it is quite likely that you
will be better off, overall, performing the changes in chunks, rather than in one massive transaction,
even though the duration numbers seem to make that a less attractive operation. It's not all about
duration: if you don't have a sufficiently pre-allocated log file, and don't have the space to
accommodate such a massive transaction, it is probably much better to minimize log file growth at the
cost of duration, in which case you'll want to ignore the duration graphs above and pay attention to the
log size graphs.
If you can afford the space, you still may or may not want to pre-size your transaction log accordingly.
Depending on the scenario, sometimes using the default autogrow settings ended up slightly faster in
my tests than using a fixed log file with plenty of room. Plus, it may be tough to guess exactly how much
you'll need to accommodate a large transaction you haven't run yet. If you can't test a realistic scenario,
try your best to picture your worst case scenario; then, for safety, double it. Kimberly Tripp
(blog | @KimberlyLTripp) has some great advice in this post: 8 Steps to better Transaction Log
throughput; in this context, specifically, look at point #6. Regardless of how you decide to calculate
your log space requirements, if you're going to end up needing the space anyway, better to take it in a
controlled fashion well in advance, than to halt your business processes while they wait for an autogrow
(never mind multiple!).
Another very important facet of this that I did not measure explicitly is the impact to concurrency: a
bunch of shorter transactions will, in theory, have less impact on concurrent operations. While a single
delete took slightly less time than the longer, batched operations, it held all of its locks for that entire
duration, while the chunked operations would allow for other queued transactions to sneak in between
each transaction. In a future post I'll try to take a closer look at this impact (and I have plans for other
deeper analysis as well).
This problem can also occur in derived tables, common table expressions and in-line functions, but I see
it most often with views because they are intentionally written to be more generic.
Window functions
Window functions are distinguished by the presence of an OVER() clause and come in three varieties:
Ranking: ROW_NUMBER, RANK, DENSE_RANK, NTILE
Aggregate: COUNT, COUNT_BIG, CHECKSUM_AGG
Analytic: LAG, LEAD, FIRST_VALUE, LAST_VALUE
The ranking and aggregate window functions were introduced in SQL Server 2005, and considerably
extended in SQL Server 2012. The analytic window functions are new for SQL Server 2012.
All of the window functions listed above are susceptible to the optimizer limitation detailed in this
article.
Example
Using the AdventureWorks sample database, the task at hand is to write a query that returns all product
#878 transactions that occurred on the most recent date available. There are all sorts of ways to express
this requirement in T-SQL, but we will choose to write a query that uses a windowing function. The first
step is to find transaction records for product #878 and rank them in date order descending:
SELECT
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity,
rnk = RANK() OVER (
ORDER BY th.TransactionDate DESC)
FROM Production.TransactionHistory AS th
WHERE
th.ProductID = 878
ORDER BY
rnk;
The results of the query are as expected, with six transactions occurring on the most recent date
available. The execution plan contains a warning triangle, alerting us to a missing index:
As usual for missing index suggestions, we need to remember that the recommendation is not the result
of a thorough analysis of the query; it is more of an indication that we need to think a bit about how this
query accesses the data it needs.
The suggested index would certainly be more efficient than scanning the table completely, since it
would allow an index seek to the particular product we are interested in. The index would also cover all
the columns needed, but it would not avoid the sort (by TransactionDate descending). The ideal index
for this query would allow a seek on ProductID, return the selected records in
reverse TransactionDate order, and cover the other returned columns:
CREATE NONCLUSTERED INDEX ix
ON Production.TransactionHistory
(ProductID, TransactionDate DESC)
INCLUDE
(ReferenceOrderID, Quantity);
With that index in place, the execution plan is much more efficient. The clustered index scan has been
replaced by a range seek, and an explicit sort is no longer necessary:
The final step for this query is to limit the results to just those rows that rank #1. We cannot filter
directly in the WHERE clause of our query because window functions may only appear in
the SELECT and ORDER BY clauses.
We can work around this restriction using a derived table, common table expression, function, or view.
On this occasion, we will use a common table expression (aka an in-line view):
WITH RankedTransactions AS
(
SELECT
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity,
rnk = RANK() OVER (
ORDER BY th.TransactionDate DESC)
FROM Production.TransactionHistory AS th
WHERE
th.ProductID = 878
)
SELECT
TransactionID,
ReferenceOrderID,
TransactionDate,
Quantity
FROM RankedTransactions
WHERE
rnk = 1;
The execution plan is the same as before, with an extra Filter to return only rows ranked #1:
It turns out that our query is very useful, so the decision is taken to generalize it and store the definition
in a view. For this to work for any product, we need to do two things: return the ProductID from the
view, and partition the ranking function by product:
CREATE VIEW dbo.MostRecentTransactionsPerProduct
WITH SCHEMABINDING
AS
SELECT
sq1.ProductID,
sq1.TransactionID,
sq1.ReferenceOrderID,
sq1.TransactionDate,
sq1.Quantity
FROM
(
SELECT
th.ProductID,
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity,
rnk = RANK() OVER (
PARTITION BY th.ProductID
ORDER BY th.TransactionDate DESC)
FROM Production.TransactionHistory AS th
) AS sq1
WHERE
sq1.rnk = 1;
Selecting all the rows from the view results in the following execution plan and correct results:
We can now find the most recent transactions for product 878 with a much simpler query on the view:
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
mrt.ProductID = 878;
Our expectation is that the execution plan for this new query will be exactly the same as before we
created the view. The query optimizer should be able to push the filter specified in theWHERE clause
down into the view, resulting in an index seek.
We need to stop and think a bit at this point, however. The query optimizer can only produce execution
plans that are guaranteed to produce the same results as the logical query specification: is it safe to
push our WHERE clause into the view?
The answer is yes, so long as the column we are filtering on appears in the PARTITION BY clause of the
window function in the view. The reasoning is that eliminating complete groups (partitions) from the
window function will not affect the ranking of rows returned by the query. The question is, does the SQL
Server query optimizer know this? The answer depends on which version of SQL Server we are running.
SQL Server 2005 execution plan
A look at the Filter properties in this plan shows it applying two predicates:
The ProductID = 878 predicate has not been pushed down into the view, resulting in a plan that scans
our index, ranking every row in the table before filtering for product #878 and rows ranked #1.
The SQL Server 2005 query optimizer cannot push suitable predicates past a window function in a lower
query scope (view, common table expression, in-line function or derived table). This limitation applies to
all SQL Server 2005 builds.
SQL Server 2008+ execution plan
This is the execution plan for the same query on SQL Server 2008 or later:
The ProductID predicate has been successfully pushed past the ranking operators, replacing the index
scan with the efficient index seek.
The 2008 query optimizer includes a new simplification rule SelOnSeqPrj (select on sequence project)
that is able to push safe outer-scope predicates past window functions. To produce the less efficient
plan for this query in SQL Server 2008 or later, we have to temporarily disable this query optimizer
feature:
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
mrt.ProductID = 878
-- Undocumented hint, test environments only: disables the SelOnSeqPrj rule
OPTION (QUERYRULEOFF SelOnSeqPrj);
Unfortunately, the SelOnSeqPrj simplification rule only works when the predicate performs a
comparison with a constant. For that reason, the following query produces the sub-optimal plan on SQL
Server 2008 and later:
DECLARE @ProductID INT = 878;
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
mrt.ProductID = @ProductID;
The problem can still occur even where the predicate uses a constant value. SQL Server may decide to
auto-parameterize trivial queries (one for which an obvious best plan exists). If auto-parameterization is
successful, the optimizer sees a parameter instead of a constant, and the SelOnSeqPrj rule is not
applied.
For queries where auto-parameterization is not attempted (or where it is determined to be unsafe), the
optimization may still fail, if the database option for FORCED PARAMETERIZATION is on. Our test query
(with the constant value 878) is not safe for auto-parameterization, but the forced parameterization
setting overrides this, resulting in the inefficient plan:
ALTER DATABASE AdventureWorks SET PARAMETERIZATION FORCED;
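Where the predicate compares against a variable or parameter (or forced parameterization gets in the
way), one workaround on SQL Server 2008 and later is to add an OPTION (RECOMPILE) hint so the
optimizer compiles for the sniffed value. A sketch (not the article's original listing):
DECLARE @ProductID integer = 878;

SELECT
    mrt.ProductID,
    mrt.TransactionID,
    mrt.ReferenceOrderID,
    mrt.TransactionDate,
    mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
    mrt.ProductID = @ProductID
OPTION (RECOMPILE); -- lets the optimizer see the sniffed value and apply SelOnSeqPrj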
The SelOnSeqPrj rule does not exist in SQL Server 2005, so OPTION (RECOMPILE) cannot help there. In
case you are wondering, the OPTION (RECOMPILE) workaround results in a seek even if the database
option for forced parameterization is on.
All versions workaround #1
In some cases, it is possible to replace the problematic view, common table expression, or derived table
with a parameterized in-line table-valued function:
CREATE FUNCTION dbo.MostRecentTransactionsForProduct
(
@ProductID integer
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
SELECT
sq1.ProductID,
sq1.TransactionID,
sq1.ReferenceOrderID,
sq1.TransactionDate,
sq1.Quantity
FROM
(
SELECT
th.ProductID,
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity,
rnk = RANK() OVER (
PARTITION BY th.ProductID
ORDER BY th.TransactionDate DESC)
FROM Production.TransactionHistory AS th
WHERE
th.ProductID = @ProductID
) AS sq1
WHERE
sq1.rnk = 1;
This function explicitly places the ProductID predicate in the same scope as the window function,
avoiding the optimizer limitation. Written to use the in-line function, our example query becomes:
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsForProduct(878) AS mrt;
This produces the desired index seek plan on all versions of SQL Server that support window functions.
This workaround produces a seek even where the predicate references a parameter or local variable;
OPTION (RECOMPILE) is not required.
The function body could of course be simplified to remove the now-redundant PARTITION BY clause,
and to no longer return the ProductID column. I left the definition the same as the view it replaced to
more clearly illustrate the cause of the execution plan differences.
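For reference, a sketch of that simplified form (my illustration of the change described above, not the
article's listing):
ALTER FUNCTION dbo.MostRecentTransactionsForProduct
(
    @ProductID integer
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
    SELECT
        sq1.TransactionID,
        sq1.ReferenceOrderID,
        sq1.TransactionDate,
        sq1.Quantity
    FROM
    (
        SELECT
            th.TransactionID,
            th.ReferenceOrderID,
            th.TransactionDate,
            th.Quantity,
            rnk = RANK() OVER (
                ORDER BY th.TransactionDate DESC) -- PARTITION BY removed; only one product is in scope
        FROM Production.TransactionHistory AS th
        WHERE
            th.ProductID = @ProductID
    ) AS sq1
    WHERE
        sq1.rnk = 1;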
All versions workaround #2
The second workaround only applies to ranking window functions that are filtered to return rows
numbered or ranked #1 (using ROW_NUMBER, RANK, or DENSE_RANK). This is a very common usage
however, so it is worth mentioning.
An additional benefit is that this workaround can produce plans that are even more efficient than the
index seek plans seen previously. As a reminder, the previous best plan looked like this:
That execution plan ranks 1,918 rows even though it ultimately returns only 6. We can improve this
execution plan by using the window function in an ORDER BY clause instead of ranking rows and then
filtering for rank #1:
SELECT TOP (1) WITH TIES
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity
FROM Production.TransactionHistory AS th
WHERE
th.ProductID = 878
ORDER BY
RANK() OVER (
ORDER BY th.TransactionDate DESC);
That query nicely illustrates the use of a window function in the ORDER BY clause, but we can do even
better, eliminating the window function completely:
SELECT TOP (1) WITH TIES
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity
FROM Production.TransactionHistory AS th
WHERE
th.ProductID = 878
ORDER BY
th.TransactionDate DESC;
This plan reads only 7 rows from the table to return the same 6-row result set. Why 7 rows? The Top
operator is running in WITH TIES mode:
It continues to request one row at a time from its subtree until the TransactionDate changes. The
seventh row is required for the Top to be sure that no more tied-value rows will qualify.
We can extend the logic of the query above to replace the problematic view definition:
ALTER VIEW dbo.MostRecentTransactionsPerProduct
WITH SCHEMABINDING
AS
SELECT
p.ProductID,
Ranked1.TransactionID,
Ranked1.ReferenceOrderID,
Ranked1.TransactionDate,
Ranked1.Quantity
FROM
-- List of product IDs
(SELECT ProductID FROM Production.Product) AS p
CROSS APPLY
(
-- Returns rank #1 results for each product ID
SELECT TOP (1) WITH TIES
th.TransactionID,
th.ReferenceOrderID,
th.TransactionDate,
th.Quantity
FROM Production.TransactionHistory AS th
WHERE
th.ProductID = p.ProductID
ORDER BY
th.TransactionDate DESC
) AS Ranked1;
The view now uses a CROSS APPLY to combine the results of our optimized ORDER BY query for each
product. Our test query is unchanged:
DECLARE @ProductID integer;
SET @ProductID = 878;
SELECT
mrt.ProductID,
mrt.TransactionID,
mrt.ReferenceOrderID,
mrt.TransactionDate,
mrt.Quantity
FROM dbo.MostRecentTransactionsPerProduct AS mrt
WHERE
mrt.ProductID = @ProductID;
Both pre- and post-execution plans show an index seek without needing an OPTION (RECOMPILE) query
hint. The following is a post-execution (actual) plan:
If the view had used ROW_NUMBER instead of RANK, the replacement view would simply have omitted
the WITH TIES clause on the TOP (1). The new view could also be written as a parameterized in-line
table-valued function of course.
One could argue that the original index seek plan with the rnk = 1 predicate could also be optimized to
only test 7 rows. After all, the optimizer should know that rankings are produced by the Sequence
Project operator in strict ascending order, so execution could end as soon as a row with a rank greater
than one is seen. The optimizer does not contain this logic today, however.
Final Thoughts
People are often disappointed by the performance of views that incorporate window functions. The
reason can often be traced back to the optimizer limitation described in this post (or perhaps because
the view designer did not appreciate that the column a predicate filters on must appear in the PARTITION
BY clause for the predicate to be safely pushed down).
I do want to emphasise that this limitation does not just apply to views, and neither is it limited
to ROW_NUMBER, RANK, and DENSE_RANK. You should be aware of this limitation when using any
function with an OVER clause in a view, common table expression, derived table, or in-line table-valued
function.
SQL Server 2005 users that encounter this issue are faced with the choice of rewriting the view as a
parameterized in-line table-valued function, or using the APPLY technique (where applicable).
SQL Server 2008 users have the extra option of using an OPTION (RECOMPILE) query hint if the issue can
be solved by allowing the optimizer to see a constant instead of a variable or parameter reference.
Remember to check post-execution plans when using this hint though: the pre-execution plan cannot
generally show the optimal plan.
Index
Berry, Glenn
Selecting a Processor for SQL Server 2012
Bertrand, Aaron
Best Approach for Running Totals
Split Strings the Right Way
Split Strings: Now with less T-SQL
Performance impact of different error handling techniques
Using named instances? Test your DAC connection!
What is the fastest way to calculate the median?
T-SQL Tuesday #33: Trick Shots : Schema Switch-A-Roo
Conditional Order By
Splitting Strings : A Follow-Up
When the DRY principle doesnt apply
Hit-Highlighting in Full-Text Search
What impact can different cursor options have?
How much impact can a data type choice have?
What is the most efficient way to trim time from datetime?
Beware misleading data from SET STATISTICS IO
Trimming time from datetime a follow-up
Is the sp_ prefix still a no-no?
Checking if a non-LOB column needs to be updated
Minimizing the impact of DBCC CHECKDB : DOs and DONTs
An important change to Extended Events in SQL Server 2012
Bad cardinality estimates coming from SSMS execution plans
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
Generate a set or sequence without loops part 1
Generate a set or sequence without loops part 2
Generate a set or sequence without loops part 3
Potential enhancements to ASPState
Selecting a Processor for SQL Server 2012
Break large delete operations into chunks
Error Handling
Performance impact of different error handling techniques
Extended Events
A Look At DBCC CHECKCONSTRAINTS and I/O
An important change to Extended Events in SQL Server 2012
Measuring Observer Overhead of SQL Trace vs. Extended Events
Hall, Jason
My Perspective: The Top 5 Most Common SQL Server Performance Problems
Installation
Selecting a Processor for SQL Server 2012
IO Subsystem
Break large delete operations into chunks
A Look At DBCC CHECKCONSTRAINTS and I/O
Trimming More Transaction Log Fat
Potential enhancements to ASPState