Friday, August 10, 2018

T-SQL performance killers

These are some performance killers (in no particular order) that are easily avoided.  None-the-less, I see them all the time in production code.

1. Forgetting the schema name for the table.
select col1, col2, col3 from table1

While this looks correct at first glance, it is a sneaky performance killer.  You will find code written like this all over the internet.  The reason is that people posting on the internet usually don't know what your schema name will be and maybe not even which type of database you are using.

SQL Server uses a query optimizer to build a plan for how it will execute a query.  This plan generation takes time.  Even a simple join or pivot can take a few seconds to generate a plan.  Once the plan is generated it is stored in a cache and SQL can re-use it for subsequent calls to that same query.  The catch is that in order to reuse the plan, SQL needs to ensure that the query is actually the same query the next time.  One of the ways it does this is by storing the schema that the plan was generated against.  If the schema is not specified for *all* the tables in a query, the plan is still generated but is not stored in a way that it can ever be re-used.

For example, consider the following query.  Let's say this query gets run 100 times a day by various users.
select a.col1 from table1 a inner join table2 b on a.x=b.x

Lets say that SQL takes 2 seconds (an eternity, but a real world number) to generate and store the plan for this query, and another .5 seconds to bring back the results.  The first time a user runs the query it will take 2.5 seconds for the results to return.  The next time a user runs that same query it will still take 2.5 seconds. Over the course of a day, that is 250 seconds (2.5*100)!!!

Here is the same query with the schema names.
select a.col1 from dbo.table1 a inner join dbo.table2 b on a.x=b.x

It will still take 2.5 seconds the first time a user runs it.  But each subsequent run will re-use the plan and run in .5 seconds.  Over the course of a day that is (99*.5)+2.5=52 seconds.  Just over 1/5 the time!  And that is just one query!  This can add up to a huge performance difference quickly!

2.  Select *
This is a lazy way of typing the query.  It saves the developer time and that is where the advantage ends.  Unless you really need to use *all* of the columns in a table, the query will always perform better if the columns are explicitly listed.  How much better will depend on the size and number of columns in the table.
Where I see this crop up most often is in a statement such as this:

if exists(select * from dbo.table)

This forces SQL to expand the * to all column names.  This admittedly only takes a fraction of a second, but when we are talking about a production accounting system, those fractions add up quickly.
This brings us to the next item.

3. Improper use of if exists statements
The best practice is:

if exists(select 1 from dbo.table where [condition])

This selects the first column regardless of name.  There will always be at least one column in a table.  A developer reading this instantly understands that we don't care at that point *what* is in the table, only *whether* there is a record.

I also often see non-inner joins in an exists statement:

if exists(select 1 from dbo.table1 left outer join dbo.table2 on table1.x=table2.x)

This is forcing SQL to build and plan for and return results of an outer join before evaluating exists.  It makes no sense to add this overhead.
Here is the equivalent, much more efficient statement:

if exists(select 1 from dbo.table1)

Another example of inefficient code is:

if(select count(1) from dbo.table1)!=0

Logically, this is the same as the if exists statement, but it does not perform as well because SQL must count all of the rows in the table before evaluating the condition.  If the table is large, this can add more than a second to the processing time each time it runs.

4. Unnecessarily using WITH RECOMPILE statement in a procedure.
This should generally only be used in a proc that builds dynamic SQL and executes it.  In this case, you don't want SQL to think that the execute plan it built last time for the proc itself will work this time because the parameters are different and thus the statement that is built is different.  Even in this case, you should generally try your best to write the proc in a way that does not require dynamic SQL and remove the WITH RECOMPILE statement.

I have never seen any other real-world case in which this clause did not hurt performance.  Indeed, it hurts performance in ALL cases.  It is just necessary in those very specific cases.

5. Unnecessarily using dynamic SQL
Dynamic SQL should be (and in almost all cases CAN be) avoided in procedures.  Most times that I see it used, it is just a lazy way to do something that should have been done in the application layer anyway.  In many of those cases there are ways to avoid it just by refactoring the SQL proc.  Here are some real word examples of its use:

SELECT @exestring = @curcompany + '.dbo.smCleanupPostingJournalEntries'
EXEC @exestring

This statement will perform better if the calling application gets the value for @curcompany and calls the correct proc in the first place.  Doing it this way also increases the chance of errors that do not bubble back up to the application properly.  This is what happens if the proc in the dynamic SQL fails or doesn't exist.

CREATE PROC dbo.SomeProc (@TempTableName VARCHAR(255))
AS
BEGIN
DECLARE @sqlString VARCHAR(2000)

SELECT @sqlString = 'UPDATE a set a.Myfield=b.Somefield FROM dbo.MyTable a inner join dbo.' + @TempTableName + ' b on a.Keyfield=b.Keyfield'

EXEC sp_sqlexec @sqlString
END

In this case the application is building a temp table then passing it in.  This make zero sense if the table is not a #temp table.  Even if it is #temp table, it will always perform better if you build the temp table inside the proc and then use it normally.  If you find a case where it is otherwise, I would be very interested to see it.







SQL 2022 TSQL snapshot backups!

  SQL 2022 now actually supports snapshot backups!  More specifically, T-SQL snapshot backups. Of course this is hardware-dependent. Here ...