Tuesday, June 19, 2012

SQL Table variables

How to use table variables (introduced in T-SQL in SQL Server 2000)

First, lets look at the reason we are using a non-permanent table in the first place.  Say I have a table with 100,000 rows.  I need to join that table to another and I will only find matches in a small number of records.  Additionally, I only need a work with a few columns of mylarge table.  Joining this large table to my query can severely degrade performance.  I can often overcome this issue and increase performance by creating a table on the fly in code, loading it with the records from the large table that I will actually be working with, and using that instead.

In prior versions of SQL, the only way to do this was with a temp table.

Here is a sample of a temp table declaration:
 CREATE TABLE #MYTEMPTABLE ([object_id] INT,[name] VARCHAR(128),max_length SMALLINT);

This creates a table in the temp database (on disk).  I can then load this table like so:
INSERT INTO #MYTEMPTABLE SELECT [object_id],[name],max_length FROM sys.all_columns WHERE max_length>2000;

Note that if I were to just use the all_columns table, my query would have to do a table scann through many thousands of records to find the ones with max_length>2000.  This way we will do that one time up front, before a join.

I can work with the temp table much like a normal table:
SELECT * FROM sys.columns C INNER JOIN #MYTEMPTABLE T ON C.[object_id]=T.[object_id];

I must remember to drop the temp table when I am done with it.  Otherwise it could cause object conflicts.
DROP TABLE #MYTEMPTABLE;

The down side to doing it this way is that the few records I am working with are still being stored in a physical database (tempdb), which is always slower than RAM (though not by much if the database is on an SSD).

This code can be made to run faster (sometimes by orders of magnitude) by using a table variable instead of a temp table. 
Here is the same functionality:
DECLARE @MYTEMPTABLE AS TABLE ([object_id] INT,[name] VARCHAR(128),max_length SMALLINT);

INSERT INTO @MYTEMPTABLE SELECT [object_id],[name],max_length FROM sys.all_columns WHERE max_length>2000;

SELECT * FROM sys.columns C INNER JOIN @MYTEMPTABLE T ON C.[object_id]=T.[object_id];

Advantages of table variables over temporary tables:
1. They have well-defined scopes. 
2. They're cleaned up automatically at the end of the stored proc in which they're defined
3. They tend to result in fewer recompilations
4. They result in less locking in stored procs and transactions
5. For recordsets smaller than 100,000 they are almost always faster than a temporary table.  In 12 years, I have never run into a real world instance where using the table variable was slower than the temporary table as long as the number of rows stored in the table variable was less than 100,000.  In cases where the dataset is larger than that, you typically get more benefit from being able to define indexes on the temporary table.

Did this help you?  If so, please leave a comment!

SQL 2022 TSQL snapshot backups!

  SQL 2022 now actually supports snapshot backups!  More specifically, T-SQL snapshot backups. Of course this is hardware-dependent. Here ...