Wednesday, September 26, 2012

Table variable tests show that they are much faster than #temp!

Working on a client's data migration recently has given me the chance to benchmark table variables against traditional temp tables in SQL 2008. 

The server I used was not especially fast, and tempdb was even stored on the same drive as the rest of the databases.  Still, the results showed that table variables were at least 10x faster in every test.

The most telling scenario was as follows:

There was a table with 2.8 million rows.  One of the fields of interest was a 500-character comment field.
The application that populated this table added a record for each comment with no regard to keys (there was no unique key).  However, I noticed that there was a combination of fields that could be used as a unique key, but there were several thousand instances where I would need to concatenate the comment field from more than one record to make the key unique.

Example:
Key     Comment
1         abc
1         def
2         der
3         mmd
3         xyz

Need to be imported to the new table as:
1       abcdef
2       der
3       mmdxyz

There was a bit more complexity involved, but hopefully, you get the idea.

So I needed to use FOR XML PATH to concatenate the values, like this:
SELECT A.[KEY],
(SELECT COMMENTFIELD AS 'text()'
FROM MYTABLE
WHERE [KEY]=A.[KEY]
ORDER BY [KEY] ASC
FOR XML PATH(''))
FROM MYTABLE A

The problem is that this returns a recordset like this:
1   abcdef
1   abcdef
2   der
3   mmdxyz
3   mmdxyz

No problem, I just need to group them, right?  Well there are 2.8M rows here!  Time to benchmark!

I first wrote this using #temp tables.
SELECT A.[KEY],
(SELECT COMMENTFIELD AS 'text()'
FROM MYTABLE
WHERE [KEY]=A.[KEY]
ORDER BY [KEY] ASC
FOR XML PATH('')) AS COMMENTFIELD
INTO #MYTEMP
FROM MYTABLE A

The insert and grouping alone took 9.5 hours.

I then rewrote it to use a table variable:

DECLARE @MYTEMP AS TABLE(
[KEY] INT,
COMMENTFIELD VARCHAR(1000)
)

INSERT INTO @MYTEMP
SELECT A.[KEY],
(SELECT COMMENTFIELD AS 'text()'
FROM MYTABLE
WHERE [KEY]=A.[KEY]
ORDER BY [KEY] ASC
FOR XML PATH(''))
FROM MYTABLE A

This same logic took 21 minutes!  That means it only took 4% as long to run!
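For completeness, here is a minimal sketch of the final grouping step.  NEWTABLE is a hypothetical destination table; since every duplicate row for a key carries the same concatenated comment, a simple GROUP BY collapses them to one row per key:

-- NEWTABLE is hypothetical; collapse the duplicated rows to one per key
INSERT INTO NEWTABLE ([KEY], COMMENTFIELD)
SELECT [KEY], COMMENTFIELD
FROM @MYTEMP
GROUP BY [KEY], COMMENTFIELD;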

If that doesn't make the case for using table variables over temp tables, I don't know what will.  From now on, I will default to the table variable when writing code and only consider the temp table for special cases where I need to pass the dataset around. 

Did this help you?  If so, please leave a comment!

Thursday, August 9, 2012

Why you should not store user data on the same drive as SQL files

I am not aware of any official “Best Practice” statement from Microsoft in relation to storing user files on the same drive as SQL database and log files; however, I would recommend against the practice for three reasons:

  1. Drive space.  SQL database and log files can grow quickly without advance notice.  Best practice is to keep at least 15% of drive space free at all times to safeguard against unexpected growth running the drive out of space.  Users are much more unpredictable than the database and can run the drive out of space without warning, causing SQL to crash.  (A small free-space monitoring query follows this list.)
  2. Disk I/O.  Each HBA (Host Bus Adapter) has a finite amount of I/O throughput that it can handle.  Even if the concerns from #1 are handled using disk space quotas or some other limiting measure, I am not aware of a good way to limit certain users' I/O capacity.  User file operations are typically larger and slower than SQL data operations.  So even if the users are not filling up the drive, they could easily use enough of the I/O throughput to create a choke point for SQL and degrade performance.  These types of performance hits are very difficult to diagnose and track down, so it is better to avoid them up front.
  3. Fragmentation.  When set up properly, SQL Server grows data and log files by allocating a large block of contiguous disk space.  Since user files are much smaller, they can create thousands of tiny gaps in disk space.  The result is that SQL Server cannot find a contiguous block large enough for the next growth operation, and the SQL data and log files become fragmented.  There is no safe way to defragment them with Windows tools, and the SQL Server maintenance operations that do so can only defragment a SQL file when sufficient contiguous disk space exists; they cannot move the user files out of the way.  The end result is a performance hit that can become significant over time.  This is also one of the reasons that the SQL user databases and the tempdb data and log files should not be stored on the OS drive.
 (I did not list security concerns, as those can be overcome with proper AD setup and administration, but many organizations’ security policies still do not allow this for fear of undiscovered exploits in AD security.)
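As a minimal sketch of keeping an eye on reason #1, the query below reports free space for every volume that hosts a database file.  It assumes SQL Server 2008 R2 SP1 or later, where sys.dm_os_volume_stats is available:

-- One row per volume that holds a data or log file
SELECT DISTINCT vs.volume_mount_point,
       vs.total_bytes / 1048576 AS total_mb,
       vs.available_bytes / 1048576 AS free_mb,
       CAST(vs.available_bytes * 100.0 / vs.total_bytes AS DECIMAL(5,2)) AS pct_free
FROM sys.master_files AS mf
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs;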

If this is still not clear, hopefully this analogy will shed some light.
Imagine a swimming pool filled with barrels with a garden hose running to it that can be moved from barrel to barrel.

The swimming pool is the hard drive.
The barrels are the database growth allocation blocks.
The garden hose is the HBA. 
The water is the data. 

SQL pulls and pushes water through the garden hose, usually at a trickle. 
A user storing or reading a file blasts water at high pressure through the hose intermittently.  There is nothing stopping the user from blasting in so much water that they overflow the pool.  If they blast or suck water through the hose for an extended period of time (seconds counts as extended here), then SQL has to wait for them to finish before resuming the trickle. 
Users can put water in any barrel, but SQL prefers an empty barrel.  If there are no empty ones, SQL must use part of two or more barrels, but must remember which barrels go together. 

This quickly becomes laborious to keep track of, and translates into degraded performance.

Did this help you?  If so, please leave a comment!

 
 

Tuesday, July 3, 2012

Dream SQL Install

I see debates all the time over which way is best to set up the hardware for a particular SQL install.  So I decided to lay out what the hardware setup would look like in my ideal GP SQL installation if money was no object.

First, I would buy a Dell PowerEdge and an HP ProLiant and use them as doorstops while the boys carry in a new maxed-out SGI IDC3212-RP4.
Specs here: http://www.sgi.com/products/servers/infinitedata_cluster/configs.html

We may also need step-stools later, so I would then toss them in the corner.  I would then hook up a SAN with a Fibre Channel backbone.  I am only installing TWO in my imaginary world, so I would put in five 2TB SSDs.  I would install the OS and SQL Server binaries on SSD1.  Then the following:
SSD2: MDF files (Except tempdb)
SSD3: LDF files
SSD4: tempdb files
SSD5: BAK files
I would set the initial db size on DYNAMICS and TWO to 250GB to prevent file growth for a while.

I would then connect the entire rig to a 16Tbps connection and replicate the install on the east and west coasts.  I would then set up SQL clustering to mirror the entire setup between the systems and use SQL replication features to allow seamless switching from one instance to another in case of failure.

So there is the upper limit.  Anything else is a compromise, and must be weighed against an organization's needs and budget.

I hope you enjoyed my little foray in dreamland.




Tuesday, June 19, 2012

SQL Table variables

How to use table variables (introduced in T-SQL in SQL Server 2000)

First, let's look at the reason for using a non-permanent table in the first place.  Say I have a table with 100,000 rows.  I need to join that table to another, and I will only find matches in a small number of records.  Additionally, I only need to work with a few columns of my large table.  Joining this large table directly into my query can severely degrade performance.  I can often overcome this issue and increase performance by creating a table on the fly in code, loading it with the records from the large table that I will actually be working with, and using that instead.

In prior versions of SQL, the only way to do this was with a temp table.

Here is a sample of a temp table declaration:
 CREATE TABLE #MYTEMPTABLE ([object_id] INT,[name] VARCHAR(128),max_length SMALLINT);

This creates a table in the temp database (on disk).  I can then load this table like so:
INSERT INTO #MYTEMPTABLE SELECT [object_id],[name],max_length FROM sys.all_columns WHERE max_length>2000;

Note that if I were to just use the all_columns view directly, my query would have to do a table scan through many thousands of records to find the ones with max_length>2000.  This way, that scan is done one time up front, before any join.

I can work with the temp table much like a normal table:
SELECT * FROM sys.columns C INNER JOIN #MYTEMPTABLE T ON C.[object_id]=T.[object_id];

I must remember to drop the temp table when I am done with it.  Otherwise it could cause object conflicts.
DROP TABLE #MYTEMPTABLE;
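As a defensive habit (just a sketch, not something the example above requires), you can check whether the temp table exists before dropping or re-creating it:

-- Drop the temp table only if it still exists in this session
IF OBJECT_ID('tempdb..#MYTEMPTABLE') IS NOT NULL
    DROP TABLE #MYTEMPTABLE;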

The downside to doing it this way is that the few records I am working with are still stored in a physical database (tempdb), which is always slower than RAM (though not by much if the database is on an SSD).

This code can be made to run faster (sometimes by orders of magnitude) by using a table variable instead of a temp table. 
Here is the same functionality:
DECLARE @MYTEMPTABLE AS TABLE ([object_id] INT,[name] VARCHAR(128),max_length SMALLINT);

INSERT INTO @MYTEMPTABLE SELECT [object_id],[name],max_length FROM sys.all_columns WHERE max_length>2000;

SELECT * FROM sys.columns C INNER JOIN @MYTEMPTABLE T ON C.[object_id]=T.[object_id];

Advantages of table variables over temporary tables:
1. They have well-defined scopes. 
2. They're cleaned up automatically at the end of the stored procedure in which they're defined.
3. They tend to result in fewer recompilations.
4. They result in less locking in stored procedures and transactions.
5. For recordsets smaller than 100,000 rows, they are almost always faster than a temporary table.  In 12 years, I have never run into a real-world instance where a table variable was slower than a temporary table as long as it held fewer than 100,000 rows.  For datasets larger than that, you typically get more benefit from being able to define indexes on the temporary table (see the sketch after this list).
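As a sketch of that last point: on SQL Server 2008, a temporary table can take additional indexes after it is created, while a table variable can only be indexed through constraints declared inline.  The names below are hypothetical:

-- Temporary table: indexes can be added after creation
CREATE TABLE #WorkSet (id INT, val VARCHAR(100));
CREATE NONCLUSTERED INDEX IX_WorkSet_id ON #WorkSet (id);

-- Table variable: indexing is limited to inline PRIMARY KEY / UNIQUE constraints
DECLARE @WorkSet TABLE (id INT PRIMARY KEY, val VARCHAR(100));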

Did this help you?  If so, please leave a comment!

Friday, February 10, 2012

SQL Server Setup and Maintenance Tips

Server Maintenance
- Windows Updates
Run Windows updates manually and frequently.  Turning on automatic updates is not advised, based on the history of Windows updates breaking certain products.  Before installing updates, check the internet for reports of issues they could cause with your applications.
- Virus Scanners
Exclude MDF, NDF, LDF, BAK, TRN, and BCP file extensions, and make sure scheduled scans don't coincide with other scheduled tasks.
- Backups
Perform a full system backup periodically and before and after installations are run.
- Defragmentation
SANs and SSDs are less likely to get fragmented in normal operation. If a drive is used only for database files, it is less likely to need defragmentation.  Drives that hold BAK files will need frequent defragmentation.


SQL Server Setup Best Practices
- Plan the data storage scheme before installing SQL
Disk 1:  OS, page file, SQL Server binaries
Disk 2 or LUN 1: MDF files (Except tempdb)
Disk 3 or LUN 2: LDF files
Disk 4 or LUN 3: tempdb files
Disk 5 or LUN 4: BAK files
- Check for Physical File Fragmentation before creating a database or log file
- Pre-size MDF and LDF files
- Do not set auto-growth of MDF or LDF files to a percentage.  Set it to a value in MB (see the sketch after this list)
Revisit periodically and reset the initial size if necessary
- Plan disk sizes so that you can keep free drive space above 15%
- Turn on Instant File Initialization in SQL
- Turn on Auto Statistics
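A minimal sketch of the pre-sizing and fixed-MB auto-growth settings above, using hypothetical database and file names and sizes:

-- Hypothetical names; pre-size the files and grow in fixed MB steps (not a percentage)
ALTER DATABASE MyCompanyDB
    MODIFY FILE (NAME = MyCompanyDB_Data, SIZE = 50GB, FILEGROWTH = 512MB);
ALTER DATABASE MyCompanyDB
    MODIFY FILE (NAME = MyCompanyDB_Log, SIZE = 10GB, FILEGROWTH = 256MB);
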
SQL Server Maintenance Best Practice for a Typical GP Instance
- Use the full recovery model
- Weekly (or more frequent) full database backups, transported off-site
- Nightly differential database backups
- Periodic log backups throughout the day
Use backup compression (a sample backup script follows this list)
- What should you back up?
master
msdb
DYNAMICS
All company databases
NEVER back up tempdb
Model only needs to be backed up when changes are made to it (such as when upgrading or patching SQL Server)
- When backing up data, the tasks should be run in this order
1. Check DB
2. Reorganize indexes
3. Update Statistics
4. Backup Database
- When backing up logs, the maintenance tasks are not necessary
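Here is a hedged sketch of the backup statements themselves, with hypothetical database names and paths (backup compression requires an edition that supports it):

-- Weekly full backup with compression (hypothetical names and paths)
BACKUP DATABASE MyCompanyDB
    TO DISK = N'D:\Backups\MyCompanyDB_full.bak'
    WITH COMPRESSION, INIT, CHECKSUM;

-- Periodic log backup throughout the day (requires the full recovery model)
BACKUP LOG MyCompanyDB
    TO DISK = N'D:\Backups\MyCompanyDB_log.trn'
    WITH COMPRESSION, CHECKSUM;
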
SQL Server Manual Periodic Maintenance Tasks
(These tasks could be scheduled, but their results should be checked before allowing users back into the system)
- Index Maintenance
Add missing indexes
Remove unused and duplicate indexes
Monitor indexes for excessive fragmentation (see the sketch at the end of this list)
- Msdb Maintenance
Periodically delete backup, restore, job, and maintenance plan history
sp_delete_backuphistory [oldest date]
Use log file viewer for others
- Shrinking
Always rebuild indexes after shrinking.  If you need to avoid growing the file with a rebuild after shrinking, then at least re-organize indexes.  Shrinking leaves indexes nearly 100% fragmented by design.
Never shrink tempdb
Shrinking should be a rare occurrence if the growth settings are tuned properly
- Monitor Drive free space
Keep more than 15% free
- Monitor SQL Server logs and Windows event logs for errors and warnings
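As a sketch of the fragmentation check mentioned in the index maintenance item above, sys.dm_db_index_physical_stats reports fragmentation for every index in the current database (the 30% threshold below is a common rule of thumb, not a hard limit):

-- Report indexes in the current database with heavy fragmentation
SELECT OBJECT_NAME(ips.[object_id]) AS table_name,
       i.[name] AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
INNER JOIN sys.indexes AS i
        ON i.[object_id] = ips.[object_id]
       AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 30
ORDER BY ips.avg_fragmentation_in_percent DESC;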

Did this help you?  If so, please leave a comment!
