Performance Tuning Tips, cont.

2. Distribute I/O!

Our server was configured to separate the SQL Server data files from the log files, which is generally a good
idea, for reasons of both performance and recoverability. All the data files were on one RAID 5 array, and all
of the log files were on another. But we were running out of space on the data drive, so I expanded the
databases, adding files on the array used for the logs. This resulted in another half hour of time saved in the
copy phase. This time I was the one that was surprised. The copy statements were written using the
OPENQUERY function:

     insert myTable select * from openquery(OracleServer, ‘select … from myOracleTable’)

An examination of the query plans for the copy statements revealed the reason, namely that data to be
inserted from such a statement is first put into tempdb as an intermediate step. Thus, the insert statement
was making the data drive (which tempdb was on) do much more work (write to tempdb, read from tempdb,
write to permanent table) than the log drive, which just wrote to the transaction log. Putting some of the data
on the log drive distributed the I/O more evenly. Eventually I took this further, putting tempdb entirely on its
own array (RAID 1, in this case) and later on making use of two additional disk arrays as well.

This idea can be extended to the level of individual tables, and even indexes, by using filegroups (read about
filegroups in Books Online). It is debatable whether or not manually specifying the placement of tables and
indexes on specific drives is worthwhile, compared to simply giving the server lots of drives to work with and
letting it decide on placement. It probably depends on the situation, and how well one actually understands
the data access and usage patterns in the particular system involved. I chose to manually place several of
the largest tables and indexes on specific drives, and was rewarded by significant speedups.

Either way, distributing disk I/O across as many drives as possible adds a transparent form of parallelism,
and helps tremendously to reduce I/O bottlenecks. One should keep this principle in mind when ordering
hardware as well; additional disk drives not only add capacity, they can also add speed. In one instance,
putting the two big tables read by a (45 minute) query on separate drives, and the third table it wrote to on yet
another drive reduced its run time by 12 minutes. Later on, a further 12 minutes on this particular query was
gained by another method:
<Previous      Home      Next>