Bulk Insert Options
In practice, you probably need several of the following options when using BULK INSERT; a combined example follows the list:
■ Field Terminator: Specifies the character used to delimit or separate columns in
the source file. The default is a tab (\t); commas and the pipe character (|) are
common in production files.
■ Row Terminator: Specifies the character that ends a row in the source file. "\n"
means end of row and is the typical setting. However, files from mainframes or
other systems sometimes don't use a clean end of line. In these cases, use a hex
editor to view the actual end-of-line characters and specify the row terminator in
hex. For example, a hex value of 0A is coded as follows:
ROWTERMINATOR = '0x0A'
■ FirstRow: Useful when the incoming file has column headers. If it does, use this
option (FIRSTROW = 2) to tell SQL Server that the data actually begins on the
second row of the file.
■ TabLock: Places an exclusive lock on the entire table, which saves SQL Server the
trouble of locking the table's data pages as they are created. This option can
dramatically improve performance, but at the cost of blocking data readers during
the bulk insert. If the bulk insert is part of an ETL into a staging table, there's
no problem; but if it's a bulk insert into a production system with users
potentially selecting data, it might not be a good idea. Multiple bulk-import
streams can potentially block each other. To prevent this, SQL Server provides a
special internal lock, called a bulk-update (BU) lock. Specifying the TABLOCK
option with each bulk-import stream acquires a BU lock, allowing the streams to
load concurrently without blocking one another.
■ Rows per Batch: Tells SQL Server to insert n rows in a single batch, rather than
the entire file. Tweaking the batch size can improve performance; begin with 100
and then experiment to find the best size for the particular set of data. Batching
helps performance because the logging is done less often. Too many rows per batch,
however, can exceed the memory cache and create waits. Often, 2,000 rows is the
best number.
■ Max Errors: Specifies how many rows can fail before the bulk insert fails. Depending
on the business requirement for the data, you may need to set this to zero.
■ ErrorFile: Points to a file that collects any rows not accepted by the BULK INSERT
operation. This is a great idea and should be used with every BULK INSERT command
in production.
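Putting several of these options together, here is a minimal sketch of a production-style
BULK INSERT. The table name, file paths, and option values are hypothetical; adjust
them for your own environment and data:

BULK INSERT Staging.CustomerImport
FROM 'C:\Imports\customers.txt'
WITH (
   FIELDTERMINATOR = '|',     -- pipe-delimited source file
   ROWTERMINATOR = '0x0A',    -- hex line feed for a non-Windows file
   FIRSTROW = 2,              -- skip the column-header row
   TABLOCK,                   -- take a BU lock; allows parallel streams
   ROWS_PER_BATCH = 2000,     -- batch size found by experimentation
   MAXERRORS = 0,             -- any bad row fails the load
   ERRORFILE = 'C:\Imports\customers.err'  -- collects rejected rows
);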
Other options, which you may never need in production, include CHECK_CONSTRAINTS,
CODEPAGE, DATAFILETYPE, FIRE_TRIGGERS, KEEPIDENTITY, KEEPNULLS,
KILOBYTES_PER_BATCH, and ORDER. (The best practice of bulk inserting into a staging
table and then performing the ETL merge into the permanent tables makes these
options less useful.)
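As a sketch of that best practice, the following loads the raw file into a staging
table and then merges it into a permanent table. The table names, the CustomerCode
key, and the column list are hypothetical stand-ins for a real schema:

-- 1. Load the raw file into an empty staging table
BULK INSERT Staging.CustomerImport
FROM 'C:\Imports\customers.txt'
WITH (FIELDTERMINATOR = '|', FIRSTROW = 2, TABLOCK);

-- 2. Merge the staged rows into the permanent table
MERGE dbo.Customer AS target
USING Staging.CustomerImport AS source
   ON target.CustomerCode = source.CustomerCode
WHEN MATCHED THEN
   UPDATE SET target.CustomerName = source.CustomerName
WHEN NOT MATCHED BY TARGET THEN
   INSERT (CustomerCode, CustomerName)
   VALUES (source.CustomerCode, source.CustomerName);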