Transferring data around with ADO.NET 2.0 - SqlBulkCopy
I'm working at a client on a data migration and found some results that were surprising. I expected there to be a difference but not one quite this large...
Synopsis of Problem:
Right now I'm working on a rather complex migration from a bunch of disparate data sources into a consolidated Sql Server database. Phase 1 simply involves getting the data into tables which mimick the source. Phase 2 involves normalizing and moving the data over. Now that the data has been moved over, ongoing weekly data needs to be imported from various Excel sheets into the database.
Approach:
At first DTS was used. It's a Sql Server 2000 install for the time being so SSIS isn't an option. DTS worked ok but there was one really big problem. The end user who will handle the migration is only familiar with Access and doesn't yet know their way around Sql Server's tools. We couldn't just automatically schedule the packages to run b/c there is not a specific interval when the sheets will arrive and they may come from multiple places. DTS would do the job but because of coupling issues, it wasn't a great solution b/c it involved way too much transition work.
So I created a Winforms app that let the user point to each sheet and just hit import. Using an OleDbDataAdapter, I'd connect to the Excel sheet, set AcceptChangesDuringFill to false on the adapter and just call Fill. From there I'd configure and adapter pointing to the Sql Server DB (it only needed an Insert command) and just pass the datatable filled from Excel to the Update method. Easy enough. But, it was SLOOOOOWWWW. There would often be upwards of 20,000 records to import. Moreoever, just about every field was VARCHAR (255) which made the record huge. I am fully aware that the VARCHAR issue needs to be resolved but for the time being, it's a contraint we just have to deal with. Anyway, it could take hours to do the updates. Hours.
In comes SqlBulkCopy. It's easier than the data adapter approach. It's faster. And it's less code.
So the only real challenge if you want to call it that was writing out the column mappings. After that, you just need to declare a SqlBulkCopy object, set the column mappings. give it a connection string, specify a Destination table and just call the WriteToServer method.
Results:
With one exception, the writes happen in under a second. They are so fast that I had to wonder if they actually worked. They did. With respect to the one sheet that has 20,000+ rows, it takes about 7 seconds from start to finish. 7 Seconds! Compared to two+ hours.
Needless to say this was almost the text book use case for SqlBulkCopy. The dataadapter is a wonderful and powerful object, but it's nowhere near as efficient. In the 2.0 framework you can specify the batchUpdateSize property which can minimize round trips, but that only goes so far. I shaved a few minutes off with it but it was still marginal. In cases where you need to simply move data to a Sql Server database, I'd highly encourage starting out with the SqlBulkCopy b/c in short, it just flat out rocks.