Text to Columns doesn't suck
Posted
Wed, May 27 2009 16:57
by
Nate Oliver
Picking right up where I left off yesterday, let's talk about some native functionality in Excel, an algorithm, that doesn't suck when it comes to String parsing - Text to Columns. If you recall from yesterday:
http://msmvps.com/blogs/nateoliver/archive/2009/05/26/recommended-free-reading-bruce-mckinney.aspx
One of Bruce McKinney's issues with VB's Split() function is that it can't handle multiple, consecutive delimiters. Text to Columns can, Bruce would be pleased.
It's a little unfair to compare Text to Columns to Split(), as Text to Columns works on Range Objects in a column, while Split() works on single Strings. However, it is my Blog and I'll take such artistic licenses from time-to-time.
Text to Columns is one of the most underrated, overlooked native Excel functionalities, in my opinion. Even just the other day I was involved in a thread where I recommended looking at it, and it's almost as if my post didn't exist?
http://www.mrexcel.com/forum/showthread.php?t=389241
Side-bar rant: This is why points-based forums don't always work. The OP, awarding the points/correct answer, is not 100% guaranteed to choose what is probably the best approach. I don't mean to pick on anyone, here, it's just a recent example of me noting Text to Columns is a good approach to the issue at hand.
So, here's a working example. Take the following:
Foo, Bar
We have two delimiters, a comma and a space. Copy and paste that in A1 through A100000 (or less if you're using Excel pre-2007). Now select Column A, in Excel 2007 select the Data Tab, or earlier, Data menu, click into Text to Columns.
You now have a dialog, let's go with Delimited -> Check Space and Comma as your Delimiters and check 'Treat consecutive delimiters as one'. Now click Finish. Wow, that was fast.
It's also pretty easy to record a Macro to replicate this - should you want automation, e.g.,
Sub foo()
Range("A:A").TextToColumns Destination:=Range("A1"), _
DataType:=xlDelimited, TextQualifier:=xlDoubleQuote, _
ConsecutiveDelimiter:=True, Comma:=True, Space:=True, _
FieldInfo:=Array(Array(1, 1), Array(2, 1))
End Sub
Man, that's fast, let's revisit that. 100,000 parsed Strings, eh? How fast is that? Let's time it, we'll call foo() with the following timer-procedure:
Public Declare Function QueryPerformanceFrequency _
Lib "kernel32.dll" ( _
lpFrequency As Currency) As Long
Public Declare Function QueryPerformanceCounter _
Lib "kernel32.dll" ( _
lpPerformanceCount As Currency) As Long
Sub bar()
Dim Ctr1 As Currency, Ctr2 As Currency, Freq As Currency
Dim Overhead As Currency
QueryPerformanceFrequency Freq
QueryPerformanceCounter Ctr1
QueryPerformanceCounter Ctr2
Overhead = Ctr2 - Ctr1 ' determine API overhead
QueryPerformanceCounter Ctr1 ' time loop
Call foo
QueryPerformanceCounter Ctr2
Debug.Print (Ctr2 - Ctr1 - Overhead) / Freq
End Sub
It runs in roughly 1 second (on my machine) to parse 100,000 Strings (in Range Objects), with multiple, consecutive delimiters. I find that to be really impressive. I'm not sure who wrote this particular algorithm, but dollar earned that day.
There's a brief introduction to Text to Columns, but if you explore the dialog/GUI, you can see there's more options available to us than what I've spoken to, e.g., Fixed-length parsing, the ability to skip parsed columns of return values, marshaling your data to different locations, various formatting options, etc...
Text to Columns won't solve all of life's problems, but if you want to parse a column of data, it certainly is a good place to start.