Text to Columns doesn't suck

Posted Wed, May 27 2009 16:57 by Nate Oliver

Picking right up where I left off yesterday, let's talk about some native functionality in Excel, an algorithm, that doesn't suck when it comes to String parsing - Text to Columns. If you recall from yesterday:

http://msmvps.com/blogs/nateoliver/archive/2009/05/26/recommended-free-reading-bruce-mckinney.aspx

One of Bruce McKinney's issues with VB's Split() function is that it can't handle multiple, consecutive delimiters. Text to Columns can, Bruce would be pleased.

It's a little unfair to compare Text to Columns to Split(), as Text to Columns works on Range Objects in a column, while Split() works on single Strings. However, it is my Blog and I'll take such artistic licenses from time-to-time.

Text to Columns is one of the most underrated, overlooked native Excel functionalities, in my opinion. Even just the other day I was involved in a thread where I recommended looking at it, and it's almost as if my post didn't exist?

http://www.mrexcel.com/forum/showthread.php?t=389241

Side-bar rant: This is why points-based forums don't always work. The OP, awarding the points/correct answer, is not 100% guaranteed to choose what is probably the best approach. I don't mean to pick on anyone, here, it's just a recent example of me noting Text to Columns is a good approach to the issue at hand.

So, here's a working example. Take the following:

Foo, Bar

We have two delimiters, a comma and a space. Copy and paste that in A1 through A100000 (or less if you're using Excel pre-2007). Now select Column A, in Excel 2007 select the Data Tab, or earlier, Data menu, click into Text to Columns.

You now have a dialog, let's go with Delimited -> Check Space and Comma as your Delimiters and check 'Treat consecutive delimiters as one'. Now click Finish. Wow, that was fast.

It's also pretty easy to record a Macro to replicate this - should you want automation, e.g.,


Sub foo()
Range("A:A").TextToColumns Destination:=Range("A1"), _
    DataType:=xlDelimited, TextQualifier:=xlDoubleQuote, _
    ConsecutiveDelimiter:=True, Comma:=True, Space:=True, _
    FieldInfo:=Array(Array(1, 1), Array(2, 1))
End Sub


Man, that's fast, let's revisit that. 100,000 parsed Strings, eh? How fast is that? Let's time it, we'll call foo() with the following timer-procedure:


Public Declare Function QueryPerformanceFrequency _
    Lib "kernel32.dll" ( _
    lpFrequency As Currency) As Long

Public Declare Function QueryPerformanceCounter _
    Lib "kernel32.dll" ( _
    lpPerformanceCount As Currency) As Long

Sub bar()
Dim Ctr1 As Currency, Ctr2 As Currency, Freq As Currency
Dim Overhead As Currency
QueryPerformanceFrequency Freq
QueryPerformanceCounter Ctr1
QueryPerformanceCounter Ctr2
Overhead = Ctr2 - Ctr1 ' determine API overhead
QueryPerformanceCounter Ctr1 ' time loop
Call foo
QueryPerformanceCounter Ctr2
Debug.Print (Ctr2 - Ctr1 - Overhead) / Freq
End Sub


It runs in roughly 1 second (on my machine) to parse 100,000 Strings (in Range Objects), with multiple, consecutive delimiters. I find that to be really impressive. I'm not sure who wrote this particular algorithm, but dollar earned that day.

There's a brief introduction to Text to Columns, but if you explore the dialog/GUI, you can see there's more options available to us than what I've spoken to, e.g., Fixed-length parsing, the ability to skip parsed columns of return values, marshaling your data to different locations, various formatting options, etc...

Text to Columns won't solve all of life's problems, but if you want to parse a column of data, it certainly is a good place to start.

Comments

# re: Text to Columns doesn't suck

Wednesday, May 27, 2009 6:50 PM by Allisun

can I use this to ask a question?

Yeah, so, I use text to columns like everyday, for formatting when I print to excel in dbf... anyhoo, the spacing between a customer name and salesrep # is always different, so when I text to columns that column, many of the numbers are shifted.  Is there a way to get around this?  You'll maybe need me to send you an example of what I mean?

thanks!

# re: Text to Columns doesn't suck

Thursday, May 28, 2009 8:00 AM by Jon Peltier

Text To Columns is great. I only wish it were easier to not use the previous delimiters that you set in the dialog. So many times, I apply Text To Columns, it splits using a delimiter which is not a delimiter. I have to Undo, then Text To Columns a random cell to clear the settings (sometimes it's a pain to reset them on the first go), then reapply.

Probably if all of my data looked the same, this would not be a problem.

And I agree that the point rating systems on those type of forums are the suck.

# re: Text to Columns doesn't suck

Thursday, May 28, 2009 9:25 AM by sam

Hi Nate,

Nice Blog..

I always wonderd what "Treat Consequitve Delimiters as One" does. It doest seem to matter if I keep it ticked or not... I get the same results

Sam

# re: Text to Columns doesn't suck

Monday, June 01, 2009 3:58 PM by Nate Oliver

Hi Sam - Thanks!

It does make a difference, take the example I posted and leave the consecutive option unchecked. Note how you end up with an extra, blank column?

Allisun, feel free to send me the file. But I suspect I know what's happening. Your sales rep numbers have leading zeros?

What I didn't mention here, and I should have, is that TTC also coerces numeric values, housed as Strings, to real numbers.

What you want to do in the dialog, is make sure to set your numeric column to Text, formatting-wise.

# re: Text to Columns doesn't suck

Wednesday, September 14, 2011 5:55 PM by Niraj

ok i stumbled upon this while i was trying to figure out a solution to my problem...

my columns have multiple commas and they look like this

cat, dog, rat, mouse, etc

so i want each of them split up into different columns..

cat | dog | rat | mouse | etc

the text to columns only works if there is 1 comma... i have a spreadhseet with close to 1000 rows of data that i need to split like this.. please help!

Leave a Comment

(required) 
(required) 
(optional)
(required) 
If you can't read this number refresh your screen
Enter the numbers above: