Optimizing String Concatenation in VBA

Posted Sat, May 23 2009 15:04 by Nate Oliver

Good day,

This probably should have been my first blog entry, as my code structures often look at using buffers when dealing with dynamic Arrays and Strings. Today we will have a brief look at Strings.

When building a String you want to avoid a large number of concatenations, especially in a potentially large, dynamic loop; which I see people doing all the time.

While one does want to write self-documented, maintainable code, and the intent when concatenating in a loop is apparent, the performance degradation can be significant. As noted in the following Microsoft KB, regarding String Concatenation in VB6/VBA:

http://support.microsoft.com/kb/170964

"the length of time increases proportionally to N-squared. Therefore, 1000 iterations will take about 100 times longer than 100 iterations."


We're looking at compounding performance degradation - not good.

The problem, while addressed in the previous KB, is also addressed, here:

http://msdn.microsoft.com/en-us/library/ms810466.aspx

"Slowdown as a result of frequent allocs and reallocs. This is a very common phenomenon when you use scripting languages. The strings are repeatedly allocated, grown with reallocation, and freed up. Don't do this. Try to allocate large strings, if possible, and use the buffer. An alternative is to minimize concatenation operations."

"If you make extensive use of Automation structures, consider factoring out Automation BSTRs from your mainline code, or at least avoid repeated operations on BSTR. (BSTR concatenation leads to excessive reallocs and alloc/free operations.)"


So, often times it makes a lot more sense, performance-wise, to work with Arrays, whether it's String & Byte Arrays, which directly coerce, or Array->String via Join() - with a buffered, 1D Array.

Another efficient way to build a String is to allocate a number of characters to it, building a buffer before you manipulate it, and stack it with the Mid$() function, e.g.,

 

Public Declare Function QueryPerformanceFrequency _
    Lib "kernel32.dll" ( _
    lpFrequency As Currency) As Long

Public Declare Function QueryPerformanceCounter _
    Lib "kernel32.dll" ( _
    lpPerformanceCount As Currency) As Long

Sub Faster(strVal() As String)
Dim tmpStr As String, i As Long
Let tmpStr = Space$(UBound(strVal) + 1)
For i = LBound(strVal) To UBound(strVal)
    Mid$(tmpStr, i + 1, 1) = strVal(i)
Next
End Sub

Sub Slower(strVal() As String)
Dim tmpStr As String
Let tmpStr = strVal(0) & strVal(1) & strVal(2) & _
    strVal(3) & strVal(4) & strVal(5)
End Sub

Sub foo()
Dim Ctr1 As Currency, Ctr2 As Currency, Freq As Currency
Dim Overhead As Currency
Dim i As Long
Dim strVal(0 To 5) As String
strVal(0) = 0:  strVal(1) = 1:  strVal(2) = 2
strVal(3) = 3:  strVal(4) = 4:  strVal(5) = 5
QueryPerformanceFrequency Freq
QueryPerformanceCounter Ctr1
QueryPerformanceCounter Ctr2
Overhead = Ctr2 - Ctr1 ' determine API overhead
QueryPerformanceCounter Ctr1 ' time loop
For i = 1 To 100000
    Call Faster(strVal)
    'Call Slower(strVal)
Next
QueryPerformanceCounter Ctr2
Debug.Print (Ctr2 - Ctr1 - Overhead) / Freq
End Sub



So, in this example I've set up a high-resoultion timer, and I call foo() 20 times with Faster() uncommented, note the times, then do the same with Slower(). Faster() is slower, right? It looks slower. On my machine it's actually ~6% faster than Slower(), and Slower() is one of the simplest cases of String concatenation you would come across, a small number of concatenations with a small amount of characters - and they both essentially do the same thing.

I wouldn't always avoid concatenation, that's not quite what I'm getting at, e.g., if your Strings are small, the number of concatenations is small, String concatenation might make sense; after all, we're talking RAD and maintainable code.

Here's two factors to consider:

1) How many times are you concatenating on a String?
2) How large are the Strings & characters in question?

If the answer to either one of these questions is "large", you might want consider an alternative approach to working with Strings. If you're concatenating in a large loop, you really should reconsider what you are doing.

If interested, I'd recommend reading the following post by Microsoft's Kevin Williamson on the implications of concatenation as it pertains to BSTRs and VB6/VBA:

http://groups.google.com/group/microsoft.public.vb.com/msg/a4738fcf6faa185a

Ken Getz's optimization techniques also makes for a great read:

http://msdn.microsoft.com/en-us/library/aa188211(office.10).aspx

While this becomes more important in larger applications, like building rather large Strings, e.g., writing to files, it's a good concept to keep in mind, and said concept tends to show up in my code.

Comments

# re: Optimizing String Concatenation in VBA

Tuesday, June 02, 2009 2:02 AM by fzz

You do realize that your example isn't realistic? If not, replace your statements

strVal(0) = 0:  strVal(1) = 1:  strVal(2) = 2

strVal(3) = 3:  strVal(4) = 4:  strVal(5) = 5

with

strVal(0) = "a"

strVal(1) = "bb"

strVal(2) = "ccc"

strVal(3) = "dddd"

strVal(4) = "eeeee"

strVal(5) = "ffffff"

convert your faster and slower routines to functions, and check the respective function results. Your slower routine at least produces the correct result, "abbcccddddeeeeeffffff". Your faster routine produces the incorrect result "abcdef". It's easy to produce wrong answers quickly. Most people don't even need code to do that.

It takes a fair amount of nonobvious overhead code to handle preallocated buffers that don't eat too much memory. The technique is discussed in Kernighan & Pike, 'The Practice of Programming', chapter 4. In brief, you start off with a moderately large buffer, and when you need to grow the buffer, you double it. You need to keep track of the buffer size and how much of the buffer has been used.

It's fastest to allocate the largest possible buffer up front (all free system memory?), but you waste system resources that way.

# re: Optimizing String Concatenation in VBA

Tuesday, June 02, 2009 11:12 AM by Nate Oliver

That might be true if I was suggesting that my example be used in all cases, and it was just an example, and I think it could be more realistic than you appear to think.

But you did note the following?

"So, often times it makes a lot more sense, performance-wise, to work with Arrays, whether it's String & Byte Arrays, which directly coerce, or Array->String via Join() - with a buffered, 1D Array."

Your example, while trivial code, requires a different approach, so change faster() to this:

Sub Faster(strVal() As String)

Dim tmpStr As String

Let tmpStr = Join(strVal, vbNullString)

End Sub

This essentially does the same thing as slower(), and is on the order of ~100% faster, on my machine.

I'm not quite following this comment:

"It's fastest to allocate the largest possible buffer up front (all free system memory?), but you waste system resources that way."

I get the part about largest possible buffer, as otherwise there'd be overhead to moniter where you're at in the buffer and adjust it, otherwise. But I'm not sure about all free system memory?

I doubt you're going to chew up that much memory building a buffered String in VBA?

Leave a Comment

(required) 
(required) 
(optional)
(required)