May 2009 - Posts
This seems like pretty straight-forward advice, right? E.g., make sure the light is green before you enter the intersection. Generally, follow the standard arithmetic order of operations, unless you're using Excel, which has its own order of operations (documented in the help file). Structure your words in a manner that follow typical English rules, should you be speaking English, etc...
Unless you're Yoda, somehow he can violate all of this, and it seems to work... But, the force is really strong in that one...
Back to topic, there can be issues with Excel 2007 VBA, if you violate its expected order of operations, in terms of unexpected behavior.
Take the following example, which I pretty much recorded, in Excel 2002 (XP). The goal is to create a red, transparent (i.e., the cells contents are showing) oval around D2:F2:
Sub foo()
Dim shp As Shape
Application.ScreenUpdating = False
With Worksheets(1).Range("D2")
Set shp = .Parent.Shapes.AddShape(msoShapeOval, _
.Left, .Top - .RowHeight / 4, _
.Width * 3, .RowHeight * 1.5)
End With
With shp
With .Fill
.Visible = msoFalse
.Transparency = 0#
End With
With .Line
.Weight = 1.25
.ForeColor.SchemeColor = 10
.BackColor.RGB = RGB(255, 255, 255)
End With
End With
Application.ScreenUpdating = True
Set shp = Nothing
End Sub
This works as expected in Excel 2002, it's basically the syntax the Excel 2002 Macro recorder gives you, in that order. I admit, I tweaked the recorded Macro a little, but the order in which the Property toggles occur in is essentially what the recorder gives you, or me...
Try that in Excel 2007. You find yourself with a red-outlined oval in the expected range, except, it's not transparent; it's filled and you can't see the underlying cell contents - which was a significant part of the point, in this case.
After staring at this thing for a while and not quite following the problem (to put it mildly), I asked for an assist. I got a good one, J-Walk followed up and mentioned that my Transparency Property toggle wasn't necessary and turned out to be the problem. He's right, the following works fine in Excel 2007:
Sub foo()
Dim shp As Shape
Application.ScreenUpdating = False
With Worksheets(1).Range("D2")
Set shp = .Parent.Shapes.AddShape(msoShapeOval, _
.Left, .Top - .RowHeight / 4, _
.Width * 3, .RowHeight * 1.5)
End With
With shp
With .Fill
.Visible = msoFalse
'.Transparency = 0#
End With
With .Line
.Weight = 1.25
.ForeColor.SchemeColor = 10
.BackColor.RGB = RGB(255, 255, 255)
End With
End With
Application.ScreenUpdating = True
Set shp = Nothing
End Sub
Note the commented-out line of code - the Transparency Property. However, this is kind of strange, no? That should be a non-op. After playing with it, it appears to be an order of operations issue.
Why wouldn't Fill be able to keep in mind that it has several Properties, like Transparency and Visible, and remember them, irrespective of which order they were set to? The answer to that question? It should - but it doesn't. While the Transparency Property isn't really necessary, here, it can have the non-effect we want it to, if we change the order of operations, i.e., change:
With .Fill
.Visible = msoFalse
.Transparency = 0#
End With
To:
With .Fill
.Transparency = 0#
.Visible = msoFalse
End With
So even though toggling the Transparency Property might not make a lot of sense, here, it turns out to only be an issue if used later vs. sooner - it appears to cancel out the the Visible Property when toggled after. With that in mind, the following works as expected in Excel 2007:
Sub foo()
Dim shp As Shape
Application.ScreenUpdating = False
With Worksheets(1).Range("D2")
Set shp = .Parent.Shapes.AddShape(msoShapeOval, _
.Left, .Top - .RowHeight / 4, _
.Width * 3, .RowHeight * 1.5)
End With
With shp
With .Fill
.Transparency = 0#
.Visible = msoFalse
End With
With .Line
.Weight = 1.25
.ForeColor.SchemeColor = 10
.BackColor.RGB = RGB(255, 255, 255)
End With
End With
Application.ScreenUpdating = True
Set shp = Nothing
End Sub
Which does include the Transparency Property toggle. In my opinion, this is a bug, the order of these particular operations probably shouldn't matter, never have, but they do.
Either way, that's not really the point of the examples. The point is when you review code that doesn't work as expected between one version of Excel to the next, there's two things to consider:
1) Test, test, test
2) Consider the order of operations you're proceeding with
Well, there's quite a few more things to consider than this, but these are two, amongst many.
This might surprise some, but to be honest, I really don't mind people contacting me via email, via this blog. The volume of email I'm likely to receive from this blog pales in comparison to the outrageous amount of inbox activity that Facebook generates... And if you want to say hello, well, I like friendly people.
However, it's not recommended that you send me random Excel questions in an offline format such as this for a few reasons:
1) That's not really the intent of this blog, nor my existence.
2) I'm not always very good about responding to these, for a variety of reasons, e.g., I sleep, sometimes. Maybe I don't know the answer? Well, that's not likely. ;)
What is recommended is that you find an Excel Q&A forum, preferably, for your sake, a free one, and post your question there. I personally recommend the following Excel Q&A forum:
http://www.mrexcel.com/forum/index.php
This particular Q&A forum is free and anonymous, should you choose a random handle. It is specifically designed to serve the very purpose of why you would be contacting me, in this case. It is the point of its existence.
There's a few benefits to approaching your quandary in this manner. These forums function 24/7, there are always experts online and you're likely to get a much, much (with emphasis on much) faster response than hitting my inbox. Who knows what I'm preoccupied with, going Byte Array to Byte Array?
This isn't to say there aren't other high-quality, free Excel Q&A forums, there are. This is just a recommendation.
However, do feel free to say hello, if so inclined.
There was a recent blog entry that listed the top-10 online Excel resources:
http://www.odesk.com/blog/2009/05/excel/
I don't necessarily agree with, nor endorse, this list or the order, etc... But I would like to take a moment to speak to several items that are on the list, and some that aren't on the list.
When you make a list like this, given the vast amount of high-quality Excel resources on the Internet, you're going to get it a little right, and probably a little wrong, as you can see by this thread:
http://www.mrexcel.com/forum/showthread.php?t=388373
Such a list doesn't go without controversy. In this particular blog entry, I'll attempt to avoid dogging anything on said list, and look to augment its contents - it's Friday - I'm in a good mood.
Let's start with MrExcel.com, which is owned and maintained by fellow Microsoft Excel MVP, Bill Jelen. I need to disclaim this a little bit, I've been active there since 2000ish, and am an Administrator for their forums - I'm what sports fans would call a "homer". However, there are some interesting aspects to this site that I would like to highlight.
The first are the 500+, free, Excel articles:
http://www.mrexcel.com/articles.shtml
If you're into podcasts, Bill creates these, too:
http://www.mrexcel.com/podcast.shtml
The third are the web-based forums:
http://www.mrexcel.com/forum/index.php
These are great forums, some of the best Excel-minds on Earth stop by to take part in Q&A, for free. One particular sub-forum, within the forums, I'd like to highlight is the "Hall of Fame" forum:
http://www.mrexcel.com/forum/forumdisplay.php?f=17
As you can see, it's fairly lean: There's only 12 subjects taken from ~2,000,000 forum posts. We take this area of the forums fairly seriously, only really dynamite posts need apply for this sub-forum.
Experts-Exchange is a great site, I participate there from time-to-time, and there are a lot experts there, as implied. This is a little misleading, however:
"Suck it up and register for the site, answer a few questions for people that are more of a novice than you, and save yourself the monthly fee."
It's not just a matter of providing the answer, the Original Poster of said question needs to consider your answer the best, and there could be many. And, of course, that's out of your hands.
Office Online and MSDN can be great resources - if you want to learn how to use Excel from a novice to expert level, these are sites you should be reading. I find the following quote to be interesting:
"Yes, that’s right…Microsoft actually provides some things for free."
Microsoft provides a ton of content, for free, and they actually continually look to improve both their content and delivery mechanisms - it's a serious operation. How do I know this? I volunteered to assist with the Excel Content Partnership Council, which is, for the most part (myself not included), an internal group at Microsoft that's committed to driving great Excel content.
As a Microsoft [Excel] MVP, I have a non-disclosure agreement with Microsoft and can't disclose the details of my involvement - in a nutshell, they put me under flood-lights and beat me up on their content. Just kidding, but Microsoft really does care about their content and is committed to helping their customers with their products. I was told my involvement was helpful, so if you notice their content improving with time - you're welcome. ;)
Jon Peltier's site is mind-boggling excellent when it comes to Excel Charts. I recall spending a few hours creating a fairly complex chart, and when done, I thought "I bet Jon Peltier would be impressed". Then I went to his site and noted he had already fully documented what I had done. I had bet incorrectly - whoops.
Also, the following quote is technically incorrect:
"Currently, there are only 90 people ever granted the title of Excel MVP by Microsoft, so you can see how important of a designation it is."
Currently there are just over 100 Microsoft Excel MVPs, spread throughout the world. The word 'ever' isn't quite right, either, some of us are newer, and there have been several retired Excel MVPs. Still, a small number - I'm currently the only Microsoft Excel MVP residing in the state of Minnesota.
The Excel Product Team blog is a great resource, and something to keep your eye on if you're really interested in Excel. For a fairly obvious reason, you're reading content directly from the people who make Excel, well, Excel.
Good. Now let's talk about some resources that aren't included on the aforementioned list, and some examples of why I like them.
Chip Pearson's site:
http://www.cpearson.com/Excel/MainPage.aspx
Chip, a fellow Excel MVP, really knows his stuff, and his content is outstanding. E.g., if someone asks me about running a scheduled/timed procedure, I send them directly to Chip's site (as a start), to review the OnTime Method:
http://www.cpearson.com/excel/OnTime.aspx
It's such a well written article on the subject, it doesn't make sense for me to reinvent that particular wheel.
Mr. Spreadsheet, John Walkenbach's (J-Walk), site:
http://spreadsheetpage.com/
John's a fellow Excel MVP, and again, the content is outstanding and the variety ranges, quite a bit.
I find Debra Dalgleish's (Excel MVP) site to be excellent as well:
http://contextures.com/index.html
Especially when it comes to Pivot Tables:
http://contextures.com/xlfaqPivot.html
The last resource I will speak to are the Newsgroups, via Google. Some of the content that is available there, and highly searchable, is outstanding. E.g., take Laurent Longre's post on the Evaluate Method:
http://groups.google.com/group/microsoft.public.excel.programming/msg/a6b26610d3b8abc3
Now, at this point in life, for me, this is old-hat. But when I first read that, I thought "whoa". We'll be revisiting the Evaluate Method, by the way. In any event, there's a ton of great content in the Newsgroups.
I've certainly left off a number of other great resources, e.g., UtterAccess.com, XtremeVBTalk.com, Daily Dose of Excel, etc... It's a little difficult to speak to every single great, free resource out there. But, there are great Excel resources out there on the WWW, so if you're interested in learning about Excel, get out there and explore!
Today we'll take a different direction: How well does the person you're talking to actually know Excel?
To be honest, I've never put together my own scale - I simply talk to people and come up with some sort of subjective opinion. But others have, including Aaron Blood, as noted here:
http://www.xl-logic.com/modules.php?name=Content&pa=showpage&pid=1
I really like his list. Should you be in HR and interviewing, or otherwise, the following is absolutely true:
"I do find it somewhat amusing that most intermediate level users are VERY over-confident in their Excel ability. My experience has been that if I ask an intermediate user to rank their Excel skill level on a scale of 1 to 10 the response is typically between 8 and 10 and they often list themselves as an Excel Expert on their resumes."
The reason I know this is true? Allow me to make an example of myself: I actually would have listed myself as an expert in 1998 - and, today, I know about 1/0 what I knew about Excel back then. I hate to explain my own jokes, but that's infinity. Rise over run sort of thing...
As always, I don't fully agree with everything that's being presented to us, e.g.,
"Expert Functions:
A true expert knows the strengths & weaknesses of every single one and has probably had to build some new ones that were not available."
I might be taking this slightly out of context, in terms of specialized functions... But...
That's not exactly true, nor necessary, in my opinion, to be an expert. Excel houses a lot of native functions - it's really mature software. To know every single one of them, inside and out, would actually be kind of insane. E.g., if you don't have a degree in Finance, and aren't working on a Finance-type projects, but still want to work with Excel at a sophisticated level, there really might not be a need for you understand IRR(), or XIRR(), or MIRR().
And I guarantee, as a Finance Major, studying this stuff by hand, you don't want to rebuild IRR(); an iterative, hit-or-miss algorithm.
But, overall the user-scale is pretty good, perhaps really good. In fact, if you are in HR (or serving in that capacity), you're reading this, and you're posting for job applicants with Excel experience; please, do not say anything along the lines of the following:
"Excel Expert: Must be an expert with VLOOKUPS".
That's embarrassing for everyone involved. If I were taking part in said interview, I'd enter it embarrassed. And then it might be your turn.
Would you like said applicant to provide the explanation of the possibilities regarding the sequential or binary search implications associated with the algorithm? If you want an explanation to the column-offset... that's fair enough. But an important note: it's not VLOOKUPS.
In any event, perhaps that helps show where the bar potentially lies.
Picking right up where I left off yesterday, let's talk about some native functionality in Excel, an algorithm, that doesn't suck when it comes to String parsing - Text to Columns. If you recall from yesterday:
http://msmvps.com/blogs/nateoliver/archive/2009/05/26/recommended-free-reading-bruce-mckinney.aspx
One of Bruce McKinney's issues with VB's Split() function is that it can't handle multiple, consecutive delimiters. Text to Columns can, Bruce would be pleased.
It's a little unfair to compare Text to Columns to Split(), as Text to Columns works on Range Objects in a column, while Split() works on single Strings. However, it is my Blog and I'll take such artistic licenses from time-to-time.
Text to Columns is one of the most underrated, overlooked native Excel functionalities, in my opinion. Even just the other day I was involved in a thread where I recommended looking at it, and it's almost as if my post didn't exist?
http://www.mrexcel.com/forum/showthread.php?t=389241
Side-bar rant: This is why points-based forums don't always work. The OP, awarding the points/correct answer, is not 100% guaranteed to choose what is probably the best approach. I don't mean to pick on anyone, here, it's just a recent example of me noting Text to Columns is a good approach to the issue at hand.
So, here's a working example. Take the following:
Foo, Bar
We have two delimiters, a comma and a space. Copy and paste that in A1 through A100000 (or less if you're using Excel pre-2007). Now select Column A, in Excel 2007 select the Data Tab, or earlier, Data menu, click into Text to Columns.
You now have a dialog, let's go with Delimited -> Check Space and Comma as your Delimiters and check 'Treat consecutive delimiters as one'. Now click Finish. Wow, that was fast.
It's also pretty easy to record a Macro to replicate this - should you want automation, e.g.,
Sub foo()
Range("A:A").TextToColumns Destination:=Range("A1"), _
DataType:=xlDelimited, TextQualifier:=xlDoubleQuote, _
ConsecutiveDelimiter:=True, Comma:=True, Space:=True, _
FieldInfo:=Array(Array(1, 1), Array(2, 1))
End Sub
Man, that's fast, let's revisit that. 100,000 parsed Strings, eh? How fast is that? Let's time it, we'll call foo() with the following timer-procedure:
Public Declare Function QueryPerformanceFrequency _
Lib "kernel32.dll" ( _
lpFrequency As Currency) As Long
Public Declare Function QueryPerformanceCounter _
Lib "kernel32.dll" ( _
lpPerformanceCount As Currency) As Long
Sub bar()
Dim Ctr1 As Currency, Ctr2 As Currency, Freq As Currency
Dim Overhead As Currency
QueryPerformanceFrequency Freq
QueryPerformanceCounter Ctr1
QueryPerformanceCounter Ctr2
Overhead = Ctr2 - Ctr1 ' determine API overhead
QueryPerformanceCounter Ctr1 ' time loop
Call foo
QueryPerformanceCounter Ctr2
Debug.Print (Ctr2 - Ctr1 - Overhead) / Freq
End Sub
It runs in roughly 1 second (on my machine) to parse 100,000 Strings (in Range Objects), with multiple, consecutive delimiters. I find that to be really impressive. I'm not sure who wrote this particular algorithm, but dollar earned that day.
There's a brief introduction to Text to Columns, but if you explore the dialog/GUI, you can see there's more options available to us than what I've spoken to, e.g., Fixed-length parsing, the ability to skip parsed columns of return values, marshaling your data to different locations, various formatting options, etc...
Text to Columns won't solve all of life's problems, but if you want to parse a column of data, it certainly is a good place to start.
First off, who is Bruce McKinney? In his own words:
http://brucem.mystarband.net/author.htm
"Bruce McKinney was a factory assembly worker, a logger, a newspaper reporter, a PR flunkie, and an unpublished fiction writer before he discovered computers and lost all interest in the real world. During his years at Microsoft, he wrote programming manuals and online help for assembler, Basic, and C. Bruce also served for a time as czar of sample programs, writing or revising many of the samples known to longtime Microsoft language customers. He was a software developer on Microsoft C 7 and FORTRAN Power Station 1. He left Microsoft in 1997 and is now trying to decide what to do when he grows up."
So, we have a very insightful power user, from Microsoft itself. The main body of work in question is 'Hardcore Visual Basic', which is available online, for free, here:
http://vb.mvps.org/hardcore/
And Bruce provides some updates to this work, here:
http://brucem.mystarband.net/mckinney1.htm
While it's true that VB6 and the language as we know it, on that platform, is at the end of its life-cycle, it still lives on in a certain sense, in VBA. So, if you're interested in VBA outside of Office-specific Object Models, this body of work is still relevant and insightful.
And, even better, it makes for good reading. IMO, Bruce has a great sense of humor and can string together fairly clever rants and strongly worded opinions in addition to technical insight. This isn't to say I agree with everything Bruce has to say, e.g., Bruce's take on the Let Statement:
http://vb.mvps.org/hardcore/html/let.htm
"If you think about it, Let a = 1 is technically accurate. It also lets compilers optimize more efficiently. In other words, this pedantic nonsense makes perfect sense. Fortunately, Microsoft didn’t buy this argument, and neither should you."
He's correct, Let is optional. However, I lke this pedantic nonsense, and tend to use it where applicable in my VBA. It doesn't cost me many keystrokes and it flags intrinsic assignment. So, we agree to disagree.
Bruce also provides us with one of my favorite all-time rants: "Split Sucks":
http://brucem.mystarband.net/mckinney2b.htm#18
"Well, of course you can't do 1 because (unfortunately) users can turn off error trapping. And 2 just doesn't fully express your contempt for those who disregard your instructions. So 3 is the correct answer because it's so deliciously unpredictable.
Enough sarcasm. It's hard to fully express my contempt for the VBScript developer who wrote this crap or for the VB designers who accepted it into their library without checking to see that it meets minimum standards of competence."
I'm not sure if I should find that to be as funny as I do, but I do. Laughing matters aside, Bruce does have a good point. Split() is useful, I use it all the time, but as he's noting, it could, and perhaps should, be more useful and more robust along the lines of consecutive delimeters.
Which leads me directly into my next topic, on String parsing, natively, in Excel... Until then, thanks Bruce!
Good day,
I thought I'd provide a link to an interview that Toby Richards, Ed Blankenship & I took part in, directly preceding the Microsoft MVP Global Summit (in Seattle):
http://www.microsoft.com/presspass/features/2009/mar09/03-02MVP-QA.mspx
The MVP Summit is a tremendous experience - I don't anticipate missing any of them while I'm welcome to attend - which is invite-only to Microsoft MVPs. The experience ranges quite a bit, whether it's meeting with fellow MVPs or Microsoft personnel, fairly technical moments, to really casual gatherings.
I've so far attended two of them, and I believe I generally average about 4 hours of sleep per night.
Happy Memorial Day!
Right - Excel
I'm a fly by the seat of my pants kind of guy, crunch data now, ask questions later - I put the rapid in RAD. It's quite common for me to open a brand new, unsaved Workbook, and simply start to perform whatever task at hand.
It turns out this is not a very good idea in Excel 2007 if you're creating a Pivot Table and your intent is to save the file in an older file format (pre 2007), that being .xls, which Excel refers to as compatibility mode. There's a lack of compatibility.
The first time I did this (and last time), Excel prompted me, at save-time, with the Compatibility Checker, that my Pivot Table won't work. I thought it was joking, turns out it wasn't. You're kind of in no-man's land at this point, you've invested time in creating your Pivot Table, you need compatibility with prior versions, and there's no way to get there.
This is explained, here:
http://blogs.msdn.com/excel/archive/2006/08/30/730796.aspx
"Strategies for sharing PivotTables with other users
As noted above, version 12 PivotTables are not downgraded to version 10
PivotTables and will not be refreshable in previous versions of Excel. If
you wish to share PivotTables with people using a previous version of Excel
AND they have a need to refresh the PivotTables, you will need to ensure
that these PivotTables were created as version 10 PivotTables."
Basically, the inability to save a file with a refreshable Pivot Table that was created in an unsaved Excel Workbook, in Excel 2007, as an .xls file, is understood and by design. Whew - that was a mouthful.
The safe strategy, when it comes to Pivot Table creation, is to always save your Workbook as an .xls file before you even think about creating a Pivot Table. This rule really applies to saving in an .xls file format with Pivot Tables - but personally? I think I'm going to make it a general one, save before you start.
Good day,
Today's thoughts are a tack-on to yesterday's regarding optimizing String concatenation:
http://msmvps.com/blogs/nateoliver/archive/2009/05/23/optimizing-string-concatenation-in-vba.aspx
But with a specific goal in mind: to generate random alpha-numeric Strings of a given length.
There are ways to do this natively in Excel, e.g., from Tushar Mehta's (Microsoft Excel MVP) website:
http://www.tushar-mehta.com/excel/tips/rand_alphanum.html
My entry today will discuss an approach I use in VBA, which is:
Public Function randStr(ByRef strLength As Long) As String
Dim b() As Byte, keyArr() As Byte
Dim i As Long, lngUpper As Long
If Not CBool(strLength) Then Exit Function
Let keyArr = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
Let lngUpper = (UBound(keyArr) + 1) \ 2
ReDim b(1 To strLength * 2)
For i = 1 To UBound(b) Step 2
Let b(i) = keyArr(Int(lngUpper * Rnd + 1) * 2 - 2)
Next
Let randStr = b
End Function
Examples of usage could include:
Sub foo()
Debug.Print randStr(100)
Debug.Print randStr(1)
'etc
End Sub
So, you call the function, randStr() and pass the length of the random String you want back. There's a few benefits to approaching this problem in this manner:
- It's fast - really fast
- It's available in both your VBA project, say a UserForm, or as a Worksheet Function call, as a UDF
- It's really flexible - more so than your Yoga instructor
To these points. First, it took me a little while to come up with this combination. When I first wrote this, I used a Variant Array to house the key, which are the values you want to limit the return Strings to. That was slow, too slow. I could out-pace other approaches, like concatenation or buffered Strings on long return Strings, say 16 characters or so, but I had trouble pacing short String returns. As you can imagine, this didn't sit well with me.
The bottle-neck was in creating the Variant Array, and so it hits me - go Byte Array to Byte Array. Bingo. This is the fastest VB approach that I've seen to this problem thus far, on both short and long return Strings, i.e., this baby's got some chops.
The second point, I hope, is self-evident. The third point is that the function is dynamic based on what you feed into the keyArr. E.g., say you want to remove the upper-cased letters? Do it:
Public Function randStr(ByRef strLength As Long) As String
Dim b() As Byte, keyArr() As Byte
Dim i As Long, lngUpper As Long
If Not CBool(strLength) Then Exit Function
Let keyArr = "0123456789abcdefghijklmnopqrstuvwxyz"
Let lngUpper = (UBound(keyArr) + 1) \ 2
ReDim b(1 To strLength * 2)
For i = 1 To UBound(b) Step 2
Let b(i) = keyArr(Int(lngUpper * Rnd + 1) * 2 - 2)
Next
Let randStr = b
End Function
And again, sample usage as such:
Sub foo()
Debug.Print randStr(100)
Debug.Print randStr(1)
'etc
End Sub
In a nut-shell, the math is doing the work for you, you don't need to mess around with the boundaries of your keyArr (Array), etc... Just feed keyArr whatever you want your possibilities to be, and it will eat it up for you.
Good day,
This probably should have been my first blog entry, as my code structures often look at using buffers when dealing with dynamic Arrays and Strings. Today we will have a brief look at Strings.
When building a String you want to avoid a large number of concatenations, especially in a potentially large, dynamic loop; which I see people doing all the time.
While one does want to write self-documented, maintainable code, and the intent when concatenating in a loop is apparent, the performance degradation can be significant. As noted in the following Microsoft KB, regarding String Concatenation in VB6/VBA:
http://support.microsoft.com/kb/170964
"the length of time increases proportionally to N-squared. Therefore, 1000 iterations will take about 100 times longer than 100 iterations."
We're looking at compounding performance degradation - not good.
The problem, while addressed in the previous KB, is also addressed, here:
http://msdn.microsoft.com/en-us/library/ms810466.aspx
"Slowdown as a result of frequent allocs and reallocs. This is a very common phenomenon when you use scripting languages. The strings are repeatedly allocated, grown with reallocation, and freed up. Don't do this. Try to allocate large strings, if possible, and use the buffer. An alternative is to minimize concatenation operations."
"If you make extensive use of Automation structures, consider factoring out Automation BSTRs from your mainline code, or at least avoid repeated operations on BSTR. (BSTR concatenation leads to excessive reallocs and alloc/free operations.)"
So, often times it makes a lot more sense, performance-wise, to work with Arrays, whether it's String & Byte Arrays, which directly coerce, or Array->String via Join() - with a buffered, 1D Array.
Another efficient way to build a String is to allocate a number of characters to it, building a buffer before you manipulate it, and stack it with the Mid$() function, e.g.,
Public Declare Function QueryPerformanceFrequency _
Lib "kernel32.dll" ( _
lpFrequency As Currency) As Long
Public Declare Function QueryPerformanceCounter _
Lib "kernel32.dll" ( _
lpPerformanceCount As Currency) As Long
Sub Faster(strVal() As String)
Dim tmpStr As String, i As Long
Let tmpStr = Space$(UBound(strVal) + 1)
For i = LBound(strVal) To UBound(strVal)
Mid$(tmpStr, i + 1, 1) = strVal(i)
Next
End Sub
Sub Slower(strVal() As String)
Dim tmpStr As String
Let tmpStr = strVal(0) & strVal(1) & strVal(2) & _
strVal(3) & strVal(4) & strVal(5)
End Sub
Sub foo()
Dim Ctr1 As Currency, Ctr2 As Currency, Freq As Currency
Dim Overhead As Currency
Dim i As Long
Dim strVal(0 To 5) As String
strVal(0) = 0: strVal(1) = 1: strVal(2) = 2
strVal(3) = 3: strVal(4) = 4: strVal(5) = 5
QueryPerformanceFrequency Freq
QueryPerformanceCounter Ctr1
QueryPerformanceCounter Ctr2
Overhead = Ctr2 - Ctr1 ' determine API overhead
QueryPerformanceCounter Ctr1 ' time loop
For i = 1 To 100000
Call Faster(strVal)
'Call Slower(strVal)
Next
QueryPerformanceCounter Ctr2
Debug.Print (Ctr2 - Ctr1 - Overhead) / Freq
End Sub
So, in this example I've set up a high-resoultion timer, and I call foo() 20 times with Faster() uncommented, note the times, then do the same with Slower(). Faster() is slower, right? It looks slower. On my machine it's actually ~6% faster than Slower(), and Slower() is one of the simplest cases of String concatenation you would come across, a small number of concatenations with a small amount of characters - and they both essentially do the same thing.
I wouldn't always avoid concatenation, that's not quite what I'm getting at, e.g., if your Strings are small, the number of concatenations is small, String concatenation might make sense; after all, we're talking RAD and maintainable code.
Here's two factors to consider:
1) How many times are you concatenating on a String?
2) How large are the Strings & characters in question?
If the answer to either one of these questions is "large", you might want consider an alternative approach to working with Strings. If you're concatenating in a large loop, you really should reconsider what you are doing.
If interested, I'd recommend reading the following post by Microsoft's Kevin Williamson on the implications of concatenation as it pertains to BSTRs and VB6/VBA:
http://groups.google.com/group/microsoft.public.vb.com/msg/a4738fcf6faa185a
Ken Getz's optimization techniques also makes for a great read:
http://msdn.microsoft.com/en-us/library/aa188211(office.10).aspx
While this becomes more important in larger applications, like building rather large Strings, e.g., writing to files, it's a good concept to keep in mind, and said concept tends to show up in my code.
Good day,
So we probably agree that Excel and/or the VBA IDE aren't the ideal locations to do your word processing, however, it's done there, either way. Both Excel and VBA (the latter being based on VB6) provide us with some tools to help, in terms of lower-casing, upper-casing, and proper-casing (every first letter is capitalized). But what about sentence-casing?
There's no solid native way, that I know of, to do this in Excel, so, we're going to roll our own, naturally. I'm going to post two algorithms I have written in the past to look at this, and keep in mind the intent, to deal with multi-sentence capitalization issues, deal with the varying forms of "I", whether it's I'm, it's, etc... What I don't attempt to speak to is proper nouns, that's beyond what my scope was.
Here is my first take on what I call sCase():
Private Declare Function IsCharLower Lib "user32" Alias "IsCharLowerA" ( _
ByVal cChar As Byte) As Long
Private Declare Function CharUpperBuffB Lib "user32" Alias "CharUpperBuffA" ( _
lpsz As Byte, _
ByVal cchLength As Long) As Long
Public Function sCase(ByRef strIn As String) As String
Dim bArr() As Byte, I As Long, i2 As Long
If strIn = vbNullString Then Exit Function
Let bArr = StrConv(strIn, vbFromUnicode)
CharUpperBuffB bArr(0), 1
For I = 1 To UBound(bArr)
Select Case bArr(I)
Case 105
If Not I = UBound(bArr) Then
Select Case bArr(I + 1)
Case 32, 33, 39, 44, 46, 58, 59, 63, 148, 160
If bArr(I - 1) = 32 Then _
CharUpperBuffB bArr(I), 1
End Select
ElseIf bArr(I - 1) = 32 Or bArr(I - 1) = 160 Then _
CharUpperBuffB bArr(I), 1
End If
Case 33, 46, 58, 63
For i2 = I + 1 To UBound(bArr)
If IsCharLower(bArr(i2)) Then
CharUpperBuffB bArr(i2), 1
I = i2: Exit For
End If
If bArr(i2) <> 32 And bArr(i2) <> 33 And bArr(i2) <> 46 _
And bArr(i2) <> 63 And bArr(i2) <> 160 Then
I = i2: Exit For
End If
Next
End Select
Next
sCase = StrConv(bArr, vbUnicode)
End Function
And here is my 2nd:
Public Function sCase(ByRef strIn As String) As String
Dim bArr() As Byte, i As Long, i2 As Long
If strIn = vbNullString Then Exit Function
Let bArr = strIn
Select Case bArr(0)
Case 97 To 122
bArr(0) = bArr(0) - 32
End Select
For i = 2 To UBound(bArr) Step 2
Select Case bArr(i)
Case 105
If Not i = UBound(bArr) - 1 Then
Select Case bArr(i + 2)
Case 32, 33, 39, 44, 46, 58, 59, 63, 148, 160
If bArr(i - 2) = 32 Then _
bArr(i) = bArr(i) - 32
End Select
ElseIf bArr(i - 2) = 32 Then _
bArr(i) = bArr(i) - 32
End If
Case 33, 46, 58, 63
For i2 = i + 2 To UBound(bArr) Step 2
Select Case bArr(i2)
Case 97 To 122
bArr(i2) = bArr(i2) - 32
i = i2: Exit For
End Select
Select Case bArr(i2)
Case 32, 33, 46, 63, 160
Case Else
i = i2: Exit For
End Select
Next
End Select
Next
sCase = bArr
End Function
Why would I have two versions? Good question. The 2nd version is a lot faster, perhaps not to the eye, but %-wise. But it has an issue, it's very insensitive to international languages, I suspect it's safe for most US-based text strings. The first UDF, which requires the API calls (2nd does not), attempts to be a little more sensitive to international languages.
Note that my understanding of both English and various international languages is limited in scope, e.g., I tested the first UDF with limited amounts of French, etc... But these might not be perfect - think of them as a start, or approach.
Right, so how do you call these? Here are examples:
Sub testTime()
Debug.Print sCase(LCase$("À LA CARTE")) 'Something
Debug.Print LCase$("À LA CARTE") 'Something
Debug.Print sCase("À LA CARTE") 'Nothing
End Sub
And:
Sub testTime()
Debug.Print sCase(LCase$("hello? erm, i'M only testing, eh. indeed, " & _
"i am inquisitve."))
Debug.Print sCase(LCase$("how old?! 22 Years."))
Debug.Print sCase(LCase$("how old?! twenty-two Years."))
Debug.Print sCase(LCase$("hmmmm.... wOrking?!?! sam i am. yes-no? " & _
"isn't i'm isn't."))
Debug.Print sCase(LCase$("THE DAY WAS SUNNY AND I WORE A HAT.PETER WAS THERE "))
Debug.Print sCase(LCase$("no WorRies, i'm ONLY testIng! yes-no?"))
Debug.Print sCase(LCase$("mY fRiend & i"))
Debug.Print sCase(LCase$("iiiiiiiiiiiiii"))
Debug.Print sCase(LCase$(" T. toast %T i @"))
Debug.Print sCase(LCase$("re: sentences."))
Debug.Print sCase("hello? thought i'd test this for McDonald. NOTHING.")
End Sub
You can also call these in the Workbook, should they be available there, the way you'd call any normal UDF, e.g.,
=scase(A1)
Note my use of LCase(). I tried to bake in some flexibility, in terms of what your data might look like, e.g., It's generally intended for you to lower-case your String, before calling sCase(), but there might be scenarios where your proper nouns are in good shape, and you don't want to do that.
You might also note that I'm using a linear search against a Byte Array, and that this might appear to be slow. Give it a shot, time-wise, before fully coming to that conclusion. I've seen alternative approaches to this problem with Regular Expressions (RegExp) that can't quite reach the performance levels of these algorithms, nor are they as effective. Just my experience.
If you want to dig into Byte Arrays, I'd recommend starting with Bruce McKinney's thoughts:
http://vb.mvps.org/hardcore/html/whatisunicode.htm
http://brucem.mystarband.net/mckinney2b.htm#12
Some of my work, here, was assisted with these resources.
If you're interested in previewing Office 2010, aka 14, see the following blog-post from the Excel team:
http://blogs.msdn.com/excel/archive/2009/05/12/microsoft-office-2010-technical-preview.aspx
It turns out you actually register to be invited, which are limited in number, for said technical preview, here:
www.office2010themovie.com
Welcome to my new blog - a first for me. My name is Nate Oliver, I've been a Microsoft Excel MVP since 2007 and I have things to tell you.
This blog probably isn't for everyone, I pretty much plan on keeping to Excel-related content, otherwise I'll mark said content as "off topic". The content of this blog probably will vary quite a bit, ranging from [perhaps] elementary Excel tasks to what I hope you consider to be obscene or arcane.
Please feel free to comment as you see fit, especially if you have follow up questions or content requests. I should be able to get a variety of content going fairly quickly, as I've been very active in Excel-related communities since 2000ish. Some of my early content will probably be more on the arcane-side, but more is to come.