yesterday I posted a bit of code that was checkign to see if a string was whitespace only.  Let's assume the assumption I made was correct, that checkign the middle of the string then spreading out in each direction is the best way of doing that.

So the code I worte yesterday had some branching "issues".  As I wrote it, I really felt I was missing something obvious... you know that, "this just doesn't feel right" kind of feeling.  To do the branching I used a boolean flag, hasNonWhitespace.  The only alternative I could think of was to use a GoTo statement.  So here's the first cut, and also a second version using GoTo.   Is GoTo evil ?  Can you optimize this code some other way ????

Public Function IsWhitespaceOnly(ByVal stringToCheck As String, ByVal ParamArray whitespaceChars() As Char) As Boolean
Dim length As Int32 = stringToCheck.Length
Dim middle As Int32 = length \ 2
Dim first As Int32 = middle
Dim second As Int32 = middle + 1
Dim chr As Char

If whitespaceChars Is Nothing OrElse whitespaceChars.Length = 0 Then
whitespaceChars = New Char() {" "c, ChrW(&H3000)}
End If

Dim whitespaceUbound As Int32 = whitespaceChars.Length - 1
Dim hasNonWhitespace As Boolean = False

For i As Int32 = 0 To middle

If first >= 0 Then
hasNonWhitespace = True
chr = stringToCheck.Chars(first)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
hasNonWhitespace = False
Exit For
End If
Next
first -= 1
End If

If hasNonWhitespace Then Return False

If second < length Then
hasNonWhitespace = True
chr = stringToCheck.Chars(second)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
hasNonWhitespace = False
Exit For
End If
Next
second += 1
End If

If hasNonWhitespace Then Return False

Next

Return Not hasNonWhitespace

End Function

' using GoTo

Public Function IsWhitespaceOnly(ByVal stringToCheck As String, ByVal ParamArray whitespaceChars() As Char) As Boolean

Dim length As Int32 = stringToCheck.Length
Dim middle As Int32 = length \ 2
Dim first As Int32 = middle
Dim second As Int32 = middle + 1

Dim chr As Char

If whitespaceChars Is Nothing OrElse whitespaceChars.Length = 0 Then
whitespaceChars = New Char() {" "c, ChrW(&H3000)}
End If

Dim whitespaceUbound As Int32 = whitespaceChars.Length - 1

For i As Int32 = 0 To middle

If first >= 0 Then
chr = stringToCheck.Chars(first)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
first -= 1
GoTo secondpart
End If
Next
Return False
End If

secondpart:
If second < length Then

chr = stringToCheck.Chars(second)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
second += 1
GoTo nextpart
End If
Next
Return False
End If

nextpart:
Next

Return True

End Function

Filed under:

#### #re: Can you optimize this code ? (Is GoTo evil ?)

Monday, June 06, 2005 8:35 PM by bill

I don't see GoTo's as evil, so long as you have a damn good reason for using them.

And I've tried. Really tried. Really hard.

I can't come up with anything that runs faster.

I'm still trying tho

#### #re: Can you optimize this code ? (Is GoTo evil ?)

Monday, June 06, 2005 8:50 PM by bill

Hey Geoff,

I think you've hit the nail on the head ... it's not that they are evil, more it's the way that we use them that can be. In complicated code, or large methods I think GoTo should be avoided. With complicated code they can actually make it hard to restructure, and can easily lend themselves to spaghetti style code. In large methods, they tend to be used instead of breaking the method out into smaller methods. But in this kind of code where the method is pretty simple, unlikely to need major changes so I am thinking that maybe goto is not so bad in this case. What I found interesting though was my instinct was to avoid using GoTo, rather use a flag, or look at even using While True blocks etc.

BTW: I'm glad you haven't been able to optimize it Makes me feel happier about that choice.

#### #re: Can you optimize this code ? (Is GoTo evil ?)

Tuesday, June 07, 2005 11:15 AM by bill

I'm going to explain my thoughts first, then show the code since I wrote it without really knowing VB.NET and without a compiler:

To me the first problem with your function is the duplication of code where you check if the first character is whitespace and where you check if the second character is whitespace. If you pull that out into its own function that simplies the code and removes the need for the goto:

Public Function IsWhitespaceCharacter(ByVal charToCheck As Char, ByVal ParamArray whitespaceChars() As Char) As Boolean
Dim whitespaceUbound As Int32 = whitespaceChars.Length - 1
For j As Int32 = 0 To whitespaceUbound
If charToCheck = whitespaceChars(j) Then
Return True
End If
Next
Return False
End Function

Public Function IsWhitespaceOnly(ByVal stringToCheck As String, ByVal ParamArray whitespaceChars() As Char) As Boolean
Dim length As Int32 = stringToCheck.Length
Dim middle As Int32 = length \ 2
Dim first As Int32 = middle
Dim second As Int32 = middle + 1

If whitespaceChars Is Nothing OrElse whitespaceChars.Length = 0 Then
whitespaceChars = New Char() {" "c, ChrW(&H3000)}
End If

For i As Int32 = 0 To middle

If first >= 0 Then
If Not IsWhitespaceCharacter(stringToCheck.Chars(first), whitespaceChars) Return False;
first -= 1
End If

If second < length Then
If Not IsWhitespaceCharacter(stringToCheck.Chars(second), whitespaceChars) Return False;
second += 1
End If

Next

Return True ' I believe a empty string is considered whitespace only, correct as necessary.

End Function

#### #Re: Can you optimize this code ? (Is GoTo evil ?)

Tuesday, June 07, 2005 12:28 PM by bill

Hey Matthew,

The code you posted looks pretty good to me (I haven't tested it). The problems I see with it though are increased call stack usage, and the ubbound for the character array loop is recalculated each time the IsWhitespaceCharacter function is called. So my guess is it would result in a performance penalty.

#### #re: Can you optimize this code ? (Is GoTo evil ?)

Tuesday, June 07, 2005 12:44 PM by bill

The ubound recalculated bothered me too -- I almost passed it in as an extra parameter. But I thought their might be some use for the IsWhitespaceCharacter as a standalone function, so I didn't. If you knew it was internal only, then I'd add it as a parameter at the small cost of increased coupling between the two.

The call stack usage doesn't bother me too much -- mainly because I think there is a decent chance it would get inlined. I'd love if someone who has a VB.NET compiler could do a quick perf analysis of the two.

#### #Re: Can you optimize this code ? (Is GoTo evil ?)

Tuesday, June 07, 2005 1:19 PM by bill

I can't right now, but I'll run some tests later tonight and post back here. It sounds like Geoff may have run some tests... I actually haven't run any yet.

BTW: I actually think my base assumption of starting searching in the middle is probably not a good assumption. I tend to find strings generally have leading whitespace rather than trailing whitespace, but it would depend on usage. Take for exampel reading from a text file line by line, if the lines actually have the carriage returns and or line feeds on the and of them, and they are considered whitespace, the optimal may be searching from "near" the end but not at the end.

#### #re: Can you optimize this code ? (Is GoTo evil ?)

Tuesday, June 07, 2005 3:45 PM by bill

Bill, no, i think your right in starting from the middle.

Remember, the way the mothod is structured is to find one single non-whitespace char ASAP, and bail out immediately if it does. If whitespace mostly appears on the left or right, it doesn't matter. most of the time (and we can only talk probabilities here) a non-whitespace char is highly likely to be somewhere near the middle of the string, or not too far away from there.

As for tests, yeah, I have been. I'll run through Matthew code when I get a chance.

#### #Re: Can you optimize this code ? (Is GoTo evil ?)

Tuesday, June 07, 2005 4:11 PM by bill

Hey Geoff,

I think you're wrong about me being right. It's probably not a bad assumption. I think when I look at strings, there's a reasonable probability they will start with whitespace. Chances of them having more than one consequtive whitespace char in the middle is low, but only if the ratio of test to whitespace is generally large. One example that comes to mind is say parsing an indented XML file. LEt's assume they use "space" or similar ,not tabs. There the likelihood of the ratio of whitespace to non-whitespace on any given line is likely to be > 1:1,( depending on the depth of the object model and the type of xml output used) in many cases. So for those lines, the ideal spot to start would be near the end. Definetly not at the start, not so the middle, and maybe not the end. We could optimise this better is we knew the whitespace char list was optimised accordingly. That is if it checks for cr's and lf' early in the list, then it's only going to be one or two comparisons for the ending chars, rather than the 20 ro so comparisons it can take ot declare a character as not being whitespace. Obviously the relative efficeincy depends a lot on the whitespace shar list though, whether cr and lf are considered to be whitespace, whether the whitespace lsit is large etc. But If I take the two standard lsits, VB's 2 char one, and String's 21 char one, then I think searching from the end in will genrally be the most efficient, as we can reduce the complexity of your code, and with the VB char set, as cr or lf is considered non whitespace, and with the string char set the cr and lf are pretty early on in the list, so the cost of those likely couple of comparisons shoudl be trivial compared to the cost of settign up to say work from the middle out.

Okay, maybe, maybe not... it's a theory anyway

#### #re: Can you optimize this code ? (Is GoTo evil ?)

Tuesday, June 07, 2005 8:21 PM by bill

binary search perhaps?

#### #re: Can you optimize this code ? (Is GoTo evil ?)

Tuesday, June 07, 2005 9:58 PM by bill

Here's an interesting one for you. based on a few sample strings of random gumpf, this one seemd to deal better with longer strings...

Protected Overrides Function IsWhitespaceOnly(ByVal stringToCheck As String, ByVal ParamArray whitespaceChars() As Char) As Boolean
Dim length As Int32 = stringToCheck.Length
Dim middle As Int32 = length \ 2
Dim quarter As Int32 = middle \ 2
Dim first As Int32 = middle
Dim second As Int32 = middle + 1

Dim chr As Char

If whitespaceChars Is Nothing OrElse whitespaceChars.Length = 0 Then
whitespaceChars = New Char() {" "c, ChrW(&H3000)}
End If

Dim whitespaceUbound As Int32 = whitespaceChars.Length - 1

'check first char
chr = stringToCheck.Chars(0)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
GoTo firstpart
End If
Next
Return False
firstpart:
If length = 1 Then Return True

'check last char
chr = stringToCheck.Chars(length - 1)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
GoTo secondpart
End If
Next
Return False
secondpart:
If length = 2 Then Return True

'check middle char
chr = stringToCheck.Chars(middle)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
GoTo thirdpart
End If
Next
Return False
thirdpart:
If length = 3 Then Return True

'check first quarter char
chr = stringToCheck.Chars(quarter)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
GoTo sixthpart
End If
Next
Return False
sixthpart:

'check third quarter char
chr = stringToCheck.Chars(middle + quarter)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
GoTo seventhpart
End If
Next
Return False
seventhpart:

'check first quarter range backwards
For i As Int32 = quarter - 1 To 1 Step -1
chr = stringToCheck.Chars(i)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
GoTo eightpart
End If
Next
Return False
eightpart:
Next

'check last quarter range forwards
For i As Int32 = middle + quarter + 1 To length - 2
chr = stringToCheck.Chars(i)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
GoTo ninepart
End If
Next
Return False
ninepart:
Next

'check second quarter range forwards
For i As Int32 = quarter + 1 To middle - 1
chr = stringToCheck.Chars(i)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
GoTo fourthpart
End If
Next
Return False
fourthpart:
Next

'check third quarter range backwards
For i As Int32 = middle + quarter - 1 To middle + 1 Step -1
chr = stringToCheck.Chars(i)
For j As Int32 = 0 To whitespaceUbound
If chr = whitespaceChars(j) Then
GoTo fifthpart
End If
Next
Return False
fifthpart:
Next

Return True

End Function

#### #String Performance Part 2

Wednesday, June 08, 2005 7:04 AM by TrackBack

I thought I'd talk a little bit more about a post from a few days ago, String Performance Over Different...

#### #Re: Can you optimize this code ? (Is GoTo evil ?)

Wednesday, June 08, 2005 11:48 PM by bill

Geoff .... uhm... I'm speechless

#### #re: Can you optimize this code ? (Is GoTo evil ?)

Thursday, June 09, 2005 7:44 AM by bill

Char.IsWhitespace(myChar) ...

This uses a Select Case statement. Returns true if the character is
ChrW(9)
ChrW(10)
ChrW(11)
ChrW(12)
ChrW(13)
" "

Or if the unicode category is 11,12,13 ..

Probably not the most efficient, but it's in there.

#### #re: Can you optimize this code ? (Is GoTo evil ?)

Thursday, June 09, 2005 10:30 PM by bill

Speechless huh?

So tell me...NOW are gotos evil? *grin* I think all it really proves is that you have to be willing to wear the penalties of anything other than walking the entire string from one end to the other. Statisitical analysis (which is what i was trying to emulate while not really doing it) might allow us to find in advance more _likely_ locations of non-whitespace chars, but it is it really worth the penalty of either slower or harder to maintain code?

#### #re: Can you optimize this code ? (Is GoTo evil ?)

Friday, June 10, 2005 12:48 AM by bill

hey Jacob,

Actually that is probalby very efficient. I think I might switch to that not only becuase of that, but also it adds some consistency. Although one has to ask why Vb's Trim, and String.Trim don't use that ?? I wonder if they did some benchmarks on calling CharacterInfo.IsWhitespace(char) or not ? Then again maybe it's just because String.Trim is designed to do other things. I think that may be it. I haven't tested, but I think the CharacterInfo.Iswhitespace(char) is probalby the better choice... well it's my bet at this moment anyway

#### #re: Can you optimize this code ? (Is GoTo evil ?)

Friday, June 10, 2005 1:03 AM by bill

Hey Geoff,

uhm... speechless.... totally ...

<slap><slap> Ah, no I'm better.... yes, you have convinced me, GoTo is evil. give it an inch and someone will take it for a mile <bg>

i'm going for the easier to maintain code. I know the following is different code, but simple is good, you've convinced me. (well unless you tell me the perf suxs that is)

Public Function IsWhitespace(ByVal stringToCheck As String) As Boolean
IsWhitespace = True
Dim i As Int32

If stringToCheck Is Nothing OrElse stringToCheck.Length = 0 Then
Return True
End If

i = stringToCheck.Length
While IsWhitespace = True And i > 0
i -= 1
IsWhitespace = Char.IsWhiteSpace(stringToCheck.Chars(i))
End While

Return IsWhitespace

End Function

#### #re: Can you optimize this code ? (Is GoTo evil ?)

Friday, June 10, 2005 6:37 AM by bill

With respect to the statistical analysis as to where one should start the inquiry, if you're dealing with a random string, then the odds are highly in your favor that any character you start with will be non-whitespace. Though, no character has any more likelihood than any other.

However, if you have some context about the string and its contents (i.e. it's not random) then some analysis could be performed to increase the likelihood. Though, I'd be willing to bet that the first or last character would be the most fruitful.

#### #re: Can you optimize this code ? (Is GoTo evil ?)

Friday, June 10, 2005 6:45 AM by bill

Hey Bill, I haven't done any testing with this, but in my head this is a little more efficient:

Public Function IsWhitespace(ByVal stringToCheck As String) As Boolean
IsWhitespace = True
Dim i As Int32

If Not stringToCheck is Nothing Then

i = stringToCheck.Length -1
Do

IsWhitespace = Char.IsWhiteSpace(stringToCheck.Chars(i))
i-=1

Loop Until Not IsWhitespace OrElse i < 0

End If

Return IsWhitespace
End Function

#### #Re: Can you optimize this code ? (Is GoTo evil ?)

Friday, June 10, 2005 11:54 AM by bill

Hey Jacob,

On the code, you need to test for a zero length string, or set up your loop such that it isn't entered in that case. Looking at mine, I could drop the explicit test for a zero length string and it should work okay, which would be about the same in code. I would do an early exit rather than nest the hwile loop... I just prefer them.... I doubt there is any performance difference between the two on that count.

The statistical analysis of strings, I think given the predonimance of "code", html and xml, leading whitespace is very common. Given the parsing on a line by line basis is also common, then carriage returns and line feeds can be expected to be very common at the end of a string. Ideally we want to find a non whitespace character as soon as possible, so looking at the ends is actually the worst place to start. My guess is, that two or three characters in from the end is the most likely place, BUT considering we have to test all the characters if we find the string is purely whitespace, then startign from other than the ends adds extra complexity.

anyway, if it's raining on Sunday and I get bored on Sunday, what I'll do is recursively parse all plain text files in( my documents or something like that) and cache all the strings on a line by line basis, and then test some code with them

#### #Re: Can you optimize this code ? (Is GoTo evil ?)

Friday, June 10, 2005 11:58 AM by bill

oh btw: did you notice I used And, not AndAlso. Given that during the loop you do have to check both (only on exit of the loop can you short circuit), And is actually more efficient than the conditional branching AndAlso

#### #re: Can you optimize this code ? (Is GoTo evil ?)

Saturday, June 11, 2005 12:50 AM by bill

Yup, my fault on the zero length string. I saw that it could be taken out of yours, so I left it out of mine.

re: Stats
Well, what you're talking about is the context of that which you are parsing. Parsing code/html/xml makes sense on that front to start in the middle and work your way out to the ends. Now, assume that you have some generic text file, you have no knowledge whatsoever of its contents. You randomly take a line from that file and you randomly take a contiguous sequence of characters from that line. Any character in that sequence is as likely to be (non)whitespace as any other. That's the point I was trying to make.

Oh, and that's an interesting note on the short-circuit. I'll have to keep that in mind. I have reams of code that looks like:

Do Until blnFound OrElse i = Count
..
Loop

What's up with your human proof, btw? It never works the first time for me.

#### #Re: Can you optimize this code ? (Is GoTo evil ?)

Sunday, June 12, 2005 2:44 PM by bill

Hey Jacob,

At least on developer's machines, "text" is more likely to have leading whitespace. As office moves to XML this is going to spread. For use input, leading whitespace is rere (usually only a sinlge space), but trailing whitespace can be common as it is visually not as obvious. Even then the usual I see is people cut and pasting and getitng on extra trailing whitespace.

So i get what you mean about it any character can pobbily be whitespace or not, but I think there is a definetly a probability that the start is less likely than the end for non whitespace. It's like the alphabet.. although any letter could be any legal char, we usually find a higher percentage of a and e, s and t compared to q or z etc. So if you had to test for each different letter, you'd be better to test for say s and t earlier on, than alphabetically.

Re: short circuited versus non, the simple rule I use is if the second expression is not dependant on the first to be legal (eg testing for null first) then bitwise operators can be more performant than branching ones. The toehr factors are the cost of evaluating the second expression. But if you expect that to be evaluated the majority of the time anyway then that cost is the same for both.

As ot the CAPTCHA, yeh don't even try first time. Apparently it has a rediculous short time out. (sorry, but not my doing)