Detecting null, empty or whitespace strings ..

Posted Sun, Jun 5 2005 15:24 by bill

Last week Geoff and I were chatting and Geoff threw this bit of code past me:

If (someString & "").Trim().Length > 0 Then

And so the discussion began. There are lots of interesting quirks in dealing with Strings, and not just from a VB perspective. I think I'll have to consider writing up a lot of information about Strings, but for today I want to focus purely on this particular, almost trivial, piece of code :)

So that piece of code effectively does three things. It checks for a null string, an empty string, and it checks for whitespace.

Checking for a null string.... Usually you check for a null string by just checking if the string Is Nothing, e.g. If someString Is Nothing Then. This particular code does it via String.Concat. The someString & "" equates to a call to String.Concat, and String.Concat will return an empty string, "", if both arguments are Nothing, or the other operand if one operand is Nothing. So in the case where someString is actually Nothing, the String.Concat works well and is a very cheap call. However, in the case where someString is not Nothing, the call has some more expense and actually allocates another string, the cost of which can be high depending on the size of someString.
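A minimal sketch of that behaviour, assuming a plain console project (the module and variable names are only for illustration):

```vb
Imports System

Module ConcatNullDemo
    Sub Main()
        Dim someString As String = Nothing

        ' & on strings compiles to String.Concat, which treats Nothing as an empty string.
        Dim combined As String = someString & ""

        Console.WriteLine(combined Is Nothing)   ' False
        Console.WriteLine(combined.Length)       ' 0

        ' So the original test is safe to run against a null reference.
        Console.WriteLine((someString & "").Trim().Length > 0)   ' False
    End Sub
End Module
```

The point is only that the concatenation removes the need for a separate Is Nothing test; it says nothing about the allocation cost when someString is not Nothing.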

Checking for an empty string.... You can check for an empty string in a variety of ways. You cannot rely on doing a reference comparison, e.g. If someString Is String.Empty Then... Although that may work in some cases, it will fail in the case where someString is not interned. So that leaves us with doing either equality or length tests. So we could just do an If someString.Length = 0 Then (once we know the string is not Nothing, otherwise the Length call throws), or an If someString = "" Then, etc. In VB.NET we can also do an If someString = Nothing Then, which combines checking for a null string and an empty string. This basically equates to a call to String.CompareOrdinal(someString, ""), which is the method you want to be calling. Now in .NET 1.0 and 1.1, VB checks both arguments before calling CompareOrdinal, replacing null strings with String.Empty before making the call. In 2.0, this has been somewhat more optimised and does an early return for the cases where either operand is Nothing or String.Empty. (Note: in the current beta it is minusculely more performant to call If Nothing = someString Then. I have asked the VB team to swap that around so that If someString = Nothing is the more performant form... a minor code change only.)
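As a rough illustration of the combined test, assuming a console project with the default Option Compare Binary (the module and variable names are made up for the example):

```vb
Imports System

Module EmptyCheckDemo
    Sub Main()
        Dim nullString As String = Nothing
        Dim emptyString As String = ""
        Dim blankString As String = " "

        ' VB's = operator on strings treats Nothing and "" as equal.
        Console.WriteLine(nullString = Nothing)    ' True
        Console.WriteLine(emptyString = Nothing)   ' True
        Console.WriteLine(nullString = "")         ' True

        ' A single space is neither null nor empty.
        Console.WriteLine(blankString = Nothing)   ' False

        ' By contrast, nullString.Length would throw a NullReferenceException,
        ' so a Length test needs its own Is Nothing check first.
    End Sub
End Module
```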

Checking for whitespace... This is potentially an expensive exercise, especially as all we want to know is whether the string is whitespace only. The Trim method of String should be avoided in this particular scenario: we do not need to trim from both ends, because trimming from one end is enough to tell whether anything other than whitespace remains. So TrimStart or VB's LTrim method would be the better approach. The question is which one to use, and why?

VB's LTrim method will trim only " " and ChrW(&H3000) (which is referred to as the ideographic space character).

The TrimStart method of a string, on the other hand, takes a ParamArray argument supplying the characters to trim. If omitted, it checks for the 21 characters in String's internal whitespace array:

String.WhitespaceChars = New Char() { ChrW(9), ChrW(10), ChrW(11), ChrW(12), ChrW(13), " "c, ChrW(160), ChrW(8192), ChrW(8193), ChrW(8194), ChrW(8195), ChrW(8196), ChrW(8197), ChrW(8198), ChrW(8199), ChrW(8200), ChrW(8201), ChrW(8202), ChrW(8203), ChrW(12288), ChrW(65279) }

That can be an expensive process, as it literally loops through the characters in the string and compares each one with each of the WhitespaceChars. So for a string that starts with a non-whitespace character, at least 21 comparisons are made before the method can give up.
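The difference between the two character sets is easy to see with a tab, which is in TrimStart's default set but not in LTrim's. A small sketch, with sample strings invented for the example:

```vb
Imports System
Imports Microsoft.VisualBasic

Module TrimComparisonDemo
    Sub Main()
        Dim tabbed As String = vbTab & "text"
        Dim spaced As String = "  " & ChrW(&H3000) & "text"

        ' LTrim only strips " " and ChrW(&H3000), so the leading tab survives.
        Console.WriteLine(LTrim(tabbed).Length)        ' 5
        Console.WriteLine(LTrim(spaced).Length)        ' 4

        ' TrimStart() with no arguments strips the tab as well.
        Console.WriteLine(tabbed.TrimStart().Length)   ' 4
        Console.WriteLine(spaced.TrimStart().Length)   ' 4
    End Sub
End Module
```

So the cheaper LTrim is only a drop-in replacement if tabs, line breaks and the other Unicode spaces do not count as whitespace for your purposes.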

So if you are happy with VB's LTrim, checking for just " " and ChrW(&H3000), then you should use that rather than TrimStart without supplying the characters to check for. Now LTrim also includes some nice VB goodness. When you call LTrim(someString), it actually checks for a null string and String.Empty for you!! "Ah ha", you might be (or should be) saying. And therein lies the answer...

If VB.LTrim(someString).Length > 0 Then

Is probably the most performant way of writing the code, or at least the simplest. However, in the case where someString is large and the first n chars are whitespace, this can still be an expensive operation, as a new, potentially large, string will be allocated. You can't avoid needing to iterate through the characters to check for whitespace, but you could write a method that just returns a Boolean instead of creating the trimmed string. If you were to write such a method, you'd probably want to get smart about detecting a whitespace-only string and, rather than iterate in from each end of the string, iterate from the middle out. As soon as we hit one char that is not whitespace we can return, so we want to optimise our chances of hitting a non-whitespace char early in the process.
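Before going that far, here is how the simple LTrim test behaves across the cases discussed. This is only a sketch: the VB alias is assumed to be Imports VB = Microsoft.VisualBasic, and HasContent is a hypothetical wrapper name, not something from the framework:

```vb
Imports System
Imports VB = Microsoft.VisualBasic

Module LTrimCheckDemo
    ' Hypothetical helper wrapping the one-line test.
    Function HasContent(ByVal someString As String) As Boolean
        ' LTrim returns "" for Nothing, so no separate null check is needed.
        Return VB.LTrim(someString).Length > 0
    End Function

    Sub Main()
        Console.WriteLine(HasContent(Nothing))   ' False
        Console.WriteLine(HasContent(""))        ' False
        Console.WriteLine(HasContent("   "))     ' False
        Console.WriteLine(HasContent("  abc"))   ' True
    End Sub
End Module
```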

So if you really really wanted you could end up writing a utility method such as :

Public Function IsEmptyNullOrWhitespaceOnly(ByVal stringToCheck As String, ByVal ParamArray whitespaceChars() As Char) As Boolean
   If stringToCheck Is Nothing OrElse stringToCheck.Length = 0 Then Return True
   Return IsWhitespaceOnly(stringToCheck, whitespaceChars)
End Function

Public Function IsWhitespaceOnly(ByVal stringToCheck As String, ByVal ParamArray whitespaceChars() As Char) As Boolean
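   ' Guard added so that a direct call with Nothing or "" returns False instead of throwing.
   If stringToCheck Is Nothing OrElse stringToCheck.Length = 0 Then Return False

   Dim length As Int32 = stringToCheck.Length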
   Dim middle As Int32 = length \ 2
   Dim first As Int32 = middle
   Dim second As Int32 = middle + 1
   Dim chr As Char

   If whitespaceChars Is Nothing OrElse whitespaceChars.Length = 0 Then
      whitespaceChars = New Char() {" "c, ChrW(&H3000)}
   End If

   Dim whitespaceUbound As Int32 = whitespaceChars.Length - 1
   Dim hasNonWhitespace As Boolean = False

   For i As Int32 = 0 To middle

      If first >= 0 Then
         hasNonWhitespace = True        
         chr = stringToCheck.Chars(first)
         For j As Int32 = 0 To whitespaceUbound
            If chr = whitespaceChars(j) Then
               hasNonWhitespace = False
               Exit For
            End If
         Next
         first -= 1
      End If

      If hasNonWhitespace Then Return False

      If second < length Then
         hasNonWhitespace = True
         chr = stringToCheck.Chars(second)
         For j As Int32 = 0 To whitespaceUbound
            If chr = whitespaceChars(j) Then
               hasNonWhitespace = False
               Exit For
            End If
         Next
         second += 1
      End If

      If hasNonWhitespace Then Return False

   Next

   Return Not hasNonWhitespace

End Function

However, at this point I think this is overkill, especially for the simple cases. And there are a couple of problems faced in deciding which Char set to use. Unfortunately System.String doesn't publicly expose the Whitespace char array (I think it should), nor does VB expose its whitespace char array, although that one is easy to replace as it's only 2 chars.

In .NET 2.0, the String class also has an IsNullOrEmpty method, so that in fact addresses part of the problem, but I think that was added for those poor C# people who don't have a language utility library, nor does their language support writing If someString = Nothing Then. What would be useful, though, is if they had an IsNullEmptyOrWhitespace method. Really, programmers should NOT have to worry themselves about this kind of stuff; it should just work, each time, predictably, and in the most performant way... That is what "frameworks" are for. Oh well, maybe .NET 3.0 ???
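For reference, a quick sketch of what IsNullOrEmpty does and does not cover (the module name is only illustrative):

```vb
Imports System

Module IsNullOrEmptyDemo
    Sub Main()
        Console.WriteLine(String.IsNullOrEmpty(Nothing))   ' True
        Console.WriteLine(String.IsNullOrEmpty(""))        ' True

        ' Whitespace is not treated as empty, so that part of the test is still up to you.
        Console.WriteLine(String.IsNullOrEmpty("   "))     ' False
    End Sub
End Module
```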

Comments

# String Performance Part 2

Wednesday, June 08, 2005 7:04 AM by TrackBack

I thought I'd talk a little bit more about a post from a few days ago, String Performance Over Different...