Let's talk sentence-casing in Excel
Posted
Fri, May 22 2009 16:28
by
Nate Oliver
Good day,
So we probably agree that Excel and/or the VBA IDE aren't the ideal locations to do your word processing, however, it's done there, either way. Both Excel and VBA (the latter being based on VB6) provide us with some tools to help, in terms of lower-casing, upper-casing, and proper-casing (every first letter is capitalized). But what about sentence-casing?
There's no solid native way, that I know of, to do this in Excel, so, we're going to roll our own, naturally. I'm going to post two algorithms I have written in the past to look at this, and keep in mind the intent, to deal with multi-sentence capitalization issues, deal with the varying forms of "I", whether it's I'm, it's, etc... What I don't attempt to speak to is proper nouns, that's beyond what my scope was.
Here is my first take on what I call sCase():
Private Declare Function IsCharLower Lib "user32" Alias "IsCharLowerA" ( _
ByVal cChar As Byte) As Long
Private Declare Function CharUpperBuffB Lib "user32" Alias "CharUpperBuffA" ( _
lpsz As Byte, _
ByVal cchLength As Long) As Long
Public Function sCase(ByRef strIn As String) As String
Dim bArr() As Byte, I As Long, i2 As Long
If strIn = vbNullString Then Exit Function
Let bArr = StrConv(strIn, vbFromUnicode)
CharUpperBuffB bArr(0), 1
For I = 1 To UBound(bArr)
Select Case bArr(I)
Case 105
If Not I = UBound(bArr) Then
Select Case bArr(I + 1)
Case 32, 33, 39, 44, 46, 58, 59, 63, 148, 160
If bArr(I - 1) = 32 Then _
CharUpperBuffB bArr(I), 1
End Select
ElseIf bArr(I - 1) = 32 Or bArr(I - 1) = 160 Then _
CharUpperBuffB bArr(I), 1
End If
Case 33, 46, 58, 63
For i2 = I + 1 To UBound(bArr)
If IsCharLower(bArr(i2)) Then
CharUpperBuffB bArr(i2), 1
I = i2: Exit For
End If
If bArr(i2) <> 32 And bArr(i2) <> 33 And bArr(i2) <> 46 _
And bArr(i2) <> 63 And bArr(i2) <> 160 Then
I = i2: Exit For
End If
Next
End Select
Next
sCase = StrConv(bArr, vbUnicode)
End Function
And here is my 2nd:
Public Function sCase(ByRef strIn As String) As String
Dim bArr() As Byte, i As Long, i2 As Long
If strIn = vbNullString Then Exit Function
Let bArr = strIn
Select Case bArr(0)
Case 97 To 122
bArr(0) = bArr(0) - 32
End Select
For i = 2 To UBound(bArr) Step 2
Select Case bArr(i)
Case 105
If Not i = UBound(bArr) - 1 Then
Select Case bArr(i + 2)
Case 32, 33, 39, 44, 46, 58, 59, 63, 148, 160
If bArr(i - 2) = 32 Then _
bArr(i) = bArr(i) - 32
End Select
ElseIf bArr(i - 2) = 32 Then _
bArr(i) = bArr(i) - 32
End If
Case 33, 46, 58, 63
For i2 = i + 2 To UBound(bArr) Step 2
Select Case bArr(i2)
Case 97 To 122
bArr(i2) = bArr(i2) - 32
i = i2: Exit For
End Select
Select Case bArr(i2)
Case 32, 33, 46, 63, 160
Case Else
i = i2: Exit For
End Select
Next
End Select
Next
sCase = bArr
End Function
Why would I have two versions? Good question. The 2nd version is a lot faster, perhaps not to the eye, but %-wise. But it has an issue, it's very insensitive to international languages, I suspect it's safe for most US-based text strings. The first UDF, which requires the API calls (2nd does not), attempts to be a little more sensitive to international languages.
Note that my understanding of both English and various international languages is limited in scope, e.g., I tested the first UDF with limited amounts of French, etc... But these might not be perfect - think of them as a start, or approach.
Right, so how do you call these? Here are examples:
Sub testTime()
Debug.Print sCase(LCase$("À LA CARTE")) 'Something
Debug.Print LCase$("À LA CARTE") 'Something
Debug.Print sCase("À LA CARTE") 'Nothing
End Sub
And:
Sub testTime()
Debug.Print sCase(LCase$("hello? erm, i'M only testing, eh. indeed, " & _
"i am inquisitve."))
Debug.Print sCase(LCase$("how old?! 22 Years."))
Debug.Print sCase(LCase$("how old?! twenty-two Years."))
Debug.Print sCase(LCase$("hmmmm.... wOrking?!?! sam i am. yes-no? " & _
"isn't i'm isn't."))
Debug.Print sCase(LCase$("THE DAY WAS SUNNY AND I WORE A HAT.PETER WAS THERE "))
Debug.Print sCase(LCase$("no WorRies, i'm ONLY testIng! yes-no?"))
Debug.Print sCase(LCase$("mY fRiend & i"))
Debug.Print sCase(LCase$("iiiiiiiiiiiiii"))
Debug.Print sCase(LCase$(" T. toast %T i @"))
Debug.Print sCase(LCase$("re: sentences."))
Debug.Print sCase("hello? thought i'd test this for McDonald. NOTHING.")
End Sub
You can also call these in the Workbook, should they be available there, the way you'd call any normal UDF, e.g.,
=scase(A1)
Note my use of LCase(). I tried to bake in some flexibility, in terms of what your data might look like, e.g., It's generally intended for you to lower-case your String, before calling sCase(), but there might be scenarios where your proper nouns are in good shape, and you don't want to do that.
You might also note that I'm using a linear search against a Byte Array, and that this might appear to be slow. Give it a shot, time-wise, before fully coming to that conclusion. I've seen alternative approaches to this problem with Regular Expressions (RegExp) that can't quite reach the performance levels of these algorithms, nor are they as effective. Just my experience.
If you want to dig into Byte Arrays, I'd recommend starting with Bruce McKinney's thoughts:
http://vb.mvps.org/hardcore/html/whatisunicode.htm
http://brucem.mystarband.net/mckinney2b.htm#12
Some of my work, here, was assisted with these resources.