Making secure programming hard through bad documentation.

I ran into a little confusion when tracking down a bug in one of my programs today.

Here is a direct quote from the documentation for sscanf_s format fields (accurate at the time of posting; maybe it'll be corrected soon):

"The secure versions (those with the _s suffix) of the scanf family of functions require that a buffer size parameter be passed preceding each parameter of type c, C, s, S or [."

Uh... that should be "following" (or even "after", because people understand short words better), not "preceding". An example would also help make the distinction clear:

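// Read at most 19 characters plus a terminating null from input_string.
// Note that the buffer size is passed after the buffer it describes.
const char *input_string = "hello";
char destination[20];
int fields;
fields = sscanf_s( input_string,
                   "%s",
                   destination,
                   (unsigned)_countof(destination) );
// The size argument is an unsigned, not a size_t, hence the cast.
// fields holds the number of fields converted and assigned (1 here).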

This reminds me - I spend a lot of time impressing on developers the difference between _countof and sizeof.

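The sizeof operator (it's an operator, not a function - "sizeof x" is perfectly valid for an expression, and the parentheses are only required around a type name, as in "sizeof(int)") has the advantage of being straight C. Its disadvantage is that it frequently returns 4 (the usual size of a pointer on 32-bit Windows) when the array you thought you had has decayed to a pointer, or, in Unicode programming, twice the count you're looking for, because each wchar_t occupies two bytes.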

In C++, _countof() is evaluated at compile time through smart use of templates - if you accidentally pass it a pointer instead of an array, you get a compile error (a good thing!), and it always returns the element count that the secure _s functions expect. (Compiled as plain C, _countof falls back to sizeof(array)/sizeof(array[0]), which cannot catch a pointer.)
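A minimal sketch of how such a template can work, assuming C++11. The countof template below is a hypothetical stand-in, not Microsoft's actual _countof definition; it binds only to a real array, so the element count is deduced at compile time. The last function shows the pointer trap that sizeof falls into:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for _countof: the parameter is a reference to an
// array of exactly N elements, so N is deduced at compile time. Passing a
// plain pointer does not match and fails to compile.
template <typename T, std::size_t N>
constexpr std::size_t countof(const T (&)[N]) noexcept
{
    return N;
}

char    narrow[20];
wchar_t wide[20];

// Element counts: what the secure _s functions expect.
static_assert(countof(narrow) == 20, "20 char elements");
static_assert(countof(wide) == 20, "20 wchar_t elements");

// Byte counts: equal to the element count only for one-byte elements.
static_assert(sizeof(narrow) == 20, "20 bytes");
static_assert(sizeof(wide) == 20 * sizeof(wchar_t), "bytes, not elements");

// The trap: an array parameter is really a pointer, so sizeof yields the
// size of a pointer (4 on 32-bit, 8 on 64-bit), not the caller's buffer.
std::size_t size_seen_by_callee(char buf[20])
{
    // return countof(buf);   // would not compile: buf is a char*
    return sizeof(buf);        // sizeof(char *), whatever the caller passed
}
```

Most compilers will also warn about sizeof on an array parameter, which is another hint that it's the wrong tool there.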

The documentation for the secure "_s" string functions could be far clearer on this point, too. Much of it uses phrases like "the count parameter is a count of bytes for char, and a count of characters for wchar", or "Parameters: sizeInBytes, sizeInWords". No - it's always a count of characters, and if you think of it in such unambiguous terms, you will be less confused.
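To see why "count of characters" is the safer mental model, here is a sketch using standard C++'s std::swprintf, whose size argument is a count of wchar_t elements including the terminating null. The write_label function and the countof helper are hypothetical, for illustration only:

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: element count of a real array (pointers won't compile).
template <typename T, std::size_t N>
constexpr std::size_t countof(const T (&)[N]) noexcept { return N; }

// Hypothetical example: format "item-<id>" into a fixed wide buffer.
// Returns swprintf's result: characters written, or a negative value
// if the text plus its null would not fit.
int write_label(int id)
{
    wchar_t label[8];

    // Wrong would be sizeof(label): 16 bytes on Windows (32 on Linux), so
    // swprintf would think it had room for 16 or 32 characters and could
    // write past the end of label.

    // Correct: 8 elements, so at most 7 characters plus the null.
    return std::swprintf(label, countof(label), L"item-%d", id);
}
```

With this, write_label(7) fits and reports 6 characters, while write_label(12345) is rejected with a negative result instead of overflowing.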

IMHO, sizeof should be reserved for the strict case where you need to know how many bytes of memory an object occupies - that is, when you're treating the destination as raw memory, not as characters of any sort (not even one-byte characters). Where _countof is available, prefer it to get the number of elements.
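A short sketch of that division of labour, assuming C++11: memset measures raw memory in bytes, so sizeof fits; wmemcpy measures wide text in wchar_t elements, so an element count fits. PacketHeader, clear_header, copy_greeting and countof are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

template <typename T, std::size_t N>
constexpr std::size_t countof(const T (&)[N]) noexcept { return N; }

// Hypothetical structure that is handled purely as raw memory.
struct PacketHeader
{
    unsigned short type;
    unsigned short length;
};

// Raw memory: memset counts bytes, so sizeof is the right measure.
void clear_header(PacketHeader &h)
{
    std::memset(&h, 0, sizeof h);
}

// Text: wmemcpy counts wchar_t elements, so use an element count.
void copy_greeting(wchar_t (&dest)[16])
{
    static const wchar_t greeting[] = L"hello";
    // countof(greeting) is 6 (five letters plus the null); sizeof(greeting)
    // would be 12 or 24, copying past the end of greeting.
    std::wmemcpy(dest, greeting, countof(greeting));
}
```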

This ties into my earlier post on SAL - _ecount should be used in preference to _bcount in SAL annotations, because you are dealing with the elements of an array, not the bytes at a memory address.
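A sketch of the difference, assuming the SAL 1 macros __out_ecount and __out_bcount from Microsoft's sal.h. Outside Microsoft's compiler the sketch defines them away, since they are annotations only and generate no code; copy_name and zero_bytes are hypothetical functions:

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

#ifdef _MSC_VER
#include <sal.h>
#else
// Assumption: on other compilers, make the annotations expand to nothing.
#define __out_ecount(size)
#define __out_bcount(size)
#endif

// Annotated in elements: dest holds `count` wchar_t values, not `count` bytes.
// Copies at most count - 1 characters from src and always null-terminates.
std::size_t copy_name(__out_ecount(count) wchar_t *dest, std::size_t count,
                      const wchar_t *src)
{
    if (count == 0)
        return 0;
    std::size_t i = 0;
    for (; i + 1 < count && src[i] != L'\0'; ++i)
        dest[i] = src[i];
    dest[i] = L'\0';
    return i;                       // characters copied, excluding the null
}

// Annotated in bytes: this function genuinely sees untyped memory.
void zero_bytes(__out_bcount(byteCount) void *dest, std::size_t byteCount)
{
    unsigned char *p = static_cast<unsigned char *>(dest);
    for (std::size_t i = 0; i < byteCount; ++i)
        p[i] = 0;
}
```

A caller would pass countof(buffer) to copy_name and sizeof(object) to zero_bytes, matching each annotation.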

I know C++ allows you to think and code at the lowest levels, but that is not an invitation to always do so - take advantage of high-level constructs when dealing with high-level concepts ("string" is a significantly higher-level concept than "pointer to byte sequence").
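As one illustration, here is the "%s" read from the first example redone with std::string and std::istringstream, so there is no buffer size to pass at all, in either order. This is a sketch in standard C++; first_word is a hypothetical helper:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Reads the first whitespace-delimited word from input_string, much like
// sscanf's "%s". std::string grows as needed, so nothing can overflow.
std::string first_word(const std::string &input_string)
{
    std::istringstream in(input_string);
    std::string word;
    in >> word;                 // leaves word empty if there is no word
    return word;
}
```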

Published Sat, Jun 3 2006 11:01 by Alun Jones

Comments

# re: Making secure programming hard through bad documentation.

"Uh... that should be "following"(or even "after", because people understand short words better), not "preceding""

That's a small flaw; it should be "prior to" or "before".

e.g. %10c - the 10 is prior to the c, rather than following it.

Saturday, June 03, 2006 10:11 PM by nick

# re: Making secure programming hard through bad documentation.

No, you've become confused - and that's why it's not a small flaw.

Old style "sscanf" works as you describe - if you want to read exactly ten characters into a string, you do this:

char x[11];
sscanf(input_line,"%10c",x);
x[10]='\0'; // Null-terminate.

But what I'm talking about is using sscanf_s to ensure I don't overflow:

char x[11];
sscanf_s(input_line,"%10c",x,(unsigned)_countof(x));
// No need to null-terminate - or is there? Discuss...

Note that the size argument, "_countof(x)", comes after the buffer pointer argument, "x". The documentation says that the size comes first, and doing it that way will crash your program.

Thanks for commenting, though - it's pleasing to think that someone's reading my drivel :-)

Saturday, June 03, 2006 10:34 PM by Alun Jones

# re: Making secure programming hard through bad documentation.

Haha, that's right... the verbiage did confuse me :) So I missed what you were trying to correct.

"Thanks for commenting, though - it's pleasing to think that someone's reading my drivel :-)"

As long as you're writing about programming, security, networking, or crypto you'll have my readership in all likelihood :)

Sunday, June 04, 2006 7:46 PM by nick

# re: Making secure programming hard through bad documentation.

To add to what I said, the documentation seems to be describing the format string parameter, rather than the count parameter - quite confusing as you said!

Sunday, June 04, 2006 7:52 PM by nick

# re: Making secure programming hard through bad documentation.

In an email, one of Microsoft's folks makes the interesting point that "bytes" and "words" are the appropriate terms because of multi-byte character sets, and because Unicode behaves similarly.
It's a fair point - I'd like to see a term that adequately describes an element of a string array without calling each element a character (some characters are made up of several elements), and without calling it a byte or a word, which just adds to the confusion.
When you think of strings, you shouldn't be thinking of byte representations in memory, you should be thinking of string representations.
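A small illustration of elements versus characters in standard C++11, using UTF-8 as the multi-byte encoding and UTF-16 for the Unicode case. Each string below holds one character but two elements:

```cpp
#include <cassert>
#include <string>

// U+00E9 (e with acute accent) encoded as UTF-8: two char elements.
const std::string e_acute_utf8 = "\xC3\xA9";

// U+1F600 encoded as UTF-16: a surrogate pair, two char16_t elements.
const std::u16string emoji_utf16 = u"\U0001F600";
```

In both cases size() reports 2, which is the element count a buffer must hold, even though a reader sees one character.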

Wednesday, June 07, 2006 11:08 AM by Alun Jones
