|Date:||January 5, 2007 / year-entry #5|
Norman Diamond seems to have made a side career of harping on this topic on a fairly regular basis, although he never comes out and says that this is what he's complaining about. He just assumes everybody knows. (This usually leads to confusion, as you can see from the follow-ups.)
Back in the ANSI days, terminology was simpler. Windows operated on
The use of the term byte throughout permitted the term character to be used for other purposes, and in 16-bit Windows, the term was repurposed to represent "one or more bytes which together represent one (what I will call) linguistic character." For single-byte character sets, a linguistic character was the same as a byte, but for multi-byte character sets, a linguistic character could be one or two bytes.
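To make the byte versus linguistic character distinction concrete, here is a minimal sketch in portable C. It assumes a Shift-JIS style double-byte character set (lead bytes 0x81 through 0x9F and 0xE0 through 0xFC); the helper names `is_lead_byte` and `count_linguistic_chars` are made up for illustration. On Windows you would more likely ask the system with `IsDBCSLeadByte` rather than hard-code the ranges.

```c
#include <stddef.h>

/* Assumption: Shift-JIS (code page 932) style lead-byte ranges.
   Hypothetical helper; real code would query the active code page. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Count linguistic characters in a NUL-terminated DBCS string.
   Each linguistic character occupies one or two bytes. */
size_t count_linguistic_chars(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p) {
        /* A lead byte followed by a non-NUL trail byte forms one
           two-byte character; anything else is a single byte. */
        if (is_lead_byte(*p) && p[1] != 0)
            p += 2;
        else
            p += 1;
        count++;
    }
    return count;
}
```

For the three-byte string `"A\x83\x41"` (the letter A followed by one double-byte character), the byte count is 3 but the linguistic character count is 2.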
Documentation for functions that operated on linguistic characters said characters, and functions that operated on
With the introduction of Unicode, things got ugly.
All documentation that previously used byte to describe the size of textual data had to be changed to read "the size of the buffer in bytes if calling the ANSI version of the function or in
Unfortunately, most documentation writers (and 99% of software developers, who provide the raw materials for the documentation writers) aren't familiar with the definition of character that was set down back in 1983, and they tend to use the term to mean storage character, which is a term I invented just now to mean "a unit of storage sufficient to hold a single
As a result, my recommendation to you, dear reader, is to enter every page of documentation with a bias towards storage character whenever you see the word character. Only if the function operates on the textual data linguistically should you even consider the possibility that the author actually meant linguistic character. The only functions I can think of off-hand that operate on linguistic characters are
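Here is a rough sketch of how the two readings of character can give different answers for the same Unicode buffer. It assumes 16-bit storage characters (modeled here as `uint16_t`), and it uses "one code point" as a stand-in for a linguistic character, which is a simplification. The function names `storage_chars` and `linguistic_chars` are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 16-bit storage characters before the terminating zero.
   This is the reading to assume by default when documentation says
   "characters". */
size_t storage_chars(const uint16_t *s)
{
    size_t n = 0;
    while (s[n])
        n++;
    return n;
}

/* Number of linguistic characters, under the simplifying assumption
   that a high surrogate followed by a low surrogate counts as one
   character and every other storage character counts as one. */
size_t linguistic_chars(const uint16_t *s)
{
    size_t n = 0;
    while (*s) {
        if (s[0] >= 0xD800 && s[0] <= 0xDBFF &&
            s[1] >= 0xDC00 && s[1] <= 0xDFFF)
            s += 2;   /* surrogate pair: two storage characters */
        else
            s += 1;   /* one storage character */
        n++;
    }
    return n;
}
```

Given the buffer `{ 0x0041, 0xD83D, 0xDE00, 0 }`, `storage_chars` reports 3 and `linguistic_chars` reports 2. A buffer size passed to a typical function would be the former.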