Date: | September 14, 2007 / year-entry #344 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20070914-00/?p=25123 |
Comments: | 10 |
Summary: | It's not enough to choose a code page. You have to choose the right code page. We have a system that reformats and reinstalls a network client computer each time it boots up. The client connects to the server to obtain a loader program, and the loader program then connects to the server to download... |
It's not enough to choose a code page. You have to choose the right code page.
There's really nothing deep going on here. If you're going to display an 8-bit string, you need to use the same code page when generating the string as you will use when displaying it. Keep your eye on the code page.
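A minimal sketch of the failure mode the post describes, using Python's codec names (`cp1252` for the Western "ANSI" code page, `cp437` for the US OEM/console code page) as stand-ins. The mismatch is hypothetical but representative: the bytes are generated under one code page and interpreted under another.

```python
# Generate an 8-bit string using Windows-1252 (the Western "ANSI" code page).
text = "café"
raw = text.encode("cp1252")   # é becomes byte 0xE9

# Display those same bytes under code page 437 (the US OEM/console code page),
# where byte 0xE9 is the Greek letter theta.
garbled = raw.decode("cp437")
print(garbled)                # cafΘ
```

Same bytes, different code page at display time, garbage on the screen. Matching the two code pages (or using Unicode throughout) is the whole fix.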
Comments (10)
Comments are closed. |
7-bit codepage is enough for everyone.
Ick. Is there really such a thing as a 7-bit code page?
I thought the 0-127 char positions of all the code pages were equivalent to what used to be colloquially called ASCII.
sandman: that was his point. As long as you believe English is good enough for everyone…
Depends on how you define 7-bit codepage. Many codepages allow multibyte characters, and an example of a multibyte-character codepage that uses only 7 bits per byte is UTF-7.
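This is easy to verify with Python's built-in `utf-7` codec: every byte of the encoded output stays below 0x80, yet arbitrary text round-trips. The sample string is an arbitrary choice for illustration.

```python
# UTF-7 encodes non-ASCII characters as base64-ish escape sequences,
# so the byte stream never uses the high bit.
text = "naïve ¥"
encoded = text.encode("utf-7")

assert all(byte < 0x80 for byte in encoded)   # 7-bit clean
assert encoded.decode("utf-7") == text        # lossless round-trip
```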
And the first 128 positions of any given codepage do not need to match ASCII per se. The letters usually do, but the rest, not so much. A well-known example is 0x5C, which is the backslash in ASCII. This is different in many codepages, e.g. in JIS (Japanese) it's a yen sign ¥. Which leads to the effect that many non-English versions of Windows use something other than a backslash as the path separator. On a Japanese version of Windows, a path would look like C:¥Windows¥System32. This is still the case under Windows NT; although it probably doesn't need to be the case for Unicode apps, people are used to it and changing it would mean Unicode and non-Unicode apps on the same machine would display the paths differently.
One thing I’ve learned in my years of programming: if you want people to do the right thing, it should be the path of least resistance. The more hoops people have to jump through, the more likely they are to botch it.
Unicode is something that should have been sent back as half-baked, and left to stew for a while until it’s as easy to use as ASCII. It is NOT anywhere near as easy to use. And that’s bad for everyone — programmers, and end users.
Err, in what way is Unicode not as "easy to use" as ASCII? This post, for example, would not even be required if everything had been Unicode.
In all fairness, UTF-16 is just a bad idea. It turns a multi-byte encoding into a multi-word encoding, introducing endianness as an additional complexity. UTF-8 is a much cleaner solution, if only for making all ASCII text also valid UTF-8.
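Both points in that comment can be shown in a few lines of Python (an illustrative sketch, not from the original thread): UTF-16 has two byte orders and needs a BOM to disambiguate them, while UTF-8 leaves ASCII bytes untouched.

```python
# The same character has two different UTF-16 byte sequences,
# depending on endianness.
assert "A".encode("utf-16-le") == b"A\x00"
assert "A".encode("utf-16-be") == b"\x00A"

# The generic "utf-16" codec prepends a byte-order mark so the
# receiver can tell which order was used.
bom = "A".encode("utf-16")[:2]
assert bom in (b"\xff\xfe", b"\xfe\xff")

# UTF-8 has no such problem: pure ASCII text is already valid UTF-8,
# byte for byte.
assert "ASCII text".encode("utf-8") == b"ASCII text"
```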
According to IBM’s codepage list, codepage 367 is 7-bit US-ASCII: <ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00367.pdf>. But I don’t think Windows includes support for it.
John Elliot: in Windows, US-ASCII is codepage number 20127
That’s why I hate when a computer (a program, OS, etc.) tries to talk to me in any language but English (which is not my mother tongue).
Anything can go wrong: the code page may not be supported, or the font may not have the appropriate characters (ever seen ????? ????? instead of text in a critical error message?).
In this particular case, there is a clear rule for client-server comms: "Never return text. Return a numeric error code, and let the client display the text."