The code page on the server is not necessarily the code page on the client

Date: September 14, 2007 / year-entry #344
Tags: code
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20070914-00/?p=25123
Comments: 10

It's not enough to choose a code page. You have to choose the right code page.

We have a system that reformats and reinstalls a network client computer each time it boots up. The client connects to the server to obtain a loader program, and the loader program then connects to the server to download the actual operating system. If anything goes wrong, the server sends an error message to the client, which is displayed on the screen while it's still in character mode. (No Unicode available here.)

Initially, we used FormatMessageA to generate the error message, but somebody told us we should use FormatMessageW followed by WideCharToMultiByte(CP_OEM). I'm not sure whether this is a valid suggestion, because the client hasn't yet installed Unicode support, so it can display only 8-bit text, and using CP_OEM will use the OEM code page on the server, which doesn't necessarily match the OEM code page on the client.

What is the correct way of generating the error message string?

Now, mind you, the argument against using CP_OEM is the same argument against using FormatMessageA! In neither case are you sure that the code page on the server matches the code page on the client. If CP_OEM is wrong, then so too is FormatMessageA (which uses CP_ACP).
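To see why the two approaches fail in the same way, here is a small sketch in Python, whose codecs stand in for Windows code pages. The message text and the particular code pages (1252 as the server's ANSI code page, 850 as its OEM code page, and 850 and 866 as two possible clients) are assumptions picked for illustration, not details of the system described in the question.

```python
# Python codecs stand in for Windows code pages here; the specific
# code pages below are assumptions chosen for illustration.
SERVER_ACP = "cp1252"  # what FormatMessageA would use (CP_ACP)
SERVER_OEM = "cp850"   # what CP_OEM means on the server

MESSAGE = "Größe überschritten"  # hypothetical error text


def shown_on_client(raw: bytes, client_oem: str) -> str:
    """Decode bytes the way a character-mode client screen would."""
    return raw.decode(client_oem, errors="replace")


def survey(client_oem: str) -> dict:
    """Report whether each server-side choice displays correctly."""
    via_acp = shown_on_client(MESSAGE.encode(SERVER_ACP), client_oem)
    via_oem = shown_on_client(MESSAGE.encode(SERVER_OEM), client_oem)
    return {
        "server ACP": via_acp == MESSAGE,
        "server OEM": via_oem == MESSAGE,
    }


if __name__ == "__main__":
    for client in ("cp850", "cp866"):
        print(client, survey(client))
```

Under these assumptions, the server's OEM code page works only for the client that happens to share it, and the server's ANSI code page works for neither client.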

The correct solution is to use FormatMessageW followed by WideCharToMultiByte(x), where x is the OEM code page of the client. You need to get this information from the client to the server somehow so that the server knows what character set the client is going to use for displaying strings.
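A rough sketch of that shape, written in Python rather than Win32 so that it runs anywhere: the hypothetical function encode_for_client takes the code page number the client reported and encodes the Unicode message with it, playing the role of FormatMessageW followed by WideCharToMultiByte(x). How the client reports its code page to the server is left open here, and mapping the number to a Python codec name of the form "cp437" is an assumption of this sketch.

```python
def encode_for_client(message: str, client_oem_codepage: int) -> bytes:
    """Convert a Unicode message to bytes in the client's OEM code page.

    Hypothetical helper. Characters that the code page cannot represent
    are replaced with "?" (Python's errors="replace" behavior).
    Raises LookupError if Python has no codec for that code page number.
    """
    codec = f"cp{client_oem_codepage}"  # e.g. 437 -> "cp437"
    return message.encode(codec, errors="replace")


if __name__ == "__main__":
    # The client said "my OEM code page is 866", so the server encodes
    # with 866, whatever the server's own code pages happen to be.
    payload = encode_for_client("Файл не найден", 866)
    # Decoding with the same code page recovers the original text.
    print(payload.decode("cp866"))
```

The server's own CP_ACP and CP_OEM never enter into it; the only code page that matters is the one the client will use to display the bytes.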

There's really nothing deep going on here. If you're going to display an 8-bit string, you need to use the same code page when generating the string as you will use when displaying it. Keep your eye on the code page.


Comments (10)
  1. 640k says:

    7-bit codepage is enough for everyone.

  2. sandman says:

    Ick. Is there really such a thing as a 7-bit code page?

    I thought the 0-127 char positions of all the code pages were equivalent to what used to be colloquially called ASCII.

  3. Dean Harding says:

    sandman: that was his point. As long as you believe English is good enough for everyone…

  4. SvenGroot says:

    Depends on how you define 7-bit codepage. Many codepages allow multibyte characters, and an example of a multibyte character codepage that uses only 7 bits per byte is utf-7.

    And the first 128 positions of any given codepage do not need to match ASCII per se. The letters usually do, but the rest, not so much. A well-known example is 0x5C, which is the backslash in ASCII. This is different in many codepages, e.g. in JIS (Japanese) it’s a yen sign ¥. Which leads to the effect that many non-English versions of Windows use something other than a \ as the path separator. On a Japanese version of Windows, a path would look like C:¥Windows¥System32. This is still the case under Windows NT; although it probably doesn’t need to be the case for Unicode apps, people are used to it and changing it would mean Unicode and non-Unicode apps on the same machine would display the paths differently.

  5. Nathan says:

    One thing I’ve learned in my years of programming: if you want people to do the right thing, it should be the path of least resistance. The more hoops people have to jump thru, the more likely they are to botch it.

    Unicode is something that should have been sent back as half-baked, and let stew for a while until it’s as easy to use as ascii. It is NOT anywhere near as easy to use. And that’s bad for everyone — programmers, and end users.

  6. Dean Harding says:

    Err, in what way is Unicode not as "easy to use" as ASCII? This post, for example, would not even be required if everything had been Unicode.

  7. Michiel says:

    In all fairness, UTF-16 is just a bad idea. It turns a multi-byte encoding into a multi-word encoding, introducing endianness as an additional complexity. UTF-8 is a much cleaner solution, if only for making all ASCII text also UTF-8.

  8. John Elliott says:

    According to IBM’s codepage list, codepage 367 is 7-bit US-ASCII: <ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00367.pdf>. But I don’t think Windows includes support for it.

  9. KJK::Hyperion says:

    John Elliott: in Windows, US-ASCII is codepage number 20127

  10. Dmitry says:

    That’s why I hate when a computer (a program, OS, etc.) tries to talk to me in any language but English (which is not my mother tongue).

    Anything can go wrong: code page not supported, font does not have appropriate characters (ever seen ????? ????? instead of text in a critical error message?).

    In this particular case, there is a clear rule for client-server comms: "Never return a text. Return numeric error code, and let client display the text"



*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.
