Why does the Resource Compiler complain about strings longer than 255 characters?

Date: March 19, 2004 / year-entry #107
Tags: history
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20040319-00/?p=40173
Comments: 12
Summary: As we learned in a previous entry, string resources group strings into bundles of 16, each Unicode string in the bundle prefixed by a 16-bit length. Why does the Resource Compiler complain about strings longer than 255 characters? This is another leftover from 16-bit Windows. Back in the Win16 days, string resources were also grouped...
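
The bundle layout described in the summary can be sketched in Python. This is a minimal illustration, not the Resource Compiler's or the loader's actual code: the function names are hypothetical, the length prefix is assumed to count UTF-16 code units with no terminating null, and the bundle-numbering rule (string ID divided by 16, plus 1) is an assumption that this entry does not state.

```python
import struct

STRINGS_PER_BUNDLE = 16  # from the summary: strings are grouped in bundles of 16


def locate_string(string_id):
    """Return (bundle_number, index_in_bundle) for a string ID.

    Assumption: bundles are numbered (ID // 16) + 1; not stated in this entry.
    """
    return (string_id >> 4) + 1, string_id & 0x0F


def parse_bundle(data):
    """Split one raw string bundle into its 16 strings."""
    strings = []
    offset = 0
    for _ in range(STRINGS_PER_BUNDLE):
        # 16-bit little-endian length, assumed to count UTF-16 code units
        (length,) = struct.unpack_from("<H", data, offset)
        offset += 2
        end = offset + 2 * length
        strings.append(data[offset:end].decode("utf-16-le"))
        offset = end
    return strings


def build_bundle(strings):
    """Inverse of parse_bundle; used here only to produce sample data."""
    if len(strings) != STRINGS_PER_BUNDLE:
        raise ValueError("a bundle holds exactly 16 strings")
    out = bytearray()
    for s in strings:
        encoded = s.encode("utf-16-le")
        out += struct.pack("<H", len(encoded) // 2) + encoded
    return bytes(out)


# Sample bundle: one short string, fourteen empty slots, one 300-character string
sample = ["Hello"] + [""] * 14 + ["x" * 300]
parsed = parse_bundle(build_bundle(sample))
print(len(parsed[15]))  # 300
```

Because the prefix is 16 bits wide, this layout has no trouble holding a 300-character string, which is consistent with the summary's point that the 255 limit is a leftover from elsewhere rather than a property of the 32-bit format.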


Comments (12)
  1. Ben Hutchings says:

    Why do Microsoft employees call the system’s single-byte or multi-byte character encoding "ANSI" when it is never (AFAIK) an ANSI standard encoding?

  2. Steve Sheppard [MSFT] says:

    I always call ’em ASCII….

  3. Steve Sheppard [MSFT] says:

    Sorry, I misread your post. I thought you were just referring to single byte.

  4. [commenter name missing] says:

    > Why do Microsoft employees call the system’s

    > single-byte or multi-byte character

    > encoding "ANSI" when it is never (AFAIK) an

    > ANSI standard encoding?

    Because it’s ISO-8859-1, which was created around 1987 by ISO/ANSI.

  5. J. Edward Sanchez says:

    Actually, Windows Latin 1 (a.k.a. Windows-1252, or CP1252) is different from ISO-8859-1; it contains 27 printable glyphs in the 80h-to-9Fh range, where ISO-8859-1 contains nonprintable control codes.

    The characters defined in ISO-8859-1 correspond exactly to the first 256 Unicode code points, while the Windows Latin 1 characters in the 80h-to-9Fh range correspond to Unicode code points scattered all over the place.

  6. [commenter name missing] says:

    Actually we typically call them ANSI because the actual interpretation of CHAR * strings is subject to CP_ACP ("the ANSI code page").

    Why is it called the ANSI code page? I dunno. I feel fortunate to have avoided the whole 16-bit era myself except for a few questions when I interviewed.

    ASCII is a 7-bit character set, and most code assumes that codes in the range 0-127 in an MBCS environment are the ASCII equivalents. This assumption is so widespread that it is probably why we don’t have very good MBCS support for encodings where it does not hold.

  7. Yuri Khan says:

    > Why is it called the ANSI code page?

    Better to ask why *they* are called ANSI code pages: there isn’t one fixed “ANSI” code page. There are lots, and which one applies depends on the locale.

  8. Norman Diamond says:

    Base note:

    > If your 32-bit DLL contains strings longer

    > than 255 characters, then 16-bit programs

    > would be unable to read those strings.

    You mean 255 bytes. Depending on the actual characters, anywhere from 128 to 255 of them might be too many for a 16-bit program (when using these APIs). Microsoft still confuses "character" with "byte" too often. Now wait right there, you’re not getting off that easily.

    3/19/2004 12:45 PM Steve Sheppard [MSFT]:

    > I always call ’em ASCII….

    Only one ANSI code page is ASCII. The other ANSI code pages are not ASCII. I hope Mr. Chen gives you a stern lecture as soon as he finishes giving himself one.

  9. J. Edward Sanchez says:

    The term "ANSI" is commonly used to refer specifically to the Windows Latin 1 code page, which is also known as Windows-1252 and CP1252. It is not to be confused with ISO-8859-1, which, as I mentioned in an earlier post, contains more control codes but fewer printable characters.

    It should also be noted that many Windows code pages are indeed ASCII — or, to be more precise, supersets of ASCII. ASCII is a 7-bit character set, and many Windows code pages (including Latin 1 "ANSI") simply supplement ASCII by adding an eighth bit and up to 128 additional characters.

  10. Norman Diamond says:

    3/21/2004 6:12 PM J. Edward Sanchez:

    > The term "ANSI" is commonly used to refer

    > specifically to the Windows Latin 1 code page

    All through MSDN, the term "ANSI code pages" refers to all ANSI code pages.

    > It should also be noted that many Windows

    > code pages are indeed ASCII — or, to be

    > more precise, supersets of ASCII.

    Yup. Many are. Also, many aren’t.

    Of those that aren’t, many come close. Here’s one example: Among all the one-byte and two-byte characters of ANSI code page 932, 126 of the values are officially the same as ASCII values, and one more value is practically the same (no one minds that it displays as a tilde even though officially it’s an overline). For practical purposes only one of the values below 127, and all of the values in several ranges between 128 and 65535, are wildly different from ASCII.

    Meanwhile, "ANSI" doesn’t mean "ANSI code page 437" or "ANSI code page 850" or whichever you had in mind; "ANSI code pages" still means all ANSI code pages.

  11. Ben Hutchings says:

    Norman: Code pages 437 and 850 are IBM code pages and can be the "OEM code page" on some machines. If I understand correctly, the "OEM code page" is the one that the BIOS uses and that DOS and NT consoles use by default.

    So far as I know ISO 8859-1 has nothing to do with ANSI – it is based on the DEC Multinational Character Set and Roman Czyborra says it was originally standardised by ECMA.

    ASCII was of course an ANSI (or ASA as it was back then) standard, and Windows "ANSI" code pages are based on ASCII, but then so are the OEM code pages, so that doesn’t explain it either.
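
Two claims from the comments above can be checked with a short Python sketch: the difference between Windows-1252 and ISO-8859-1 in the 80h-to-9Fh range (comment 5), and the gap between a byte limit and a character limit in a double-byte code page (comment 8). Python's built-in `cp1252`, `latin-1`, and `cp932` codecs stand in for the Windows code pages here, which is an assumption, since they may not match every Windows version exactly. The helper names are hypothetical, and the 255-byte limit is taken from comment 8.

```python
def is_defined(byte_value, codec):
    """True if the single byte decodes to a character in the given codec."""
    try:
        bytes([byte_value]).decode(codec)
        return True
    except UnicodeDecodeError:
        return False


# Claim 1: Windows-1252 assigns glyphs in 0x80-0x9F; ISO-8859-1 has control codes.
upper_range = range(0x80, 0xA0)
cp1252_defined = [b for b in upper_range if is_defined(b, "cp1252")]
cp1252_undefined = [b for b in upper_range if not is_defined(b, "cp1252")]
print(len(cp1252_defined))  # 27
print([hex(b) for b in cp1252_undefined])

# ISO-8859-1 maps every byte to the Unicode code point with the same value.
latin1_is_identity = all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))

# The same byte lands somewhere else entirely in Windows-1252.
euro = b"\x80".decode("cp1252")  # U+20AC EURO SIGN, far from U+0080

# Claim 2: a 255-byte limit is not a 255-character limit in code page 932.
BYTE_LIMIT = 255  # limit as stated in comment 8


def fits(text, codec="cp932"):
    """True if the encoded text fits within BYTE_LIMIT bytes."""
    return len(text.encode(codec)) <= BYTE_LIMIT


single_byte = "A" * 255        # 255 characters, 255 bytes
double_byte_ok = "あ" * 127    # 127 characters, 254 bytes
double_byte_over = "あ" * 128  # 128 characters, 256 bytes
print(fits(single_byte), fits(double_byte_ok), fits(double_byte_over))
```

In this sketch, 128 double-byte characters already exceed the limit while 255 single-byte characters do not, which matches the "128 to 255" range given in comment 8.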



*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.
