If the system says that an embedded string could not be converted from Unicode to ANSI, maybe it’s trying to tell you something

Date: July 3, 2007 / year-entry #239
Tags: code
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20070703-00/?p=26173
Comments: 17

It's probably trying to tell you that an embedded string could not be converted from Unicode to ANSI.

One of our programs is throwing the exception "Type could not be marshaled because an embedded string could not be converted from Unicode to ANSI." It happens only if we use the Chinese version of the program. Why are we getting this exception?

I may be going out on a limb here, but I bet it's because an embedded string could not be converted from Unicode to ANSI.

Unicode is big. Bigger than any ANSI code page. No matter what ANSI code page you pick, there will be Unicode characters that cannot be expressed in it. (And no, you can't set your ANSI code page to UTF-8. Michael Kaplan discussed it last October, and before that, last July, and before that, a week and a half previous (still July), and before that, two years ago February. I think Michael might need to change the subtitle of his blog to "Explaining why the ANSI code page can't be UTF-8 since 2005".)
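The claim that no ANSI code page covers all of Unicode is easy to check. The following sketch is not from the original post. It uses Python's built-in codecs as stand-ins for several Windows ANSI code pages, and the particular code pages and test characters are illustrative choices. It shows that an emoji fits in none of them, and that a Hungarian letter fits in the Central European page but not in the Western European one.

```python
# Illustrative sketch: Python's built-in codecs stand in for Windows ANSI code pages.
WINDOWS_ANSI_CODE_PAGES = ["cp1250", "cp1251", "cp1252", "cp932", "cp936", "cp949", "cp950"]


def representable(ch: str, codec: str) -> bool:
    """Return True if ch can be encoded in codec without any substitution."""
    try:
        ch.encode(codec)  # default errors="strict" raises on unmappable characters
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    emoji = "\U0001F600"  # GRINNING FACE: not in any of the legacy code pages above
    hungarian = "\u0151"  # LATIN SMALL LETTER O WITH DOUBLE ACUTE, used in Hungarian
    for cp in WINDOWS_ANSI_CODE_PAGES:
        print(cp, "emoji:", representable(emoji, cp), "o-double-acute:", representable(hungarian, cp))
```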

In particular, if you grab some Chinese characters and try to map them to the ANSI code page on an English system, they definitely won't map successfully.
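Here is a similar sketch of what happens to Chinese text, again assuming Python's cp1252 codec as a stand-in for the ANSI code page of an English system. A strict conversion fails outright, which is roughly what the marshaler's exception is reporting. A lenient conversion silently substitutes a placeholder (Python's errors="replace" uses "?"), and the original text cannot be recovered afterward. The helper names are hypothetical.

```python
# Illustrative sketch: cp1252 stands in for the ANSI code page of an English system.
CHINESE = "\u4e2d\u6587"  # the two characters for "Chinese (language)"


def to_ansi_strict(text: str, codec: str = "cp1252") -> bytes:
    # Hypothetical helper: strict mode raises UnicodeEncodeError on unmappable characters.
    return text.encode(codec)


def to_ansi_lossy(text: str, codec: str = "cp1252") -> bytes:
    # Hypothetical helper: errors="replace" substitutes b"?" for each unmappable character.
    return text.encode(codec, errors="replace")


if __name__ == "__main__":
    try:
        to_ansi_strict(CHINESE)
    except UnicodeEncodeError as e:
        print("strict conversion failed:", e.reason)

    lossy = to_ansi_lossy(CHINESE)
    print(lossy)
    # Decoding the lossy bytes does not give back the original characters.
    print(lossy.decode("cp1252") == CHINESE)
```

Failing loudly is the safer default: the lossy path produces bytes that look valid but no longer carry the user's text.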

The real question, though, is why you're using ANSI in the first place. Get with the program, it's 2007 already. As Michael Kaplan noted, it's all Unicode all the time from now on. In that same article, Michael also does the psychic debugging needed to find the root cause: the DllImport declaration did not specify a CharSet, so it got the default, which is CharSet.Ansi!

Change your character set to CharSet.Unicode and you'll be all set.
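As a rough mental model only (this is not how the .NET marshaler is implemented), the hypothetical Python function below contrasts the two settings. With an ANSI character set, a string must first be converted to the ANSI code page, which can fail. With a Unicode character set, it is passed as null-terminated UTF-16, which can represent any string. The function names, the CharSet enum, and the use of cp1252 as the ANSI code page are all assumptions for illustration; the real fix is simply specifying CharSet.Unicode in the DllImport declaration.

```python
from enum import Enum


class CharSet(Enum):
    # Hypothetical mirror of the two DllImport settings discussed above.
    ANSI = "Ansi"
    UNICODE = "Unicode"


def marshal_string(text: str, charset: CharSet, ansi_codec: str = "cp1252") -> bytes:
    """Hypothetical model of converting a string to a native, null-terminated buffer."""
    if charset is CharSet.ANSI:
        # Narrow conversion: raises UnicodeEncodeError for characters the code page lacks.
        return text.encode(ansi_codec) + b"\x00"
    # Wide conversion: UTF-16LE can encode every Unicode scalar value.
    return text.encode("utf-16-le") + b"\x00\x00"


def unmarshal_unicode(buffer: bytes) -> str:
    # Strip the two-byte null terminator and decode back to str.
    return buffer[:-2].decode("utf-16-le")


if __name__ == "__main__":
    chinese = "\u4e2d\u6587"
    wide = marshal_string(chinese, CharSet.UNICODE)
    print(unmarshal_unicode(wide) == chinese)  # the Unicode path round-trips
    try:
        marshal_string(chinese, CharSet.ANSI)
    except UnicodeEncodeError:
        print("ANSI path cannot represent this string")
```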


Comments (17)
  1. ac says:

    Some of us still write for win9x (We don’t like the double size of strings either)

  2. David Walker says:

    Um, maybe the default is wrong?

    [Um, maybe I wasn’t the one who chose the default. Michael Kaplan already discussed this, by the way. -Raymond]
  3. Csabóka says:

    > We don’t like the double size of strings either

    Maybe you, sir, are a native English speaker and think that those other funny characters aren’t that important, but the rest of the world doesn’t think so. I remember when someone in Hungary (like me) couldn’t correctly write French town names, for example, because French is covered by the Western European code page and Hungarian by the Central European one. I think doubling the size of all strings is well worth it if it means people can use their native language everywhere.

    Plus, if you’re that worried about size, UTF-8 is still there. If all you have is English text, there won’t be any difference in the size.

  4. I don’t know about marshaling or understand the situation here, but…

    Should this really throw an exception? That sounds like overkill to me. I mean, it’s obvious that not every Unicode string can be mapped to every ANSI code page. Why not just map the unmappable characters to character 255 and maybe return a warning some other way?

    Assume you got an exception from your browser or word processor just because current font didn’t have a few of the character glyphs for the document you are trying to open…

  5. mikeb says:

    > Assume you got an exception from your browser or word processor just because current font didn’t have a few of the character glyphs for the document you are trying to open… <<

    The exception is there for the software engineer to deal with, not the end user. If the system did what you suggest, and the engineer did not fix the bug that ANSI APIs are being called instead of Unicode APIs, here’s what might happen: users start getting characters in their text mangled because Chinese characters do not map to ANSI characters, so they get mapped to 255. When the mapping is reversed, the 255s get mapped to some nonsensical character, and the user’s text is garbage.

    Either way, throw an exception or translate to some dummy character, it’s still a bug.

    Sometimes you’ll see text from a non-English code page displayed with ‘?’ characters (http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=FontFAQ_Display#b0551258). That’s the program trying to display text that’s in a code page it doesn’t understand. It’s bad enough when the program doesn’t know how to display the text, but you’d better believe that a user would not want that transformation done in a way that causes the original text to actually be lost.

  6. Ifeanyi Echeruo says:

    A good start to understanding what Unicode is and why you should care.

    http://www.joelonsoftware.com/articles/Unicode.html

  7. Dean Harding says:

    > Some of us still write for win9x

    That’s what MSLU was created for. Why cripple your Windows NT implementation just to cater for 5% (or less) of your market?

  8. Ben Cooke says:

    Burak,

    There are a number of reasons why an exception is appropriate here:

    * Your proposal silently corrupts the data. The caller has no way to detect that the data has been corrupted, and so may end up writing the corrupted data to backing store in place of the correct data.

    * An exception allows the caller to trap the condition and handle it in some appropriate manner. If your program is a character encoding converter, it’d probably be appropriate to tell the user “Sorry; this data can’t be represented in that encoding.”

    * In this particular case, I’m assuming that since we’re talking about “the Chinese language version of the app” this text is going to be displayed in the UI somehow. Turning every character into ANSI codepoint 255 is hardly useful behavior, since the text will be unreadable.

    [And if the substitution is being done on a file name, you have a potential security hole. -Raymond]
  9. steven says:

    > The real question, though, is why you’re using ANSI in the first place. Get with the program, it’s 2007 already.

    I entirely agree.

    I have had this unpleasant conversation too many times to remember. It is more difficult than you may imagine to convince a customer of the value of Unicode.

    I wish non-Unicode support would be pulled in a future version of Windows. Or at the very least display a big dialog at execution time warning the user that the 20th century has ended and it’s time to move on to Unicode.

  10. Jonathan says:

    And then there are Unicode-only languages, where you can’t represent the chars in *any* ANSI codepage.

  11. Cheong says:

    [And if the substitution is being done on a file name, you have a potential security hole. -Raymond]

    I remember the days when people abused JavaScript with US-ASCII encoding to get around detection by popup blockers. :P

  12. >> The real question, though, is why you’re using ANSI in the first place. Get with the program, it’s 2007 already. <<

    Some of us use frameworks that *still* haven’t been updated for Unicode.  XYZ, I’m looking at you!

  13. Chris Oldwood says:

    To [mis]quote a respected blogger :-)

    "Microsoft Layer for Unicode – now you have two problems"

  14. slapout says:

    I have coworkers (programmers) who ask me what an error message they are getting means even when the error is spelled out in plain English.

  15. The question went something like this: I’m trying to display GB18030 text (say unicode 0x3400 character)



*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.

WHY DID I DUPLICATE THIS CONTENT HERE? Let me first say this site has never had anything to sell and has never shown ads of any kind. I have nothing monetarily to gain by duplicating content here. Because I had made my own local copy of this content throughout the years, for ease of using tools like grep, I decided to put it online after I discovered some of the original content previously and publicly available, had disappeared approximately early to mid 2019. At the same time, I present the content in an easily accessible theme-agnostic way.

The information provided by Raymond's blog is, for all practical purposes, more authoritative on Windows Development than Microsoft's own MSDN documentation and should be considered supplemental reading to that documentation. The wealth of missing details provided by this blog that Microsoft could not or did not document about Windows over the years is vital enough, many would agree an online "backup" of these details is a necessary endeavor. Specifics include:
