Converting a byte[] to a System.String

Date:September 7, 2004 / year-entry #330
Tags:code
Orig Link:https://blogs.msdn.microsoft.com/oldnewthing/20040907-00/?p=37943
Comments:    7

For some reason, this question gets asked a lot. How do I convert a byte[] to a System.String? (Yes, this is a CLR question. Sorry.)

You can use System.Text.UnicodeEncoding.GetString(), which takes a byte[] containing UTF-16 text and produces a string.
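
Here is a minimal sketch of that call, assuming the bytes hold UTF-16 little-endian text. The class and method names are made up for illustration; Encoding.Unicode is the framework's ready-made UnicodeEncoding instance.

```csharp
using System;
using System.Text;

public static class ByteArrayToString
{
    // Decodes bytes that are assumed to hold UTF-16 little-endian text.
    // (Hypothetical helper; the name is not part of the framework.)
    public static string FromUtf16(byte[] bytes)
    {
        return Encoding.Unicode.GetString(bytes);
    }

    public static void Main()
    {
        // "Hi" in UTF-16LE: each character occupies two bytes, low byte first.
        byte[] bytes = { 0x48, 0x00, 0x69, 0x00 };
        Console.WriteLine(FromUtf16(bytes)); // prints "Hi"
    }
}
```

If the bytes are in some other encoding, a different Encoding subclass is the one to reach for.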

Note that this is not the same as just blindly copying the bytes from the byte[] array into a hunk of memory and calling it a string. The GetString() method must validate the bytes and forbid invalid surrogates, for example.
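
To see the validation at work, here is a rough sketch that feeds the decoder a high surrogate with no low surrogate after it. The class and member names are hypothetical. As far as I know, how the bad input is handled depends on how the encoding was constructed: the three-argument UnicodeEncoding constructor can ask for an exception, while the default Encoding.Unicode instance substitutes a replacement character rather than throwing.

```csharp
using System;
using System.Text;

public static class SurrogateValidation
{
    // U+D800 (a high surrogate with nothing to pair with), then 'A', in UTF-16LE.
    public static readonly byte[] LoneSurrogate = { 0x00, 0xD8, 0x41, 0x00 };

    // Returns true if a strict UTF-16 decoder refuses the bytes.
    public static bool StrictDecoderRejects(byte[] bytes)
    {
        // Arguments: little-endian, no byte order mark, throw on invalid bytes.
        var strict = new UnicodeEncoding(false, false, true);
        try
        {
            strict.GetString(bytes);
            return false;
        }
        catch (ArgumentException)
        {
            // The decoder reports invalid sequences with an ArgumentException
            // (or a class derived from it).
            return true;
        }
    }

    public static void Main()
    {
        // The default instance does not throw; dump the code units it produced.
        string lenient = Encoding.Unicode.GetString(LoneSurrogate);
        foreach (char c in lenient)
        {
            Console.Write("U+{0:X4} ", (int)c);
        }
        Console.WriteLine();

        Console.WriteLine(StrictDecoderRejects(LoneSurrogate));
    }
}
```

Either way, the lone surrogate never makes it into the resulting string unchallenged, which is exactly what a raw memory copy would not guarantee.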

You might be tempted to create a string and just mash the bytes into it, but that violates string immutability and can lead to subtle problems.


Comments (7)
  1. Ben Hutchings says:

    On a related question, how do those of us not using .NET achieve streamable character conversion – that is, conversion where the converter can perform a partial conversion, indicate errors such as "the last n bytes of input begin but don’t complete a multibyte character" or "the output buffer is too small so only m bytes of input were converted", and then allow you to continue with another block of input data and/or output buffer? MLang appeared to offer this but so far as I can see it doesn’t, or at least the documentation doesn’t cover it. Yet IE is presumably doing it, and MLang is part of IE…

  2. Ben Hutchings says:

    (Apologies for the slightly incoherent rambling sentence above.)

  3. Ben Lowery says:

    Something else to mention is that you should match the System.Text.Encoding subclass to the contents of the byte[]. Passing a byte[] that contains UTF-8 encoded text to UnicodeEncoding’s GetString method won’t decode the byte[] properly. For example:

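    <pre>
    using System;
    using System.Collections;
    using System.Text;

    public class MyClass
    {
        public static void Main()
        {
            // The bytes hold UTF-8 text.
            byte[] text = Encoding.UTF8.GetBytes("my string");

            // Mismatch: UTF-8 bytes decoded as UTF-16 come out as garbage.
            string s = Encoding.Unicode.GetString(text);
            Console.WriteLine(s);

            // Match: the decoder agrees with the encoding of the bytes.
            s = Encoding.UTF8.GetString(text);
            Console.WriteLine(s);
        }
    }
    </pre>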

  4. Clinton Pierce says:

    Unicode? We don’ need no stinkin’ Unicode! :)

    string s=System.Text.Encoding.ASCII.GetString(buffer, 0, buffer.Length);

  5. Jon Potter says:

    not actually a .NET blog?

  6. Mr. Ed says:

    Regarding the Abrams link:

    Why, oh why, does the string have a cast operator to a non-const C-string, if the string is immutable?

  7. Norman Diamond says:

    In VC++ 2005 beta 1, either the _T() macro doesn’t work, or there’s something funny about macros that are or used to be UNICODE and _UNICODE. I haven’t had time to investigate. When I had time to practice with VC++ 2005 beta 1, I just worked around it by changing _T("string") to L"string", forcing them to be wide strings, and wide strings are Unicode in Windows.

    But … this didn’t have to be done with all strings. Some of them I just left as "string", forcing them to be multibyte strings. Automatic conversions and boxing to type System::String^ correctly converted some of these ANSI strings to Unicode, only garbaging up some others. I haven’t had time to investigate if there’s a reason for this.

    (This didn’t seem to be the worst issue I found in VC++ 2005 beta 1, because the IDE was still operating and forms could still be edited after that. But I didn’t have time to investigate whether there’s a more serious underlying cause.)



*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.
