Date: | January 5, 2006 / year-entry #11 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20060105-00/?p=32753 |
Comments: | 17 |
Summary: | Occasionally, I see someone ask for a function that converts between LCIDs (such as 0x0409 for English-US) and RFC 1766 language identifiers (such as "en-us"). The rule of thumb is, if it's something a web browser would need, and it has to do with locales and languages, you should look in the MLang library. In this... |
Occasionally, I see someone ask for a function that converts between LCIDs (such as 0x0409 for English-US) and RFC 1766 language identifiers (such as "en-us").

The rule of thumb is, if it's something a web browser would need, and it has to do with locales and languages, you should look in the MLang library. In this case, the IMultiLanguage::GetRfc1766FromLcid method does the trick.

For illustration, here's a program that takes US-English and converts it to RFC 1766 format. For fun, we also convert "sv-fi" (Finland-Swedish) to an LCID.

    #include <stdio.h>
    #include <ole2.h>
    #include <oleauto.h>
    #include <mlang.h>

    int __cdecl main(int argc, char **argv)
    {
        HRESULT hr = CoInitialize(NULL);
        if (SUCCEEDED(hr)) {
            IMultiLanguage * pml;
            hr = CoCreateInstance(CLSID_CMultiLanguage, NULL,
                                  CLSCTX_ALL,
                                  IID_IMultiLanguage, (void**)&pml);
            if (SUCCEEDED(hr)) {

                // Let's convert US-English to an RFC 1766 string
                BSTR bs;
                LCID lcid = MAKELCID(MAKELANGID(LANG_ENGLISH,
                                     SUBLANG_ENGLISH_US), SORT_DEFAULT);
                hr = pml->GetRfc1766FromLcid(lcid, &bs);
                if (SUCCEEDED(hr)) {
                    printf("%ws\n", bs);
                    SysFreeString(bs);
                }

                // And a sample reverse conversion just for good measure
                bs = SysAllocString(L"sv-fi");
                if (bs && SUCCEEDED(pml->GetLcidFromRfc1766(&lcid, bs))) {
                    printf("%x\n", lcid);
                }
                SysFreeString(bs);

                pml->Release();
            }
            CoUninitialize();
        }
        return 0;
    }

When you run this program, you should get

    en-us
    81d

"en-us" is the RFC 1766 way of saying "US-English",
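and 0x081d is the LCID for Finland-Swedish, that is, MAKELCID(MAKELANGID(LANG_SWEDISH, SUBLANG_SWEDISH_FINLAND), SORT_DEFAULT).
If you browse around, you'll find lots of other interesting functions in the MLang library. You may recall that earlier we saw how to use MLang to display strings without those ugly boxes.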
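To see where the numbers 0x0409 and 0x081d come from, it may help to look at how a language identifier and an LCID are packed. The following is a minimal sketch, not part of the original program: the helper names (make_langid, make_lcid, primary_lang, sub_lang) and the k-prefixed constants are hypothetical stand-ins for the winnt.h macros MAKELANGID, MAKELCID, PRIMARYLANGID, and SUBLANGID and the corresponding LANG_/SUBLANG_ values, written out by hand so that the sketch compiles without the Windows SDK.

```cpp
#include <cstdint>

// Hypothetical stand-in for MAKELANGID: the primary language occupies the
// low 10 bits and the sublanguage the upper 6 bits of a 16-bit value.
constexpr std::uint16_t make_langid(std::uint16_t primary, std::uint16_t sub) {
    return static_cast<std::uint16_t>((sub << 10) | primary);
}

// Hypothetical stand-in for MAKELCID: the sort ID sits above the
// 16-bit language identifier.
constexpr std::uint32_t make_lcid(std::uint16_t langid, std::uint16_t sortid) {
    return (static_cast<std::uint32_t>(sortid) << 16) | langid;
}

// Hypothetical stand-ins for PRIMARYLANGID and SUBLANGID, applied to the
// low 16 bits of an LCID.
constexpr std::uint16_t primary_lang(std::uint32_t lcid) {
    return static_cast<std::uint16_t>(lcid & 0x3ff);
}
constexpr std::uint16_t sub_lang(std::uint32_t lcid) {
    return static_cast<std::uint16_t>((lcid & 0xffff) >> 10);
}

// Values copied by hand from winnt.h (LANG_ENGLISH, SUBLANG_ENGLISH_US,
// LANG_SWEDISH, SUBLANG_SWEDISH_FINLAND, SORT_DEFAULT).
constexpr std::uint16_t kLangEnglish           = 0x09;
constexpr std::uint16_t kSublangEnglishUs      = 0x01;
constexpr std::uint16_t kLangSwedish           = 0x1d;
constexpr std::uint16_t kSublangSwedishFinland = 0x02;
constexpr std::uint16_t kSortDefault           = 0x0;

// The two LCIDs that appear in the article.
constexpr std::uint32_t kLcidEnUs =
    make_lcid(make_langid(kLangEnglish, kSublangEnglishUs), kSortDefault);
constexpr std::uint32_t kLcidSvFi =
    make_lcid(make_langid(kLangSwedish, kSublangSwedishFinland), kSortDefault);

static_assert(kLcidEnUs == 0x0409, "US-English");
static_assert(kLcidSvFi == 0x081d, "Finland-Swedish");
```

In real Windows code you would of course use the SDK macros directly; the point here is only that 0x081d decomposes into primary language 0x1d and sublanguage 0x02.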
Update (January 2008):
The globalization folks have told me that they'd prefer that
people didn't use MLang.
They recommend instead the functions
|
Comments (17)
Simply read HKCU/MIME/Database/RFC1766. Problem solved.
Yes, because it’s always better to go delving into the registry than to use the documented API calls. . . .
Especially when that’s the wrong path. It should be HKLM/Software/Classes/MIME/Database/Rfc1766.
But yeah, use the APIs instead, even though it means more blasted COM.
Looks like you could also use Rfc1766ToLcid (http://msdn.microsoft.com/library/default.asp?url=/workshop/misc/mlang/reference/functions/rfc1766tolcid.asp) and LcidToRfc1766 (http://msdn.microsoft.com/library/default.asp?url=/workshop/misc/mlang/reference/functions/rfc1766tolcid.asp), functions exposed directly from mlang.dll. Of course this requires IE 5.5 or newer, whereas IMultiLanguage is available from IE 4.0 onward.
RFC 1766 is long obsolete (replaced by RFC 3066) – the best conversion function to use is LCIDToLocaleName (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/nls_LCIDToLocaleName.asp?frame=true). This is base platform functionality rather than MLang.
There are other problems with MLang though. I will blog about this tonight….
How do you determine the character set of each code page? For example, for code page 932?
Hey Rob, do you mean font charset? That would make a cool blog topic, too. :-)
If you mean the one you would use in web pages, that’s a bit harder.
What I want to do is to display all the characters in a particular code page, e.g. code page 932, which is Japanese. I want to gather all those characters and display them using something like the following code snippet.
CString SomeFunkyFunction(void)
{
    CString s, s1;
    for (int i = BEGIN; i < END; i++)
    {
        // Concatenate the character into the string.
        s1.Format(_T("%c"), i);
        s += s1;
    }
    return s;
}
// Any recommendations?
You should put this in the suggestion box on my blog (http://blogs.msdn.com/michkap/482609.aspx)….
Rob: HKEY_CLASSES_ROOT\MIME\Database\Codepage for the codepage-to-charset conversion, and HKEY_CLASSES_ROOT\MIME\Database\Charset for the reverse. There’s no API for it AFAIK.
oh, nevermind
Thursday, January 05, 2006 5:53 PM by Rob
> all the characters in a particular code page
> […]
> for (int i = BEGIN; i < END; i++)
> s1.Format(_T("%c"), i);
I can’t find any spec for what happens when the value of i isn’t a valid character’s codepoint.
But there’s something else odd about this. In a Unicode compilation, you won’t know what code page(s) contain the character that you’re dealing with in Unicode. In an ANSI compilation, your %c format can only handle a single-byte character. In an ANSI compilation you’ll have two problems. One is the same as above, I can’t find any spec for what happens when the value of i isn’t a valid character’s codepoint (i.e. might just be the lead byte of a two-byte character, or might not be a valid lead byte of anything). The other is that you never format any double-byte characters, so you miss most of the characters in the code page.
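The lead-byte problem described above can be sketched in portable C++. This is only an illustration under stated assumptions, not the approach from any of the linked posts: the byte ranges for code page 932 are hard-coded here (lead bytes 0x81 to 0x9F and 0xE0 to 0xFC, trail bytes 0x40 to 0x7E and 0x80 to 0xFC), whereas on Windows you would normally obtain the lead-byte ranges from GetCPInfo. The function names are hypothetical. The sequences it produces are only candidates; many of them are unassigned, so each one would still need to be validated, for example by calling MultiByteToWideChar with MB_ERR_INVALID_CHARS.

```cpp
#include <cstddef>

// Assumed lead-byte ranges for code page 932 (hard-coded for illustration).
constexpr bool is_cp932_lead_byte(unsigned b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Assumed trail-byte ranges for code page 932 (hard-coded for illustration).
constexpr bool is_cp932_trail_byte(unsigned b) {
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

// Calls visit(bytes, length) once for every candidate byte sequence:
// each non-lead byte on its own, and each lead byte paired with every
// possible trail byte. A plain "%c" loop over 0..255 only ever sees the
// first group and treats lead bytes as if they were complete characters.
template <typename Visitor>
void for_each_cp932_candidate(Visitor visit) {
    for (unsigned b = 0x00; b <= 0xFF; ++b) {
        if (!is_cp932_lead_byte(b)) {
            const unsigned char single[1] = { static_cast<unsigned char>(b) };
            visit(single, std::size_t{1});
            continue;
        }
        for (unsigned t = 0x00; t <= 0xFF; ++t) {
            if (is_cp932_trail_byte(t)) {
                const unsigned char pair[2] = {
                    static_cast<unsigned char>(b),
                    static_cast<unsigned char>(t)
                };
                visit(pair, std::size_t{2});
            }
        }
    }
}

// Number of byte values that are not lead bytes (single-byte candidates).
constexpr int count_cp932_single_byte_candidates() {
    int n = 0;
    for (unsigned b = 0x00; b <= 0xFF; ++b) {
        if (!is_cp932_lead_byte(b)) ++n;
    }
    return n;
}

// Number of lead-byte/trail-byte pairs (double-byte candidates).
constexpr int count_cp932_double_byte_candidates() {
    int n = 0;
    for (unsigned b = 0x81; b <= 0xFC; ++b) {
        if (!is_cp932_lead_byte(b)) continue;
        for (unsigned t = 0x40; t <= 0xFC; ++t) {
            if (is_cp932_trail_byte(t)) ++n;
        }
    }
    return n;
}
```

With these ranges, the double-byte candidates far outnumber the single-byte ones, which is why a loop that formats one byte at a time misses most of the code page.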
KJK::Hyperion,
How do you use those data? Any idea? They don’t really display the characters. For the CharSet it only gives (Default) and AliasForCharset. Any recommendation would be appreciated.
Norman Diamond,
Yeah, that’s the problem that I’m running into. Deciphering the ranges and getting the correct double-byte values for each individual code page seems like a daunting task. Any recommendations?
Hi Rob —
I posted what I think is the easiest approach over in my blog (http://blogs.msdn.com/michkap/510411.aspx). The method covers all of the problems outlined in this thread (and several others, like best fit mappings).
Enjoy!