Locale-sensitive number grouping

Date:April 17, 2006 / year-entry #135
Tags:code
Orig Link:https://blogs.msdn.microsoft.com/oldnewthing/20060417-06/?p=31513
Comments:    67

Most westerners are familiar with the fact that the way numbers are formatted differs between the United States and much of Europe.

Culture Format
United States 1,234,567.89
France 1 234 567,89
Germany 1.234.567,89
Switzerland 1'234'567.89

What people don't realize is that the grouping is not always in threes. In India, the least significant group consists of three digits, but subsequent groups are in pairs.

India 12,34,567.89

I've also seen reports that the first group consists of five digits, followed by pairs:

India 12,34567.89

Meanwhile, Chinese and Japanese traditionally group in fours.

China, Japan 123 4567.89
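To see why a hard-coded "insert a comma every three digits" routine breaks, here is a small C++ sketch that takes the group sizes as data instead of assuming threes. The function name group_digits and its convention (sizes listed from the least significant digit, with the last size repeating) are hypothetical and chosen only for illustration. This is not how NLS implements it, and real Windows code should still call GetNumberFormat.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: insert `sep` between groups of digits.
// `groups` lists group sizes starting from the least significant digit;
// the last size repeats for all remaining digits. Sizes must be positive
// and `groups` must not be empty.
//   {3}    -> 1,234,567   (United States style)
//   {3, 2} -> 12,34,567   (India)
//   {4}    -> 123 4567    (traditional Chinese/Japanese, with a space)
std::string group_digits(const std::string& digits,
                         const std::vector<int>& groups,
                         const std::string& sep) {
    std::string out;
    std::size_t gi = 0;  // which entry of `groups` applies to the current group
    int in_group = 0;    // digits already placed in the current group
    // Walk from the least significant (rightmost) digit to the most significant.
    for (std::size_t i = digits.size(); i-- > 0;) {
        if (in_group == groups[gi]) {  // current group is full
            out.insert(0, sep);
            in_group = 0;
            if (gi + 1 < groups.size()) ++gi;  // advance; the last size repeats
        }
        out.insert(out.begin(), digits[i]);
        ++in_group;
    }
    return out;
}
```

Only the integer part is grouped here. The caller appends the locale's decimal separator and fraction, so group_digits("1234567", {3, 2}, ",") + ".89" gives 12,34,567.89. Passing {5, 2} produces the five-then-pairs variant, 12,34567.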

What does this mean for you? Don't assume that numbers group in threes, and of course you can't assume that the grouping separator is the comma and the decimal character is the period. Just use the GetNumberFormat function and let NLS do the work for you.

Next time, a little more about that NUMBERFMT structure.


Comments (67)
  1. Does any other operating system account for all these differences as much as Windows does?

    James

  2. BlakeHandler says:

    In English, our three-digit groupings have names like hundreds, thousands, millions, etc.

    Do you know how they group their "number words"? How would you pronounce a long number?

    It seems difficult without our commas — but then the metric system confuses me too! (^_^)

  3. BryanK says:

    James Summerlin: Anything using GNU gettext goes to this much trouble (assuming the translation file is available).  In particular, LC_NUMERIC is used for number formats.

    There are also different settings per locale for LC_COLLATE (string collation, i.e. sort order), LC_CTYPE (character classification, e.g. isalpha(), which also determines the character classes used in regular expressions), LC_MESSAGES (used for translations of strings), LC_MONETARY (used for formatting currency values), and LC_TIME (used for date/time formatting).

    For LC_MESSAGES, each program needs to provide its own translation file.  For LC_COLLATE, LC_CTYPE, LC_MONETARY, LC_NUMERIC, and LC_TIME, I would think that the settings would be global across all programs, but I don’t know for sure.  In particular, my Debian box here seems to have a couple per-program LC_TIME directories, but they appear to only be for coreutils, and they all appear to be symlinks to coreutils’ LC_MESSAGES file.

  4. Stephen Jones says:

    The point is that the important numbers in India are the ‘lakh’ and the ‘crore’ not the thousand and the million.

    So 12,34,000 rupees is twelve lakh, thirty-four thousand rupees, and 12,34,56,000 Rs is twelve crore, thirty-four lakh, and fifty-six thousand rupees.
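The lakh/crore naming can be sketched the same way. describe_indian is a hypothetical helper. Because the comment leaves open what happens above a crore, the sketch simply assumes that amounts of 100 crore or more are still counted in crore.

```cpp
#include <string>

// Hypothetical helper: describe a non-negative whole-rupee amount in Indian units.
// 1 thousand = 1,000; 1 lakh = 1,00,000; 1 crore = 1,00,00,000.
// Assumption: amounts of 100 crore or more are still counted in crore (e.g. "250 crore").
// Any remainder below a thousand is appended as a plain number.
std::string describe_indian(long long amount) {
    struct Unit { long long value; const char* name; };
    const Unit units[] = {{10000000, "crore"}, {100000, "lakh"}, {1000, "thousand"}};
    std::string out;
    for (const Unit& u : units) {
        long long count = amount / u.value;
        amount %= u.value;
        if (count > 0) {
            if (!out.empty()) out += ' ';
            out += std::to_string(count) + ' ' + u.name;
        }
    }
    if (amount > 0 || out.empty()) {  // leftover hundreds/units, or the amount was 0
        if (!out.empty()) out += ' ';
        out += std::to_string(amount);
    }
    return out;
}
```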

    I don’t know if the Indians continue in twos after that.

    Incidentally, did the fact that Arabic words are written from right to left, but the numbers go from left to right, ever present a problem in the earlier days of internationalization?

  5. Rick Scott says:

    How well is the US numbering system tolerated in other countries? I’m assuming that if a Frenchman saw "1,234,567.89" they would understand it reasonably well, right?

    I know that, as an American, I would be confused if I saw "1 234 567,89" and assume it was a typo. But I have a feeling that the US method of numbering is on a (long) path towards ubiquity, much like the English language.

  6. BryanK,

    So the short answer is, "of course other operating systems do, but in many cases you may have to reach around your elbow to get to your ass."

    James

  7. Mihai says:
    • Rick Scott –

      No. European users are as confused as you are.

      Especially if the UI is translated (localization), but the numbers are still US style (bad internationalization).

      And even when they figure it out, the perception is "those Americans, totally insensitive to cultural differences".

      Plus, this is not always so obvious. What is 123,456?
  8. Mihai says:

    Problem is, Windows does not allow 4-digit grouping for Japan and China, or 5-digit grouping for India.

  9. Stu says:

    BTW, the "US style" is what we use here in the UK. In fact it probably originated here.

    On a similar note, why does Windows default to 24-hour time on "English (UK)" localization?

  10. dave says:

    > On a similar note, why does Windows default to 24-hour time on "English (UK)" localization?

    Because in the UK, all the important stuff (like rail timetables) is done using a 24 hour clock.

    Me, I always set my PC to 24-hour time, even though I usually read 15:00 as "3 o’clock".  I never say "3 PM". If you can figure it out from context, I’ll say "3 o’clock". If the context is unclear, I’ll say "15:00".

    And I have no idea what the hell time "12 AM" is supposed to be. The only logical deduction is that "12 AM" and "12 PM" represent the same time.

  11. Mike Dunn says:

    12AM is midnight, 12PM is noon. "24" reminds us of this every season.

  12. Norton Mapes says:

    The styling most used in Canada is like the French styling, but with a dot instead of a comma: 1 234 567.89

  13. dakirw says:

    The key number in Chinese is "ten thousand," hence the four digit separation.

  14. Rowland Shaw says:

    I’ve always wondered whether adding support for other international formats was ever considered. Postal formats are well defined, although the logical progression to formatting telephone numbers is less clear-cut. To pick on the UK, these are all formatted "correctly":

    0118 496 0000 (011x rule 4 3 4)

    0161 496 0000 (01x1 rule 4 3 4)

    020 7946 0000 (02x rule 3 4 4)

    01632 960000 (geographic fallback rule 5 6)

    07700 900000 (non-geographic rule 5 6)

    (don’t bother trying to call any of those numbers, they’re reserved ranges for drama purposes: http://www.ofcom.org.uk/telecoms/ioi/numbers/num_drama )
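Here is a sketch of how the five UK patterns above could be applied, assuming an 11-digit national number with no punctuation. format_uk is a hypothetical helper that covers only the rules listed in this comment. Anything it doesn't recognize falls through to the 5 6 split, which is right only for the geographic fallback and the 07 case.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper covering only the five UK patterns listed above.
// `d` is an 11-digit national number such as "01184960000".
std::string format_uk(const std::string& d) {
    // Split `d` into consecutive chunks of the given sizes, separated by spaces.
    auto split = [&](std::vector<std::size_t> sizes) {
        std::string out;
        std::size_t pos = 0;
        for (std::size_t s : sizes) {
            if (!out.empty()) out += ' ';
            out += d.substr(pos, s);
            pos += s;
        }
        return out;
    };
    if (d.size() != 11 || d[0] != '0') return d;                  // not handled: leave as-is
    if (d.compare(0, 3, "011") == 0) return split({4, 3, 4});     // 011x
    if (d.compare(0, 2, "01") == 0 && d[3] == '1') return split({4, 3, 4});  // 01x1
    if (d.compare(0, 2, "02") == 0) return split({3, 4, 4});      // 02x
    return split({5, 6});  // geographic fallback (other 01) and 07 numbers
}
```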

  15. Gabest says:

    Pasting a number with decimals into the calculator never works for me, because my settings are Hungarian and it looks for a comma. The comma is of course our "official" separator, but on computers it is rarely used (ever wanted to enter float f = 1,0f; in Visual Studio? yay… :), and in everyday life the period is just as widely accepted. The other thing I cannot really understand is our qwertz kezboard lazout, but that’s a different storz.

  16. JamesW says:

    Our European users can’t decide what format they want. We do the right thing and use the decimal separator provided by the OS – we don’t care about grouping the digits though. The UK gets 123456.89 and France gets 123456,89 – everybody’s happy right? No, of course not!

    Turns out those pesky continental types want us to support the comma decimal separator, but they also want to use a full-stop (period for you USians) if it takes their fancy. So we end up with code that accepts full-stops or commas regardless of locale, but displays it correctly according to the country.
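The "accept either separator" compromise can be sketched as follows. parse_decimal_lenient is a hypothetical helper. It assumes the input never contains thousands separators, so a string with both a comma and a period is rejected rather than guessed at. It parses with the classic locale so the result does not depend on the process's own locale settings.

```cpp
#include <locale>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: accept "123456,89" or "123456.89" alike.
// Either ',' or '.' is treated as the decimal separator; grouping is not accepted.
std::optional<double> parse_decimal_lenient(std::string s) {
    int separators = 0;
    for (char& c : s) {
        if (c == ',' || c == '.') { c = '.'; ++separators; }
    }
    if (separators > 1) return std::nullopt;  // e.g. "1.234,5": ambiguous, reject
    std::istringstream in(s);
    in.imbue(std::locale::classic());  // '.' is the decimal point whatever the user's locale
    double value;
    if (!(in >> value)) return std::nullopt;
    if (in.peek() != std::char_traits<char>::eof()) return std::nullopt;  // trailing junk
    return value;
}
```

Display still goes through the user's locale. Only input is lenient.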

    I’ve never come across the Swiss method of using inverted commas – I guess our users use native German/French/Italian rather than complain. Living in India I’m familiar with the lakh/crore grouping – still looks wrong to me. But then so does using a comma as a decimal!

  17. BryanK says:

    Actually, printf follows the LC_NUMERIC category for any floating-point numbers being printed; all you need is a setlocale() call.  (There’s no way to handle thousands separators in general with printf, though, so it doesn’t follow that.  You can get at the information using localeconv() if you need to print thousands separators, though.  Most programs don’t.)
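For the thousands-separator case, here is a hedged sketch of applying the grouping information that localeconv() returns. Under the C standard's lconv::grouping format, each element is the size of a group counted from the right. CHAR_MAX means no further grouping, and the terminating '\0' means the previous size repeats. apply_grouping and group_for_current_locale are hypothetical helper names.

```cpp
#include <climits>
#include <clocale>
#include <cstddef>
#include <string>

// Hypothetical helper: insert `sep` into a string of integer digits according
// to a C-library grouping string (the lconv::grouping format).
std::string apply_grouping(const std::string& digits, const char* grouping,
                           const std::string& sep) {
    std::string out = digits;
    std::size_t end = digits.size();  // left edge of the last group placed
    int size = 0;
    for (const char* g = grouping;;) {
        if (*g == CHAR_MAX) break;    // no further grouping
        if (*g != '\0') size = *g++;  // next size; at '\0' the previous size repeats
        if (size <= 0 || end <= static_cast<std::size_t>(size)) break;
        end -= static_cast<std::size_t>(size);
        out.insert(end, sep);         // the part left of `end` still matches `digits`
    }
    return out;
}

// Group digits the way the current C locale (as set by setlocale) says to.
std::string group_for_current_locale(const std::string& digits) {
    const std::lconv* lc = std::localeconv();
    return apply_grouping(digits, lc->grouping, lc->thousands_sep);
}
```

In the default "C" locale both grouping and thousands_sep are empty, so the digits come back unchanged. After setlocale(LC_NUMERIC, "") the user's settings apply, assuming the C library has data for that locale.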

    Perhaps your "it’s too complicated!!!111" comment was aimed at my explanation of all the different categories; if so, you should note that glibc and the i18n standards have different categories only so the end-users can set them differently if they want/need to.  If they want to see India-style numbers with English text (perhaps because you’re testing, or perhaps because you are from India but you’re trying to learn English, words first), it’s possible.  You can also change the locale of a single program if you need to, when you launch it, because locales are based on environment variables.  (So each user on the machine can have a different preferred set of locale settings.)

    Contrast this with Windows’ insistence on using one set of locale settings for everything on the system (unless it is per user?  I don’t think so though), and not allowing the settings to be overridden for a program at its launch time (unless the program doesn’t do locale stuff at all — if that’s an option), and IMO i18n is much worse in Windows-land.

  18. Cooney says:

    Yeah, Japanese has ‘man’ (10,000) and ‘oku’ (100,000,000). Through a lucky accident, this gives an easy analogue to ‘one million dollars’ – Okuen (1e8 yen).

  19. Of course you could get really fancy and support other numeric systems, like the Indic numerals used in India and some Arabic countries.

    On the other-hand, there are much more pedestrian issues to deal with. So many US commercial websites give me the option of choosing from a list of dozens of countries on their address form, but still insist that I choose a US state, US zip/postal code format and digits with a US telephone number pattern.

  20. KJK::Hyperion says:

    BryanK: all internationalization settings on Windows are at least per-user, with the sole exception of the ANSI and OEM codepages (which are per-system, hardcoded so deep down that they even involve the boot loader) and the input locale (which is per-thread, with per-user defaults)

    Also, all settings (individual locale settings, locale ID, UI language, etc.) are completely decoupled (i.e. they can be changed independently), unless the application thinks it knows better and only uses the default settings for the current locale ID. Almost all applications ported from UNIX are guilty of this (I have an Italian (Italy) locale but an English UI language, and anything based on gettext consistently gets a UI in Italian), as are all browser-based applications, although in that case it’s a limitation of HTTP.

    Finally, I believe the Microsoft C runtime supports overriding locale settings with environment variables, but that’s no guarantee, as most applications are going to use the Win32 API. Applications are generally allowed to override any setting on a per-call basis (which you can’t do on POSIX), but most are just going to use the default and not allow any form of override.

    The notable exception is, of course, the UI language, that didn’t become a system setting until Windows 2000, and even then only supported on English (US) builds with the add-on MUI pack (not available retail) – all of which will (finally!) change in Vista.

  21. John Price says:

    Norton Mapes: As a Canadian, I can’t say I’ve seen the French style used, and nearly always see the American (British?) format with the commas in place.

    Perhaps the French style is most commonly used in Quebec, while the English-speaking provinces use the American format?

  22. There are still recent Microsoft applications which are guilty of using the wrong locale settings. For instance Windows Media Player uses my user locale rather than location to determine access to online stores or provide other location-sensitive information.

    Microsoft.com confuses language and locale almost as a matter of principle.

  23. dave says:

    > 12AM is midnight, 12PM is noon. "24" reminds us of this every season.

    Yes, but not for any sensible reason.

    12AM means ’12 hours before the meridian’, i.e. 12 hours before noon.

    12PM means ’12 hours after the meridian’, i.e., 12 hours after noon.

    Thus they’re both midnight, to sensible people.

  24. Mike says:

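    C++ has good locale support. You can set the global locale to the system locale; streams created afterwards use it automatically, and the already-constructed standard streams can be imbued with it for input/output:

    ============

    #include <iostream>
    #include <locale>

    using namespace std;

    int main()
    {
        // load native system locale
        locale::global(locale(""));

        // cout and cin already exist, so they keep the classic "C"
        // locale until they are imbued with the new global one
        cout.imbue(locale());
        cin.imbue(locale());

        cout << 1234567 << endl;

        int number;
        cin >> number;

        // etc
    }

    ============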

  25. Cooney says:

    Dave:

    > 12AM means ’12 hours before the meridian’, i.e. 12 hours before noon.

    No, it means 12 in the AM section of the day, which is defined to run from [00:00,12:00). Sorry, but you’re wrong.

  26. sandman says:

    Cooney:

    >Dave:

    > > 12AM means ’12 hours before the meridian’, i.e. 12 hours before noon.

    > No,

    Agreed, at least in the UK, where I live.

    1AM is definitely not 1 hour before noon.

    > it means 12 in the AM section of the day,

    Unfortunately, AM stands for ante meridiem, or before noon. (Assuming Zulu TZ.)

    So in the before-noon section, what does 12 hours mean at exactly noon?

    It’s a contradiction, which is fine if everyone agrees what it means. But it is a different matter if one tries to think about it logically.

    Personally, I prefer the 24h clock, and I avoid using 00:00 as a meaningful timespec. 00:01 does for most purposes.

  27. steveg says:

    All UI designers/programmers should travel abroad…

    If you haven’t or can’t travel, then set up a few PCs or VirtualPCs running your OS of choice in different languages and see how your application behaves in the rest of the world (my pet peeve is the sodding English (US) dictionary in MS Office, which sticks like a fly to flypaper. Why the heck isn’t it smarter about setting the default language… look at the timezone and make an educated guess).

    Anyway, I digress. Travel broadens the parochial mind you never knew you had, and makes you a better programmer/designer/techie.

    As a bonus it also exposes you to better beer.

  28. BryanK says:

    So they are per-user then, and they are independent; good.

    But still, the user is not able to override any locale setting per-program; a given program will either use the user’s locale, or it will have no localization at all.  (Depending on whether it calls the translate function.)  You can see all your numbers in the format India uses, and all your messages in English, but you can’t do that for just one program.  (Unless that program uses the Microsoft C library calls — using the C library vs. using the raw system calls was discussed for a while here a few days ago.  At the time the discussion centered around stdio’s buffering vs. ReadFile, but here’s another instance where going with the C library is probably a good idea.)

    As far as gettext on Windows goes, that doesn’t surprise me.  That’s what happens when you have completely different standards on different OSes, and you try to write a program to work on both — it’s not impossible, it’s just really hard.

    gettext is probably getting its LC_MESSAGES setting from whatever you refer to as "the locale", instead of what you refer to as "the UI language".  Sounds like a bug in gettext on Windows, but I don’t know for sure.  (OTOH, you should be able to set the LC_MESSAGES or LC_ALL environment variable to get the correct translation for any gettext-using program.  From my knowledge of gettext, I’d guess that the Windows "locale" setting is probably a fallback for when LANG/LC_MESSAGES/LC_ALL aren’t set.)

    Or, since you said something about "the UI language" not being supported until Vista, perhaps you’re using a Vista beta?  In that case, it *REALLY* doesn’t surprise me that gettext fails; it has no idea that there’s a new setting it should look at.  If it’s really only supported on Vista, then I’d expect this to get fixed as soon as someone starts complaining about it to the right people; likely shortly after Vista is released, but maybe sooner.

  29. Norman Diamond says:

    Traditional grouping in Japan is four digits at a time exactly as stated, but other weird stuff comes up too.  Bankbooks have monetary amounts printed with commas between groups of three digits, though the column headers still specify the value of each digit position correctly.

    Then there are things like rents and some other monetary usages, expressed in this kind of form:

    12.3万円

    = (12 point 3) ten thousands of yen

    = 123,000 yen

    (or 50 years ago, = 12,3000 yen)

    Some people have mentioned GNU.  I’ve seen GNU stuff under Linux display dates like this:

    4月 18 2006

    or this:

    18 4月 2006

    which are nearly unreadable.  Correct are:

    2006年4月18日

    or:

    2006/04/18

    etc.

    Windows allows setting a 12-hour clock which halfway matches a lot of conventional usage (other than railway timetables) but it’s still a nuisance to decipher.  For example:

    12:54 午後

    Correct is:

    午後 12:54

    Monday, April 17, 2006 2:42 PM by Mike Williams

    > So many US commercial websites give me the option of choosing from a list of dozens of countries on their address form, but still insist that I choose a US state, US zip/postal code format and digits with a US telephone number pattern.

    Microsoft is an exception.  I ordered something from them two days ago.  They let me select the country first and then input the rest of the address in Japanese order, including the postal code and prefecture (state) etc.  But then came the phone number.  They wouldn’t accept the phone number in any national pattern.  Then I noticed that my current phone number would fit if I omitted all punctuation, exactly ten digits.  But my previous phone number would not have fit even by that method.

  30. I remember our dev lead sending out a status report with the separators placed the Indian way. Later he had to send a confused group of people in Redmond an explanation that there was no typo in the number :)

    However, the five-digit first group (12,34567.89) is very rare, and we’ve standardized on 12,34,567.89.

  31. Jonathan says:

    Chinese is like Japanese – it has:

    十  shí  10

    百  bǎi  100

    千  qiān  1000

    万  wàn  10000

    There’s also a character for 1e8, but I don’t remember how to say it.

    So 230000 is pronounced like this:

    二十三万  èr shí sān wàn  2 10 3 10000

    steveg: "All UI designers/programmers should travel abroad… "

    I agree! I think I need to learn the language of Hawaii! and Aruba! And…

    Stephen Jones: "did the fact that Arabic words are written from right to left, but the numbers go from left to right ever present a problem in the earlier days of nationalization?"

    It still does, also in Hebrew (which is the same as Arabic in this regard). Getting the reading order to function the way users expect it to is really hard.

  32. Wha'? says:

    > look at the timezone and make an educated guess

    An "educated guess"? Well, let’s see, in GMT+1 there’s Dutch, French, Italian, Swedish, German, …

    Even if you knew the country you were sitting in, there are countries that have several official languages (four for Switzerland, for example).

  33. Goran says:

    Reminds me of my school days (eighties and the nineties)…

    In ex-Yugoslavia (now Serbia and Montenegro), the date format we used was (example: June 3rd, 2006)… get this… 3. VI 2006. Not any more, now that we have computers. Crying shame :-)

  35. Anthony Williams says:

    12AM is accepted to mean midnight, and 12PM to mean noon, despite the fact that neither is really correct.

    It always amuses me that the clock on my oven shows 24:00 for midnight, and doesn’t then change until 00:01.

  36. standardize says:

    There is an international standard. The likelihood that the US would accept it is zero. The likelihood that MS would use it is lower. Still waiting for ISO week numbering in Outlook, btw.

  37. David Candy says:

    Maybe you’d like to fix Australia’s grouping that Windows has never gotten right.

    nnn nnn nnn.dd

    nnn nnn nnn.dddddddddd

    nnnn

    Note that four-digit numbers are not grouped. The separator character is the space (not the comma, since about 1970); it is only used for numbers of five or more digits, and it groups in threes. See the Australian Government Style Manual.
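Here is a sketch of the rule as described in this comment. It has not been checked against the style manual itself. Integer parts of four digits or fewer are left alone, and longer ones get a space between groups of three. The fraction is left ungrouped, as in the nnn nnn nnn.dddddddddd example. format_au is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper following the rule described in the comment above:
// integer parts of up to four digits are left alone; longer ones get a space
// between groups of three. The fraction is appended ungrouped.
std::string format_au(const std::string& int_digits, const std::string& frac_digits) {
    std::string out = int_digits;
    if (out.size() >= 5) {
        // Insert spaces from the right; positions further left are unaffected.
        for (std::size_t pos = out.size(); pos > 3;) {
            pos -= 3;
            out.insert(pos, " ");
        }
    }
    if (!frac_digits.empty()) out += "." + frac_digits;
    return out;
}
```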

  38. CN says:

    Still, any dates written like XX/YY/ZZ are far more confusing, especially since we left the nineties.

    The most amusing part is software written by US or other users with a full stop/period decimal separator, that blatantly uses the locale format for internal configuration files, or even when trying to load external XML. The fact that the .NET "Parse" methods do the "right" thing and look at the locale has made this problem far more common than it used to be.
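One common way to avoid this is to keep machine-readable text locale-independent: read and write it with the classic "C" locale, and reserve the user's locale for display. Below is a C++ sketch with hypothetical helper names. In .NET the analogous move is passing CultureInfo.InvariantCulture to Parse.

```cpp
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helpers for machine-readable text (config files, XML, etc.).
// Both use the classic "C" locale, so '.' is always the decimal point,
// whatever the user's regional settings or the global C++ locale are.
double parse_config_number(const std::string& text) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    double value = 0.0;
    if (!(in >> value)) throw std::invalid_argument("not a number: " + text);
    return value;
}

std::string write_config_number(double value) {
    std::ostringstream out;
    out.imbue(std::locale::classic());
    out << value;  // default stream formatting: at most 6 significant digits
    return out.str();
}
```

For exact round-tripping of arbitrary doubles, you would also raise the output precision (for example with std::setprecision(17)).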

  39. David Candy says:

    While we are at it, kb is lower case and MB is upper case. Uppercase is only used for millions or more. And it is probably time to label correctly as kib and MiB (I don’t know whether the i should be upper or lower case).

  40. Richard Gadsden says:

    I’m amazed no-one has explained why 12:00 a.m. is midnight and 12:00 p.m. is noon.

    12:01-12:59 a.m. is definitely during the ante meridian half of the day

    12:01-12:59 p.m. is definitely during the post meridian half of the day.

    It’s therefore convenient to assign the 12:00 times to the same half-day as the 12:01-12:59 times that follow, so that 12 a.m. and 12 p.m. have the same meaning regardless of the minutes that then follow.
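This convention can be written down as a conversion from the 12-hour clock to the 24-hour clock. to_24_hour is a hypothetical helper. The key step is hour12 % 12, which maps 12 to 0 before the p.m. offset is added.

```cpp
#include <stdexcept>

// Hypothetical helper: 12-hour clock hour (1-12) plus a.m./p.m. -> 24-hour hour (0-23).
// 12 belongs to the half-day whose 12:01-12:59 minutes follow it,
// so 12 a.m. is midnight (0) and 12 p.m. is noon (12).
int to_24_hour(int hour12, bool pm) {
    if (hour12 < 1 || hour12 > 12) throw std::out_of_range("hour must be 1-12");
    int hour = hour12 % 12;  // 12 -> 0; 1..11 are unchanged
    return pm ? hour + 12 : hour;
}
```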

  41. steveg says:

    David Candy: Maybe you’d like to fix Australia’s grouping that Windows has never gotten right.

    [a short google later] You’re right! Apparently Aus uses the SI system. I never knew. Hmmm, writing a parser just got slightly more difficult. I bet many people would be up in arms if we (.au) changed in Vista. I did 3 years on a product with very big numbers with commas and never heard a complaint.

    This is a fascinating post. More stories!

    Turkey (as I recall) seemed to pretty much ignore the last three 0s in monetary figures; it was common to see prices written as 700 with an implicit 000. Even the banknotes had the last three zeros in a different colour (Turkey has more millionaires than any other country: 1 million lira = USD 0.75 :).

    Dates and times seem simple, yet always cause more bugs than anyone cares to see.

  42. Xan says:

    > How well is the US numbering system tollerated in other countries?

    Here (Italy) the numbering system is quite tolerable. In fact, the only ambiguous case (the problems rise with ambiguity) is for numbers like 12,345 but the context is usually enough to tell which case it is.

    Dates are much more of a problem! We use DD/MM/YY while the USA uses MM/DD/YY. Dates also turn up more often without a clear context, so guessing what 05/07/2006 means in the output of a dir command is not always easy or fast.
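To make the ambiguity concrete, here is a hedged sketch that lists every reading of an a/b/yyyy string under the two orders mentioned (DD/MM/YYYY and MM/DD/YYYY). readings and Ymd are hypothetical names. The validity check is deliberately crude (month 1 to 12, day 1 to 31) rather than a real calendar check.

```cpp
#include <cstdio>
#include <string>
#include <vector>

struct Ymd { int year, month, day; };

// Hypothetical helper: every reading of "a/b/yyyy" as DD/MM/YYYY or MM/DD/YYYY.
// More than one result means the string alone cannot tell you which date was meant.
std::vector<Ymd> readings(const std::string& text) {
    std::vector<Ymd> out;
    int a = 0, b = 0, year = 0;
    if (std::sscanf(text.c_str(), "%d/%d/%d", &a, &b, &year) != 3) return out;
    auto plausible = [](int month, int day) {  // crude: not a real calendar check
        return month >= 1 && month <= 12 && day >= 1 && day <= 31;
    };
    if (plausible(b, a)) out.push_back({year, b, a});            // DD/MM/YYYY
    if (a != b && plausible(a, b)) out.push_back({year, a, b});  // MM/DD/YYYY
    return out;
}
```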

    For time, we use the 24h system, even if we often speak in the 12h system. Anyway, the 12.00 am/pm problem doesn’t exist, because in the 24h system they are 0.00 and 12.00, and in speech they are "mezzanotte" and "mezzogiorno" (midnight and midday).

    Oh, BTW,

    kb = kilobit

    kB = kilobyte

    Mb = megabit

    MB = megabyte

    Kib = kibibit

    KiB = kibibyte

    Mib = mebibit

    MiB = mebibyte
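The difference between the decimal and binary prefixes in the list above can be shown with a small formatter. format_bytes is a hypothetical helper that uses kB/MB for powers of 1000 and KiB/MiB for powers of 1024. Note that %.1f in snprintf follows LC_NUMERIC, so the decimal point is "." only while the program is in the default "C" locale.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a byte count with SI prefixes (kB = 1000 B)
// or IEC binary prefixes (KiB = 1024 B), one decimal place above plain bytes.
std::string format_bytes(std::uint64_t bytes, bool binary) {
    const double base = binary ? 1024.0 : 1000.0;
    const char* si[] = {"B", "kB", "MB", "GB", "TB"};
    const char* iec[] = {"B", "KiB", "MiB", "GiB", "TiB"};
    const char** units = binary ? iec : si;
    double value = static_cast<double>(bytes);
    int i = 0;
    while (value >= base && i < 4) { value /= base; ++i; }  // pick the largest fitting prefix
    char buf[32];
    if (i == 0)
        std::snprintf(buf, sizeof buf, "%llu %s", static_cast<unsigned long long>(bytes), units[0]);
    else
        std::snprintf(buf, sizeof buf, "%.1f %s", value, units[i]);  // uses LC_NUMERIC's decimal point
    return buf;
}
```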

  43. Yesterday, when I talked about that post from Raymond about numeric grouping the locale sensitive way,…

  44. Raymond, could you point this out to the guys and gals on the SQL Server team? Their CAST and CONVERT functions provide no option to allow a comma to be used as the decimal separator.

    http://msdn2.microsoft.com/en-us/library/ms187928(SQL.90).aspx

  45. rolfhub says:

    What bugs me much more than different grouping of numbers are the different orders of the parts of a date, I think I’ve seen all of the following:

    DD.MM.YY (or DD.MM.YYYY) – for example: Germany

    MM/DD/YY (or MM/DD/YYYY) – for example: US

    YY/MM/DD (or YYYY/MM/DD) – not sure where

    YYYY-MM-DD – international

    WWYY (ISO week and year) – microchips

    YYWW (ISO week and year) – microchips

    MMYY (or MMYYYY or MM.YY or MM.YYYY) – expiration date of food etc.

    DD.MM (or DDMM) – expiration date of food etc.

    The bad thing is, you sometimes just can’t tell for sure which system was used. The separator char (‘.’ or ‘/’ or ‘-’ or whatever) doesn’t always point towards one specific order, so sometimes it’s really ambiguous.

    Bonus nuisance: The USA seems to be the only country (I know of) that insists on Sunday as the first day of the week; in other countries it’s Monday … but of course one can’t say that one day is more logical than the other, it’s just a matter of taste (or definition).

    … but it’s all chaos, even though it could be so easy …

  46. Joe Butler says:

    To: steveg

    Make the Office language settings stick to your preferred option by using the Start Menu, Office Tools, Language Settings link.

    I read somewhere that timetables deliberately use 1159 or 1201 and 2359 or 0001 in preference to 1200 and 0000 to avoid confusion.

    Question:

    How would a decimal be written in a maths paper in countries that use commas and other separators?

    e.g. pi = 3.141…

    or pi= 3,141…

    Anyone know?

    I understand that primary schools prefer a distinct vertically-centred dot rather than a period on the baseline (is this specific to the UK/US or is it the correct mathematical character)?

  47. Xan says:

    To Joe Butler :

    In Italy we have the comma as the decimal separator, and in maths papers you simply use a comma (no fancy rules).

    In universities it’s common to find use of a period instead of a comma which is tolerated anyway.

    Here a vertically centered dot is used for product in high schools instead of the classic "x".

  48. There was a version of Outlook that came very close to release without anyone noticing that Australia’s telephone system was switching to 10 digits. It was only because the short-lived MS dev group in Australia got access to pre-release builds that it was dog-fooded. I’m sure that Telecom Australia (the largest MS Mail/Exchange site in the world at the time) would have been doubly impressed.

    In general, English-language settings that are not all US probably fare the worst with MS testing. It’s pretty much non-existent (as testers in Windows and Office orgs have ‘fessed up to me). I used to dogfood with Australian locale or keyboard settings and watch builds of stuff crash because they assumed English always=English(US).

    There are certainly ways to be smarter about guessing languages. IE7 builds continue to assume that my default language should be English US, even though my locale & keyboard language are English(Australia) and I have a UK-layout keyboard. Still it’s a bit better than current builds of Vista which don’t allow me to choose anything OTHER than a US keyboard when I set up my account.

    There’s been a bug introduced in XP SP2 which makes the US keyboard language reinstate itself over other non-US English keyboard languages. This completely *****s spell-checking and other language-sensitive tools. I believe the bug has finally been addressed in recent Vista builds, but still nothing on the horizon for XP.

  49. Cooney says:

    Xan:

    > KiB = kibibyte

    Oh come on, that’s just silly :p Kilobytes are 1024 bytes.

  50. Norman Diamond says:

    Tuesday, April 18, 2006 1:06 PM by rolfhub

    > YY/MM/DD (or YYYY/MM/DD) – not sure where

    I was going to say that’s the international standard, which is also used in the world’s largest country (by population) and some of its neighbouring countries.  But then your next line gave the same ordering with hyphens and called that international, so I guess you mean particularly the slashes here.

    I don’t know if slashes with this order are part of a national standard but they are part of Microsoft’s standards.  The Windows 98 default was YY/MM/DD.  The Windows 2000 (and subsequent) default is YYYY/MM/DD.

    > The USA seems to be the only country (I know of) that insists on sunday as the first day of a week, in other countries it’s monday

    In countries whose calendars are based on the bible, Sunday is the first day because Saturday is the 7th day which is the holy day.  In other countries with different influences and/or indirect influences from the same original source, rules vary a bit more.  For example in many Muslim countries the workweek runs from Saturday to Thursday, so odds are that neither Sunday nor Monday is the first day.

    In Japan, maybe 95% of calendars show Sunday as the first day of the week and 5% show Monday as the first day of the week.  But in antiquity weeks were 10 days long.

  51. David Candy says:

    Xan,

    The rule, except for several defined exceptions (and bits/bytes aren’t mentioned at all), is that the unit takes its capitalisation from the prefix.

    How hard would it be for interpreters/compilers to recognise COLOUR and COLOR as equivalent words? Also Initialise/Initialize, etc.

  52. KJK::Hyperion says:

    BryanK: for the programs that don’t play nice with locales because they assume a specific one (but then ask for the "default" whatever it is), the Microsoft internationalization team has released AppLocale, an application compatibility hack that lets you override locale settings on a per-application basis

    And other than those two, there’s a third class of applications virtually unique to Windows (among which SQL Server), which never use defaults and always store a locale alongside data (what does "sorting" strings mean without a locale specification?). This is made possible by the ability of the Windows NLS library to override defaults on a per-call basis, or special abilities like that to precalculate collation data

    Finally: the UI language was made a setting distinct from the locale in Windows 2000 – before then, multiple UI languages ("MUI") were simply not supported, different languages implied different builds. What will change in Windows Vista is that the system will, for the first time ever, *only* come in a build hardcoded to English (US), allowing you to install any extra languages at will. This is already possible with current versions, but you need an English (US) edition (which may be hard to get) and the additional MUI pack (which is virtually impossible to get, since it’s only available to Volume License customers)

    An aside: Microsoft, in general, doesn’t believe in environment variables. But, rather typically, they haven’t given us a 100% viable replacement either

    Mike Williams: you can’t blame any website for getting it wrong. It’s a limit of HTTP, which constricts three Windows settings (locale, language, location) into one (locale generally being the "least wrong"). This has shaped not only browsers, but most importantly web applications too, most of which (I have seen exceptions) simply have no concept of "location", nor a "language" distinct from the "locale"

  53. Xan says:

    > The rules […] is that the unit takes it capitalisation from the prefix.

    Actually any unit has its proper capitalisation.

    A is ampere, a is are. Electronvolt is always eV whether in meV or in GeV. Hertz is Hz.

    http://physics.nist.gov/Pubs/SP811/contents.html

    http://www.abdn.ac.uk/sms/ugradteaching/guidetosiunits.shtml

    Bytes are always B, but you’re right that the symbol for bits is "bit" and not "b".

  54. David Candy says:

    I was referring to the seven base units. The Australian Government Publishing Service has recently pulled all its references from the web; otherwise I’d give links to Australian LAW.

  55. rolfhub says:

    Answer to Norman Diamond

    Tuesday, April 18, 2006 1:06 PM by rolfhub

    >> YY/MM/DD (or YYYY/MM/DD) – not sure where

    > I was going to say that’s the international standard, which is also used in the world’s largest country (by population) and some of its neighbouring countries. But then your next line gave the same ordering with hyphens and called that international, so I guess you mean particularly the slashes here.

    Yes, I wanted to say that you can’t tell the order of the fields from the separator characters alone; you can’t simply say (pseudocode):

    switch (separatorchar) {
    case '/': order = "MDY"; break;   // Order is M,D,Y
    case '-': order = "YMD"; break;   // Order is Y,M,D
    case '.': order = "DMY"; break;   // Order is D,M,Y
    }

    … because a given order can come with several different separators, so the separator alone says nothing reliable about the order. Quite a chaos.
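    Rather than guessing from the separator, a program can read the order from the locale’s own date pattern. A minimal sketch in C using the POSIX `nl_langinfo(D_FMT)` call, which is a POSIX analogue (on Windows the short-date pattern comes from the NLS locale settings instead). The helper names are hypothetical, and the parser only understands the plain `%d`, `%e`, `%m`, `%b`, `%B`, `%y` and `%Y` conversions (no `%D`, `%x`, or `E`/`O` modifiers):

```c
#include <langinfo.h>
#include <stddef.h>

/* Derive the day/month/year order from a strftime-style date pattern such
   as "%m/%d/%y" or "%d.%m.%Y", ignoring whatever separators it uses.
   Writes up to three letters ('D', 'M', 'Y') plus a terminator into order. */
void date_field_order(const char *fmt, char order[4])
{
    size_t n = 0;
    for (const char *p = fmt; *p != '\0' && n < 3; ++p) {
        if (*p != '%' || p[1] == '\0')
            continue;                 /* literal separator character */
        ++p;                          /* look at the conversion character */
        switch (*p) {
        case 'd': case 'e':           order[n++] = 'D'; break;
        case 'm': case 'b': case 'B': order[n++] = 'M'; break;
        case 'y': case 'Y':           order[n++] = 'Y'; break;
        default:                      break;  /* "%%", time fields, etc. */
        }
    }
    order[n] = '\0';
}

/* Field order of the current LC_TIME locale's date format. */
void current_locale_date_order(char order[4])
{
    date_field_order(nl_langinfo(D_FMT), order);
}
```

    Both "%Y/%m/%d" and "%Y-%m-%d" come out as "YMD", while "%m/%d/%y" comes out as "MDY", which is exactly the distinction the separator cannot make.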

    > I don’t know if slashes with this order are part of a national standard but they are part of Microsoft’s standards. The Windows 98 default was YY/MM/DD. The Windows 2000 (and subsequent) default is YYYY/MM/DD.

    I see. ISO 8601 specifies YYYY-MM-DD HH:MM:SS (HH ranging from 00 to 23) as the international standard, but the same order with slashes may be in use too (at least I’ve seen it on several websites).
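    Producing that field order needs nothing locale-specific, because the pattern is fixed. A minimal sketch using the C standard `strftime`; the wrapper name `iso8601_timestamp` is hypothetical, and it follows the comment in putting a space between date and time (the strictly combined ISO 8601 form uses a `T` there instead):

```c
#include <stddef.h>
#include <time.h>

/* Format a broken-down time as an ISO 8601 date plus 24-hour time,
   e.g. "2006-04-19 17:45:00". The buffer needs at least 20 bytes.
   Returns the number of characters written, or 0 if buf is too small. */
size_t iso8601_timestamp(const struct tm *t, char *buf, size_t size)
{
    return strftime(buf, size, "%Y-%m-%d %H:%M:%S", t);
}
```

    Because the year comes first and every field is zero-padded, strings in this form also sort correctly as plain text, regardless of the reader’s locale.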

    >> The USA seems to be the only country (I know of) that insists on sunday as the first day of a week, in other countries it’s monday

    > In countries whose calendars are based on the bible, Sunday is the first day because Saturday is the 7th day which is the holy day. In other countries with different influences and/or indirect influences from the same original source, rules vary a bit more.

    Yes, but ISO 8601:1988 defines Monday as the first day of the week. So I think it would be best to regard Monday as the first day of the week, but of course every country can have its own standards …
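    C’s `struct tm` happens to number weekdays the Sunday-first way (`tm_wday` is 0 for Sunday), so code that wants the ISO 8601 numbering has to shift it. A minimal sketch; the function name is hypothetical (C99’s `strftime` `%u` conversion yields the same 1 to 7 number as text):

```c
/* Convert C's tm_wday (0 = Sunday ... 6 = Saturday) to the ISO 8601
   weekday number (1 = Monday ... 7 = Sunday). */
int iso_weekday(int tm_wday)
{
    return (tm_wday + 6) % 7 + 1;
}
```

    Monday maps to 1 and Sunday wraps around to 7, so a week-based loop starting at 1 begins on Monday as the standard prescribes.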

    > For example in many Muslim countries the workweek runs from Saturday to Thursday, so odds are that neither Sunday nor Monday is the first day.

    Interesting; I didn’t know that days other than Sunday or Monday are used as the first day of the week. Good to know.

    > In Japan, maybe 95% of calendars show Sunday as the first day of the week and 5% show Monday as the first day of the week.

    Ah, similar to digital clocks here in Germany: most use the 24-hour system (the only one in official use), but some stick to the 12-hour system, and others are configurable.

    > But in antiquity weeks were 10 days long.

    Yes, but I think (and hope) that no countries still stick to standards THAT old ;-)

  56. Anonymous Coward says:

    What’s the correct way to display a currency value using another locale’s conventions while keeping its original currency? (I know there’s an API to format a currency value for a locale, but that’s not what I want.) For example, how do you display $12.34 (US dollars) to a French user, still in US dollars? 12,34 USD?

  57. Norman Diamond says:

    Wednesday, April 19, 2006 5:45 PM by rolfhub

    > Answer to Norman Diamond

    >> But in antiquity weeks were 10 days long.

    > Yes, but I think (and hope) that no countries still stick to standards THAT old ;-)

    True, I think ^_^ That is, I think it’s not a standard specified by law. But actually it’s still common usage.

    3月上旬 = the first 10 days of March

    3月中旬 = the middle 10 days of March

    3月下旬 = the last 10 days of March

    These used to be the first week, middle week, and last week.  Of course in antiquity months weren’t 31 days long but they weren’t 30 days either, so there was always some slop factor in these phrases.  But weeks were 10 days long.
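    The ten-day periods above map directly onto day-of-month ranges. A minimal sketch in C, assuming the conventional boundaries (上旬 = days 1 to 10, 中旬 = 11 to 20, 下旬 = 21 to the end of the month, so the last period stretches or shrinks with the month length); the function name `dekad_of_day` is hypothetical:

```c
/* Classify a day of the month (1-31) into the traditional ten-day periods:
   1 = 上旬 (days 1-10), 2 = 中旬 (days 11-20), 3 = 下旬 (day 21 to month end).
   The last period absorbs the 31st, and is shorter in February. */
int dekad_of_day(int mday)
{
    if (mday <= 10)
        return 1;
    if (mday <= 20)
        return 2;
    return 3;
}
```

    This is the "slop factor" from the comment in code form: only the third period varies, holding anywhere from 8 to 11 days.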

    Also, some cities acquired names based on which day of the week was their market day, instead of having names assigned by whatever bureaucratic procedures, and some of those cities still have those names. 5th Day Market is a city in Tokyo. I don’t remember where 4th Day Market and 10th Day Market are.

  58. David Candy says:

    Just in case cultural blindness hides from Raymond why non-Americans find this important.

    In my country many people think commas are the digit grouping character. When I started school they were (but not by the time I finished). But now software tells people what the standards are, and we see American standards slipping into everything. This includes programming with colour/color and the like (and how many syntax errors have I copped from that? Millions). Also, résumés are all based on MS Word templates; yuck, the Word team has always made ugly documents.

    Also, Monday is the American standard for the first day of the week. The ISO standard has been adopted by two American standards bodies. Having read the standard, though, I don’t see that it applies to anything other than the storage of dates. Just because the computer likes it, I don’t see that it applies to me.

    The Year-Month-Day Date Format is defined for use in the following standards documents:

    International Implementation:

    International Standard: ISO 8601:1988

    (ISO 8601 replaces ISO 2014, ISO 2015, ISO 2711, ISO 3307 and ISO 4031).

    Other Major Implementations:

    European Norm: EN 28601:1992.

    USA Standard: ANSI X3.30-1985(R1991).

    USA Standard: NIST FIPS 4-1.

    Japan: JIS X 0301-1992.

    Canada: CSA Z234.5:1989.

    Australia: AS 3802:1997.

    South Africa: ARP 010:1989.

  59. rolfhub says:

    Answer to Norman Diamond

    > Wednesday, April 19, 2006 5:45 PM by rolfhub

    >> Answer to Norman Diamond

    >>> But in antiquity weeks were 10 days long.

    >> Yes, but I think (and hope) that no countries still stick to standards THAT old ;-)

    > True I think ^_^ That is, I think it’s not a standard specified by law. But actually it’s still common usage.

    > 3月上旬 = the first 10 days of March

    > 3月中旬 = the middle 10 days of March

    > 3月下旬 = the last 10 days of March

    > These used to be the first week, middle week, and last week. Of course in antiquity months weren’t 31 days long but they weren’t 30 days either, so there was always some slop factor in these phrases. But weeks were 10 days long.

    I see. If months were always 30 days long, it would even make a nice round system: 1 year = 12 months, 1 month = 3 weeks = 30 days, 1 week = 10 days. That would be nice.

    > Also some cities acquired names based on which day of the week they had their market day instead of having names assigned by whatever bureaucratic procedures, and some of those cities still have those names. 5th Day Market is a city in Tokyo. I don’t remember where 4th Day Market and 10th Day Market are.

    That doesn’t sound sensible to me: what if a city changes its market day? Does it have to change its name too? Quite strange. But an interesting bit of history.

  60. rolfhub says:

    Answer to David Candy

    > Just in case cultural blindness hides from Raymond why non americians find this important.

    I can’t speak for Raymond, but I’m quite sure he is not guilty of cultural blindness; from reading this blog, I get the impression that he is very aware of cultural differences. Sadly, though, cultural blindness seems to be quite common among Americans (at least that’s the impression I’ve gotten over the years). But that isn’t the fault of Raymond (or Microsoft), of course.

    > In my country many people think commas are the digit grouping character. When I started school they were (but not when I finished). But now software tells people what the standards are. We see americian standards slipping into everything.

    Well, that’s just what this blog entry is about: getting computers to use the established standard of the user’s culture, so that it is "just right" for the user.

    > Also Monday is the Americian standard for first day of week.

    Well, I think the American standard is Sunday; Monday is the ISO standard (widely used throughout the world), but I repeat myself (see previous posts for details).

  61. Ken Hagan says:

    Following up on rolfhub…

    > But sadly cultural blindness seem to be quite common amongst americans …

    That’s probably because exposure to foreign cultures is so hard to get (at least on a day-to-day basis) in the US. If you live almost anywhere else on the planet, getting exposure to one other culture (the US) is relatively easy, and in most of the developed world children are bi-cultural from an early age (and American English is probably the commonest second language). Awareness of cultural variation is instinctive and compulsory almost everywhere outside the US, but has to be learned and is in any case optional inside it.

    Americans should try harder and we should be more constructive and gently persuasive in criticising them. (For the record, I’m British and have similar problems with living in a linguistic monoculture but with the additional irritant of substantial parts of that monoculture not being my own. Have Microsoft any plans to produce an English version of Windows?)

  62. rolfhub says:

    Answer to Ken Hagan

    > Following up on rolfhub…

    >> But sadly cultural blindness seem to be quite common amongst americans …

    > That’s probably because exposure to foreign cultures is so hard to get (at least on a day to day basis) in the US. If you live almost anywhere else on the planet, getting exposure to one other culture (the US) is relatively easy

    Yes, but one would think that most people involved in creating computer hardware and software use the internet often (it’s just so extremely useful in many ways), getting quite a lot of exposure to cultural differences and becoming sensitive to them that way. So that shouldn’t be too much of a problem, at least in the IT sector, but it doesn’t seem to work well …

    > and in most of the developed world children are bi-cultural from an early age (and American English is probably the commonest second language).

    Well, I’m not sure it’s American English. I think most schools that teach English teach the British variant (back in school we were told we’d be learning the "official" British English (Oxford English), and that it would be the normal thing around the globe), but the differences between British and American English seem marginal to me … so I think you’re right.

    > Awareness of cultural variation is instinctive and compulsory almost everywhere outside the US, but has to be learned and is in any case optional inside it.

    Yes, but on the other hand most countries are smaller than the U.S., so their inhabitants get more direct exposure to their neighbours, because they’re much nearer. (And if you live in the E.U., thanks to the E.U. they are really close neighbours now, so it’s much more natural to get some cultural exposure frequently.)

    > Americans should try harder and we should be more constructive and gently persuasive in criticising them.

    Agreed.

    > (For the record, I’m British and have similar problems with living in a linguistic monoculture but with the additional irritant of substantial parts of that monoculture not being my own. Have Microsoft any plans to produce an English version of Windows?)

    Are the linguistic differences between Great Britain and the U.S.A. really that large? I got the impression that they are really minor, but of course my impression could be faulty …

  63. David Candy says:

    Another thing: the centimetre is not a legal unit in Australia. Millimetres, metres, and kilometres are the units of length. Yet Word, WordPad, IE, and everything else use centimetres (if not inches, as in DPI; isn’t 600 dots per 25.4 mm far easier to understand?) for things such as page margins.
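    The DPI arithmetic the comment alludes to is a single division, since the international inch is exactly 25.4 mm. A minimal sketch; the function names are hypothetical:

```c
/* One international inch is exactly 25.4 mm (1959 definition). */
#define MM_PER_INCH 25.4

/* Convert a printer or screen resolution from dots per inch to dots per mm. */
double dpi_to_dots_per_mm(double dpi)
{
    return dpi / MM_PER_INCH;
}

/* And back again: dots per millimetre to dots per inch. */
double dots_per_mm_to_dpi(double dots_per_mm)
{
    return dots_per_mm * MM_PER_INCH;
}
```

    For example, 600 dpi comes out at roughly 23.6 dots per millimetre, which is why a metric-minded user might prefer the unconverted "600 dots per 25.4 mm".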

  64. rolfhub says:

    Answer to David Candy

    > Another thing is that centimetre is not a legal unit in Australia. Millimetres, metres, and kilometres are the units of length. Yet Word, Wordpad, IE, or anywhere use centimetre.

    Well, that’s really strange, since the SI prefixes are just that: prefixes with a fixed meaning, to be combined with any SI unit (where applicable). So every established prefix

    exa   – 10^18
    peta  – 10^15
    tera  – 10^12
    giga  – 10^9
    mega  – 10^6
    kilo  – 10^3
    hecto – 10^2
    deka  – 10^1
    deci  – 10^(-1)
    centi – 10^(-2)
    milli – 10^(-3)
    micro – 10^(-6)
    nano  – 10^(-9)
    pico  – 10^(-12)
    femto – 10^(-15)
    atto  – 10^(-18)

    combined (for example) with metre (or "meter", as I know it) should make a legal unit. To have some of them but not others is quite strange.

    But I guess every country has its fair share of strange peculiarities that make no sense but nevertheless stay that way; that’s life.
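    The "powers of 3" convention mentioned later in the thread (engineering notation) uses only the prefixes whose exponent is a multiple of three, skipping hecto, deka, deci and centi. A minimal sketch in C covering atto to exa; the function names are hypothetical, and values outside that range are simply left unscaled beyond the clamp:

```c
#include <stdio.h>

/* SI prefixes for exponents -18, -15, ..., +18 (engineering notation). */
static const char *const si_prefixes[] = {
    "a", "f", "p", "n", "\u00B5" /* micro sign */, "m", "",
    "k", "M", "G", "T", "P", "E"
};

/* Scale value into [1, 1000) by a power of 1000 and return that power's
   decimal exponent (a multiple of 3, clamped to -18..18). */
int si_scale(double value, double *mantissa)
{
    double m = value < 0 ? -value : value;
    int exp = 0;
    if (m != 0.0) {
        while (m >= 1000.0 && exp < 18) { m /= 1000.0; exp += 3; }
        while (m < 1.0 && exp > -18)    { m *= 1000.0; exp -= 3; }
    }
    *mantissa = value < 0 ? -m : m;
    return exp;
}

/* Write e.g. "1.5 km" for (1500.0, "m"); returns snprintf's result. */
int format_si(double value, const char *unit, char *buf, size_t size)
{
    double mantissa;
    int exp = si_scale(value, &mantissa);
    return snprintf(buf, size, "%g %s%s", mantissa,
                    si_prefixes[exp / 3 + 6], unit);
}
```

    Under this rule 0.0123 m is written as "12.3 mm" rather than "1.23 cm", matching the preference for millimetres expressed in the thread.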

  65. Norman Diamond says:

    In Japan people’s heights are measured in centimetres and paper sizes are measured in millimetres, so a guess is that both are probably legal. Though actually that’s a meaningless guess, because it doesn’t matter whether they’re legal or not; social conventions take priority over law, even legally. Anyway, that is how things are done. Except that Windows doesn’t let us specify which metric units, so sometimes odd units pop up. Of course it’s trivial to move the decimal point, but it still takes a moment to notice that we do have to move the decimal point and adjust the unit in order to compare the number to other numbers that we’re accustomed to.

  66. David Candy says:

    We use the powers of 3 rule unless otherwise specified.

    http://www.measurement.gov.au/index.cfm?event=object.showContent&objectID=C3CDFE95-BCD6-81AC-124CE10A492450C2 is where the people in charge of measurements are. Centi is legal in a strict sense, just not to be used.

    Australia converted to metric successfully, where the US and UK did not, because of laws mandating the units. You weren’t even allowed to import rulers with dual markings.

    Working in technical fields, I can convert most units from memory (i.e. I don’t calculate; I remember the nearest answer, like [all approx.] 5/8 in = 16 mm, 1 mile = 1600 m, 1 NM = 2 km, 1 chain [22 yd] = 20 m).
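    How close are those rules of thumb? A minimal sketch in C that checks them against the exact definitions (international inch and yard, international nautical mile); the function names are hypothetical:

```c
/* Exact definitions: international inch and yard, international nautical mile. */
#define MM_PER_INCH  25.4
#define M_PER_MILE   1609.344   /* 1760 yd x 0.9144 m */
#define M_PER_NMI    1852.0
#define M_PER_CHAIN  20.1168    /* 22 yd x 0.9144 m */

double inches_to_mm(double inches) { return inches * MM_PER_INCH; }
double miles_to_m(double miles)    { return miles * M_PER_MILE; }
double nmi_to_m(double nmi)        { return nmi * M_PER_NMI; }
double chains_to_m(double chains)  { return chains * M_PER_CHAIN; }

/* Relative error of a remembered approximation against the exact value:
   positive means the rule of thumb overestimates. */
double rel_error(double approx, double exact)
{
    return (approx - exact) / exact;
}
```

    Run against the comment’s figures, 16 mm for 5/8 in is about 0.8% high, 1600 m per mile and 20 m per chain are both about 0.6% low (a chain is exactly 1/80 mile), and 2 km per nautical mile is the loosest at about 8% high.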

    I must admit that as far as humans go I’m still Imperial, with feet and inches and stones and pounds. No point telling me a criminal is 1.8 m and 80 kg; I have no idea what that looks like. I’m 6′ and around 11 st.

Comments are closed.


*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.

WHY DID I DUPLICATE THIS CONTENT HERE? Let me first say this site has never had anything to sell and has never shown ads of any kind. I have nothing monetarily to gain by duplicating content here. Because I had made my own local copy of this content throughout the years, for ease of using tools like grep, I decided to put it online after I discovered some of the original content previously and publicly available, had disappeared approximately early to mid 2019. At the same time, I present the content in an easily accessible theme-agnostic way.

The information provided by Raymond's blog is, for all practical purposes, more authoritative on Windows Development than Microsoft's own MSDN documentation and should be considered supplemental reading to that documentation. The wealth of missing details provided by this blog that Microsoft could not or did not document about Windows over the years is vital enough, many would agree an online "backup" of these details is a necessary endeavor. Specifics include:
