On the ambiguity of uniqueness

Date:December 14, 2005 / year-entry #384
Tags:code
Orig Link:https://blogs.msdn.microsoft.com/oldnewthing/20051214-15/?p=32983
Comments:    27
Summary:The MSDN documentation for System.Object.GetHashCode says [T]he implementation of GetHashCode provided by the String class returns unique hash codes for unique string values. This is another case of ambiguous use of the word "unique". The intended meaning is "for each string value, the same hash code is returned". Even though "unique" means "one and only...

The MSDN documentation for System.Object.GetHashCode says

[T]he implementation of GetHashCode provided by the String class returns unique hash codes for unique string values.

This is another case of ambiguous use of the word "unique". The intended meaning is "for each string value, the same hash code is returned".

Even though "unique" means "one and only one", the domain in which the term applies is often left unsaid, as here, where the domain of comparison is "all the hash codes returned for a specific string value". If you instead misinterpreted the domain as "all the hash codes returned for all string values", then you end up erroneously concluding that no two strings hash to the same value.

Another conflicting sense of "unique" is "you get the same one each time" as opposed to "you get a different one each time".

  • GetCurrentProcessId returns a unique value that identifies the process. You get the same one each time.
  • CoCreateGuid returns a unique GUID. You get a different one each time.

In the original C standard, malloc(0) is permitted to return NULL or "a unique pointer". What does "unique" mean here? Does it mean that the non-NULL return value is always the same? Can I assert(malloc(0) == malloc(0))? Or does it mean that the non-NULL return value is a value distinct from any other return value from malloc()?

In Technical Corrigendum 1, this ambiguity was resolved by removing the word "unique". Instead, the specification says "as if the size were some nonzero value" which makes it clear that it is the second interpretation that is intended.

My suggestion: Don't use the word "unique". It's too ambiguous.


Comments (27)
  1. Kuwanger says:

    ‘In the original C standard, malloc(0) is permitted to return NULL or "a unique pointer". … Can I assert(malloc(0) == malloc(0))?’

    How I read that (unless you’ve included more ambiguity) is malloc(0) could return NULL one time and "a unique pointer" another, so even if "a unique pointer" was always the same value, you couldn’t ‘assert(malloc(0) == malloc(0))’.

    Or am I over quibbling?

  2. mike says:

    How about finding someone who can write?

  3. Kuwanger: Yes, it could return NULL sometimes and a unique pointer sometimes, but try capturing that in a single expression that doesn’t ruin the flow of the paragraph. "Can I void *p1=malloc(0), p2=malloc(0), assert(!p1 || !p2 || p1==p2)?" doesn’t quite have the same ring to it. Everybody knew what I meant, no need to get bogged down in details.

  4. dave says:

    My suggestion: Don’t use the word "unique".

    >It’s too ambiguous.

    That’s a reasonable default :-)

  5. What a unique way of putting things. :: grins :: Sorry, I couldn’t resist.

  6. Yeesh. Like I said, "No need to get bogged down in details". Did somebody declare today to be "nitpick day" without telling me? (Besides, I already set the context to "malloc(0) returns a non-NULL value" in the previous paragraph and re-emphasized it in the previous sentence. Do I have to worry about every single sentence being taken out of context?)

  7. James Schend says:

    I hereby declare December 14th, 2005 as Worldwide Nitpick Day!

    Sorry you didn’t get that memo, Raymond.

    And Aaron: Leave the comedy to DJ Noname and the 107.7 The End morning show.

  8. Lance Fisher says:

    Aaron, there are those who appreciate cheesy puns and those who don’t. I do.

  9. K. Henriksen says:

    I’d write something like "GetHashCode provided by the String class returns a deterministic hash code when used on a string value."

  10. James says:

    Did the MSDN documentation even really need to say "unique hash codes"? Why not just say "hash codes"? It seems like a standard definition of hashing to me, and the extra verbiage just makes it confusing. Do people ever make hashing implementations that /aren’t/ deterministic?

  11. jon says:

    It’s very annoying that GetHashCode() doesn’t return a unique hash code! What if I wanted to be able to uniquely identify a string via its hash code??? Could you please add a GetHashCodeUnique() method into the next version of .NET please??? Thank you!!!

  12. Frank Richter says:

    "Could you please add a GetHashCodeUnique()"

    The very nature of hashes makes this impossible – you’re mapping a string of arbitrary length to a number of fixed size. You just can’t give every string a unique numbers, you don’t have enough.

    A point to start further research: http://en.wikipedia.org/wiki/Hashing

  13. Anonymous Coward says:

    http://burtleburtle.net/bob/hash/perfect.html

    It is possible to produce unique hashes for all valid inputs to a function – provided, of course, that you choose a suitable subset of inputs to define as ‘valid’.

    Not a general purpose tool, obviously, but still interesting.

    (And it’s always nitpick day on the internet.)

  14. Jonathan says:

    Here’s how I would write it:

    [T]he implementation of GetHashCode provided by the String class returns hash codes for string values. Identical string values are guaranteed to return identical hash codes.

  15. Wang-Lo says:

    "Could you please add a GetHashCodeUnique()"

    I once coded this function for a minicomputer operating system. My implementation worked without error but was rarely used due to its poor static efficiency.

    However for those developers who could afford the space, my function could generate a domain-unique hash key for strings up to 2^10 bytes long using only 1024 bytes for the key.

    -Wang-Lo.

  16. strik says:

    OldNewThing: Make it "void *p1=malloc(0), *p2=malloc(0); assert(!p1 || !p2 || p1==p2)" to do what you most probably intended (note the missing "*" before p2 and the replacement of ";" for "," before the assert().

  17. MSDN has a wide audience. Usually I get complaints from people that MSDN doesn’t state *enough* obvious things, not that it states too many…

  18. Norman Diamond says:

    But the word "unique" isn’t ambiguous enough. If you want it too ambiguous, you need uniques. And if you have get even more of them you have multiques.

    Reinder: malloc(10) isn’t malloc(0) so the particular question here doesn’t arise. Though now I feel compelled to reread that section of the standard to see if there’s another ambique statement along the lines you mentioned.

  19. Mr. Ed says:

    Oracle databases have UNIQUE indexes. But people have a very hard time understanding that an Oracle UNIQUE index can still contain *multiple* NULL values.

  20. Reinder says:

    It is good to hear that it is "Worldwide Nitpick Day".

    "Or does it mean that the non-NULL return value is a value distinct from any other return value from malloc()?"

    Regardless of the parameter passed to it, malloc never guarantees that its return value is distinct from any other return value from malloc. Consider the following:

    void * p = malloc( 10);

    free(p);

    void * q = malloc( 10);

    I doubt that it was ever the intention of the C standards committee to require p and q to be different here.

  21. Ritchie says:

    Why would you want to call malloc(0)? It doesn’t seem to do anything useful, unless it’s to coerce your implementation of malloc() to do something weird behind the scenes stuff.

  22. Stu says:

    Ritchie: Why would you want to call malloc(0)?

    So you can realloc() later…

  23. Alex Blekhman says:

    James Schend wrote:

    > I hereby declare December 14th, 2005 as

    > Worldwide Nitpick Day!

    Don’t be so light-hearted about that day. 66 years ago, on December 14 Soviet Union was expelled from League of Nations because violations of treaties with Finland and attacking it. That was a heck of a lot of nitpicking!

  24. RevMike says:

    "Oracle databases have UNIQUE indexes. But people have a very hard time understanding that an Oracle UNIQUE index can still contain *multiple* NULL values."

    This makes perfect sense. NULL is not a value, but the absence of value. Once people properly understand this, SQL’s NULL handling is sensible.

  25. MB says:

    "Oracle databases have UNIQUE indexes. But people have a very hard time understanding that an Oracle UNIQUE index can still contain *multiple* NULL values."

    "This makes perfect sense. NULL is not a value, but the absence of value. Once people properly understand this, SQL’s NULL handling is sensible."

    Sql Server doesn’t allow multiple NULL values in a unique constraint…and the reasoning is the same. NULL is not only the abscense of a value, but is an UNKNOWN value. Therefore, it is UNKNOWN whether 2 NULLs are equal. So, 2 (possibly) equal NULL values can’t be allowed into a unique constraint.

    For those solid reasons that you want Oracle behavior in Sql Server 2000, create an indexed view on the table that excludes nulls and enforce a unique constraint on the view.

  26. nil says:

    null in oracle sux, can’t save an empty string without it becoming null.

Comments are closed.


*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.

WHY DID I DUPLICATE THIS CONTENT HERE? Let me first say this site has never had anything to sell and has never shown ads of any kind. I have nothing monetarily to gain by duplicating content here. Because I had made my own local copy of this content throughout the years, for ease of using tools like grep, I decided to put it online after I discovered some of the original content previously and publicly available, had disappeared approximately early to mid 2019. At the same time, I present the content in an easily accessible theme-agnostic way.

The information provided by Raymond's blog is, for all practical purposes, more authoritative on Windows Development than Microsoft's own MSDN documentation and should be considered supplemental reading to that documentation. The wealth of missing details provided by this blog that Microsoft could not or did not document about Windows over the years is vital enough, many would agree an online "backup" of these details is a necessary endeavor. Specifics include:

<-- Back to Old New Thing Archive Index