The COM interface contract rules exist for a reason

Date: November 1, 2005 / year-entry #329
Tags: code
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20051101-54/?p=33533
Comments: 17

Some people believe that the COM rules on interfaces are needlessly strict. But the rules are there for a reason.

Suppose you ship some interface in version N of your product. It's an internal interface, not documented to outsiders. Therefore, you are free to change it any time you want without having to worry about breaking compatibility with any third-party plug-ins.

But remember that if you change an interface, you need to generate a new Interface Identifier (IID). Because an interface identifier uniquely identifies the interface. (That's sort of implied by its name, after all.)

And this rule applies even to internal interfaces.

Suppose you decide to violate this rule and use the same IID to represent a slightly different interface in version N+1 of your program. Since this is an internal interface, you have no qualms about doing this.

Until you have to write a patch that services both versions.

Now your patch is in trouble. It can call IUnknown::QueryInterface and ask for that IID, and it will get something back. But you don't know whether this is the version N interface or the version N+1 interface. If you're not even aware that this has happened, your patch will probably just assume it has the version N+1 interface, and strange things happen when it is run on version N.

Debugging this problem is not fun. Neither is fixing it. Your patch has to use some other cues to decide which interface it actually got back. If your program has been patched previously, you need to have the version numbers of every single patch so that you can determine which version of the interface you have.
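For contrast, here is a minimal sketch of what the patch looks like when the rule is followed and each version of the interface has its own IID. It is not real COM: `IUnknownLite`, `Guid`, `IInternal`, `IInternal2`, both IID values, and the `GetValue`/`GetValueScaled` methods are hypothetical stand-ins invented for illustration, and reference counting is left out to keep it short. The point is only that the patch can ask the object which shape it has instead of guessing.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable stand-ins for the COM basics; real code would include <unknwn.h>.
using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_NOINTERFACE = static_cast<HRESULT>(0x80004002);

struct Guid {
    std::uint32_t data1; std::uint16_t data2, data3; std::uint8_t data4[8];
};
inline bool operator==(const Guid& a, const Guid& b) {
    return a.data1 == b.data1 && a.data2 == b.data2 && a.data3 == b.data3 &&
           std::memcmp(a.data4, b.data4, sizeof a.data4) == 0;
}

// Hypothetical IIDs: one per version of the interface.
constexpr Guid IID_IInternal  = {0x11111111, 0, 0, {0, 0, 0, 0, 0, 0, 0, 1}};
constexpr Guid IID_IInternal2 = {0x11111111, 0, 0, {0, 0, 0, 0, 0, 0, 0, 2}};

struct IUnknownLite {                 // AddRef/Release omitted in this sketch
    virtual HRESULT QueryInterface(const Guid& iid, void** ppv) = 0;
    virtual ~IUnknownLite() = default;
};
struct IInternal : IUnknownLite {     // version N shape
    virtual int GetValue() = 0;
};
struct IInternal2 : IInternal {       // version N+1 shape: new method, new IID
    virtual int GetValueScaled(int factor) = 0;
};

class ComponentN : public IInternal { // what version N ships
public:
    HRESULT QueryInterface(const Guid& iid, void** ppv) override {
        assert(ppv != nullptr);
        if (iid == IID_IInternal) { *ppv = static_cast<IInternal*>(this); return S_OK; }
        *ppv = nullptr;
        return E_NOINTERFACE;
    }
    int GetValue() override { return 7; }
};

class ComponentN1 : public IInternal2 { // what version N+1 ships
public:
    HRESULT QueryInterface(const Guid& iid, void** ppv) override {
        assert(ppv != nullptr);
        if (iid == IID_IInternal2) { *ppv = static_cast<IInternal2*>(this); return S_OK; }
        if (iid == IID_IInternal)  { *ppv = static_cast<IInternal*>(this);  return S_OK; }
        *ppv = nullptr;
        return E_NOINTERFACE;
    }
    int GetValue() override { return 7; }
    int GetValueScaled(int factor) override { return 7 * factor; }
};

// A patch that services both versions: probe for the newer IID first,
// then fall back to the older one. No external version cues are needed.
int PatchedGetValue(IUnknownLite* punk, int factor) {
    void* pv = nullptr;
    if (punk->QueryInterface(IID_IInternal2, &pv) >= 0) {
        return static_cast<IInternal2*>(pv)->GetValueScaled(factor);
    }
    if (punk->QueryInterface(IID_IInternal, &pv) >= 0) {
        return static_cast<IInternal*>(pv)->GetValue() * factor;
    }
    return -1; // neither shape is supported
}
```

Had both versions answered to the same IID, the first `QueryInterface` call would succeed on version N as well, and the patch would call a method that the version N object does not have.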

Note that this dependency can be hidden behind other interfaces. Consider:

[
    uuid("ABC")
]
interface IColorInfo
{
    HRESULT GetBackgroundColor([out] COLORREF *pcr);
    ...
};

[
    uuid("XYZ")
]
interface IGraphicImage
{
    ...
    HRESULT GetColorInfo([out] IColorInfo **ppci);
};

Suppose you want to add a new method to the IColorInfo interface:

[
    uuid("DEF")
]
interface IColorInfo
{
    HRESULT GetBackgroundColor([out] COLORREF *pcr);
    ...
    HRESULT AdjustColor(COLORREF clrOld,
                        COLORREF clrNew);
};

[
    uuid("XYZ")
]
interface IGraphicImage
{
    ...
    HRESULT GetColorInfo([out] IColorInfo **ppci);
};

You changed the interface, but you also changed the IID, so everything is just fine, right?

No, it isn't.

The IGraphicImage interface is dependent upon the IColorInfo interface. When you changed the IColorInfo interface, you implicitly changed the IGraphicImage::GetColorInfo method, since the returned interface is now the version N+1 IColorInfo interface.

Consider a patch written with the version N+1 header files.

void AdjustGraphicColorInfo(IGraphicImage* pgi,
                            COLORREF clrOld, COLORREF clrNew)
{
 IColorInfo *pci;
 if (SUCCEEDED(pgi->GetColorInfo(&pci))) {
  pci->AdjustColor(clrOld, clrNew);
  pci->Release();
 }
}

If run against version N, the call to IGraphicImage::GetColorInfo will return a version N IColorInfo, and that version doesn't support the IColorInfo::AdjustColor method. But you're going to call it anyway. Result: Walking off the end of the version N vtable and calling into space.

The quick solution is to change the IID for the IGraphicImage interface to reflect the change on the IColorInfo interface on which it depends.

[
    uuid("UVW")
]
interface IGraphicImage
{
    ...
    HRESULT GetColorInfo([out] IColorInfo **ppci);
};

A more robust fix would be to change the IGraphicImage::GetColorInfo method so that you pass the interface you want to receive.

[
    uuid("RST")
]
interface IGraphicImage
{
    ...
    HRESULT GetColorInfo([in] REFIID riid,
                         [iid_is(riid), out] void** ppv);
};

This allows interfaces on which IGraphicImage depends to change without requiring a change to the IGraphicImage interface itself. Of course, the implementation needs to change to respond to the new value of IID_IColorInfo. But now the caller can feel safe in the knowledge that when it asks for an interface, it's actually getting it and not something else that coincidentally has the same name.
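A rough sketch of how the riid/ppv version behaves, written as portable C++ rather than real COM so it can be compiled anywhere. Everything here beyond the method names from the IDL above is an assumption made for illustration: the `IID` type is shrunk to an integer, the two layouts are given distinct names (`IColorInfoN`, `IColorInfoN1`), the behavior of `AdjustColor` is made up, and reference counting is omitted.

```cpp
#include <cassert>
#include <cstdint>

using HRESULT = std::int32_t;
using COLORREF = std::uint32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_NOINTERFACE = static_cast<HRESULT>(0x80004002);

// Simplified interface identifier; real COM uses a 128-bit GUID.
using IID = std::uint32_t;
constexpr IID IID_IColorInfoN  = 0xABC; // stands in for uuid("ABC")
constexpr IID IID_IColorInfoN1 = 0xDEF; // stands in for uuid("DEF")

struct IColorInfoN {                    // version N layout
    virtual HRESULT GetBackgroundColor(COLORREF* pcr) = 0;
    virtual ~IColorInfoN() = default;
};
struct IColorInfoN1 : IColorInfoN {     // version N+1 layout adds a method
    virtual HRESULT AdjustColor(COLORREF clrOld, COLORREF clrNew) = 0;
};
struct IGraphicImage {                  // the riid/ppv form of GetColorInfo
    virtual HRESULT GetColorInfo(IID riid, void** ppv) = 0;
    virtual ~IGraphicImage() = default;
};

class ColorInfoN : public IColorInfoN {
public:
    HRESULT GetBackgroundColor(COLORREF* pcr) override { *pcr = 0x00FFFFFF; return S_OK; }
};
class ColorInfoN1 : public IColorInfoN1 {
public:
    HRESULT GetBackgroundColor(COLORREF* pcr) override { *pcr = background_; return S_OK; }
    // Hypothetical behavior: swap the background if it matches clrOld.
    HRESULT AdjustColor(COLORREF clrOld, COLORREF clrNew) override {
        if (background_ == clrOld) background_ = clrNew;
        return S_OK;
    }
private:
    COLORREF background_ = 0x00FFFFFF;
};

class ImageN : public IGraphicImage {   // version N: knows only the old IID
public:
    HRESULT GetColorInfo(IID riid, void** ppv) override {
        assert(ppv != nullptr);
        *ppv = nullptr;
        if (riid != IID_IColorInfoN) return E_NOINTERFACE;
        *ppv = static_cast<IColorInfoN*>(&info_);
        return S_OK;
    }
private:
    ColorInfoN info_;
};
class ImageN1 : public IGraphicImage {  // version N+1: answers both IIDs
public:
    HRESULT GetColorInfo(IID riid, void** ppv) override {
        assert(ppv != nullptr);
        *ppv = nullptr;
        if (riid == IID_IColorInfoN1)     *ppv = static_cast<IColorInfoN1*>(&info_);
        else if (riid == IID_IColorInfoN) *ppv = static_cast<IColorInfoN*>(&info_);
        else return E_NOINTERFACE;
        return S_OK;
    }
private:
    ColorInfoN1 info_;
};

// The patch, rewritten to name the interface it needs.
HRESULT AdjustGraphicColorInfo(IGraphicImage* pgi, COLORREF clrOld, COLORREF clrNew) {
    void* pv = nullptr;
    HRESULT hr = pgi->GetColorInfo(IID_IColorInfoN1, &pv);
    if (hr >= 0) {
        hr = static_cast<IColorInfoN1*>(pv)->AdjustColor(clrOld, clrNew);
    }
    return hr; // E_NOINTERFACE on version N instead of a stray vtable call
}
```

Run against the version N image, the patch gets a clean failure code it can act on. Run against version N+1, it gets exactly the layout it was compiled for.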


Comments (17)
  1. AB says:

    A previous company I worked for used a (misguided) attempt to get around this requirement: Instead of creating a new interface, there was general agreement that you could add new functions to an interface, but only to the end. As long as the client knew that it was talking to a newer version of the service (which it did), it could safely call the additional function. In theory, older clients would only call the first N functions in the interface, which would always be safe, whether the service was new or old.

    Now this might have worked if we had been using ‘real’ COM (or maybe we would have run into the same problem). However, in order to run cross-platform, we had our own ‘COM lite’ implementation which in almost all respects worked the same. Instead of constructing the interfaces using IDL, we just used a C++ abstract base class. People just added functions to the end of the list, and since they get added to the end of the vtable, we were safe, as long as we were careful.

    And then one day we started getting crash reports. When I looked into it, I discovered that the compiler had decided to be ‘smart’: We had an interface like this:

    interface IMyComponent {

    HRESULT DoSomething( long x );

    HRESULT Print( long y );

    }

    Then someone had added:

    HRESULT DoSomething2( long x, long y );

    And the compiler, instead of adding it to the bottom of the vtable, like it always had before, decided that DoSomething2 belonged together with DoSomething, so the vtable looked like:

    DoSomething

    DoSomething2

    Print

    An old client tried to call Print, which ended up calling DoSomething2, which not only did the wrong thing, but popped too many arguments off the stack… Now I’m not necessarily going to call this a compiler bug, because the whole vtable concept is just an implementation detail, but it was certainly unexpected behavior. (Since we had a component autoupdate system, we were able to release a 3rd component that would detect the situation, ‘fix up’ the vtable, and keep the system alive long enough for it to be updated.)

    Lesson learned: stop modifying interfaces, even at the end…

  2. Chris Becke says:

    hmmm, given that, once you are done with all the IDL stuff, what the c++ compiler gets are generated plain old header files with virtual class definitions… well, if that isn’t a bug, how do the MIDL generated header files avoid the compiler re-ordering the interfaces?

  3. AB says:

    Chris, of course you are right. I was thinking of the C-style pseudo-virtual tables that the MIDL compiler generates, but of course it generates C++-style classes too, which could have the same problem. This occurred in MSVC6; I don’t know if it is still around.

    (Typo: Of course my interface functions were declared virtual … =0 )

  4. Stephane Rodriguez says:

    "But remember that if you change an interface, you need to generate a new Interface Identifier (IID)."

    Huh? As long as you add new methods at the end of your interface (with greater IDs), and never change public methods, both old (version N) and new consumers (version N+1) of that interface are happy.

  5. julian_t says:

    Stephane said:

    "Huh? As long as you add new methods at the end of your interface (with greater IDs), and never change public methods, both old (version N) and new consumers (version N+1) of that interface are happy."

    And what happens if a new (version N+1) consumer happens to get hold of an older (version N) component? They try calling a method that doesn’t exist in the older component and…

  6. Bryan says:

    Not to mention Raymond’s comment about what happens if you have to figure out which version is installed on a specific machine.

    It’s impossible using COM alone (because QueryInterface won’t tell you); you’d have to use some other attribute of the component’s container file (the EXE or DLL), like a hash or version number. But this gets unwieldy pretty fast, especially if you’re patching relatively often. It’s simpler to just let the component tell you which version it is, based on the IIDs that it supports.

  7. PatriotB says:

    The pattern of using both a REFIID and an LPVOID* when retrieving objects is seen throughout the main COM/OLE interfaces as well as shell interfaces. I always wondered why they didn’t just return an IUnknown and make you manually query for the interface you want — but I suppose this way you can save making an extra call.

  8. Re: REFIID/LPVOID pattern in methods other than QueryInterface:

    One answer is the round tripping one. When your interfaces are not remoted, you lucky dog, maybe QIing for another interface is cheap. If your interfaces are remoted, doing two round trips to the remote machine instead of one is just kind of dumb.

    The second answer is that this is not even semantically equivalent to getting an IUnknown and QIing for a different interface. The provider is fully allowed to give up a different kind of object depending on the IID passed in. OLE/DB providers may exploit this pattern by giving different kinds of recordsets depending on the iid passed in when the query is executed or the table is opened. I use this pattern regularly.

  9. John C. Kirk says:

    Interesting points, and I must confess to having "bent the rules" (ahem) a few times in the past. A couple of minor typos, though:

    * "GetColorCount" should be "GetColorInfo".

    * "IGraphicImage function" should be "IGraphicImage interface".

  10. foxyshadis says:

    "And the compiler, instead of adding it to the bottom of the vtable, like it always had before, decided that DoSomething2 belonged together with DoSomething, so the vtable looked like:"

    This exact error occurred in the new version of avisynth; they were trying to keep it binary compatible with its hundreds of plugins by adding to the end but then whoosh, it reordered a couple new functions. Could it be mitigated by reordering the vtable in the compiled output prior to linking? (As a quick and dirty fix.)

  11. Norman Diamond says:

    From my reading of the C++ standard, a class can contain some member functions (both static and non-static) and still be a POD-struct as long as it meets stated conditions on the member variables and not having a base class etc. And it is possible for two different POD-struct classes to be layout-compatible even if they have different member functions as long as their member variables meet stated conditions. I wonder how compilers handle those vtables.

  12. AB: That works (worked!) in the ‘real world’ very well. Because typelib binding is done in vtable order, you will rarely (if ever) break an existing client by adding methods to an existing interface. This applies to ‘semi-late’ binding and COM+ interception as well.

    If you have a component doing some sort of nasty late binding IDispatch stupid tricks then it gets interesting, so you wouldn’t want to use it in a *commercial* product. But for COM-based distributed apps within a corporate environment, it was a good compromise between contract and future flexibility. I hate adding numbers to interface names – your typelibs start looking like movie sequel lists.

  13. Chris Becke says:

    So, as long as you stick to the rules, and ensure that the "interface" class is a POD struct… then you are immune to compiler re-ordering?

  14. Mike Weiss says:

    I’ve always developed patches to versions of software using the ENTIRE set of source code at the point of that version’s release.

    When patching two versions, I would merge those fixes into the source code of each version. And ship all, or just the changed, binaries for each version. The patch to version N was built using the headers from version N, N+1’s patch was built using N+1’s headers.

    Other than doubling SOME of the work when patching, what’s the problem? A good SCM system helps in this area.

  15. Norman Diamond says:

    Wednesday, November 02, 2005 4:56 AM by Chris Becke

    > So, as long as you stick to the rules, and

    > ensure that the "interface" class is a POD

    > struct… then you are immune to compiler

    > re-ordering?

    As far as I could tell from reading relevant portions of the standard, it looks that way. But I can’t figure out how it can be true.

    In C days I could pounce on bugs in the standard and prove why they needed fixing. (A lot of them still need fixing, i.e. weren’t fixed, but I proved the need anyway.) But I don’t expect to become a C++ language lawyer. I’ll probably just remain in the status of not figuring out how things can work the way the standard says on this point.

  16. Neil says:

    Since Mozilla’s build system doesn’t track interface dependencies its typelibs aren’t regenerated when a base interface UUID changes. You can of course work around the problem using forward references and letting the compiler/typelib loader fix up the UUIDs later.



*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.
