When you're dealing with application compatibility,
you discover all sorts of things that worked only by accident.
Today, I'll talk about some of the "creative" ways people
mess up the
IUnknown::QueryInterface method.
Now, you'd think, "This interface is so critical to COM,
how could anybody possible mess it up?"
- Forgetting to respond to IUnknown.
-
Sometimes you get so excited about responding to all these
great interfaces that you forget to respond to IUnknown itself.
We have found objects where
IShellFolder *psf = some object;
IUnknown *punk;
psf->QueryInterface(IID_IUnknown, (void**)&punk);
fails with E_NOINTERFACE!
- Forgetting to respond to your own interface.
-
There are some methods which return an object with a specific
interface. And if you query that object for its own interface,
its sole reason for existing, it says "Huh?"
IShellFolder *psf = some object;
IEnumIDList *peidl, *peidl2;
psf->EnumObjects(..., &peidl);
peidl->QueryInterface(IID_IEnumIDList, (void**)&peidl2);
There are some objects which return E_NOINTERFACE to the QueryInterface
call, even though you're asking the object for itself!
"Sorry, I don't exist," it seems they're trying to say.
- Forgetting to respond to base interfaces.
-
When you implement a derived interface, you implicitly implement
the base interfaces, so don't forget to respond to them, too.
IShellView *psv = some object;
IOleView *pow;
psv->QueryInterface(IID_IOleView, (void**)&pow);
Some objects forget and the QueryInterface fails with E_NOINTERFACE.
- Requiring a secret knock.
-
In principle, the following two code fragments are equivalent:
IShellFolder *psf;
IUnknown *punk;
CoCreateInstance(CLSID_xyz, ..., IID_IShellFolder, (void**)&psf);
psf->QueryInterface(IID_IUnknown, (void**)&punk);
CoCreateInstance(CLSID_xyz, ..., IID_IUnknown, (void**)&punk);
punk->QueryInterface(IID_IShellFolder, (void**)&psf);
In reality, some implementations mess up and fail the second call
to CoCreateInstance. The only way to create the object successfully
is to create it with the IShellFolder interface.
- Forgetting to say "no" properly.
-
One of the rules for saying "no" is that you have to set the
output pointer to NULL before returning. Some people forget to do
that.
IMumble *pmbl;
punk->QueryInterface(IID_IMumble, (void**)&pmbl);
If the QueryInterface succeeds, then pmbl must be non-NULL on return.
If it fails, then pmbl must be NULL on return.
The shell has to be compatible with all these buggy objects because
if it weren't, customers would get upset and the press would have
a field day. Some of the offenders
are big-name programs. If they broke, people would report,
"Don't upgrade to Windows XYZ, it's not compatible with
<big-name program>."
Conspiracy-minded folks would shout,
"Microsoft intentionally broke <big-name program>!
Proof of unfair business tactics!"
[Raymond is currently on vacation; this message was pre-recorded.]
I once saw a whole family of COM objects whose clients were forced to clean up via something like :
while (pObj->Release() > 0) {
}
I.e. you had to keep calling Release() until the refcount dropped to zero, at which point you could be sure they’d destroyed themselves.
Unless you did that (cos their reference counting was broken) apps leaked horribly and machines eventually died.
Did I mention that this was on ultra-mission-critical City finance systems?!
Once again, these problems wouldn’t be there if Microsoft Windows had enforced correct behaviour to begin with.
That, and <big-name program> is most likely Office 95.
Every time I hear an accusation such as "don’t upgrade to MS update X because it introduces an incompatibility with program Y," I wish MS would publish details on why program Y was to blame. And links to MS public docs that spell out the rules that Y is violating.
Not only would this shut up the conspiraicy theorists, it just might shame the vendor of program Y into not doing these problems.
Of course, politics would let that happen.
FYI, IShellView is derived from IOleWindow, not IOleView. :-)
Somebody care to explain the "Requiring A Secret Knock" example?
The "Requiring A Secret Knock" example has to deal with failing to have the Class constructor’s CreateInstance function invoked by CoCreateInstance allow itself to be assigned to an IUnknown pointer on creation. Yet this is perfectly legal since all objects derive from IUnknown.
Its like the class constructor has this psuedo c++ code. NOTE: iid is a guid so equivalence is usually done differently.
CreateInstance(…) {
switch (iid) {
case: IID_IShellFolder
{ //instatiate IShellFolder and return interface IShellFolder
… }
case: IID_IUnknown
{ …; return E_NOINTERFACE}
…
}
instead of
CreateInstance(…) {
switch (iid) {
case: IID_IShellFolder
{ //instatiate IShellFolder and return interface IShellFolder
… }
case: IID_IUnknown
{ //instatiate IShellFolder and return interface IUnknown
}
Forgetting to respond to your own interface
QI==Are you you? NO!
"FOO: Once again, these problems wouldn’t be there if Microsoft Windows had enforced correct behaviour to begin with.
That, and <big-name program> is most likely Office 95. "
ooh! Great IDEA! Microsoft Windows the new stack machine enforcement agency regime (SMEAR). No CPU should be allowed to run without it. :)
With any software its a wonder dynamic binding works at all. I suppose we could all live in the realm of CNF and distributive protocol MATHEMATICS where if it doesn’t conform to safety and liveness properties or discrete mathematics the implementation shouldn’t exist or execute even if it does. Have a dull deterministic day foo.
Barry: Returning the value from Release is not optional; however, its value is binary – either the object has been completely released, or it’s in use.
Peter, thanks for the response. If I understand you correctly "Requiring A Secret Knock" is a re-wording of "Forgetting to respond to IUnknown"? At least that’s how I saw it and hence my confusion.
More to the point, trying to divine the reference count from outside is almost certainly an error. What would you do with the reference count if you knew it?
Follow up:
Specifically, think about the multi-threaded case. The reference count might change between AddRef/Release returning, and your thread executing the next instruction. So the value has no meaning that you can safely act upon. (What would you want to do with it anyway?)
re: calling Release until it returns 0
Don’t forget that returning the reference count from AddRef and Release is *optional*. Attempting to write code to debug COM object reference counting errors is a nightmare because of this.
You’re right Simon, I wrote in haste.
However, it still would have been *way* easier if AddRef and Release were required to return the reference count.
Of course, that raises the question of whether this is the reference count for the interface or the object, which is probably why it’s undefined. But it’s still a pain. Trying to divine the reference count from the outside is error-prone, to say the least.
I suspect that the reason why AddRef is not required to return exact reference count is to allow implementations to simply return the result of InterlockedIncrement.
On old CPUs (and OSes), return value of InterlockedIncement was not guaranteed to be exact. The only thing you could depend on was the sign of the return value (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/interlockedincrement.asp).
Judging by earlier topic on the different ways companies defraud WHQL certification process, enforcing correct COM behavior does not seem possible.
The best solution is good libraries, like ATL. I’ve seen dozens of such bugs in pre-ATL code, but don’t remember any in ATL-based code.
He stated that he would be using it *to debug* COM objects. I think this is a legitimate use of the return value; even MSDN itself agrees ( http://msdn.microsoft.com/library/default.asp?url=/library/en-us/com/htm/cmi_q2z_59np.asp ):
"Return Value
Returns the resulting value of the reference count, which is used for diagnostic/testing purposes only."
In the presence of proxies, the return value also loses its meaning, since it is the reference count of the proxy, not of the underlying object.
foo: How do you recommend that Microosft enforce correct behavior? Since an IID is just a GUID, you cannot prove just by looking at an IID whether it is derived from another IID. You also cannot prove that an object implements all the methods correctly. I’ve seen objects that break the rules but but only when you ask them for a *specific* interface:
IUnknown::QueryInterface() {
if (riid == IID_BadInterface)
{ return S_OK; /*error*/ }
else { … /* correct implementation */ }
}
The only way to detect this ahead of time is to attempt to QueryInterface all 2^128 IIDs in the hopes of finding the bad one. And it’s possible that the bad QI is written like this:
IUnknown::QueryInterface() {
if (… complex condition …)
{ return S_OK; /*error*/ }
else { … /* correct implementation */ }
}
how can you test externally for this error? "complex condition" could be anything.
The solution to the "Secret Knock" is to write your class factory like this:
CreateInstance(…) {
CObject *p = new CObject(…);
if (p) {
p->QueryInterface(…);
p->Release();
}
}
If you switch on the iid in your CreateInstance, then when you add a new interface to your CObject, you will also have to update the CreateInstance to support it – something you can easily forget to do.
Raymond: "How do you recommend that Microosft enforce correct behavior?"
Designing the interface so that it’s a lot harder to make mistakes, of course. Implementing CoQueryInterface as a free API function that takes care of all the trivial cases would do just nicely.
Same goes for CreateInstance(). There’s no reason for CoCreateInstance() not to handle the interface itself.
Well… yes, there’s a reason: CreateInstance() could decide not to create the object at all if the interface is not supported. But, if a COM class was required to publish its list of supported interfaces together with CreateInstance(), that wouldn’t be needed either. Besides, that would help a free CoQueryInterface function to properly handle hierarchies itself.
JCAB
The problem with the proposed CoQueryInterface is that it would prevent people from implementing tear-offs, dynamic interfaces, object pseudo-interfaces, and other clever games inside QI. But the worst part would be that everybody would have to register their interfaces so CoQI can know about everybody’s interface hierarchies. Today you can run guidgen and ta-da, you have a new interface. No registration necessary. Just start using it.
The new world of the CLR has basically done what you suggested – interface management is no longer done by the object itself. Instead, you tell the CLR what your objects and interfaces derive from and it implements QI for you. (Well, okay, the CLR QI doesn’t support tear-offs or dynamic interfaces…)
> IUnknown *punk;
> psf->QueryInterface(IID_IUnknown, (void**)
> &punk);
punk is not a void*. punk is an IUnknown*.
void** is not a universal pointer type. void* is a universal pointer type, and char* and relatives are grandparented in to be equivalent in that way, but void** is not.
Casting &punk to type (void**) is just as risky as casting it to type (int*) or (struct s*). Casting &punk to type (void**) is just as risky as casting &some_int to type (double*).
Casting &punk to type (void*) would be safe, but that is not what you did.
If you want to obey the calling convention and avoid horrible deaths, you have to do this:
IUnknown *punk;
void *punkvoid;
psf->QueryInterface(IID_IUnknown, &punkvoid);
punk = (IUnknown *) punkvoid;
And you know it. Lots of other MSDN contributors made the same identical mistake. Many of your articles correct the mistakes of MSDN contributors, and this should have been one of them. You’ll probably tell me that it works in all VC++ implementations to date, but that doesn’t make it correct code, you’re still violating the calling convention.
(By the way, the requirement for that (IUnknown *) cast, though safe, is still dangerous. All usage of casts desensitizes people so they forget to check how dangerous casts usually are. It would have been better if C++ still allowed C-style assignment directly from void* to other data pointer types, with implicit conversion instead of requiring the cast. Of course the compiler would still generate code necessary for this conversion.)
I agree that the distinction is theoretically necessary (on word-addressed machines), but COM assumes a byte-addressed machine.
I think the intentionally-incorrect (void**) was chosen to try to catch people making the mistake
IUnknown *punk;
psf->QueryInterface(IID_IUnknown, punk);
which would otherwise go unnoticed.
"It would have been better if C++ still allowed C-style assignment directly from void* to other data pointer types, with implicit conversion instead of requiring the cast"
"better" in the sense of preventing the compiler from catching certain bugs?
How’s that an improvement?
I don’t know about you, but I think it’s quite desirable that this issue an error:
char ch(0);
char* pc(&ch);
void* pv(pc);
int* pi(pv);
*pi = 0; // ruh roh raggy.
void* can point at anything, so it’s not unreasonable that I should have to specify exactly what I want to convert to.
It’s not immediately clear how this is a calling convention issue; object representations (the requirement that
T t;
foo((void**)&t)
is equivalent to
T t;
void* pv(&t);
foo(&pv);
insofar as in either case the pointer can be dereferenced once to yield a T* and twice to yield a T) should surely not be specified as part of a calling convention.
The use of C style casts is another abomination.
3/29/2004 12:53 AM Raymond Chen:
> I agree that the distinction is
> theoretically necessary (on word-addressed
> machines), but COM assumes a byte-addressed
> machine.
You’ve proved in other topics that you know why this excuse is invalid.
I’ve read that there’s a machine where some pointers can be 128 bytes long (that’s the pointer not the item being pointed to), but I don’t think all pointers are 128 bytes long on that machine. I don’t know if it’s a byte-addressed machine though. If we want to consider some known byte-addressed machines, we know of some cases where pointers to member functions aren’t the same size as pointers to ordinary functions, though I admit that neither of these is a pointer to data or void*.
> I think the intentionally-incorrect (void**)
> was chosen to try to catch people making the
> mistake
> IUnknown *punk;
> psf->QueryInterface(IID_IUnknown, punk);
> which would otherwise go unnoticed.
I’m really confused by this statement. It doesn’t see incorrect (intentionally or otherwise) to define the second parameter with type (void**). So I can only guess at the moment, would you consider it more correct to define the second parameter with type (IUnknown**)? But if that were done, then your example error would indeed be caught. Besides, it would prevent querying for interfaces other than IUnknown, so I don’t really think that would be more correct.
On the other hand, use of a cast prevents the compiler from diagnosing lots of possible actual errors. This is really why it is dangerous to desensitize people to the use of this dangerous tool.
Anyway, you know that the following code works, and you know why it should be used:
void *punkvoid;
psf->QueryInterface(IID_IUnknown, &punkvoid);
punk = (IUnknown *) punkvoid;
3/29/2004 4:49 AM DrPizza:
> "better" in the sense of preventing the
> compiler from catching certain bugs?
> int* pi(pv);
You think that C++’s requirement for coding casts prevents this bug?
int* pi = (int*) pv;
What else are you going to do with your pv? If you think it points to something, you’re planning to use the value of that something eventually. Sure when you think it points to something and it doesn’t, that’s a bug, but that bug is not eliminated by desensitizing people to the use of casts.
"You think that C++’s requirement for coding casts prevents this bug? "
It’s self-evident. The code won’t compile.
"What else are you going to do with your pv?"
Who knows? Normally it’s given to someone else to deal with because I’m working with some crufy C API.
"If you think it points to something, you’re planning to use the value of that something eventually. Sure when you think it points to something and it doesn’t, that’s a bug, but that bug is not eliminated by desensitizing people to the use of casts. "
There is no desensitization. Casts like that are extremely rare in good C++. Far from desensitizing programmers, it makes abundantly clear to them that they’re doing something that’s potentially dangerous. They only perform such casts infrequently, so it serves as an effective warning.
>> You think that C++’s requirement for coding
>> casts prevents this bug?
>> int* pi = (int*) pv;
>
> It’s self-evident. The code won’t compile.
That doesn’t look very self-evident to me.
> There is no desensitization. Casts like that
> are extremely rare in good C++.
I wonder how to reply to this.
1. C++ forces that kind of cast to be used, whereas C allows pointers of type (void*) to be assigned to pointers that point to data. That was because C knew that (void*) was added to meet a need, which it did meet, while it did not replace the need for ordinary pointers to be used normally. Do you mean that good C++ code extremely rarely has any need for pointers of type (void*), as opposed to good C code which only moderately needed them?
2. At least it seems you’re agreeing with me that this:
> […] (void**)&punk […]
is not good C++ code. But the problem remains that even when this bad code is removed, C++ still forces a cast to be used in a subsequent assignment, in the location where C would not force it.
"1. C++ forces that kind of cast to be used, whereas C allows pointers of type (void*) to be assigned to pointers that point to data."
Which is good, it prevents accidents.
"That was because C knew that (void*) was added to meet a need, which it did meet, while it did not replace the need for ordinary pointers to be used normally. Do you mean that good C++ code extremely rarely has any need for pointers of type (void*), as opposed to good C code which only moderately needed them? "
Correct. In general I would suggest that the only time void* should be used in C++ is when calling those unwrapped C APIs that require it. Many of those APIs (though not this one) probably deserve lightweight wrappers anyway.
"is not good C++ code. But the problem remains that even when this bad code is removed, C++ still forces a cast to be used in a subsequent assignment, in the location where C would not force it. "
But that’s a /good/ thing, not a bad one.
I’m gay
IOleCommandTarget is very useful.&nbsp; It provides a generic way of sending commands between objects.&nbsp;…
Returning E_NOINTERFACE from IUnknown::QueryInterface has very specific meaning.
You’re writing some code that uses COM. Do you use SmartPointers or not use SmartPointers? Debate on
Watch what?
It’s in the design.