Date: | April 28, 2004 / year-entry #163 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20040428-00/?p=39613 |
Comments: | 39 |
Both C++ and C# have the concept of virtual functions. These are functions which always invoke the most heavily derived implementation, even if called from a pointer to the base class. However, the two languages differ on the semantics of virtual functions during object construction and destruction. C# objects exist as their final type before construction begins, whereas C++ objects change type during the construction process. Here's an example:
class Base {
public:
  Base() { f(); }
  virtual void f() { cout << 1; }
  void g() { f(); }
};
class Derived : public Base {
public:
  Derived() { f(); }
  virtual void f() { cout << 2; }
};
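When a Derived object is constructed, the object starts as a Base, and the Base constructor runs. Since the object is still a Base at that point, the call to f() invokes Base::f and not Derived::f. After the Base constructor completes, the object becomes a Derived and the Derived constructor runs. This time, the call to f() invokes Derived::f.
In other words, constructing a Derived object prints "12".
Similar remarks apply to the destructor. The object is destructed in pieces, and a call to a virtual function invokes the function corresponding to the stage of destruction currently in progress.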
This is why some coding guidelines recommend against calling virtual functions from a constructor or destructor. Depending on what stage of construction/destruction is taking place, the same call to f() can have different effects.
On the other hand, if this sample were written
(with suitable syntactic changes) in C#,
the output would be "22"
because a C# object is created as its final type.
Both calls to f() invoke Derived.f().
Sometimes your C++ program may crash with the error "R6025 - pure virtual function call". This message comes from a function called __purecall. What does it mean?
C++ and C# both have the concept of a "pure virtual function" (which C# calls "abstract"). This is a method which is declared by the base class, but for which no implementation is provided. In C++ the syntax for this is "=0":
class Base {
public:
  Base() { f(); }
  virtual void f() = 0;
};
If you attempt to create an object of a class derived from Base, the Base constructor will attempt to call Base::f, which does not exist since it is a pure virtual function. When this happens, the "pure virtual function call" error is raised and the program is terminated.
Of course, the mistake is rarely as obvious as this. Typically, the call to the pure virtual function occurs deep inside the call stack of the constructor.
This raises the side issue of the "novtable" optimization. As we noted above, the identity of the object changes during construction. This change of identity is performed by swapping the vtables around during construction. If you have a base class that is never instantiated directly but always via a derived class, and if you have followed the rules against calling virtual methods during construction, then you can use the novtable optimization to get rid of the vtable swapping during construction of the base class.
If you use this optimization, then calling virtual methods during the base class's constructor or destructor will result in undefined behavior. It's a nice optimization, but it's your own responsibility to make sure you conform to its requirements.
Sidebar: Why does C# not do type morphing during construction? One reason is that it would result in the possibility, given two objects A and B, that typeof(A) == typeof(B) yet sizeof(A) != sizeof(B). This would happen if A were a fully constructed object and B were a partially-constructed object on its way to becoming a derived object.
Why is this so bad? Because the garbage collector is really keen on knowing the size of each object so it can know how much memory to free. It does this by checking the object's type. If an object's type did not completely determine its size, this would result in the garbage collector having to do extra work to figure out exactly how big the object is, which means extra code in the constructor and destructor, as well as space in the object, to keep track of which stage of construction/destruction is currently in progress. And all this for something most coding guidelines recommend against anyway.
Comments (39)
Another instance where C++ does the Right Thing and other languages (C#, Java, et al.) do not.
It’s funny. C++ is clunky in so many areas (the syntax has so many horrors due to declaration mimics use, some pointless restrictions on what you can do with templates, the stupid inclusion model and use of the C preprocessor, = 0 for pure virtual functions, etc. etc.), yet time and time again manages to do the right thing where its "superior" derivatives do not.
If only the C++ Working Group would realize that breaking changes are not the end of the world, they could fix these problems, and make a much more approachable, even more expressive language.
Yeah, I didn’t know C# behaved that way. Between that and its lack of const, I’m starting to dislike the language more and more. Maybe I should read up on it to dislike it more :-)
Mr Pizza, well, IMO, Delphi did the right thing, ie, you cannot instantiate a class with at least one abstract method.
You guys should really look at Delphi.
Jack, you should try Delphi anyway… C# is derived from Delphi.
Also, take a look at the comparison between C# and Delphi, here http://chuacw.hn.org/chuacw/archive/2004/04/28/460.aspx and here http://chuacw.hn.org/chuacw/category/47.aspx
You’re not allowed to do that in C# or C++ either. You have to instantiate a derived class. The question is, what is the object’s identity when the base class constructor is running?
I could see it being useful knowing your eventual type in the constructor.
It doesn’t feel intuitive to me, though. Even though I hadn’t known how C++ handled it (never really tried it before), it makes sense to me that the object would start out a Base and become a Derived. This also matches how the compiler invokes super() as a precondition before Derived’s constructor executes.
On the other hand, consider cases where you explicitly call super(), perhaps near the end of Derived’s constructor. By that point, the object must already be a Derived. So maybe the C# way makes sense, and maybe my intuition is just wrong?
C# is also a bit messy because of all the intra-language concerns (as was mentioned in the const links from yesterday). Besides, the name is silly. I keep wanting to call it "D flat". ;-)
And here is a bit of trivia for all the Morts: C++ lets you define the pure virtual function.
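struct Foo {
  virtual void foo() = 0;
};
// A pure virtual function may still be given a definition.
inline void Foo::foo() { }
struct Boo : Foo {
  void foo() { }
};
int main() {
  Boo b;
  b.Foo::foo(); // the qualified call invokes Foo's definition directly
}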
Making a virtual function call in the constructor rarely makes sense anyway. I have to vote that C++ got this better (though not right) – if the Derived class constructor hasn’t run, it doesn’t make sense for a virtual function to resolve to one defined in that class. That call is going to rely on class invariants that haven’t been established.
IMHO, the right way is with post-constructors, functions that run at construction time, but only after the object is fully formed. ATL COM objects support these, I think. I forget what they call them.
brian: FinalConstruct() and FinalRelease().
… and while the pure virtual in C++ will crash, it will not be a random thing depending on the bits in memory, like the C# crashes/errors would be.
Well C# doesn’t permit uninitialized memory so the result won’t be random. It’ll get its default value (for integers, zero).
abstract class Base {
  public Base() { x(); }
  public abstract void x();
}
class Derived : Base {
  public int j;
  public override void x() { System.Console.WriteLine("j = {0}", j); }
}
class Program {
  public static void Main()
  {
    Derived d = new Derived();
    d.j = 9;
    d.x();
  }
}
this prints
j = 0
j = 9
"And here is a bit of trivia for all the Morts: C++ lets you define the pure virtual function."
Yes, you can define the body for any pure virtual function, and in fact it’s required for pure virtual destructors (even if it’s just an empty body).
Actually, contrary to what Raymond seems to suggest with "which does not exist since it is a pure virtual function", a "pure" function doesn’t mean that there’s no implementation of the function in the base class.
Rather, it means that descendants are obliged to provide their own implementation (the base class is abstract and cannot be instantiated directly) but, as mentioned, the base class can provide a helper or default implementation for the pure function.
Which reminds me of a VC6 bug where destructors of global variables (think singletons) couldn’t be private because the program finalization routine added by the compiler had to be able to call them.
Raymond:
So in C#, all types have a default constructor? Your own classes have to support default construction?
Jack: Sorry, I don’t see how that relates to this subject. Section 10.10.4 of the C# Language Specification is titled "Default Constructors" and discusses what C# default constructors are like. But I don’t see how that is relevant here.
"Making a virtual function call in the constructor rarely makes sense anyway."
It crops up naturally from time to time, and that C++ does things the right way is useful. That C# (and Java, for that matter) do not do things the right way is similarly unhelpful.
"IMHO, the right way is with post-constructors, functions that runs at construction time, but only after the object is fully formed."
It’s not clear what this would do or what problem it would solve.
Raymond:
If you have a private default constructor, then how is the memory "default constructed"? By what you’re saying, before a class with virtuals is instantiated, it would have to be pre-default-constructed before its first ctor is ever called.
It’s more likely the memory is just guaranteed to be zeroed, which is very much different than the objects all having default construction.
And if the memory is just zeroed, then it’s very much invalid data, though at least it’s not random.
"You can eat your cake and not choke on it."
I don’t choke on it in C++.
"The object is fully-formed by the time the code in the post-constructor runs, and virtual functions resolve to the most-derived implementation."
If I wanted the most derived version to be called I wouldn’t have called the virtual function from the flipping base class constructor!
"Compare this to the C++ approach where you can call undefined functions (the topic of this blog entry)"
Er… so? That’s not the wrong thing to do.
"The object is fully-formed by the time the code in the post-constructor runs, and virtual functions resolve to the most-derived implementation."
Wait, so if you have two classes where Foo is derived from Bar, then Bar’s variables get constructed, then Foo’s, THEN Bar’s ctor function, then Foo’s? If so, that’s completely asinine. How do you go about shoehorning code between the two without making some proxy object that calls back into Bar?
Jack: All variables get an initial value; I don’t know what the rules are but presumably rules exist.
And "How do you go about shoehorning…"? Presumably the CLR somehow manages to know how to construct the objects.
Delphi: So if you invoke a virtual method from the base class’s constructor, which function gets called? The one defined by the base class or the one defined by the derived class? Or is this merely something that Delphi people don’t do in the first place?
Yeagh, that’s horrible. So the problem is that the garbage collector has to violate the "external objects cannot access an object before it is fully constructed" kinda-rule?
Interesting, I have probably never given much thought to that (just like I don’t need to call virtual functions in a constructor), so the answer is maybe that Delphi people don’t do that.
So I’ve made a little test. Results: Delphi calls the virtual function "correctly virtually", i.e. it calls the implementation of the derived class, even if called in constructor.
But it may be because of another difference — in Delphi, the inherited constructors are not called automatically, you have to explicitly call them using "inherited". (And, like C#, Delphi objects have their memory initialized to zeros at the time the constructor gets called.)
True, Delphi will create an object of the derived type before calling any constructors, but set all memory to 0. Delphi programmers tell me that they like this behavior, that they think it’s superior to the C++ behavior.
In reality it means Delphi does not have real constructors.
But to go back to the original subject:
If the problem is just that the GC needs to determine the size, what would be wrong with just putting the size somewhere? Isn’t this how all allocators/deallocators are supposed to work?
Putting the size with the object would increase the size of each object by 4 bytes just to cover a case that most people consider to be bad programming form anyway.
Since when are allocators "supposed" to work by putting the size with the allocation? There are many systems which do not do this. (The "buddy system" for example infers the size from the address. Most GCs infer the size from other metadata.)
In Delphi, constructors are a very peculiar thing.
First, they serve as initializers. After a new object is created, one of its constructors is used to give it initial value. Only the most derived constructor is called, which means the developer is responsible for calling a superclass constructor, either directly or by calling another constructor of the same class.
Second, a constructor is used as a class-scope factory method. The usual syntax is:
myObject := TMyClass.Create(…);
Another syntax is to call a constructor on an instance, in which case it (re)initializes the instance. This is commonly used in derived constructors to call the inherited constructor on the same object. However, it might be possible (I didn’t check) to take a pointer to raw memory, cast it into a reference to an object, and invoke a constructor on that reference, thus constructing an object in arbitrary memory space, not just the default heap.
Third, Delphi has a concept of class references. A class reference is a variable that is assigned a class as a whole. Class-scope methods of the class become instance-scope methods of the class reference. Since a constructor is a class-scope method behaving as a factory, it becomes an instance-scope factory method of a class reference. This, and the fact that constructors can be virtual (in terms of the class reference), makes it possible to create objects whose type is only known at runtime.
var
MyClass: class of TControl;
MyObject: TControl;
…
if someCondition then
MyClass := TButton
else
MyClass := TCheckBox;
…
MyObject := MyClass.Create(MyForm); // creates a button or a checkbox
Why does C++ have to swap vtables? I would assume that, since the type is known in the constructor, calls to virtual functions could be made directly, without involving the vtable at all… Then attempting to call a pure virtual method could be a compile time error, or at least link time.
True if it’s coming from the constructor, but what about this:
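#include <iostream>
using std::cout;
class Base {
public:
  Base() { g(); }
  virtual void f() { cout << 1; }
  void g() { f(); }
};
class Derived : public Base {
public:
  virtual void f() { cout << 2; }
};
int main() {
  Derived d; // Base() calls g(), which calls Base::f
  d.g();     // the same g() now calls Derived::f
}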
What code should be generated for the function Base::g()? If it is called from the constructor, then you must call Base::f(), but if it is called from a fully-constructed Derived object, then you must call Derived::f().
That’s why you need to swap the vtable. For all the g()’s out there.
Raymond:
Putting the size with the object would increase the size of each object by 4 bytes just to cover a case that most people consider to be bad programming form anyway.
Since when are allocators "supposed" to work by putting the size with the allocation? There are many systems which do not do this. (The "buddy system" for example infers the size from the address. Most GCs infer the size from other metadata.)
First of all, I think the 4 bytes for the size would be a small overhead compared to the type information. But even if it’s small you would still have a point (every bit helps).
But you give a better formulation of what I said yourself: a memory manager will always have some way of knowing the size. It has to. Whether it uses the buddy system as you say, blocks that contain the size, pools of blocks of a given size, or any other method, it can always determine the size, and this is essential to a memory manager. I just don’t see the use of relying on the type information for this.
And how is this implemented? I suppose there is just a size field in the type description that’s not used for anything else?
I don’t know precisely how it’s implemented, but I do know that the size is not kept with the object. The CLR folks have gone to extraordinary lengths to get the per-object overhead as low as possible. There are ways of doing this without increasing the overhead (e.g., "dummy types" which exist only during construction) but as I noted, it seems an excessive amount of effort for something that most people recommend against anyway. Why penalize people who don’t use it?
The Delphi translation would print either "2" or "22", depending on whether you added the constructor chaining, since constructor chaining is purely voluntary.
In Delphi, every class has "class type" metadata embedded into the exe/dll. This "class type" data contains the v-table plus miscellaneous metadata, as well as a number of "special" virtual functions which implement some neat functionality.
To create an object, you obtain a class reference and call an object constructor through that class reference (constructors can be named). The size of the object is determined by the class reference, which knows how much memory to allocate as well as the special actions to take for reference-counted fields (strings and interfaces are reference counted under Delphi for Win32). The object itself probably doesn’t know its size, but the class definition does.
Once you have a valid class reference, you have everything you need to create a new object from that class reference. Constructors are really initializers, as that’s all they do under the Delphi object model (memory allocation and deallocation are handled by two virtual class reference methods).
Sadly, none of the internals of the class are publicly documented. With some fleshing out, you could easily build a native Win32 version of .NET Reflection.
I’ve derived some methods which actually allow you to create new class definitions, but it was incredibly crude due to the lacking metadata.
I don’t agree that none of the internals are documented. If you take a look at the TObject’s methods, you’ll see quite many useful things. (Like ClassParent, FieldAddress, MethodName, MethodAddress, InstanceSize, and of course the often-used ClassName.) And, if that is not enough, just take a look at System.pas, where you will find much fun. :-)
But, I am afraid that we are getting off-topic — these are "actually not a .NET comments". ;-)
"That’s why you need to swap the vtable. For all the g()’s out there."
Ah, those pesky g()’s. Thanks for the explanation.
<hmpf/>
Raymond, assuming declarations
type
TBase = class
public
constructor Create;
procedure SomeProc; virtual;
end;
TDerived = class(TBase)
public
procedure SomeProc; override;
end;
constructor TBase.Create;
begin
SomeProc;
end;
procedure TBase.SomeProc;
begin
WriteLn('Hello');
end;
procedure TDerived.SomeProc;
begin
WriteLn('World');
end;
var
Base: TBase;
begin
Base := TDerived.Create;
end.
then, "World" is displayed, therefore, like Petr pointed out, the "correct" virtual function is called.
On the other hand, in the first comment, DrPizza claims that the C++ behavior is the "correct" behavior. Different people have different ideas as to what "should" be done here, and it is reflected in how each language is designed.