What is __purecall?

Date:April 28, 2004 / year-entry #163
Tags:code
Orig Link:https://blogs.msdn.microsoft.com/oldnewthing/20040428-00/?p=39613
Comments:    39
Summary:Both C++ and C# have the concept of virtual functions. These are functions which always invoke the most heavily derived implementation, even if called from a pointer to the base class. However, the two languages differ on the semantics of virtual functions during object construction and destruction. C# objects exist as their final type before...

Both C++ and C# have the concept of virtual functions. These are functions which always invoke the most heavily derived implementation, even if called from a pointer to the base class. However, the two languages differ on the semantics of virtual functions during object construction and destruction.

C# objects exist as their final type before construction begins, whereas C++ objects change type during the construction process.

Here's an example:

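#include <iostream>
using std::cout;

class Base {
public:
  Base() { f(); }                  // runs while the object is still a Base
  virtual void f() { cout << 1; }
  void g() { f(); }
};

class Derived : public Base {
public:
  Derived() { f(); }               // runs once the object has become a Derived
  virtual void f() { cout << 2; }
};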

When a Derived object is constructed, the object starts as a Base, then the Base::Base constructor is executed. Since the object is still a Base, the call to f() invokes Base::f and not Derived::f. After the Base::Base constructor completes, the object then becomes a Derived and the Derived::Derived constructor is run. This time, the call to f() invokes Derived::f.

In other words, constructing a Derived object prints "12".

Similar remarks apply to the destructor. The object is destructed in pieces, and a call to a virtual function invokes the function corresponding to the stage of destruction currently in progress.

This is why some coding guidelines recommend against calling virtual functions from a constructor or destructor. Depending on what stage of construction/destruction is taking place, the same call to f() can have different effects. For example, the function Base::g() above will call Base::f if called from the Base::Base constructor or destructor, but will call Derived::f if called after the object has been constructed and before it is destructed.

On the other hand, if this sample were written (with suitable syntactic changes) in C#, the output would be "22" because a C# object is created as its final type. Both calls to f() invoke Derived::f, since the object is always a Derived. Notice that means a method can be invoked on an object before its constructor has run. Something to bear in mind.

Sometimes your C++ program may crash with the error "R6025 - pure virtual function call". This message comes from a function called __purecall. What does it mean?

C++ and C# both have the concept of a "pure virtual function" (which C# calls "abstract"). This is a method which is declared by the base class, but for which the base class normally provides no implementation; each concrete derived class must supply its own. In C++ the syntax for this is "=0":

class Base {
public:
  Base() { f(); }
  virtual void f() = 0;
};

If you attempt to create a Derived object, the base class will attempt to call Base::f, which does not exist since it is a pure virtual function. When this happens, the "pure virtual function call" error is raised and the program is terminated.

Of course, the mistake is rarely as obvious as this. Typically, the call to the pure virtual function occurs deep inside the call stack of the constructor.

This raises the side issue of the "novtable" optimization (spelled __declspec(novtable) in Visual C++). As we noted above, the identity of the object changes during construction. This change of identity is performed by swapping the vtables around during construction. If you have a base class that is never instantiated directly but always via a derived class, and if you have followed the rules against calling virtual methods during construction, then you can use the novtable optimization to get rid of the vtable swapping during construction of the base class.

If you use this optimization, then calling virtual methods during the base class's constructor or destructor will result in undefined behavior. It's a nice optimization, but it's your own responsibility to make sure you conform to its requirements.

Sidebar: Why does C# not do type morphing during construction? One reason is that it would result in the possibility, given two objects A and B, that typeof(A) == typeof(B) yet sizeof(A) != sizeof(B). This would happen if A were a fully constructed object and B were a partially-constructed object on its way to becoming a derived object.

Why is this so bad? Because the garbage collector is really keen on knowing the size of each object so it can know how much memory to free. It does this by checking the object's type. If an object's type did not completely determine its size, this would result in the garbage collector having to do extra work to figure out exactly how big the object is, which means extra code in the constructor and destructor, as well as space in the object, to keep track of which stage of construction/destruction is currently in progress. And all this for something most coding guidelines recommend against anyway.


Comments (39)
  1. DrPizza says:

    Another instance where C++ does the Right Thing and other languages (C#, Java, et al.) do not.

    It’s funny. C++ is clunky in so many areas (the syntax has so many horrors due to declaration mimics use, some pointless restrictions on what you can do with templates, the stupid inclusion model and use of the C preprocessor, = 0 for pure virtual functions, etc. etc.), yet time and time again manages to do the right thing where its "superior" derivatives do not.

    If only the C++ Working Group would realize that breaking changes are not the end of the world, they could fix these problems, and make a much more approachable, even more expressive language.

  2. Jack Mathews says:

    Yeah, I didn’t know C# behaved that way. Between that and its lack of const, I’m starting to dislike the language more and more. Maybe I should read up on it to dislike it more :-)

  3. Mr Pizza, well, IMO, Delphi did the right thing, ie, you cannot instantiate a class with at least one abstract method.

    You guys should really look at Delphi.

    Jack, you should try Delphi anyway… C# is derived from Delphi.

    Also, take a look at the comparison between C# and Delphi, here http://chuacw.hn.org/chuacw/archive/2004/04/28/460.aspx and here http://chuacw.hn.org/chuacw/category/47.aspx

  4. Raymond Chen says:

    You’re not allowed to do that in C# or C++ either. You have to instantiate the base class. The question is, what is the object’s identity when the base class constructor is running?

  5. Marc Wallace says:

    I could see it being useful knowing your eventual type in the constructor.

    It doesn’t feel intuitive to me, though. Even though I hadn’t known how C++ handled it (never really tried it before), it makes sense to me that the object would start out a Base and become a Derived. This also matches how the compiler invokes super() as a precondition before Derived’s constructor executes.

    On the other hand, consider cases where you explicitly call super(), perhaps near the end of Derived’s constructor. By that point, the object must already be a Derived. So maybe the C# way makes sense, and maybe my intuition is just wrong?

    C# is also a bit messy because of all the intra-language concerns (as was mentioned in the const links from yesterday). Besides, the name is silly. I keep wanting to call it "D flat". ;-)

  6. asdf says:

    And here is a bit of trivia for all the Morts: C++ lets you define the pure virtual function.

    struct Foo {
        virtual void foo() = 0;
    };

    inline void Foo::foo() { }

    struct Boo : Foo {
        void foo() { }
    };

    Boo b;
    b.Foo::foo();

  7. Cooney says:

    C++ lets you define the pure virtual function.

    Lemme guess: you found out about this while tracking some bizarre bug?

  8. brian says:

    Making a virtual function call in the constructor rarely makes sense anyway. I have to vote that C++ got this better (though not right) – if the Derived class constructor hasn’t run, it doesn’t make sense for a virtual function to resolve to one defined in that class. That call is going to rely on class invariants that haven’t been established.

    IMHO, the right way is with post-constructors, functions that run at construction time, but only after the object is fully formed. ATL COM objects support these, I think. I forget what they call them.

  9. Aarrgghh says:

    brian: FinalConstruct() and FinalRelease().

  10. Jack Mathews says:

    > You have to instantiate the base class. The question is, what is the object’s identity when the base class constructor is running? <<

    The identity in the ctor or dtor should be the class the ctor/dtor is in (not in the "actual" derived class that the class will become). Why? In a ctor, the other parts do not exist yet. They have not been constructed, so they are random bits. In a dtor, the data has been destroyed, so that part of the object is gone. Why would you want to call methods in an object that does not exist yet?

    In C++, what you get in a virtual call is a function knowing about the data for the parts of the class that have been made or a pure virtual (which I believe is a warning in lots of compilers now, unless the virtual call is a function call away). In C#, you get it acting on data that doesn’t exist yet. While you can obviously write code in either to STILL break, C# is a lot more crash-prone in this regard.

  11. Jack Mathews says:

    … and while the pure virtual in C++ will crash, it will not be a random thing depending on the bits in memory, like the C# crashes/errors would be.

  12. Raymond Chen says:

    Well C# doesn’t permit uninitialized memory so the result won’t be random. It’ll get its default value (for integers, zero).

    abstract class Base {
        public Base() { x(); }
        public abstract void x();
    }

    class Derived : Base {
        public int j;
        public override void x() { System.Console.WriteLine("j = {0}", j); }
    }

    class Program {
        public static void Main()
        {
            Derived d = new Derived();
            d.j = 9;
            d.x();
        }
    }

    this prints

    j = 0
    j = 9

  13. Antonio Tejada says:

    "And here is a bit of trivia for all the Morts: C++ lets you define the pure virtual function."

    Yes, you can define the body for any pure virtual function and in fact it’s required for pure destructors (even if it’s just an empty body).

    Actually, contrary to what Raymond seems to suggest with "which does not exist since it is a pure virtual function", a "pure" function doesn’t mean that there’s no implementation of the function in the base class.

    It rather means that descendants are obliged to extend the implementation (the base class is abstract and cannot be instantiated directly) but, as mentioned, the base class can provide a helper or default implementation for the pure function.

    Which reminds me of a VC6 bug where destructors of global variables (think singletons) couldn’t be private because the program finalization routine added by the compiler had to be able to call them.

  14. Jack Mathews says:

    Raymond:

    So in C#, all types have a default constructor? Your own classes have to support default construction?

  15. Raymond Chen says:

    Jack: Sorry, I don’t see how that relates to this subject. Section 10.10.4 of the C# Language Specification is titled "Default Constructors" and discusses what C# default constructors are like. But I don’t see how that is relevant here.

  16. DrPizza says:

    "Making a virtual function call in the constructor rarely makes sense anyway."

    It crops up naturally from time to time, and that C++ does things the right way is useful. That C# (and Java, for that matter) do not do things the right way is similarly unhelpful.

    "IMHO, the right way is with post-constructors, functions that runs at construction time, but only after the object is fully formed."

    It’s not clear what this would do or what problem it would solve.

  17. brian says:

    It’s not clear what this would do or what problem it would solve.

    You can eat your cake and not choke on it. The object is fully-formed by the time the code in the post-constructor runs, and virtual functions resolve to the most-derived implementation. Compare this to the C++ approach where you can call undefined functions (the topic of this blog entry) or the C#/Java approach where the function can manipulate members that haven’t been initialized (because the function call is defined in a more-derived class which hasn’t been constructed yet).

  18. Jack Mathews says:

    Raymond:

    If you have a private default constructor, then how is the memory "default constructed"? By what you’re saying, before a class with virtuals is instantiated, it would have to be pre-default constructed before its first ctor is ever called.

    It’s more likely the memory is just guaranteed to be zeroed, which is very much different than the objects all having default construction.

    And if the memory is just zeroed, then it’s very much invalid data, though at least it’s not random.

  19. DrPizza says:

    "You can eat your cake and not choke on it."

    I don’t choke on it in C++.

    "The object is fully-formed by the time the code in the post-constructor runs, and virtual functions resolve to the most-derived implementation."

    If I wanted the most derived version to be called I wouldn’t have called the virtual function from the flipping base class constructor!

    "Compare this to the C++ approach where you can call undefined functions (the topic of this blog entry)"

    Er… so? That’s not the wrong thing to do.

  20. Jack Mathews says:

    "The object is fully-formed by the time the code in the post-constructor runs, and virtual functions resolve to the most-derived implementation."

    Wait, so if you have two classes where Foo is derived from Bar, then Bar’s variables get constructed, then Foo’s, THEN Bar’s ctor function then Foo’s? If so, that’s completely asinine. How do you go about shoehorning code between the two without making some proxy object that calls back into Bar?

  21. Raymond,

    You said "The question is, what is the object’s identity when the base class constructor is running? "

    Why is knowing the object’s identity during construction important?

    For me, at least, this question never came up at all. I suspect this might be the case for some other Delphi developers.

  22. Raymond Chen says:

    Jack: All variables get an initial value; I don’t know what the rules are but presumably rules exist.

    And "How do you go about shoehorning…"? Presumably the CLR somehow manages to know how to construct the objects.

    Delphi: So if you invoke a virtual method from the base class’s constructor, which function gets called? The one defined by the base class or the one defined by the derived class? Or is this merely something that Delphi people don’t do in the first place?

  23. Yeagh, that’s horrible. So the problem is that the garbage collector has to violate the "external objects cannot access an object before it is fully constructed" kinda-rule?

  24. Petr Kadlec says:

    Interesting, I have probably never given much thought to that (just like I don’t need to call virtual functions in a constructor), so the answer is maybe that Delphi people don’t do that.

    So I’ve made a little test. Results: Delphi calls the virtual function "correctly virtually", i.e. it calls the implementation of the derived class, even if called in constructor.

    But it may be because of another difference — in Delphi, the inherited constructors are not called automatically, you have to explicitly call them using "inherited". (And, like C#, Delphi objects have their memory initialized to zeros at the time the constructor gets called.)

  25. Henk Devos says:

    True, Delphi will create an object of the derived type before calling any constructors, but set all memory to 0. Delphi programmers tell me that they like this behavior, that they think it’s superior to the C++ behavior.

    In reality it means Delphi does not have real constructors.

    But to go back to the original subject:

    If the problem is just that the GC needs to determine the size, what would be wrong with just putting the size somewhere? Isn’t this how all allocators/deallocators are supposed to work?

  26. Raymond Chen says:

    Putting the size with the object would increase the size of each object by 4 bytes just to cover a case that most people consider to be bad programming form anyway.

    Since when are allocators "supposed" to work by putting the size with the allocation? There are many systems which do not do this. (The "buddy system" for example infers the size from the address. Most GCs infer the size from other metadata.)

  27. Centaur says:

    In Delphi, constructors are a very peculiar thing.

    First, they serve as initializers. After a new object is created, one of its constructors is used to give it initial value. Only the most derived constructor is called, which means the developer is responsible for calling a superclass constructor, either directly or by calling another constructor of the same class.

    Second, a constructor is used as a class-scope factory method. The usual syntax is:

    myObject := TMyClass.Create(…);

    Another syntax is to call a constructor on an instance, in which case it (re)initializes the instance. This is commonly used in derived constructors to call the inherited constructor on the same object. However, it might be possible (I didn’t check) to take a pointer to raw memory, cast it into a reference to an object, and invoke a constructor on that reference, thus constructing an object in arbitrary memory space, not just the default heap.

    Third, Delphi has a concept of class references. A class reference is a variable that is assigned a class as a whole. Class-scope methods of the class become instance-scope methods of the class reference. Since a constructor is a class-scope method behaving as a factory, it becomes an instance-scope factory method of a class reference. This, and the fact that constructors can be virtual (in terms of the class reference), allows to create objects whose type is only known at runtime.

    var
      MyClass: class of TControl;
      MyObject: TControl;

    if someCondition then
      MyClass := TButton
    else
      MyClass := TCheckBox;

    MyObject := MyClass.Create(MyForm); // creates a button or a checkbox

  28. josh says:

    Why does C++ have to swap vtables? I would assume that, since the type is known in the constructor, calls to virtual functions could be made directly, without involving the vtable at all… Then attempting to call a pure virtual method could be a compile time error, or at least link time.

  29. Raymond Chen says:

    True if it’s coming from the constructor, but what about this:

    class Base {
    public:
      Base() { g(); }
      virtual void f() { cout << 1; }
      void g() { f(); }
    };

    class Derived : public Base {
    public:
      virtual void f() { cout << 2; }
    };

    Derived d;
    d.g();

    What code should be generated for the function Base::g()? If it is called from the constructor, then you must call Base::f(), but if it is called from a fully-constructed Derived object, then you must call Derived::f().

    That’s why you need to swap the vtable. For all the g()’s out there.

  30. Henk Devos says:

    Raymond:

    Putting the size with the object would increase the size of each object by 4 bytes just to cover a case that most people consider to be bad programming form anyway.

    Since when are allocators "supposed" to work by putting the size with the allocation? There are many systems which do not do this. (The "buddy system" for example infers the size from the address. Most GCs infer the size from other metadata.)

    First of all, I think the 4 bytes for the size would be a small overhead compared to the type information. But even if it’s small you would still have some point (every bit helps).

    But you give a better formulation of what I said yourself: A memory manager will always have some way of knowing the size. It has to. Whether using the buddy system as you say, using blocks that contain the size, using pools of blocks for a given size, or any other method, you can always know the size, and this is essential to a memory manager. I just don’t see the use of relying on the type information for this.

    And how is this implemented? I suppose there is just a size field in the type description that’s not used for anything else?

  31. Raymond Chen says:

    I don’t know precisely how it’s implemented, but I do know that the size is not kept with the object. The CLR folks have gone to extraordinary lengths to get the per-object overhead as low as possible. There are ways of doing this without increasing the overhead (e.g., "dummy types" which exist only during construction) but as I noted, it seems an excessive amount of effort for something that most people recommend against anyway. Why penalize people who don’t use it?

  32. M Knight says:

    The Delphi translation would print either ‘2’ or ‘22’, depending on whether you added the constructor chaining, since constructor chaining is purely voluntary.

    In Delphi, every class has ‘class type’ metadata embedded into the exe/dll. This ‘class type’ data contains the v-table + misc metadata as well as a number of ‘special’ virtual functions, which implement some neat functionality.

    To create an object, you obtain a class reference and call an object constructor in that class reference (constructors can be named). The size of the object is determined by the class reference, which knows how much memory to allocate as well as special actions to take for reference-counted fields (strings & interfaces are reference counted under Delphi for Win32). The object itself probably doesn’t know its size, but the class definition does.

    Once you have a valid class reference, you have everything you need to create a new object from that class reference. Constructors are really initializers, as that’s all they do under the Delphi object model (memory allocation & deallocation are handled by 2 virtual class reference methods).

    Sadly, none of the internals of the class are publicly documented. With some fleshing out, you could easily build a native Win32 version of .NET Reflection.

    I’ve derived some methods which actually allow you to create new class definitions, but it was incredibly crude due to the lacking metadata.

  33. Petr Kadlec says:

    I don’t agree that none of the internals are documented. If you take a look at the TObject’s methods, you’ll see quite many useful things. (Like ClassParent, FieldAddress, MethodName, MethodAddress, InstanceSize, and of course the often-used ClassName.) And, if that is not enough, just take a look at System.pas, where you will find much fun. :-)

    But, I am afraid that we are getting off-topic — these are "actually not a .NET comments". ;-)

  34. josh says:

    <i>That’s why you need to swap the vtable. For all the g()’s out there.</i>

    Ah, those pesky g()’s. Thanks for the explanation.

  35. josh says:

    <hmpf/>

  36. Raymond, assuming declarations

    type
      TBase = class
      public
        constructor Create;
        procedure SomeProc; virtual;
      end;

      TDerived = class(TBase)
      public
        procedure SomeProc; override;
      end;

    constructor TBase.Create;
    begin
      SomeProc;
    end;

    procedure TBase.SomeProc;
    begin
      WriteLn('Hello');
    end;

    procedure TDerived.SomeProc;
    begin
      WriteLn('World');
    end;

    var
      Base: TBase;
    begin
      Base := TDerived.Create;
    end.

    then, "World" is displayed, therefore, like Petr pointed out, the "correct" virtual function is called.

  37. Raymond Chen says:

    On the other hand, in the first comment, DrPizza claims that the C++ behavior is the "correct" behavior. Different people have different ideas as to what "should" be done here, and it is reflected in how each language is designed.



*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.
