C++ vtables - Part 2 - Multiple Inheritance

The world of single-parent inheritance hierarchies is simpler for the compiler. As we saw in Part 1, each child class extends its parent’s vtable by appending an entry for each new virtual method.

In this post we will cover multiple inheritance, which complicates things even when inheriting only from pure interfaces.

Let’s look at the following piece of code:

class Mother {
 public:
  virtual void MotherMethod() {}
  int mother_data;
};

class Father {
 public:
  virtual void FatherMethod() {}
  int father_data;
};

class Child : public Mother, public Father {
 public:
  virtual void ChildMethod() {}
  int child_data;
};
Child’s layout (byte offsets on x86-64):
Offset Field
0 _vptr$Mother
8 mother_data (+ 4 bytes of padding)
16 _vptr$Father
24 father_data
28 child_data (see footnote 1)

Note that there are 2 vtable pointers. Intuitively I’d expect either 1 or 3 (one each for Mother, Father and Child). In reality a single pointer is impossible (more on this soon), and the compiler is smart enough to place Child’s vtable entries as a continuation of Mother’s vtable, saving the third pointer.

Why can’t Child have one vtable pointer for all 3 types? Remember that a Child pointer can be passed to a function accepting a Mother pointer or a Father pointer, and both functions expect the this pointer to hold the correct data at the correct offsets: a vtable pointer at offset 0, followed by that class’s own members. These functions don’t necessarily know about Child, and certainly shouldn’t assume that a Child is what’s really underneath the Mother/Father pointer they were given.

Footnote 1: Unrelated to this topic, but interesting nonetheless: child_data is actually placed inside Father’s padding, the 4 unused bytes at the end of the Father subobject. This is called ‘tail padding’ reuse, and might be the topic of a future post.

Here’s the vtable layout:

Address Value Meaning
0x4008b8 0 top_offset (more on this later)
0x4008c0 0x400930 pointer to typeinfo for Child
0x4008c8 0x400800 Mother::MotherMethod(). _vptr$Mother points here.
0x4008d0 0x400810 Child::ChildMethod()
0x4008d8 -16 top_offset (more on this later)
0x4008e0 0x400930 pointer to typeinfo for Child
0x4008e8 0x400820 Father::FatherMethod(). _vptr$Father points here.

In this example, a Child pointer keeps the same address when cast to a Mother pointer. But when casting to a Father pointer, the compiler adds an offset (16 bytes here) so that the result points to the _vptr$Father part of Child (the 3rd field in Child’s layout, see the table above).

In other words, for a given Child c, (void*)&c != (void*)static_cast<Father*>(&c). Some people don’t expect this, and maybe some day it will save you some debugging time. I found it useful more than once.

But wait, there’s more.

What if Child decided to override one of Father’s methods? Consider this code:

class Mother {
 public:
  virtual void MotherFoo() {}
};

class Father {
 public:
  virtual void FatherFoo() {}
};

class Child : public Mother, public Father {
 public:
  void FatherFoo() override {}
};

This gets tricky. A function may take a Father* argument and call FatherFoo() on it. If you pass it a Child instance, the call is expected to invoke Child’s override with the correct this pointer, i.e. one pointing at the start of the Child. However, the caller doesn’t know it’s really holding a Child: its pointer points into the middle of the Child, at the offset where the Father subobject lives. Someone needs to undo that offset, but how is it done? What magic does the compiler perform to get this to work?

[Before we answer that, note that overriding one of Mother’s methods is not tricky, because the this pointer is the same for Mother and Child. Code working with a Child knows to read beyond the Mother part of the vtable, where Child’s own entries follow.]

Here’s the solution: the compiler creates a ‘thunk’ method that corrects this and then calls the ‘real’ method. The thunk’s address goes into Child’s Father vtable, while the address of the ‘real’ method goes into Child’s main vtable (the one it shares with Mother).

Here’s Child’s vtable:

0x4008e8 <vtable for Child>:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x4008f0 <vtable for Child+8>:	0x60	0x09	0x40	0x00	0x00	0x00	0x00	0x00
0x4008f8 <vtable for Child+16>:	0x00	0x08	0x40	0x00	0x00	0x00	0x00	0x00
0x400900 <vtable for Child+24>:	0x10	0x08	0x40	0x00	0x00	0x00	0x00	0x00
0x400908 <vtable for Child+32>:	0xf8	0xff	0xff	0xff	0xff	0xff	0xff	0xff
0x400910 <vtable for Child+40>:	0x60	0x09	0x40	0x00	0x00	0x00	0x00	0x00
0x400918 <vtable for Child+48>:	0x20	0x08	0x40	0x00	0x00	0x00	0x00	0x00

Which means:

Address Value Meaning
0x4008e8 0 top_offset (soon!)
0x4008f0 0x400960 typeinfo for Child
0x4008f8 0x400800 Mother::MotherFoo()
0x400900 0x400810 Child::FatherFoo()
0x400908 -8 top_offset
0x400910 0x400960 typeinfo for Child
0x400918 0x400820 non-virtual thunk to Child::FatherFoo()

Explanation: as we saw earlier, Child has 2 vtables: one used for Mother and Child, and the other for Father. In the Father vtable, the FatherFoo() entry points to a thunk, while Child’s own vtable points directly to Child::FatherFoo().

And what’s in this thunk, you ask?

(gdb) disas /m 0x400820, 0x400850
Dump of assembler code from 0x400820 to 0x400850:
15	  void FatherFoo() override {}
   0x0000000000400820 <non-virtual thunk to Child::FatherFoo()+0>:	push   %rbp
   0x0000000000400821 <non-virtual thunk to Child::FatherFoo()+1>:	mov    %rsp,%rbp
   0x0000000000400824 <non-virtual thunk to Child::FatherFoo()+4>:	sub    $0x10,%rsp
   0x0000000000400828 <non-virtual thunk to Child::FatherFoo()+8>:	mov    %rdi,-0x8(%rbp)
   0x000000000040082c <non-virtual thunk to Child::FatherFoo()+12>:	mov    -0x8(%rbp),%rdi
   0x0000000000400830 <non-virtual thunk to Child::FatherFoo()+16>:	add    $0xfffffffffffffff8,%rdi
   0x0000000000400837 <non-virtual thunk to Child::FatherFoo()+23>:	callq  0x400810 <Child::FatherFoo()>
   0x000000000040083c <non-virtual thunk to Child::FatherFoo()+28>:	add    $0x10,%rsp
   0x0000000000400840 <non-virtual thunk to Child::FatherFoo()+32>:	pop    %rbp
   0x0000000000400841 <non-virtual thunk to Child::FatherFoo()+33>:	retq   
   0x0000000000400842:	nopw   %cs:0x0(%rax,%rax,1)
   0x000000000040084c:	nopl   0x0(%rax)

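Just as we discussed: it offsets this and calls Child::FatherFoo(). And by how much should we offset this to get to Child? By top_offset, -8! Note that the thunk doesn’t read top_offset from the vtable at run time: the compiler already knows the offset, so it hard-codes the same value as an immediate (add $0xfffffffffffffff8,%rdi).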

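[Please note that I personally find the name non-virtual thunk extremely confusing, as this is the entry in the virtual table for a virtual function. As far as I can tell, the ‘non-virtual’ describes the this adjustment, which is a fixed constant; with virtual inheritance the compiler emits ‘virtual thunks’, which look their adjustment up in the vtable instead.]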

Stay tuned for Part 3 - Virtual inheritance - where things get even funkier.

