C++ vtables - Part 1 - Basics

In this mini post-series we’ll explore how clang implements vtables & RTTI. In this part we’ll start with some basic classes and later on cover multiple inheritance and virtual inheritance.

Please note that this mini-series will include some digging into the binary generated for our different pieces of code via gdb. This is somewhat low-level(ish), but I’ll do all the heavy lifting for you. I don’t believe many future posts will be this low-level.

Disclaimer: everything written here is implementation specific, may change in any future version, and should not be relied on. We look into this for educational reasons only.

☑ I agree

Cool, let’s start.

Part 1 - vtables - Basics

Estimated read time: ~15 minutes.

Let’s examine the following code:

#include <iostream>
using namespace std;

class NonVirtualClass {
public:
	void foo() {}
};

class VirtualClass {
public:
	virtual void foo() {}
};

int main() {
	cout << "Size of NonVirtualClass: " << sizeof(NonVirtualClass) << endl;
	cout << "Size of VirtualClass: " << sizeof(VirtualClass) << endl;
}
$ # compile and run main.cpp
$ clang++ main.cpp && ./a.out 
Size of NonVirtualClass: 1
Size of VirtualClass: 8

NonVirtualClass has a size of 1 because in C++ every complete object must occupy at least one byte, so that two distinct objects never share an address. However, this is not important right now.

VirtualClass’s size is 8 on a 64-bit machine. Why? Because there’s a hidden pointer inside it that points to a vtable. vtables are static lookup tables, created once for each class that has virtual functions. This post series is about their content and how they are used.
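To check that the 8 bytes really are one pointer, and that adding more virtual functions does not grow the object, a few compile-time checks are enough. This is only a sketch: the class names Empty, OneVirtual and ThreeVirtuals are made up for illustration, and the "exactly one pointer" result is what mainstream compilers produce rather than something the standard promises.

```cpp
#include <cstddef>

// Hypothetical classes for illustration; they are not part of the post's code.
struct Empty {};  // no data members, no virtual functions

struct OneVirtual {
  virtual void a() {}
};

struct ThreeVirtuals {
  virtual void a() {}
  virtual void b() {}
  virtual void c() {}
};

// Every complete object occupies at least one byte.
static_assert(sizeof(Empty) >= 1, "objects cannot have zero size");

// More virtual functions make the shared vtable longer, not the object bigger.
static_assert(sizeof(OneVirtual) == sizeof(ThreeVirtuals),
              "each object carries a single vtable pointer");

// True when the object consists of nothing but the hidden pointer.
// ASSUMPTION: holds on common ABIs, not guaranteed by the C++ standard.
bool VirtualObjectIsOnePointer() {
  return sizeof(OneVirtual) == sizeof(void*);
}
```

On the 64-bit Linux setup used in this post, VirtualObjectIsOnePointer() should return true, matching the 8 printed above.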

To get some deeper understanding on how vtables look let’s explore the following code with gdb to find out how the memory is laid out:

#include <iostream>

class Parent {
 public:
  virtual void Foo() {}
  virtual void FooNotOverridden() {}
};

class Derived : public Parent {
 public:
  void Foo() override {}
};

int main() {
  Parent p1, p2;
  Derived d1, d2;

  std::cout << "done" << std::endl;
}
$ # compile our code with debug symbols and start debugging using gdb
$ clang++ -std=c++14 -stdlib=libc++ -g main.cpp && gdb ./a.out
...
(gdb) # ask gdb to automatically demangle C++ symbols
(gdb) set print asm-demangle on
(gdb) set print demangle on
(gdb) # set breakpoint at main
(gdb) b main
Breakpoint 1 at 0x4009ac: file main.cpp, line 15.
(gdb) run
Starting program: /home/shmike/cpp/a.out 

Breakpoint 1, main () at main.cpp:15
15	  Parent p1, p2;
(gdb) # skip to next line
(gdb) n
16	  Derived d1, d2;
(gdb) # skip to next line
(gdb) n
18	  std::cout << "done" << std::endl;
(gdb) # print p1, p2, d1, d2 - we'll talk about what the output means soon
(gdb) p p1
$1 = {_vptr$Parent = 0x400bb8 <vtable for Parent+16>}
(gdb) p p2
$2 = {_vptr$Parent = 0x400bb8 <vtable for Parent+16>}
(gdb) p d1
$3 = {<Parent> = {_vptr$Parent = 0x400b50 <vtable for Derived+16>}, <No data fields>}
(gdb) p d2
$4 = {<Parent> = {_vptr$Parent = 0x400b50 <vtable for Derived+16>}, <No data fields>}

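Here’s what we learned from the above:

Even though the classes have no data members, each object contains a hidden pointer to a vtable (_vptr$Parent).
p1 and p2 point to the same vtable: vtables are static data, one per type, not one per object.
d1 and d2 inherit the vtable pointer from Parent, but in their case it points to Derived’s vtable.
Every vtable pointer points 16 (0x10) bytes into its vtable rather than at its start. We’ll get to why in a moment.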
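What gdb printed can also be observed from inside the program. The sketch below copies the first pointer-sized bytes out of an object and compares them across objects. It assumes the layout seen in the gdb session (vtable pointer at offset 0, as on clang and GCC for Linux). ReadVptr and the two checker functions are hypothetical helper names, and peeking at object bytes like this is for exploration only.

```cpp
#include <cstring>

class Parent {
 public:
  virtual void Foo() {}
  virtual void FooNotOverridden() {}
};

class Derived : public Parent {
 public:
  void Foo() override {}
};

// ASSUMPTION: the vtable pointer occupies the first sizeof(void*) bytes of
// the object, as the gdb output shows for this compiler and platform.
const void* ReadVptr(const Parent& obj) {
  const void* vptr = nullptr;
  std::memcpy(&vptr, static_cast<const void*>(&obj), sizeof(vptr));
  return vptr;
}

// Two objects of the same type share one vtable.
bool SameTypeSharesVtable() {
  Parent p1, p2;
  return ReadVptr(p1) == ReadVptr(p2);
}

// A derived object points at a different vtable than a parent object.
bool DerivedHasOwnVtable() {
  Parent p;
  Derived d;
  return ReadVptr(p) != ReadVptr(d);
}
```

Both functions should return true under that assumption, mirroring the identical addresses for p1/p2 and for d1/d2 in the session above.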

Let’s continue with our gdb session to see the contents of the vtables. I will use the x command, which dumps memory to the screen. I ask it to print 300 bytes in hex format, starting at 0x400b40. Why this address? Because above we saw that the vtable pointer points to 0x400b50, and the symbol for that address is vtable for Derived+16 (16 == 0x10), so the vtable itself starts at 0x400b50 - 0x10 = 0x400b40.

(gdb) x/300xb 0x400b40
0x400b40 <vtable for Derived>:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x400b48 <vtable for Derived+8>:	0x90	0x0b	0x40	0x00	0x00	0x00	0x00	0x00
0x400b50 <vtable for Derived+16>:	0x80	0x0a	0x40	0x00	0x00	0x00	0x00	0x00
0x400b58 <vtable for Derived+24>:	0x90	0x0a	0x40	0x00	0x00	0x00	0x00	0x00
0x400b60 <typeinfo name for Derived>:	0x37	0x44	0x65	0x72	0x69	0x76	0x65	0x64
0x400b68 <typeinfo name for Derived+8>:	0x00	0x36	0x50	0x61	0x72	0x65	0x6e	0x74
0x400b70 <typeinfo name for Parent+7>:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x400b78 <typeinfo for Parent>:	0x90	0x20	0x60	0x00	0x00	0x00	0x00	0x00
0x400b80 <typeinfo for Parent+8>:	0x69	0x0b	0x40	0x00	0x00	0x00	0x00	0x00
0x400b88:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x400b90 <typeinfo for Derived>:	0x10	0x22	0x60	0x00	0x00	0x00	0x00	0x00
0x400b98 <typeinfo for Derived+8>:	0x60	0x0b	0x40	0x00	0x00	0x00	0x00	0x00
0x400ba0 <typeinfo for Derived+16>:	0x78	0x0b	0x40	0x00	0x00	0x00	0x00	0x00
0x400ba8 <vtable for Parent>:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
0x400bb0 <vtable for Parent+8>:	0x78	0x0b	0x40	0x00	0x00	0x00	0x00	0x00
0x400bb8 <vtable for Parent+16>:	0xa0	0x0a	0x40	0x00	0x00	0x00	0x00	0x00
0x400bc0 <vtable for Parent+24>:	0x90	0x0a	0x40	0x00	0x00	0x00	0x00	0x00
...

Note: we’re looking at demangled symbols. If you really want to know, _ZTV is a prefix for vtable, _ZTS is a prefix for type-string (name) and _ZTI is for type-info.
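If you have mangled names in hand (from nm or an un-demangled gdb session), you can translate them yourself. The sketch below uses abi::__cxa_demangle from <cxxabi.h>, which GCC and clang toolchains provide on platforms using this ABI. Demangle is a hypothetical wrapper name.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Demangles a symbol name; returns the input unchanged if demangling fails.
std::string Demangle(const std::string& mangled) {
  int status = 0;
  char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) {
    return mangled;
  }
  std::string result(out);
  std::free(out);  // __cxa_demangle allocates the result with malloc
  return result;
}
```

With this helper, Demangle("_ZTV6Parent") gives "vtable for Parent", Demangle("_ZTS6Parent") gives "typeinfo name for Parent" and Demangle("_ZTI7Derived") gives "typeinfo for Derived", the same labels gdb printed in the memory dump.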

Here’s Parent’s vtable layout:

Address | Value | Meaning
--- | --- | ---
0x400ba8 | 0x0 | top_offset (more on this later)
0x400bb0 | 0x400b78 | Pointer to typeinfo for Parent (also part of the above memory dump)
0x400bb8 | 0x400aa0 | Pointer to Parent::Foo() [1]. Parent’s _vptr points here.
0x400bc0 | 0x400a90 | Pointer to Parent::FooNotOverridden() [2]

Here’s Derived’s vtable layout:

Address | Value | Meaning
--- | --- | ---
0x400b40 | 0x0 | top_offset (more on this later)
0x400b48 | 0x400b90 | Pointer to typeinfo for Derived (also part of the above memory dump)
0x400b50 | 0x400a80 | Pointer to Derived::Foo() [3]. Derived’s _vptr points here.
0x400b58 | 0x400a90 | Pointer to Parent::FooNotOverridden() (same as Parent’s)

1:

(gdb) # find out what debug symbol we have for address 0x400aa0
(gdb) info symbol 0x400aa0
Parent::Foo() in section .text of a.out

2:

(gdb) info symbol 0x400a90
Parent::FooNotOverridden() in section .text of a.out

3:

(gdb) info symbol 0x400a80
Derived::Foo() in section .text of a.out

Remember that the vtable pointer in Derived pointed 16 bytes into the vtable? That is the vtable’s third slot, and it holds the first method pointer. Want the 3rd method? No problem - add 2 * sizeof(void*) to the vtable pointer. Want the typeinfo record? Step back one pointer from where the vtable pointer points.
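The same pointer arithmetic can be written as code. This sketch treats the vtable pointer as an array of pointer-sized slots, indexed relative to where the pointer lands. It assumes the layout from the tables above (what clang and GCC produce on Linux). Slots, TopOffset, TypeInfoSlot and MethodSlot are hypothetical helper names. Unlike the post's classes, the methods here return distinct values, so that an optimizer has no reason to fold identical empty bodies into a single address.

```cpp
#include <cstddef>
#include <cstring>
#include <typeinfo>

class Parent {
 public:
  virtual int Foo() { return 1; }
  virtual int FooNotOverridden() { return 2; }
};

class Derived : public Parent {
 public:
  int Foo() override { return 3; }
};

// ASSUMPTION: the vtable pointer sits at offset 0 of the object and the
// vtable is an array of pointer-sized slots, as in the dump above.
const void* const* Slots(const Parent& obj) {
  const void* const* vptr = nullptr;
  std::memcpy(&vptr, static_cast<const void*>(&obj), sizeof(vptr));
  return vptr;
}

// Slot -2: top_offset, stored as a signed pointer-sized integer.
std::ptrdiff_t TopOffset(const Parent& obj) {
  std::ptrdiff_t offset = 0;
  std::memcpy(&offset, Slots(obj) - 2, sizeof(offset));
  return offset;
}

// Slot -1: pointer to the typeinfo record of the object's dynamic type.
const std::type_info* TypeInfoSlot(const Parent& obj) {
  return static_cast<const std::type_info*>(Slots(obj)[-1]);
}

// Slot 0, 1, ...: method pointers, in declaration order.
const void* MethodSlot(const Parent& obj, int index) {
  return Slots(obj)[index];
}
```

Under those assumptions, TypeInfoSlot(d) for a Derived d equals &typeid(Derived), MethodSlot(p, 1) and MethodSlot(d, 1) are the same address (the inherited FooNotOverridden), and MethodSlot(p, 0) differs from MethodSlot(d, 0) because Derived overrides Foo.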

Moving on - what about the typeinfo records layout?

Parent’s:

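Address | Value | Meaning
--- | --- | ---
0x400b78 | 0x602090 | The typeinfo object’s own vtable pointer, used for type_info’s methods [1]
0x400b80 | 0x400b69 | Pointer to the string holding the type name [2]
0x400b88 | 0x0 | Not part of the record: gdb attaches no symbol to this address, and a __class_type_info record (the kind used for a class with no bases) has only the two fields above, so there is no parent pointer. Presumably alignment padding.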

And here’s Derived’s typeinfo record:

Address | Value | Meaning
--- | --- | ---
0x400b90 | 0x602210 | Helper class for type_info methods [3]
0x400b98 | 0x400b60 | String representing type name [4]
0x400ba0 | 0x400b78 | Pointer to Parent’s typeinfo record

1:

(gdb) info symbol 0x602090
vtable for __cxxabiv1::__class_type_info@@CXXABI_1.3 + 16 in section .bss of a.out

2:

(gdb) x/s 0x400b69
0x400b69 <typeinfo name for Parent>:	"6Parent"

3:

(gdb) info symbol 0x602210
vtable for __cxxabiv1::__si_class_type_info@@CXXABI_1.3 + 16 in section .bss of a.out

4:

(gdb) x/s 0x400b60
0x400b60 <typeinfo name for Derived>:	"7Derived"
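The typeinfo records can be read back the same way, starting from the objects that typeid returns. The sketch assumes the three-field layout shown for Derived (own vtable pointer, name pointer, base typeinfo pointer), which is what this ABI uses for a class with a single public non-virtual base. TypeInfoWord, NameField and BaseField are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstring>
#include <typeinfo>

class Parent {
 public:
  virtual void Foo() {}
  virtual void FooNotOverridden() {}
};

class Derived : public Parent {
 public:
  void Foo() override {}
};

// Reads the index-th pointer-sized word of a typeinfo record.
// ASSUMPTION: the record is laid out as consecutive pointers, as in the
// tables above. Only read words that exist for the given record.
const void* TypeInfoWord(const std::type_info& ti, std::size_t index) {
  const void* word = nullptr;
  const char* base = reinterpret_cast<const char*>(&ti);
  std::memcpy(&word, base + index * sizeof(word), sizeof(word));
  return word;
}

// Word 1: pointer to the mangled type name, e.g. "7Derived".
const char* NameField(const std::type_info& ti) {
  return static_cast<const char*>(TypeInfoWord(ti, 1));
}

// Word 2: pointer to the base class's typeinfo. Valid only for records that
// have this field (here: Derived's); Parent's record stops after word 1.
const std::type_info* BaseField(const std::type_info& ti) {
  return static_cast<const std::type_info*>(TypeInfoWord(ti, 2));
}
```

Under that assumption, NameField(typeid(Derived)) points at "7Derived" and BaseField(typeid(Derived)) equals &typeid(Parent), matching the values 0x400b60 and 0x400b78 in Derived's table. Word 0 differs between the two records because Parent uses __class_type_info and Derived uses __si_class_type_info.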

If you want to read more about __si_class_type_info you can find some info here, and also here.

This exhausts my gdb skills, and also concludes this post. I assume some people will find this too low-level, or maybe just unactionable. If so, I’d recommend skipping parts 2 and 3, jumping straight to part 4.

