C++ vtables - Part 1 - Basics
(1204 words) Tue, Mar 1, 2016

In this mini post-series we’ll explore how clang implements vtables & RTTI. In this part we’ll start with some basic classes, and later on cover multiple inheritance and virtual inheritance.
Please note that this mini-series will include some digging into the binary generated for our different pieces of code via gdb. This is somewhat low-level(ish), but I’ll do all the heavy lifting for you. I don’t believe many future posts will be this low-level.
Disclaimer: everything written here is implementation specific, may change in any future version, and should not be relied on. We look into this for educational reasons only.
☑ I agree
cool, let’s start.
Part 1 - vtables - Basics
Estimated read time: ~15 minutes.
Let’s examine the following code:
```cpp
#include <iostream>
using namespace std;

class NonVirtualClass {
public:
    void foo() {}
};

class VirtualClass {
public:
    virtual void foo() {}
};

int main() {
    cout << "Size of NonVirtualClass: " << sizeof(NonVirtualClass) << endl;
    cout << "Size of VirtualClass: " << sizeof(VirtualClass) << endl;
}
```
```
$ # compile and run main.cpp
$ clang++ main.cpp && ./a.out
Size of NonVirtualClass: 1
Size of VirtualClass: 8
```
`NonVirtualClass` has a size of 1 because in C++ a class can’t have zero size (every object needs its own distinct address). However, this is not important right now.
`VirtualClass`’s size is 8 on a 64-bit machine. Why? Because there’s a hidden pointer inside it pointing to a vtable. vtables are static translation tables, created once for each class that has virtual methods (not once per object). This post series is about their content and how they are used.
To get a deeper understanding of what vtables look like, let’s explore the following code with gdb and find out how the memory is laid out:
```cpp
#include <iostream>

class Parent {
public:
    virtual void Foo() {}
    virtual void FooNotOverridden() {}
};

class Derived : public Parent {
public:
    void Foo() override {}
};

int main() {
    Parent p1, p2;
    Derived d1, d2;

    std::cout << "done" << std::endl;
}
```
```
$ # compile our code with debug symbols and start debugging using gdb
$ clang++ -std=c++14 -stdlib=libc++ -g main.cpp && gdb ./a.out
...
(gdb) # ask gdb to automatically demangle C++ symbols
(gdb) set print asm-demangle on
(gdb) set print demangle on
(gdb) # set breakpoint at main
(gdb) b main
Breakpoint 1 at 0x4009ac: file main.cpp, line 15.
(gdb) run
Starting program: /home/shmike/cpp/a.out
Breakpoint 1, main () at main.cpp:15
15 Parent p1, p2;
(gdb) # skip to next line
(gdb) n
16 Derived d1, d2;
(gdb) # skip to next line
(gdb) n
18 std::cout << "done" << std::endl;
(gdb) # print p1, p2, d1, d2 - we'll talk about what the output means soon
(gdb) p p1
$1 = {_vptr$Parent = 0x400bb8 <vtable for Parent+16>}
(gdb) p p2
$2 = {_vptr$Parent = 0x400bb8 <vtable for Parent+16>}
(gdb) p d1
$3 = {<Parent> = {_vptr$Parent = 0x400b50 <vtable for Derived+16>}, <No data fields>}
(gdb) p d2
$4 = {<Parent> = {_vptr$Parent = 0x400b50 <vtable for Derived+16>}, <No data fields>}
```
Here’s what we learned from the above:
- Even though the classes have no data members, there’s a hidden pointer to a vtable;
- The vtable for `p1` and `p2` is the same. vtables are static data per type;
- `d1` and `d2` inherit a vtable pointer from `Parent`, which points to `Derived`’s vtable;
- All vtable pointers point 16 (0x10) bytes into their vtable. We’ll also discuss this later.
Let’s continue with our gdb session to see the contents of the vtables. I will use the `x` command, which dumps memory to the screen, and ask it to print 300 bytes in hex format starting at 0x400b40. Why this address? Because above we saw that the vtable pointer points to 0x400b50, and the symbol for that address is `vtable for Derived+16` (16 == 0x10), so the vtable itself starts 16 bytes earlier, at 0x400b40.
```
(gdb) x/300xb 0x400b40
0x400b40 <vtable for Derived>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x400b48 <vtable for Derived+8>: 0x90 0x0b 0x40 0x00 0x00 0x00 0x00 0x00
0x400b50 <vtable for Derived+16>: 0x80 0x0a 0x40 0x00 0x00 0x00 0x00 0x00
0x400b58 <vtable for Derived+24>: 0x90 0x0a 0x40 0x00 0x00 0x00 0x00 0x00
0x400b60 <typeinfo name for Derived>: 0x37 0x44 0x65 0x72 0x69 0x76 0x65 0x64
0x400b68 <typeinfo name for Derived+8>: 0x00 0x36 0x50 0x61 0x72 0x65 0x6e 0x74
0x400b70 <typeinfo name for Parent+7>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x400b78 <typeinfo for Parent>: 0x90 0x20 0x60 0x00 0x00 0x00 0x00 0x00
0x400b80 <typeinfo for Parent+8>: 0x69 0x0b 0x40 0x00 0x00 0x00 0x00 0x00
0x400b88: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x400b90 <typeinfo for Derived>: 0x10 0x22 0x60 0x00 0x00 0x00 0x00 0x00
0x400b98 <typeinfo for Derived+8>: 0x60 0x0b 0x40 0x00 0x00 0x00 0x00 0x00
0x400ba0 <typeinfo for Derived+16>: 0x78 0x0b 0x40 0x00 0x00 0x00 0x00 0x00
0x400ba8 <vtable for Parent>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x400bb0 <vtable for Parent+8>: 0x78 0x0b 0x40 0x00 0x00 0x00 0x00 0x00
0x400bb8 <vtable for Parent+16>: 0xa0 0x0a 0x40 0x00 0x00 0x00 0x00 0x00
0x400bc0 <vtable for Parent+24>: 0x90 0x0a 0x40 0x00 0x00 0x00 0x00 0x00
...
```
Note: we’re looking at demangled symbols. If you really want to know, `_ZTV` is the mangling prefix for a vtable, `_ZTS` for a type-string (name) and `_ZTI` for a typeinfo record.
Here’s `Parent`’s vtable layout:
Address | Value | Meaning |
---|---|---|
0x400ba8 | 0x0 | top_offset (more on this later) |
0x400bb0 | 0x400b78 | Pointer to typeinfo for `Parent` (also part of the above memory dump) |
0x400bb8 | 0x400aa0 | Pointer to `Parent::Foo()` [1]. `Parent`’s `_vptr` points here. |
0x400bc0 | 0x400a90 | Pointer to `Parent::FooNotOverridden()` [2] |
Here’s `Derived`’s vtable layout:
Address | Value | Meaning |
---|---|---|
0x400b40 | 0x0 | top_offset (more on this later) |
0x400b48 | 0x400b90 | Pointer to typeinfo for `Derived` (also part of the above memory dump) |
0x400b50 | 0x400a80 | Pointer to `Derived::Foo()` [3]. `Derived`’s `_vptr` points here. |
0x400b58 | 0x400a90 | Pointer to `Parent::FooNotOverridden()` (same as `Parent`’s) |
1:
```
(gdb) # find out what debug symbol we have for address 0x400aa0
(gdb) info symbol 0x400aa0
Parent::Foo() in section .text of a.out
```
2:
```
(gdb) info symbol 0x400a90
Parent::FooNotOverridden() in section .text of a.out
```
3:
```
(gdb) info symbol 0x400a80
Derived::Foo() in section .text of a.out
```
Remember that the vtable pointer in `Derived` pointed 16 bytes into the vtable? That is the 3rd pointer-sized entry, and it holds the address of the first virtual method. Want the 3rd method? No problem: add `2 * sizeof(void*)` to the vtable pointer. Want the typeinfo record? Jump to the pointer just before it.
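Moving on: what do the typeinfo records look like? Here’s `Parent`’s:
Address | Value | Meaning |
---|---|---|
0x400b78 | 0x602090 | Pointer into the vtable of the helper class `__class_type_info`, which implements the `type_info` methods [1] |
0x400b80 | 0x400b69 | Pointer to the string representing the type name [2] |
0x400b88 | 0x0 | Not part of the record (gdb shows no symbol here). `Parent` has no base class, so its typeinfo is a plain `__class_type_info` with no base-pointer field; this zero is most likely padding. |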
And here’s `Derived`’s typeinfo record:
Address | Value | Meaning |
---|---|---|
0x400b90 | 0x602210 | Pointer into the vtable of the helper class `__si_class_type_info` (single inheritance), which implements the `type_info` methods [3] |
0x400b98 | 0x400b60 | Pointer to the string representing the type name [4] |
0x400ba0 | 0x400b78 | Pointer to `Parent`’s typeinfo record |
1:
```
(gdb) info symbol 0x602090
vtable for __cxxabiv1::__class_type_info@@CXXABI_1.3 + 16 in section .bss of a.out
```
2:
```
(gdb) x/s 0x400b69
0x400b69 <typeinfo name for Parent>: "6Parent"
```
3:
```
(gdb) info symbol 0x602210
vtable for __cxxabiv1::__si_class_type_info@@CXXABI_1.3 + 16 in section .bss of a.out
```
4:
```
(gdb) x/s 0x400b60
0x400b60 <typeinfo name for Derived>: "7Derived"
```
If you want to read more about `__si_class_type_info`, you can find some info here, and also here.
This exhausts my gdb skills, and also concludes this post. I assume some people will find this too low-level, or maybe just unactionable. If so, I’d recommend skipping parts 2 and 3 and jumping straight to part 4.