Shahar Mike's Web Spot

How We Optimized Dragonfly to Get 30x Throughput with BullMQ

Tue, 21 Nov 2023 00:00:00 +0000

It’s been a while since I added any new content here. Sorry about that!

However, in case you’ve been following along, I just posted a new blog post on my (new) company’s blog. I hope you’ll find it interesting!

Its title is How We Optimized Dragonfly to Get 30x Throughput with BullMQ, and it tells the story of how we optimized our service to get >30x throughput.

While our service is written in C++, the blog post unfortunately doesn’t go into C++ code. Instead, it describes the design decisions we made to improve performance.

I hope you like it!

P.S. The 30x figure is somewhat arbitrary: I tested this on an 8-core machine. With more cores I could have claimed 100x, but I guess few of our users run such beefy machines :)

Longest C++ Variable Declaration

Sat, 26 Oct 2019 00:00:00 +0000

Here’s a challenge: what’s the longest variable declaration you can come up with?

That’s not useful in any way. Quite the opposite: the result is something you’d never want to see in your codebase. But it’s an interesting mental challenge, or so I think.

Ground rules:

Each keyword used gets you 1 point, repetitions allowed;
Must be global variable;
Must be valid C++17;
Must not declare new structs / classes / unions / functions / methods;
Variable must be initialized with empty braces, i.e. {};
No comments of any type are allowed, nor usage of preprocessor;
Each ‘trick’ may only be used once.

Take a moment to think about it! Maybe you’ll come up with ideas I haven’t.

My Solution

Here’s roughly how my solution evolved:

// Most basic declaration - 1 point.
int a{};

// Some types are longer than others - 4 points
unsigned long long int a{};

// Adding constness - 6 points.
constexpr const unsigned long long int a{};

// With pointers you can add an additional const - 7 points.
constexpr const unsigned long long int* const a{};

// static - 8 points
static constexpr const unsigned long long int* const a{};

// Can you use thread_local with static? Apparently! - 9 points.
static thread_local constexpr const unsigned long long int* const a{};

// How 'bout static thread_local inline? Seems to work... - 10 points.
static thread_local inline constexpr const unsigned long long int* const a{};

// Here's a feature rarely used - volatile! - 11 points.
// (volatile constexpr - such a weird statement...)
static thread_local inline constexpr volatile const unsigned long long int* const a{};
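To double-check what the final 11-point declaration actually declares, here is a small compile-time sketch (assuming C++17 for the inline variable, plus `<type_traits>`). `ElevenPointIsNull()` is a hypothetical helper added only so the result can also be observed at runtime.

```cpp
#include <type_traits>

// The 11-point declaration from above, unchanged (inline variables need C++17).
static thread_local inline constexpr volatile const unsigned long long int* const a{};

// decltype(a) is the declared type: a const pointer to const volatile unsigned long long.
static_assert(std::is_same<decltype(a),
                           const volatile unsigned long long* const>::value,
              "a is a const pointer to const volatile unsigned long long");

// volatile (like the first const) qualifies the pointee, not the pointer itself.
static_assert(std::is_volatile<std::remove_pointer_t<decltype(a)>>::value,
              "the pointee is volatile");
static_assert(!std::is_volatile<decltype(a)>::value,
              "the pointer itself is not volatile");

// Hypothetical helper: empty braces value-initialize the pointer, i.e. make it null.
bool ElevenPointIsNull() { return a == nullptr; }
```

So "volatile constexpr" is less weird than it looks: constexpr and the trailing const apply to the pointer, while volatile applies to what it points to.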

Cheating, aka Stack Overflow

At this point, curious whether I was the only very bored C++ developer to come up with this challenge, I decided to Google it.

I found this Stack Overflow thread with some interesting ideas. Someone came up with an easy way to get infinite points:

int const* const* const* const* const* const* const* const* /* ... */ a{};

This is the reason I added that last rule. It’s not well formulated I guess, but you get the spirit.

Another idea is to use decltype with typeid:

// Adding decltype with typeid buys us an additional 4 points, for a total of 15 points.
static thread_local inline constexpr volatile const decltype(typeid(const volatile unsigned long long int).name())* const a{};

And the last idea I read was to ingeniously use alignas with sizeof, although I personally prefer alignof instead:

// Adding alignas with alignof / sizeof gets us to 25 points!
alignas(alignof(decltype(typeid(const volatile unsigned long long int)))) static thread_local inline constexpr volatile const decltype(typeid(const volatile unsigned long long int).name())* const a{};
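For the curious, the types these last two tricks rely on can be checked at compile time. This is a sketch assuming C++17. Note that GCC and Clang both reject typeid unless `<typeinfo>` is included first, so strictly speaking the 15- and 25-point versions need an #include. `TwentyFivePointIsNull()` is a hypothetical helper.

```cpp
#include <type_traits>
#include <typeinfo>  // GCC and Clang refuse typeid without this include

// typeid(T) is an lvalue of type const std::type_info, so decltype adds a reference.
static_assert(std::is_same<decltype(typeid(const volatile unsigned long long int)),
                           const std::type_info&>::value,
              "decltype(typeid(...)) is const std::type_info&");

// name() returns const char*, the type the 15-point declaration points to.
static_assert(std::is_same<decltype(typeid(int).name()), const char*>::value,
              "decltype(typeid(...).name()) is const char*");

// alignof of a reference type yields the alignment of the referenced type.
static_assert(alignof(decltype(typeid(int))) == alignof(std::type_info),
              "the alignas(...) part is alignas(alignof(std::type_info))");

// The 25-point declaration from above, unchanged.
alignas(alignof(decltype(typeid(const volatile unsigned long long int)))) static thread_local inline constexpr volatile const decltype(typeid(const volatile unsigned long long int).name())* const a{};

// Hypothetical helper: a is a null const pointer to a const volatile (const char*).
bool TwentyFivePointIsNull() { return a == nullptr; }
```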

This is as far as I got. I’m certain there are many more tricks I haven’t thought of or am unaware of. Do you have any? Let me know in a comment!

Compiling Clang from Scratch

Fri, 07 Dec 2018 00:00:00 +0000

In this post I’ll show how easy it is to build clang from scratch on Linux, and how to use it both directly and with CMake.

But Why?

Clang is changing rapidly, and new features are added frequently. Linux distros, on the other hand, ship ancient versions. That’s just frustrating.

Furthermore, building Clang is so easy that it really shouldn’t stop you from using the newest and bestest. Cloning the repositories takes a few minutes, and the build itself takes anywhere from 7 minutes to 1 hour, depending on your hardware.

Step 1 - Clone

For this step you’ll need git installed on your system. Simply paste the following into a shell, while inside an empty directory you’d like to clone the sources into (I used ~/clang):

git clone -q  https://github.com/llvm-mirror/llvm llvm
git clone -q  https://github.com/llvm-mirror/clang llvm/tools/clang
git clone -q  https://github.com/llvm-mirror/clang-tools-extra llvm/tools/clang/tools/extra
git clone -q  https://github.com/llvm-mirror/compiler-rt llvm/projects/compiler-rt
git clone -q  https://github.com/llvm-mirror/libcxx llvm/projects/libcxx
git clone -q  https://github.com/llvm-mirror/libcxxabi llvm/projects/libcxxabi
git clone -q  https://github.com/llvm-mirror/lld llvm/tools/lld

This will create an llvm directory containing all the needed sources. Note that we’re building from the head of the master branch, where all the goodies are. If you’re using this for something serious, consider using one of the release branches instead.

Step 2 - Run CMake

Now we need to invoke cmake to generate the build environment for us. It’s recommended to use the Ninja build system as it’s much faster, but you can also use good old Unix make: just drop the -GNinja.

You may also wish to build Clang in Release mode rather than the default Debug mode (pass -DCMAKE_BUILD_TYPE=Release to cmake). The reasons are:

Everything (sources + built objects, libs & executables) takes 61 GB with Debug, and only 4.6 GB with Release.
At least on my machine, the Debug build takes longer (9m vs 7m for Release), maybe due to I/O. YMMV.

Paste the following into a bash shell to get going:

mkdir build
cd build
cmake -GNinja ../llvm

Step 3 - Build

This is fairly easy. Simply run ninja (or make if that’s how you configured cmake). Now go make a sandwich - your PC will be rather busy.

Step 4 - Profit

You now have LLVM, Clang, libc++ and other goodies built and ready to use.

Simple compilation:

$ cat main.cpp
#include <iostream>
using namespace std;

int main() {
  cout << "Hello, World!" << endl;
}

$ ~/clang/build/bin/clang++ main.cpp
$ ./a.out
Hello, World!

Use libc++ which we built above:

$ ~/clang/build/bin/clang++ main.cpp -nostdinc++ -I$HOME/clang/build/include/c++/v1 -L$HOME/clang/build/lib -Wl,-rpath,$HOME/clang/build/lib -lc++
$ ./a.out
Hello, World!
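To confirm that a binary really was built by the freshly compiled Clang against the locally built libc++ (and not by the distro compiler or against libstdc++), a small sketch can report what the preprocessor sees. It relies on `__clang__`/`__clang_version__` (predefined by Clang), `__GNUC__`/`__VERSION__` (predefined by GCC and, for compatibility, also by Clang, hence the check order), `_LIBCPP_VERSION` (defined by libc++ headers) and `__GLIBCXX__` (defined by libstdc++ headers). The function names are hypothetical.

```cpp
#include <string>  // any standard header pulls in the library's version macros

// Hypothetical helper: which compiler built this translation unit?
std::string CompilerInfo() {
#if defined(__clang__)
  // Check Clang first: Clang also defines __GNUC__ for compatibility.
  return std::string("clang ") + __clang_version__;
#elif defined(__GNUC__)
  return std::string("gcc ") + __VERSION__;
#else
  return "unknown compiler";
#endif
}

// Hypothetical helper: which C++ standard library are we compiled against?
std::string StdLibInfo() {
#if defined(_LIBCPP_VERSION)
  return "libc++ " + std::to_string(_LIBCPP_VERSION);
#elif defined(__GLIBCXX__)
  return "libstdc++ " + std::to_string(__GLIBCXX__);
#else
  return "unknown standard library";
#endif
}
```

Printing both from main() with the libc++ command line above should report libc++, while a plain `clang++ main.cpp` will typically report libstdc++ on Linux.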

Compile with C++20 (at the time of writing, the init-statement in range-based for loops isn’t supported by any released Clang version):

$ cat main.cpp
#include <iostream>
using namespace std;

int main() {
  for (int i = 1; int j : {1, 2, 3, 4}) {
    cout << "i: " << i << ", j: " << j << endl;
  }
}

$ ~/clang/build/bin/clang++ main.cpp -std=c++2a
$ ./a.out
i: 1, j: 1
i: 1, j: 2
i: 1, j: 3
i: 1, j: 4

Compile with AddressSanitizer:

$ cat main.cpp
int main() {
  int* a = new int(123);
  delete a;
  delete a;
}

$ ~/clang/build/bin/clang++ -O0 -g -fno-omit-frame-pointer -fsanitize=address main.cpp
$ ./a.out
=================================================================
==239623==ERROR: AddressSanitizer: attempting double-free on 0x602000000010 in thread T0:
    #0 0x4f9d12  (/tmp/cpp_vmizcK/a.out+0x4f9d12)
    #1 0x4fc0f1  (/tmp/cpp_vmizcK/a.out+0x4fc0f1)
    #2 0x7feaefe3c2b0  (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
    #3 0x41d989  (/tmp/cpp_vmizcK/a.out+0x41d989)

0x602000000010 is located 0 bytes inside of 4-byte region [0x602000000010,0x602000000014)
freed by thread T0 here:
    #0 0x4f9d12  (/tmp/cpp_vmizcK/a.out+0x4f9d12)
    #1 0x4fc0d3  (/tmp/cpp_vmizcK/a.out+0x4fc0d3)
    #2 0x7feaefe3c2b0  (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)

previously allocated by thread T0 here:
    #0 0x4f9092  (/tmp/cpp_vmizcK/a.out+0x4f9092)
    #1 0x4fc068  (/tmp/cpp_vmizcK/a.out+0x4fc068)
    #2 0x7feaefe3c2b0  (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)

SUMMARY: AddressSanitizer: double-free (/tmp/cpp_vmizcK/a.out+0x4f9d12)
==239623==ABORTING

Show proper symbols:

$ env ASAN_SYMBOLIZER_PATH=$HOME/clang/build/bin/llvm-symbolizer ./a.out
=================================================================
==239574==ERROR: AddressSanitizer: attempting double-free on 0x602000000010 in thread T0:
    #0 0x4f9d12 in operator delete(void*) /home/shmike/clang/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:167:3
    #1 0x4fc0f1 in main /tmp/cpp_vmizcK/main.cpp:4:3
    #2 0x7fb8a6c792b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
    #3 0x41d989 in _start (/tmp/cpp_vmizcK/a.out+0x41d989)

0x602000000010 is located 0 bytes inside of 4-byte region [0x602000000010,0x602000000014)
freed by thread T0 here:
    #0 0x4f9d12 in operator delete(void*) /home/shmike/clang/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:167:3
    #1 0x4fc0d3 in main /tmp/cpp_vmizcK/main.cpp:3:3
    #2 0x7fb8a6c792b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)

previously allocated by thread T0 here:
    #0 0x4f9092 in operator new(unsigned long) /home/shmike/clang/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:106:3
    #1 0x4fc068 in main /tmp/cpp_vmizcK/main.cpp:2:12
    #2 0x7fb8a6c792b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)

SUMMARY: AddressSanitizer: double-free /home/shmike/clang/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:167:3 in operator delete(void*)
==239574==ABORTING

Compile with MemorySanitizer:

$ cat main.cpp
int main() {
  int a;  // uninitialized
  return a;
}

$ ~/clang/build/bin/clang++ -O0 -g -fno-omit-frame-pointer -fsanitize=memory main.cpp
$ ./a.out
==243759==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x496dac  (/tmp/cpp_vmizcK/a.out+0x496dac)
    #1 0x7f6f9c9cc2b0  (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
    #2 0x41e379  (/tmp/cpp_vmizcK/a.out+0x41e379)

SUMMARY: MemorySanitizer: use-of-uninitialized-value (/tmp/cpp_vmizcK/a.out+0x496dac)
Exiting

Compile for WebAssembly (wasm):

$ ~/clang/build/bin/clang++ --target=wasm32 -Os main.cpp -c -o out.wasm
$ file out.wasm
out.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)

(Note that it’s not trivial to use libc or the STL this way, so it’s much easier to use Emscripten.)

Integrate with CMake

While using the compiler manually is nice, it’s only rarely a good idea. For real projects we all use some sort of build system, and many of us use CMake.

Integrating our freshly built Clang with CMake is fairly simple, assuming you know how to paste :)

cmake -DCMAKE_CXX_COMPILER=$HOME/clang/build/bin/clang++ -DCMAKE_LINKER=$HOME/clang/build/bin/clang++

And if you’d like to also use the locally built libc++:

cmake -DCMAKE_CXX_FLAGS="-nostdinc++ -I$HOME/clang/build/include/c++/v1" -DCMAKE_EXE_LINKER_FLAGS="-L$HOME/clang/build/lib -lc++ -nostdinc++ -Wl,-rpath,$HOME/clang/build/lib" -DCMAKE_CXX_COMPILER=$HOME/clang/build/bin/clang++ -DCMAKE_LINKER=$HOME/clang/build/bin/clang++

That’s it. I hope you liked this mini tutorial, and that you found it useful, or at least interesting!

Move Semantics

Wed, 19 Sep 2018 00:00:00 +0000

Move Semantics are a C++11 feature that complements C++98’s RVO; think of them as a user-defined, RVO-like optimization. While originally designed only to enable optimizations, move semantics can also be used to restrict APIs. This is how std::unique_ptr can be a move-only type, which lets it enforce single ownership (more about std::unique_ptr here).

Motivation

As we saw previously, RVO does not always take place. When it doesn’t, C++98 forced users to create expensive copies.

As an example, let’s see what the following program does when compiled with different flags:

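#include <cstdlib>   // std::malloc, std::free
#include <iostream>
#include <new>       // std::bad_alloc
#include <string>

// Replace global operator new and delete to log allocations.
// (Dynamic exception specifications are OK in C++98/11 but were removed in C++17.)
void* operator new(std::size_t n) throw(std::bad_alloc) {
  std::cout << "[Allocating " << n << " bytes]\n";
  return std::malloc(n);
}
void operator delete(void* p) throw() { std::free(p); }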

std::string BuildLongString() {
  return "This string is so long it can't possibly be inline (SSO)";
}

int main() {
  BuildLongString();
}

(1) C++98 with RVO support: Single copy - RVO FTW!

$ clang++-libc++ -std=c++98 main.cpp && ./a.out
[Allocating 64 bytes]

(2) C++98 without RVO support: 2 copies - makes sense. Sad but true.

$ clang++-libc++ -std=c++98 -fno-elide-constructors main.cpp && ./a.out
[Allocating 64 bytes]
[Allocating 64 bytes]

(3) C++11 with RVO support: Single copy - no news.

$ clang++-libc++ -std=c++11 main.cpp && ./a.out
[Allocating 64 bytes]

(4) C++11 without RVO support: Single copy - that’s new!

$ clang++-libc++ -std=c++11 -fno-elide-constructors main.cpp && ./a.out
[Allocating 64 bytes]

With move semantics, we avoid the extra allocation even when RVO is disabled.

More Practical Example

Let’s consider the following example:

std::string s = BuildLongString();  // Same BuildLongString() from above
// Do something with s.
s = BuildLongString();  // Copy assignment - no RVO ever!
// Do something else with s.

In C++98 the above code had to create a copy of the long string returned by BuildLongString()¹, because RVO does not apply to assignment. That’s unfortunate: the temporary returned by BuildLongString() is destroyed right after the assignment, and the previous value of s is discarded anyway, so copying the buffer is wasted work.

C++11’s Move Semantics allow us to avoid this copy by ‘stealing’ the pointer of the temporary object returned by BuildLongString() and directing that temporary object to not own that pointer anymore (so that it won’t attempt to delete it).

In other words, the compiler now provides us with a way to know that an object passed to us is temporary and will soon be destroyed. With this knowledge we can write smarter and faster code. References to these temporary, soon-to-be-destroyed objects are written with && - a new C++ syntax.
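Overload resolution is how that knowledge reaches our code. A minimal sketch (the function names Kind and MakeString are hypothetical): an overload taking `const std::string&` accepts anything, but the `std::string&&` overload is a better match for temporaries, so the compiler picks it whenever the argument is an rvalue.

```cpp
#include <string>

// Chosen for lvalues: named objects we must not steal from.
std::string Kind(const std::string&) { return "lvalue"; }

// Chosen for rvalues: temporaries that are about to be destroyed.
std::string Kind(std::string&&) { return "rvalue"; }

// Returns by value, so a call to it is a temporary (an rvalue).
std::string MakeString() { return "temporary"; }
```

Calling `Kind(s)` on a named string selects the first overload, while `Kind(MakeString())` selects the second.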

Move Constructor / Move Assignment

Most commonly, move semantics are used to create a special type of constructor called a move constructor. Move constructors are similar to copy constructors, both syntactically and logically. They can be implemented in addition to, or instead of, a copy constructor. Similarly, one can implement a move assignment operator in addition to, or instead of, a copy assignment operator (as in a = b;).

class MyClass {
 public:
  MyClass();                             // Constructor

  MyClass(const MyClass& o);             // Copy constructor
  MyClass(MyClass&& o);                  // Move constructor

  MyClass& operator=(const MyClass& o);  // Copy assignment
  MyClass& operator=(MyClass&& o);       // Move assignment
};

As we saw above, MyClass&& is the syntax for a special reference to an rvalue of type MyClass, aka an rvalue reference.

Move constructors / assignment operators will be invoked automatically by the compiler only if the argument passed to them (o in the above example) is an rvalue. Otherwise the compiler will invoke the safe-but-slow copy constructor / assignment.

Consider this: you’re tasked with implementing std::string’s move assignment operator. Let’s assume std::string has 3 members: data_, size_ and capacity_. When implementing the assignment you obviously want *this to become equal to another std::string o (that’s the meaning of an assignment), but you also know that o will very soon be destroyed. With this knowledge you can implement the assignment in a very optimized fashion.

std::string& operator=(std::string&& o) {  // `o` is a temporary
  if (this != &o) {
    // Release our own buffer first, otherwise it would leak
    // (assuming data_ was allocated with new[]).
    delete[] data_;

    // Steal & copy data
    data_ = o.data_;  // data_ is a char*
    size_ = o.size_;  // size_ is a size_t
    capacity_ = o.capacity_;  // capacity_ is a size_t

    // Leave `o` empty so it can be destroyed safely
    o.data_ = nullptr;
    o.size_ = o.capacity_ = 0;
  }
  return *this;
}

No memory allocation, no copying of buffer, O(1) operation. That’s much better than copy assignment!

Interim Summary

This special syntax, std::string&& o, is our entry point to using move semantics. Furthermore, this assignment operation is not a copy assignment but rather a move assignment. And && means that we have a special reference in our hands - a reference to a “temporary object”.

Temporary Objects - Intuition

What exactly is considered to be temporary?

You might have seen the term rhs (right-hand side) or rvalue (right-hand value) in some compiler errors in the past. For example, when attempting to compile code such as this:

int foo() { return 42; }

// ...
foo() = 5;  // Error: expression is not assignable

It doesn’t make sense to assign to the value returned from foo(). It might have made sense if foo() returned a reference, but that’s not the case here. Since assigning to this value makes no sense, the compiler forbids us from doing so.

This is the first rule of thumb for deciding whether an object is a temporary: can it be used on the left-hand side of an assignment? That’s exactly what we tried to do with foo() = 5; above. If it can’t, the object is a temporary.

Unfortunately this rule doesn’t always work. The compiler will allow us to invoke assignment operators on temporaries of a custom class (unless the class uses ref-qualifiers, which are uncommon and beyond the scope of this post).
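A compile-time sketch of why the assignment rule of thumb breaks down for classes. It uses `std::is_assignable<T, U>`, which checks whether `std::declval<T>() = std::declval<U>()` compiles; for a non-reference T, that left-hand side is an rvalue. `LvalueOnly` and `StringRvalueAssignable()` are hypothetical names, the former showing the `&` ref-qualifier mentioned above.

```cpp
#include <string>
#include <type_traits>

// Built-in types: an rvalue int can't be assigned to (just like foo() = 5).
static_assert(!std::is_assignable<int, int>::value, "int rvalue is not assignable");
static_assert(std::is_assignable<int&, int>::value, "int lvalue is assignable");

// Class types: std::string() = "x" compiles, so the rule of thumb fails here.
static_assert(std::is_assignable<std::string, const char*>::value,
              "std::string rvalue is assignable");

// Hypothetical class whose operator= is ref-qualified with &,
// so it may only be called on lvalues.
struct LvalueOnly {
  LvalueOnly() = default;
  LvalueOnly(const LvalueOnly&) = default;
  LvalueOnly& operator=(const LvalueOnly&) & { return *this; }
};
static_assert(!std::is_assignable<LvalueOnly, const LvalueOnly&>::value,
              "rvalue LvalueOnly is not assignable");
static_assert(std::is_assignable<LvalueOnly&, const LvalueOnly&>::value,
              "lvalue LvalueOnly is assignable");

// Runtime mirror of one of the checks above.
bool StringRvalueAssignable() {
  return std::is_assignable<std::string, const char*>::value;
}
```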

Another rule of thumb, and one I like better, is to consider whether it is possible to take the address of an object. For example:

int foo() { return 42; }

// ...
int i = foo();
int* p = &i;  // OK: `i` is an lvalue

p = &foo();  // Error: cannot take the address of an rvalue of type 'int'.

Note that it’s fine to take the address of the function itself; we just can’t take the address of the value returned from it. I.e.:

auto t = &foo;    // OK - taking the address of function foo
auto v = &foo();  // Error - can't take the address of the value returned from foo

In a future post I plan to explain the definition of rvalues more precisely, but for now we will consider an object that will be destroyed by the end of the statement to be an rvalue. These are usually temporary objects, as above. You can find a more accurate definition of rvalues in cppreference.

`std::move()`

Any function (such as move constructor, move assignment, or just a global function) which accepts rvalue references (&&) can only be called with an rvalue object. This is where the true power of move semantics comes into play. The compiler knows when it’s safe to pass an object as an rvalue reference. If it’s not, we’ll get a compile error:

void foo(std::string&& s) { /* ... */ }

// ...

foo("hello");  // OK: a temporary std::string is created from the literal, and temporaries are rvalues.

// Return values are rvalues as well (except when the function returns an
// lvalue reference)
foo(BuildLongString());

std::string s;
foo(s);  // Compile error - `s` isn't an rvalue

This is good, and by design. However there are cases where we do want to convert an object to an rvalue - where we know it won’t be used in the future. What then? This is where std::move() comes into play:

// std::move() casts an object to an rvalue.
foo(std::move(s));
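A short sketch of what this means in practice: std::move() itself moves nothing. It is essentially a cast to an rvalue reference, and the actual move only happens if the function receiving the result chooses to steal from it. Both Peek and Sink are hypothetical functions.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// std::move(s) has the same type as static_cast<std::string&&>(s).
static_assert(std::is_same<decltype(std::move(std::declval<std::string&>())),
                           std::string&&>::value,
              "std::move yields an rvalue reference");

// Accepts an rvalue reference but only reads from it: nothing is moved.
std::size_t Peek(std::string&& s) { return s.size(); }

// Accepts an rvalue reference and steals its contents. Inside the function
// `s` has a name, so it is an lvalue again and needs another std::move().
std::string Sink(std::string&& s) {
  std::string out = std::move(s);  // move constructor: takes over the buffer
  return out;
}
```

After `Sink(std::move(s))`, s is left in a valid but unspecified state, so it should only be assigned to or destroyed. After `Peek(std::move(s))`, s is untouched.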

Rule of 3 becomes Rule of 5

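One last thing before I wrap up: if you’ve heard of the rule of three (a class that needs a user-defined destructor, copy constructor or copy assignment operator usually needs all three), move semantics give us a complementary rule: the rule of five, which adds the move constructor and move assignment operator.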
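As a sketch of the rule of five, here is a hypothetical Buffer class that owns a heap array and therefore defines all five special members. The move operations steal the pointer in O(1), in the same spirit as the std::string move assignment above, while the copy operations allocate.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical owning buffer, used only to illustrate the rule of five.
class Buffer {
 public:
  explicit Buffer(std::size_t size) : data_(new int[size]()), size_(size) {}

  // 1. Destructor: releases the owned array.
  ~Buffer() { delete[] data_; }

  // 2. Copy constructor: allocates and copies (expensive).
  Buffer(const Buffer& o) : data_(new int[o.size_]), size_(o.size_) {
    std::copy(o.data_, o.data_ + o.size_, data_);
  }

  // 3. Copy assignment: copy-and-swap, which also handles self-assignment.
  Buffer& operator=(const Buffer& o) {
    Buffer tmp(o);
    Swap(tmp);
    return *this;
  }

  // 4. Move constructor: steals the pointer and leaves `o` empty.
  Buffer(Buffer&& o) noexcept : data_(o.data_), size_(o.size_) {
    o.data_ = nullptr;
    o.size_ = 0;
  }

  // 5. Move assignment: frees our array, then steals `o`'s.
  Buffer& operator=(Buffer&& o) noexcept {
    if (this != &o) {
      delete[] data_;
      data_ = o.data_;
      size_ = o.size_;
      o.data_ = nullptr;
      o.size_ = 0;
    }
    return *this;
  }

  const int* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  void Swap(Buffer& o) noexcept {
    std::swap(data_, o.data_);
    std::swap(size_, o.size_);
  }

  int* data_;
  std::size_t size_;
};
```

Marking the move operations noexcept matters in practice: containers such as std::vector only move elements during reallocation when the move constructor can't throw.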

That’s it

In the next post I plan to look into what value categories are and how they differ from rvalue references. See you next time!

¹ In C++98 days it was popular to implement std::string as copy-on-write. Technically speaking a copy of the string would be created, but instead of allocating a new buffer it would atomically increment a reference counter. This was abandoned in recent years (and is forbidden by the C++11 Standard) because, for the most part, on multi-core computers it’s faster to create copies than to use atomics. I’ve written a bit on the subject here.

Dollhouse

Wed, 15 Aug 2018 00:00:00 +0000

As our oldest daughter was nearing her 4th birthday, my wife and I considered giving her a dollhouse. Soon after, we decided to build her one instead of buying it.

Tools Used

We have a wood workshop at work. It’s sort of an employee benefit, rather than something that’s really work-related. Awesome, I know. It also has a great view :)

So most of the tools I used aren’t mine, and are mostly professional:

I’m too scared to use a table saw, so instead I mostly used a circular saw with tracks.
Jointer and Planer.
Router for corners.
Scroll saw for cutting the windows and doors. I could have created a frame and used the router for straighter lines, but I’m OK with the result.
Lots of clamps for applying pressure while gluing.

From Design to Product

Being the pedantic engineer I am, I started with a SketchUp model:

Next, time to purchase lumber. I couldn’t get this huge piece up the stairs or into the elevator, so I had to cut it outside:

Next I cut it to length according to plan:

Here I made my first mistake, see below.

I worked with a router for round corners, cut the windows / door manually, and then it was time to glue it all together:

Add the roof, some color and voila!:

With some furniture and dolls purchased on AliExpress, we’re done :)

What I Learned

I made quite a few mistakes, most of which I attribute to this being my first “serious” wood project:

I trusted the seller’s dimensions instead of verifying them. Too late did I realize that instead of the listed 20cm depth by 2.5cm width, it was in fact closer to 19cm by 2cm. My design was not affected by the changed depth (20cm or 19cm), but the different width (discovered too late) made a few angles quirky and the joints glued less strongly. The conclusion is simple: always measure, never trust.
I used the jointer and planer after cutting the wood into pieces. This caused a lot of unnecessary work for no good reason.
I thought I could skip the planer and use only the jointer. I ended up with a trapezoid, which I had to fix with the planer.
Lastly, I made a few mistakes with the roof, which changed it significantly. To test my gluing abilities, I glued it before cutting the 45° angles at the bottom. Huge mistake: now I couldn’t use the circular saw’s tracks, as the roof simply didn’t fit in width anymore. The roof is not symmetric, but I think it still looks OK.

Hope you like it! My daughter sure does ;)

Free Cloud VM & HTTPS

Sun, 15 Oct 2017 00:00:00 +0000

If your knowledge of what it takes to have an SSL encrypted website (aka https) is as outdated as mine was a couple of weeks ago - you will find this quick post useful.

A colleague of mine recently mentioned that you can get an SSL certificate for any website for free. I knew that issuing certificates had been every CA’s business model for ages, so I was skeptical. Well, I was in for a pleasant surprise.

So I looked around (well, clicked the first result for ‘free https’ on Google), and it turns out a non-profit named letsencrypt gives out free certificates. 2017, baby. I immediately went to the instructions, which, to my disappointment, said that on shared hosting one has to manually create and periodically renew certificates (unless of course the shared-hosting company supports letsencrypt, but why would they if they make money selling the same product?).

Disappointed, I searched for an alternative solution. Luckily, Google recently announced that they started a Cloud program called ‘Always Free’ which gives an f1-micro instance for free without expiration (more details here). A free always-on, cloud-based Linux machine? That’s even better than https for my blog!

If you’re still here you must love freebies, like me; keep on reading. I already have a dot-com domain, but my friend also mentioned dot.tk, where anyone can register a free .tk domain. So if you’re starting fresh you could get a free domain, hosting and https. I love freebies!

Return Value Optimization

Fri, 18 Aug 2017 00:00:00 +0000

Return Value Optimization (RVO), Named RVO (NRVO) and Copy-Elision have been part of C++ since C++98. In this post I will explain what these concepts mean and how they help improve runtime performance.

I will use our old friend Snitch - a class dedicated to printing at key events:

#include <iostream>
using namespace std;

struct Snitch {   // Note: All methods have side effects
  Snitch() { cout << "c'tor" << endl; }
  ~Snitch() { cout << "d'tor" << endl; }

  Snitch(const Snitch&) { cout << "copy c'tor" << endl; }
  Snitch(Snitch&&) { cout << "move c'tor" << endl; }

  Snitch& operator=(const Snitch&) {
    cout << "copy assignment" << endl;
    return *this;
  }

  Snitch& operator=(Snitch&&) {
    cout << "move assignment" << endl;
    return *this;
  }
};

Let’s get goin’.

Return Value Optimization

RVO basically means the compiler is allowed to avoid creating temporary objects for return values, even if their constructors and destructors have side effects.

Here’s a simple example:

Snitch ExampleRVO() {
  return Snitch();
}

int main() {
  Snitch snitch = ExampleRVO();
}

Output (note that -fno-elide-constructors disables RVO in clang):

$ clang++ -std=c++11 main.cpp && ./a.out
c'tor
d'tor

$ clang++ -std=c++11 -fno-elide-constructors main.cpp && ./a.out
c'tor
move c'tor
d'tor
move c'tor
d'tor
d'tor

In the first run (without -fno-elide-constructors) the compiler refrained from calling user code despite it having a clear side effect (printing to the console). This is also the default behavior, meaning practically all C++ programs utilize RVO.

Without RVO the compiler creates 3 Snitch objects instead of 1:

A temporary object inside ExampleRVO() (when printing c'tor);
A temporary object for the returned object inside main() (when printing the first move c'tor);
The named object snitch (when printing the second move c'tor).

Performance

The neat thing about RVO is that it makes returning objects free. It works by allocating memory for the to-be-returned object in the caller’s stack frame. The returning function then uses that memory as if it were in its own frame, without the programmer knowing or caring.
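One way to observe this is to compare addresses. The sketch below assumes C++17, where returning a prvalue (`return Located();`) is guaranteed to construct the object directly in the caller’s storage. Located, g_constructed_at and ConstructedInCallerStorage() are hypothetical names. Located’s copy and move constructors are deleted, so the code would not even compile if a copy or move were needed.

```cpp
// Requires C++17 (guaranteed copy elision for prvalues).

// Records the address at which the most recent Located was constructed.
const void* g_constructed_at = nullptr;

struct Located {
  Located() { g_constructed_at = this; }
  // No copies or moves possible: if the code below needed one, it wouldn't compile.
  Located(const Located&) = delete;
  Located(Located&&) = delete;
};

Located MakeLocated() {
  return Located();  // constructed directly in the caller's storage
}

// Hypothetical helper: true if the object was built where the caller keeps it.
bool ConstructedInCallerStorage() {
  Located local = MakeLocated();
  return g_constructed_at == &local;
}
```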

In C++98 days this was significant:

#include <vector>
using namespace std;

vector<int> ReturnVector() {
  return vector<int>(1, 1);
}

int main() {
  for (int i = 0; i < 1000000000; ++i) {
    ReturnVector();
  }
}

Output:

$ clang++ -fno-elide-constructors -std=c++98 -stdlib=libc++ -O3 main.cpp && time ./a.out
real	0m37.235s
user	0m37.168s
sys	0m0.024s

$ clang++ -std=c++98 -stdlib=libc++ -O3 main.cpp && time ./a.out
real	0m17.681s
user	0m17.668s
sys	0m0.000s

That’s a ~2.1x difference on my machine, simply by avoiding the copy of the vector. In C++11 (or newer) environments the difference all but disappears; disabling RVO was even marginally faster here, though a gap this small is likely within noise:

$ clang++ -fno-elide-constructors -std=c++11 -stdlib=libc++ -O3 main.cpp && time ./a.out
real	0m18.195s
user	0m18.188s
sys	0m0.000s

$ clang++ -std=c++11 -stdlib=libc++ -O3 main.cpp && time ./a.out
real	0m18.356s
user	0m18.340s
sys	0m0.000s

This is due to Move Semantics, which is the subject of the next post.

While trying to come up with an example where RVO is faster in modern C++ using STL containers, I hit a wall, mostly because of move semantics, but also because on x86_64 RVO is part of the ABI, so disabling it is harder. Please post such examples if you have any!

Named Return Value Optimization (NRVO)

Named RVO is when an object with a name is returned but is nevertheless not copied. A simple example is:

Snitch ExampleNRVO() {
  Snitch snitch;
  return snitch;
}

int main() {
  ExampleNRVO();
}

Which has a similar output to ExampleRVO() above:

$ clang++ -std=c++11 main.cpp && ./a.out
c'tor
d'tor

While RVO is almost always going to happen, NRVO is more restricted, as we will see below. I personally don’t think NRVO deserves its own acronym.

Copy Elision

RVO is part of a larger group of optimizations called copy elision. The essentials are the same, except that copy elision isn’t limited to return statements, for example:

void foo(Snitch s) {
}

int main() {
  foo(Snitch());
}

Output:

c'tor
d'tor

In my experience, RVO is more frequent (and thus useful) than other copy-elision practices, but your mileage may vary.

When RVO doesn’t / can’t happen

RVO is an optimization the compiler is allowed to apply (starting with C++17 it is in fact required to in certain cases). However, even in C++17 it is not always guaranteed. Let’s look at a few examples.

The following examples are cases where, in my environment, RVO doesn’t happen. Some of them may change with other compilers / versions.

Deciding on Instance at Runtime

When the compiler can’t know from within the function which instance will be returned, it must disable RVO:

Snitch CreateSnitch(bool runtime_condition) {
  Snitch a, b;
  if (runtime_condition) {
    return a;
  } else {
    return b;
  }
}

int main() {
  Snitch snitch = CreateSnitch(true);
}

Output:

$ clang++ -std=c++11 -stdlib=libc++ main.cpp && ./a.out
c'tor
c'tor
move c'tor
d'tor
d'tor
d'tor

Returning a Parameter / Global

When returning an object that is not created in the scope of the function, there is no way to do RVO:

Snitch global_snitch;

Snitch ReturnParameter(Snitch snitch) {
  return snitch;
}

Snitch ReturnGlobal() {
  return global_snitch;
}

int main() {
  Snitch snitch = ReturnParameter(global_snitch);
  Snitch snitch2 = ReturnGlobal();
}

Output:

$ clang++ -std=c++11 -stdlib=libc++ main.cpp && ./a.out
c'tor
copy c'tor
move c'tor
d'tor
copy c'tor
d'tor
d'tor
d'tor

Returning by `std::move()`

Returning by calling std::move() on the return value is an anti-pattern. It is wrong most of the time. It does force the move constructor, but in doing so it disables RVO. It is also redundant, as a move will happen whenever it can, even without explicitly calling std::move() (see here).

Snitch CreateSnitch() {
  Snitch snitch;
  return std::move(snitch);
}

int main() {
  Snitch snitch = CreateSnitch();
}

Output:

$ clang++ -std=c++11 -stdlib=libc++ main.cpp && ./a.out
c'tor
move c'tor
d'tor
d'tor
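To compare the two return styles without relying on console output, here is a sketch using a hypothetical Counted type that counts copy and move constructions. With `return std::move(local);` a move constructor call is mandatory. With a plain `return local;` the compiler may apply NRVO (zero moves) and otherwise falls back to an implicit move (one move), but never a copy. The exact counts assume C++17, where the final initialization in the caller is guaranteed to be elided.

```cpp
#include <utility>

// Hypothetical type that counts copy and move constructions.
struct Counted {
  static int copies;
  static int moves;
  Counted() = default;
  Counted(const Counted&) { ++copies; }
  Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Plain return: NRVO may elide the move entirely.
Counted ReturnPlain() {
  Counted local;
  return local;
}

// Anti-pattern: std::move turns the operand into an xvalue, which rules out NRVO.
Counted ReturnWithMove() {
  Counted local;
  return std::move(local);
}

void ResetCounts() { Counted::copies = Counted::moves = 0; }
```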

Assignment

RVO can only happen when an object is created from a returned value. Using operator= on an existing object rather than copy/move constructor might be mistakenly thought of as RVO, but it isn’t:

Snitch CreateSnitch() {
  return Snitch();
}

int main() {
  Snitch s = CreateSnitch();
  s = CreateSnitch();
}

$ clang++ -std=c++11 -stdlib=libc++ main.cpp && ./a.out
c'tor
c'tor
move assignment
d'tor
d'tor

Returning Member

In some cases even an unnamed object can’t benefit from RVO:

struct Wrapper {
  Snitch snitch;
};

Snitch foo() {
  return Wrapper().snitch;
}

int main() {
  Snitch s = foo();
}

Output:

$ clang++ -std=c++11 -stdlib=libc++ main.cpp && ./a.out
c'tor
move c'tor
d'tor
d'tor

Conclusion

While we can’t count on RVO to always take place, it will in most cases. For those cases where it doesn’t we always have Move Semantics, which is the topic of the next post. As always, optimize for readability rather than performance when writing code, unless you have a quantifiable reason.

Using libclang to Parse C++ (aka libclang 101)

Tue, 03 Jan 2017 00:00:00 +0000

In this post I’ll provide a quick tutorial for using libclang. I started playing around with libclang while implementing Reflang – an open source reflection framework for C++. Then I came to appreciate the amazing work done by its developers.

Please note that we will start with a program and will gradually add code. Scroll to the end of the post to view the complete solution.

libclang?

Clang, if you haven’t heard of it yet, is a wonderful C++ (and other C-family languages) compiler. Well, not exactly a compiler, but a frontend to the LLVM compiler.

You see, compilers have a very tough problem to solve, and so most of them split it into 2 easier problems:

Translating a programming language (C++ in our case) to some intermediate code – this is called the frontend, and is exactly what Clang does.
Translating the above intermediate code to machine code – this is called the backend. Clang uses LLVM for that.

The neat thing about Clang is that it is designed to be also used as a library. There are many types of applications that must truly understand code – IDEs, documentation-generators, static-analysis tools, etc. Instead of each of them having to implement C++ parsing (which is an extremely difficult task!), libclang can be used to correctly handle all language features and edge-cases.

libclang!

And it’s so darn easy. Really. Those Clang folks really did awesome work. In the rest of this post we will use its C API to explore the following code:

// header.hpp

class MyClass
{
public:
  int field;
  virtual void method() const = 0;

  static const int static_field;
  static int static_method();
};

Basic example

Let’s look at the simplest of examples. The following program parses the above file and immediately exits:

#include <cstdlib>  // For exit().
#include <iostream>
#include <clang-c/Index.h>  // This is libclang.
using namespace std;

int main()
{
  CXIndex index = clang_createIndex(0, 0);
  CXTranslationUnit unit = clang_parseTranslationUnit(
    index,
    "header.hpp", nullptr, 0,
    nullptr, 0,
    CXTranslationUnit_None);
  if (unit == nullptr)
  {
    cerr << "Unable to parse translation unit. Quitting." << endl;
    exit(-1);
  }

  clang_disposeTranslationUnit(unit);
  clang_disposeIndex(index);
}

There are many 0s and nullptrs – these allow us to do some more advanced stuff (like passing argc & argv, using in-memory files, etc). Let’s not get into these.

So what do we have after clang_parseTranslationUnit() has finished successfully? We have a parsed Abstract Syntax Tree (AST) which we can traverse and inspect. Which is exactly what we’ll do.

Cursors

Pointers to the AST are called Cursors in libclang lingo. A Cursor can have a parent and children. It can also have related cursors (like a default value for a parameter, an explicit value to an enum entry, etc).

The ‘entry point’ cursor we will use is the cursor representing the Translation Unit (TU), which is a C++ term meaning a single file including all #included code. To get the TU’s cursor we will use the very descriptive clang_getTranslationUnitCursor(). Now that we have a cursor we can investigate it or iterate using it.

Visit children

Any cursor has a kind, which represents the essence of the cursor. The kind can be one of many, many values of the CXCursorKind enum in Index.h. A few examples are:

  /** \brief A C or C++ struct. */
  CXCursor_StructDecl                    = 2,
  /** \brief A C or C++ union. */
  CXCursor_UnionDecl                     = 3,
  /** \brief A C++ class. */
  CXCursor_ClassDecl                     = 4,
  /** \brief An enumeration. */
  CXCursor_EnumDecl                      = 5,

We can get the kind from a cursor using clang_getCursorKind().

For now let’s visit all children of the TU:

  CXCursor cursor = clang_getTranslationUnitCursor(unit);
  clang_visitChildren(
    cursor,
    [](CXCursor c, CXCursor parent, CXClientData client_data)
    {
      cout << "Cursor kind: " << clang_getCursorKind(c) << endl;
      return CXChildVisit_Recurse;
    },
    nullptr);

The lambda passed as the second parameter is called for every visited cursor. Inside it we always return CXChildVisit_Recurse (the other options are CXChildVisit_Continue and CXChildVisit_Break), because we want to explore everything in our file.

Output:

Cursor kind: 4
Cursor kind: 39
Cursor kind: 6
Cursor kind: 21
Cursor kind: 9
Cursor kind: 21

That’s a bit cryptic, and requires us to skip back and forth to Index.h. Fortunately, there’s a built-in function to convert cursor kind to a string, but first we need to discuss libclang’s strings.

CXString

CXString is libclang’s opaque string type; the characters it holds are owned by libclang. To retrieve an actually useful string (a const char *), one must call clang_getCString(), and then call clang_disposeString() when done to release the CXString.

Since we’re going to do this a lot, let’s create a helper function:

ostream& operator<<(ostream& stream, const CXString& str)
{
  stream << clang_getCString(str);
  clang_disposeString(str);
  return stream;
}

Print meaningful output

Now that we can extract strings, let’s modify our lambda to print something that is actually useful:

  CXCursor cursor = clang_getTranslationUnitCursor(unit);
  clang_visitChildren(
    cursor,
    [](CXCursor c, CXCursor parent, CXClientData client_data)
    {
      cout << "Cursor '" << clang_getCursorSpelling(c) << "' of kind '"
        << clang_getCursorKindSpelling(clang_getCursorKind(c)) << "'\n";
      return CXChildVisit_Recurse;
    },
    nullptr);

Output:

Cursor 'MyClass' of kind 'ClassDecl'
Cursor '' of kind 'CXXAccessSpecifier'
Cursor 'field' of kind 'FieldDecl'
Cursor 'method' of kind 'CXXMethod'
Cursor 'static_field' of kind 'VarDecl'
Cursor 'static_method' of kind 'CXXMethod'

Now, that’s friggin’ neat.

A more complicated example

I was very careful not to #include any header in header.hpp. Why? Well, merely adding #include <string> to header.hpp grows the output to 1.51MB. Ever been pissed at the compiler for taking so long? That’s why. It’s very educational to read such a file, but for everyone’s sake I won’t post it here.

Instead, let’s parse the following file:

enum class Cpp11Enum
{
  RED = 10,
  BLUE = 20
};

struct Wowza
{
  virtual ~Wowza() = default;
  virtual void foo(int i = 0) = 0;
};

struct Badabang : Wowza
{
  void foo(int) override;

  bool operator==(const Badabang& o) const;
};

template <typename T>
void bar(T&& t);

The same program’s output for this file:

Cursor 'Cpp11Enum' of kind 'EnumDecl'
Cursor 'RED' of kind 'EnumConstantDecl'
Cursor '' of kind 'IntegerLiteral'
Cursor 'BLUE' of kind 'EnumConstantDecl'
Cursor '' of kind 'IntegerLiteral'
Cursor 'Wowza' of kind 'StructDecl'
Cursor '~Wowza' of kind 'CXXDestructor'
Cursor 'foo' of kind 'CXXMethod'
Cursor 'i' of kind 'ParmDecl'
Cursor '' of kind 'IntegerLiteral'
Cursor 'Badabang' of kind 'StructDecl'
Cursor 'struct Wowza' of kind 'C++ base class specifier'
Cursor 'struct Wowza' of kind 'TypeRef'
Cursor 'foo' of kind 'CXXMethod'
Cursor '' of kind 'attribute(override)'
Cursor '' of kind 'ParmDecl'
Cursor 'operator==' of kind 'CXXMethod'
Cursor 'o' of kind 'ParmDecl'
Cursor 'struct Badabang' of kind 'TypeRef'
Cursor 'bar' of kind 'FunctionTemplate'
Cursor 'T' of kind 'TemplateTypeParameter'
Cursor 't' of kind 'ParmDecl'
Cursor 'T' of kind 'TypeRef'

Conclusion

libclang is awesome:

It allows checking whether code has been expanded from a macro, and to jump there;
It allows checking the location (file+line+column) for each cursor;
It allows getting a function’s parameter names, types and return type;
It understands templates, autos, lambdas, and, well, everything in C++.

I hope this short post made you curious, and that you’ll also try exploring what this amazing API provides. Please do write a comment below if you have anything you want to add or ask!

Complete Code

For your convenience, here’s the complete code we implemented today:

#include <cstdlib>
#include <iostream>
#include <clang-c/Index.h>
using namespace std;

ostream& operator<<(ostream& stream, const CXString& str)
{
  stream << clang_getCString(str);
  clang_disposeString(str);
  return stream;
}

int main()
{
  CXIndex index = clang_createIndex(0, 0);
  CXTranslationUnit unit = clang_parseTranslationUnit(
    index,
    "header.hpp", nullptr, 0,
    nullptr, 0,
    CXTranslationUnit_None);
  if (unit == nullptr)
  {
    cerr << "Unable to parse translation unit. Quitting." << endl;
    exit(-1);
  }

  CXCursor cursor = clang_getTranslationUnitCursor(unit);
  clang_visitChildren(
    cursor,
    [](CXCursor c, CXCursor parent, CXClientData client_data)
    {
      cout << "Cursor '" << clang_getCursorSpelling(c) << "' of kind '"
        << clang_getCursorKindSpelling(clang_getCursorKind(c)) << "'\n";
      return CXChildVisit_Recurse;
    },
    nullptr);

  clang_disposeTranslationUnit(unit);
  clang_disposeIndex(index);
}

Exploring std::shared_ptr

Sun, 13 Nov 2016 00:00:00 +0000

Today we’ll talk about C++’s built-in smart pointer std::shared_ptr. If you have not yet read my previous post about std::unique_ptr I would highly recommend doing so before continuing.

`std::shared_ptr`

shared_ptr is another C++11 managed pointer. Like unique_ptr, it saves you from calling new and delete (and from generally worrying about forgetting to release, etc).

Unlike unique_ptr, shared_ptr can be shared. This means that multiple instances of shared_ptr<T> pointing to the same instance of T can co-exist. This is achieved via reference counting in a control block that’s shared by all shared_ptrs pointing to the same object. When the last shared_ptr pointing to an instance of T is released, T is released as well.
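The reference count in the control block can be observed through use_count(). The following sketch (TrackUseCount is a made-up name) records it as copies come and go. use_count() is handy for experiments like this, but it shouldn’t drive program logic in multithreaded code, since other threads may change it concurrently.

```cpp
#include <memory>
#include <vector>

// Records shared_ptr::use_count() as copies are created and destroyed.
std::vector<long> TrackUseCount() {
  std::vector<long> counts;
  auto first = std::make_shared<int>(42);
  counts.push_back(first.use_count());    // 1: only 'first'
  {
    auto second = first;                  // copy shares the control block
    counts.push_back(first.use_count());  // 2
    auto third = second;
    counts.push_back(first.use_count());  // 3
  }                                       // 'second' and 'third' destroyed
  counts.push_back(first.use_count());    // back to 1
  return counts;
}
```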

Releasing a shared_ptr can be done in the following ways:

Most commonly: via going out of scope, meaning calling the destructor automatically;
Through assignment of another shared_ptr;
Through calling reset() (more on this later).

Let’s look at an example:

#include <iostream>
#include <memory>

struct Snitch {
public:
  Snitch() { std::cout << "c'tor" << std::endl; }
  ~Snitch() { std::cout << "d'tor" << std::endl; }
  Snitch(Snitch const&) { std::cout << "copy c'tor" << std::endl; }
  Snitch(Snitch&&) { std::cout << "move c'tor" << std::endl; }
};

int main() {
  auto snitch = std::make_shared<Snitch>();
  auto another_snitch = snitch;
  std::cout << "Equal?: " << (snitch == another_snitch) << std::endl;

  {
    auto scoped_snitch = snitch;
    auto another_scoped_snitch = scoped_snitch;
  }  // destroy 'another_scoped_snitch' and 'scoped_snitch'
}  // destroy 'another_snitch' and 'snitch'

Output:

c'tor
Equal?: 1
d'tor

Note that only 1 instance of Snitch is ever created, and that no copy/move constructors are used. All of snitch, another_snitch, scoped_snitch and another_scoped_snitch are equal. Also note that snitch has the type shared_ptr<Snitch>, as this is make_shared()’s return type:

std::is_same<decltype(snitch), std::shared_ptr<Snitch>>::value == true

We’ll go into make_shared soon.

Performance & thread safety

unique_ptr has performance similar to a raw pointer (with compiler optimizations), and also the size of a raw pointer. This is not the case for shared_ptr.
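A quick way to check this on your own toolchain is shown below. These sizes are not mandated by the standard, but they hold on the mainstream implementations (libstdc++, libc++ and the library shipped with Visual Studio) when the default deleter is used.

```cpp
#include <memory>

// unique_ptr with the default (empty) deleter stores just the raw pointer.
constexpr bool kUniquePtrIsOnePointer =
    sizeof(std::unique_ptr<int>) == sizeof(int*);

// shared_ptr stores the object pointer plus a pointer to the control block.
constexpr bool kSharedPtrIsTwoPointers =
    sizeof(std::shared_ptr<int>) == 2 * sizeof(int*);
```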

shared_ptr must hold at least 2 pointers, so it’s bigger than a raw pointer. It also guarantees thread safety for all of its methods as long as each thread has its own copy. Thread safety includes assignment, reference increment / decrement and all other operations. However, it does not mean that locks are acquired prior to calling any of T’s methods - only shared_ptr’s own methods are guaranteed to be thread-safe. In other words, if you want T to be used from multiple threads concurrently you will have to implement thread safety yourself.

As always, thread safety comes at a price – performance. In most cases the cost is probably minimal, as implementations typically use atomic operations, but atomics aren’t guaranteed, and even atomics are not free.
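To make the distinction concrete, here is a sketch (IncrementFromThreads is a made-up name). Each thread gets its own copy of the shared_ptr, which is the case the guarantee covers, while the pointee is a std::atomic<int> because shared_ptr does nothing to synchronize access to T itself. It uses an init-capture, so it needs C++14, and some toolchains need -pthread to link it.

```cpp
#include <atomic>
#include <memory>
#include <thread>
#include <vector>

// Each thread owns its own copy of the shared_ptr (captured by value), so the
// reference count updates are safe. The pointee is atomic because shared_ptr
// does not synchronize access to the object itself.
int IncrementFromThreads(int num_threads) {
  auto counter = std::make_shared<std::atomic<int>>(0);
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([copy = counter] { copy->fetch_add(1); });
  }
  for (std::thread& t : threads) t.join();
  return counter->load();
}
```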

If you’d like to read more on performance of C++ smart pointers vs raw pointers check out this cool post by Davide Coppola.

`std::make_shared()`

Much like make_unique(), make_shared() saves us from using new directly, arguably produces cleaner code, and is exception safe. In addition to all of these, and unlike make_unique(), it also brings a performance advantage.

Performance? Yes, performance. shared_ptr<T> manages a reference count to know when to release T. This is done via a shared control block to which all shared_ptrs point. Therefore, it must be dynamically allocated. And of course there’s also the object itself, T, which needs to be dynamically allocated as well.

So creating a shared_ptr from a pointer returned by new requires 2 dynamic allocations: one for T and one for the control block. However, make_shared() can make a single allocation holding both, and thus save an allocation. Cool, right?

It’s good to know that:

This can’t possibly be done through shared_ptr’s constructor, as the constructor receives a pointer to an object that has already been allocated, and there’s no realloc() equivalent in C++.
You may use std::allocate_shared() if you need a custom allocator.
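One way to observe the single allocation is std::allocate_shared with an instrumented allocator. CountingAllocator, AllocStats and Widget below are hypothetical names, and the allocator merely counts requests and forwards to ::operator new. The standard only says implementations should perform no more than one allocation, but on mainstream implementations allocate_shared (like make_shared) asks for memory exactly once, and that one block is larger than a Widget alone because it also holds the control block.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical statistics filled in by CountingAllocator.
struct AllocStats {
  int calls = 0;
  std::size_t bytes = 0;
};

// Minimal instrumented allocator: records each request, then forwards to
// ::operator new. allocate_shared rebinds it to an internal type.
template <typename T>
struct CountingAllocator {
  using value_type = T;

  explicit CountingAllocator(AllocStats* s) : stats(s) {}
  template <typename U>
  CountingAllocator(const CountingAllocator<U>& other) : stats(other.stats) {}

  T* allocate(std::size_t n) {
    ++stats->calls;
    stats->bytes += n * sizeof(T);
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }

  AllocStats* stats;
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
  return a.stats == b.stats;
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) {
  return !(a == b);
}

struct Widget {
  long payload[4] = {};
};

// Creates (and immediately destroys) a Widget via allocate_shared and
// returns what the allocator observed.
AllocStats StatsForAllocateShared() {
  AllocStats stats;
  {
    auto p = std::allocate_shared<Widget>(CountingAllocator<Widget>(&stats));
  }
  return stats;
}
```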

Construct from `unique_ptr`

shared_ptr has a special constructor that accepts a unique_ptr&&. This is useful when working with factories that return a unique_ptr, but you want to assign the value to a shared_ptr:

#include <memory>

struct MyObject {};

std::unique_ptr<MyObject> CreateMyObject() {
  return std::make_unique<MyObject>();  // make_unique requires C++14
}

int main() {
  std::shared_ptr<MyObject> shared_object = CreateMyObject();
}

If you’re implementing a factory and don’t know if your callers will assign the value to a unique_ptr or a shared_ptr - always return unique_ptr.
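Returning a unique_ptr costs callers who want a shared_ptr nothing: the conversion even carries a custom deleter over. In the sketch below, CountingDeleter, g_deleted, CreateInt and ConvertAndRelease are made-up names used only to observe when the deleter runs.

```cpp
#include <memory>

// Hypothetical counter bumped by the custom deleter.
int g_deleted = 0;

struct CountingDeleter {
  void operator()(int* p) const {
    ++g_deleted;
    delete p;
  }
};

std::unique_ptr<int, CountingDeleter> CreateInt(int value) {
  return std::unique_ptr<int, CountingDeleter>(new int(value));
}

// Converts the factory's unique_ptr into a shared_ptr, lets it die, and
// reports how many times CountingDeleter ran.
int ConvertAndRelease() {
  g_deleted = 0;
  {
    std::shared_ptr<int> shared = CreateInt(7);  // ownership and deleter move over
  }  // 'shared' destroyed: CountingDeleter runs once
  return g_deleted;
}
```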

No `release()` method, `reset()` doesn’t necessarily release

Unlike unique_ptr, shared_ptr does not have a release() method. It wouldn’t make sense to implement such a method: other shared_ptrs may still point to the same instance (and there’s generally no way to know that at compile time), so exclusive ownership can’t be handed to the caller.

On the other hand, reset() exists, but it does not necessarily delete the underlying object. Here’s an example:

#include <iostream>
#include <type_traits>
#include <memory>

struct Snitch {  // Same as above, no changes
public:
  Snitch() { std::cout << "c'tor" << std::endl; }
  ~Snitch() { std::cout << "d'tor" << std::endl; }
  Snitch(Snitch const&) { std::cout << "copy c'tor" << std::endl; }
  Snitch(Snitch&&) { std::cout << "move c'tor" << std::endl; }
};

int main() {
  std::cout << "Creating 1st Snitch" << std::endl;
  auto snitch1 = std::make_shared<Snitch>();
  auto snitch2 = snitch1;

  std::cout << "Calling reset" << std::endl;
  snitch1.reset();  // object will *not* be released

  std::cout << "Moving out of scope" << std::endl;
}

Output:

Creating 1st Snitch
c'tor
Calling reset
Moving out of scope
d'tor

Cyclic references & `std::weak_ptr`

shared_ptrs are almost perfect. Their one imperfection is that they don’t support cycles. Example:

#include <iostream>
#include <type_traits>
#include <memory>

struct Node {  // Binary tree
  Node() { std::cout << "c'tor" << std::endl; }
  ~Node() { std::cout << "d'tor" << std::endl; }

  std::shared_ptr<Node> parent;
  std::shared_ptr<Node> left;
  std::shared_ptr<Node> right;
};

int main() {
  auto root = std::make_shared<Node>();
  root->left = std::make_shared<Node>();
  root->left->parent = root;
}

Output:

c'tor
c'tor

As you can see, no destructor has been called: root owns root->left, and root->left->parent owns root, so neither reference count ever reaches zero and both nodes leak.

weak_ptr was created to allow us to have cycles that won’t leak. A weak_ptr holds a non-owning pointer. Essentially it means that weak_ptr won’t prevent its pointee from being released.

In the above example, simply changing the parent declaration from

  std::shared_ptr<Node> parent;

to:

  std::weak_ptr<Node> parent;

without changing anything else in the code gives us the following output:

c'tor
c'tor
d'tor
d'tor

Magic, right? That’s cool, however there are a few things we should know. The most important is that the object pointed to by the weak_ptr can be released while the weak_ptr is still alive. That’s pretty much the definition of a weak pointer.

Another thing is that weak_ptr has a very basic API, which does not even include a get() method. WAT? Yes. But, of course, it’s not useless. In order to use the object pointed to by a weak_ptr one must upgrade it to a shared_ptr by calling lock() and checking if the returned shared_ptr is empty:

  std::weak_ptr<std::string> weak = // ...
  std::shared_ptr<std::string> shared = weak.lock();
  if (shared) {
    // object exists
  } else {
    // object has been released
  }

Once we have a shared_ptr in our hands, the object can no longer be released until that shared_ptr is gone, so that’s a pretty smart and cool design decision.
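Here is the same pattern as a small self-contained function (ReadOr is a made-up name), together with expired(), which checks whether the object is gone without creating a shared_ptr. In multithreaded code only lock() is reliable for actually using the object, since it may be released right after expired() returns false.

```cpp
#include <memory>
#include <string>

// Returns the pointed-to string if it is still alive, otherwise 'fallback'.
std::string ReadOr(const std::weak_ptr<std::string>& weak,
                   const std::string& fallback) {
  if (std::shared_ptr<std::string> shared = weak.lock()) {
    return *shared;  // 'shared' keeps the object alive inside this branch
  }
  return fallback;   // the object has already been released
}
```

Usage: after `owner.reset()` on the last owning shared_ptr, `weak.expired()` becomes true and `ReadOr` returns the fallback.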

One caveat of using weak_ptr is that while the object is destroyed when the last shared_ptr is released, the control block remains alive until the last weak_ptr is released. If the control block and object are allocated together (see make_shared above), this means that a weak_ptr keeps the whole memory block alive, even though the object itself has been destroyed.
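The following sketch (Tracked is a made-up type whose destructor sets a flag) shows the first half of that caveat: the object is destroyed as soon as the last shared_ptr goes away, even though a weak_ptr is still around. The memory that stays reserved isn’t observable from portable code, so it only appears as a comment.

```cpp
#include <memory>

// Hypothetical type that records its own destruction in a flag.
struct Tracked {
  explicit Tracked(bool* f) : flag(f) {}
  ~Tracked() { *flag = true; }
  bool* flag;
};

bool DestroyedWhileWeakPtrAlive() {
  bool destroyed = false;
  std::weak_ptr<Tracked> weak;
  {
    auto strong = std::make_shared<Tracked>(&destroyed);
    weak = strong;
  }  // Last shared_ptr is gone: ~Tracked runs here.
  // 'weak' still keeps the control block (and, with make_shared, the
  // combined memory block) alive, but the Tracked object is already destroyed.
  return destroyed && weak.expired();
}
```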

Control block

As previously mentioned, shared_ptrs are managed via a control block. These control blocks are up to the implementations to define, however they generally contain the following:

The managed object (either a pointer, or the object itself if created via make_shared());
Reference counts (one for shared_ptrs and one for weak_ptrs);
Deleter (and allocator, if one was provided).

The control block is always accessed in a thread-safe way, either via atomics or a mutex.

Earlier I wrote that a shared_ptr has the size of 2 pointers, while here I describe the control block as pointing to (or containing) the object. So what does shared_ptr need to point to, other than the control block? Read the next section to find out :)

Point to `A`, manage `B`

This may sound somewhat bizarre at first, so bear with me. Say you have an internal object, B. This B has a few fields, but one of them, A, is exposed externally. One possible such scenario is where B is a channel to a database including the internal socket etc, and A is the API object on which a user acts. Usually you would have these as private members, or hide them behind an interface. But, again, bear with me – suppose you have a good reason.

Now, you want to manage B in a shared way, but you only want to give users A. What do you do? It turns out shared_ptr supports this via a feature called aliasing.

With aliasing one can create a shared_ptr from another shared_ptr, so that their control blocks are the same, but have the get() method return any arbitrary pointer, even one that has nothing to do with them.

Here’s an example:

#include <memory>

struct DatabaseConnection {}; // exposed to the user

struct InternalDatabaseConnection {
  // socket
  // authentication information
  DatabaseConnection connection;
};

std::shared_ptr<DatabaseConnection> CreateDatabaseConnection() {
  auto tmp = std::make_shared<InternalDatabaseConnection>();
  return std::shared_ptr<DatabaseConnection>(tmp, &tmp->connection);
}

Note that delete will never be called on &tmp->connection which is a DatabaseConnection, but rather only on InternalDatabaseConnection allocated by make_shared.
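A small check of what aliasing does, using made-up Car and Engine types: the aliased pointer’s get() returns the member, yet it shares the owner’s control block, so it keeps the whole Car alive after the original pointer is reset.

```cpp
#include <memory>

struct Engine {
  int rpm = 0;
};

struct Car {
  Engine engine;
};

// Aliasing constructor: shares ownership with 'car', but points at its member.
std::shared_ptr<Engine> EngineOf(const std::shared_ptr<Car>& car) {
  return std::shared_ptr<Engine>(car, &car->engine);
}
```

After `auto engine = EngineOf(car);`, both pointers report a use_count() of 2, and resetting `car` leaves `engine` as the sole owner of the Car.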

Casting

Like unique_ptr, shared_ptr also supports automatic cast from shared_ptr<T> to shared_ptr<U> if T* is convertible to U*.

Unlike unique_ptr, shared_ptr always uses the deleter it was constructed with, so it destroys the object as its original type, even after a conversion to a base class that has no virtual destructor. Example:

#include <iostream>
#include <memory>

struct Base { ~Base() { std::cout << "non-virtual ~Base()" << std::endl; } };
struct Derived : Base { ~Derived() { std::cout << "~Derived()" << std::endl; } };

int main() {
	std::shared_ptr<Base> base = std::make_shared<Derived>();
}

Output:

~Derived()
non-virtual ~Base()

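If we were to replace shared_ptr with unique_ptr (and make_shared with make_unique), the Derived would be deleted through a Base* without a virtual destructor, which is undefined behavior; in practice ~Derived is typically not called.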

In addition to that, there are 4 utility functions to allow creating a shared_ptr when implicit conversion doesn’t happen:

std::static_pointer_cast
std::dynamic_pointer_cast
std::const_pointer_cast
std::reinterpret_pointer_cast

Let’s look at an example:

auto derived = std::make_shared<Derived>();
std::shared_ptr<Base> base = derived;  // OK.
//std::shared_ptr<Derived> derived2 = base;  // ERROR: no implicit down-cast.
std::shared_ptr<Derived> derived2 = std::static_pointer_cast<Derived>(base);
std::shared_ptr<Derived> derived3 = std::dynamic_pointer_cast<Derived>(base);  // Requires a polymorphic Base (e.g. a virtual destructor).

Note that T* needs to be convertible to U*, which is different from T being convertible to U:

auto shared_short = std::make_shared<short>(123);
//std::shared_ptr<int> shared_int = shared_short;  // ERROR: no cast from short* to int*.
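dynamic_pointer_cast is the one to reach for when the cast may fail at runtime: like dynamic_cast it requires a polymorphic base, and it returns an empty shared_ptr when the object isn’t of the requested type. A sketch with made-up Shape types:

```cpp
#include <memory>

struct Shape { virtual ~Shape() = default; };  // polymorphic base
struct Circle : Shape { double radius = 1.0; };
struct Square : Shape {};

// Returns the circle's radius, or -1 if 'shape' does not point to a Circle.
double RadiusOrMinusOne(const std::shared_ptr<Shape>& shape) {
  if (std::shared_ptr<Circle> circle = std::dynamic_pointer_cast<Circle>(shape)) {
    return circle->radius;  // 'circle' shares ownership with 'shape'
  }
  return -1.0;
}
```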

`std::enable_shared_from_this`

This is somewhat odd and very specific, so one last time I need you to bear with me:

Suppose you’re implementing a class WeirdClass. And suppose you know that this class will be managed by shared_ptr. And suppose that for some reason you would like to return a shared_ptr to yourself (yourself being an instance of WeirdClass). How would you do that? Let’s consider the following:

#include <iostream>
#include <memory>

struct WeirdClass {
  std::shared_ptr<WeirdClass> CreateSharedPtrToThis() {
    return std::shared_ptr<WeirdClass>(this);  // DON'T DO THIS.
  }
};

int main() {
  auto weird_class = std::make_shared<WeirdClass>();
  auto tmp = weird_class->CreateSharedPtrToThis();
}  // ERROR: double delete

This kind of error can generally be avoided by calling std::make_shared instead of calling shared_ptr’s constructor directly.

However, in this specific case the object is already allocated - we merely want to copy a shared_ptr that already exists, but is unknown in the context of WeirdClass. What do we do? This is exactly why std::enable_shared_from_this was invented:

#include <iostream>
#include <memory>

struct WeirdClass : std::enable_shared_from_this<WeirdClass> {
  std::shared_ptr<WeirdClass> CreateSharedPtrToThis() {
    return shared_from_this();
  }
};

int main() {
  auto weird_class = std::make_shared<WeirdClass>();
  auto tmp = weird_class->CreateSharedPtrToThis();
}  // no problem!

But please, don’t take this as a reason to use enable_shared_from_this. Some features are best not used :)
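One more thing worth knowing if you do use it: shared_from_this() only works when the object is already owned by a shared_ptr. Since C++17, calling it otherwise throws std::bad_weak_ptr (before C++17 it was undefined behavior). A sketch assuming C++17, with a made-up Session class:

```cpp
#include <memory>

struct Session : std::enable_shared_from_this<Session> {
  std::shared_ptr<Session> Self() { return shared_from_this(); }
};

// Returns true if calling Self() on an object that no shared_ptr owns throws.
bool ThrowsWhenNotOwnedBySharedPtr() {
  Session on_stack;
  try {
    on_stack.Self();
  } catch (const std::bad_weak_ptr&) {
    return true;
  }
  return false;
}
```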

That’s it for today

I hope you found this post useful. Please let me know if I missed anything, made an error somewhere, or if you have any questions!

Exploring std::unique_ptr

Sat, 12 Nov 2016 00:00:00 +0000

Today we’ll talk about C++’s built-in smart pointer std::unique_ptr, which is an extremely powerful, simple & common tool.

`std::unique_ptr`

C++98 had std::auto_ptr. It’s problematic in areas I do not wish to discuss here, and so it was deprecated in C++11 (and removed in C++17) and replaced by the awesome unique_ptr.

unique_ptr is a simple ‘smart’ pointer: it holds an instance of an object and deletes it when it goes out of scope. In terms of lifetime behavior it’s much like any regular C++ object with a constructor and destructor, only with dynamic memory allocation. No reference counting, no fancy tricks.

And that’s the beauty. Simple, elegant & efficient, yet extremely powerful. With unique_ptr you no longer have to worry about new and delete. Simply call make_unique() (instead of new), and the destructor will call delete automatically. It’s as simple as that, and covers 90% of use-cases that require dynamic memory.

`std::make_unique()`

Above I mentioned make_unique(). This is a standard function that was added in C++14. You may think that it’s supposed to save us from typing the type we’re interested in, similar to make_pair. Well, not quite. Let’s look at make_unique()’s signature:

// inside namespace std
template <typename T, typename ... Args>
std::unique_ptr<T> make_unique(Args&&... args);
// '&&' here means forwarding references, not necessarily rvalues. I hope to
// explain what these are in a future post.

Note that T is not an argument to the function, thus it can’t possibly be deduced by the compiler. So to use make_unique() one must always provide T explicitly, like so:

auto u_int = std::make_unique<int>(123);
cout << (*u_int == 123) << endl;
auto u_string = std::make_unique<std::string>(3, '#');
cout << (*u_string == "###") << endl;

Output:

1
1

So essentially all make_unique() does is call new and pass args as arguments to T’s constructor. As a matter of fact, here is a complete implementation for the non-array case:

template <typename T, typename ... Args>
std::unique_ptr<T> make_unique(Args&& ... args) {
  return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
  // If you don't know what std::forward does, simply read it as:
  // return std::unique_ptr<T>(new T(args...));
}

That’s it. If so, why do we even need it? What’s the difference between the following?:

auto a = std::make_unique<MyClass>();
auto b = std::unique_ptr<MyClass>(new MyClass());
std::unique_ptr<MyClass> c(new MyClass());

The first difference is exception safety. Before C++17, the following call can lead to a memory leak:

MyFunction(std::unique_ptr<MyClass>(new MyClass()),
           std::unique_ptr<MyClass>(new MyClass()));

How? Well, before C++17 the compiler was allowed to interleave the evaluation of different function arguments, so in theory it could evaluate the first new MyClass(), then the second new MyClass(), and only then unique_ptr’s constructors. Now, if the first call to new succeeds and the second throws an exception (for example from MyClass’s constructor), memory leaks, as no unique_ptr owns the first object yet.

But it’s not just exception safety. Look at the lines above, comparing the initialization of a, b and c. I think that a is the cleanest and least verbose. Furthermore, it’s the only one that doesn’t repeat MyClass twice.

And one last thing - I also prefer using make_unique() for assignment, not only for construction:

auto a = std::make_unique<int>(123);
a = std::make_unique<int>(456);  // Cool.
a.reset(new int(789));  // Works, but not as nice.
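To see the exception-safety argument in action, here is a sketch with a made-up Counted type that tracks live instances and can be told to throw from its constructor (Consume and LiveAfterFailedCall are made up too). Each argument is owned by a unique_ptr as soon as its make_unique() call returns, so whichever order the compiler evaluates the arguments in, nothing is left alive after the exception.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical type that tracks live instances and can throw on construction.
struct Counted {
  static int live;
  explicit Counted(bool fail) {
    if (fail) throw std::runtime_error("constructor failed");
    ++live;
  }
  ~Counted() { --live; }
};
int Counted::live = 0;

void Consume(std::unique_ptr<Counted>, std::unique_ptr<Counted>) {}

// Returns how many Counted objects are still alive after the failed call.
int LiveAfterFailedCall() {
  try {
    Consume(std::make_unique<Counted>(false), std::make_unique<Counted>(true));
  } catch (const std::runtime_error&) {
  }
  return Counted::live;
}
```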

How to use `unique_ptr`

Use unique_ptr to represent any owned object that’s not shared. Here are some common examples:

Return a dynamically allocated object from a function

std::unique_ptr<MyObject> CreateMyObject();

This way whoever calls your function won’t leak. Even if they don’t assign the returned value to a variable:

SomeFunction();
CreateMyObject();  // no assignment, yet no leak
SomeOtherFunction();

Take ownership of a dynamically-allocated object

void TakeOwnership(std::unique_ptr<int> obj);

With such signature callers can’t ignore the fact that you’re taking ownership of obj:

TakeOwnership(3);  // ERROR: no conversion from 'int' to 'std::unique_ptr<int>'

int* p = new int(3);
TakeOwnership(p);  // ERROR: no conversion from 'int*' to 'std::unique_ptr<int>'
                   // This is because unique_ptr's constructor is explicit.

auto u = std::make_unique<int>(3);
TakeOwnership(u);  // ERROR: no copy constructor

TakeOwnership(std::move(u));  // OK
TakeOwnership(std::make_unique<int>(3));  // OK
TakeOwnership(nullptr);  // OK -- due to constructor accepting nullptr_t

As you may have noticed, unique_ptr supports C++11’s move semantics, but does not allow copying. This makes sense – as the name suggests, each unique_ptr is supposed to be the sole owner of its object.
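A runnable version of the valid calls above. The body of TakeOwnership is made up for illustration, and it takes a unique_ptr<int> to match those calls. It also shows what std::move leaves behind: a moved-from unique_ptr is guaranteed to be null.

```cpp
#include <memory>

// Takes ownership of 'obj' and returns the value it points to (or -1 if empty).
// The int is deleted when 'obj' goes out of scope at the end of the call.
int TakeOwnership(std::unique_ptr<int> obj) {
  return obj ? *obj : -1;
}
```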

Dynamically-allocated class members

Use unique_ptr to automatically release class members when a class is released:

class SomeObject {
  // No destructor needed

private:
  std::unique_ptr<SomethingElse> m_SomethingElse;
};
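A runnable sketch of the class above, where SomethingElse is a made-up type that counts live instances. It also shows a side effect worth knowing: because unique_ptr can’t be copied, a class holding one is move-only unless you write a copy constructor yourself.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical member type that counts live instances so we can observe cleanup.
struct SomethingElse {
  static int live;
  SomethingElse() { ++live; }
  ~SomethingElse() { --live; }
};
int SomethingElse::live = 0;

class SomeObject {
 public:
  SomeObject() : m_SomethingElse(std::make_unique<SomethingElse>()) {}
  // No destructor needed: ~unique_ptr deletes the member.

 private:
  std::unique_ptr<SomethingElse> m_SomethingElse;
};

// The unique_ptr member makes SomeObject move-only by default.
static_assert(!std::is_copy_constructible<SomeObject>::value, "copy is deleted");
static_assert(std::is_move_constructible<SomeObject>::value, "move still works");
```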

Casting

unique_ptr supports construction of unique_ptr<U> from unique_ptr<T> if T* is convertible to U* (which usually means up-casting). Example:

struct Base { virtual ~Base() = default; };
struct Derived : Base {};

// ...
std::unique_ptr<Derived> derived = std::make_unique<Derived>();
std::unique_ptr<Base> base(std::move(derived));

Note that we must call std::move() on derived – that’s because there can’t be 2 instances of unique_ptr pointing to the same object, even if they are of different types.

Custom deleter

unique_ptr allows you to specify a custom deleter that will be used for releasing the object. To use it, however, one must provide a second template argument. For example:

FILE* file = fopen("...", "r");
auto FILE_releaser = [](FILE* f) { fclose(f); };
std::unique_ptr<FILE, decltype(FILE_releaser)> file_ptr(file, FILE_releaser);

As demonstrated above, this can be very useful when working with C APIs, or APIs which have custom release logic.

Notes:

file_ptr above is not compatible with std::unique_ptr<FILE>, as their 2nd template arguments differ. Move-assignment between them, for example, will not compile.
file_ptr still has the size of a single pointer on mainstream implementations. However, if FILE_releaser captured variables, file_ptr’s size would increase as well.
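To check the size note on your toolchain, compare a stateless deleter type with a function-pointer deleter. FileCloser, the two aliases and WriteToTempFile are made-up names. The exact sizes aren’t mandated by the standard, but mainstream implementations store an empty deleter for free, whereas a function pointer has to be stored alongside the FILE*.

```cpp
#include <cstdio>
#include <memory>

// Stateless deleter: it has no data members.
struct FileCloser {
  void operator()(std::FILE* f) const { std::fclose(f); }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Function-pointer deleter: the pointer value must be stored in each instance.
using FilePtrWithFnPtr = std::unique_ptr<std::FILE, int (*)(std::FILE*)>;

// Usage: the temporary file is closed automatically when 'file' is destroyed.
bool WriteToTempFile(const char* text) {
  FilePtr file(std::tmpfile());
  if (!file) return false;  // unique_ptr never calls the deleter on nullptr
  return std::fputs(text, file.get()) >= 0;
}
```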

Misusing `unique_ptr`

If you try hard, you could do some nasty things with unique_ptr. But it takes some effort. Here are a few examples:

Assigning the same pointer to multiple `unique_ptr`s

int* p = new int(123);
std::unique_ptr<int> a(p);
std::unique_ptr<int> b(p);  // Oops - b's destructor will double delete p

The above example can be avoided by always using make_unique() instead of calling new directly.

Here’s a similar example, but done without calling new directly:

std::unique_ptr<int> a = std::make_unique<int>(123);
std::unique_ptr<int> b(a.get());

Using unique_ptr’s constructor directly is not recommended, and passing the pointer returned by .get() to a function that is taking ownership is an error.

Deleting memory managed by `unique_ptr`s

auto u = std::make_unique<int>(123);
delete u.get();

This is an example of why you:

Don’t want to call delete directly;
Are not supposed to mess with unique_ptr’s memory

A word about arrays

unique_ptr also has a partial specialization for arrays, which calls delete[] rather than delete. However, using C-style arrays is something that should generally be avoided. Prefer std::array or std::vector where possible.
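If you do need a dynamically sized C-style array, the array form still beats new[]: make_unique<T[]>(n) (also C++14) value-initializes n elements, and the resulting unique_ptr<T[]> provides operator[] and releases the memory with delete[]. A made-up example:

```cpp
#include <cstddef>
#include <memory>

// Fills a freshly allocated array with 0, 1, ..., n-1 and returns their sum.
int FillAndSum(std::size_t n) {
  std::unique_ptr<int[]> values = std::make_unique<int[]>(n);  // all zeros
  for (std::size_t i = 0; i < n; ++i) values[i] = static_cast<int>(i);
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) sum += values[i];
  return sum;
}  // delete[] is called here, when 'values' goes out of scope
```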

That’s it for today

In the next post we’ll talk about std::shared_ptr – unique_ptr’s brother which is very interesting, however less frequently used.

Exploring std::string

Mon, 12 Sep 2016 00:00:00 +0000

Every C++ developer knows that std::string represents a sequence of characters in memory. It manages its own memory, and is very intuitive to use. Today we’ll explore std::string as defined by the C++ Standard, and also by looking at 4 major implementations.

Quick note: in this post I refer to implementations by compiler name. This is technically incorrect, as a Standard library is not necessarily tied to a specific compiler. So when I say, for example, ‘GCC’ I really mean ‘GCC with its default Standard library’, which is libstdc++. For clang the library is libc++. For Microsoft Visual Studio it’s the library that ships with it, which is implemented by Dinkumware (I think).

Size

We’ll start with the most basic thing - what’s the sizeof of a std::string (not including any dynamic allocation)?

A naive implementation would require 3 fields, each the size of a pointer:

Pointer to the allocated memory;
Logical size of string;
Size of allocated memory (which must be bigger than or equal to logical size).
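As a made-up struct (not any real library’s layout), that naive design looks like this; its size is three pointer-sized words, i.e. 24 bytes on a typical 64-bit platform.

```cpp
#include <cstddef>

// Hypothetical naive string layout, for illustration only.
struct NaiveString {
  char* data;            // pointer to the allocated memory
  std::size_t size;      // logical size of the string
  std::size_t capacity;  // size of the allocated memory (>= size)
};
```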

On a 64-bit machine the above would consume 24 bytes. Let’s examine the output of the following code on different compilers, all using 64-bit architecture:

#include <string>
#include <iostream>

int main() {
    std::cout << sizeof(std::string) << std::endl;
}

Visual Studio 14:

40

GCC < 5:

$ g++ -std=c++11 main.cpp && ./a.out
8

GCC >= 5:

$ g++-5 -std=c++11 main.cpp && ./a.out
32

clang:

$ clang++ -std=c++11 -stdlib=libc++ main.cpp && ./a.out
24

Keep these very different numbers in mind, and let’s continue exploring.

Small Object Optimization

One particular optimization found its way to pretty much all implementations: small object optimization (aka small buffer optimization, or short string optimization when applied to strings). Simply put, Small Object Optimization means that the std::string object contains a small buffer for short strings, which saves dynamic allocations.

One could simply add a buffer on top of the existing fields of std::string. However, there are some smart tricks to make better use of the space those fields already occupy.
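The straightforward approach, a separate buffer next to the usual fields, might look like the sketch below. TinyString and its 15-character inline capacity are made up for illustration. Mainstream implementations instead overlap the buffer with other fields, so they don’t pay for both.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical string with an inline buffer added on top of the usual fields.
class TinyString {
 public:
  explicit TinyString(const char* s) : size_(std::strlen(s)) {
    // Short strings go into the inline buffer: no heap allocation.
    // Longer strings get a heap buffer, just like without the optimization.
    data_ = size_ <= kInlineCapacity ? inline_ : new char[size_ + 1];
    std::memcpy(data_, s, size_ + 1);  // includes the terminating '\0'
  }
  ~TinyString() {
    if (!is_inline()) delete[] data_;
  }
  // Copying would need care (data_ may point into this object), so forbid it.
  TinyString(const TinyString&) = delete;
  TinyString& operator=(const TinyString&) = delete;

  bool is_inline() const { return data_ == inline_; }
  const char* c_str() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  static constexpr std::size_t kInlineCapacity = 15;

  char* data_;
  std::size_t size_;
  char inline_[kInlineCapacity + 1];
};
```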

#include <cstdlib>
#include <iostream>
#include <new>
#include <string>

// replace operator new and delete to log allocations
void* operator new(std::size_t n) {
    std::cout << "[Allocating " << n << " bytes]";
    if (void* p = std::malloc(n)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept {
    std::free(p);
}

int main() {
    for (size_t i = 0; i < 24; ++i) {
        std::cout << i << ": " << std::string(i, '=') << std::endl;
    }
}

Visual Studio 14:

0:
1: =
2: ==
3: ===
4: ====
5: =====
6: ======
7: =======
8: ========
9: =========
10: ==========
11: ===========
12: ============
13: =============
14: ==============
15: ===============
[Allocating 32 bytes]16: ================
[Allocating 32 bytes]17: =================
[Allocating 32 bytes]18: ==================
[Allocating 32 bytes]19: ===================
[Allocating 32 bytes]20: ====================
[Allocating 32 bytes]21: =====================
[Allocating 32 bytes]22: ======================
[Allocating 32 bytes]23: =======================

With a size of 40 bytes (5 pointers), it appears that 16 bytes are dedicated to holding small strings (15 characters plus the '\0'). As we will see below, this is a somewhat wasteful implementation, and more juice can be squeezed from even smaller structs.

GCC < 5:

0: 
1: [Allocating 26 bytes]=
2: [Allocating 27 bytes]==
3: [Allocating 28 bytes]===
4: [Allocating 29 bytes]====
5: [Allocating 30 bytes]=====
6: [Allocating 31 bytes]======
7: [Allocating 32 bytes]=======
8: [Allocating 33 bytes]========
9: [Allocating 34 bytes]=========
10: [Allocating 35 bytes]==========
11: [Allocating 36 bytes]===========
12: [Allocating 37 bytes]============
13: [Allocating 38 bytes]=============
14: [Allocating 39 bytes]==============
15: [Allocating 40 bytes]===============
16: [Allocating 41 bytes]================
17: [Allocating 42 bytes]=================
18: [Allocating 43 bytes]==================
19: [Allocating 44 bytes]===================
20: [Allocating 45 bytes]====================
21: [Allocating 46 bytes]=====================
22: [Allocating 47 bytes]======================
23: [Allocating 48 bytes]=======================

As we will see soon, older libstdc++ implements copy-on-write, so it makes sense for it not to use small object optimization.

GCC >= 5:

$ g++-5 -std=c++11 main.cpp && ./a.out
0:
1: =
2: ==
3: ===
4: ====
5: =====
6: ======
7: =======
8: ========
9: =========
10: ==========
11: ===========
12: ============
13: =============
14: ==============
15: ===============
[Allocating 17 bytes]16: ================
[Allocating 18 bytes]17: =================
[Allocating 19 bytes]18: ==================
[Allocating 20 bytes]19: ===================
[Allocating 21 bytes]20: ====================
[Allocating 22 bytes]21: =====================
[Allocating 23 bytes]22: ======================
[Allocating 24 bytes]23: =======================

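Recent GCC versions store small strings in a union of a 16-byte buffer and the 8-byte capacity field. A short string's capacity is fixed (15 characters), so the capacity field is only needed once the string lives on the heap; it does have to exist then, since reserve() is binding (more on this later). The internal pointer to the beginning of the string points either to this union or to the dynamically allocated string.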

clang:

0: 
1: =
2: ==
3: ===
4: ====
5: =====
6: ======
7: =======
8: ========
9: =========
10: ==========
11: ===========
12: ============
13: =============
14: ==============
15: ===============
16: ================
17: =================
18: ==================
19: ===================
20: ====================
21: =====================
22: ======================
23: [Allocating 32 bytes]=======================

clang is by far the smartest and coolest. While its std::string is only 24 bytes, it allows strings of up to 22 characters(!!) with no allocation. To achieve this, libc++ uses a neat trick: the size of a short string is not stored as-is. If the string is short (at most 22 characters), it stores size() * 2, so the least significant bit is always 0. The long form always bitwise-ors the LSB with 1, which in theory might have meant unnecessarily larger allocations, but this implementation always rounds capacities to the form 16*n - 1 (where n is an integer), so that bit is 1 anyway. By the way, the allocation itself is of the form 16*n, the last character being '\0'. So freaking cool :).

Copy on Write

In the past it used to be valid (and some might even say encouraged) to implement std::string as a copy-on-write (reference-counted) object. The Standard had careful wording to allow the first call to a non-const method (like operator[], begin(), at()) to invalidate existing iterators.

As multi-threaded computing became more popular, it turned out that copy-on-write is slower than naive implementations, due to the synchronization (atomic reference counting) needed in almost every non-const operation. C++11 changed the wording, making copy-on-write non-compliant.

GCC < 5 remained non-compliant for many years after the Standard changed, as its maintainers didn't want to introduce an ABI change in such a critical component. Microsoft's Visual Studio STL, on the other hand, hasn't used copy-on-write for as long as I can remember.

In the following piece of code I create a std::string containing 50 copies of the character 'c', then change the first character without the std::string being aware of it (this is undefined behavior, so of course don't use it in production code!). I use 50 to make sure small object optimization doesn't kick in.

#include <string>
#include <iostream>

int main() {
  std::string a(50, 'c');
  std::string b = a;
  
  *const_cast<char*>(a.c_str()) = 'A';
  std::cout << "a: " << a << "\nb: " << b << std::endl;
}

Visual Studio 14:

a: Accccccccccccccccccccccccccccccccccccccccccccccccc
b: cccccccccccccccccccccccccccccccccccccccccccccccccc

GCC < 5:

$ g++ -std=c++11 main.cpp && ./a.out
a: Accccccccccccccccccccccccccccccccccccccccccccccccc
b: Accccccccccccccccccccccccccccccccccccccccccccccccc

Aha! Copy-on-write in action.

GCC >= 5:

$ g++-5 -std=c++11 main.cpp && ./a.out
a: Accccccccccccccccccccccccccccccccccccccccccccccccc
b: cccccccccccccccccccccccccccccccccccccccccccccccccc

clang:

$ clang++ -std=c++11 -stdlib=libc++ main.cpp && ./a.out
a: Accccccccccccccccccccccccccccccccccccccccccccccccc
b: cccccccccccccccccccccccccccccccccccccccccccccccccc

Allocate on `.reserve()`?

As you may know, std::string has a .size() method returning the size of the string it contains. This is not necessarily the allocated size (returned by .capacity()), which might be bigger but can't be smaller.

One might wonder if there are any guarantees on capacity() and reserve(). In other words: can we implement a Standard-compliant std::string with 2 members (pointer and size)? The answer to this question is a straight no. To quote the Standard:

After reserve(), capacity() is greater or equal to the argument of reserve.

So calling reserve() with a number bigger than capacity() forces std::string to grow. By the way, before C++20, calling reserve() with a number smaller than capacity() was a non-binding shrink request, similar to calling shrink_to_fit(); since C++20, reserve() never shrinks.

`.c_str()` and `.data()` are the same

Calling .data() in C++98 would not necessarily return a null-terminated string. C++11 changed this, and now both data() and c_str() must return a pointer to a null-terminated string (in fact, the very same pointer).

Another thing C++11 brings is the ability to dereference the pointer returned from c_str()/data() even if empty() == true. No longer is it undefined to do for (const char* c = s.data(); *c != 0; ++c) ..., as now c_str()/data() will return a pointer to '\0'.

Binary data

std::string can hold binary data, that is, a string with '\0' in any position. Note that it's not entirely trivial to create such a string. For example, the following creates a string with the content "hello", by design, as we're passing in a C-string:

std::string s = "hello\0world";

Even doing something like the following will fail for the exact same reason:

auto cstr = "hello\0world";
std::string s(cstr, strlen(cstr));

The following, however, will work:

// We could use const char[] or decltype(auto) here, but not auto. Why?
// Because auto decays the array to const char*, whose sizeof is a pointer's.
decltype(auto) cstr = "Hello\0World!";
// sizeof also counts the array's terminating '\0', so subtract 1.
std::string s(cstr, sizeof(cstr) - 1);

Also note that printing such strings may be problematic, as some implementations and APIs stop at the first '\0'.

Growth strategy

Like std::vector, it’s important for std::string to grow in an efficient way. Let’s examine the following:

#include <cstdlib>
#include <iostream>
#include <new>
#include <string>

// replace operator new and delete to log allocations
void* operator new(std::size_t n) {
    std::cout << "Allocating " << n << " bytes" << std::endl;
    if (void* p = std::malloc(n ? n : 1)) {  // operator new must not return null
        return p;
    }
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept {
    std::free(p);
}

int main() {
    std::string s;
    for (size_t i = 0; i != 1000000; ++i) {
        s += '.';
    }
}

Visual Studio 14:

Allocating 32 bytes
Allocating 48 bytes
Allocating 71 bytes
Allocating 106 bytes
Allocating 158 bytes
Allocating 236 bytes
Allocating 353 bytes
Allocating 529 bytes
Allocating 793 bytes
Allocating 1189 bytes
Allocating 1783 bytes
Allocating 2674 bytes
Allocating 4010 bytes
Allocating 6053 bytes
Allocating 9059 bytes
Allocating 13568 bytes
Allocating 20332 bytes
Allocating 30478 bytes
Allocating 45697 bytes
Allocating 68525 bytes
Allocating 102767 bytes
Allocating 154130 bytes
Allocating 231175 bytes
Allocating 346742 bytes
Allocating 520093 bytes
Allocating 780119 bytes
Allocating 1170158 bytes

Growth strategy is 1.5x.

GCC < 5:

$ g++ -std=c++11 main.cpp && ./a.out
Allocating 26 bytes
Allocating 27 bytes
Allocating 29 bytes
Allocating 33 bytes
Allocating 41 bytes
Allocating 57 bytes
Allocating 89 bytes
Allocating 153 bytes
Allocating 281 bytes
Allocating 537 bytes
Allocating 1049 bytes
Allocating 2073 bytes
Allocating 8160 bytes
Allocating 16352 bytes
Allocating 32736 bytes
Allocating 65504 bytes
Allocating 131040 bytes
Allocating 262112 bytes
Allocating 524256 bytes
Allocating 1048544 bytes

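Growth strategy is a bit tricky. On my machine the page size is 4096, and so for strings larger than that it rounds the allocation up to fit a page and grows 2x. For small strings, the capacity doubles on each growth (1, 2, 4, 8, ... characters), so the allocation sizes above grow by powers of 2 on top of a fixed overhead (thanks Marek!).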

GCC >= 5:

$ g++-5 -std=c++11 main.cpp && ./a.out
Allocating 31 bytes
Allocating 61 bytes
Allocating 121 bytes
Allocating 241 bytes
Allocating 481 bytes
Allocating 961 bytes
Allocating 1921 bytes
Allocating 3841 bytes
Allocating 7681 bytes
Allocating 15361 bytes
Allocating 30721 bytes
Allocating 61441 bytes
Allocating 122881 bytes
Allocating 245761 bytes
Allocating 491521 bytes
Allocating 983041 bytes
Allocating 1966081 bytes

Growth strategy is 2x.

clang:

$ clang++ -std=c++11 -stdlib=libc++ main.cpp && ./a.out
Allocating 48 bytes
Allocating 96 bytes
Allocating 192 bytes
Allocating 384 bytes
Allocating 768 bytes
Allocating 1536 bytes
Allocating 3072 bytes
Allocating 6144 bytes
Allocating 12288 bytes
Allocating 24576 bytes
Allocating 49152 bytes
Allocating 98304 bytes
Allocating 196608 bytes
Allocating 393216 bytes
Allocating 786432 bytes
Allocating 1572864 bytes

Growth strategy is 2x.

Move semantics

Taking advantage of C++11's move semantics benefits every library implementation. Apache's retired stdcxx is the only one I know of that doesn't support them.

One interesting thing to note is that with small strings, where Small Object Optimization exists, the move constructor effectively does what a copy constructor does. The difference is purely semantic, since the object's fields (including the small buffer) need to be copied anyway, but it is nevertheless interesting to think about.

Memory

Since C++11, the Standard requires implementations to store the entire string in contiguous memory, which is convenient when working with C APIs.

Summary

The C++ standard library's std::string is a well-designed class with some great implementations. Today we looked at some important features and hopefully learned a thing or two. Please let me know if there are more things you're interested in, either via a comment below or through the 'About' page.

Template SFINAE & type-traits

Sat, 21 May 2016 00:00:00 +0000

Today’s post is about template SFINAE & type-traits - cool C++ features with great compile-time power.

Template SFINAE

SFINAE is a scary-looking C++ acronym, which joins a long list of hard-to-remember capital-letter concepts (such as RAII, RVO, RTTI, PIMPL, etc.).

SFINAE stands for "Substitution Failure Is Not An Error". Simply put, it means that when the compiler fails to substitute a template parameter, it discards that candidate and continues looking instead of giving up.

Here’s a quick example:

#include <iostream>
using namespace std;

template <typename T> void foo(typename T::type) { cout << "1st" << endl; }
template <typename T> void foo(T) { cout << "2nd" << endl; }

struct MyStruct {
    using type = int;
};

int main() {
    foo<MyStruct>(2);  // ok - calls first version
    foo<int>(2);       // also ok - calls second version
}

Output:

1st
2nd

One important thing to note is that SFINAE, as its name suggests, only happens during substitution, that is, in the function's declaration (its parameters, return type and template parameters). Essentially it means that, as above, an error during function matching (aka overload resolution) is OK, and the compiler will continue on its journey to find the proper function. However, a failure inside a function's body yields an unrecoverable compile error, after which the compiler will not continue to search for other functions.

Here's an example. If you don't know what forwarding references are, you can safely ignore the second version of zoo(); all you need to know is that both functions can match any argument passed.

#include <functional>

template <typename Container>
void zoo(const Container& container) {  // 1st version
    auto it = container.begin();
}

template <typename NonContainer>
void zoo(NonContainer&& non_container) {}  // 2nd version

int main() {
    std::less<int> l;

    // take a const-reference to ensure 1st version is called; If we used 'l'
    // the 2nd version would be preferred with NonContainer == std::less<int>&
    const std::less<int>& r = l;

    zoo(r);  // ERROR: no member named 'begin' in 'std::less<int>';
             // Compiler will *not* attempt to use the 2nd version of zoo
}

Type Traits

SFINAE has been around since C++98, and so has a feature that is not directly related but is often used together with it: partial template specialization (which might get its own post at a later point). By combining these 2 features one can create some powerful tricks, many of which C++11 standardized in <type_traits>.

Here’s a simple implementation of std::enable_if (found in <type_traits>):

// this is an actual complete implementation of std::enable_if found in std
// header <type_traits>
template <bool Condition, typename T = void>
struct enable_if {
    // No 'type' here, so any attempt to use it will fail substitution
};

// partial specialization for when Condition==true
template <typename T>
struct enable_if<true, T> {
    using type = T;
};

This simple-but-powerful struct allows us to set compile-time conditions on our functions and classes. The way one uses enable_if is by specifying a condition as the first template parameter and, optionally, a type T (void by default) that becomes the nested type when the condition is true. Take a minute to think about it, as it's not trivial.

Here's a simple example of a function bar() which only accepts enum arguments. Attempting to pass a non-enum fails to compile:

// T must be an enum type.
// Second template argument is only used to enforce T's type, therefore it
// doesn't have a name and it is not used.
template <typename T,
          typename = typename enable_if<std::is_enum<T>::value, void>::type>
void bar(T t) {}

enum Enum1 { A, B };
enum class Enum2 { C, D };

int main() {
    bar(A);
    bar(Enum2::C);
    bar(1); // compile error - "no matching function for call to 'bar(int)'"
}

Another way to achieve basically the same thing is as follows:

// If T is enum - return value is void, otherwise - substitution failure.
template <typename T>
typename std::enable_if<std::is_enum<T>::value, void>::type bar2(T t) {}

// Rest unchanged.

Here’s a summary of the differences between bar() and bar2():

bar()	bar2()
2 template arguments	1 template argument
Simple `void` return value	Conditional, harder to understand return-value
Bypassable “security” (by specifying 2nd template argument)	Unbreakable
Requires additional template parameter	No additional parameters

Please note that the above bar*()s could have been implemented using static_assert() instead of enable_if. Choose the right tool for the job. Here’s a quick comparison:

`enable_if`	`static_assert()`
Allows complicated overload rules without template specialization	Forces compilation to fail when a criterion is not met
Takes place during function matching	Only takes place after overload resolution, when a function has been selected
Long, harder to parse compilation errors	Compile error is user-defined (2nd parameter to `static_assert()`)

For more awesome type traits check out the standard C++ header <type_traits> - it’s full of goodies you might find useful. Some traits are even impossible to implement in C++, and require special compiler support (like has_virtual_destructor).

Try it out!

5 Minute Practice (use cpp.sh if you don’t have a compiler handy - it’s an online C++ compiler)

As practice, try to implement the following classes (each should define a static constexpr bool value that is true or false, according to its name):

std::is_pointer<T> - value is true if T is a pointer, false otherwise
std::is_const<T> - value is true if T has const qualifier, false otherwise
std::is_void<T> - value is true if T is void, false otherwise
std::is_same<T, U> - value is true if T is of the same type as U, false otherwise

UserData class

Tue, 26 Apr 2016 00:00:00 +0000

Many libraries provide their users with a way to add user-defined content to the library’s objects. This is especially true for libraries in the graphics domain. Examples:

Win32’s SetWindowLongPtr(): allows setting void* on an HWND;
OGRE3d’s UserObjectBindings: allows setting Any (similar to boost::any) and also map strings to Any on various objects;
Cocos2d-x’s get/setUserData() and get/setUserObject(): allows setting void* or inherit from Object and store it in various objects;
Box2d’s Get/SetUserData(): allows saving a void* on various physics objects in a world;
And so on…

You get the point. Sometimes libraries help you add context to their objects, but they don’t know your classes (nor should they). They are stuck with suboptimal solutions:

void* - type-unsafe, can’t be deleted by the library, only allows 1 object;
mapping a string to a void* - type-unsafe, strings are typo-prone, can’t be deleted by the library;
Inheriting from an Object-like class - downcast is type-unsafe, allows only 1 object.

There are many variations of these solutions, but I have yet to encounter one I like. So I’d like to propose one.

UserData class

Let’s start with an API:

class UserData {
public:
    template <class T>
    void Set(const T& t);

    template <class T>
    T& Get() const;

    template <class T>
    bool Has() const;

    template <class T>
    void Clear();
};

This allows users to store and retrieve their very own classes (plural!) in a type-safe manner. It is also not a class template (though it obviously has member templates), so it has a fixed size no matter what you store in it. I think this is a very neat API. In my particular scenario I don't care about thread safety, which simplifies things a bit.

Now take a moment to think about how you would implement this. It is non-trivial. If you don’t have a compiler at hand, try Ideone.com or cpp.sh. Come back with an answer!

You’re back? Awesome. Let’s explore a few possible solutions for this interesting problem.

`static`-based solution

This is the first solution I came up with, and it is far from perfect. It is interesting nonetheless, so I decided to put it here despite how embarrassing it is.

The idea here is to have a static member function template (called Multitool()). This function has a static unordered_map<int, T> (defined inside the function) which is used as storage for Ts. Each UserData instance has a unique int id, which is the key in the above map. This function takes care of storing and retrieving Ts, and is called by the templated public API methods.

This is one ugly solution:

Storage for Ts is static, so there’s some trickery to release memory when the UserData object is released (see m_Clear below);
Multitool() has an unsafe & ugly API (although it’s private so it’s less horrible);
We need to assign a unique id to each UserData instance. Again, not horrible, but it would be nice to avoid.

Here’s a complete implementation:

#include <cassert>
#include <set>
#include <unordered_map>

class UserData final {
public:
    UserData();
    ~UserData();

    template <class T>
    bool Has() const {
        void* vt = nullptr;
        Multitool<T>(MultitoolCommand::Get, &vt, m_UniqueId);
        T* t = (T*)vt;
        return (t != nullptr);
    }

    template <class T>
    T& Get() const {
        void* vt = nullptr;
        Multitool<T>(MultitoolCommand::Get, &vt, m_UniqueId);
        T* t = (T*)vt;
        assert(t != nullptr);

        m_Clear.insert(Multitool<T>);

        return *t;
    }

    template <class T>
    void Set(const T& t) {
        // t is a const reference; Multitool only reads through this pointer.
        void* vt = const_cast<T*>(&t);
        Multitool<T>(MultitoolCommand::Set, &vt, m_UniqueId);

        m_Clear.insert(Multitool<T>);
    }

    template <class T>
    void Clear() {
        void* tmp = nullptr;
        Multitool<T>(MultitoolCommand::Clear, &tmp, m_UniqueId);

        m_Clear.erase(Multitool<T>);
    }

private:
    enum class MultitoolCommand {
        Get,
        Set,
        Clear,
    };

    // This function is weird because it is the single point of
    // storage for Ts, and so has to satisfy all public operations
    template <class T>
    static void Multitool(MultitoolCommand command, void** vt, int id) {
        static std::unordered_map<int, T> store;
        T*& t = *((T**)(vt));

        switch (command) {
        case MultitoolCommand::Get: {
                auto it = store.find(id);
                if (it == store.end()) {
                    t = nullptr;
                } else {
                    t = &(it->second);
                }
            }
            break;
        case MultitoolCommand::Set: {
                auto it = store.find(id);
                if (it == store.end()) {
                    store.emplace(id, *t);
                } else {
                    it->second = *t;
                }
            }
            break;
        case MultitoolCommand::Clear:
            store.erase(id);
            break;
        }
    }

    static int m_NextUniqueId;
    int const m_UniqueId;

    typedef void (*TClearFunc)(MultitoolCommand, void**, int);
    mutable std::set<TClearFunc> m_Clear;
};

// .cpp file:
int UserData::m_NextUniqueId = 0;

UserData::UserData()
:    m_UniqueId(m_NextUniqueId++) {
}

UserData::~UserData() {
    void* tmp = nullptr;
    for (auto it : m_Clear) {
        it(MultitoolCommand::Clear, &tmp, m_UniqueId);
    }
}

`type_index`-based solution

Since C++11 we can use type_index to store types as keys in associative containers (in this case an unordered_map). The code below also uses std::make_unique, which requires C++14.

We need some small trickery to call T’s correct destructor. We also require RTTI support (although we don’t use the ‘runtime’ part of it - we just need the tables to exist in the binary).

Here’s the complete solution:

#include <cassert>
#include <memory>
#include <typeindex>
#include <unordered_map>

class UserData final {
public:
    UserData() = default;
    ~UserData() = default;

    template <class T>
    bool Has() const {
        auto const& it = m_Items.find(typeid(T));
        return (it != m_Items.end() && it->second.get() != nullptr);
    }

    template <class T>
    T& Get() const {
        auto const& it = m_Items.find(typeid(T));
        assert(it != m_Items.end());
        assert(it->second.get() != nullptr);
        return static_cast<Wrapper<T>*>(it->second.get())->t;
    }

    // It may have been better to take a pointer to T instead. Up to you.
    template <class T>
    void Set(const T& t) {
        m_Items[typeid(T)] = std::make_unique<Wrapper<T>>(t);
    }

    template <class T>
    void Clear() {
        m_Items.erase(typeid(T));
    }

private:
    class EmptyBase {
    public:
        virtual ~EmptyBase() = default;
    };

    template <class T>
    class Wrapper : public EmptyBase {
    public:
        Wrapper() = default;
        Wrapper(T t_) : t(t_) {}

        T t;
    };

    std::unordered_map<std::type_index, std::unique_ptr<EmptyBase>> m_Items;
};

Custom type-id solution

I have not yet encountered a team that asks the compiler not to generate RTTI tables, though I'm sure such teams exist. But even if you have RTTI, a type_index lookup is much slower than, say, indexing with a simple integer.

But how can we assign a unique integer per-type? Using a simple templates trick:

size_t type_id = 0;

template <typename T>
size_t GetTypeId() {
    static size_t t_id = type_id++; // use std::atomic for thread safety
    return t_id;
}

This even allows us to have a contiguous std::vector (as long as we never shrink it). With a solution similar to EmptyBase and Wrapper above we can get better performance, better CPU caching and less memory usage. Unfortunately, we still have to use a vector of pointers rather than have the objects directly in it, as we don’t know their sizes up front.

Complete solution:

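#include <cassert>
#include <memory>
#include <vector>

class UserData {
public:
    template <class T>
    void Set(const T& t) {
        auto id = GetTypeId<T>();
        // Type ids are global, so id may be past the end of this vector.
        // Grow it (never shrink it) so that index id is valid.
        if (id >= m_Items.size()) {
            m_Items.resize(id + 1);
        }
        m_Items[id] = std::make_unique<Wrapper<T>>(t);
    }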

    template <class T>
    T& Get() const {
        auto id = GetTypeId<T>();
        assert(m_Items.size() > id);
        auto const& it = m_Items[id];
        assert(it);
        return static_cast<Wrapper<T>&>(*it.get()).t;
    }

    template <class T>
    bool Has() const {
        auto id = GetTypeId<T>();
        return m_Items.size() > id && m_Items[id];
    }

    template <class T>
    void Clear() {
        auto id = GetTypeId<T>();
        // Clearing a type that was never set is a no-op.
        if (id < m_Items.size()) {
            m_Items[id].reset();
        }
    }

private:
    class EmptyBase {
    public:
        virtual ~EmptyBase() = default;
    };

    template <class T>
    class Wrapper : public EmptyBase {
    public:
        Wrapper() = default;
        Wrapper(T t_) : t(t_) {}

        T t;
    };

    std::vector<std::unique_ptr<EmptyBase>> m_Items;
};

If you have better ideas I am happy to hear! Let me know what you think in the comments below.

Naive std::function implementation

Fri, 08 Apr 2016 00:00:00 +0000

After exploring std::function in a previous post, I thought it would be a good exercise to implement a simple (and partial) std::function. It turned out to be much less code than I anticipated. I hope you'll like it.

Features

While std::function has a few typedefs and methods, the core functionality is assignment and invocation:

Assignment through operator=
Invocation through operator()

Even though std::function has a bit more functionality to it, we will only implement the above 2 methods.

Declaration

The standard specifies that std::function will be declared as follows:

namespace std {
	template <typename>
	class function; // no definition

	template <typename ReturnValue, typename ... Args>
	class function<ReturnValue(Args...)> {
		// ...
	};
}

Why have a std::function with no definition that is never used? Well, ideally you’d only have the second version. However, that version is a partial template specialization of the first. Another way could have been to define std::function as:

template <typename ReturnValue, typename ... Args>
class function { ... };

But this would mean that client code would look like std::function<int, bool, float> rather than std::function<int(bool, float)>. I personally think that the latter is much nicer, but there's just no way to express it without partial specialization.

So let’s copy that:

template <typename>
class naive_function; // no definition

template <typename ReturnValue, typename ... Args>
class naive_function<ReturnValue(Args...)> {
public:
	// operator= goes here
	// operator() goes here
private:
	...
};

Now any attempt to (mis)use naive_function with a simple argument list (example: naive_function<bool, int>) will yield a compiler error along the lines of “using undefined class naive_function”.

Groundwork

Before we implement operator= and operator() we need to write some supporting code. The following classes will be private nested classes of naive_function, so they know ReturnValue and Args.

Let’s start with an interface:

class ICallable {
public:
	virtual ~ICallable() = default;
	virtual ReturnValue Invoke(Args...) = 0;
};

Easy enough. Now for a concrete implementor:

template <typename T>
class CallableT : public ICallable {
public:
	CallableT(const T& t)
		: t_(t) {
	}

	~CallableT() override = default;

	ReturnValue Invoke(Args... args) override {
		return t_(args...);
	}

private:
	T t_;
};

Implementation

With the help of the above very simple classes it is now almost trivial to implement naive_function:

template <typename ReturnValue, typename... Args>
class naive_function<ReturnValue(Args...)> {
public:
	template <typename T>
	naive_function& operator=(T t) {
		callable_ = std::make_unique<CallableT<T>>(t);
		return *this;
	}

	ReturnValue operator()(Args... args) const {
		assert(callable_);
		return callable_->Invoke(args...);
	}

private:
	// ICallable as implemented above.
	// CallableT as implemented above.

	std::unique_ptr<ICallable> callable_;
};

There’s not even a lot of magic here:

operator= is templated, where T is anything that can be called with Args... and return ReturnValue. There we dynamically create a CallableT<T> which is assigned to callable_ (of type std::unique_ptr<ICallable>). Now the vtable knows how to execute the proper code at runtime.

operator() is trivial. It must not be called before operator= has been called, hence the assert. After that, it simply calls Invoke() on callable_ and returns its return value.

Let’s test our creation:

void func() {
	cout << "func" << endl;
}

struct functor {
	void operator()() {
		cout << "functor" << endl;
	}
};

int main() {
	naive_function<void()> f;
	f = func;
	f();
	f = functor();
	f();
	f = []() { cout << "lambda" << endl; };
	f();
}

Output:

func
functor
lambda

Future improvements

This implementation lacks a few things which I consider beyond the scope of this post, but feel free to implement them on your own:

Forwarding references and perfect forwarding

Specifically in the following places:

naive_function’s operator=
naive_function’s operator()
CallableT’s constructor

Small-object optimization

For more details about Small Object Optimization (SOO) or Small String Optimization (SSO), see my previous posts about std::string and std::function.

Clang's std::function reserves 16 bytes for small objects in order to save dynamic allocations. In naive_function we always allocate dynamically.

Special handling for `operator=` with `naive_function`

We have a templated operator=. Can you guess what happens in the following piece of code?

naive_function<void()> f;
naive_function<void()> f2;
f2 = f;

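I was surprised by this, but it actually fails to compile (tested on Visual Studio and clang). The reason is that the implicitly declared copy-assignment operator is defined as deleted, because callable_ (a std::unique_ptr) has no copy-assignment operator. A template is never a copy-assignment operator, and since the deleted non-template is an equally good match, overload resolution picks it rather than falling back to our operator=.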

But even if we got it to work, it would create an inefficient double-dereference (or more if this was assigned to yet another naive_function).

An operator= that copies the internals would avoid this, and would also behave more sanely when the user modifies objects that have been copied.

Appendix: Full code

#include <iostream>
#include <memory>
#include <cassert>
using namespace std;

template <typename T>
class naive_function;

template <typename ReturnValue, typename... Args>
class naive_function<ReturnValue(Args...)> {
public:
	template <typename T>
	naive_function& operator=(T t) {
		callable_ = std::make_unique<CallableT<T>>(t);
		return *this;
	}

	ReturnValue operator()(Args... args) const {
		assert(callable_);
		return callable_->Invoke(args...);
	}

private:
	class ICallable {
	public:
		virtual ~ICallable() = default;
		virtual ReturnValue Invoke(Args...) = 0;
	};

	template <typename T>
	class CallableT : public ICallable {
	public:
		CallableT(const T& t)
			: t_(t) {
		}

		~CallableT() override = default;

		ReturnValue Invoke(Args... args) override {
			return t_(args...);
		}

	private:
		T t_;
	};

	std::unique_ptr<ICallable> callable_;
};

void func() {
	cout << "func" << endl;
}

struct functor {
	void operator()() {
		cout << "functor" << endl;
	}
};

int main() {
	naive_function<void()> f;
	f = func;
	f();
	f = functor();
	f();
	f = []() { cout << "lambda" << endl; };
	f();
}

C++ vtables - Part 4 - Compiler-Generated Code

Tue, 22 Mar 2016 00:00:00 +0000

So far in this mini-series we’ve learned how vtables and typeinfo records are placed in our binaries and how the compiler uses them. Now we’ll look at some of the work the compiler does for us automatically.

Constructors

For any class’s constructor, the compiler generates code that does the following:

Call the parent constructor(s), if there are any;
Set the vtable pointer(s), if there are any;
Initialize members according to the initializer list;
Execute the code inside the constructor’s braces.

All of the above can happen without explicit code:

Parent default constructors are called automatically unless otherwise specified;
Members are default-initialized unless they have a default member initializer or an entry in the initializer list;
The entire constructor can be marked = default.

The vtable assignment is the only step that is always hidden: you can never write it yourself.

Here’s an example:

#include <iostream>
#include <string>
using namespace std;

class Parent {
public:
    Parent() { Foo(); }
    virtual ~Parent() = default;
    virtual void Foo() { cout << "Parent" << endl; }
    int i = 0;
};

class Child : public Parent {
public:
    Child() : j(1) { Foo(); }
    void Foo() override { cout << "Child" << endl; }
    int j;
};

class Grandchild : public Child {
public:
    Grandchild() { Foo(); s = "hello"; }
    void Foo() override { cout << "Grandchild" << endl; }
    string s;
};

int main() {
    Grandchild g;
}

Let’s write the pseudo-code for each class’s constructor:

Parent:
1. vtable = Parent’s vtable;
2. i = 0;
3. Call Foo();

Child:
1. Call Parent’s default c’tor;
2. vtable = Child’s vtable;
3. j = 1;
4. Call Foo();

Grandchild:
1. Call Child’s default c’tor;
2. vtable = Grandchild’s vtable;
3. Call s’s default c’tor;
4. Call Foo();
5. Call operator= on s;

Given this, it’s no surprise that in the context of a class’s constructor, the vtable pointer points to that very class’s vtable rather than to the vtable of the concrete (most-derived) class. This means that virtual calls are resolved as if no inheriting classes existed. Thus the output is:

Parent
Child
Grandchild
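In the implementations discussed in this series, typeid on a polymorphic object also goes through the vtable pointer (to reach the typeinfo record), so it shows the same effect. A minimal sketch, where the members ctor_type_is_base and ctor_name are hypothetical and exist only to record what Base’s constructor observes:

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

struct Base {
	Base() {
		// While Base's constructor runs, the object reports itself as a Base:
		// both typeid and virtual calls resolve to Base.
		ctor_type_is_base = (typeid(*this) == typeid(Base));
		ctor_name = Name();
	}
	virtual ~Base() = default;
	virtual std::string Name() const { return "Base"; }

	bool ctor_type_is_base = false;  // recorded by the constructor body
	std::string ctor_name;           // recorded by the constructor body
};

struct Derived : Base {
	std::string Name() const override { return "Derived"; }
};
```

Once construction is complete, the same Derived object reports Derived both through typeid and through virtual calls made via a Base reference.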

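What about pure virtual functions? Making a virtual call to one from a constructor is undefined behavior. If it isn’t implemented (yes, you can implement pure virtual functions, but why would you?), you’re probably (and hopefully) going to crash: with GCC and Clang, the vtable slot typically points to a runtime stub that aborts the program. Some compilers even emit a diagnostic about this, which is cool.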

Destructors

As one might imagine, destructors behave the same way as constructors, only in reverse order.

Here’s a quick thought exercise: why do destructors change the vtable pointer to point to their own class’s vtable rather than keep it pointing to the concrete class’s? Answer: because by the time the destructor runs, any inheriting class has already been destroyed, and calling that class’s methods is not something you want to do.
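A small sketch of the destructor side, assuming a hypothetical global g_dtor_log that each destructor appends to. When a Child is destroyed, ~Child runs first and then ~Parent; by the time ~Parent makes its virtual call, the vtable pointer has been reset to Parent’s vtable, so the call resolves to Parent::Name().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log used only to observe destructor order and dispatch.
std::vector<std::string> g_dtor_log;

class Parent {
public:
	virtual ~Parent() { g_dtor_log.push_back("~Parent calls " + Name()); }
	virtual std::string Name() const { return "Parent"; }
};

class Child : public Parent {
public:
	~Child() override { g_dtor_log.push_back("~Child calls " + Name()); }
	std::string Name() const override { return "Child"; }
};
```

Destroying a Child therefore logs "~Child calls Child" followed by "~Parent calls Parent".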

Implicit casts

As we saw in Part 2 & Part 3, a pointer to a child is not necessarily equal to a pointer to the same instance’s parent (for example, with multiple inheritance).

Yet there’s no added work for you (the developer) when calling a function that receives a pointer to a parent. This is because the compiler implicitly adjusts the pointer by the right offset when you up-cast pointers and references to parent classes.
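A minimal sketch using hypothetical Reader, Writer and File classes: the two base subobjects of a File necessarily live at different addresses, yet CallWrite(&file) just works because the compiler inserts the pointer adjustment at the call site, and a static_cast back to File* undoes it. The actual offset is ABI-specific, so the sketch only compares addresses.

```cpp
#include <cassert>

class Reader {
public:
	virtual ~Reader() = default;
	virtual int Read() = 0;
};

class Writer {
public:
	virtual ~Writer() = default;
	virtual int Write() = 0;
};

class File : public Reader, public Writer {
public:
	int Read() override { return 1; }
	int Write() override { return 2; }
};

// Only knows about Writer. When a File* is passed, the compiler converts it
// to a Writer* first, adding the Writer subobject's offset if needed.
int CallWrite(Writer* writer) { return writer->Write(); }
```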

Dynamic casts (RTTI)

Dynamic casts use the typeinfo records we explored in Part 1. They work at runtime: they read the typeinfo pointer stored one slot before the address the vtable pointer points to, and use the class information there to check whether the cast is possible.

This runtime lookup explains why dynamic_cast can get costly when used a lot.
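For reference, a short sketch of the two forms of dynamic_cast on a hypothetical Shape hierarchy: the pointer form returns nullptr when the runtime check fails, while the reference form throws std::bad_cast. Each call performs the runtime check described above, which is where the cost comes from.

```cpp
#include <cassert>
#include <typeinfo>

class Shape {
public:
	virtual ~Shape() = default;  // polymorphic, as dynamic_cast down-casts require
};
class Circle : public Shape {};
class Square : public Shape {};

// Pointer form: nullptr if `s` does not actually point to a Circle.
bool IsCircle(Shape* s) { return dynamic_cast<Circle*>(s) != nullptr; }

// Reference form: throws std::bad_cast instead of returning nullptr.
bool SquareRefThrows(Shape& s) {
	try {
		(void)dynamic_cast<Square&>(s);
		return false;
	} catch (const std::bad_cast&) {
		return true;
	}
}
```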

Method pointers

I plan to write a full post about method pointers in the future. Until then, I’d like to stress that a method pointer pointing at a virtual function will actually call the most-derived override (unlike pointers to non-member functions).
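A quick sketch of that behavior with hypothetical Base and Derived classes: the pointer is taken from &Base::Name, yet calling it on a Derived dispatches through the vtable to Derived::Name. Only an explicitly qualified call such as d.Base::Name() bypasses virtual dispatch.

```cpp
#include <cassert>
#include <string>

class Base {
public:
	virtual ~Base() = default;
	virtual std::string Name() const { return "Base"; }
};

class Derived : public Base {
public:
	std::string Name() const override { return "Derived"; }
};

// Calls whatever member function `method` refers to on `obj`.
std::string CallThrough(const Base& obj, std::string (Base::*method)() const) {
	return (obj.*method)();
}
```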

Test yourself!

You should now be able to explain to yourself why the following piece of code behaves the way it does:

#include <iostream>
using namespace std;

class FooInterface {
public:
	virtual ~FooInterface() = default;
	virtual void Foo() = 0;
};

class BarInterface {
public:
	virtual ~BarInterface() = default;

	virtual void Bar() = 0;
};

class Concrete : public FooInterface, public BarInterface {
public:
	void Foo() override { cout << "Foo()" << endl; }
	void Bar() override { cout << "Bar()" << endl; }
};

int main() {
	Concrete c;
	c.Foo();
	c.Bar();

	FooInterface* foo = &c;
	foo->Foo();

	BarInterface* bar = (BarInterface*)(foo);
	bar->Bar(); // Prints "Foo()" - WTF?
}

This concludes my first blog post, which grew into a four-part series. I hope you learned some new things; I know I sure did.
