Exploring std::string

(2029 words)

Every C++ developer knows that std::string represents a sequence of characters in memory. It manages its own memory, and is very intuitive to use. Today we’ll explore std::string as defined by the C++ Standard, and also by looking at 4 major implementations.

Quick note: in this post I use the notation of compilers to inspect implementations. This is technically incorrect, as a Standard library is not necessarily tied to a specific compiler. So when I say, for example, ‘GCC’ I really mean ‘GCC with its default Standard library’, which is libstdc++. For clang the library is libc++. For Microsoft Visual Studio its the library that ships with it, which is implemented by Dinkumware (I think).

Size

We’ll start with the most basic thing - what’s the sizeof of a std::string (not including any dynamic allocation)?

A naive implementation would require 3 fields, each the size of a pointer:

On a 64-bit machine the above would consume 24 bytes. Let’s examine the output of the following code on different compilers, all using 64-bit architecture:

#include <string>
#include <iostream>

int main() {
    std::cout << sizeof(std::string) << std::endl;
}

Visual Studio 14:

40

GCC < 5:

$ g++ -std=c++11 main.cpp && ./a.out
8

GCC >= 5:

$ g++-5 -std=c++11 main.cpp && ./a.out
32

clang:

$ clang++ -std=c++11 -stdlib=libc++ main.cpp && ./a.out
24

Keep these very different numbers in mind, and let’s continue exploring.

Small Object Optimization

One particular optimization found its way to pretty much all implementations: small objects optimization (aka small buffer optimization). Simply put, Small Object Optimization means that the std::string object has a small buffer for small strings, which saves dynamic allocations.

One might add a buffer on top of existing fields of std::string. However, there are some smart tricks to better use existing size.

#include <cstdlib>
#include <iostream>
#include <string>

// replace operator new and delete to log allocations
void* operator new(std::size_t n) {
    std::cout << "[Allocating " << n << " bytes]";
    return malloc(n);
}
void operator delete(void* p) throw() {
    free(p);
}

int main() {
    for (size_t i = 0; i < 24; ++i) {
        std::cout << i << ": " << std::string(i, '=') << std::endl;
    }
}

Visual Studio 14:

0:
1: =
2: ==
3: ===
4: ====
5: =====
6: ======
7: =======
8: ========
9: =========
10: ==========
11: ===========
12: ============
13: =============
14: ==============
15: ===============
[Allocating 32 bytes]16: ================
[Allocating 32 bytes]17: =================
[Allocating 32 bytes]18: ==================
[Allocating 32 bytes]19: ===================
[Allocating 32 bytes]20: ====================
[Allocating 32 bytes]21: =====================
[Allocating 32 bytes]22: ======================
[Allocating 32 bytes]23: =======================

With a size of 40 bytes (5 pointers), it appears as if there are 16 bytes dedicated to hold small strings. As we will see below, this is a somewhat wasteful implementation, and more juice can be squeezed from even smaller structs.

GCC < 5:

0: 
1: [Allocating 26 bytes]=
2: [Allocating 27 bytes]==
3: [Allocating 28 bytes]===
4: [Allocating 29 bytes]====
5: [Allocating 30 bytes]=====
6: [Allocating 31 bytes]======
7: [Allocating 32 bytes]=======
8: [Allocating 33 bytes]========
9: [Allocating 34 bytes]=========
10: [Allocating 35 bytes]==========
11: [Allocating 36 bytes]===========
12: [Allocating 37 bytes]============
13: [Allocating 38 bytes]=============
14: [Allocating 39 bytes]==============
15: [Allocating 40 bytes]===============
16: [Allocating 41 bytes]================
17: [Allocating 42 bytes]=================
18: [Allocating 43 bytes]==================
19: [Allocating 44 bytes]===================
20: [Allocating 45 bytes]====================
21: [Allocating 46 bytes]=====================
22: [Allocating 47 bytes]======================
23: [Allocating 48 bytes]=======================

As we will see soon, older libstdc++ implements copy-on-write, and so it makes sense for them to not utilize small objects optimization.

GCC >= 5:

$ g++-5 -std=c++11 main.cpp && ./a.out
0:
1: =
2: ==
3: ===
4: ====
5: =====
6: ======
7: =======
8: ========
9: =========
10: ==========
11: ===========
12: ============
13: =============
14: ==============
15: ===============
[Allocating 17 bytes]16: ================
[Allocating 18 bytes]17: =================
[Allocating 19 bytes]18: ==================
[Allocating 20 bytes]19: ===================
[Allocating 21 bytes]20: ====================
[Allocating 22 bytes]21: =====================
[Allocating 23 bytes]22: ======================
[Allocating 24 bytes]23: =======================

Recent GCC versions use a union of buffer (16 bytes) and capacity (8 bytes) to store small strings. Since reserve() is mandatory (more on this later), the internal pointer to the beginning of the string either points to this union or to the dynamically allocated string.

clang:

0: 
1: =
2: ==
3: ===
4: ====
5: =====
6: ======
7: =======
8: ========
9: =========
10: ==========
11: ===========
12: ============
13: =============
14: ==============
15: ===============
16: ================
17: =================
18: ==================
19: ===================
20: ====================
21: =====================
22: ======================
23: [Allocating 32 bytes]=======================

clang is by-far the smartest and coolest. While std::string has the size of 24 bytes, it allows strings up to 22 bytes(!!) with no allocation. To achieve this libc++ uses a neat trick: the size of the string is not saved as-is but rather in a special way: if the string is short (< 23 bytes) then it stores size() * 2. This way the least significant bit is always 0. The long form always bitwise-ors the LSB with 1, which in theory might have meant unnecessarily larger allocations, but this implementation always rounds allocations to be of form 16*n - 1 (where n is an integer). By the way, the allocated string is actually of form 16*n, the last character being '\0'. So freaking cool :).

Copy on Write

In the past it used to be valid (and some might even say encouraged) to implement std::string as a copy-on-write (using a reference-count) object. The Standard had careful wording to allow that first call to non-const methods (like operator[], begin(), at()) invalidate existing iterators.

As multi-threaded computing became more popular, it turned out that copy-on-write is slower than naive implementations due to necessary locking in almost every non-const operation. C++11 changed the wording, making copy-on-write incompliant.

GCC < 5 remained incompliant many years after the Standard had changed as they didn’t want to introduce an ABI change on such a critical component. Microsoft’s Visual Studio’s stl also dropped copy-on-write from as long as I can remember.

In the following piece of code I create a std::string with 50 times the character c, then change the first character without the std::string being aware of it (and of course don’t use this in production code!). I use 50 to make sure I don’t come across any small-object-optimization.

#include <string>
#include <iostream>

int main() {
  std::string a(50, 'c');
  std::string b = a;
  
  *const_cast<char*>(a.c_str()) = 'A';
  std::cout << "a: " << a << "\nb: " << b << std::endl;
}

Visual Studio 14:

a: Accccccccccccccccccccccccccccccccccccccccccccccccc
b: cccccccccccccccccccccccccccccccccccccccccccccccccc

GCC < 5:

$ g++ -std=c++11 main.cpp && ./a.out
a: Accccccccccccccccccccccccccccccccccccccccccccccccc
b: Accccccccccccccccccccccccccccccccccccccccccccccccc

Aha! Copy-on-write in action.

GCC >= 5:

$ g++-5 -std=c++11 main.cpp && ./a.out
a: Accccccccccccccccccccccccccccccccccccccccccccccccc
b: cccccccccccccccccccccccccccccccccccccccccccccccccc

clang:

$ clang++ -std=c++11 -stdlib=libc++ main.cpp && ./a.out
a: Accccccccccccccccccccccccccccccccccccccccccccccccc
b: cccccccccccccccccccccccccccccccccccccccccccccccccc

Allocate on .reserve()?

As you may know, std::string has a .size() method returning the size of the string it contains. This is not necessarily the allocated size, which might be bigger (but can’t be smaller).

One might wonder if there are any guarantees on capacity() and reserve(). In other words: can we implement a Standard compliant std::string with 2 members (pointer and size)? The answer to this question is a straight no. To quote the Standard:

After reserve(), capacity() is greater or equal to the argument of reserve.

So calling reserve() with a number bigger than size() forces std::string to grow. By the way, calling reserve() with a number <= size() is similar to calling shrink_to_fit(), which is a non-binding request.

.c_str() and .data() are the same

Calling .data() in C++98 would not necessarily return a null-terminated string. C++11 changes this and now both data() and c_str() return a string that must terminate with a '\0'.

Another thing C++11 brings is the ability to dereference the pointer returned from c_str()/data() even if empty() == true. No longer is it undefined to do for (const char* c = s.data(); *c != 0; ++c) ..., as now c_str()/data() will return a pointer to '\0'.

Binary data

std::string can hold binary data. That is, a string with '\0' in any position. Note that it’s not entirely trivial to create such a string. For example, the following will create a string with the content "hello" by-design, as we’re passing in a C-string:

std::string s = "hello\0world";

Even doing something like the following will fail for the exact same reason:

auto cstr = "hello\0world";
std::string s(cstr, strlen(cstr));

The following, however, will work:

// We could use const char[] or decltype(auto) here, but not auto. Why?
// Because auto will be decayed to const char*, which has the wrong size.
decltype(auto) cstr = "Hello\0World!";
std::string s(cstr, sizeof(cstr));

Also note that printing such strings may also be problematic, as some implementations stop at '\0'.

Growth strategy

Like std::vector, it’s important for std::string to grow in an efficient way. Let’s examine the following:

#include <cstdlib>
#include <iostream>
#include <string>

// replace operator new and delete to log allocations
void* operator new(std::size_t n) {
    std::cout << "Allocating " << n << " bytes" << std::endl;
    return malloc(n);
}
void operator delete(void* p) throw() {
    free(p);
}

int main() {
    std::string s;
    for (size_t i = 0; i != 1000000; ++i) {
        s += '.';
    }
}

Visual Studio 14:

Allocating 32 bytes
Allocating 48 bytes
Allocating 71 bytes
Allocating 106 bytes
Allocating 158 bytes
Allocating 236 bytes
Allocating 353 bytes
Allocating 529 bytes
Allocating 793 bytes
Allocating 1189 bytes
Allocating 1783 bytes
Allocating 2674 bytes
Allocating 4010 bytes
Allocating 6053 bytes
Allocating 9059 bytes
Allocating 13568 bytes
Allocating 20332 bytes
Allocating 30478 bytes
Allocating 45697 bytes
Allocating 68525 bytes
Allocating 102767 bytes
Allocating 154130 bytes
Allocating 231175 bytes
Allocating 346742 bytes
Allocating 520093 bytes
Allocating 780119 bytes
Allocating 1170158 bytes

Growth strategy is 1.5x.

GCC < 5:

$ g++ -std=c++11 main.cpp && ./a.out
Allocating 26 bytes
Allocating 27 bytes
Allocating 29 bytes
Allocating 33 bytes
Allocating 41 bytes
Allocating 57 bytes
Allocating 89 bytes
Allocating 153 bytes
Allocating 281 bytes
Allocating 537 bytes
Allocating 1049 bytes
Allocating 2073 bytes
Allocating 8160 bytes
Allocating 16352 bytes
Allocating 32736 bytes
Allocating 65504 bytes
Allocating 131040 bytes
Allocating 262112 bytes
Allocating 524256 bytes
Allocating 1048544 bytes

Growth strategy is a bit tricky. On my machine the page size is 4096, and so for strings larger than that it rounds up to fit to a page and grows 2x. For small strings, growth is done by adding multiples of 2 (thanks Marek!).

GCC >= 5:

$ g++-5 -std=c++11 main.cpp && ./a.out
Allocating 31 bytes
Allocating 61 bytes
Allocating 121 bytes
Allocating 241 bytes
Allocating 481 bytes
Allocating 961 bytes
Allocating 1921 bytes
Allocating 3841 bytes
Allocating 7681 bytes
Allocating 15361 bytes
Allocating 30721 bytes
Allocating 61441 bytes
Allocating 122881 bytes
Allocating 245761 bytes
Allocating 491521 bytes
Allocating 983041 bytes
Allocating 1966081 bytes

Growth strategy is 2x.

clang:

$ clang++ -std=c++11 -stdlib=libc++ main.cpp && ./a.out
Allocating 48 bytes
Allocating 96 bytes
Allocating 192 bytes
Allocating 384 bytes
Allocating 768 bytes
Allocating 1536 bytes
Allocating 3072 bytes
Allocating 6144 bytes
Allocating 12288 bytes
Allocating 24576 bytes
Allocating 49152 bytes
Allocating 98304 bytes
Allocating 196608 bytes
Allocating 393216 bytes
Allocating 786432 bytes
Allocating 1572864 bytes

Growth strategy is 2x.

Move semantics

Taking advantage of C++11’s move semantics will benefit every library. Apache’s retired stdcxx implementation is the only one I know of to not support them.

One interesting this to note is that with small objects, where Small Object Optimization exists, a move constructor is implemented as a copy constructor. This of course is purely semantic, as the data structure’s fields need to be copied anyway, but is nevertheless interesting to think about.

Memory

The Standard instructs implementations to use contiguous memory for the entire string, which is convenient when working with some C-APIs.

Summary

C++’s built-in std::string is a well-designed class with some great implementations. Today we looked at some important features and hopefully learned a thing or two. Please let me know if there are more things you’re interested in, either via a comment below or through the ‘About’ page.


Comments