Compiling Clang from Scratch

In this post I’ll show how easy it is to build clang from scratch on Linux, and how to use it both directly and with CMake.

But Why?

Clang is changing rapidly, and new features are added frequently. On the other hand, Linux distros have ancient versions shipped with them. That’s just frustrating.

Furthermore, building Clang is so easy that it really shouldn’t be a blocker for you to use the newest and bestest. Cloning the repositories takes a few minutes, and the actual build step takes anywhere between 7 minutes and 1 hour, depending on your hardware.

Step 1 - Clone

For this step you’ll need git installed on your system. Simply paste the following into a shell, while inside an empty directory you’d like to clone the sources into (I used ~/clang):

git clone -q  https://github.com/llvm-mirror/llvm llvm
git clone -q  https://github.com/llvm-mirror/clang llvm/tools/clang
git clone -q  https://github.com/llvm-mirror/clang-tools-extra llvm/tools/clang/tools/extra
git clone -q  https://github.com/llvm-mirror/compiler-rt llvm/projects/compiler-rt
git clone -q  https://github.com/llvm-mirror/libcxx llvm/projects/libcxx
git clone -q  https://github.com/llvm-mirror/libcxxabi llvm/projects/libcxxabi
git clone -q  https://github.com/llvm-mirror/lld llvm/tools/lld

This will create an llvm directory containing all the needed sources. Note that we’re building from the head of the master branch, where all the goodies are. If you’re using this for something serious, consider using one of the release branches instead.
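
If you do want a release branch, here is a minimal sketch that prints the same seven clone commands pinned to a branch of your choice. The function name print_clone_cmds is my own, and the branch name release_70 in the usage comment is only a placeholder: check which branch names the mirrors actually expose before running it.

```shell
# Print one "git clone" command per repository, pinned to the given branch.
# The repository-to-directory mapping mirrors the clone commands above.
# print_clone_cmds is a hypothetical helper, not part of git or LLVM.
print_clone_cmds() {
  branch="$1"
  base="https://github.com/llvm-mirror"
  while read -r repo dest; do
    echo "git clone -q -b $branch $base/$repo $dest"
  done <<EOF
llvm llvm
clang llvm/tools/clang
clang-tools-extra llvm/tools/clang/tools/extra
compiler-rt llvm/projects/compiler-rt
libcxx llvm/projects/libcxx
libcxxabi llvm/projects/libcxxabi
lld llvm/tools/lld
EOF
}

# Usage (placeholder branch name, runs the clones for real):
#   print_clone_cmds release_70 | sh
```

Printing the commands first makes it easy to eyeball them before anything touches the network.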

Step 2 - Run CMake

Now we need to invoke cmake to generate a build environment for us. It’s recommended to use the Ninja build system as it’s much faster, but you can also use good old Unix make: just drop the -GNinja.

You may also wish to build Clang in Release mode rather than the default Debug mode (pass -DCMAKE_BUILD_TYPE=Release to cmake). The reasons are:

  1. Everything (sources + built objects, libs & executables) takes 61 GB with Debug, and only 4.6 GB in Release
  2. At least on my machine the Debug version takes longer to build (9m for Debug vs 7m for Release), maybe due to I/O. YMMV.

Paste the following into a bash shell to get going:

mkdir build
cd build
cmake -GNinja ../llvm
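
For the Release variant, here is a small sketch that assembles the configure command and falls back to Unix make when ninja isn't installed. The helper names pick_generator and configure_cmd are my own, not part of CMake or LLVM. The flags themselves are the ones mentioned above.

```shell
# Choose a CMake generator: Ninja when the ninja binary is on PATH,
# plain Unix Makefiles otherwise.
pick_generator() {
  if command -v ninja >/dev/null 2>&1; then
    echo "Ninja"
  else
    echo "Unix Makefiles"
  fi
}

# Print the configure command.
# $1: build type (default: Release), $2: generator (default: auto-detected).
configure_cmd() {
  build_type="${1:-Release}"
  generator="${2:-$(pick_generator)}"
  printf 'cmake -G "%s" -DCMAKE_BUILD_TYPE=%s ../llvm\n' "$generator" "$build_type"
}

# Usage, from inside the build directory:
#   eval "$(configure_cmd Release)"
```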

Step 3 - Build

This is fairly easy. Simply run ninja (or make if that’s how you configured cmake). Now go make a sandwich - your PC will be rather busy.
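
If you forget which generator you configured, the build directory tells you: Ninja leaves a build.ninja file, make leaves a Makefile. A minimal sketch that picks the matching command (build_cmd is a made-up helper name):

```shell
# Print the build command that matches how the directory was configured.
# $1: build directory (default: current directory).
build_cmd() {
  dir="${1:-.}"
  if [ -f "$dir/build.ninja" ]; then
    echo "ninja -C $dir"
  elif [ -f "$dir/Makefile" ]; then
    echo "make -C $dir"
  else
    echo "no build.ninja or Makefile in $dir; run cmake first" >&2
    return 1
  fi
}

# Usage (assumes the directory path has no spaces):
#   $(build_cmd ~/clang/build)
```

Alternatively, running cmake --build . inside the build directory works with either generator.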

Step 4 - Profit

You now have LLVM, Clang, libc++ and other goodies built and ready to use.

Simple compilation:

$ cat main.cpp
#include <iostream>
using namespace std;

int main() {
  cout << "Hello, World!" << endl;
}

$ ~/clang/build/bin/clang++ main.cpp
$ ./a.out
Hello, World!

Use libc++ which we built above:

$ ~/clang/build/bin/clang++ main.cpp -nostdinc++ -I$HOME/clang/build/include/c++/v1 -L$HOME/clang/build/lib -lc++ -Wl,-rpath,$HOME/clang/build/lib
$ ./a.out
Hello, World!
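
That flag list gets tedious to retype, so here is a sketch of a tiny helper that derives it from the build directory. The name libcxx_flags is my own invention. The flags are exactly the ones used above, and the helper assumes the path contains no spaces.

```shell
# Print the compile and link flags needed to use the locally built libc++.
# $1: the Clang build directory (e.g. $HOME/clang/build).
libcxx_flags() {
  build="$1"
  echo "-nostdinc++ -I$build/include/c++/v1 -L$build/lib -lc++ -Wl,-rpath,$build/lib"
}

# Usage (relies on shell word splitting, so keep the path free of spaces):
#   ~/clang/build/bin/clang++ main.cpp $(libcxx_flags "$HOME/clang/build")
```

The -Wl,-rpath part is what lets ./a.out find the freshly built libc++ at run time without setting LD_LIBRARY_PATH.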

Compile with C++20 (as of this writing, the init-statement in a range-based for loop is not in any released Clang version):

$ cat main.cpp
#include <iostream>
using namespace std;

int main() {
  for (int i = 1; int j : {1, 2, 3, 4}) {
    cout << "i: " << i << ", j: " << j << endl;
  }
}

$ ~/clang/build/bin/clang++ main.cpp -std=c++2a
$ ./a.out
i: 1, j: 1
i: 1, j: 2
i: 1, j: 3
i: 1, j: 4

Compile with AddressSanitizer:

$ cat main.cpp
int main() {
  int* a = new int(123);
  delete a;
  delete a;
}

$ ~/clang/build/bin/clang++ -O0 -g -fno-omit-frame-pointer -fsanitize=address main.cpp
$ ./a.out
=================================================================
==239623==ERROR: AddressSanitizer: attempting double-free on 0x602000000010 in thread T0:
    #0 0x4f9d12  (/tmp/cpp_vmizcK/a.out+0x4f9d12)
    #1 0x4fc0f1  (/tmp/cpp_vmizcK/a.out+0x4fc0f1)
    #2 0x7feaefe3c2b0  (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
    #3 0x41d989  (/tmp/cpp_vmizcK/a.out+0x41d989)

0x602000000010 is located 0 bytes inside of 4-byte region [0x602000000010,0x602000000014)
freed by thread T0 here:
    #0 0x4f9d12  (/tmp/cpp_vmizcK/a.out+0x4f9d12)
    #1 0x4fc0d3  (/tmp/cpp_vmizcK/a.out+0x4fc0d3)
    #2 0x7feaefe3c2b0  (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)

previously allocated by thread T0 here:
    #0 0x4f9092  (/tmp/cpp_vmizcK/a.out+0x4f9092)
    #1 0x4fc068  (/tmp/cpp_vmizcK/a.out+0x4fc068)
    #2 0x7feaefe3c2b0  (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)

SUMMARY: AddressSanitizer: double-free (/tmp/cpp_vmizcK/a.out+0x4f9d12)
==239623==ABORTING

Show proper symbols:

$ env ASAN_SYMBOLIZER_PATH=$HOME/clang/build/bin/llvm-symbolizer ./a.out
=================================================================
==239574==ERROR: AddressSanitizer: attempting double-free on 0x602000000010 in thread T0:
    #0 0x4f9d12 in operator delete(void*) /home/shmike/clang/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:167:3
    #1 0x4fc0f1 in main /tmp/cpp_vmizcK/main.cpp:4:3
    #2 0x7fb8a6c792b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
    #3 0x41d989 in _start (/tmp/cpp_vmizcK/a.out+0x41d989)

0x602000000010 is located 0 bytes inside of 4-byte region [0x602000000010,0x602000000014)
freed by thread T0 here:
    #0 0x4f9d12 in operator delete(void*) /home/shmike/clang/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:167:3
    #1 0x4fc0d3 in main /tmp/cpp_vmizcK/main.cpp:3:3
    #2 0x7fb8a6c792b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)

previously allocated by thread T0 here:
    #0 0x4f9092 in operator new(unsigned long) /home/shmike/clang/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:106:3
    #1 0x4fc068 in main /tmp/cpp_vmizcK/main.cpp:2:12
    #2 0x7fb8a6c792b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)

SUMMARY: AddressSanitizer: double-free /home/shmike/clang/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:167:3 in operator delete(void*)
==239574==ABORTING

Compile with MemorySanitizer:

$ cat main.cpp
int main() {
  int a;  // uninitialized
  return a;
}

$ ~/clang/build/bin/clang++ -O0 -g -fno-omit-frame-pointer -fsanitize=memory main.cpp
$ ./a.out
==243759==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x496dac  (/tmp/cpp_vmizcK/a.out+0x496dac)
    #1 0x7f6f9c9cc2b0  (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
    #2 0x41e379  (/tmp/cpp_vmizcK/a.out+0x41e379)

SUMMARY: MemorySanitizer: use-of-uninitialized-value (/tmp/cpp_vmizcK/a.out+0x496dac)
Exiting

Compile for WebAssembly (wasm):

$ ~/clang/build/bin/clang++ --target=wasm32 -Os main.cpp -c -o out.wasm
$ file out.wasm
out.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)

(Note that using libc or the STL with the bare wasm32 target is not trivial, so it’s much easier to use Emscripten.)

Integrate with CMake

While using the compiler manually is nice, it’s only rarely a good idea. For real projects we all use some sort of build system, and many of us use CMake.

Integrating our freshly built Clang with CMake is fairly simple, assuming you know how to paste :)

cmake -DCMAKE_CXX_COMPILER=$HOME/clang/build/bin/clang++ -DCMAKE_LINKER=$HOME/clang/build/bin/clang++

And if you’d like to also use the locally built libc++:

cmake -DCMAKE_CXX_FLAGS="-nostdinc++ -I$HOME/clang/build/include/c++/v1" -DCMAKE_EXE_LINKER_FLAGS="-L$HOME/clang/build/lib -lc++ -Wl,-rpath,$HOME/clang/build/lib" -DCMAKE_CXX_COMPILER=$HOME/clang/build/bin/clang++ -DCMAKE_LINKER=$HOME/clang/build/bin/clang++
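
When you run this against a real project you'll also pass the project's source directory, and if the project has C sources you may want the C compiler to match. A minimal sketch under those assumptions (local_clang_cmake_cmd is a hypothetical helper, and both paths are assumed to be free of spaces):

```shell
# Print a cmake command that configures a project with the locally built Clang.
# $1: Clang build directory, $2: project source directory.
local_clang_cmake_cmd() {
  clang_build="$1"
  src="$2"
  printf 'cmake -DCMAKE_C_COMPILER=%s/bin/clang -DCMAKE_CXX_COMPILER=%s/bin/clang++ %s\n' \
    "$clang_build" "$clang_build" "$src"
}

# Usage, from an empty build directory for your project:
#   eval "$(local_clang_cmake_cmd "$HOME/clang/build" ..)"
```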

That’s it. I hope you liked this mini tutorial, and that you found it useful, or at least interesting!

