Building a Host-Tuned GCC to Make GCC Compile Faster

File Descriptor Two

A while ago, I became interested in reducing my compile times, and examined various methods for this. This blog post is about enabling all compiler options that I believe can make gcc faster when compiling code, and measuring the gains I got as a result.

How to Build It

    mkdir build-gcc16
    cd build-gcc16

    ../gcc-16.1.0/configure \
        --prefix="$HOME/opt/gcc16-super" \
        --program-prefix=super- \
        --enable-languages=c,c++ \
        --disable-multilib \
        --with-build-config='bootstrap-native bootstrap-lto bootstrap-O3'

    make profiledbootstrap -j$(nproc)
    make install-strip
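Once make install-strip finishes, it can be worth confirming that the prefixed drivers landed under the install prefix. The following is only a minimal sketch: check_install is a hypothetical helper name, and it assumes the --prefix and --program-prefix values from the configure line above.

```shell
#!/bin/sh
# Hypothetical helper (not part of GCC): verify that the prefixed compiler
# drivers exist and are executable under the install prefix.
check_install() {
    prefix=$1            # e.g. "$HOME/opt/gcc16-super"
    program_prefix=$2    # e.g. "super-"
    for tool in gcc g++; do
        bin="$prefix/bin/${program_prefix}${tool}"
        if [ ! -x "$bin" ]; then
            echo "missing: $bin" >&2
            return 1
        fi
    done
    echo "ok: $prefix/bin/${program_prefix}gcc"
}

# Usage, with the values from the configure line above:
#   check_install "$HOME/opt/gcc16-super" super-
```

The check only looks at the two drivers built by --enable-languages=c,c++; if you enable other frontends, extend the tool list accordingly.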

Expect to wait a while on make profiledbootstrap. It took 72 minutes of wall time on my laptop (AMD Ryzen AI MAX+ PRO 395, 16c/32t, -j32).

Now I will break down the configure options and what precisely each one does. If you want to see the configure options used to compile any build of gcc, gcc -v will dump them.

Configure Options Explained

--with-build-config='bootstrap-native bootstrap-lto bootstrap-O3'

--with-build-config selects small Makefile fragments (the config/*.mk files at the top level of the GCC source tree) that control how the bootstrap stages are compiled. The relevant ones here are:

bootstrap-native adds -march=native -mtune=native to the flags, so the resulting compiler binary is optimized for the build host. Without this, the compiler is built with generic x86-64 codegen and can't use AVX2, AVX512, etc. Note that this affects the compiler itself, not the code it generates. If you also want the compiler to default to native-tuned output for your programs, see --with-arch/--with-cpu/--with-tune below.

bootstrap-lto enables link-time optimization for the bootstrap stages. LTO lets the optimizer work across translation unit boundaries when building the compiler itself, which is a large codebase that benefits from it.

bootstrap-O3 raises the optimization level for the bootstrap stages from -O2 to -O3.

make profiledbootstrap

This is a make target, not a configure option. It performs a profile-guided optimization (PGO) build: it first builds an instrumented compiler, runs a training workload to collect execution profiles, then rebuilds using that feedback. The GCC build docs describe this as producing "a faster compiler binary", and I believe it accounts for a large part of the gain here.

--program-prefix=super-

Prepends super- to the names of installed programs, so gcc installs as super-gcc, g++ as super-g++, and so on. This lets the custom compiler coexist with the distro compiler without overwriting it.
--program-prefix is optional: you can omit it and control which compiler gets invoked through the order of directories in your PATH.

--with-cpu=native / --with-arch=native / --with-tune=native

You might also want these, but I did not use them in this benchmark. They can affect compile times, so I left them out to keep that variable controlled in the experiment. These change the defaults for the code the compiler emits when compiling your programs; they have no effect on how the compiler binary itself is built. If you want the compiler to default to native-tuned output so you don't have to pass -march=native by hand every time, add them.

--enable-languages=c,c++

I restricted this to C and C++ to reduce build time. I don't use any other language frontend, but you can add them if you need them.

--disable-multilib

Skips building 32-bit target libraries.

--enable-checking=release

Worth knowing about if you are comparing a GCC release to an in-development version. Release branches default to --enable-checking=release (cheap assertions only), while trunk defaults to --enable-checking=yes,extra, which enables more internal consistency checks and slows the compiler down. If a trunk-based GCC seems slower than expected, that is probably why. You could also go further and build with --enable-checking=no to disable all assertions, which might squeeze out a bit more performance. I did not try that.

Benchmark Comparison

I benchmarked compile time, not the runtime of generated code. The workloads were clean parallel rebuilds (-j32) of four codebases, timed with hyperfine (2 runs each, cleaning between runs). The baseline was Arch Linux's distro gcc 16.1.1, which is already built with bootstrap-lto.
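For a sense of what such a timing run might look like, here is a minimal sketch that prints (rather than runs) a hyperfine command for one codebase. The bench_cmd helper is hypothetical, and it assumes a make-based project with a clean target that honors CC and CXX on the command line; the post does not say how the four codebases were actually built. The --runs and --prepare flags are standard hyperfine options.

```shell
#!/bin/sh
# Hypothetical helper: print a hyperfine command that times a clean parallel
# rebuild with the given C and C++ compilers.
# Assumption: the project builds with make, has a "clean" target, and
# accepts CC=/CXX= overrides on the make command line.
bench_cmd() {
    cc=$1      # C compiler, e.g. gcc or super-gcc
    cxx=$2     # C++ compiler, e.g. g++ or super-g++
    jobs=$3    # parallel job count, e.g. 32
    printf "hyperfine --runs 2 --prepare 'make clean' 'make -j%s CC=%s CXX=%s'\n" \
        "$jobs" "$cc" "$cxx"
}

# One command per compiler under test; pipe a line to sh to actually run it.
bench_cmd gcc g++ 32
bench_cmd super-gcc super-g++ 32
```

Here --prepare runs make clean before every timed run, which matches cleaning between runs, and the first printed command is the one that uses the distro baseline compiler.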
The baseline compiler was configured like so:

    /build/gcc/src/gcc/configure --enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++,rust,cobol --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://gitlab.archlinux.org/archlinux/packaging/packages/gcc/-/issues --with-build-config=bootstrap-lto --with-linker-hash-style=gnu --with-system-zlib --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared...
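A long configure line like the baseline's is easier to compare against your own build when split into one option per line. The sketch below assumes the "Configured with:" line that gcc -v prints to stderr; list_configure_opts is a hypothetical helper name, and the naive space splitting will break apart quoted values that contain spaces, such as a multi-word --with-build-config.

```shell
#!/bin/sh
# Hypothetical helper: read `gcc -v` output on stdin and print each
# configure option on its own line.
# Limitation: splits on every space, so a quoted value containing spaces
# (e.g. --with-build-config='a b c') is cut off after its first word.
list_configure_opts() {
    sed -n 's/^Configured with: //p' | tr ' ' '\n' | grep '^--'
}

# Usage: gcc -v writes to stderr, so redirect it into the pipe.
#   gcc -v 2>&1 | list_configure_opts
#   super-gcc -v 2>&1 | list_configure_opts
```

Saving both listings to files and running diff on them gives a quick view of which options differ between two builds.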
