Introducing Sample Profile Guided Optimization in MSVC - C++ Team Blog
David Gillies
Principal Software Engineer
Profile Guided Optimization (PGO) has long been one of the most powerful tools in the MSVC compiler’s arsenal for improving the runtime performance of C and C++ applications. By using execution profile data collected from representative workloads, PGO enables the compiler to make smarter decisions about inlining, code layout, and hot/cold code separation – decisions that are impossible to make from static analysis alone. In practice, PGO can deliver large performance improvements for C/C++ code.
Today, we’re introducing Sample Profile Guided Optimization (SPGO), a new approach to profile-guided optimization that makes it dramatically easier to bring PGO-quality optimizations to your codebase without the overhead and complexity of traditional instrumented PGO. SPGO is available in all versions of Visual Studio 2022 and in Visual Studio 2026.
(If you want to dive right in, a full tutorial covers every detail of using SPGO: Tutorial: Use Sample Profile-Guided Optimization (SPGO) to improve C++ performance, on Microsoft Learn.)
Limitations of Traditional PGO
Traditional PGO (sometimes called "instrumented PGO") works in three phases:
Instrument: Compile your application with special instrumentation probes inserted at key points in the code.
Train: Run the instrumented binary through representative workloads to collect execution counts.
Optimize: Recompile using the collected profile data to guide optimizations.
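To make the three phases concrete, here is a toy C++ model of instrumented PGO for a single function. It is only a sketch under stated assumptions: the hand-written `probe` calls stand in for the probes a compiler would insert, the block names and the cold threshold are hypothetical, and MSVC's real instrumentation and optimization heuristics are far more involved.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical basic blocks of one small function (names are illustrative).
enum Block : std::size_t { Entry, FastPath, SlowPath, Exit, kNumBlocks };

// Phase 1 (Instrument): one counter per block, bumped by a "probe".
// A real compiler inserts its own probes; these calls are a hand-written stand-in.
std::array<std::uint64_t, kNumBlocks> g_counts{};

inline void probe(Block b) { ++g_counts[b]; }

int classify(int x) {
    probe(Entry);
    int result;
    if (x >= 0) {          // expected to be the common case
        probe(FastPath);
        result = x * 2;
    } else {               // expected to be rare
        probe(SlowPath);
        result = -x + 1;
    }
    probe(Exit);
    return result;
}

// Phase 2 (Train): run a representative workload through the instrumented code.
void train(const std::vector<int>& workload) {
    for (int x : workload) classify(x);
}

// Phase 3 (Optimize): use the counts to separate hot blocks from cold ones.
struct Layout {
    std::vector<Block> hot;   // kept together in their original order
    std::vector<Block> cold;  // candidates to move out of line
};

// A block is treated as cold if it ran on fewer than `cold_percent`% of calls.
Layout split_hot_cold(unsigned cold_percent) {
    assert(cold_percent <= 100);
    Layout layout;
    const std::uint64_t calls = g_counts[Entry];
    for (std::size_t b = 0; b < kNumBlocks; ++b) {
        const bool cold = g_counts[b] * 100 < calls * cold_percent;
        (cold ? layout.cold : layout.hot).push_back(static_cast<Block>(b));
    }
    return layout;
}
```

Training with 99 non-negative inputs and one negative input leaves SlowPath with a count of 1, so `split_hot_cold(5)` moves it to the cold list while Entry, FastPath, and Exit stay hot. Note that the counters themselves are the source of the runtime overhead described below: every executed block pays for an extra memory update.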
While this approach produces high-quality profile data, it comes with significant practical challenges:
Performance overhead: The instrumented binary runs significantly slower than the release build, making it impractical to deploy to production.
Training burden: You need to create and maintain representative training scenarios that accurately reflect real-world usage patterns.
Workflow complexity: The three-phase build process adds complexity to your CI/CD pipeline and release workflow.
Staleness: Profile data can go stale as your code evolves, requiring frequent re-training.
Deployment constraints: Instrumented binaries cannot be shipped to customers, so the training scenarios may not perfectly represent real customer workloads.
For many teams, these barriers mean that PGO, despite its significant performance benefits, remains out of reach.
Enter SPGO: Profile Guidance from Production Sampling
SPGO takes a fundamentally different approach. Instead of instrumenting your binary and running it through synthetic training scenarios, SPGO uses hardware performance counter sampling collected from your actual release binaries. Modern processors provide hardware sampling capabilities such as Last Branch Records (LBR) and retired instruction counters. These can be collected with negligible runtime overhead, making it practical to gather runtime profiles directly from production.
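As an illustration of the kind of signal LBR data carries, the sketch below aggregates hypothetical LBR samples into taken-branch edge counts and then computes, for one branch source address, how often each target was taken. The `BranchRecord` layout, the function names, and the addresses are assumptions for illustration only; this does not reflect xperf's trace format or the conversion MSVC actually performs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// One LBR entry: a taken branch from `from` to `to`. This struct is a
// hypothetical stand-in; real trace formats carry more fields.
struct BranchRecord {
    std::uint64_t from;
    std::uint64_t to;
};

// One hardware sample holds the most recent taken branches.
using LbrSample = std::vector<BranchRecord>;

// (from, to) -> number of times that taken edge was observed.
using EdgeCounts = std::map<std::pair<std::uint64_t, std::uint64_t>, std::uint64_t>;

EdgeCounts aggregate_edges(const std::vector<LbrSample>& samples) {
    EdgeCounts counts;
    for (const LbrSample& sample : samples)
        for (const BranchRecord& r : sample)
            ++counts[{r.from, r.to}];
    return counts;
}

// Among observed taken branches leaving `from`, the fraction that went to `to`.
double edge_probability(const EdgeCounts& counts, std::uint64_t from, std::uint64_t to) {
    std::uint64_t total = 0;
    std::uint64_t hits = 0;
    // Keys sort by (from, to), so every edge leaving `from` is contiguous.
    for (auto it = counts.lower_bound(std::make_pair(from, std::uint64_t{0}));
         it != counts.end() && it->first.first == from; ++it) {
        total += it->second;
        if (it->first.second == to) hits += it->second;
    }
    assert(hits <= total);
    return total == 0 ? 0.0 : static_cast<double>(hits) / static_cast<double>(total);
}
```

For example, if three of the four observed taken branches out of address 0x10 landed at 0x40, `edge_probability` returns 0.75 for that edge. Because LBR records only taken branches, a real converter would also have to infer the straight-line code executed between consecutive records; this sketch ignores that step.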
Because SPGO profiles release bits rather than instrumented builds, it gives you much more flexibility in where and how you collect data. You can gather runtime profiles from production servers, developer workstations, performance labs, or any combination of these. How much of SPGO’s potential performance gain you realize depends on the quality, completeness, and consistency of the input data you provide.
The SPGO Workflow
The SPGO workflow is an iterative cycle of building, collecting, converting, and rebuilding. For those familiar with the legacy PGO approach, the key difference is that there is no dedicated instrumentation step: everything is done with fast release binaries.
Step 0: Environment Setup
Collecting data for SPGO has some one-time prerequisites. For full details on updating xperf’s perfcore.ini, along with a complete walk-through, see Tutorial: Use Sample Profile-Guided Optimization (SPGO) to improve C++ performance, on Microsoft Learn.
Step 1: Build with Link-Time Code Generation (LTCG) and SPGO Switch
Compile your application with LTCG and add the /spgo linker switch. During this build, the compiler produces a Sample Profile Database (SPD) file (one per binary) containing the static structure of your code: control flow graphs, block layouts, and inline expansion information. Save the SPD files alongside your build output. For example:
cl /EHsc /GL /O2 /Zi app.cpp /link /debug /spgo
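Conceptually, static layout information recorded at build time is what allows raw sample addresses to be turned back into per-block counts. The sketch below illustrates that idea with a sorted address map. `BlockInfo`, `BlockMap`, and `count_samples` are hypothetical names, the block ranges are made up, and this is neither the SPD file format nor the actual SPGO conversion step. It assumes block ranges do not overlap and that sample addresses have already been made relative to the module's load address.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical static description of one basic block: addresses [start, end).
// This is NOT the SPD file format; it only mimics the kind of layout
// information the SPD is described as holding.
struct BlockInfo {
    std::uint64_t start;
    std::uint64_t end;  // one past the block's last byte
    std::string name;
};

// Sorted index from block start address to block, assuming no overlaps.
class BlockMap {
public:
    explicit BlockMap(const std::vector<BlockInfo>& blocks) {
        for (const BlockInfo& b : blocks) {
            assert(b.start < b.end);
            by_start_.emplace(b.start, b);
        }
    }

    // Name of the block containing `ip`, or nullopt if `ip` is in no block.
    std::optional<std::string> find(std::uint64_t ip) const {
        auto it = by_start_.upper_bound(ip);  // first block starting after ip
        if (it == by_start_.begin()) return std::nullopt;
        --it;                                 // last block starting at or before ip
        if (ip >= it->second.end) return std::nullopt;  // ip falls in a gap
        return it->second.name;
    }

private:
    std::map<std::uint64_t, BlockInfo> by_start_;
};

// Attribute each sampled instruction pointer to its block and count the hits.
// Assumes `ips` are already relative to the module's load address.
std::map<std::string, std::uint64_t> count_samples(const BlockMap& blocks,
                                                   const std::vector<std::uint64_t>& ips) {
    std::map<std::string, std::uint64_t> hits;
    for (std::uint64_t ip : ips) {
        if (auto name = blocks.find(ip)) ++hits[*name];
    }
    return hits;
}
```

With blocks entry [0x1000, 0x1010), loop [0x1010, 0x1030), and cold [0x1040, 0x1050), samples at 0x1014, 0x1020, and 0x1028 all count toward loop, while a sample at 0x1035 lands in the gap and is ignored. A sorted map keeps each lookup logarithmic in the number of blocks.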
Step 2: Collect Hardware Samples
Run your application under representative workloads with hardware sampling enabled. xperf, part of the Windows Performance Toolkit, is commonly used for collection. There are two collection modes:
1. IP (Instruction Pointer) sampling – periodic snapshots of where the CPU is executing. This mode is effective on all platforms. For example:
xperf -on LOADER+PROC_THREAD+PROFILE -MinBuffers 4096 -BufferSize 4096 -setProfInt Timer 1221 -stackwalk profile
2. LBR (Last Branch Records) – records of recently taken branches,...