Performance
cmakefmt is fast enough that you never have to think twice about running it
— in local workflows, editor integrations, pre-commit hooks, or CI. That is
not an accident. Speed is a design goal, not a side effect.
Highlights
- 104× geometric-mean speedup on per-file benchmarks
- 95× geometric-mean speedup on whole-repository benchmarks (serial)
- 150× geometric-mean speedup over cmake-format on whole-repository benchmarks (parallel)
- 2,853× fastest whole-repo speedup (OpenCV, 282 files, parallel)
- Formats the 55,000-line gRPC root CMakeLists.txt in ~39ms (cmake-format: 84 s)
Benchmark Environment
Current headline measurements were captured on:
- macOS 26.3.1
- aarch64-apple-darwin
- 10 logical CPUs
- rustc 1.94.1
- hyperfine 1.20.0
- cmake-format 0.6.13
Exact numbers vary by machine. What matters across releases is that relative performance trends remain strong and regressions are caught early.
Benchmark Approach
There are two phases, each using the most natural command for its scenario.
Per-file phase
Single-file timings are measured against a corpus of 11 real-world
CMakeLists.txt files sourced from well-known open-source projects, ranging
from a 2-line stub to the 55,000-line gRPC root. Each fixture is run on its
own (one file, one process) so that single-file formatting performance is
measured without batch overhead.
Both cmakefmt and cmake-format are timed in the same hyperfine invocation
so they share the same system conditions:
```sh
hyperfine --warmup 10 --runs 50 \
  "cmakefmt path/to/CMakeLists.txt" \
  "cmake-format path/to/CMakeLists.txt"
```
This is the default “format this file” operation: parse, format, emit to
stdout. No --check is used because the goal is to compare the most natural
single-file invocation, and both tools do equivalent work (parse + format +
serialize). Adding --check would slightly favour cmakefmt because
--check (without --diff) skips diff computation, so the per-file numbers
here are the conservative measurement.
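To turn the two timings into a per-fixture speedup, one option is to add hyperfine's `--export-json` flag to the invocation above and divide the mean times. The sketch below assumes the export's usual layout (a top-level `results` list whose entries carry `command` and a `mean` in seconds). The helper name `speedup_from_hyperfine` and the `SAMPLE_EXPORT` values are illustrative: the means are taken from the gRPC highlight (~39 ms vs 84 s), not from a real export, and none of this is part of cmakefmt's own benchmark tooling.

```python
import json

# Illustrative export: the two means (in seconds) mirror the gRPC highlight
# (~39 ms vs 84 s). They are not the output of a real hyperfine run.
SAMPLE_EXPORT = json.loads("""
{
  "results": [
    {"command": "cmakefmt path/to/CMakeLists.txt", "mean": 0.039},
    {"command": "cmake-format path/to/CMakeLists.txt", "mean": 84.0}
  ]
}
""")


def speedup_from_hyperfine(export, fast_tool="cmakefmt", slow_tool="cmake-format"):
    """Return mean(slow_tool) / mean(fast_tool) from a parsed hyperfine JSON export."""
    # Key each result by its executable name, i.e. the first word of the command.
    mean_by_tool = {r["command"].split()[0]: r["mean"] for r in export["results"]}
    return mean_by_tool[slow_tool] / mean_by_tool[fast_tool]


speedup = speedup_from_hyperfine(SAMPLE_EXPORT)
print(f"{speedup:.0f}x")
```

With a real run, the dictionary would instead come from `json.load` on the file written by `--export-json`.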
Whole-repo phase
Whole-repository timings format every CMake file in the project. Emitting
thousands of formatted files to stdout would mostly measure the terminal, not
the formatter, so this phase uses --check for both tools — which is also the
realistic CI usage pattern:
```sh
hyperfine --warmup 3 --runs 10 \
  --command-name "cmakefmt-parallel" "cmakefmt --check path/to/repo" \
  --command-name "cmakefmt-serial" "cmakefmt --check --parallel 1 path/to/repo" \
  --command-name "cmake-format" "xargs cmake-format --check < repo-files.txt"
```
cmake-format has no native multi-file discovery, so the file list is
pre-computed via cmakefmt --list-input-files and piped in via xargs. Both
tools therefore see the same set of files.
Common methodology
- Warmup runs: allow the OS page cache and branch predictor to reach a steady state before measurements are recorded
- Timed runs: give hyperfine enough samples for a stable mean and tight confidence interval (50 for per-file, 10 for the much slower whole-repo runs)
- Geometric mean: used instead of the arithmetic mean because speedups are ratios; it is far less sensitive to a few extreme fixtures and gives equal weight to each fixture regardless of absolute runtime
The fixture corpus and its pinned commit SHAs are stored in
tests/fixtures/real_world/manifest.toml. To reproduce the measurements
locally, fetch the corpus and run hyperfine against each file using the commands
in How To Reproduce below.
Benchmark Results
Time comparison across real-world fixtures. Hover a bar to see the exact timings and speedup for that fixture.
Per-fixture speedup. The dashed line marks the geometric mean (104×). A few
fixtures (opencv_root, grpc_root) show very large speedups — these reflect
pathological behaviour in cmake-format on those specific files rather than
cherry-picking, and are kept un-capped for transparency.
Whole-Repository Benchmarks
The per-file numbers above are dominated by process startup (~10ms), which is a fixed cost paid once per invocation. Whole-repository benchmarks show the real-world speedup when formatting entire projects:
| Repository | Files | cmakefmt (parallel) | cmakefmt (serial) | cmake-format | Speedup (parallel) |
|---|---|---|---|---|---|
| googletest | 4 | 10ms | 10ms | 321ms | 31.1× |
| spdlog | 7 | 10ms | 10ms | 376ms | 38.5× |
| fmt | 11 | 10ms | 15ms | 849ms | 82.3× |
| catch2 | 20 | 12ms | 16ms | 3,320ms | 280.8× |
| protobuf | 25 | 10ms | 12ms | 1,182ms | 118.7× |
| nlohmann_json | 41 | 13ms | 19ms | 4,526ms | 359.6× |
| grpc | 46 | 44ms | 53ms | 86,855ms | 1,968.6× |
| bullet3 | 58 | 12ms | 15ms | 557ms | 47.1× |
| vulkan_hpp | 135 | 20ms | 36ms | 1,244ms | 63.5× |
| opencv | 282 | 35ms | 101ms | 100,711ms | 2,853.0× |
| blender | 419 | 46ms | 133ms | 42,950ms | 939.0× |
| oomph-lib | 611 | 59ms | 144ms | 12,879ms | 218.2× |
| llvm | 2,621 | 102ms | 117ms | 4,811ms | 47.2× |
| cmake | 11,264 | 265ms | 636ms | 3,908ms | 14.7× |
Geometric-mean speedup: 150× (parallel) / 95× (serial).
Aggregate speedup across the full corpus: 408× (264 s of cmake-format
collapses to 0.65 s of cmakefmt --parallel).
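The two summary figures answer different questions, and both can be recomputed from the table. The sketch below copies the table rows into a list (the names `ROWS` and `geometric_mean` are illustrative) and derives the parallel geometric mean from the listed speedup column, the serial geometric mean from the millisecond columns, and the aggregate speedup from the column totals. Because the table shows rounded milliseconds, the recomputed values should be read as approximate.

```python
import math

# (repository, cmakefmt parallel ms, cmakefmt serial ms, cmake-format ms,
#  listed parallel speedup), copied from the table above.
ROWS = [
    ("googletest", 10, 10, 321, 31.1),
    ("spdlog", 10, 10, 376, 38.5),
    ("fmt", 10, 15, 849, 82.3),
    ("catch2", 12, 16, 3320, 280.8),
    ("protobuf", 10, 12, 1182, 118.7),
    ("nlohmann_json", 13, 19, 4526, 359.6),
    ("grpc", 44, 53, 86855, 1968.6),
    ("bullet3", 12, 15, 557, 47.1),
    ("vulkan_hpp", 20, 36, 1244, 63.5),
    ("opencv", 35, 101, 100711, 2853.0),
    ("blender", 46, 133, 42950, 939.0),
    ("oomph-lib", 59, 144, 12879, 218.2),
    ("llvm", 102, 117, 4811, 47.2),
    ("cmake", 265, 636, 3908, 14.7),
]


def geometric_mean(values):
    """exp of the mean of logs: every ratio counts equally, whatever its size."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Per-repository ratios, averaged geometrically.
parallel_geomean = geometric_mean(row[4] for row in ROWS)
serial_geomean = geometric_mean(row[3] / row[2] for row in ROWS)

# Total cmake-format time divided by total cmakefmt (parallel) time.
aggregate_speedup = sum(row[3] for row in ROWS) / sum(row[1] for row in ROWS)

print(f"parallel geomean:  {parallel_geomean:.0f}x")
print(f"serial geomean:    {serial_geomean:.0f}x")
print(f"aggregate speedup: {aggregate_speedup:.0f}x")
```

The aggregate figure is larger than the geometric mean because the totals are dominated by the repositories where cmake-format is slowest (grpc and opencv), whereas the geometric mean treats all 14 repositories equally.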
Component Microbenchmarks
The hyperfine numbers above measure end-to-end CLI invocations, including process startup, I/O, and discovery. The Criterion estimates below isolate individual pipeline stages on a 1,000+ line synthetic stress-test file (in-process, no I/O):
| Metric | Estimate | 95% CI |
|---|---|---|
| Parser-only | 4.8075 ms | 4.7833–4.8471 ms |
| Formatter-only (from parsed AST) | 2.9462 ms | 2.9312–2.9665 ms |
| End-to-end format_source | 7.8537 ms | 7.7971–7.9319 ms |
| Debug/barrier-heavy formatting | 1.6846 ms | 1.6805–1.6887 ms |
All Criterion estimates show a point estimate with a 95% confidence interval — the range within which the true mean is expected to fall 95% of the time. “AST” (Abstract Syntax Tree) is the structured in-memory representation produced by parsing, before formatting.
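As a quick consistency check on the table, the parser-only and formatter-only estimates can be added and compared with the end-to-end estimate. The sketch below uses only the point estimates from the table; the variable names are illustrative. Each stage was benchmarked separately, so the small remainder may be glue overhead or measurement noise, and the source does not say which.

```python
# Point estimates (milliseconds) copied from the Criterion table above.
PARSER_MS = 4.8075
FORMATTER_MS = 2.9462
END_TO_END_MS = 7.8537

stage_sum = PARSER_MS + FORMATTER_MS    # parse + format, measured separately
residual = END_TO_END_MS - stage_sum    # time not accounted for by either stage
parser_share = PARSER_MS / END_TO_END_MS
formatter_share = FORMATTER_MS / END_TO_END_MS

print(f"stage sum:       {stage_sum:.4f} ms")
print(f"residual:        {residual:.4f} ms")
print(f"parser share:    {parser_share:.1%}")
print(f"formatter share: {formatter_share:.1%}")
```

On these estimates parsing accounts for roughly three fifths of the in-process time, which suggests the parser is the more rewarding stage to optimise on this particular stress-test file.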
Parallel Batch Throughput
Parallelism scales well across worker counts. The chart shows two real-world repositories:
- opencv (282 files): a large vendor-style CMake codebase
- oomph-lib (611 files): a larger real-world CMake repository used to measure scaling behavior at a more realistic project size
Hover a point to see the time and speedup vs serial for each repository.
Peak RSS (Resident Set Size — the RAM physically held in memory by the process)
rises from 14.9 MB serial to 19.4 MB at --parallel 8 on opencv, and from
13.6 MB to 16.8 MB on oomph-lib.
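To put the time and memory figures side by side, the sketch below combines the serial and parallel timings from the whole-repository table with the peak RSS values quoted above. The dictionary layout and key names are illustrative. One caveat: the table's parallel column uses the default worker count, which this page does not state, while the RSS figures are for `--parallel 8`, so the two ratios are not measured at exactly the same setting.

```python
# Timings (ms) from the whole-repository table; peak RSS (MB) from the text above.
REPOS = {
    "opencv": {"serial_ms": 101, "parallel_ms": 35,
               "rss_serial_mb": 14.9, "rss_parallel8_mb": 19.4},
    "oomph-lib": {"serial_ms": 144, "parallel_ms": 59,
                  "rss_serial_mb": 13.6, "rss_parallel8_mb": 16.8},
}

summary = {}
for name, m in REPOS.items():
    summary[name] = {
        # How many times faster the parallel run is than the serial run.
        "parallel_speedup": m["serial_ms"] / m["parallel_ms"],
        # Relative increase in peak resident memory at --parallel 8.
        "rss_growth": m["rss_parallel8_mb"] / m["rss_serial_mb"] - 1.0,
    }

for name, s in summary.items():
    print(f"{name}: {s['parallel_speedup']:.1f}x faster, "
          f"{s['rss_growth']:.0%} more peak RSS")
```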
Version-by-Version Trend
Single-file performance across every release, measured on the same
fixture (mariadb_server/CMakeLists.txt, 656 lines) using each
version’s official macOS aarch64 binary from GitHub Releases.
The solid line shows formatting time (left axis); the dashed line shows binary size (right axis). Hover a point to see the details and what changed in that release.
The gradual increase from ~10ms to ~15ms across early releases is entirely
process startup overhead — the formatting engine itself remains sub-millisecond.
As the binary grew from 4.7MB to 6.0MB with new features (LSP, shell
completions, watch mode, JSON Schema, parse tree dump), the OS needed to load
more pages from disk at startup. The v1.1.0 datapoint shows the binary
brought back down to 4.8MB and wall time back under 8ms on the same 656-line
fixture via link-time optimization (lto = true, codegen-units = 1), a
hand-written recursive-descent parser replacing pest, and removal of an
unused dependency. This fixed cost is paid once per invocation and is
irrelevant for multi-file runs.
The v1.2.0 datapoint moves down to ~6.6ms — about 0.2ms faster than
v1.1.0 on the same fixture under matched-methodology hyperfine
measurements (--shell=none --style basic, 100 warmups, 200 runs).
The embedded built-in command spec moved from a TOML file parsed at
every process startup to a MessagePack blob produced once at build
time and decoded at startup with rmp-serde, which is roughly 20×
faster than parsing structured text. The YAML source is still
maintained in src/spec/builtins.yaml for human editing; build.rs
re-emits it as MessagePack into OUT_DIR and the binary
include_bytes!-bundles the blob.
(The v1.1.0 datapoint was re-measured against the same fixture under
this methodology and revised from a previously-reported 7.1ms to
6.8ms; the older number was inflated by hyperfine’s progress-bar
rendering, which we now disable via --style basic.)
What The Numbers Mean In Practice
The headline numbers matter not as abstract benchmarks, but because they change what feels viable:
- repository-wide --check in CI: comfortable
- pre-commit hooks on staged files: instant
- repeated local formatting during development: no delay you will notice
- editor-triggered format-on-save: faster than the save dialog
How To Reproduce
Run the formatter benchmark suite:
```sh
cargo bench --bench formatter
```
Save a baseline before a risky change:
```sh
cargo bench --bench formatter -- --save-baseline before-change
```
Compare a later run against that baseline:
```sh
cargo bench --bench formatter -- --baseline before-change
```
Benchmark Governance
Performance regressions are tracked automatically:
- Every push to main and every PR runs the benchmark suite via CodSpeed in simulation mode.
- Weekly scheduled runs (Monday 06:00 UTC) provide a consistent baseline independent of PR activity.
- CodSpeed flags regressions > 10% with a warning on the PR check. Regressions that appear only across different runner environments are noise — check the “different runtime environments” warning before investigating.
Regression review policy:
- If CodSpeed flags a regression on a PR, investigate before merging.
- If the regression is real (same runner, same code path), either fix it or document why the trade-off is acceptable in the PR description.
- If the regression appears only on a scheduled run (no code change), it is likely runner variance — note it and move on.
Baselines:
CodSpeed tracks baselines automatically per branch. For local comparison, use Criterion baselines as described above.
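For a local comparison, the 10% threshold mentioned above can be applied by hand to two mean timings, for example a saved Criterion baseline and a later run. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, the "current" timings are made-up inputs, and it is not how CodSpeed or Criterion decide whether a change is significant.

```python
def relative_change(baseline_ms: float, current_ms: float) -> float:
    """Fractional change versus the baseline; positive means slower."""
    return (current_ms - baseline_ms) / baseline_ms


def is_regression(baseline_ms: float, current_ms: float, threshold: float = 0.10) -> bool:
    """True when the current run is slower than the baseline by more than `threshold`."""
    return relative_change(baseline_ms, current_ms) > threshold


# Baseline: the end-to-end format_source estimate from the Criterion table.
# The "current" values are hypothetical inputs chosen to fall on either side
# of the threshold.
BASELINE_MS = 7.8537
for current_ms in (8.0, 8.7):
    change = relative_change(BASELINE_MS, current_ms)
    verdict = "investigate" if is_regression(BASELINE_MS, current_ms) else "ok"
    print(f"{current_ms:.1f} ms: {change:+.1%} ({verdict})")
```

A real check would also take the confidence intervals into account, since a difference smaller than the run-to-run noise is not evidence of a regression.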
Docs track main. For historical docs, check out a release tag in
the repository and build
docs/ locally.