Performance
cmakefmt is fast enough that you never have to think twice about running it
— in local workflows, editor integrations, pre-commit hooks, or CI. That is
not an accident. Speed is a design goal, not a side effect.
Current Benchmark Signal
The headline numbers from the current local benchmark set:
| Metric | Current local signal |
|---|---|
| Geometric-mean speedup vs cmake-format | 20.69x |
| Parser-only, large synthetic input (1000+ lines) | estimate 7.1067 ms (95% CI 7.0793–7.1359 ms) |
| Formatter-only from parsed AST, large synthetic input | estimate 1.7545 ms (95% CI 1.7425–1.7739 ms) |
| End-to-end format_source, large synthetic input | estimate 8.8248 ms (95% CI 8.8018–8.8519 ms) |
| Debug/barrier-heavy formatting | estimate 313.98 µs (95% CI 311.89–317.54 µs) |
All Criterion estimates show a point estimate with a 95% confidence interval —
the range within which the true mean is expected to fall 95% of the time.
“Large synthetic input” refers to a 1000+ line stress-test CMakeLists.txt
generated for benchmarking purposes. “AST” (Abstract Syntax Tree) is the
structured in-memory representation produced by parsing, before formatting.
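As a rough cross-check of the table above, the parser-only and formatter-only estimates can be added and compared with the end-to-end figure. The sketch below uses only the point estimates quoted in the table; the variable names are illustrative, and because the three benchmarks are measured separately the sum is not expected to match exactly.

```python
# Point estimates copied from the table above, in milliseconds.
parser_ms = 7.1067
formatter_ms = 1.7545
end_to_end_ms = 8.8248

# The stages are benchmarked separately, so their sum only approximates
# the end-to-end measurement.
stage_sum_ms = parser_ms + formatter_ms
parser_share = parser_ms / stage_sum_ms
gap_ms = stage_sum_ms - end_to_end_ms

print(f"parser + formatter: {stage_sum_ms:.4f} ms")
print(f"parser share of summed stages: {parser_share:.1%}")
print(f"gap vs end-to-end: {gap_ms:+.4f} ms")
```

On these numbers parsing accounts for roughly 80% of the summed stage time, so parser changes are the ones most likely to move the end-to-end figure.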
Real-World Comparison
The current local corpus comparison measured cmakefmt against cmake-format
on real CMakeLists.txt files drawn from projects including:
- Abseil
- Catch2
- CLI11
- GoogleTest
- ggml
- llama.cpp
- MariaDB Server
- LLVM
- Qt
- nlohmann/json
- protobuf
- spdlog
Fetch the pinned local corpus before rerunning those comparisons:
python3 scripts/fetch-real-world-corpus.py
Results across that corpus:
- cmakefmt was faster on every single fixture
- speedup ranged from 10.91x to 48.49x
- geometric-mean speedup: 20.69x
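The geometric mean is the usual way to average ratios such as per-file speedups: it is the n-th root of their product, so a single very large ratio does not dominate the result. A minimal sketch follows. The timings are made up for illustration and are not measurements from the corpus, and `geometric_mean_speedup` is a hypothetical helper name, not part of cmakefmt.

```python
from statistics import geometric_mean

def geometric_mean_speedup(baseline_s, candidate_s):
    """Geometric mean of per-file speedups (baseline time / candidate time)."""
    if len(baseline_s) != len(candidate_s) or not baseline_s:
        raise ValueError("need two equal-length, non-empty timing lists")
    ratios = [b / c for b, c in zip(baseline_s, candidate_s)]
    return geometric_mean(ratios)

# Illustrative timings in seconds (NOT real corpus data),
# chosen so the per-file speedups are 10x, 20x and 40x.
baseline = [1.0, 2.0, 4.0]
candidate = [0.1, 0.1, 0.1]

result = geometric_mean_speedup(baseline, candidate)
print(f"geometric-mean speedup: {result:.2f}x")
```

For these illustrative values the geometric mean is 20x, while the arithmetic mean of the same ratios would be about 23.3x.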
Parallel Batch Throughput
Multi-file runs are single-threaded by default, but opt-in parallelism scales well:
| Mode | Time |
|---|---|
| serial | 184.5 ms ± 1.3 ms |
| --parallel 2 | 111.5 ms ± 11.9 ms |
| --parallel 4 | 64.7 ms ± 1.1 ms |
| --parallel 8 | 48.5 ms ± 1.5 ms |
Peak RSS (Resident Set Size: the physical RAM held by the process)
rises from 13.2 MB (serial) to 20.7 MB (--parallel 8) on this batch. That
extra memory is why the tool defaults to single-threaded execution unless you
explicitly ask for more.
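To read the table as scaling behaviour, it helps to turn the mean times into speedup (serial time divided by parallel time) and per-worker efficiency (speedup divided by worker count). The sketch below does that with the means quoted above. It treats the serial run as one worker, which is an assumption, and `scaling` is a hypothetical helper name.

```python
# Mean wall times from the table above, in milliseconds, keyed by worker count.
# The serial run is treated as 1 worker (an assumption).
mean_ms = {1: 184.5, 2: 111.5, 4: 64.7, 8: 48.5}

def scaling(times_ms):
    """Return {workers: (speedup, efficiency)} relative to the 1-worker run."""
    serial = times_ms[1]
    return {
        workers: (serial / t, serial / t / workers)
        for workers, t in sorted(times_ms.items())
    }

for workers, (speedup, efficiency) in scaling(mean_ms).items():
    print(f"{workers} worker(s): {speedup:.2f}x speedup, {efficiency:.0%} efficiency")

# Peak RSS figures from the paragraph above, in MB.
rss_growth = 20.7 / 13.2
print(f"peak RSS growth at 8 workers: {rss_growth:.2f}x")
```

Speedup keeps growing up to 8 workers, but efficiency per worker drops from about 83% at 2 workers to about 48% at 8, while peak RSS grows by roughly 1.57x.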
Large Repository Parallelism Survey
Phase 12 validation was also run against oomph-lib (local checkout with 612
discovered CMake files):
| Mode | Time |
|---|---|
| serial | 412.5 ms ± 9.0 ms |
| --parallel 2 | 296.0 ms ± 3.5 ms |
| --parallel 4 | 191.8 ms ± 4.7 ms |
| --parallel 8 | 152.5 ms ± 2.8 ms |
That corresponds to a 2.71x speedup at --parallel 8 versus serial, with
peak RSS moving from 11.3 MB to 17.0 MB.
For a direct tool baseline on the same full oomph-lib tree (612 discovered
files), /usr/bin/time -l measured:
- cmake-format (sequential over discovered files): 45.69 s real
- cmakefmt serial: 0.47 s real (~97x faster)
- cmakefmt --parallel 8: 0.19 s real (~240x faster)
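Dividing each wall-clock total by the 612 discovered files gives an amortized cost per file, which is often easier to relate to a pre-commit or editor workflow. The sketch below uses only the figures listed above. The results are averages over the whole tree, not per-file measurements, and `per_file_ms` is a hypothetical helper name.

```python
FILES = 612  # discovered CMake files in the oomph-lib checkout

# Wall-clock ("real") seconds from the list above.
real_s = {
    "cmake-format (sequential)": 45.69,
    "cmakefmt serial": 0.47,
    "cmakefmt --parallel 8": 0.19,
}

def per_file_ms(total_s, files=FILES):
    """Average wall time per file in milliseconds."""
    return total_s / files * 1000.0

for label, total in real_s.items():
    print(f"{label}: {per_file_ms(total):.2f} ms per file")
```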
What The Numbers Mean In Practice
The headline numbers matter not as abstract benchmarks, but because they change what feels viable:
- repository-wide --check in CI: comfortable
- pre-commit hooks on staged files: instant
- repeated local formatting during development: no delay you will notice
- editor-triggered format-on-save: faster than the save dialog
Benchmark Environment
Current headline measurements were captured on:
- macOS 26.3.1
- aarch64-apple-darwin
- 10 logical CPUs
- rustc 1.94.1
- hyperfine 1.20.0
- cmake-format 0.6.13
Exact numbers vary by machine. What matters release to release is that relative trends stay strong and regressions are caught quickly.
How To Reproduce
Run the formatter benchmark suite:
cargo bench --bench formatter
Save a baseline before a risky change:
cargo bench --bench formatter -- --save-baseline before-change
Compare a later run against that baseline:
cargo bench --bench formatter -- --baseline before-change
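Criterion prints the baseline comparison itself, but a scripted check can be handy in CI. The sketch below assumes Criterion's usual on-disk layout, where each benchmark writes an `estimates.json` (typically somewhere under `target/criterion/`) whose `mean.point_estimate` is in nanoseconds. That layout, the helper names, the 5% threshold, and the "after" value are all assumptions for illustration, not part of this project.

```python
import json

def mean_ns(estimates_json):
    """Extract the mean point estimate (nanoseconds) from estimates.json text."""
    return json.loads(estimates_json)["mean"]["point_estimate"]

def relative_change(before_ns, after_ns):
    """Fractional change in mean time; positive means slower."""
    return (after_ns - before_ns) / before_ns

def is_regression(before_ns, after_ns, threshold=0.05):
    """Flag a slowdown larger than the (hypothetical) threshold."""
    return relative_change(before_ns, after_ns) > threshold

# Inline stand-ins for two estimates.json files. "before" mirrors the
# end-to-end estimate quoted earlier (8.8248 ms); "after" is made up.
before = '{"mean": {"point_estimate": 8824800.0}}'
after = '{"mean": {"point_estimate": 9500000.0}}'

change = relative_change(mean_ns(before), mean_ns(after))
flagged = is_regression(mean_ns(before), mean_ns(after))
print(f"mean changed by {change:+.1%}; regression: {flagged}")
```

A fixed threshold is the simplest possible rule. Comparing the confidence intervals of the two runs would be a stricter alternative on a noisy machine.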