Performance

cmakefmt is fast enough that you never have to think twice about running it — in local workflows, editor integrations, pre-commit hooks, or CI. That is not an accident. Speed is a design goal, not a side effect.

Highlights

104× geometric-mean speedup over cmake-format on per-file benchmarks
95× geometric-mean speedup over cmake-format on whole-repository benchmarks (serial)
150× geometric-mean speedup over cmake-format on whole-repository benchmarks (parallel)
2,853× largest whole-repo speedup (OpenCV, 282 files, parallel)
Formats the 55,000-line gRPC root CMakeLists.txt in ~39ms (cmake-format: 84 s)

Benchmark Environment

Current headline measurements were captured on:

macOS 26.3.1
aarch64-apple-darwin
10 logical CPUs
rustc 1.94.1
hyperfine 1.20.0
cmake-format 0.6.13

Exact numbers vary by machine. What matters across releases is that relative performance trends remain strong and regressions are caught early.

Benchmark Approach

There are two phases, each using the most natural command for its scenario.

Per-file phase

Single-file timings are measured against a corpus of 11 real-world CMakeLists.txt files sourced from well-known open-source projects, ranging from a 2-line stub to the 55,000-line gRPC root. Each fixture is run on its own (one file, one process) to separate single-file formatting performance from batch overhead.

Both cmakefmt and cmake-format are timed in the same hyperfine invocation so they share the same system conditions:

hyperfine --warmup 10 --runs 50 \
  "cmakefmt path/to/CMakeLists.txt" \
  "cmake-format path/to/CMakeLists.txt"

This is the default “format this file” operation: parse, format, emit to stdout. No --check is used because the goal is to compare the most natural single-file invocation, and both tools do equivalent work (parse + format + serialize). Adding --check would slightly favour cmakefmt because --check (without --diff) skips diff computation, so the per-file numbers here are the conservative measurement.
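To turn a run like this into a speedup figure, one option is to add hyperfine's --export-json flag and divide the two mean times. The Python sketch below assumes that export shape (a results array whose entries carry command and mean, in seconds). The sample values are illustrative, rounded from the gRPC root figures in the highlights, and the helper names are hypothetical rather than part of the project's tooling.

```python
import json

# Illustrative export in the shape hyperfine writes with --export-json
# (times are in seconds). The two means are rounded from the gRPC root
# figures quoted in the highlights; they are not raw benchmark output.
SAMPLE_EXPORT = """
{
  "results": [
    {"command": "cmakefmt path/to/CMakeLists.txt", "mean": 0.039},
    {"command": "cmake-format path/to/CMakeLists.txt", "mean": 84.0}
  ]
}
"""


def mean_by_tool(export: dict) -> dict:
    """Map the first word of each benchmarked command to its mean time in seconds."""
    return {r["command"].split()[0]: r["mean"] for r in export["results"]}


def speedup(export: dict, fast: str = "cmakefmt", slow: str = "cmake-format") -> float:
    """Return how many times faster `fast` is than `slow`, based on mean wall time."""
    means = mean_by_tool(export)
    return means[slow] / means[fast]


if __name__ == "__main__":
    print(f"{speedup(json.loads(SAMPLE_EXPORT)):.0f}x")
```

In a real run, the string literal would be replaced by the contents of the file passed to --export-json.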

Whole-repo phase

Whole-repository timings format every CMake file in the project. Emitting thousands of formatted files to stdout would mostly measure the terminal, not the formatter, so this phase uses --check for both tools — which is also the realistic CI usage pattern:

hyperfine --warmup 3 --runs 10 \
  --command-name "cmakefmt-parallel" "cmakefmt --check path/to/repo" \
  --command-name "cmakefmt-serial"   "cmakefmt --check --parallel 1 path/to/repo" \
  --command-name "cmake-format"      "xargs cmake-format --check < repo-files.txt"

cmake-format has no native multi-file discovery, so the file list is pre-computed via cmakefmt --list-input-files and piped in via xargs. Both tools therefore see the same set of files.

Common methodology

Warmup runs: allow the OS page cache and branch predictor to reach a steady state before measurements are recorded
Timed runs: give hyperfine enough samples for a stable mean and tight confidence interval (50 for per-file, 10 for the much slower whole-repo runs)
Geometric mean: used instead of the arithmetic mean because speedups are ratios; it is less sensitive to a few extreme fixtures and gives equal weight to each fixture regardless of absolute runtime
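As a rough illustration of why the choice of mean matters, the Python sketch below compares the two on three hypothetical speedup ratios (these are not measurements from the corpus).

```python
from statistics import fmean, geometric_mean

# Hypothetical per-fixture speedups: two ordinary fixtures and one extreme one.
speedups = [10.0, 100.0, 1000.0]

arithmetic = fmean(speedups)          # sum of the ratios divided by their count
geometric = geometric_mean(speedups)  # nth root of the product of the ratios

print(f"arithmetic mean: {arithmetic:.1f}x")
print(f"geometric mean:  {geometric:.1f}x")
```

With these inputs the arithmetic mean is 370× and the geometric mean is 100×: the single extreme fixture pulls the arithmetic mean far more than the geometric one.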

The fixture corpus and its pinned commit SHAs are stored in tests/fixtures/real_world/manifest.toml. To reproduce the hyperfine measurements locally, fetch the corpus and run the hyperfine commands shown earlier in this section against each file; the Criterion benchmark commands are listed in How To Reproduce below.

Benchmark Results

Time comparison across real-world fixtures. Hover a bar to see the exact timings and speedup for that fixture.


Per-fixture speedup. The dashed line marks the geometric mean (104×). A few fixtures (opencv_root, grpc_root) show very large speedups — these reflect pathological behaviour in cmake-format on those specific files rather than cherry-picking, and are kept un-capped for transparency.

Whole-Repository Benchmarks

The per-file numbers above are dominated by process startup (~10ms), which is a fixed cost paid once per invocation. Whole-repository benchmarks show the real-world speedup when formatting entire projects:

Repository	Files	cmakefmt (parallel)	cmakefmt (serial)	cmake-format	Speedup (parallel)
googletest	4	10ms	10ms	321ms	31.1×
spdlog	7	10ms	10ms	376ms	38.5×
fmt	11	10ms	15ms	849ms	82.3×
catch2	20	12ms	16ms	3,320ms	280.8×
protobuf	25	10ms	12ms	1,182ms	118.7×
nlohmann_json	41	13ms	19ms	4,526ms	359.6×
grpc	46	44ms	53ms	86,855ms	1,968.6×
bullet3	58	12ms	15ms	557ms	47.1×
vulkan_hpp	135	20ms	36ms	1,244ms	63.5×
opencv	282	35ms	101ms	100,711ms	2,853.0×
blender	419	46ms	133ms	42,950ms	939.0×
oomph-lib	611	59ms	144ms	12,879ms	218.2×
llvm	2,621	102ms	117ms	4,811ms	47.2×
cmake	11,264	265ms	636ms	3,908ms	14.7×

Geometric-mean speedup: 150× (parallel) / 95× (serial). Aggregate speedup across the full corpus: 408× (264 s of cmake-format collapses to 0.65 s of cmakefmt in its default parallel mode).
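The two summary figures answer different questions: the geometric mean averages the per-repository ratios, while the aggregate divides total time by total time. The Python sketch below shows one way to recompute both from the table. It uses the rounded table values, so results can differ slightly from figures computed on unrounded timings, and the function names are illustrative.

```python
from statistics import geometric_mean

# (repository, cmakefmt parallel ms, cmake-format ms, published parallel speedup)
ROWS = [
    ("googletest", 10, 321, 31.1),
    ("spdlog", 10, 376, 38.5),
    ("fmt", 10, 849, 82.3),
    ("catch2", 12, 3320, 280.8),
    ("protobuf", 10, 1182, 118.7),
    ("nlohmann_json", 13, 4526, 359.6),
    ("grpc", 44, 86855, 1968.6),
    ("bullet3", 12, 557, 47.1),
    ("vulkan_hpp", 20, 1244, 63.5),
    ("opencv", 35, 100711, 2853.0),
    ("blender", 46, 42950, 939.0),
    ("oomph-lib", 59, 12879, 218.2),
    ("llvm", 102, 4811, 47.2),
    ("cmake", 265, 3908, 14.7),
]


def aggregate_speedup(rows):
    """Total cmake-format time divided by total cmakefmt (parallel) time."""
    fast_total = sum(row[1] for row in rows)
    slow_total = sum(row[2] for row in rows)
    return slow_total / fast_total


def geomean_of_published(rows):
    """Geometric mean of the published per-repository speedups."""
    return geometric_mean([row[3] for row in rows])


if __name__ == "__main__":
    print(f"aggregate: {aggregate_speedup(ROWS):.0f}x")
    print(f"geometric mean: {geomean_of_published(ROWS):.0f}x")
```

The aggregate is dominated by the repositories where cmake-format spends the most absolute time (grpc, opencv, blender), which is why it sits well above the geometric mean.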

Component Microbenchmarks

The hyperfine numbers above measure end-to-end CLI invocations, including process startup, I/O, and discovery. The Criterion estimates below isolate individual pipeline stages on a 1,000+ line synthetic stress-test file (in-process, no I/O):

Metric	Estimate	95% CI
Parser-only	`4.8075 ms`	`4.7833–4.8471 ms`
Formatter-only (from parsed AST)	`2.9462 ms`	`2.9312–2.9665 ms`
End-to-end `format_source`	`7.8537 ms`	`7.7971–7.9319 ms`
Debug/barrier-heavy formatting	`1.6846 ms`	`1.6805–1.6887 ms`

All Criterion estimates show a point estimate with a 95% confidence interval — the range within which the true mean is expected to fall 95% of the time. “AST” (Abstract Syntax Tree) is the structured in-memory representation produced by parsing, before formatting.
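Criterion derives its intervals by bootstrap resampling. The Python sketch below is a simplified percentile bootstrap meant only to convey the idea; it is not Criterion's exact procedure, the sample timings are hypothetical, and the function name and defaults are assumptions made for illustration.

```python
import random
from statistics import fmean

# Hypothetical per-iteration timings in milliseconds (not real Criterion samples).
SAMPLES_MS = [4.79, 4.81, 4.80, 4.83, 4.78, 4.82, 4.85, 4.80, 4.79, 4.81]


def bootstrap_ci(samples, *, resamples=2000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(samples)
    # Resample with replacement, record each resample's mean, then sort the means.
    means = sorted(fmean(rng.choices(samples, k=n)) for _ in range(resamples))
    tail = (1.0 - confidence) / 2.0
    lower = means[round(tail * resamples)]
    upper = means[round((1.0 - tail) * resamples) - 1]
    return lower, upper


lower, upper = bootstrap_ci(SAMPLES_MS)
print(f"mean {fmean(SAMPLES_MS):.4f} ms, 95% CI {lower:.4f} to {upper:.4f} ms")
```

A narrow interval, as in the table above, indicates that repeated measurements agree closely with one another.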

Parallel Batch Throughput

Parallelism scales well across worker counts. The chart shows two real-world repositories:

opencv (282 files): a large vendor-style CMake codebase
oomph-lib (611 files): a larger real-world CMake repository, used to measure scaling behavior at a more realistic project size

Hover a point to see the time and speedup vs serial for each repository.

Peak RSS (Resident Set Size — the RAM physically held in memory by the process) rises from 14.9 MB serial to 19.4 MB at --parallel 8 on opencv, and from 13.6 MB to 16.8 MB on oomph-lib.
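For a rough sense of the trade-off, the Python sketch below relates the time saved to the memory added for the two repositories. It combines the serial and default-parallel timings from the whole-repository table with the peak RSS figures quoted above, which were taken at --parallel 8, so the two measurements may not correspond to the same worker count. The helper names are illustrative.

```python
def speedup_vs_serial(serial_ms: float, parallel_ms: float) -> float:
    """How many times faster the parallel run is than the serial one."""
    return serial_ms / parallel_ms


def growth_percent(before: float, after: float) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Timings (ms) come from the whole-repository table; peak RSS (MB) from the text above.
REPOS = {
    "opencv": {"serial_ms": 101, "parallel_ms": 35, "rss_serial": 14.9, "rss_parallel": 19.4},
    "oomph-lib": {"serial_ms": 144, "parallel_ms": 59, "rss_serial": 13.6, "rss_parallel": 16.8},
}

for name, m in REPOS.items():
    faster = speedup_vs_serial(m["serial_ms"], m["parallel_ms"])
    extra = growth_percent(m["rss_serial"], m["rss_parallel"])
    print(f"{name}: {faster:.1f}x faster than serial, {extra:.1f}% more peak RSS")
```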

Version-by-Version Trend

Single-file performance across every release, measured on the same fixture (mariadb_server/CMakeLists.txt, 656 lines) using each version’s official macOS aarch64 binary from GitHub Releases.

The solid line shows formatting time (left axis); the dashed line shows binary size (right axis). Hover a point to see the details and what changed in that release.

The gradual increase from ~10ms to ~15ms across early releases is entirely process startup overhead — the formatting engine itself remains sub-millisecond. As the binary grew from 4.7MB to 6.0MB with new features (LSP, shell completions, watch mode, JSON Schema, parse tree dump), the OS needed to load more pages from disk at startup. The v1.1.0 datapoint shows the binary brought back down to 4.8MB and wall time back under 8ms on the same 656-line fixture via link-time optimization (lto = true, codegen-units = 1), a hand-written recursive-descent parser replacing pest, and removal of an unused dependency. This fixed cost is paid once per invocation and is irrelevant for multi-file runs.

The v1.2.0 datapoint moves down to ~6.6ms, about 0.2ms faster than v1.1.0 on the same fixture under matched-methodology hyperfine measurements (--shell=none --style basic, 100 warmups, 200 runs). The embedded built-in command spec moved from a YAML file parsed at every process startup to a MessagePack blob produced once at build time and decoded at startup with rmp-serde, which is roughly 20× faster than parsing structured text. The YAML source is still maintained in src/spec/builtins.yaml for human editing; build.rs re-emits it as MessagePack into OUT_DIR and the binary bundles the blob via include_bytes!.

(The v1.1.0 datapoint was re-measured against the same fixture under this methodology and revised from a previously-reported 7.1ms to 6.8ms; the older number was inflated by hyperfine’s progress-bar rendering, which we now disable via --style basic.)

What The Numbers Mean In Practice

The headline numbers matter not as abstract benchmarks, but because they change what feels viable:

repository-wide --check in CI — comfortable
pre-commit hooks on staged files — instant
repeated local formatting during development — no delay you will notice
editor-triggered format-on-save — faster than the save dialog

How To Reproduce

Run the formatter benchmark suite:

cargo bench --bench formatter

Save a baseline before a risky change:

cargo bench --bench formatter -- --save-baseline before-change

Compare a later run against that baseline:

cargo bench --bench formatter -- --baseline before-change

Benchmark Governance

Performance regressions are tracked automatically:

Every push to main and every PR runs the benchmark suite via CodSpeed in simulation mode.
Weekly scheduled runs (Monday 06:00 UTC) provide a consistent baseline independent of PR activity.
CodSpeed flags regressions > 10% with a warning on the PR check. Regressions that appear only across different runner environments are most likely noise; check the “different runtime environments” warning before investigating.

Regression review policy:

If CodSpeed flags a regression on a PR, investigate before merging.
If the regression is real (same runner, same code path), either fix it or document why the trade-off is acceptable in the PR description.
If the regression appears only on a scheduled run (no code change), it is likely runner variance — note it and move on.
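The policy above can be read as a small decision procedure. The Python sketch below is one interpretation of it, not CodSpeed's own logic: the function names, arguments, and returned strings are all hypothetical, and only the 10% threshold comes from the text.

```python
def relative_change(baseline: float, current: float) -> float:
    """Fractional change in runtime; a positive value means the code got slower."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline


def triage(baseline, current, *, code_changed, same_runner, threshold=0.10):
    """Suggest a next step for a benchmark result, following the review policy."""
    if relative_change(baseline, current) <= threshold:
        return "ok"
    if not code_changed:
        # Scheduled run with no code change: likely runner variance.
        return "note it and move on"
    if not same_runner:
        # Differences across runner environments are most likely noise.
        return "check the runtime-environment warning"
    # Same runner, same code path: treat the regression as real.
    return "investigate before merging"


if __name__ == "__main__":
    print(triage(100.0, 125.0, code_changed=True, same_runner=True))
```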

Baselines:

CodSpeed tracks baselines automatically per branch. For local comparison, use Criterion baselines as described above.

Docs track main. For historical docs, check out a release tag in the repository and build docs/ locally.