
Performance

cmakefmt is fast enough that you never have to think twice about running it — in local workflows, editor integrations, pre-commit hooks, or CI. That is not an accident. Speed is a design goal, not a side effect.

  • 104× geometric-mean speedup over cmake-format on per-file benchmarks
  • 95× geometric-mean speedup over cmake-format on whole-repository benchmarks (serial)
  • 150× geometric-mean speedup over cmake-format on whole-repository benchmarks (parallel)
  • 2,853× largest whole-repo speedup (OpenCV, 282 files, parallel)
  • Formats the 55,000-line gRPC root CMakeLists.txt in ~39ms (cmake-format: 84 s)

Current headline measurements were captured on:

  • macOS 26.3.1
  • aarch64-apple-darwin
  • 10 logical CPUs
  • rustc 1.94.1
  • hyperfine 1.20.0
  • cmake-format 0.6.13

Exact numbers vary by machine. What matters across releases is that relative performance trends remain strong and regressions are caught early.

The benchmarks run in two phases, per-file and whole-repository, each using the most natural command for its scenario.

Single-file timings are measured against a corpus of 11 real-world CMakeLists.txt files sourced from well-known open-source projects, ranging from a 2-line stub to the 55,000-line gRPC root. Each fixture is run in isolation — one file, one process — to isolate single-file formatting performance from batch overhead.

Both cmakefmt and cmake-format are timed in the same hyperfine invocation so they share the same system conditions:

hyperfine --warmup 10 --runs 50 \
"cmakefmt path/to/CMakeLists.txt" \
"cmake-format path/to/CMakeLists.txt"

This is the default “format this file” operation: parse, format, emit to stdout. No --check is used because the goal is to compare the most natural single-file invocation, and both tools do equivalent work (parse + format + serialize). Adding --check would slightly favour cmakefmt because --check (without --diff) skips diff computation, so the per-file numbers here are the conservative measurement.

Whole-repository timings format every CMake file in the project. Emitting thousands of formatted files to stdout would mostly measure the terminal, not the formatter, so this phase uses --check for both tools — which is also the realistic CI usage pattern:

hyperfine --warmup 3 --runs 10 \
--command-name "cmakefmt-parallel" "cmakefmt --check path/to/repo" \
--command-name "cmakefmt-serial" "cmakefmt --check --parallel 1 path/to/repo" \
--command-name "cmake-format" "xargs cmake-format --check < repo-files.txt"

cmake-format has no native multi-file discovery, so the file list is pre-computed with cmakefmt --list-input-files, saved to repo-files.txt, and fed to xargs on stdin. Both tools therefore see the same set of files.
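The whole-repository invocation can be assembled programmatically when many repositories need the same treatment. The following is a minimal sketch, not the project's actual harness: the function names are hypothetical, and it assumes that cmakefmt --list-input-files accepts the repository path as a positional argument and prints one path per line.

```python
import shlex


def list_inputs_command(repo: str) -> list[str]:
    """Command that pre-computes the file list (assumed: one path per line)."""
    return ["cmakefmt", "--list-input-files", repo]


def whole_repo_benchmark(repo: str, file_list: str = "repo-files.txt") -> list[str]:
    """Build the hyperfine argv for the whole-repository phase.

    Mirrors the command shown above: three named commands, 3 warmups, 10 runs.
    Each benchmarked command is a single shell string, as hyperfine expects.
    """
    repo_q = shlex.quote(repo)
    list_q = shlex.quote(file_list)
    return [
        "hyperfine", "--warmup", "3", "--runs", "10",
        "--command-name", "cmakefmt-parallel", f"cmakefmt --check {repo_q}",
        "--command-name", "cmakefmt-serial", f"cmakefmt --check --parallel 1 {repo_q}",
        "--command-name", "cmake-format", f"xargs cmake-format --check < {list_q}",
    ]


# Print the two steps instead of running them, so the sketch has no side effects.
print(shlex.join(list_inputs_command("path/to/repo")), "> repo-files.txt")
print(shlex.join(whole_repo_benchmark("path/to/repo")))
```

Passing the returned list to subprocess.run would execute the benchmark; printing it first makes it easy to confirm that both tools are given the same file set.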

  • Warmup runs — allow the OS page cache and branch predictor to reach a steady state before measurements are recorded
  • Timed runs — give hyperfine enough samples for a stable mean and tight confidence interval (50 for per-file, 10 for the much slower whole-repo runs)
  • Geometric mean — used instead of arithmetic mean because it is resistant to outliers and gives equal weight to each fixture regardless of absolute runtime
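To see why the choice of mean matters, the sketch below recomputes both means from the parallel speedups listed in the whole-repository results table on this page. The helper names are illustrative and are not part of cmakefmt's benchmark tooling.

```python
import math

# Parallel speedups copied from the whole-repository results table.
speedups = [31.1, 38.5, 82.3, 280.8, 118.7, 359.6, 1968.6,
            47.1, 63.5, 2853.0, 939.0, 218.2, 47.2, 14.7]


def geometric_mean(values):
    """exp(mean(log(v))): equivalent to the n-th root of the product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def arithmetic_mean(values):
    return sum(values) / len(values)


geo = geometric_mean(speedups)
arith = arithmetic_mean(speedups)
print(f"geometric mean:  {geo:.0f}x")
print(f"arithmetic mean: {arith:.0f}x")
```

The geometric mean comes out at roughly 150×, matching the headline figure, while the arithmetic mean is more than three times larger because the grpc and opencv rows dominate the sum.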

The fixture corpus and its pinned commit SHAs are stored in tests/fixtures/real_world/manifest.toml. To reproduce the measurements locally, fetch the corpus and run hyperfine against each file using the commands in How To Reproduce below.

Figure: time comparison across real-world fixtures, with exact timings and speedup per fixture.

Per-fixture speedup. The dashed line marks the geometric mean (104×). A few fixtures (opencv_root, grpc_root) show very large speedups — these reflect pathological behaviour in cmake-format on those specific files rather than cherry-picking, and are kept un-capped for transparency.

The per-file numbers above are dominated by process startup (~10ms), which is a fixed cost paid once per invocation. Whole-repository benchmarks show the real-world speedup when formatting entire projects:

Repository     Files   cmakefmt (parallel)  cmakefmt (serial)  cmake-format  Speedup (parallel)
googletest     4       10ms                 10ms               321ms         31.1×
spdlog         7       10ms                 10ms               376ms         38.5×
fmt            11      10ms                 15ms               849ms         82.3×
catch2         20      12ms                 16ms               3,320ms       280.8×
protobuf       25      10ms                 12ms               1,182ms       118.7×
nlohmann_json  41      13ms                 19ms               4,526ms       359.6×
grpc           46      44ms                 53ms               86,855ms      1,968.6×
bullet3        58      12ms                 15ms               557ms         47.1×
vulkan_hpp     135     20ms                 36ms               1,244ms       63.5×
opencv         282     35ms                 101ms              100,711ms     2,853.0×
blender        419     46ms                 133ms              42,950ms      939.0×
oomph-lib      611     59ms                 144ms              12,879ms      218.2×
llvm           2,621   102ms                117ms              4,811ms       47.2×
cmake          11,264  265ms                636ms              3,908ms       14.7×

Geometric-mean speedup: 150× (parallel) / 95× (serial). Aggregate speedup across the full corpus: 408× (264 s of cmake-format collapses to 0.65 s of cmakefmt --parallel).
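The aggregate figure is a different statistic from the geometric mean: it divides total cmake-format time by total cmakefmt time, so the slowest repositories carry the most weight. A short sketch using the rounded millisecond values from the table (variable names are illustrative):

```python
# (repository, cmakefmt parallel ms, cmakefmt serial ms, cmake-format ms)
rows = [
    ("googletest", 10, 10, 321), ("spdlog", 10, 10, 376),
    ("fmt", 10, 15, 849), ("catch2", 12, 16, 3320),
    ("protobuf", 10, 12, 1182), ("nlohmann_json", 13, 19, 4526),
    ("grpc", 44, 53, 86855), ("bullet3", 12, 15, 557),
    ("vulkan_hpp", 20, 36, 1244), ("opencv", 35, 101, 100711),
    ("blender", 46, 133, 42950), ("oomph-lib", 59, 144, 12879),
    ("llvm", 102, 117, 4811), ("cmake", 265, 636, 3908),
]

total_parallel_ms = sum(r[1] for r in rows)
total_cmake_format_ms = sum(r[3] for r in rows)

# Aggregate speedup: ratio of summed wall-clock times across the corpus.
aggregate = total_cmake_format_ms / total_parallel_ms

# Share of cmake-format's total time spent on the two slowest repositories.
slowest_two = sum(r[3] for r in rows if r[0] in ("grpc", "opencv"))
share = slowest_two / total_cmake_format_ms

print(f"cmake-format total: {total_cmake_format_ms / 1000:.0f} s")
print(f"cmakefmt total:     {total_parallel_ms / 1000:.2f} s")
print(f"aggregate speedup:  {aggregate:.0f}x")
print(f"grpc + opencv share of cmake-format time: {share:.0%}")
```

About 71% of cmake-format's total time comes from grpc and opencv alone, which is why the aggregate (408×) sits well above the geometric mean (150×) and why both are reported.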

The hyperfine numbers above measure end-to-end CLI invocations, including process startup, I/O, and discovery. The Criterion estimates below isolate individual pipeline stages on a 1,000+ line synthetic stress-test file (in-process, no I/O):

Metric                            Estimate   95% CI
Parser-only                       4.8075 ms  4.7833–4.8471 ms
Formatter-only (from parsed AST)  2.9462 ms  2.9312–2.9665 ms
End-to-end format_source          7.8537 ms  7.7971–7.9319 ms
Debug/barrier-heavy formatting    1.6846 ms  1.6805–1.6887 ms

All Criterion estimates show a point estimate with a 95% confidence interval — the range within which the true mean is expected to fall 95% of the time. “AST” (Abstract Syntax Tree) is the structured in-memory representation produced by parsing, before formatting.
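As a rough reading aid, the point estimates can be broken down into per-stage shares of the end-to-end time. This is only indicative: the three rows are separate benchmarks, each with its own confidence interval, so the residual below is not a measured quantity.

```python
# Criterion point estimates (ms) from the table above.
parser_ms = 4.8075
formatter_ms = 2.9462
end_to_end_ms = 7.8537

# Time not accounted for by the two isolated stages (glue code, noise, etc.).
residual_ms = end_to_end_ms - (parser_ms + formatter_ms)

parser_share = parser_ms / end_to_end_ms
formatter_share = formatter_ms / end_to_end_ms
residual_share = residual_ms / end_to_end_ms

print(f"parser:    {parser_share:.1%}")
print(f"formatter: {formatter_share:.1%}")
print(f"residual:  {residual_share:.1%} ({residual_ms:.2f} ms)")
```

On these estimates parsing accounts for roughly 61% of the in-process time and formatting for roughly 38%, with about 0.1 ms left over.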

Parallelism scales well across worker counts. The chart shows two real-world repositories:

  • opencv (282 files): a large vendor-style CMake codebase
  • oomph-lib (611 files): a larger real-world CMake repository used to measure scaling behavior at a more realistic project size

The chart plots time and speedup vs serial for each repository.

Peak RSS (Resident Set Size — the RAM physically held in memory by the process) rises from 14.9 MB serial to 19.4 MB at --parallel 8 on opencv, and from 13.6 MB to 16.8 MB on oomph-lib.
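The trade-off between wall-clock gain and memory cost can be summarised with two ratios per repository. The sketch below combines the serial and parallel times from the whole-repository table with the peak RSS figures above. Note the assumption: the table's parallel column uses cmakefmt's default worker count, which is not necessarily 8, so the two ratios are not from the same run.

```python
# Times (ms) from the whole-repository table; peak RSS (MB) from the text above.
runs = {
    "opencv": {"serial_ms": 101, "parallel_ms": 35,
               "rss_serial_mb": 14.9, "rss_parallel8_mb": 19.4},
    "oomph-lib": {"serial_ms": 144, "parallel_ms": 59,
                  "rss_serial_mb": 13.6, "rss_parallel8_mb": 16.8},
}


def scaling_summary(run):
    """Return (speedup vs serial, relative peak-RSS growth at --parallel 8)."""
    speedup = run["serial_ms"] / run["parallel_ms"]
    rss_growth = run["rss_parallel8_mb"] / run["rss_serial_mb"] - 1.0
    return speedup, rss_growth


for name, run in runs.items():
    speedup, rss_growth = scaling_summary(run)
    print(f"{name}: {speedup:.1f}x vs serial, peak RSS +{rss_growth:.0%}")
```

On these figures, parallel runs are about 2.9× (opencv) and 2.4× (oomph-lib) faster than serial, for a peak-memory increase of roughly 30% and 24% respectively.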

Single-file performance across every release, measured on the same fixture (mariadb_server/CMakeLists.txt, 656 lines) using each version’s official macOS aarch64 binary from GitHub Releases.

Figure: formatting time (solid line, left axis) and binary size (dashed line, right axis) per release.

The gradual increase from ~10ms to ~15ms across early releases is entirely process startup overhead — the formatting engine itself remains sub-millisecond. As the binary grew from 4.7MB to 6.0MB with new features (LSP, shell completions, watch mode, JSON Schema, parse tree dump), the OS needed to load more pages from disk at startup. The v1.1.0 datapoint shows the binary brought back down to 4.8MB and wall time back under 8ms on the same 656-line fixture via link-time optimization (lto = true, codegen-units = 1), a hand-written recursive-descent parser replacing pest, and removal of an unused dependency. This fixed cost is paid once per invocation and is irrelevant for multi-file runs.

The v1.2.0 datapoint moves down to ~6.6ms, about 0.2ms faster than v1.1.0 on the same fixture under matched-methodology hyperfine measurements (--shell=none --style basic, 100 warmups, 200 runs). The embedded built-in command spec moved from a YAML file parsed at every process startup to a MessagePack blob produced once at build time and decoded at startup with rmp-serde, which is roughly 20× faster than parsing structured text. The YAML source is still maintained in src/spec/builtins.yaml for human editing; build.rs re-emits it as MessagePack into OUT_DIR and the binary embeds the blob with include_bytes!.

(The v1.1.0 datapoint was re-measured against the same fixture under this methodology and revised from a previously-reported 7.1ms to 6.8ms; the older number was inflated by hyperfine’s progress-bar rendering, which we now disable via --style basic.)

The headline numbers matter not as abstract benchmarks, but because they change what feels viable:

  • repository-wide --check in CI — comfortable
  • pre-commit hooks on staged files — instant
  • repeated local formatting during development — no delay you will notice
  • editor-triggered format-on-save — faster than the save dialog

Run the formatter benchmark suite:

cargo bench --bench formatter

Save a baseline before a risky change:

cargo bench --bench formatter -- --save-baseline before-change

Compare a later run against that baseline:

cargo bench --bench formatter -- --baseline before-change

Performance regressions are tracked automatically:

  • Every push to main and every PR runs the benchmark suite via CodSpeed in simulation mode.
  • Weekly scheduled runs (Monday 06:00 UTC) provide a consistent baseline independent of PR activity.
  • CodSpeed flags regressions > 10% with a warning on the PR check. Regressions that appear only across different runner environments are noise — check the “different runtime environments” warning before investigating.

Regression review policy:

  1. If CodSpeed flags a regression on a PR, investigate before merging.
  2. If the regression is real (same runner, same code path), either fix it or document why the trade-off is acceptable in the PR description.
  3. If the regression appears only on a scheduled run (no code change), it is likely runner variance — note it and move on.
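The 10% threshold and the review policy above can be read as a small decision procedure. The sketch below is an illustration of that policy only; it is not CodSpeed's implementation, and the function names, parameters, and sample timings are hypothetical.

```python
def relative_change(baseline_ms: float, current_ms: float) -> float:
    """Fractional slowdown relative to the baseline (positive means slower)."""
    return (current_ms - baseline_ms) / baseline_ms


def triage(baseline_ms: float, current_ms: float, *, on_pr: bool,
           same_environment: bool, threshold: float = 0.10) -> str:
    """Map a benchmark comparison onto the review policy described above."""
    if relative_change(baseline_ms, current_ms) <= threshold:
        return "ok"
    if not same_environment:
        # Flagged only across different runner environments.
        return "likely noise: different runtime environments"
    if on_pr:
        # Policy steps 1 and 2: investigate, then fix or document the trade-off.
        return "investigate before merging"
    # Policy step 3: scheduled run with no code change.
    return "likely runner variance: note it and move on"


# Hypothetical timings: a 7.85 ms baseline that slows to 8.80 ms (about +12%).
print(triage(7.85, 8.80, on_pr=True, same_environment=True))
print(triage(7.85, 8.80, on_pr=False, same_environment=True))
```

Keeping the threshold as a parameter makes it easy to apply the same check to local Criterion comparisons, where run-to-run noise may justify a different cut-off.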

Baselines:

CodSpeed tracks baselines automatically per branch. For local comparison, use Criterion baselines as described above.

Docs track main. For historical docs, check out a release tag in the repository and build docs/ locally.