Performance

cmakefmt is fast enough that you never have to think twice about running it — in local workflows, editor integrations, pre-commit hooks, or CI. That is not an accident. Speed is a design goal, not a side effect.

Current Benchmark Signal

The headline numbers from the current local benchmark set:

| Metric | Current local signal |
| --- | --- |
| Geometric-mean speedup vs cmake-format | 20.69x |
| Parser-only, large synthetic input (1000+ lines) | estimate 7.1067 ms (95% CI 7.0793–7.1359 ms) |
| Formatter-only from parsed AST, large synthetic input | estimate 1.7545 ms (95% CI 1.7425–1.7739 ms) |
| End-to-end format_source, large synthetic input | estimate 8.8248 ms (95% CI 8.8018–8.8519 ms) |
| Debug/barrier-heavy formatting | estimate 313.98 µs (95% CI 311.89–317.54 µs) |

Each Criterion result is reported as a point estimate with a 95% confidence interval: the range expected to contain the true mean at the 95% confidence level. “Large synthetic input” refers to a 1000+ line stress-test CMakeLists.txt generated for benchmarking purposes. “AST” (Abstract Syntax Tree) is the structured in-memory representation produced by parsing, before formatting.
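
To make the "estimate plus 95% CI" format concrete, here is a minimal sketch of a percentile bootstrap over a set of timings. Criterion derives its intervals by bootstrap resampling as well, but its actual procedure is more involved; the function name `bootstrap_mean_ci` and the sample timings are hypothetical and are not taken from the benchmark suite.

```python
import random
import statistics


def bootstrap_mean_ci(samples, confidence=0.95, resamples=2000, seed=0):
    """Return (mean, low, high) using a percentile bootstrap of the mean."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(resamples)
    )
    # Cut the same number of resampled means off each tail.
    low_idx = int((1.0 - confidence) / 2.0 * resamples)
    high_idx = resamples - 1 - low_idx
    return statistics.fmean(samples), means[low_idx], means[high_idx]


# Hypothetical per-iteration timings in milliseconds (not measured data).
timings_ms = [8.80 + 0.01 * (i % 7) for i in range(50)]
mean, low, high = bootstrap_mean_ci(timings_ms)
print(f"estimate {mean:.4f} ms (95% CI {low:.4f}-{high:.4f} ms)")
```

A narrow interval relative to the estimate, as in the table above, indicates that the measurement was stable across iterations.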

Real-World Comparison

The current local corpus comparison measured cmakefmt against cmake-format on real CMakeLists.txt files drawn from projects including:

  • Abseil
  • Catch2
  • CLI11
  • GoogleTest
  • ggml
  • llama.cpp
  • MariaDB Server
  • LLVM
  • Qt
  • nlohmann/json
  • protobuf
  • spdlog

Fetch the pinned local corpus before rerunning those comparisons:

python3 scripts/fetch-real-world-corpus.py

Results across that corpus:

  • cmakefmt was faster on every single fixture
  • speedup ranged from 10.91x to 48.49x
  • geometric-mean speedup: 20.69x
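
The geometric mean is the usual way to average ratios such as speedups, because one extreme fixture cannot dominate it the way it can an arithmetic mean. A minimal sketch of the calculation follows; the fixture names and timings are hypothetical placeholders, not values from the corpus.

```python
import math


def geometric_mean(ratios):
    """Geometric mean: exp of the arithmetic mean of the logarithms."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical timings in ms as (cmake-format, cmakefmt) pairs.
# Illustrative only; these are not corpus measurements.
timings = {
    "fixture_a": (200.0, 10.0),
    "fixture_b": (450.0, 10.0),
    "fixture_c": (300.0, 20.0),
}
speedups = [baseline / candidate for baseline, candidate in timings.values()]
print(f"geometric-mean speedup: {geometric_mean(speedups):.2f}x")
```

For these placeholder values the geometric mean lands below the arithmetic mean of the same speedups, which is why it is the more conservative headline figure.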

Parallel Batch Throughput

Multi-file runs are single-threaded by default, but opt-in parallelism scales well:

| Mode | Time |
| --- | --- |
| serial | 184.5 ms ± 1.3 ms |
| --parallel 2 | 111.5 ms ± 11.9 ms |
| --parallel 4 | 64.7 ms ± 1.1 ms |
| --parallel 8 | 48.5 ms ± 1.5 ms |

Peak RSS (Resident Set Size, the RAM physically held by the process) rises from 13.2 MB (serial) to 20.7 MB (--parallel 8) on this batch. That extra memory is why the tool stays single-threaded unless you explicitly ask for more.
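
The scaling in the table above can be read off as speedup (serial time divided by parallel time) and per-worker efficiency (speedup divided by worker count). The sketch below applies those standard definitions to the mean times from the table; the helper name `scaling_report` is hypothetical.

```python
def scaling_report(serial_ms, parallel_ms):
    """Map each worker count to (speedup, efficiency) versus the serial run."""
    report = {}
    for workers, elapsed_ms in parallel_ms.items():
        speedup = serial_ms / elapsed_ms
        report[workers] = (speedup, speedup / workers)
    return report


# Mean times from the table above, in milliseconds.
batch = scaling_report(184.5, {2: 111.5, 4: 64.7, 8: 48.5})
for workers, (speedup, efficiency) in batch.items():
    print(f"--parallel {workers}: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

On this batch the speedup works out to roughly 1.65x, 2.85x, and 3.80x for 2, 4, and 8 workers, so each additional worker contributes less than the one before it.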

Large Repository Parallelism Survey

Phase 12 validation was also run against oomph-lib (local checkout with 612 discovered CMake files):

| Mode | Time |
| --- | --- |
| serial | 412.5 ms ± 9.0 ms |
| --parallel 2 | 296.0 ms ± 3.5 ms |
| --parallel 4 | 191.8 ms ± 4.7 ms |
| --parallel 8 | 152.5 ms ± 2.8 ms |

That corresponds to a roughly 2.7x speedup at --parallel 8 versus serial (412.5 ms / 152.5 ms), with peak RSS rising from 11.3 MB to 17.0 MB.

For a direct tool baseline on the same full oomph-lib tree (612 discovered files), /usr/bin/time -l measured:

  • cmake-format (sequential over discovered files): 45.69 s real
  • cmakefmt serial: 0.47 s real (~97x faster)
  • cmakefmt --parallel 8: 0.19 s real (~240x faster)

What The Numbers Mean In Practice

The headline numbers matter not as abstract benchmarks, but because they change what feels viable:

  • repository-wide --check in CI — comfortable
  • pre-commit hooks on staged files — instant
  • repeated local formatting during development — no delay you will notice
  • editor-triggered format-on-save — faster than the save dialog

Benchmark Environment

Current headline measurements were captured on:

  • macOS 26.3.1
  • aarch64-apple-darwin
  • 10 logical CPUs
  • rustc 1.94.1
  • hyperfine 1.20.0
  • cmake-format 0.6.13

Exact numbers vary by machine. What matters release to release is that relative trends stay strong and regressions are caught quickly.

How To Reproduce

Run the formatter benchmark suite:

cargo bench --bench formatter

Save a baseline before a risky change:

cargo bench --bench formatter -- --save-baseline before-change

Compare a later run against that baseline:

cargo bench --bench formatter -- --baseline before-change
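
The wall-clock tables earlier on this page use the mean ± standard deviation form that hyperfine reports. Where hyperfine is not available, a similar measurement can be approximated with a short script. This is a rough sketch only: the helper `time_command` is hypothetical, it includes process start-up time in every sample, and the cmakefmt argument layout mentioned in the comment is an assumption rather than documented usage.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=10, warmup=2):
    """Run argv repeatedly; return (mean_ms, stdev_ms) of wall-clock time."""
    def run_once():
        # check=True raises if the command exits with a non-zero status.
        subprocess.run(
            argv, check=True,
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )

    for _ in range(warmup):  # warm caches before measuring
        run_once()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.fmean(samples_ms), statistics.stdev(samples_ms)


# Stand-in command so the sketch runs anywhere. Swap in the real invocation,
# for example ["cmakefmt", "--parallel", "8", "."] (argument layout assumed).
mean_ms, stdev_ms = time_command([sys.executable, "-c", "pass"], runs=5, warmup=1)
print(f"{mean_ms:.1f} ms +/- {stdev_ms:.1f} ms")
```

For published numbers, hyperfine remains the better choice, since it handles warm-up and outlier detection for you.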