Motivation
Root-causing a frontend memory blowup (with check src/main.w peaked at 4.76 GB; a trivial file is ~25 MB) needed a way to attribute allocations to a line of code, and none of the existing tooling worked at scale.
The bug was ultimately localized by committed-bytes-delta instrumentation: a temporary with_alloc_committed_bytes() accessor bracketing each comptime-eval phase (read committed before/after, accumulate deltas + a call count, print). That pinned the growth to prepare_comptime_eval_copy (src/Sema.w) deep-cloning the full ~8.5 MB compiler source text on every one of ~239 comptime evals — ~2 GB of dead copies. Fixed by sharing the read-only source vectors; check src/main.w is now 2.51 GB, fixpoint byte-identical.
That manual bracketing works but is per-investigation and only localizes to a phase, not a call site. with_alloc_committed_bytes() is now a permanent rt primitive (see docs/debug-allocator.md) so the interim technique needs no throwaway runtime edit — but we still want a real call-site profiler.
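For reference, the bracketing technique described above looks roughly like this. It is a minimal sketch in C, not the code that was actually used: the counter is a fake, the real accessor is with_alloc_committed_bytes() (its exact signature is not shown here), and phase_stat, phase_begin, phase_end, phase_print and toy_copy_phase are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the runtime's committed-bytes counter.
   In the real runtime this would be with_alloc_committed_bytes(). */
static uint64_t fake_committed = 0;
uint64_t committed_bytes(void) { return fake_committed; }

/* Per-phase accumulator: total growth plus number of bracketed calls. */
typedef struct {
    const char *name;
    int64_t delta_bytes;
    uint64_t calls;
} phase_stat;

/* Read the counter before the phase runs. */
uint64_t phase_begin(void) { return committed_bytes(); }

/* Read it again afterwards and fold the difference into the accumulator. */
void phase_end(phase_stat *p, uint64_t before) {
    p->delta_bytes += (int64_t)(committed_bytes() - before);
    p->calls += 1;
}

/* Print the accumulated totals for one phase. */
void phase_print(const phase_stat *p) {
    fprintf(stderr, "%s: %lld bytes over %llu calls\n", p->name,
            (long long)p->delta_bytes, (unsigned long long)p->calls);
}

/* Toy phase that "commits" n bytes, standing in for a deep clone. */
void toy_copy_phase(uint64_t n) { fake_committed += n; }
```

Each phase of interest gets a phase_begin()/phase_end() pair around it and a phase_print() at exit. The limitation is visible in the shape of the code: the brackets have to be placed by hand, and the result names a phase rather than the allocating call site.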
What exists and why each is inadequate
- --debug-alloc / debug allocator: coarse origin tags plus leak detection, but its fixed-size ledger overflows at this scale (debug-alloc: ledger full, tracking truncated). It reports leaks, not a per-call-site bytes breakdown.
- committed-bytes-delta bracketing (the interim that cracked this bug) — localizes to a manually-chosen phase, not automatically to a call site; requires hand-editing the phase brackets each time.
- macOS sample: CPU sampling, not allocation.
- --stats: no memory/allocation reporting.
- lldb breakpoint per allocation — too slow at high frequency (timed out at 80k ignore-count).
- OS allocator tools (Instruments, malloc_history, leaks): can't see rt_mmap; our allocator bypasses malloc.
- Constraint: the runtime deliberately avoids in-process frame-pointer backtraces; at -O1 frame pointers may be omitted, so a naive runtime fp-walk is unreliable.
Ask
A call-site allocation profiler for the rt_mmap/rt_alloc allocator:
- Aggregates bytes + counts by call site into a bounded hashtable (survives millions of allocations — unlike the per-allocation ledger).
- Gated by an env var (e.g. WITH_ALLOC_PROFILE=1) so it is zero-cost when off.
- Dumps the top allocators by total bytes at exit, symbolized to file:line.
- Call-site capture approaches to evaluate: DWARF-based unwinding, a sampled statistical allocation profiler, or a reliable bounded frame walk that stops at the first non-runtime frame.
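As a starting point for discussion, the aggregation side of the ask could look something like the sketch below. Everything in it is an assumption rather than existing runtime code: the names (site_stat, profile_record, profile_top, profiled_alloc), the table size, and the use of the GCC/Clang builtin __builtin_return_address(0) to capture only the immediate caller, which does not require walking frame pointers. malloc stands in for rt_alloc. Symbolizing the recorded addresses to file:line would be a separate step that is not covered here.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define SITE_SLOTS 4096              /* fixed capacity, must be a power of two */

typedef struct {
    uintptr_t site;                  /* call-site address; 0 marks an empty slot */
    uint64_t bytes;                  /* total bytes requested from this site */
    uint64_t count;                  /* number of allocations from this site */
} site_stat;

static site_stat table[SITE_SLOTS];
static uint64_t dropped;             /* records lost because the table was full */
static int profile_on = -1;          /* -1 means the env var has not been read yet */

/* Read WITH_ALLOC_PROFILE once; later calls only test a cached int. */
int profile_enabled(void) {
    if (profile_on < 0) {
        const char *v = getenv("WITH_ALLOC_PROFILE");
        profile_on = (v != NULL && v[0] == '1');
    }
    return profile_on;
}

/* Add one allocation to its call site's bucket (open addressing, linear probe). */
void profile_record(uintptr_t site, uint64_t bytes) {
    if (!profile_enabled() || site == 0) return;
    uint64_t h = (uint64_t)site * 0x9E3779B97F4A7C15ull;
    size_t start = (size_t)(h >> 40) & (SITE_SLOTS - 1);
    for (size_t probe = 0; probe < SITE_SLOTS; probe++) {
        site_stat *s = &table[(start + probe) & (SITE_SLOTS - 1)];
        if (s->site == 0) s->site = site;        /* claim an empty slot */
        if (s->site == site) { s->bytes += bytes; s->count++; return; }
    }
    dropped++;                                   /* table full: count it, do not grow */
}

/* qsort comparator: larger byte totals sort first. */
static int by_bytes_desc(const void *a, const void *b) {
    const site_stat *x = a, *y = b;
    return (x->bytes < y->bytes) - (x->bytes > y->bytes);
}

/* Copy up to max of the heaviest sites into out; returns how many were written. */
size_t profile_top(site_stat *out, size_t max) {
    static site_stat sorted[SITE_SLOTS];
    size_t n = 0;
    for (size_t i = 0; i < SITE_SLOTS; i++)
        if (table[i].site != 0) sorted[n++] = table[i];
    qsort(sorted, n, sizeof sorted[0], by_bytes_desc);
    if (n > max) n = max;
    for (size_t i = 0; i < n; i++) out[i] = sorted[i];
    return n;
}

/* Hypothetical allocator entry point; noinline keeps the return address
   pointing at the real caller. malloc is a stand-in for rt_alloc. */
__attribute__((noinline)) void *profiled_alloc(size_t n) {
    profile_record((uintptr_t)__builtin_return_address(0), (uint64_t)n);
    return malloc(n);
}
```

Memory use is bounded by SITE_SLOTS regardless of how many allocations occur, which is the property the per-allocation ledger lacks. When the variable is unset, each allocation pays only for one cached-flag test. Capturing just the immediate caller is the simplest option; if most allocations funnel through shared helpers, it would need one of the deeper capture approaches listed above.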
Payoff
Pinpoint allocation hot spots directly and verify memory fixes quantitatively instead of by rebuild-and-remeasure. Reusable across the compiler.
Reference: docs/debug-allocator.md, docs/deep-debugging-tools.md. Related: the source-text fix (comptime eval), the --emit-c self-compile OOM (#619).