Skip to content

Latest commit

 

History

History
35 lines (28 loc) · 1.94 KB

File metadata and controls

35 lines (28 loc) · 1.94 KB

Changelog

2026-02-28 — Decode optimisation: serial route matmul + AVX2 sigmoid

Baseline: 152.7 ms/tok (6.55 tok/s) at 12 threads on 5900X (WSL2, DDR4 dual-channel) After: 119.0 ms/tok (8.40 tok/s) — 22% latency reduction, 28% throughput improvement

Changes kept

  • Serial route matmul: Changed MoE routing from parallel to serial dispatch. Rayon thread wake-up latency (~30us x 11 threads x 40 layers) dominated for 256 tiny dot products (8KB per expert). Saved 32ms/tok (moe_route: 35.1ms to 3.1ms).
  • AVX2 vectorized sigmoid: Replaced scalar sigmoid in SiLU activation with 8-wide AVX2 fast exp approximation. Marginal (~1ms), but cleaner code with no downside.

Changes tested and reverted

  • Sub-tile splitting: cache line contention from two threads reading same tile data (+7%)
  • Flattened MoE dispatch: cross-expert task mixing destroyed L2 cache locality (+22%)
  • No nested MoE parallelism: only 8/12 threads active, lost work-stealing benefit (+10%)
  • MADV_COLLAPSE (THP alternative): WSL2 returns EINVAL, Hyper-V doesn't support it
  • Page table warming: pre-faulting PTEs polluted L1/L2, net slower (+5%)
  • PREFETCHNTA (non-temporal prefetch on weights): ~2% improvement but within WSL2 noise

Key findings

  • 24 threads (SMT) is 2.3x slower than 12 threads (physical cores only) on 5900X
  • THP does not work in WSL2 (AnonHugePages stays 0 despite madvise calls)
  • TLB misses from 4KB pages on a 67GB model account for most of the remaining gap to theoretical bandwidth (~11 GB/s effective vs ~38 GB/s practical STREAM)
  • MADV_COLLAPSE would likely work on native Linux and is worth retesting there

2026-02-28 — Log management and run notes

  • On server start, existing krasis.log is archived to logs/krasis_YYYYMMDD_HHMMSS.log (timestamped from file mtime)
  • Fresh krasis.log started for each run
  • New --note parameter writes a run description header at the top of each log
  • logs/ directory gitignored (except .gitkeep)