# Sylphx Testing Playbook

A taxonomy of 35 SOTA software testing methods for 2027, with templates, decision guides, and a real-world case study.

Testing is not one thing. "Write unit tests" is career advice from 2010. In 2027, a production-grade system uses a portfolio of 10–15 distinct testing methods, each catching a different class of defect. This playbook is the index.

Who this is for:

  • Engineers asking "what else should I be testing?"
  • Tech leads setting up a new project's quality bar
  • SREs / platform teams designing reliability targets
  • Anyone trying to justify a testing budget with evidence

What's inside: a method-by-method taxonomy (methods/), decision guides (matrix/), copy-paste templates (templates/), and a real-world case study (case-studies/).


## TL;DR — what "SOTA testing in 2027" actually means

If you're time-starved, adopt these 10 methods in order. Each one catches defects that the previous ones miss:

| # | Method | Catches | Tool |
|---|--------|---------|------|
| 1 | Unit testing | Logic bugs in pure functions | vitest / bun:test |
| 2 | Type-level testing | API contract drift at compile time | tsd / expect-type |
| 3 | Integration testing | Cross-module wiring bugs | vitest + mocks |
| 4 | Property-based testing | Edge cases you didn't imagine | fast-check |
| 5 | Contract testing | Runtime data corruption at boundaries | Zod |
| 6 | Mutation testing | Tests that exist but don't assert | Stryker |
| 7 | E2E browser testing | UI regressions | Playwright |
| 8 | Load testing | Performance under real traffic | k6 |
| 9 | Chaos engineering | Failure mode bugs | Chaos Mesh |
| 10 | SAST / security | Injection, taint, secret leaks | Semgrep / CodeQL |

Running these 10 in CI puts you above 95% of production codebases. The remaining 25 methods in this playbook are specialised power tools for when your requirements exceed the baseline.
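
To make the less-familiar rows concrete, here is a dependency-free sketch of what a property-based test (method 4) checks. The `clamp` function is a hypothetical example; in the playbook's stack, fast-check generates the inputs and shrinks any failure to a minimal counterexample:

```typescript
// A hypothetical pure function under test.
function clamp(x: number, min: number, max: number): number {
  return Math.min(Math.max(x, min), max);
}

// The property: for any x and any ordered [min, max], the result
// always lands inside the range. A hand-rolled loop stands in for
// fast-check's smarter input generation.
function checkClampProperty(runs: number): boolean {
  for (let i = 0; i < runs; i++) {
    const x = (Math.random() - 0.5) * 1e6;
    const a = (Math.random() - 0.5) * 1e6;
    const b = (Math.random() - 0.5) * 1e6;
    const [min, max] = a <= b ? [a, b] : [b, a];
    const out = clamp(x, min, max);
    if (out < min || out > max) return false; // counterexample found
  }
  return true;
}
```

With fast-check the same property is written as `fc.assert(fc.property(...))`, and any failing input is automatically shrunk to a minimal counterexample.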


## The 35 methods

Ranked by production leverage × implementation effort. The top rows are "must have for any serious project"; the bottom rows are "only for domains that need them."

### ✅ Core (must have)

| # | Method | What it catches | See |
|---|--------|-----------------|-----|
| 1 | Unit testing | Logic bugs in pure functions | methods/01 |
| 2 | Type-level testing | Type contract drift | methods/02 |
| 3 | Integration testing | Wiring bugs between modules | methods/03 |
| 4 | Property-based testing | Edge cases from random inputs | methods/04 |
| 5 | Contract testing (Zod) | Runtime data corruption at boundaries | methods/05 |
| 6 | Mutation testing | "Tests that exist but don't assert" | methods/06 |

### 🔥 Critical for user-facing products

| # | Method | What it catches | See |
|---|--------|-----------------|-----|
| 7 | E2E browser testing | UI regressions, cross-browser | methods/07 |
| 17 | Visual regression | Pixel-level CSS drift | methods/17 |
| 18 | Accessibility testing | WCAG violations | methods/18 |

### 🛡️ Critical for production reliability

| # | Method | What it catches | See |
|---|--------|-----------------|-----|
| 8 | Load / stress testing | Performance under scale | methods/08 |
| 9 | Chaos engineering | Failure mode bugs | methods/09 |
| 11 | Canary SLO gates | Regressions during rollout | methods/11 |
| 19 | Synthetic monitoring | Production-only bugs | methods/19 |
| 26 | Fault injection | Retry + timeout logic | methods/26 |

### 🔐 Critical for security + compliance

| # | Method | What it catches | See |
|---|--------|-----------------|-----|
| 10 | SAST (Semgrep / CodeQL) | Injection, taint, secret leaks | methods/10 |
| 12 | Dependency / supply chain | Malicious packages, CVEs | methods/12 |
| 20 | Fuzzing | Parser / deserializer crashes | methods/20 |

### 🗄️ Critical for data + infra

| # | Method | What it catches | See |
|---|--------|-----------------|-----|
| 13 | In-memory DB integration | SQL semantic bugs, fast | methods/13 |
| 14 | testcontainers | Full-fidelity infra integration | methods/14 |
| 15 | Schema / migration testing | Destructive migrations | methods/15 |
| 16 | Snapshot / golden testing | Generated output drift | methods/16 |

### 🧠 High leverage for specialised domains

| # | Method | What it catches | See |
|---|--------|-----------------|-----|
| 21 | Benchmark regression | Perf cliff drops | methods/21 |
| 22 | Differential testing | Migration behaviour drift | methods/22 |
| 23 | Metamorphic testing | ML models / compilers | methods/23 |
| 24 | Combinatorial (pairwise) | Config matrix explosion | methods/24 |
| 25 | Record-and-replay | Real traffic drift | methods/25 |
| 27 | Coverage-guided fuzzing | Deep parser bugs | methods/27 |

### 🎯 Frontier / specialist

| # | Method | What it catches | See |
|---|--------|-----------------|-----|
| 28 | Formal verification (TLA+) | Concurrency race conditions | methods/28 |
| 29 | Concolic / symbolic execution | Paths no test can reach | methods/29 |
| 30 | LLM-assisted test generation | Coverage of forgotten cases | methods/30 |

### 🟡 Situational

| # | Method | Notes | See |
|---|--------|-------|-----|
| 31 | Approval testing | For complex human-reviewed output | methods/31 |
| 32 | BDD / Gherkin acceptance | Stakeholder communication, not test quality | methods/32 |
| 33 | Smoke testing | Cheap last-resort canary | methods/33 |
| 34 | Penetration testing (manual) | Compliance requirement | methods/34 |
| 35 | Accessibility manual audits | Beyond axe-core automation | methods/35 |

## Quick-start — set up 5 methods in one afternoon

Start with these five and you'll have a better test suite than 80% of production TypeScript codebases:

```sh
# 1. Unit + integration (already have it if you use vitest / bun:test)

# 2. Type-level tests
bun add -d expect-type

# 3. Property-based tests
bun add -d fast-check

# 4. Contract testing (Zod)
bun add zod

# 5. Mutation testing
bun add -d @stryker-mutator/core
```

Then copy our template configs from templates/ into your project.
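
As a taste of what step 4 buys you, here is a dependency-free sketch of the contract-testing idea; conceptually this is what a Zod schema's `.parse` does at an I/O boundary. The `User` shape is a hypothetical example:

```typescript
// Validate unknown data at an I/O boundary: return a typed value or throw.
interface User {
  id: number;
  email: string;
}

function parseUser(data: unknown): User {
  const d = data as Partial<Record<keyof User, unknown>> | null;
  if (
    d !== null && typeof d === "object" &&
    typeof d.id === "number" && typeof d.email === "string"
  ) {
    return { id: d.id, email: d.email };
  }
  // Corrupt payloads fail loudly here instead of sending `undefined`
  // deep into the system.
  throw new Error("contract violation: payload is not a User");
}

const user = parseUser(JSON.parse('{"id": 1, "email": "a@b.c"}'));
```

With Zod this collapses to `z.object({ id: z.number(), email: z.string() }).parse(data)`, with the static type inferred from the schema via `z.infer`.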


## How we picked these 35

Every method in this playbook has been validated on at least one production codebase. We're not listing hypotheticals — every entry links to at least one of:

  1. A real commit in case-studies/ showing adoption
  2. A working template in templates/ you can copy
  3. An actively maintained tool (>1000 GitHub stars as of 2026)

See the case study for the full evolution of a managed-resource controller from 320 tests + no mutation testing to 521 tests + 82.6% mutation score in three refactor phases (P5 → P6 → P7).


## Principles

The playbook encodes these beliefs, drawn from shipping real systems:

### 1. Code coverage is the minimum bar, not the goal

100% line coverage with weak assertions is worse than 60% coverage with strong assertions. Mutation score is the truer signal — see methods/06.
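
A minimal sketch of why (all names hypothetical): both assertions below execute every line of `total`, so line coverage cannot tell them apart, but only the strong one kills the kind of mutant Stryker generates by flipping an arithmetic operator:

```typescript
// `total` is a hypothetical function under test; `totalMutant` is the
// kind of mutant Stryker generates (arithmetic operator flipped).
function total(prices: number[]): number {
  return prices.reduce((sum, p) => sum + p, 0);
}
function totalMutant(prices: number[]): number {
  return prices.reduce((sum, p) => sum - p, 0); // `+` mutated to `-`
}

type Impl = (prices: number[]) => number;

// Weak assertion: 100% line coverage, zero defect-detection power.
const weakTestPasses = (f: Impl) => Number.isFinite(f([2, 3]));
// Strong assertion: pins the actual value.
const strongTestPasses = (f: Impl) => f([2, 3]) === 5;

weakTestPasses(totalMutant);   // true: the mutant survives the weak test
strongTestPasses(totalMutant); // false: the strong test kills it
```

Stryker's mutation score reports exactly this: the fraction of generated mutants your suite kills.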

### 2. Tests catching the same bugs are wasted tests

Each testing method in this playbook catches a distinct class of bug:

  • Unit tests catch logic errors
  • Type tests catch API drift
  • Property tests catch unimagined edge cases
  • Mutation tests catch weak assertions
  • Contract tests catch runtime data corruption
  • Fuzzing catches parser crashes
  • Load tests catch perf regressions
  • Chaos catches failure mode bugs
  • SAST catches security injection

Having 10,000 unit tests ≠ having 1,000 unit tests + 100 property tests + 50 contract tests. The latter catches more bug classes.
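
For example, "type tests catch API drift" can be shown with a dependency-free sketch; the `Equal` / `Expect` helpers below are the pattern that tools like expect-type package up, and `getUser` is a hypothetical API:

```typescript
// Type-level assertion helpers, as popularised by type-testing libraries.
type Equal<A, B> =
  (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2)
    ? true
    : false;
type Expect<T extends true> = T;

// Hypothetical API under test:
function getUser(id: number): { id: number; email: string } {
  return { id, email: "user@example.com" };
}

// If the return type of `getUser` drifts, the line below stops compiling,
// so API-contract drift is caught by `tsc` before any runtime test runs.
type _ApiContract = Expect<
  Equal<ReturnType<typeof getUser>, { id: number; email: string }>
>;
```

No unit test can catch this class of bug, because the defect never reaches runtime: it is a change in the contract itself.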

### 3. Tests are infrastructure, not documentation

Tests should be:

  • Fast (ms per test, <30s total for the "inner loop")
  • Deterministic (no flakes, no ordering dependency)
  • Isolated (no shared global state, no mock.module pollution)
  • Parallel-safe (run with -j 8 without issue)

If your tests don't meet these bars, fix the infrastructure before adding more tests. See methods/03 § gotchas.
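
A vitest configuration that enforces most of these bars mechanically might look like the sketch below; the specific values are illustrative assumptions, not playbook mandates:

```typescript
// vitest.config.ts (sketch)
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    isolate: true,               // fresh module state per test file
    sequence: { shuffle: true }, // surface hidden ordering dependencies
    testTimeout: 1_000,          // keep individual tests in the ms range
    pool: "threads",             // prove the suite is parallel-safe
  },
});
```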

### 4. The test pyramid is outdated — use the trophy

Old thinking: 70% unit, 20% integration, 10% E2E. New thinking (Kent C. Dodds' Testing Trophy):

  • πŸ† Static (types, lint) β€” fast, runs on save
  • πŸ₯‡ Integration β€” the actual sweet spot
  • πŸ₯ˆ Unit β€” for pure functions only
  • πŸ₯‰ E2E β€” for critical user paths

Most of our tests are integration — they exercise real module wiring against mock I/O boundaries. Pure unit tests are reserved for genuinely pure functions (math, parsers, formatters).
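
A sketch of that style (all names hypothetical): the service logic and the repository interface are real, wired code, and only the storage boundary is replaced with an in-memory fake:

```typescript
// The I/O boundary, as an interface the real implementation also satisfies.
interface UserRepo {
  findEmail(id: number): string | undefined;
}

// Real service logic, really wired to the repo contract.
function makeNotifier(repo: UserRepo) {
  return {
    recipientFor(id: number): string {
      const email = repo.findEmail(id);
      if (!email) throw new Error(`no user ${id}`);
      return email.toLowerCase();
    },
  };
}

// Only the boundary is faked; everything above it is production code.
const fakeRepo: UserRepo = {
  findEmail: (id) => (id === 1 ? "Ada@Example.com" : undefined),
};
const notifier = makeNotifier(fakeRepo);
notifier.recipientFor(1); // "ada@example.com"
```

Because only the boundary is faked, the test still fails if the wiring between the two modules changes, which is exactly the bug class isolated unit tests miss.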

### 5. Every test must be able to answer "what bug would this catch?"

If you can't answer that, delete the test. It's load-bearing noise.


## Repository structure

```
sylphx-testing-playbook/
├── README.md                    ← you are here
├── methods/                     ← one markdown per testing method
│   ├── 01-unit-testing.md
│   ├── 02-type-level-testing.md
│   └── ...
├── matrix/
│   ├── maturity-model.md        ← crawl / walk / run / fly thresholds
│   └── selection-guide.md       ← decision tree: which method next?
├── templates/                   ← copy-paste config starters
│   ├── stryker/
│   ├── playwright/
│   ├── k6/
│   └── ...
├── case-studies/
│   └── sylphx-managed-resource-controller.md
├── languages/                   ← language-specific notes
│   ├── typescript.md
│   ├── python.md
│   ├── go.md
│   └── rust.md
└── .github/workflows/
    └── ci.yml                   ← link check + markdown lint
```

## Contributing

See CONTRIBUTING.md. TL;DR:

  • Add new methods as methods/NN-name.md following the template
  • Every method needs a working example in templates/
  • Every claim needs either a commit link or a tool with >1000 stars
  • Keep it opinionated — the value is in the recommendations, not the neutrality

## License

MIT. Fork it, adapt it, reference it from your org's engineering handbook.


## Acknowledgements

Born out of the Sylphx managed-resource controller refactor (Phase 5 → 7, 2026 Q2). Inspired by:
