A taxonomy of 35 SOTA software testing methods for 2027, with templates, decision guides, and a real-world case study.
Testing is not one thing. "Write unit tests" is career advice from 2010. In 2027, a production-grade system uses a portfolio of 10–15 distinct testing methods, each catching a different class of defect. This playbook is the index.
Who this is for:
- Engineers asking "what else should I be testing?"
- Tech leads setting up a new project's quality bar
- SREs / platform teams designing reliability targets
- Anyone trying to justify a testing budget with evidence
What's inside:
- 35 testing methods, categorised and ranked by leverage
- Maturity matrix – crawl / walk / run / fly
- Selection guide – which methods to adopt first
- Copy-paste templates for the top 12 tools
- Real case study – how we took a codebase from 320 to 521 tests + 82.6% mutation score across 3 phases of refactor
If you're time-starved, adopt these 10 methods in order. Each one catches defects that the previous ones miss:
| # | Method | Catches | Tool |
|---|---|---|---|
| 1 | Unit testing | Logic bugs in pure functions | vitest / bun:test |
| 2 | Type-level testing | API contract drift at compile time | tsd / expect-type |
| 3 | Integration testing | Cross-module wiring bugs | vitest + mocks |
| 4 | Property-based testing | Edge cases you didn't imagine | fast-check |
| 5 | Contract testing | Runtime data corruption at boundaries | Zod |
| 6 | Mutation testing | Tests that exist but don't assert | Stryker |
| 7 | E2E browser testing | UI regressions | Playwright |
| 8 | Load testing | Performance under real traffic | k6 |
| 9 | Chaos engineering | Failure mode bugs | Chaos Mesh |
| 10 | SAST / security | Injection, taint, secret leaks | Semgrep / CodeQL |
Running these 10 in CI puts you above 95% of production codebases. The remaining 25 methods in this playbook are specialised power tools for when your requirements exceed the baseline.
Ranked by production leverage relative to implementation effort. The top rows are "must have for any serious project"; the bottom rows are "only for domains that need them."
Core correctness:

| # | Method | What it catches | See |
|---|---|---|---|
| 1 | Unit testing | Logic bugs in pure functions | methods/01 |
| 2 | Type-level testing | Type contract drift | methods/02 |
| 3 | Integration testing | Wiring bugs between modules | methods/03 |
| 4 | Property-based testing | Edge cases from random inputs | methods/04 |
| 5 | Contract testing (Zod) | Runtime data corruption at boundaries | methods/05 |
| 6 | Mutation testing | "Tests that exist but don't assert" | methods/06 |
UI and front-end:

| # | Method | What it catches | See |
|---|---|---|---|
| 7 | E2E browser testing | UI regressions, cross-browser | methods/07 |
| 17 | Visual regression | Pixel-level CSS drift | methods/17 |
| 18 | Accessibility testing | WCAG violations | methods/18 |
Reliability and operations:

| # | Method | What it catches | See |
|---|---|---|---|
| 8 | Load / stress testing | Performance under scale | methods/08 |
| 9 | Chaos engineering | Failure mode bugs | methods/09 |
| 11 | Canary SLO gates | Regressions during rollout | methods/11 |
| 19 | Synthetic monitoring | Production-only bugs | methods/19 |
| 26 | Fault injection | Retry + timeout logic | methods/26 |
Security:

| # | Method | What it catches | See |
|---|---|---|---|
| 10 | SAST (Semgrep / CodeQL) | Injection, taint, secret leaks | methods/10 |
| 12 | Dependency / supply chain | Malicious packages, CVEs | methods/12 |
| 20 | Fuzzing | Parser / deserializer crashes | methods/20 |
Data and infrastructure:

| # | Method | What it catches | See |
|---|---|---|---|
| 13 | In-memory DB integration | SQL semantic bugs fast | methods/13 |
| 14 | testcontainers | Full-fidelity infra integration | methods/14 |
| 15 | Schema / migration testing | Destructive migrations | methods/15 |
| 16 | Snapshot / golden testing | Generated output drift | methods/16 |
Specialised techniques:

| # | Method | What it catches | See |
|---|---|---|---|
| 21 | Benchmark regression | Perf cliff drops | methods/21 |
| 22 | Differential testing | Migration behaviour drift | methods/22 |
| 23 | Metamorphic testing | ML models / compilers | methods/23 |
| 24 | Combinatorial (pairwise) | Config matrix explosion | methods/24 |
| 25 | Record-and-replay | Real traffic drift | methods/25 |
| 27 | Coverage-guided fuzzing | Deep parser bugs | methods/27 |
Advanced and formal methods:

| # | Method | What it catches | See |
|---|---|---|---|
| 28 | Formal verification (TLA+) | Concurrency race conditions | methods/28 |
| 29 | Concolic / symbolic execution | Paths no test can reach | methods/29 |
| 30 | LLM-assisted test generation | Coverage of forgotten cases | methods/30 |
Situational:

| # | Method | Notes | See |
|---|---|---|---|
| 31 | Approval testing | For complex human-reviewed output | methods/31 |
| 32 | BDD / Gherkin acceptance | Stakeholder communication, not test quality | methods/32 |
| 33 | Smoke testing | Cheap last-resort canary | methods/33 |
| 34 | Penetration testing (manual) | Compliance requirement | methods/34 |
| 35 | Accessibility manual audits | Beyond axe-core automation | methods/35 |
Start with these five and you'll have a better test suite than 80% of production TypeScript codebases:
# 1. Unit + integration (already have it if you use vitest / bun:test)
# 2. Type-level tests
bun add -d expect-type
# 3. Property-based tests
bun add -d fast-check
# 4. Contract testing (Zod)
bun add zod
# 5. Mutation testing
bun add -d @stryker-mutator/core

Then copy our template configs into your project:
- templates/stryker/stryker.conf.json
- templates/fast-check/example.test.ts
- templates/expect-type/example.types.test.ts
- templates/github-actions/ci.yml
Every method in this playbook has been validated on at least one production codebase. We're not listing hypotheticals – every entry links to either:
- A real commit in case-studies/ showing adoption
- A working template in templates/ you can copy
- A tool with >1000 GitHub stars, actively maintained in 2026+
See the case study for the full evolution of a managed-resource controller from 320 tests + no mutation testing to 521 tests + 82.6% mutation score in three refactor phases (P5 → P6 → P7).
The playbook encodes these beliefs, drawn from shipping real systems:
100% line coverage with weak assertions is worse than 60% coverage with strong assertions. Mutation score is the truer signal – see methods/06.
Each testing method in this playbook catches a distinct class of bug:
- Unit tests catch logic errors
- Type tests catch API drift
- Property tests catch unimagined edge cases
- Mutation tests catch weak assertions
- Contract tests catch runtime data corruption
- Fuzzing catches parser crashes
- Load tests catch perf regressions
- Chaos catches failure mode bugs
- SAST catches security injection
Having 10,000 unit tests ≠ having 1,000 unit tests + 100 property tests + 50 contract tests. The latter catches more bug classes.
Tests should be:
- Fast (ms per test, <30s total for the "inner loop")
- Deterministic (no flakes, no ordering dependency)
- Isolated (no shared global state, no mock.module pollution)
- Parallel-safe (run with -j 8 without issue)

If your tests don't meet these bars, fix the infrastructure before adding more tests. See methods/03 § gotchas.
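One way to enforce these bars mechanically is in the runner config. A sketch for vitest (option names taken from vitest's config reference; verify against your installed version):

```ts
// vitest.config.ts – sketch; check option names against your vitest version.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    isolate: true,                  // no shared module state between files
    sequence: { shuffle: true },    // randomise order to expose coupling
    testTimeout: 5_000,             // a slow unit test is a broken test
    poolOptions: {
      threads: { maxThreads: 8 },   // suite must survive parallel execution
    },
  },
});
```

Shuffling alone is worth it: ordering-dependent tests fail loudly in CI instead of silently passing for months.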
Old thinking: 70% unit, 20% integration, 10% E2E. New thinking (Kent C. Dodds' Testing Trophy):
- Static (types, lint) – fast, runs on save
- Integration – the actual sweet spot
- Unit – for pure functions only
- E2E – for critical user paths
Most of our tests are integration β they exercise real module wiring against mock I/O boundaries. Pure unit tests are reserved for genuinely pure functions (math, parsers, formatters).
Ask of every test: what bug would make this fail? If you can't answer that, delete the test. It's load-bearing noise.
sylphx-testing-playbook/
├── README.md – you are here
├── methods/ – one markdown per testing method
│   ├── 01-unit-testing.md
│   ├── 02-type-level-testing.md
│   └── ...
├── matrix/
│   ├── maturity-model.md – crawl / walk / run / fly thresholds
│   └── selection-guide.md – decision tree: which method next?
├── templates/ – copy-paste config starters
│   ├── stryker/
│   ├── playwright/
│   ├── k6/
│   └── ...
├── case-studies/
│   └── sylphx-managed-resource-controller.md
├── languages/ – language-specific notes
│   ├── typescript.md
│   ├── python.md
│   ├── go.md
│   └── rust.md
└── .github/workflows/
    └── ci.yml – link check + markdown lint
See CONTRIBUTING.md. TL;DR:
- Add new methods as methods/NN-name.md following the template
- Every method needs a working example in templates/
- Every claim needs either a commit link or a tool with >1000 stars
- Keep it opinionated – the value is in the recommendations, not the neutrality
MIT. Fork it, adapt it, reference it from your org's engineering handbook.
Born out of the Sylphx managed-resource controller refactor (Phase 5 → 7, 2026 Q2). Inspired by:
- Kent C. Dodds – Testing Trophy
- Martin Fowler – Integration vs Unit distinction
- Google SRE – SLO / error budget doctrine
- Netflix – Chaos engineering canon
- Stripe – Contract testing at scale
- AWS – TLA+ in cloud infrastructure