feat(testing): add cfn_handler.testing module with replay() and helpers #24
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ac19152e16
polling_interval_minutes=self._polling_interval_minutes,
)
provision = self._provision_poller if self._provision_poller is not None else setup_polling
provision(event, context.function_name, self._polling_interval_minutes, None)
Preserve legacy poller callable arity in provisioning call
The provisioning path now always calls provision(..., None) with four positional arguments. Any existing monkey-patched setup_polling (or custom callable) that still uses the prior 3-argument shape will raise TypeError; this exception is not handled in this block (only CfnHandlerError is), so dispatch falls into the generic backstop and returns an internal-error failure instead of the expected polling behavior. Using keyword args (or omitting the explicit region when not needed) avoids this regression.
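One possible reading of this suggestion is sketched below: pass the extra argument by keyword, and only when it is actually needed, so a three-argument callable keeps working. The names `legacy_poller`, `regional_poller`, `provision_polling`, and the `region` parameter are hypothetical stand-ins; the real `setup_polling` signature is not shown in this review.

```python
from typing import Any, Callable, Optional


# Hypothetical stand-ins: parameter names are assumptions for illustration.
def legacy_poller(event: dict, function_name: str, interval_minutes: int) -> str:
    """A pre-existing 3-argument patch, as a user might have written it."""
    return f"legacy:{function_name}:{interval_minutes}"


def regional_poller(event: dict, function_name: str, interval_minutes: int,
                    region: Optional[str] = None) -> str:
    """The newer 4-parameter shape."""
    return f"regional:{function_name}:{interval_minutes}:{region}"


def provision_polling(provision: Callable[..., Any], event: dict,
                      function_name: str, interval_minutes: int,
                      region: Optional[str] = None) -> Any:
    """Call the poller without forcing a fourth positional argument on it."""
    if region is None:
        # Nothing extra to pass: the legacy 3-argument shape keeps working.
        return provision(event, function_name, interval_minutes)
    # Only callers that actually need a region use the new parameter.
    return provision(event, function_name, interval_minutes, region=region)
```

With this shape, a monkey-patched three-argument poller is never handed an argument it cannot accept, so the `TypeError` described above does not arise on the default path.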
Adds the OpenSpec change `add-testing-helpers` documenting:
* proposal.md - rationale, scope (replay(), Replay, factories,
assertions, pytest fixtures), explicit non-goals (no integration
helpers, no fluent builders, no async support), and the
soft-deprecation path for the existing `test_mode` flag.
* design.md - architecture, the dispatch flow diagram, eight key
decisions each with at least one rejected alternative, risks,
and open questions reserved for the implementation phase.
* specs/testing-helpers/spec.md - 9 ADDED requirements with ~25
scenarios covering the full public contract.
* tasks.md - 13 phases / 43 verifiable tasks in TDD order.
Adds `docs/ROADMAP.md` capturing the wider library-surface evolution
discussion: testing helpers (this change), better logging, optional
idempotency module, typed events for v2.0, and the cfn-lint plugin
parallel-track work. The Decision Log section records explicit non-goals
(CDK construct, SFN polling, CFN macros, async handlers) with rationale.
This commit lands the plan only - no implementation. Subsequent commits
in this PR implement it, ending with the test-suite migration.
Introduces a public testing surface so users can unit-test custom-resource
handlers without HTTP, without boto3, and without reaching into
`cfn_handler._internal/`.
Public API (`cfn_handler.testing`):
* `Replay` - frozen dataclass capturing the dispatch outcome
(status / data / reason / no_echo / payload / request_type /
physical_resource_id). The `status` field is one of
"SUCCESS" / "FAILED" / "DEFERRED"; the last is a replay-only
sentinel signalling "would have entered polling".
* `CustomResource.replay(event, context=None)` - drives the full
dispatch pipeline in-process using internal seams to capture the
response payload. No HTTP, no boto3 import. Polling is stubbed:
a deferred replay mutates the event with marker keys and returns
`Replay(status="DEFERRED")`; a follow-up `replay()` with the
mutated event correctly resumes through the poll handler.
* `make_event(...)` / `make_context(...)` - factories with safe
defaults (RFC 6761 `example.invalid`, AWS-reserved 111111111111
account ID). `make_event` enforces `physical_resource_id` for
Update/Delete events.
* `assert_success` / `assert_failed` / `assert_deferred` -
pytest-style assertion helpers with informative AssertionError
messages on mismatch.
* pytest fixtures `cfn_create_event`, `cfn_update_event`,
`cfn_delete_event`, `cfn_lambda_context` - auto-discovered via
a new `pytest11` entry point. No `pytest_plugins` declaration
required in user conftest.
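The shape of this surface can be sketched in a few lines. This is an illustrative stand-in, not the module's implementation: the field names follow the list above, but the types, defaults, and exact event keys are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class Replay:
    """Illustrative stand-in; field types and defaults are assumed."""
    status: str  # "SUCCESS" / "FAILED" / "DEFERRED"
    request_type: str
    data: dict = field(default_factory=dict)
    reason: Optional[str] = None
    no_echo: bool = False
    physical_resource_id: Optional[str] = None
    payload: dict = field(default_factory=dict)


def make_event(request_type: str = "Create",
               physical_resource_id: Optional[str] = None,
               **overrides: Any) -> dict:
    """Build an event with safe defaults; Update/Delete need a physical ID."""
    if request_type in ("Update", "Delete") and physical_resource_id is None:
        raise ValueError(f"{request_type} events require physical_resource_id")
    event = {
        "RequestType": request_type,
        "ResponseURL": "https://example.invalid/response",  # reserved, never resolves
        "StackId": "arn:aws:cloudformation:us-east-1:111111111111:stack/demo/1",
        "RequestId": "req-1",
        "LogicalResourceId": "Demo",
        "ResourceType": "Custom::Demo",
        "ResourceProperties": {},
    }
    if physical_resource_id is not None:
        event["PhysicalResourceId"] = physical_resource_id
    event.update(overrides)
    return event


def assert_success(replay: Replay, data: Optional[dict] = None) -> None:
    """Fail with a message that shows what the replay actually contained."""
    if replay.status != "SUCCESS":
        raise AssertionError(
            f"expected SUCCESS, got {replay.status} (reason: {replay.reason!r})")
    if data is not None and replay.data != data:
        raise AssertionError(f"expected data {data!r}, got {replay.data!r}")
```

Freezing the dataclass is what removes the "reset between assertions" hazard of a mutable capture attribute: each replay yields a new value that later dispatches cannot overwrite.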
Internal seams (private API, used by replay() but also available to
power users via the constructor kwargs `transport=`,
`provision_poller=`, `teardown_poller=`):
* `Transport` - `Callable[[str, dict], None]` replacing the default
urllib PUT (`send_response`). Late-bound default lookup so existing
tests that `patch("cfn_handler.resource.send_response")` continue
to work.
* `PollerProvision` / `PollerTeardown` - mirror seams for the
boto3-using polling provisioning/teardown calls. Late-bound for
backwards compatibility with existing patches.
* `CustomResource._replay_seams(...)` - context manager that swaps
all three seams atomically and restores them on exit (including
when the block exits with an exception). Used by the runner; not
part of the public API contract.
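The two ideas in this list, a late-bound default and a swap-and-restore context manager, can be illustrated with a single seam. The `Resource` class and `_swap_transport` name below are hypothetical miniatures, assumed for the sketch rather than taken from the library.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional

Transport = Callable[[str, dict], None]


def send_response(url: str, payload: dict) -> None:
    """Placeholder for the real HTTP PUT; patched or replaced in tests."""
    raise RuntimeError("network access is not expected in this sketch")


class Resource:
    """Hypothetical miniature of a resource with one injectable seam."""

    def __init__(self, transport: Optional[Transport] = None) -> None:
        self._transport = transport

    def _send(self, url: str, payload: dict) -> None:
        # Late binding: the module-level name is looked up at call time, so
        # patching `send_response` after construction still takes effect.
        transport = self._transport if self._transport is not None else send_response
        transport(url, payload)

    @contextmanager
    def _swap_transport(self, transport: Transport) -> Iterator[None]:
        previous = self._transport
        self._transport = transport
        try:
            yield
        finally:
            # Runs on normal exit and when the body raises.
            self._transport = previous
```

Had the constructor stored `send_response` itself as the default, an existing `patch(...)` on the module attribute would no longer reach the call; resolving the name inside `_send` avoids that.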
CI / tooling:
* `pyproject.toml`: registers the `pytest11` entry point.
* `justfile` + `.github/workflows/ci.yml`: switch `test-cov` from
`pytest --cov` to `coverage run -m pytest`. The pytest11 entry
point causes pytest to import `cfn_handler.testing.fixtures` (and
transitively `cfn_handler`) during plugin collection, BEFORE the
pytest-cov instrumentation hooks attach. Module-level code in
`__init__.py` then runs uninstrumented and the report shows
artificial 0% on those lines, dropping aggregate coverage to ~68%.
`coverage run` ensures the tracer is active before any imports.
Documented in both the recipe and the workflow step.
Tests added:
* `tests/unit/test_transport_seam.py` - the seam intercepts; default
behaviour preserved.
* `tests/unit/test_poller_seam.py` - poller stubs work; boto3 is
never imported during a stubbed deferral.
* `tests/unit/testing/test_replay.py` - SUCCESS/FAILED replays;
`Replay` is frozen; no HTTP I/O; default context fallback.
* `tests/unit/testing/test_replay_polling.py` - DEFERRED status,
event mutation, two-step deferral->resume flow.
* `tests/unit/testing/test_factories.py` - make_event /
make_context with overrides + validation.
* `tests/unit/testing/test_assertions.py` - all helpers, both
pass and fail paths, message contents on failure.
* `tests/integration/test_replay_parity.py` - same handler via
`__call__` (moto + fake transport) vs `replay()` produce
equivalent payloads.
* `tests/integration/test_fixture_discovery.py` - pytest11 entry
point auto-discovers fixtures in a fresh subprocess project,
fixture invocations are independent.
No breaking changes. The legacy `test_mode` flag continues to work.
The deprecation of `test_mode` and `last_response` lands in the next
commit.
…play()
The legacy `test_mode=True` constructor flag and `last_response` capture
attribute are superseded by the new `replay()` method shipped in the
previous commit. They had known issues:
* Mutable state on the resource (tests must reset `last_response`
between assertions or risk false positives).
* Sentinel string `__cfn_handler_polling__` for the polling-defer
case lives on the public `last_response` surface with no type
or documentation guarantees.
* Polling re-invocation can't be tested: `test_mode` short-circuits
`setup_polling` entirely, so the marker keys never get added to
the event and a follow-up dispatch can't be simulated.
This commit:
* Emits a `DeprecationWarning` from `CustomResource.__init__` when
`test_mode=True`, with a message pointing at `replay()` and the
`cfn_handler.testing` module. `stacklevel=2` so the warning
surfaces at the user's call site.
* Updates the docstrings on `test_mode` (constructor parameter) and
`last_response` (instance attribute) to mark them as deprecated
and reference `replay()`.
* Adds `tests/unit/test_test_mode_deprecation.py` covering the
warning, that `test_mode=False` does NOT warn, and that the
legacy behaviour still functions verbatim (so existing user code
keeps working in v1.x).
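A minimal sketch of the warning path, assuming a stripped-down constructor; the message wording is illustrative, not the library's exact text.

```python
import warnings


class CustomResource:
    """Hypothetical miniature: only the deprecation path is modelled."""

    def __init__(self, test_mode: bool = False) -> None:
        if test_mode:
            warnings.warn(
                "test_mode is deprecated; use replay() from cfn_handler.testing",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
        self.test_mode = test_mode


def construct(test_mode: bool) -> list:
    """Record every warning raised while building a resource."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        CustomResource(test_mode=test_mode)
    return caught
```

`construct(True)` records one `DeprecationWarning`, while `construct(False)` records none, which mirrors the two cases the new test file covers.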
Removal is scheduled for v2.0 (separate change). The behaviour is
unchanged in v1.3 - users see only the warning. The internal test
suite migrates onto `replay()` in the next commit; until that lands
the project's own pytest `filterwarnings` would promote the warning
to an error in tests that still use `test_mode=True`. To avoid
artificial failures during the migration window, that single warning
is filtered to `ignore` in pyproject.toml's `[tool.pytest.ini_options]
.filterwarnings` block - reverted in the migration commit.
Bulk migration of the 100+ pre-existing references to
`CustomResource(test_mode=True)` and `resource.last_response` over to
the new `replay()` API. Verbatim translation:
    CustomResource(test_mode=True)               → CustomResource()
    resource(event, ctx)                         → replay = resource.replay(event, ctx)
    resource.last_response["Status"] == "X"      → replay.status == "X"
    resource.last_response["Data"]               → replay.data
    resource.last_response["Reason"]             → replay.reason
    resource.last_response["PhysicalResourceId"] → replay.physical_resource_id
    resource.last_response["NoEcho"]             → replay.no_echo
Where the assertion shape matches, the migration uses the helpers
`assert_success(replay, data=...)` / `assert_failed(replay,
reason_contains=...)` for clearer intent and informative error
messages.
Files migrated:
* tests/unit/test_resource.py - decorator registration tests,
lifecycle dispatch, exception handling, PhysicalResourceId
semantics, init_failure, log_level acceptance.
* tests/unit/test_backstops.py - the lone test_mode-specific test
(`test_safe_teardown_skipped_in_test_mode`) was moved to
`test_test_mode_deprecation.py` since it tests deprecated
behaviour we keep working until v2.0.
* tests/unit/test_polling_dispatch.py - the
`__cfn_handler_polling__` sentinel test (also deprecated-only
behaviour) was moved to test_test_mode_deprecation.py.
* tests/unit/test_state_machine.py - hypothesis-driven invariant
tests, all migrated to inspect `Replay` fields rather than
`last_response`.
After this commit `grep -rn 'test_mode\|last_response' tests/` returns
hits only from `test_test_mode_deprecation.py` (intentional, testing
the deprecated path) and a single docstring mention in
`test_resource.py`. The temporary `filterwarnings` exception added
in the previous commit is removed - any new test using `test_mode=True`
would now fail the suite (as intended), forcing the test author to use
the new API.
Coverage stays at 98%.
Adds a 'Testing your handlers' section between 'Examples' and
'Project status' that:
* Shows a minimal one-screen example: build a CustomResource,
register a handler, call `replay(make_event())`, assert via
`assert_success`.
* Lists the public testing surface (`replay`, `make_event`,
`make_context`, the three assertion helpers, and the four
auto-discovered pytest fixtures).
* Calls out the polling-deferral semantics: first `replay()`
returns `Replay(status='DEFERRED')` and mutates the event;
a second `replay()` resumes through the poll handler. Useful
for testing both halves of a polled lifecycle without
provisioning EventBridge rules.
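The two-step flow can be modelled with a toy dispatcher. Everything here is an assumption made for illustration: the marker key name, the free-function `replay`, and the convention that a handler returning `None` means "not finished yet".

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed marker key; the library's real marker keys are not named here.
POLL_MARKER = "_PollingInProgress"


@dataclass(frozen=True)
class Replay:
    status: str
    data: Optional[dict] = None


def replay(event: dict,
           create: Callable[[dict], Optional[dict]],
           poll: Callable[[dict], Optional[dict]]) -> Replay:
    """Dispatch once; a handler returning None means "not finished yet"."""
    handler = poll if event.get(POLL_MARKER) else create
    result = handler(event)
    if result is None:
        event[POLL_MARKER] = True  # mutate in place so the caller can resume
        return Replay(status="DEFERRED")
    return Replay(status="SUCCESS", data=result)


# First call defers and marks the event; second call resumes via `poll`.
event = {"RequestType": "Create"}
first = replay(event, create=lambda e: None, poll=lambda e: {"Ready": True})
second = replay(event, create=lambda e: None, poll=lambda e: {"Ready": True})
```

Because the marker lives on the event rather than on the resource, the test itself carries the state between the two calls, and no scheduler is involved.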
The detailed contract lives in docstrings, the spec, and the
roadmap doc; the README only carries enough context to nudge a
new reader toward unit-testing their handlers from day one.
…ity)
The CodeQL GitHub Action's post-analysis step calls the GH REST API
(`/repos/{owner}/{repo}/actions/runs/{run_id}`) for telemetry and
status-page reporting. Under `act`, the synthesized GITHUB_RUN_ID
doesn't exist on github.com, so the call 404s and the action sets the
job status to JOB_STATUS_CONFIGURATION_ERROR even when:
* 174/174 queries loaded successfully
* 42/42 Python files extracted
* SARIF generated and post-processed
* Zero findings in our code
The result is gha-pre-release ALWAYS failing on the codeql step under
act, which masks real failures in upstream steps. Cleanest fix: drop
CodeQL from the local replay sequence with a clear note about why,
and rely on the real GH Actions run on every PR (which has actual
access to the workflow-runs endpoint via GITHUB_TOKEN).
Changes:
* Recipe header documents the skip with rationale and a one-liner
showing how to run CodeQL locally on demand for ad-hoc inspection.
The user runs `act push` directly and inspects the SARIF; a
configuration-error exit with a successfully-generated SARIF means
no findings.
* Step counters renumbered from N/7 to N/6 throughout the recipe.
* Step 5 prints an explicit SKIPPED notice explaining the situation
so the user isn't left wondering whether CodeQL ran.
* Final "safe to merge" message clarifies CodeQL still gates merge
on the actual PR via real GH Actions.
Discovered while running gha-pre-release on the testing-helpers PR -
CodeQL kept reporting failure despite analyzing cleanly.
Move `ReplayRequestType` and `ReplayStatus` into the
`if TYPE_CHECKING:` block. They were imported at module scope but only
referenced in annotations (resolved as strings via
`from __future__ import annotations`) and in a string-form
`cast("ReplayRequestType", ...)` call (per ruff's TC006 rule, which
prefers the string form to keep type-only symbols out of the runtime
import graph). CodeQL's py/unused-import sees the runtime import and
the string-only references, and reasonably concludes the import is
unused.
Cleaner resolution: keep the imports type-only, expand the explanatory
comment so future readers see why both static analyzers (CodeQL +
ruff TC006) end up happy with this shape.
The two CodeQL py/cyclic-import alerts on the surrounding
`if TYPE_CHECKING:` block remain false positives — CodeQL does not
model conditional imports, and the runtime cycle is broken by the
lazy import in `CustomResource.replay` (resource.py:354). Both have
been dismissed in the GitHub UI with rationale (alerts #15, #16).
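The type-only import shape described here looks roughly like the following. `Decimal` stands in for `ReplayRequestType` / `ReplayStatus`, and a quoted annotation stands in for `from __future__ import annotations` so that the sketch stays self-contained.

```python
from typing import TYPE_CHECKING, cast

if TYPE_CHECKING:
    # Seen by type checkers only; this import never executes at runtime.
    from decimal import Decimal


def as_amount(value: object) -> "Decimal":
    # The string form of cast() means `Decimal` is not needed as a runtime name.
    return cast("Decimal", value)
```

At runtime `cast` simply returns its second argument, so the function works even though `Decimal` was never imported into the module namespace.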
PR #24 (commit f0f9507) shipped the implementation: the `cfn_handler.testing` module with `replay()`, `Replay`, `make_event` / `make_context`, the three assertion helpers, and auto-discovered pytest fixtures, plus the soft-deprecation of `test_mode` / `last_response`.
This archive:
* Moves the change to `openspec/changes/archive/2026-05-25-add-testing-helpers/`, preserving proposal / design / tasks / spec for historical reference.
* Promotes the delta spec to a new top-level capability at `openspec/specs/testing-helpers/spec.md` (verbatim copy of the ADDED requirements + a fresh Purpose section).
`openspec validate testing-helpers --type spec --strict` passes. `openspec list` is now empty: no active changes.
Issue
No tracked issue. `cfn-handler` users currently have no first-class
way to unit-test their custom-resource handlers without reaching into
`cfn_handler._internal/` (which the README explicitly says is unstable).
The legacy `test_mode=True` flag introduced during early v1 development
worked around this but accumulated known issues: mutable state on the
resource, an awkward sentinel string for the polling-defer case, and an
inability to test polling re-invocation because the marker keys are
never written to the event in test mode.
Planned in OpenSpec change `add-testing-helpers`. Roadmap context:
`docs/ROADMAP.md` (this is item #1 of the v1.x evolution; item #2 is
better logging, item #3 is optional idempotency).
Summary
Adds a public `cfn_handler.testing` module so users can unit-test
custom-resource handlers in-process: no HTTP, no `boto3`, no `moto`
setup. The headline API is `CustomResource.replay(event, context=None)`,
which executes the full dispatch pipeline and returns a structured
`Replay` value. Polling-aware: a deferred replay mutates the event
with marker keys and returns `Replay(status="DEFERRED")`; a follow-up
`replay()` of the mutated event resumes through the poll handler,
without provisioning a real EventBridge rule.
Soft-deprecates the existing `test_mode` / `last_response` surface
(continues to work in v1.x; emits `DeprecationWarning`; removed in v2.0).
The internal test suite is migrated onto the new API in the same PR
to dogfood it.
Changes
Public API surface (`cfn_handler.testing`)
* `Replay`
* `make_event(...)`
* `make_context(...)` - `LambdaContext`-protocol object
* `assert_success`
* `assert_failed` - `reason_contains`
* `assert_deferred`
* `cfn_*_event` - `pytest11`
* `cfn_lambda_context` - `pytest11`
The module supports zero-pytest, zero-boto3 environments: the entry
point only loads inside a pytest run, and `boto3` is never imported
on the non-polling path.
One-line example
Migration for users on `test_mode`
Mechanical translation:
The polling case improves materially: where `test_mode` only wrote a
`{"__cfn_handler_polling__": True}` sentinel and short-circuited the
provisioning, `replay()` now mutates the event with the real marker
keys, so a follow-up `replay()` correctly resumes through the poll
handler, letting users test both halves of a polled lifecycle without
`moto`.
Tests
* `just test-cov` - 155 passing, 98% line+branch coverage (gate: 95%)
* `just typecheck` - mypy strict + pyright strict, 16 source files, no issues
* `just lint` - ruff + ruff format + cfn-lint, all checks passed
* `just gha-pre-release` - all locally-replayable gating jobs green (CodeQL skipped under `act` due to its post-analysis REST API call against a synthesized run id; validated by real GH Actions on this PR, see commit `ac19152`)
* `replay(make_event(), make_context())` returns `SUCCESS`, fixtures auto-discover
Coverage measurement note
Switched `test-cov` to `coverage run -m pytest` (from `pytest --cov`) in
both `justfile` and `ci.yml`. The new `pytest11` entry point causes
pytest to import `cfn_handler.testing.fixtures` (and transitively
`cfn_handler` itself) at plugin-collection time, before pytest-cov
attaches its instrumentation hooks. Module-level code in `__init__.py`
then runs uninstrumented and the report shows artificial 0% coverage on
those lines, dropping aggregate to ~68%. `coverage run -m pytest`
attaches the tracer before any imports. Documented in both the recipe
and the workflow.
Breaking changes?
No. v1.3 ships the new `replay()` API as purely additive. The
`test_mode` flag and `last_response` attribute continue to function
exactly as before; they just emit a `DeprecationWarning` directing
users to the new API. Removal is scheduled for v2.0 (separate change,
no timeline).
The internal test suite is migrated as part of this PR, but that's
a project-internal concern, not a breaking change for downstream
consumers.
Checklist
* … (`feat(testing):`)
* … (`cfn_handler.testing`, `docs/ROADMAP.md` updated to reflect "shipped in v1.3.0")
* … (`add-testing-helpers`, validates `--strict`)
Commit-by-commit story (for review)
The squash-merge will collapse these, but each is reviewable on its own
and leaves the suite in a green state at every checkpoint:
* `chore(openspec): propose v1.3 testing helpers + roadmap` - plan only
* `feat(testing): add cfn_handler.testing module with replay() and helpers` - the implementation
* `feat(resource): deprecate test_mode and last_response in favour of replay()` - DeprecationWarning + tests + temporary `filterwarnings` ignore for the migration window
* `test: migrate existing test suite from test_mode to replay()` - bulk translation, removes the temporary filter
* `docs(readme): add Testing section showing replay() workflow`
* `ci(justfile): skip CodeQL in gha-pre-release (act/CodeQL incompatibility)` - tangential discovery, kept on the same branch since it directly affects validation of this PR