feat(testing): add cfn_handler.testing module with replay() and helpers #24

Merged
igorlg merged 7 commits into main from feat/add-testing-helpers
May 22, 2026

@igorlg igorlg commented May 22, 2026

Issue

No tracked issue — cfn-handler users currently have no first-class
way to unit-test their custom-resource handlers without reaching into
cfn_handler._internal/ (which the README explicitly says is unstable).
The legacy test_mode=True flag introduced during early v1 development
worked around this but accumulated known issues: mutable state on the
resource, an awkward sentinel string for the polling-defer case, and an
inability to test polling re-invocation because the marker keys are
never written to the event in test mode.

Planned in OpenSpec change add-testing-helpers.
Roadmap context: docs/ROADMAP.md (this is item #1
of the v1.x evolution; item #2 is better logging, item #3 is optional
idempotency).

Summary

Adds a public cfn_handler.testing module so users can unit-test
custom-resource handlers in-process: no HTTP, no boto3, no moto
setup. The headline API is CustomResource.replay(event, context=None)
which executes the full dispatch pipeline and returns a structured
Replay value. Polling-aware: a deferred replay mutates the event
with marker keys and returns Replay(status="DEFERRED"); a follow-up
replay() of the mutated event resumes through the poll handler —
without provisioning a real EventBridge rule.
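
As a rough illustration of the idea (not the library's implementation), the sketch below models an in-process replay: the dispatcher reports through a transport callable, and `replay()` swaps in a capturing transport and folds the payload into a frozen value. `ToyResource`, the trimmed-down `Replay`, and the payload keys are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


# Hypothetical stand-in for the PR's Replay dataclass (subset of fields).
@dataclass(frozen=True)
class Replay:
    status: str                      # "SUCCESS" or "FAILED" in this toy
    data: Optional[dict] = None
    reason: Optional[str] = None


class ToyResource:
    """Toy dispatcher: routes by RequestType and reports through a transport."""

    def __init__(self) -> None:
        self._handlers = {}

    def create(self, fn: Callable) -> Callable:
        self._handlers["Create"] = fn
        return fn

    def _dispatch(self, event: dict, context: Any, transport: Callable) -> None:
        try:
            data = self._handlers[event["RequestType"]](event, context)
            payload = {"Status": "SUCCESS", "Data": data or {}}
        except Exception as exc:     # any handler error becomes a FAILED response
            payload = {"Status": "FAILED", "Reason": str(exc)}
        transport(event["ResponseURL"], payload)

    def replay(self, event: dict, context: Any = None) -> Replay:
        captured = []
        # The capturing transport replaces the HTTP PUT: nothing leaves the process.
        self._dispatch(event, context, lambda url, payload: captured.append(payload))
        payload = captured[0]
        return Replay(payload["Status"], payload.get("Data"), payload.get("Reason"))


resource = ToyResource()


@resource.create
def on_create(event, ctx):
    return {"Endpoint": "https://x.example"}


result = resource.replay(
    {"RequestType": "Create", "ResponseURL": "https://example.invalid/"}
)
```

Because the result is a frozen value rather than state stored on the resource, two replays cannot leak into each other.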

Soft-deprecates the existing test_mode / last_response surface
(continues to work in v1.x; emits DeprecationWarning; removed in v2.0).
The internal test suite is migrated onto the new API in the same PR
to dogfood it.

Changes

src/cfn_handler/testing/                       NEW public module
├── __init__.py                                Re-exports __all__
├── py.typed
├── fixtures.py                                pytest11 entry point
└── _internal/
    ├── replay_result.py                       Replay frozen dataclass
    ├── runner.py                              run_replay() driver
    ├── event_factory.py                       make_event()
    ├── context_factory.py                     make_context()
    └── assertions.py                          assert_success / _failed / _deferred

src/cfn_handler/resource.py
├── + Transport / PollerProvision / PollerTeardown seam kwargs
│     (late-bound defaults; existing patches keep working)
├── + _replay_seams() context manager
├── + replay() public method
└── + DeprecationWarning when test_mode=True

src/cfn_handler/_internal/{response,poller}.py
└── + Transport, PollerProvision, PollerTeardown type aliases

pyproject.toml                                 + [pytest11] entry point
justfile                                       test-cov uses `coverage run`
.github/workflows/ci.yml                       same fix in CI
README.md                                      + Testing section

tests/                                         migrated + new
├── unit/test_resource.py                      ~104 refs migrated to replay()
├── unit/test_backstops.py                     test_mode-specific test moved
├── unit/test_polling_dispatch.py              sentinel test moved
├── unit/test_state_machine.py                 hypothesis tests migrated
├── unit/test_test_mode_deprecation.py         NEW (deprecation tests)
├── unit/test_transport_seam.py                NEW
├── unit/test_poller_seam.py                   NEW
├── unit/testing/                              NEW (4 files, 42 tests)
├── integration/test_replay_parity.py          NEW (production-vs-replay)
└── integration/test_fixture_discovery.py      NEW (pytest11 entry point)

Public API surface (cfn_handler.testing)

| Name | Kind | Purpose |
|---|---|---|
| Replay | frozen dataclass | Outcome of a replay (status / data / reason / no_echo / payload / ...) |
| make_event(...) | factory | Canonical CFN custom-resource event with safe defaults |
| make_context(...) | factory | Minimal LambdaContext-protocol object |
| assert_success | helper | Pass when status=SUCCESS, optionally match data / pid / no_echo |
| assert_failed | helper | Pass when status=FAILED, optionally match reason_contains |
| assert_deferred | helper | Pass when status=DEFERRED (replay-only sentinel) |
| cfn_*_event | pytest fixture | Auto-discovered via pytest11 |
| cfn_lambda_context | pytest fixture | Auto-discovered via pytest11 |

The module supports zero-pytest, zero-boto3 environments — the entry
point only loads inside a pytest run, and boto3 is never imported
on the non-polling path.
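
One way to keep a heavy dependency off the fast path is to import it inside the function that needs it. The sketch below shows that pattern under stated assumptions: the stdlib module `colorsys` stands in for boto3 so the example runs anywhere, and `handle` / `provision_poller` are hypothetical names, not the library's.

```python
import importlib
import sys

STAND_IN = "colorsys"  # stand-in for boto3 so the sketch runs without it


def provision_poller(event: dict) -> None:
    """Polling path: the heavy dependency is imported only when this runs."""
    importlib.import_module(STAND_IN)
    event["poller_provisioned"] = True


def handle(event: dict, needs_polling: bool) -> str:
    if needs_polling:
        provision_poller(event)
        return "DEFERRED"
    return "SUCCESS"


sys.modules.pop(STAND_IN, None)            # start from a clean slate
status = handle({"RequestType": "Create"}, needs_polling=False)
imported_on_fast_path = STAND_IN in sys.modules
```

Checking `sys.modules` after the call is a cheap way to assert, in a unit test, that the non-polling path never triggered the import.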

One-line example

from cfn_handler import CustomResource
from cfn_handler.testing import assert_success, make_event

resource = CustomResource()

@resource.create
def on_create(event, ctx):
    return {"Endpoint": "https://x.example"}

assert_success(resource.replay(make_event()), data={"Endpoint": "https://x.example"})

Migration for users on test_mode

Mechanical translation:

# Before (still works in v1.x, emits DeprecationWarning)
resource = CustomResource(test_mode=True)
resource(event, ctx)
assert resource.last_response["Status"] == "SUCCESS"
assert resource.last_response["Data"] == {"x": 1}

# After
resource = CustomResource()
replay = resource.replay(event, ctx)
assert replay.status == "SUCCESS"
assert replay.data == {"x": 1}
# or, more idiomatically:
assert_success(replay, data={"x": 1})

The polling case improves materially: where test_mode only wrote a
{"__cfn_handler_polling__": True} sentinel and short-circuited the
provisioning, replay() now mutates the event with the real marker
keys, so a follow-up replay() correctly resumes through the poll
handler — letting users test both halves of a polled lifecycle without
moto.
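
The two-step flow can be pictured with a toy model. This is only a sketch: the `ToyPollingMarker` key and the free-standing `replay` function are invented for illustration, and the real marker keys are not shown in this PR.

```python
from typing import Callable

MARKER = "ToyPollingMarker"   # hypothetical key, not the library's marker


def replay(event: dict, create: Callable, poll: Callable) -> str:
    """Toy two-phase dispatch: returns 'SUCCESS' or 'DEFERRED'."""
    handler = poll if event.get(MARKER) else create
    done = handler(event)
    if done:
        return "SUCCESS"
    # Not finished: stamp the event in place so the next replay routes to
    # the poll handler, instead of provisioning a scheduler rule.
    event[MARKER] = True
    return "DEFERRED"


attempts = {"count": 0}


def on_create(event: dict) -> bool:
    return False                      # work started, not finished yet


def on_poll(event: dict) -> bool:
    attempts["count"] += 1
    return attempts["count"] >= 2     # finishes on the second poll


event = {"RequestType": "Create"}
statuses = [replay(event, on_create, on_poll) for _ in range(3)]
```

Here the same mutated `event` is fed back in each time, which is what lets a test walk through both halves of the lifecycle.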

Tests

  • New tests added for new behaviour (~50 across replay, factories, assertions, fixtures, parity)
  • Existing tests still pass after migration onto the new API (~100 sites translated)
  • just test-cov: 155 passing, 98% line+branch coverage (gate: 95%)
  • just typecheck — mypy strict + pyright strict, 16 source files, no issues
  • just lint — ruff + ruff format + cfn-lint, all checks passed
  • just gha-pre-release — all locally-replayable gating jobs green (CodeQL skipped under act due to its post-analysis REST API call against a synthesized run id; validated by real GH Actions on this PR — see commit ac19152)
  • Manual smoke test in a fresh venv outside the repo: replay(make_event(), make_context()) returns SUCCESS, fixtures auto-discover

Coverage measurement note

Switched test-cov to coverage run -m pytest (from pytest --cov) in
both justfile and ci.yml. The new pytest11 entry point causes
pytest to import cfn_handler.testing.fixtures (and transitively
cfn_handler itself) at plugin-collection time — before pytest-cov
attaches its instrumentation hooks. Module-level code in __init__.py
then runs uninstrumented and the report shows artificial 0% coverage on
those lines, dropping aggregate to ~68%. coverage run -m pytest
attaches the tracer before any imports. Documented in both the recipe
and the workflow.
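
The ordering problem can be reproduced in miniature with `sys.settrace`. This is a simplified sketch: coverage.py uses its own tracer, and the two-line `SOURCE` string merely stands in for module-level code in `__init__.py`.

```python
import sys

SOURCE = "x = 1\ny = 2\n"                  # stands in for module-level code
CODE = compile(SOURCE, "<toy_module>", "exec")


def record(order: list) -> list:
    """Run the steps in `order`; return line numbers the tracer observed."""
    seen = []

    def tracer(frame, event, arg):
        if frame.f_code is CODE and event == "line":
            seen.append(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    try:
        for step in order:
            if step == "attach":
                sys.settrace(tracer)
            elif step == "import":
                exec(CODE, {})             # the "import" of the toy module
    finally:
        sys.settrace(previous)             # always restore the prior tracer
    return seen


late = record(["import", "attach"])        # like pytest --cov with the entry point
early = record(["attach", "import"])       # like coverage run -m pytest
```

In the `late` case the module body has already run by the time the tracer is attached, so none of its lines are observed, which mirrors the artificial 0% described above.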

Breaking changes?

No. v1.3 ships the new replay() API as purely additive. The
test_mode flag and last_response attribute continue to function
exactly as before — they just emit a DeprecationWarning directing
users to the new API. Removal is scheduled for v2.0 (separate change,
no timeline).

The internal test suite is migrated as part of this PR, but that's
a project-internal concern, not a breaking change for downstream
consumers.

Checklist

  • Conventional Commits prefix in the PR title (feat(testing):)
  • CHANGELOG entry will be generated automatically by release-please
  • Documentation updated (README "Testing" section, docstrings on every public name in cfn_handler.testing, docs/ROADMAP.md updated to reflect "shipped in v1.3.0")
  • OpenSpec change opened (add-testing-helpers, validates --strict)

Commit-by-commit story (for review)

The squash-merge will collapse these, but each is reviewable on its own
and leaves the suite in a green state at every checkpoint:

  1. chore(openspec): propose v1.3 testing helpers + roadmap — plan only
  2. feat(testing): add cfn_handler.testing module with replay() and helpers — the implementation
  3. feat(resource): deprecate test_mode and last_response in favour of replay() — DeprecationWarning + tests + temporary filterwarnings ignore for the migration window
  4. test: migrate existing test suite from test_mode to replay() — bulk translation, removes the temporary filter
  5. docs(readme): add Testing section showing replay() workflow
  6. ci(justfile): skip CodeQL in gha-pre-release (act/CodeQL incompatibility) — tangential discovery, kept on the same branch since it directly affects validation of this PR

Comment thread src/cfn_handler/resource.py Dismissed
Comment thread src/cfn_handler/testing/_internal/runner.py Fixed
Comment thread src/cfn_handler/testing/_internal/runner.py Fixed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ac19152e16

polling_interval_minutes=self._polling_interval_minutes,
)
provision = self._provision_poller if self._provision_poller is not None else setup_polling
provision(event, context.function_name, self._polling_interval_minutes, None)

P2 Badge Preserve legacy poller callable arity in provisioning call

The provisioning path now always calls provision(..., None) with four positional arguments. Any existing monkey-patched setup_polling (or custom callable) that still uses the prior 3-argument shape will raise TypeError; this exception is not handled in this block (only CfnHandlerError is), so dispatch falls into the generic backstop and returns an internal-error failure instead of the expected polling behavior. Using keyword args (or omitting the explicit region when not needed) avoids this regression.
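
The arity concern can be reproduced in isolation. In this sketch the function names, the `"my-fn"` argument, and the `region` keyword are assumptions for illustration; only the three-versus-four argument shapes come from the review comment.

```python
def legacy_setup_polling(event, function_name, interval_minutes):
    """Pre-change 3-argument shape, e.g. an older monkey-patched stub."""
    event["polling"] = (function_name, interval_minutes)


def provision_positional(provision, event):
    # Mirrors the flagged call shape: four positional arguments.
    provision(event, "my-fn", 2, None)


def provision_tolerant(provision, event, region=None):
    # Only pass the extra argument when there is something to pass.
    if region is None:
        provision(event, "my-fn", 2)
    else:
        provision(event, "my-fn", 2, region=region)


try:
    provision_positional(legacy_setup_polling, {})
    positional_error = None
except TypeError as exc:
    positional_error = exc

event = {}
provision_tolerant(legacy_setup_polling, event)
```

The positional call fails against the legacy stub with `TypeError`, while the tolerant variant keeps working with both shapes.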

Comment thread src/cfn_handler/testing/_internal/runner.py Dismissed
Comment thread src/cfn_handler/testing/_internal/runner.py Dismissed
igorlg added 7 commits May 22, 2026 16:28
Adds the OpenSpec change `add-testing-helpers` documenting:

  * proposal.md - rationale, scope (replay(), Replay, factories,
    assertions, pytest fixtures), explicit non-goals (no integration
    helpers, no fluent builders, no async support), and the
    soft-deprecation path for the existing `test_mode` flag.
  * design.md - architecture, the dispatch flow diagram, eight key
    decisions each with at least one rejected alternative, risks,
    and open questions reserved for the implementation phase.
  * specs/testing-helpers/spec.md - 9 ADDED requirements with ~25
    scenarios covering the full public contract.
  * tasks.md - 13 phases / 43 verifiable tasks in TDD order.

Adds `docs/ROADMAP.md` capturing the wider library-surface evolution
discussion: testing helpers (this change), better logging, optional
idempotency module, typed events for v2.0, and the cfn-lint plugin
parallel-track work. The Decision Log section records explicit non-goals
(CDK construct, SFN polling, CFN macros, async handlers) with rationale.

This commit lands the plan only - no implementation. Subsequent commits
in this PR implement it, ending with the test-suite migration.

Introduces a public testing surface so users can unit-test custom-resource
handlers without HTTP, without boto3, and without reaching into
`cfn_handler._internal/`.

Public API (`cfn_handler.testing`):

  * `Replay` - frozen dataclass capturing the dispatch outcome
    (status / data / reason / no_echo / payload / request_type /
    physical_resource_id). The `status` field is one of
    "SUCCESS" / "FAILED" / "DEFERRED"; the last is a replay-only
    sentinel signalling "would have entered polling".
  * `CustomResource.replay(event, context=None)` - drives the full
    dispatch pipeline in-process using internal seams to capture the
    response payload. No HTTP, no boto3 import. Polling is stubbed:
    a deferred replay mutates the event with marker keys and returns
    `Replay(status="DEFERRED")`; a follow-up `replay()` with the
    mutated event correctly resumes through the poll handler.
  * `make_event(...)` / `make_context(...)` - factories with safe
    defaults (RFC 6761 `example.invalid`, AWS-reserved 111111111111
    account ID). `make_event` enforces `physical_resource_id` for
    Update/Delete events.
  * `assert_success` / `assert_failed` / `assert_deferred` -
    pytest-style assertion helpers with informative AssertionError
    messages on mismatch.
  * pytest fixtures `cfn_create_event`, `cfn_update_event`,
    `cfn_delete_event`, `cfn_lambda_context` - auto-discovered via
    a new `pytest11` entry point. No `pytest_plugins` declaration
    required in user conftest.
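
A factory with these properties might look roughly like the sketch below. The signature and default values are assumptions for illustration (the PR does not show `make_event`'s parameters); the event keys follow the CloudFormation custom-resource request format.

```python
import uuid
from typing import Optional


def make_event(
    request_type: str = "Create",
    *,
    properties: Optional[dict] = None,
    physical_resource_id: Optional[str] = None,
) -> dict:
    """Build a CloudFormation custom-resource event with inert defaults."""
    if request_type not in ("Create", "Update", "Delete"):
        raise ValueError(f"unknown RequestType: {request_type!r}")
    if request_type != "Create" and physical_resource_id is None:
        # Update/Delete always refer to a resource that already exists.
        raise ValueError(f"{request_type} events need physical_resource_id")
    event = {
        "RequestType": request_type,
        "ResponseURL": "https://example.invalid/response",   # never resolvable
        "StackId": "arn:aws:cloudformation:us-east-1:111111111111:stack/toy/"
        + str(uuid.uuid4()),
        "RequestId": str(uuid.uuid4()),
        "ResourceType": "Custom::Toy",
        "LogicalResourceId": "ToyResource",
        "ResourceProperties": dict(properties or {}),
    }
    if physical_resource_id is not None:
        event["PhysicalResourceId"] = physical_resource_id
    return event


create_event = make_event(properties={"Name": "demo"})
update_event = make_event("Update", physical_resource_id="toy-123")
```

Failing fast on a missing `physical_resource_id` keeps a test from silently exercising an event shape CloudFormation would never send.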

Internal seams (private API, used by replay() but also available to
power users via the constructor kwargs `transport=`,
`provision_poller=`, `teardown_poller=`):

  * `Transport` - `Callable[[str, dict], None]` replacing the default
    urllib PUT (`send_response`). Late-bound default lookup so existing
    tests that `patch("cfn_handler.resource.send_response")` continue
    to work.
  * `PollerProvision` / `PollerTeardown` - mirror seams for the
    boto3-using polling provisioning/teardown calls. Late-bound for
    backwards compatibility with existing patches.
  * `CustomResource._replay_seams(...)` - context manager that swaps
    all three seams atomically and restores them on exit (including
    when the block raises). Used by the runner; not part of the public
    API contract.
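
The late-binding and swap-and-restore ideas can be sketched with a single seam. `ToyResource`, its `_send` method, and this `send_response` stub are hypothetical simplifications; the PR's version swaps three seams.

```python
from contextlib import contextmanager


def send_response(url, payload):          # default transport (would do an HTTP PUT)
    raise RuntimeError("network disabled in this sketch")


class ToyResource:
    def __init__(self, transport=None):
        self._transport = transport       # None means "look up the default later"

    def _send(self, url, payload):
        # Late binding: the module-level name is resolved at call time, so a
        # patch applied to send_response after construction is still honoured.
        transport = self._transport if self._transport is not None else send_response
        transport(url, payload)

    @contextmanager
    def _replay_seams(self, transport):
        saved = self._transport
        self._transport = transport
        try:
            yield
        finally:
            self._transport = saved       # restored even if the body raises


resource = ToyResource()
captured = []
try:
    with resource._replay_seams(lambda url, payload: captured.append(payload)):
        resource._send("https://example.invalid/", {"Status": "SUCCESS"})
        raise ValueError("boom")          # simulate a failure inside the block
except ValueError:
    pass
restored = resource._transport is None
```

Storing `None` and resolving the default inside `_send` is what keeps an existing `patch(...)` on the module-level function effective.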

CI / tooling:

  * `pyproject.toml`: registers the `pytest11` entry point.
  * `justfile` + `.github/workflows/ci.yml`: switch `test-cov` from
    `pytest --cov` to `coverage run -m pytest`. The pytest11 entry
    point causes pytest to import `cfn_handler.testing.fixtures` (and
    transitively `cfn_handler`) during plugin collection, BEFORE the
    pytest-cov instrumentation hooks attach. Module-level code in
    `__init__.py` then runs uninstrumented and the report shows
    artificial 0% on those lines, dropping aggregate coverage to ~68%.
    `coverage run` ensures the tracer is active before any imports.
    Documented in both the recipe and the workflow step.

Tests added:

  * `tests/unit/test_transport_seam.py` - the seam intercepts; default
    behaviour preserved.
  * `tests/unit/test_poller_seam.py` - poller stubs work; boto3 is
    never imported during a stubbed deferral.
  * `tests/unit/testing/test_replay.py` - SUCCESS/FAILED replays;
    `Replay` is frozen; no HTTP I/O; default context fallback.
  * `tests/unit/testing/test_replay_polling.py` - DEFERRED status,
    event mutation, two-step deferral->resume flow.
  * `tests/unit/testing/test_factories.py` - make_event /
    make_context with overrides + validation.
  * `tests/unit/testing/test_assertions.py` - all helpers, both
    pass and fail paths, message contents on failure.
  * `tests/integration/test_replay_parity.py` - same handler via
    `__call__` (moto + fake transport) vs `replay()` produce
    equivalent payloads.
  * `tests/integration/test_fixture_discovery.py` - pytest11 entry
    point auto-discovers fixtures in a fresh subprocess project,
    fixture invocations are independent.

No breaking changes. The legacy `test_mode` flag continues to work.
The deprecation of `test_mode` and `last_response` lands in the next
commit.
…play()

The legacy `test_mode=True` constructor flag and `last_response` capture
attribute are superseded by the new `replay()` method shipped in the
previous commit. They had known issues:

  * Mutable state on the resource (tests must reset `last_response`
    between assertions or risk false positives).
  * Sentinel string `__cfn_handler_polling__` for the polling-defer
    case lives on the public `last_response` surface with no type
    or documentation guarantees.
  * Polling re-invocation can't be tested: `test_mode` short-circuits
    `setup_polling` entirely, so the marker keys never get added to
    the event and a follow-up dispatch can't be simulated.

This commit:

  * Emits a `DeprecationWarning` from `CustomResource.__init__` when
    `test_mode=True`, with a message pointing at `replay()` and the
    `cfn_handler.testing` module. `stacklevel=2` so the warning
    surfaces at the user's call site.
  * Updates the docstrings on `test_mode` (constructor parameter) and
    `last_response` (instance attribute) to mark them as deprecated
    and reference `replay()`.
  * Adds `tests/unit/test_test_mode_deprecation.py` covering the
    warning, that `test_mode=False` does NOT warn, and that the
    legacy behaviour still functions verbatim (so existing user code
    keeps working in v1.x).
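
For readers unfamiliar with the pattern, a minimal sketch of a constructor-level deprecation and a test for it follows. `ToyResource` and the message text are stand-ins; only `test_mode`, `DeprecationWarning`, and `stacklevel=2` come from the commit message.

```python
import warnings


class ToyResource:
    def __init__(self, test_mode: bool = False) -> None:
        if test_mode:
            warnings.warn(
                "test_mode is deprecated; use replay() instead",
                DeprecationWarning,
                stacklevel=2,   # attribute the warning to the caller's line
            )
        self.test_mode = test_mode


def count_deprecations(**kwargs) -> int:
    """Construct a ToyResource and count the DeprecationWarnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")   # do not let default filters hide it
        ToyResource(**kwargs)
    return sum(issubclass(w.category, DeprecationWarning) for w in caught)


with_flag = count_deprecations(test_mode=True)
without_flag = count_deprecations(test_mode=False)
```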

Removal is scheduled for v2.0 (separate change). The behaviour is
unchanged in v1.3 - users see only the warning. The internal test
suite migrates onto `replay()` in the next commit; until that lands
the project's own pytest `filterwarnings` would promote the warning
to an error in tests that still use `test_mode=True`. To avoid
artificial failures during the migration window, that single warning
is filtered to `ignore` in pyproject.toml's `[tool.pytest.ini_options]
.filterwarnings` block - reverted in the migration commit.

Bulk migration of the 100+ pre-existing references to
`CustomResource(test_mode=True)` and `resource.last_response` over to
the new `replay()` API. Verbatim translation:

  CustomResource(test_mode=True)             → CustomResource()
  resource(event, ctx)                        → replay = resource.replay(event, ctx)
  resource.last_response["Status"] == "X"     → replay.status == "X"
  resource.last_response["Data"]              → replay.data
  resource.last_response["Reason"]            → replay.reason
  resource.last_response["PhysicalResourceId"] → replay.physical_resource_id
  resource.last_response["NoEcho"]            → replay.no_echo

Where the assertion shape matches, the migration uses the helpers
`assert_success(replay, data=...)` / `assert_failed(replay,
reason_contains=...)` for clearer intent and informative error
messages.
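
To show what "informative error messages" can mean in practice, here is a sketch of two such helpers over a toy `Replay`. The bodies and message wording are assumptions; only the helper names and the `data=` / `reason_contains=` keywords appear in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Replay:                      # toy stand-in with a subset of the fields
    status: str
    data: Optional[dict] = None
    reason: Optional[str] = None


def assert_success(replay: Replay, data: Optional[dict] = None) -> None:
    if replay.status != "SUCCESS":
        raise AssertionError(
            f"expected SUCCESS, got {replay.status} (reason: {replay.reason!r})"
        )
    if data is not None and replay.data != data:
        raise AssertionError(f"data mismatch: expected {data!r}, got {replay.data!r}")


def assert_failed(replay: Replay, reason_contains: Optional[str] = None) -> None:
    if replay.status != "FAILED":
        raise AssertionError(f"expected FAILED, got {replay.status}")
    if reason_contains is not None and reason_contains not in (replay.reason or ""):
        raise AssertionError(
            f"reason {replay.reason!r} does not contain {reason_contains!r}"
        )


assert_success(Replay("SUCCESS", data={"x": 1}), data={"x": 1})
assert_failed(Replay("FAILED", reason="bucket missing"), reason_contains="missing")
```

Compared with a bare `replay.status == "SUCCESS"`, a failing helper reports the handler's reason directly in the test output.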

Files migrated:

  * tests/unit/test_resource.py - decorator registration tests,
    lifecycle dispatch, exception handling, PhysicalResourceId
    semantics, init_failure, log_level acceptance.
  * tests/unit/test_backstops.py - the lone test_mode-specific test
    (`test_safe_teardown_skipped_in_test_mode`) was moved to
    `test_test_mode_deprecation.py` since it tests deprecated
    behaviour we keep working until v2.0.
  * tests/unit/test_polling_dispatch.py - the
    `__cfn_handler_polling__` sentinel test (also deprecated-only
    behaviour) was moved to test_test_mode_deprecation.py.
  * tests/unit/test_state_machine.py - hypothesis-driven invariant
    tests, all migrated to inspect `Replay` fields rather than
    `last_response`.

After this commit `grep -rn 'test_mode\|last_response' tests/` returns
hits only from `test_test_mode_deprecation.py` (intentional, testing
the deprecated path) and a single docstring mention in
`test_resource.py`. The temporary `filterwarnings` exception added
in the previous commit is removed - any new test using `test_mode=True`
would now fail the suite (as intended), forcing the test author to use
the new API.

Coverage stays at 98%.

Adds a 'Testing your handlers' section between 'Examples' and
'Project status' that:

  * Shows a minimal one-screen example: build a CustomResource,
    register a handler, call `replay(make_event())`, assert via
    `assert_success`.
  * Lists the public testing surface (`replay`, `make_event`,
    `make_context`, the three assertion helpers, and the four
    auto-discovered pytest fixtures).
  * Calls out the polling-deferral semantics: first `replay()`
    returns `Replay(status='DEFERRED')` and mutates the event;
    a second `replay()` resumes through the poll handler. Useful
    for testing both halves of a polled lifecycle without
    provisioning EventBridge rules.

The detailed contract lives in docstrings, the spec, and the
roadmap doc; the README only carries enough context to nudge a
new reader toward unit-testing their handlers from day one.
…ity)

The CodeQL GitHub Action's post-analysis step calls the GH REST API
(`/repos/{owner}/{repo}/actions/runs/{run_id}`) for telemetry and
status-page reporting. Under `act`, the synthesized GITHUB_RUN_ID
doesn't exist on github.com, so the call 404s and the action sets the
job status to JOB_STATUS_CONFIGURATION_ERROR even when:

  * 174/174 queries loaded successfully
  * 42/42 Python files extracted
  * SARIF generated and post-processed
  * Zero findings in our code

The result is gha-pre-release ALWAYS failing on the codeql step under
act, which masks real failures in upstream steps. Cleanest fix: drop
CodeQL from the local replay sequence with a clear note about why,
and rely on the real GH Actions run on every PR (which has actual
access to the workflow-runs endpoint via GITHUB_TOKEN).

Changes:

  * Recipe header documents the skip with rationale and a one-liner
    showing how to run CodeQL locally on demand for ad-hoc inspection.
    The user runs `act push` directly and inspects the SARIF; a
    configuration-error exit with a successfully-generated SARIF means
    no findings.
  * Step counters renumbered from N/7 to N/6 throughout the recipe.
  * Step 5 prints an explicit SKIPPED notice explaining the situation
    so the user isn't left wondering whether CodeQL ran.
  * Final "safe to merge" message clarifies CodeQL still gates merge
    on the actual PR via real GH Actions.

Discovered while running gha-pre-release on the testing-helpers PR -
CodeQL kept reporting failure despite analyzing cleanly.

Move `ReplayRequestType` and `ReplayStatus` into the
`if TYPE_CHECKING:` block. They were imported at module scope but only
referenced in annotations (resolved as strings via
`from __future__ import annotations`) and in a string-form
`cast("ReplayRequestType", ...)` call (per ruff's TC006 rule, which
prefers the string form to keep type-only symbols out of the runtime
import graph). CodeQL's py/unused-import sees the runtime import and
the string-only references, and reasonably concludes the import is
unused.

Cleaner resolution: keep the imports type-only, expand the explanatory
comment so future readers see why both static analyzers (CodeQL +
ruff TC006) end up happy with this shape.

The two CodeQL py/cyclic-import alerts on the surrounding
`if TYPE_CHECKING:` block remain false positives — CodeQL does not
model conditional imports, and the runtime cycle is broken by the
lazy import in `CustomResource.replay` (resource.py:354). Both have
been dismissed in the GitHub UI with rationale (alerts #15, #16).
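
The type-only import shape described in this commit can be sketched as follows. `fractions.Fraction` stands in for `ReplayRequestType`, `as_fraction` is a hypothetical function, and the annotation is quoted by hand rather than relying on `from __future__ import annotations`.

```python
from typing import TYPE_CHECKING, cast

if TYPE_CHECKING:
    # Seen only by type checkers; never executed, so no runtime import edge.
    from fractions import Fraction


def as_fraction(value: object) -> "Fraction":
    # String-form cast: the name is not evaluated at runtime either.
    return cast("Fraction", value)


runtime_has_name = "Fraction" in globals()
result = as_fraction(3)
```

Since `cast` returns its argument unchanged and never resolves the string, the module works at runtime even though `Fraction` was never imported.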
@igorlg igorlg force-pushed the feat/add-testing-helpers branch 2 times, most recently from a6c3e20 to 0f4cbf4 Compare May 22, 2026 06:34
@igorlg igorlg merged commit f0f9507 into main May 22, 2026
16 checks passed
@igorlg igorlg deleted the feat/add-testing-helpers branch May 22, 2026 06:36
igorlg added a commit that referenced this pull request May 25, 2026
PR #24 (commit f0f9507) shipped the implementation: the
`cfn_handler.testing` module with `replay()`, `Replay`,
`make_event` / `make_context`, the three assertion helpers, and
auto-discovered pytest fixtures, plus the soft-deprecation of
`test_mode` / `last_response`.

This archive:

  * Moves the change to
    `openspec/changes/archive/2026-05-25-add-testing-helpers/`,
    preserving proposal / design / tasks / spec for historical reference.
  * Promotes the delta spec to a new top-level capability at
    `openspec/specs/testing-helpers/spec.md` (verbatim copy of the
    ADDED requirements + a fresh Purpose section). `openspec
    validate testing-helpers --type spec --strict` passes.

`openspec list` is now empty — no active changes.

2 participants