Skip to content

Decouple AtomData from frames, unify 2-D shape validation, drop top-level re-exports#71

Merged
bjmorgan merged 20 commits intomainfrom
privatise-atom-data
Apr 11, 2026
Merged

Decouple AtomData from frames, unify 2-D shape validation, drop top-level re-exports#71
bjmorgan merged 20 commits intomainfrom
privatise-atom-data

Conversation

@bjmorgan
Copy link
Copy Markdown
Owner

@bjmorgan bjmorgan commented Apr 11, 2026

Closes #69.

Makes the per-atom metadata container frame-agnostic, tightens the scene's write API, and unifies 2-D shape validation into a single prospective-state walk inside the container.

Container changes

  • AtomData is no longer re-exported from hofmann or hofmann.model. The class is still public-named and documented in docs/api.rst so users can inspect instances returned by scene.atom_data; construction is considered an internal implementation detail and scene.atom_data is the only supported way to obtain one. Follows stdlib precedent -- types.MappingProxyType, dict_keys, dict_values, range, enumerate are all public, documented return types obtained via other APIs.
  • Constructor takes only n_atoms; frames= parameter and n_frames property are gone.
  • Inherits from collections.abc.Mapping (not MutableMapping), so mutation lives on private _set / _del / _clear_2d methods. Direct scene.atom_data[key] = arr and del scene.atom_data[key] raise TypeError (from Python's subscript bytecode, since Mapping defines no __setitem__ / __delitem__); .pop, .popitem, .setdefault, .update, .clear raise AttributeError (those methods do not exist on a Mapping).
  • __repr__ shows shapes in numpy tuple format, e.g. AtomData({'charge': (3,), 'energy': (5, 3)}).

Unified 2-D shape validation

  • AtomData._check_2d_consistency(expected, *, pending=None) validates the prospective post-operation state. pending is a mapping of keys being written; entries in pending override stored entries of the same key. Pending entries are checked first so user errors on a call-site array beat stale-stored errors.
  • AtomData._set(key, value, *, expected_frames: int) takes a required keyword frame count and calls _check_2d_consistency(expected_frames, pending={key: arr}). There is no separate scene-level shape check; all 2-D validation lives in this one walk.
  • Input validation (ndim, per-atom shape, dtype whitelist) runs before the state-dependent consistency walk. This ensures a bad-dtype write fires the dtype error directly instead of sending the user on an unnecessary clear_2d_atom_data() recovery for a stale stored entry that was going to stay valid anyway.

Single-entry in-place reassign

  • Reassigning a single 2-D entry at a new shape after scene.frames.append(...) now works without clear_2d_atom_data(). The stored version of the key is treated as overridden by the pending write and skipped during the consistency walk. Multi-entry scenes still require clear_2d_atom_data because the other stored entries are stale; the error names the actually-stale key so the user knows what to drop.

Scene write API

  • StructureScene gains del_atom_data(key) and clear_2d_atom_data(), alongside the existing set_atom_data.
  • set_atom_data and the __init__ atom_data loop pass expected_frames=len(self.frames) through to _set; there is no scene-side shape helper. The old _check_2d_atom_data_shape was deleted.
  • The @atom_data.setter is removed.

Render validation

  • Private StructureScene._validate_for_render() calls self._atom_data._check_2d_consistency(len(self.frames)) with no pending, validating the current stored state as-is. Invoked at the top of render_mpl, render_mpl_interactive, and render_animation as a backstop for the specific case where scene.frames is mutated after the last set_atom_data but before the next render.
  • Stale 2-D data (the original Detect stale 2-D atom_data after frames mutation #69 scenario) now raises a clear ValueError at the start of the render call, naming the stale key and pointing at clear_2d_atom_data() for recovery.

Documentation

  • AtomData class docstring describes the two enforcement sites honestly (write-time via _set, render-time via the scene's backstop). The prior draft's "the container itself does not know about the scene's trajectory" was aspirational; post-refactor the container enforces consistency on every write.
  • docs/api.rst documents AtomData via its fully qualified module path with an explicit :members: n_atoms, ranges, labels, so internal _set / _del / _clear_2d do not leak into the rendered reference.
  • docs/colouring.rst documents the single-entry in-place reassign shortcut alongside the multi-entry clear_2d_atom_data workflow.
  • docs/changelog.rst 0.19.0 entry reflects the final shipped design: no-re-export of AtomData, the unified validation walk, the single-entry shortcut, and the honest TypeError vs AttributeError split at the mutation surface. 0.17.0 "update values by reassigning the key" how-to (which would now raise AttributeError) has been rewritten to point at scene.set_atom_data.

Breaking API changes

  • from hofmann import AtomData raises ImportError (still importable via from hofmann.model.atom_data import AtomData for type annotations only; construction is not a supported workflow).
  • AtomData.n_frames removed; use len(scene.frames).
  • AtomData.__init__(frames=...) removed.
  • scene.atom_data[key] = arr and del scene.atom_data[key] raise TypeError.
  • scene.atom_data.pop(...) / .popitem() / .setdefault(...) / .update(...) / .clear() raise AttributeError.
  • scene.atom_data = new_container raises AttributeError (setter removed).

None of the removed paths were documented in the user guide or API reference. Migrations are mechanical one-line changes to the corresponding StructureScene methods, all covered in the 0.19.0 changelog.

Copilot AI review requested due to automatic review settings April 11, 2026 16:24

This comment was marked as resolved.

The earlier commits in this branch renamed the container to _AtomData
to signal "don't construct this", relying on the leading underscore
as the hiding mechanism. On review that position is incoherent: the
public StructureScene.atom_data property returns this object, users
can and should iterate it, read membership, and access the derived
ranges / labels / n_atoms metadata, yet the type they receive has a
private name. A public accessor returning a privately-named type
mismatches stdlib precedent (MappingProxyType, dict_keys,
dict_values, range, enumerate are all public, documented return
types whose constructors are effectively private), and forces a
nitpick_ignore entry in Sphinx to paper over the inconsistency.

The load-bearing defences against the original bugs (#69) are:

  1. Removed @atom_data.setter — no externally-built container can
     be attached to a scene at all.
  2. Decoupled AtomData from frames — the "stale frames reference"
     state is no longer representable.
  3. Cross-entry 2-D shape invariant in _set.
  4. _validate_for_render at the start of every public render_*
     method.

None of these depend on the class-level underscore. Privatising the
name was redundant with the real defences, and it pushed the class's
type annotation into Sphinx-unresolvable territory.

This commit:

  - Renames _AtomData -> AtomData in src/, tests/, and internal
    imports. The class is still kept out of hofmann.__all__ and
    hofmann.model.__all__, so `from hofmann import AtomData` and
    `from hofmann.model import AtomData` both fail as before. The
    only supported way to obtain an instance is `scene.atom_data`.
  - Adds AtomData to docs/api.rst via its fully qualified module
    path `hofmann.model.atom_data.AtomData`, so Sphinx can
    cross-link the return type of the atom_data property. Drops
    the nitpick_ignore entry that was covering the private name.
  - Updates the class docstring to state that the class is obtained
    via scene.atom_data and not instantiated directly.
  - Rewrites the changelog 0.19.0 entry: the breaking change is
    "no longer re-exported" rather than "renamed to _AtomData".
  - Addresses Copilot review comment on the atom_data property
    docstring: clarifies that 2-D shapes are *expected* to match
    len(self.frames) and are validated at render time, rather than
    implying the invariant always holds.
  - Declines the other Copilot review comment (annotate return as
    Mapping[str, np.ndarray]): doing so would erase the derived
    ranges / labels / n_atoms attributes from type information,
    which are useful for inspection. The right fix is the one above
    — expose the concrete type with a public name.

Downstream effects: pytest 1024 passing, mypy clean, sphinx nitpicky
warning count 87 (baseline was 88 — one fewer, none introduced).
@bjmorgan bjmorgan changed the title Privatise _AtomData and decouple container from frames Decouple AtomData container from frames and drop top-level re-exports Apr 11, 2026
Collapse the two duplicate 2-D shape checks (scene-level
``_check_2d_atom_data_shape`` and container-level
``_check_frames_compatibility``) into a single container method,
``AtomData._check_2d_consistency(expected, *, pending=None)``, that
validates the prospective post-operation state.

The new helper takes an optional ``pending`` mapping of keys to
arrays being written.  Pending entries replace stored entries of the
same key (for reassignment) or add new entries (for first writes);
stored entries whose keys appear in ``pending`` are skipped during the
walk because they will no longer exist in the post-operation state.
Pending entries are checked before stored entries so that a user
passing a wrong-shape array gets a call-site error rather than a
confusing stale-stored error.

``AtomData._set`` now takes a required ``expected_frames: int``
kwarg and calls ``_check_2d_consistency(expected_frames,
pending={key: arr})`` for 2-D writes.  ``StructureScene`` passes
``expected_frames=len(self.frames)`` at every write site
(``set_atom_data`` and the ``__init__`` atom_data loop).  The
scene-level ``_check_2d_atom_data_shape`` helper is deleted; its job
is now done entirely inside ``_set`` via the container check.

``StructureScene._validate_for_render`` now calls
``self._atom_data._check_2d_consistency(len(self.frames))`` with no
``pending``, which validates the current stored state (no write is
happening at render time).

Three substantive user-visible effects:

1. Error messages name the key in every failure mode.  A wrong-shape
   write reports ``atom_data['foo'] has N rows but M frames were
   expected``; a stale stored entry reports ``atom_data has stale
   2-D entry 'foo' sized for N frames, but M frames were expected;
   call StructureScene.clear_2d_atom_data() to recover before
   reassigning``.  The old scene-level check and container-level
   check had divergent wording; both now flow from one helper.

2. Reassigning a single 2-D entry in place with a new shape now
   succeeds when it is the only 2-D entry.  The stored version is
   overridden by the pending entry and skipped during the walk, so
   ``_set("energy", new_4_row_array, expected_frames=4)`` on a
   container that previously held ``"energy"`` at ``(3, n_atoms)``
   now works directly without requiring ``clear_2d_atom_data``.
   Multi-entry cases still require recovery because the walk fires
   on the other stale entry, naming the actually-stale key.

3. ``_set`` cannot be called without an ``expected_frames`` kwarg.
   A required keyword argument is type-system enforced, closing the
   "direct caller forgets the scene check" bug surface that existed
   when the scene-level ``_check_2d_atom_data_shape`` was the
   outer wrapper.

Test updates:

- Every ``_set`` call gains an ``expected_frames=`` kwarg.  For 1-D
  tests any value works; the kwarg is required regardless to keep
  the internal API uniform.
- ``test_reassigning_only_2d_to_new_shape_raises`` was already
  removed in the previous commit; no further action needed.
- New ``test_in_place_reassign_same_key_new_shape_succeeds`` pins
  the single-entry reassignment case.
- New ``test_in_place_reassign_fails_when_other_2d_stale`` pins the
  multi-entry case and verifies the error names the actually-stale
  key.
- ``test_set_detects_stale_stored_entry`` and
  ``test_set_rejects_2d_wrong_new_shape`` already match the new
  error wording since they were added in the previous commit.
- Render-time stale-2-D integration tests updated to match the new
  ``stale 2-D entry 'energy'`` wording.
Copilot AI review requested due to automatic review settings April 11, 2026 17:41
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/hofmann/model/atom_data.py Outdated
Comment thread src/hofmann/model/atom_data.py Outdated
Comment thread src/hofmann/model/structure_scene.py Outdated
Previously, a 2-D write with an unsupported dtype could trigger a
stale-stored-entry error before the dtype error ever ran, because
_check_2d_consistency was called from inside the ndim==2 branch and
ran before the dtype whitelist check.

Concrete failure: container holds "energy" at shape (3, N), scene is
extended to 4 frames, user calls set_atom_data("new_key", complex_4N)
-- the dtype is bad, so the write was never going to land, but the
cross-entry walk fires first on the now-stale "energy" and the error
reads "call clear_2d_atom_data() to recover before reassigning".
The user duly destroys "energy", retries, and only then gets the
dtype error they should have seen on the first call.

Reorder _set so all input-validation checks (ndim, per-atom shape,
dtype) run before the state-dependent cross-entry check.  Input
errors now take priority, and the state check runs only after the
input is known to be valid.  Add a regression test pinning the
ordering: a bad-dtype 2-D array assigned alongside a stale stored
entry raises the dtype error, not the stale error.
Docstring and documentation accuracy:

- AtomData class docstring no longer claims "the container itself
  does not know about the scene's trajectory".  Post-3629f6d every
  _set takes a required expected_frames kwarg and the container
  enforces consistency on every write; the previous claim was
  aspirational but false.  Rewrite to describe both enforcement
  sites (write-time via _set, render-time via _validate_for_render
  as a backstop).
- StructureScene.set_atom_data Raises: section adds the previously
  undocumented unsupported-dtype case and the stale-stored-other-
  key case introduced by the unified consistency walk (the error
  names a different key from the one being written and points at
  clear_2d_atom_data).
- StructureScene.set_atom_data body prose replaces the stale "in
  addition to the cross-entry ... check performed by the
  underlying container" framing (which described the deleted two-
  check design) with an honest description of the single
  prospective-state walk.
- StructureScene.atom_data property docstring stops softening the
  invariant to "expected to match"; instead names the two
  enforcement sites explicitly and mentions the single-entry
  in-place reassign behaviour.
- StructureScene._validate_for_render docstring trimmed from 16
  lines to 2: a backstop helper does not need elaborate prose
  explaining what a backstop is.

Naming:

- AtomData.clear_2d -> AtomData._clear_2d.  Only non-underscored
  mutation method on a class that says "do not instantiate
  directly"; renders as a public method in the API reference and
  provides a back door around the scene's single-write API.  The
  scene's public clear_2d_atom_data is unaffected.
- docs/api.rst autoclass directive now lists only the public
  read surface (``:members: n_atoms, ranges, labels``) instead of
  ``:members:`` wildcard, so internal ``_set`` / ``_del`` /
  ``_clear_2d`` do not leak into the rendered reference.

Changelog:

- 0.19.0 mutation-surface bullet now accurately describes the
  TypeError (for ``scene.atom_data[key] = arr`` and
  ``del scene.atom_data[key]``, from Python's STORE_SUBSCR /
  DELETE_SUBSCR bytecode) vs AttributeError (for ``pop``,
  ``popitem``, ``setdefault``, ``update``, ``clear``, which
  simply do not exist on a ``Mapping``).  The earlier phrasing
  ("raises at the Python protocol level") was deliberately vague
  and one reviewer's suggested "all raise AttributeError"
  rewrite would have been wrong.
- 0.19.0 adds a bullet documenting the single-entry in-place
  reassign shortcut introduced in 3629f6d.  Previously the
  changelog taught clear_2d_atom_data as the unconditional
  recovery idiom; a user following it would have destroyed state
  they could have kept.
- 0.19.0 reframes the "set_atom_data now performs an explicit
  check" bullet to reflect the single unified walk (the stale
  shape[0] check wording referenced a deleted scene-level
  helper).
- 0.17.0 "update values by building a new array and reassigning
  the key" how-to would now raise AttributeError.  Rewrite to
  point at scene.set_atom_data.

docs/colouring.rst:

- Document the single-entry in-place reassign shortcut explicitly,
  alongside the multi-entry clear_2d_atom_data recovery workflow.

Tests:

- Move test_set_bad_dtype_fires_before_stale_stored_check into the
  atom_data test suite from the earlier commit where it was added
  directly after the reorder.  (Already landed.)
- New container-level tests: test_set_2d_nested_list_coerced (2-D
  nested-list input), test_del_2d_preserves_constraint_when_others_remain
  (delete of one 2-D entry does not release the constraint when
  others remain), test_atom_data_not_importable_from_top_level
  (the re-export contract advertised in the changelog), test_bracket_del_raises
  (symmetric with test_bracket_assignment_raises).
- Parametrised test_mutable_mapping_methods_absent covering the
  seven mutation entry points individually, as promised (but
  previously undelivered) by test_inherits_from_mapping_not_mutable_mapping.
- test_set_detects_stale_stored_entry now pins the recovery-advice
  tail ("clear_2d_atom_data") via explicit substring assertions so
  a future reword cannot silently drop the user-actionable part
  of the error message.
- New scene-level tests:
  test_render_mpl_succeeds_after_in_place_reassign (end-to-end
  single-entry workflow) and test_in_place_reassign_names_other_stale_key
  (the error names the actually-stale entry in the multi-entry
  case).  These pin the main behavioural improvement of 3629f6d
  at the user-facing layer, not just the container layer.
Copilot AI review requested due to automatic review settings April 11, 2026 21:02
@bjmorgan bjmorgan changed the title Decouple AtomData container from frames and drop top-level re-exports Decouple AtomData from frames, unify 2-D shape validation, drop top-level re-exports Apr 11, 2026

This comment was marked as resolved.

Two small fixes on top of the previous review-findings commit:

1. Stale-stored error message: rephrase the recovery hint from
   "call StructureScene.clear_2d_atom_data() to recover" to "call
   scene.clear_2d_atom_data() to recover".  The class-qualified
   form reads as a pseudo-classmethod call that users cannot
   literally type -- the method is an instance method and users
   call it via their scene variable (canonically named "scene" in
   the user guide, changelog, and README examples).  The
   substring-based test assertions on the error message still pass
   because they match on "clear_2d_atom_data" alone.

2. TestValidateForRender previously had a "# --- Unit test directly
   on the helper ---" section header with no test under it, left
   over from an earlier refactor.  Add
   test_validate_for_render_raises_on_stale_2d that calls the
   helper directly, bypassing the render stack.  Faster and
   narrower than the end-to-end integration tests, and guards
   against a refactor that reshuffles how render_* call the helper
   without changing the helper itself.
The method's docstring still taught the unconditional
``frames.append`` -> ``clear_2d_atom_data`` -> ``set_atom_data`` ->
``render_mpl`` workflow, which is only accurate for multi-entry
scenes.  After 3629f6d, a single 2-D entry can be reassigned at
a new shape directly via ``set_atom_data`` without clearing
first, because the stored version is treated as overridden by
the pending write.

Three other documentation sites (``set_atom_data`` docstring,
``docs/colouring.rst``, the 0.19.0 changelog entry) already
teach the nuanced story; this docstring was the last stale
site.  Rewrite it to match: state that the method is *required*
for multi-entry recovery, note that single-entry reassignment
can go through ``set_atom_data`` directly, and show the
multi-entry workflow example explicitly (two ``set_atom_data``
calls after the clear, not one).
Copilot AI review requested due to automatic review settings April 11, 2026 21:17
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@bjmorgan bjmorgan merged commit 2744adb into main Apr 11, 2026
9 checks passed
@bjmorgan bjmorgan deleted the privatise-atom-data branch April 11, 2026 21:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Detect stale 2-D atom_data after frames mutation

2 participants