Decouple AtomData from frames, unify 2-D shape validation, drop top-level re-exports#71
Merged
Decouple AtomData from frames, unify 2-D shape validation, drop top-level re-exports#71
Conversation
The earlier commits in this branch renamed the container to _AtomData to signal "don't construct this", relying on the leading underscore as the hiding mechanism. On review that position is incoherent: the public StructureScene.atom_data property returns this object, users can and should iterate it, read membership, and access the derived ranges / labels / n_atoms metadata, yet the type they receive has a private name. A public accessor returning a privately-named type mismatches stdlib precedent (MappingProxyType, dict_keys, dict_values, range, enumerate are all public, documented return types whose constructors are effectively private), and forces a nitpick_ignore entry in Sphinx to paper over the inconsistency. The load-bearing defences against the original bugs (#69) are: 1. Removed @atom_data.setter — no externally-built container can be attached to a scene at all. 2. Decoupled AtomData from frames — the "stale frames reference" state is no longer representable. 3. Cross-entry 2-D shape invariant in _set. 4. _validate_for_render at the start of every public render_* method. None of these depend on the class-level underscore. Privatising the name was redundant with the real defences, and it pushed the class's type annotation into Sphinx-unresolvable territory. This commit: - Renames _AtomData -> AtomData in src/, tests/, and internal imports. The class is still kept out of hofmann.__all__ and hofmann.model.__all__, so `from hofmann import AtomData` and `from hofmann.model import AtomData` both fail as before. The only supported way to obtain an instance is `scene.atom_data`. - Adds AtomData to docs/api.rst via its fully qualified module path `hofmann.model.atom_data.AtomData`, so Sphinx can cross-link the return type of the atom_data property. Drops the nitpick_ignore entry that was covering the private name. - Updates the class docstring to state that the class is obtained via scene.atom_data and not instantiated directly. - Rewrites the changelog 0.19.0 entry: the breaking change is "no longer re-exported" rather than "renamed to _AtomData". - Addresses Copilot review comment on the atom_data property docstring: clarifies that 2-D shapes are *expected* to match len(self.frames) and are validated at render time, rather than implying the invariant always holds. - Declines the other Copilot review comment (annotate return as Mapping[str, np.ndarray]): doing so would erase the derived ranges / labels / n_atoms attributes from type information, which are useful for inspection. The right fix is the one above — expose the concrete type with a public name. Downstream effects: pytest 1024 passing, mypy clean, sphinx nitpicky warning count 87 (baseline was 88 — one fewer, none introduced).
Collapse the two duplicate 2-D shape checks (scene-level
``_check_2d_atom_data_shape`` and container-level
``_check_frames_compatibility``) into a single container method,
``AtomData._check_2d_consistency(expected, *, pending=None)``, that
validates the prospective post-operation state.
The new helper takes an optional ``pending`` mapping of keys to
arrays being written. Pending entries replace stored entries of the
same key (for reassignment) or add new entries (for first writes);
stored entries whose keys appear in ``pending`` are skipped during the
walk because they will no longer exist in the post-operation state.
Pending entries are checked before stored entries so that a user
passing a wrong-shape array gets a call-site error rather than a
confusing stale-stored error.
``AtomData._set`` now takes a required ``expected_frames: int``
kwarg and calls ``_check_2d_consistency(expected_frames,
pending={key: arr})`` for 2-D writes. ``StructureScene`` passes
``expected_frames=len(self.frames)`` at every write site
(``set_atom_data`` and the ``__init__`` atom_data loop). The
scene-level ``_check_2d_atom_data_shape`` helper is deleted; its job
is now done entirely inside ``_set`` via the container check.
``StructureScene._validate_for_render`` now calls
``self._atom_data._check_2d_consistency(len(self.frames))`` with no
``pending``, which validates the current stored state (no write is
happening at render time).
Three substantive user-visible effects:
1. Error messages name the key in every failure mode. A wrong-shape
write reports ``atom_data['foo'] has N rows but M frames were
expected``; a stale stored entry reports ``atom_data has stale
2-D entry 'foo' sized for N frames, but M frames were expected;
call StructureScene.clear_2d_atom_data() to recover before
reassigning``. The old scene-level check and container-level
check had divergent wording; both now flow from one helper.
2. Reassigning a single 2-D entry in place with a new shape now
succeeds when it is the only 2-D entry. The stored version is
overridden by the pending entry and skipped during the walk, so
``_set("energy", new_4_row_array, expected_frames=4)`` on a
container that previously held ``"energy"`` at ``(3, n_atoms)``
now works directly without requiring ``clear_2d_atom_data``.
Multi-entry cases still require recovery because the walk fires
on the other stale entry, naming the actually-stale key.
3. ``_set`` cannot be called without an ``expected_frames`` kwarg.
A required keyword argument is type-system enforced, closing the
"direct caller forgets the scene check" bug surface that existed
when the scene-level ``_check_2d_atom_data_shape`` was the
outer wrapper.
Test updates:
- Every ``_set`` call gains an ``expected_frames=`` kwarg. For 1-D
tests any value works; the kwarg is required regardless to keep
the internal API uniform.
- ``test_reassigning_only_2d_to_new_shape_raises`` was already
removed in the previous commit; no further action needed.
- New ``test_in_place_reassign_same_key_new_shape_succeeds`` pins
the single-entry reassignment case.
- New ``test_in_place_reassign_fails_when_other_2d_stale`` pins the
multi-entry case and verifies the error names the actually-stale
key.
- ``test_set_detects_stale_stored_entry`` and
``test_set_rejects_2d_wrong_new_shape`` already match the new
error wording since they were added in the previous commit.
- Render-time stale-2-D integration tests updated to match the new
``stale 2-D entry 'energy'`` wording.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Previously, a 2-D write with an unsupported dtype could trigger a
stale-stored-entry error before the dtype error ever ran, because
_check_2d_consistency was called from inside the ndim==2 branch and
ran before the dtype whitelist check.
Concrete failure: container holds "energy" at shape (3, N), scene is
extended to 4 frames, user calls set_atom_data("new_key", complex_4N)
-- the dtype is bad, so the write was never going to land, but the
cross-entry walk fires first on the now-stale "energy" and the error
reads "call clear_2d_atom_data() to recover before reassigning".
The user duly destroys "energy", retries, and only then gets the
dtype error they should have seen on the first call.
Reorder _set so all input-validation checks (ndim, per-atom shape,
dtype) run before the state-dependent cross-entry check. Input
errors now take priority, and the state check runs only after the
input is known to be valid. Add a regression test pinning the
ordering: a bad-dtype 2-D array assigned alongside a stale stored
entry raises the dtype error, not the stale error.
Docstring and documentation accuracy:
- AtomData class docstring no longer claims "the container itself
does not know about the scene's trajectory". Post-3629f6d every
_set takes a required expected_frames kwarg and the container
enforces consistency on every write; the previous claim was
aspirational but false. Rewrite to describe both enforcement
sites (write-time via _set, render-time via _validate_for_render
as a backstop).
- StructureScene.set_atom_data Raises: section adds the previously
undocumented unsupported-dtype case and the stale-stored-other-
key case introduced by the unified consistency walk (the error
names a different key from the one being written and points at
clear_2d_atom_data).
- StructureScene.set_atom_data body prose replaces the stale "in
addition to the cross-entry ... check performed by the
underlying container" framing (which described the deleted two-
check design) with an honest description of the single
prospective-state walk.
- StructureScene.atom_data property docstring stops softening the
invariant to "expected to match"; instead names the two
enforcement sites explicitly and mentions the single-entry
in-place reassign behaviour.
- StructureScene._validate_for_render docstring trimmed from 16
lines to 2: a backstop helper does not need elaborate prose
explaining what a backstop is.
Naming:
- AtomData.clear_2d -> AtomData._clear_2d. Only non-underscored
mutation method on a class that says "do not instantiate
directly"; renders as a public method in the API reference and
provides a back door around the scene's single-write API. The
scene's public clear_2d_atom_data is unaffected.
- docs/api.rst autoclass directive now lists only the public
read surface (``:members: n_atoms, ranges, labels``) instead of
``:members:`` wildcard, so internal ``_set`` / ``_del`` /
``_clear_2d`` do not leak into the rendered reference.
Changelog:
- 0.19.0 mutation-surface bullet now accurately describes the
TypeError (for ``scene.atom_data[key] = arr`` and
``del scene.atom_data[key]``, from Python's STORE_SUBSCR /
DELETE_SUBSCR bytecode) vs AttributeError (for ``pop``,
``popitem``, ``setdefault``, ``update``, ``clear``, which
simply do not exist on a ``Mapping``). The earlier phrasing
("raises at the Python protocol level") was deliberately vague
and one reviewer's suggested "all raise AttributeError"
rewrite would have been wrong.
- 0.19.0 adds a bullet documenting the single-entry in-place
reassign shortcut introduced in 3629f6d. Previously the
changelog taught clear_2d_atom_data as the unconditional
recovery idiom; a user following it would have destroyed state
they could have kept.
- 0.19.0 reframes the "set_atom_data now performs an explicit
check" bullet to reflect the single unified walk (the stale
shape[0] check wording referenced a deleted scene-level
helper).
- 0.17.0 "update values by building a new array and reassigning
the key" how-to would now raise AttributeError. Rewrite to
point at scene.set_atom_data.
docs/colouring.rst:
- Document the single-entry in-place reassign shortcut explicitly,
alongside the multi-entry clear_2d_atom_data recovery workflow.
Tests:
- Move test_set_bad_dtype_fires_before_stale_stored_check into the
atom_data test suite from the earlier commit where it was added
directly after the reorder. (Already landed.)
- New container-level tests: test_set_2d_nested_list_coerced (2-D
nested-list input), test_del_2d_preserves_constraint_when_others_remain
(delete of one 2-D entry does not release the constraint when
others remain), test_atom_data_not_importable_from_top_level
(the re-export contract advertised in the changelog), test_bracket_del_raises
(symmetric with test_bracket_assignment_raises).
- Parametrised test_mutable_mapping_methods_absent covering the
seven mutation entry points individually, as promised (but
previously undelivered) by test_inherits_from_mapping_not_mutable_mapping.
- test_set_detects_stale_stored_entry now pins the recovery-advice
tail ("clear_2d_atom_data") via explicit substring assertions so
a future reword cannot silently drop the user-actionable part
of the error message.
- New scene-level tests:
test_render_mpl_succeeds_after_in_place_reassign (end-to-end
single-entry workflow) and test_in_place_reassign_names_other_stale_key
(the error names the actually-stale entry in the multi-entry
case). These pin the main behavioural improvement of 3629f6d
at the user-facing layer, not just the container layer.
Two small fixes on top of the previous review-findings commit: 1. Stale-stored error message: rephrase the recovery hint from "call StructureScene.clear_2d_atom_data() to recover" to "call scene.clear_2d_atom_data() to recover". The class-qualified form reads as a pseudo-classmethod call that users cannot literally type -- the method is an instance method and users call it via their scene variable (canonically named "scene" in the user guide, changelog, and README examples). The substring-based test assertions on the error message still pass because they match on "clear_2d_atom_data" alone. 2. TestValidateForRender previously had a "# --- Unit test directly on the helper ---" section header with no test under it, left over from an earlier refactor. Add test_validate_for_render_raises_on_stale_2d that calls the helper directly, bypassing the render stack. Faster and narrower than the end-to-end integration tests, and guards against a refactor that reshuffles how render_* call the helper without changing the helper itself.
The method's docstring still taught the unconditional ``frames.append`` -> ``clear_2d_atom_data`` -> ``set_atom_data`` -> ``render_mpl`` workflow, which is only accurate for multi-entry scenes. After 3629f6d, a single 2-D entry can be reassigned at a new shape directly via ``set_atom_data`` without clearing first, because the stored version is treated as overridden by the pending write. Three other documentation sites (``set_atom_data`` docstring, ``docs/colouring.rst``, the 0.19.0 changelog entry) already teach the nuanced story; this docstring was the last stale site. Rewrite it to match: state that the method is *required* for multi-entry recovery, note that single-entry reassignment can go through ``set_atom_data`` directly, and show the multi-entry workflow example explicitly (two ``set_atom_data`` calls after the clear, not one).
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #69.
Makes the per-atom metadata container frame-agnostic, tightens the scene's write API, and unifies 2-D shape validation into a single prospective-state walk inside the container.
Container changes
AtomDatais no longer re-exported fromhofmannorhofmann.model. The class is still public-named and documented indocs/api.rstso users can inspect instances returned byscene.atom_data; construction is considered an internal implementation detail andscene.atom_datais the only supported way to obtain one. Follows stdlib precedent --types.MappingProxyType,dict_keys,dict_values,range,enumerateare all public, documented return types obtained via other APIs.n_atoms;frames=parameter andn_framesproperty are gone.collections.abc.Mapping(notMutableMapping), so mutation lives on private_set/_del/_clear_2dmethods. Directscene.atom_data[key] = arranddel scene.atom_data[key]raiseTypeError(from Python's subscript bytecode, sinceMappingdefines no__setitem__/__delitem__);.pop,.popitem,.setdefault,.update,.clearraiseAttributeError(those methods do not exist on aMapping).__repr__shows shapes in numpy tuple format, e.g.AtomData({'charge': (3,), 'energy': (5, 3)}).Unified 2-D shape validation
AtomData._check_2d_consistency(expected, *, pending=None)validates the prospective post-operation state.pendingis a mapping of keys being written; entries inpendingoverride stored entries of the same key. Pending entries are checked first so user errors on a call-site array beat stale-stored errors.AtomData._set(key, value, *, expected_frames: int)takes a required keyword frame count and calls_check_2d_consistency(expected_frames, pending={key: arr}). There is no separate scene-level shape check; all 2-D validation lives in this one walk.clear_2d_atom_data()recovery for a stale stored entry that was going to stay valid anyway.Single-entry in-place reassign
scene.frames.append(...)now works withoutclear_2d_atom_data(). The stored version of the key is treated as overridden by the pending write and skipped during the consistency walk. Multi-entry scenes still requireclear_2d_atom_databecause the other stored entries are stale; the error names the actually-stale key so the user knows what to drop.Scene write API
StructureScenegainsdel_atom_data(key)andclear_2d_atom_data(), alongside the existingset_atom_data.set_atom_dataand the__init__atom_data loop passexpected_frames=len(self.frames)through to_set; there is no scene-side shape helper. The old_check_2d_atom_data_shapewas deleted.@atom_data.setteris removed.Render validation
StructureScene._validate_for_render()callsself._atom_data._check_2d_consistency(len(self.frames))with nopending, validating the current stored state as-is. Invoked at the top ofrender_mpl,render_mpl_interactive, andrender_animationas a backstop for the specific case wherescene.framesis mutated after the lastset_atom_databut before the next render.ValueErrorat the start of the render call, naming the stale key and pointing atclear_2d_atom_data()for recovery.Documentation
AtomDataclass docstring describes the two enforcement sites honestly (write-time via_set, render-time via the scene's backstop). The prior draft's "the container itself does not know about the scene's trajectory" was aspirational; post-refactor the container enforces consistency on every write.docs/api.rstdocumentsAtomDatavia its fully qualified module path with an explicit:members: n_atoms, ranges, labels, so internal_set/_del/_clear_2ddo not leak into the rendered reference.docs/colouring.rstdocuments the single-entry in-place reassign shortcut alongside the multi-entryclear_2d_atom_dataworkflow.docs/changelog.rst0.19.0 entry reflects the final shipped design: no-re-export ofAtomData, the unified validation walk, the single-entry shortcut, and the honestTypeErrorvsAttributeErrorsplit at the mutation surface. 0.17.0 "update values by reassigning the key" how-to (which would now raiseAttributeError) has been rewritten to point atscene.set_atom_data.Breaking API changes
from hofmann import AtomDataraisesImportError(still importable viafrom hofmann.model.atom_data import AtomDatafor type annotations only; construction is not a supported workflow).AtomData.n_framesremoved; uselen(scene.frames).AtomData.__init__(frames=...)removed.scene.atom_data[key] = arranddel scene.atom_data[key]raiseTypeError.scene.atom_data.pop(...)/.popitem()/.setdefault(...)/.update(...)/.clear()raiseAttributeError.scene.atom_data = new_containerraisesAttributeError(setter removed).None of the removed paths were documented in the user guide or API reference. Migrations are mechanical one-line changes to the corresponding
StructureScenemethods, all covered in the 0.19.0 changelog.