Commit 912ebbf

Add AI summary
1 parent 256cb43 commit 912ebbf

1 file changed

+355
-0
lines changed

AGENTS.md

Lines changed: 355 additions & 0 deletions
@@ -0,0 +1,355 @@
# AGENTS.md — projspec

This document is a guide for AI coding agents working in this repository. It
covers the architecture of the `projspec` package (located in `src/projspec/`),
the three central class families, the contract every `parse()` method must
honour, and the conventions used throughout.

Extensions and the Qt application (`vsextension/`, `pycharm_plugin/`,
`src/projspec/qtapp/`) are out of scope.

---

## Repository layout

```
src/projspec/
  __init__.py          # public re-exports: Project, ProjectSpec, get_cls
  proj/
    base.py            # Project, ProjectSpec, ProjectExtra, ParseFailed
    *.py               # one file per concrete spec type
  content/
    base.py            # BaseContent + content registry
    *.py               # one file per concrete content type
  artifact/
    base.py            # BaseArtifact, FileArtifact + artifact registry
    *.py               # one file per concrete artifact type
  utils.py             # AttrDict, camel_to_snake, run_subprocess, …
  config.py            # get_conf / set_conf
tests/
  conftest.py          # shared fixtures (proj = Project("/data"))
  test_basic.py        # smoke tests
  test_roundtrips.py   # serialise / deserialise round-trips
```

---

## The three class families

### 1. `Project` (`proj/base.py:43`)

The top-level container for a parsed directory. It is not subclassed.

Key attributes set during `__init__` and `resolve()`:

| attribute | type | description |
|-----------|------|-------------|
| `specs` | `AttrDict` | matched `ProjectSpec` instances, keyed by snake-case class name |
| `contents` | `AttrDict` | `BaseContent` instances contributed by `ProjectExtra` specs |
| `artifacts` | `AttrDict` | `BaseArtifact` instances contributed by `ProjectExtra` specs |
| `children` | `AttrDict` | child `Project` instances found by directory walking |
| `fs` | `fsspec.AbstractFileSystem` | filesystem used for all file I/O |
| `url` | `str` | FS-normalised path to the project root |
| `basenames` | `dict[str, str]` | `{basename: full_path}` for every entry at the root |
| `pyproject` | `dict` | parsed `pyproject.toml`, or `{}` |

`Project.resolve()` iterates over every registered `ProjectSpec` subclass and
calls `cls(proj)` (which runs `match()`), then `inst.parse()`. A `ValueError` /
`ParseFailed` means the directory did not match that type, and the type is
silently skipped. Any other exception is logged but does not abort parsing.

`ProjectExtra` subclasses are handled differently: their `contents` and
`artifacts` are merged directly into `proj.contents` / `proj.artifacts` rather
than being stored in `proj.specs`.
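
The resolve pass described above can be pictured with a small, self-contained sketch. Everything in it is a hypothetical stand-in rather than the real `projspec` code: `SketchSpec`, the `marker` attribute, the `is_extra` flag (playing the role of `ProjectExtra`) and the plain dicts are illustrative, and `ParseFailed` is assumed to subclass `ValueError`.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("resolve_sketch")


class ParseFailed(ValueError):
    """Stand-in; assumed to subclass ValueError."""


class SketchSpec:
    """Hypothetical stand-in for ProjectSpec: match in __init__, then parse()."""

    is_extra = False  # illustrative flag playing the role of ProjectExtra
    marker = ""       # file whose presence makes the spec match

    def __init__(self, proj):
        self.proj = proj
        self.contents, self.artifacts = {}, {}
        if not self.match():
            raise ParseFailed(f"{type(self).__name__} does not match")

    def match(self):
        return self.marker in self.proj.basenames

    def parse(self):
        self.contents["marker"] = self.marker


class PythonLib(SketchSpec):
    marker = "pyproject.toml"


class RustLib(SketchSpec):
    marker = "Cargo.toml"


class Broken(SketchSpec):
    marker = "pyproject.toml"

    def parse(self):
        raise RuntimeError("unexpected failure")


class Docker(SketchSpec):
    is_extra = True
    marker = "Dockerfile"

    def parse(self):
        self.artifacts["docker_image"] = "image built from Dockerfile"


def resolve(proj, registry):
    for name, cls in registry.items():
        try:
            inst = cls(proj)   # runs match(); raises ParseFailed on no match
            inst.parse()
        except ValueError:
            continue           # not this type: skip silently
        except Exception:
            logger.exception("parsing %s failed", name)
            continue           # logged, but the pass carries on
        if inst.is_extra:
            # extras are merged into the project instead of stored in specs
            proj.contents.update(inst.contents)
            proj.artifacts.update(inst.artifacts)
        else:
            proj.specs[name] = inst


proj = SimpleNamespace(
    basenames={"pyproject.toml": "/repo/pyproject.toml",
               "Dockerfile": "/repo/Dockerfile"},
    specs={}, contents={}, artifacts={},
)
resolve(proj, {"python_lib": PythonLib, "rust_lib": RustLib,
               "broken": Broken, "docker": Docker})
print(sorted(proj.specs))      # ['python_lib']
print(sorted(proj.artifacts))  # ['docker_image']
```

In this sketch the non-matching `RustLib` and the failing `Broken` both leave the project untouched; only the latter produces a log record.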

---

### 2. `ProjectSpec` (`proj/base.py:435`)

Base class for every concrete project type. Subclasses are **auto-registered**
on import via `__init_subclass__` using their snake-case name as the key
(`proj/base.py:511`).

Lifecycle inside `Project.resolve()`:

```
cls(proj)      ← __init__ calls self.match(); raises ParseFailed if False
inst.parse()   ← populate self._contents and self._artifacts
```

Important class-level attribute:

| attribute | description |
|-----------|-------------|
| `spec_doc` | URL to upstream specification docs (optional but encouraged) |

Instance attributes after `parse()`:

| attribute | type | description |
|-----------|------|-------------|
| `_contents` | `AttrDict` | content objects for this spec |
| `_artifacts` | `AttrDict` | artifact objects for this spec |
| `proj` | `Project` | back-reference to the owning project |

Public properties `.contents` and `.artifacts` delegate to `_contents` /
`_artifacts` and call `parse()` lazily if they are `None` (`proj/base.py:466`).
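
As a rough illustration of that lifecycle, the sketch below shows a match check in `__init__` and properties that trigger `parse()` on first access. `LazySpec`, the dict-based project and the `parse_calls` counter are hypothetical names introduced only for this example; the real implementation in `proj/base.py` may differ in detail.

```python
class ParseFailed(ValueError):
    """Stand-in for projspec's ParseFailed (assumed to subclass ValueError)."""


class LazySpec:
    """Hypothetical stand-in showing match-on-init and lazy parsing."""

    def __init__(self, proj):
        self.proj = proj
        self._contents = None
        self._artifacts = None
        self.parse_calls = 0  # illustration only: counts parse() invocations
        if not self.match():
            raise ParseFailed(f"Not a {type(self).__name__}")

    def match(self):
        return "mytool.toml" in self.proj["basenames"]

    def parse(self):
        self.parse_calls += 1
        self._contents = {"environment": "..."}
        self._artifacts = {"lock_file": "..."}

    @property
    def contents(self):
        if self._contents is None:  # not parsed yet
            self.parse()
        return self._contents

    @property
    def artifacts(self):
        if self._artifacts is None:
            self.parse()
        return self._artifacts


spec = LazySpec({"basenames": {"mytool.toml": "/repo/mytool.toml"}})
calls_before = spec.parse_calls
keys = (list(spec.contents), list(spec.artifacts))
calls_after = spec.parse_calls  # parse() ran once, on first property access

try:
    LazySpec({"basenames": {}})
    matched_empty = True
except ParseFailed:
    matched_empty = False
```

Because `parse()` fills both attributes at once, the second property access does not parse again.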

#### `ProjectExtra` (`proj/base.py:542`)

A special subclass of `ProjectSpec` for cross-cutting concerns (CI/CD, Docker,
pre-commit, requirements files, …). These specs are *not* standalone projects.
After parsing, `Project.resolve()` merges their `contents` / `artifacts` into
the root project rather than storing them in `proj.specs`.

---

### 3. `BaseContent` (`content/base.py:11`)

A **dataclass** holding descriptive information extracted from a project.
Content objects are read-only descriptions; they have no executable behaviour.

Every subclass is a `@dataclass` that **must** include `proj: Project` as its
first field (inherited from `BaseContent`).

Subclasses are auto-registered on import via `__init_subclass__` (keyed by
snake-case name).

Concrete content classes:

| class | module | fields |
|-------|--------|--------|
| `Environment` | `content/environment.py` | `stack: Stack`, `precision: Precision`, `packages: list[str]`, `channels: list[str]` |
| `Command` | `content/executable.py` | `cmd: list[str] \| str` |
| `DescriptiveMetadata` | `content/metadata.py` | `meta: dict[str, str]` |
| `License` | `content/metadata.py` | `shortname`, `fullname`, `url` |
| `PythonPackage` | `content/package.py` | `package_name: str` |
| `RustModule` | `content/package.py` | `name: str` |
| `NodePackage` | `content/package.py` | `name: str` |
| `FrictionlessData` | `content/data.py` | `name: str`, `schema: dict` |
| `IntakeSource` | `content/data.py` | `name: str` |
| `EnvironmentVariables` | `content/env_var.py` | `variables: dict[str, str \| None]` |

Helper enums used by `Environment`:

- `Stack` (`PIP`, `CONDA`, `NPM`): packaging technology
- `Precision` (`SPEC`, `LOCK`): how precisely the environment is pinned
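
To make the dataclass shape concrete, here is a minimal sketch of a content class with `proj` as its first, inherited field. `ContentSketch` and `EnvironmentSketch` are hypothetical stand-ins, the enum values produced by `auto()` are an assumption (only the member names come from the text above), and registration is left out.

```python
from dataclasses import dataclass, field, fields
from enum import Enum, auto


class Stack(Enum):
    PIP = auto()
    CONDA = auto()
    NPM = auto()


class Precision(Enum):
    SPEC = auto()
    LOCK = auto()


@dataclass
class ContentSketch:
    """Hypothetical stand-in for BaseContent: `proj` is the first field."""

    proj: object


@dataclass
class EnvironmentSketch(ContentSketch):
    """Mirrors the documented Environment fields; a description, no behaviour."""

    stack: Stack
    precision: Precision
    packages: list[str] = field(default_factory=list)
    channels: list[str] = field(default_factory=list)


env = EnvironmentSketch(
    proj=None,  # a real Project instance would go here
    stack=Stack.CONDA,
    precision=Precision.SPEC,
    packages=["numpy"],
    channels=["conda-forge"],
)
field_names = [f.name for f in fields(env)]
print(field_names)
# ['proj', 'stack', 'precision', 'packages', 'channels']
```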

---

### 4. `BaseArtifact` (`artifact/base.py:14`)

An executable action or producible output attached to a project.

Constructor signature: `__init__(self, proj: Project, cmd: list[str] | None, **kwargs)`

All extra keyword arguments are stored via `self.__dict__.update(kwargs)`.

Key interface:

| method | description |
|--------|-------------|
| `make(**kwargs)` | Execute/produce the artifact. Raises `RuntimeError` for remote projects. |
| `clean()` | Remove or stop the artifact. Default no-op. |
| `remake()` | `clean()` then `make()`. |
| `state` | Property returning `"clean"`, `"done"`, `"pending"`, or `""`. |

Subclasses are auto-registered on import via `__init_subclass__`.

`FileArtifact` (`artifact/base.py:108`) specialises `BaseArtifact` for outputs
that are one or more files. Constructor adds `fn: str` (glob pattern for
output path). `_is_done()` / `_is_clean()` check for the file's existence via
`proj.fs.glob(self.fn)`.
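
The existence check can be sketched as follows. This is a hypothetical stand-in, not the real class: it uses the standard-library `glob` module in place of `proj.fs.glob`, and the way `_is_done()` / `_is_clean()` map onto the `state` strings is an assumption made for illustration.

```python
import glob
import os
import pathlib
import tempfile


class FileArtifactSketch:
    """Hypothetical stand-in for FileArtifact (stdlib glob, local files only)."""

    def __init__(self, cmd, fn):
        self.cmd = cmd  # command that would produce the output (never run here)
        self.fn = fn    # glob pattern for the output path

    def _is_done(self):
        # Done when at least one file matches the pattern.
        return bool(glob.glob(self.fn))

    def _is_clean(self):
        # Clean when nothing matches the pattern.
        return not glob.glob(self.fn)

    @property
    def state(self):
        # Assumed mapping onto the documented state strings.
        if self._is_done():
            return "done"
        if self._is_clean():
            return "clean"
        return ""


with tempfile.TemporaryDirectory() as root:
    wheel = FileArtifactSketch(
        cmd=["python", "-m", "build", "--wheel"],
        fn=os.path.join(root, "dist", "*.whl"),
    )
    state_before = wheel.state
    dist = pathlib.Path(root, "dist")
    dist.mkdir()
    (dist / "foo-0.1-py3-none-any.whl").touch()  # pretend the build ran
    state_after = wheel.state

print(state_before, state_after)  # clean done
```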

Concrete artifact classes:

| class | module | description |
|-------|--------|-------------|
| `Process` | `artifact/process.py` | Subprocess / long-running service |
| `Server` | `artifact/process.py` | HTTP service (subclass of `Process`) |
| `Wheel` | `artifact/installable.py` | Python wheel (`dist/*.whl`) |
| `CondaPackage` | `artifact/installable.py` | Conda `.conda` package |
| `SystemInstallablePackage` | `artifact/installable.py` | OS installer (deb, msi, dmg, …) |
| `VirtualEnv` | `artifact/python_env.py` | Python venv directory |
| `CondaEnv` | `artifact/python_env.py` | Conda environment directory |
| `LockFile` | `artifact/python_env.py` | Lock-file on disk |
| `EnvPack` | `artifact/python_env.py` | Packed environment archive |
| `DockerImage` | `artifact/container.py` | Docker image |
| `DockerRuntime` | `artifact/container.py` | Running Docker container |
| `PreCommit` | `artifact/linter.py` | pre-commit hook runner |

---

## Writing a `parse()` method

`parse()` is the core obligation of every `ProjectSpec` subclass. The base
implementation simply raises `ParseFailed`, so subclasses normally do not call
`super().parse()`.

### Contract

1. **Populate `self._contents` and `self._artifacts`.** Both must be
   `AttrDict` instances (or remain empty `AttrDict()`). They must not be
   `None` after `parse()` returns.

2. **Grouping convention.** Keys inside `_contents` / `_artifacts` are
   snake-case *type names* (`"environment"`, `"wheel"`, `"process"`, …).
   If there are multiple instances of the same type, the value is itself an
   `AttrDict` keyed by an identifying name (e.g. `"default"`, `"test"`,
   `"main"`).

   ```python
   # single item
   self._contents["python_package"] = PythonPackage(proj=self.proj, package_name="foo")

   # multiple items of the same type
   self._artifacts["process"] = AttrDict(
       main=Process(proj=self.proj, cmd=["python", "__main__.py"]),
   )
   ```

3. **Every content/artifact must receive `proj=self.proj`.** This
   back-reference is required by `BaseContent` / `BaseArtifact`.

4. **Raise `ParseFailed` (or any `ValueError`) on unrecoverable bad state**,
   for example if a required file is malformed and you cannot produce
   meaningful output. Do *not* raise for optional fields that simply aren't
   present.

5. **Read files via `self.proj.get_file(name)` or `self.proj.fs`.** Never
   use plain `open()`. This keeps parsing compatible with remote filesystems
   (S3, GCS, HTTP, …).

6. **Use `self.proj.basenames` for existence checks.** It is a
   `{basename: full_path}` dict of the top-level directory, already loaded,
   so it is cheap to query.

7. **Keep it cheap.** `match()` runs for every registered type on every
   directory; `parse()` runs immediately afterwards if `match()` returns
   `True`. Read only the files you actually need and avoid recursive
   directory traversal.

8. **Use `self.proj.pyproject`** for any data in `pyproject.toml`: it is a
   `@cached_property` that is shared with other specs in the same resolve
   pass.

### Minimal example

```python
from projspec.proj.base import ProjectSpec, ParseFailed
from projspec.content.environment import Environment, Stack, Precision
from projspec.artifact.python_env import LockFile
from projspec.utils import AttrDict


class MyTool(ProjectSpec):
    """Projects managed by mytool (mytool.toml present)."""

    spec_doc = "https://mytool.example.com/spec"

    def match(self) -> bool:
        # Cheap existence check against the already-loaded root listing.
        return "mytool.toml" in self.proj.basenames

    def parse(self) -> None:
        import toml
        from projspec.utils import PickleableTomlDecoder

        # Read through the project's filesystem, never plain open().
        try:
            with self.proj.get_file("mytool.toml") as f:
                meta = toml.load(f, decoder=PickleableTomlDecoder())
        except (OSError, ValueError) as exc:
            raise ParseFailed("Could not read mytool.toml") from exc

        # Optional field: fall back to an empty list rather than raising.
        packages = meta.get("dependencies", [])
        self._contents = AttrDict(
            environment=Environment(
                proj=self.proj,
                stack=Stack.PIP,
                precision=Precision.SPEC,
                packages=packages,
                channels=[],
            )
        )
        self._artifacts = AttrDict(
            lock_file=LockFile(
                proj=self.proj,
                cmd=["mytool", "lock"],
                fn=f"{self.proj.url}/mytool.lock",
            )
        )
```

### `ProjectExtra.parse()`: additional note

`ProjectExtra.parse()` should write to `self._contents` / `self._artifacts`
(accessed as `self.contents` / `self.artifacts` via the property) exactly as
above. `Project.resolve()` merges these into the root project automatically.

---

## Registry and discovery

All three class families self-register via `__init_subclass__`:

```
projspec.proj.base.registry       # ProjectSpec subclasses
projspec.content.base.registry    # BaseContent subclasses
projspec.artifact.base.registry   # BaseArtifact subclasses
```

Keys are **snake-case class names** produced by `camel_to_snake(cls.__name__)`.
A new spec is discovered automatically the moment its module is imported.
All concrete specs are imported in `src/projspec/proj/__init__.py`, so
importing the new module there is all that is needed to register a new type.

`get_cls(name, registry="proj")` (`utils.py:358`) looks up any class by name
across all registries.
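
The self-registration pattern can be sketched in a few lines. The names below (`SpecBase`, `lookup`, the module-level `registry`) are hypothetical, and this `camel_to_snake` is one common regex-based conversion; the helper in `projspec.utils` may behave differently on edge cases such as acronyms.

```python
import re

registry = {}  # stand-in for a per-family registry dict


def camel_to_snake(name):
    """One common regex-based conversion, shown here as an assumption."""
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


class SpecBase:
    """Hypothetical base class: every subclass registers itself on definition."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        registry[camel_to_snake(cls.__name__)] = cls


class PythonLibrary(SpecBase):
    pass


class GitRepo(SpecBase):
    pass


def lookup(name):
    """Hypothetical lookup helper, playing the role of get_cls."""
    return registry[name]


print(sorted(registry))  # ['git_repo', 'python_library']
```

Since registration happens when the class body is executed, a module that is never imported never registers its classes, which is why the package `__init__.py` imports every concrete spec.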

---

## `AttrDict` (`utils.py:50`)

A `dict` subclass that also supports attribute-style read access.

```python
d = AttrDict(foo=42)
d.foo     # → 42
d["foo"]  # → 42
```

`parse()` always assigns `AttrDict` instances to `self._contents` and
`self._artifacts`, never plain `dict`.
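
For intuition, attribute-style read access on a `dict` subclass can be implemented roughly as below. `AttrDictSketch` is a hypothetical minimal version written for this guide; the real `AttrDict` in `utils.py` may offer more than this.

```python
class AttrDictSketch(dict):
    """Hypothetical minimal version: dict with attribute-style reads."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict methods
        # such as .items or .keys keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


artifacts = AttrDictSketch(
    process=AttrDictSketch(main="python __main__.py"),
)
print(artifacts.process.main)        # python __main__.py
print(artifacts["process"]["main"])  # python __main__.py
```

One consequence of this approach is that a key named like a `dict` method (for example `"items"`) is reachable only through item access, not attribute access.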

---

## Serialisation

Every content, artifact, and spec class implements `to_dict(compact=True)`.

- `compact=True`: human-readable condensed form (strings, nested dicts).
- `compact=False`: full form including a `"klass"` key that encodes the
  category and snake-case class name, enabling round-trip deserialisation
  via `utils.from_dict()`.

The test suite exercises round-trips in `tests/test_roundtrips.py`.
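
A round-trip of this kind might look like the sketch below. It is illustrative only: `LicenseSketch`, `CONTENT_REGISTRY` and this `from_dict` are hypothetical, and the `[category, name]` list used for `"klass"` is an assumed encoding, since the text above only says that the key encodes both pieces of information.

```python
from dataclasses import asdict, dataclass

CONTENT_REGISTRY = {}  # hypothetical stand-in for the content registry


@dataclass
class LicenseSketch:
    shortname: str
    fullname: str
    url: str

    def to_dict(self, compact=True):
        data = asdict(self)
        if not compact:
            # Assumed encoding: [category, snake-case class name].
            data["klass"] = ["content", "license_sketch"]
        return data


CONTENT_REGISTRY["license_sketch"] = LicenseSketch


def from_dict(data):
    """Hypothetical inverse of to_dict(compact=False)."""
    data = dict(data)  # copy so the caller's dict is left untouched
    category, name = data.pop("klass")
    if category != "content":
        raise ValueError(f"unknown category: {category}")
    return CONTENT_REGISTRY[name](**data)


original = LicenseSketch("MIT", "MIT License", "https://opensource.org/license/mit")
restored = from_dict(original.to_dict(compact=False))
print(restored == original)  # True
```

The compact form omits `"klass"`, so in this sketch it cannot be passed back to `from_dict`.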
335+
336+
---
337+
338+
## Testing
339+
340+
```bash
341+
pytest tests/
342+
```
343+
344+
The `proj` fixture in `tests/conftest.py` returns
345+
`projspec.Project("/data")` — the repository root itself — which is a
346+
real `PythonLibrary` + `GitRepo` + `Pixi` project, so most tests work
347+
against live on-disk data.
348+
349+
New specs should add at minimum:
350+
351+
1. A `match()` unit test verifying the positive and negative cases.
352+
2. A `parse()` unit test asserting expected keys in `.contents` and
353+
`.artifacts`.
354+
3. A round-trip test (`to_dict``from_dict`) if the spec introduces new
355+
content or artifact classes.
