| 1 | +# AGENTS.md — projspec |
| 2 | + |
| 3 | +This document is a guide for AI coding agents working in this repository. It |
| 4 | +covers the architecture of the `projspec` package (located in `src/projspec/`), |
| 5 | +the three central class families, the contract every `parse()` method must |
| 6 | +honour, and the conventions used throughout. |
| 7 | + |
| 8 | +Extensions and the Qt application (`vsextension/`, `pycharm_plugin/`, |
| 9 | +`src/projspec/qtapp/`) are out of scope. |
| 10 | + |
| 11 | +--- |
| 12 | + |
| 13 | +## Repository layout |
| 14 | + |
| 15 | +``` |
| 16 | +src/projspec/ |
| 17 | +    __init__.py          # public re-exports: Project, ProjectSpec, get_cls |
| 18 | +    proj/ |
| 19 | +        base.py          # Project, ProjectSpec, ProjectExtra, ParseFailed |
| 20 | +        *.py             # one file per concrete spec type |
| 21 | +    content/ |
| 22 | +        base.py          # BaseContent + content registry |
| 23 | +        *.py             # one file per concrete content type |
| 24 | +    artifact/ |
| 25 | +        base.py          # BaseArtifact, FileArtifact + artifact registry |
| 26 | +        *.py             # one file per concrete artifact type |
| 27 | +    utils.py             # AttrDict, camel_to_snake, run_subprocess, … |
| 28 | +    config.py            # get_conf / set_conf |
| 29 | +tests/ |
| 30 | +    conftest.py          # shared fixtures (proj = Project("/data")) |
| 31 | +    test_basic.py        # smoke tests |
| 32 | +    test_roundtrips.py   # serialise / deserialise round-trips |
| 33 | +    … |
| 34 | +``` |
| 35 | + |
| 36 | +--- |
| 37 | + |
| 38 | +## `Project` and the three class families |
| 39 | + |
| 40 | +### 1. `Project` (`proj/base.py:43`) |
| 41 | + |
| 42 | +The top-level container for a parsed directory. It is not subclassed. |
| 43 | + |
| 44 | +Key attributes set during `__init__` → `resolve()`: |
| 45 | + |
| 46 | +| attribute | type | description | |
| 47 | +|-----------|------|-------------| |
| 48 | +| `specs` | `AttrDict` | matched `ProjectSpec` instances, keyed by snake-case class name | |
| 49 | +| `contents` | `AttrDict` | `BaseContent` instances contributed by `ProjectExtra` specs | |
| 50 | +| `artifacts` | `AttrDict` | `BaseArtifact` instances contributed by `ProjectExtra` specs | |
| 51 | +| `children` | `AttrDict` | child `Project` instances found by directory walking | |
| 52 | +| `fs` | `fsspec.AbstractFileSystem` | filesystem used for all file I/O | |
| 53 | +| `url` | `str` | FS-normalised path to the project root | |
| 54 | +| `basenames` | `dict[str, str]` | `{basename: full_path}` for every entry at the root | |
| 55 | +| `pyproject` | `dict` | parsed `pyproject.toml`, or `{}` | |
| 56 | + |
| 57 | +`Project.resolve()` iterates every registered `ProjectSpec` subclass and calls |
| 58 | +`cls(proj)` (which runs `match()`) then `inst.parse()`. A `ValueError` / |
| 59 | +`ParseFailed` means the directory did not match that type and is silently |
| 60 | +skipped. Any other exception is logged but does not abort parsing. |
| 61 | + |
| 62 | +`ProjectExtra` subclasses are handled differently: their `contents` and |
| 63 | +`artifacts` are merged directly into `proj.contents` / `proj.artifacts` rather |
| 64 | +than being stored in `proj.specs`. |
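
The control flow described above can be sketched in a few lines. This is not the code in `proj/base.py`; it is a minimal, self-contained illustration under stated assumptions: `SpecSketch`, `ExtraSketch`, `resolve_sketch` and the four demo classes are hypothetical stand-ins, `ParseFailed` is assumed to be a `ValueError` subclass, and the "project" is reduced to a set of basenames.

```python
import logging

logger = logging.getLogger("resolve_sketch")


class ParseFailed(ValueError):
    """Stand-in; assumed here to be a ValueError subclass."""


class SpecSketch:
    """Stand-in for ProjectSpec: __init__ runs match(), parse() fills results."""

    def __init__(self, proj):
        self.proj = proj
        self.contents, self.artifacts = {}, {}
        if not self.match():
            raise ParseFailed(f"{type(self).__name__} does not match")

    def match(self):
        return False

    def parse(self):
        raise ParseFailed("base class cannot parse")


class ExtraSketch(SpecSketch):
    """Stand-in for ProjectExtra."""


def resolve_sketch(proj, registry):
    """Return (specs, contents, artifacts) for one directory."""
    specs, contents, artifacts = {}, {}, {}
    for name, cls in registry.items():
        try:
            inst = cls(proj)  # runs match()
            inst.parse()
        except ValueError:
            continue  # directory is not of this type: skip silently
        except Exception:
            logger.exception("parsing %s failed", name)  # log, keep going
            continue
        if isinstance(inst, ExtraSketch):
            # extras contribute to the root project instead of proj.specs
            contents.update(inst.contents)
            artifacts.update(inst.artifacts)
        else:
            specs[name] = inst
    return specs, contents, artifacts


class PythonLike(SpecSketch):
    def match(self):
        return "pyproject.toml" in self.proj

    def parse(self):
        self.contents["python_package"] = "foo"


class RustLike(SpecSketch):
    def match(self):
        return "Cargo.toml" in self.proj


class Broken(SpecSketch):
    def match(self):
        return True

    def parse(self):
        raise RuntimeError("unexpected bug")


class DockerLike(ExtraSketch):
    def match(self):
        return "Dockerfile" in self.proj

    def parse(self):
        self.artifacts["docker_image"] = "image"


registry = {"python_like": PythonLike, "rust_like": RustLike,
            "broken": Broken, "docker_like": DockerLike}
specs, contents, artifacts = resolve_sketch({"pyproject.toml", "Dockerfile"}, registry)
```

In this sketch only `python_like` ends up in `specs`: `rust_like` fails to match, `broken` is logged and skipped, and `docker_like` contributes its artifact to the merged `artifacts` mapping.
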
| 65 | + |
| 66 | +--- |
| 67 | + |
| 68 | +### 2. `ProjectSpec` (`proj/base.py:435`) |
| 69 | + |
| 70 | +Base class for every concrete project type. Subclasses are **auto-registered** |
| 71 | +on import via `__init_subclass__` using their snake-case name as the key |
| 72 | +(`proj/base.py:511`). |
| 73 | + |
| 74 | +Lifecycle inside `Project.resolve()`: |
| 75 | + |
| 76 | +``` |
| 77 | +cls(proj) ← __init__ calls self.match(); raises ParseFailed if False |
| 78 | +inst.parse() ← populate self._contents and self._artifacts |
| 79 | +``` |
| 80 | + |
| 81 | +Important class-level attribute: |
| 82 | + |
| 83 | +| attribute | description | |
| 84 | +|-----------|-------------| |
| 85 | +| `spec_doc` | URL to upstream specification docs (optional but encouraged) | |
| 86 | + |
| 87 | +Instance attributes after `parse()`: |
| 88 | + |
| 89 | +| attribute | type | description | |
| 90 | +|-----------|------|-------------| |
| 91 | +| `_contents` | `AttrDict` | content objects for this spec | |
| 92 | +| `_artifacts` | `AttrDict` | artifact objects for this spec | |
| 93 | +| `proj` | `Project` | back-reference to the owning project | |
| 94 | + |
| 95 | +Public properties `.contents` and `.artifacts` delegate to `_contents` / |
| 96 | +`_artifacts` and call `parse()` lazily if they are `None` (`proj/base.py:466`). |
| 97 | + |
| 98 | +#### `ProjectExtra` (`proj/base.py:542`) |
| 99 | + |
| 100 | +A special subclass of `ProjectSpec` for cross-cutting concerns (CI/CD, Docker, |
| 101 | +pre-commit, requirements files, …). These specs are *not* standalone projects. |
| 102 | +After parsing, `Project.resolve()` merges their `contents` / `artifacts` into |
| 103 | +the root project rather than storing them in `proj.specs`. |
| 104 | + |
| 105 | +--- |
| 106 | + |
| 107 | +### 3. `BaseContent` (`content/base.py:11`) |
| 108 | + |
| 109 | +A **dataclass** holding descriptive information extracted from a project. |
| 110 | +Content objects are read-only descriptions; they have no executable behaviour. |
| 111 | + |
| 112 | +Every subclass is a `@dataclass`. Its first field is always `proj: Project`, |
| 113 | +which is inherited from `BaseContent`. |
| 114 | + |
| 115 | +Subclasses are auto-registered on import via `__init_subclass__` (keyed by |
| 116 | +snake-case name). |
| 117 | + |
| 118 | +Concrete content classes: |
| 119 | + |
| 120 | +| class | module | fields | |
| 121 | +|-------|--------|--------| |
| 122 | +| `Environment` | `content/environment.py` | `stack: Stack`, `precision: Precision`, `packages: list[str]`, `channels: list[str]` | |
| 123 | +| `Command` | `content/executable.py` | `cmd: list[str] \| str` | |
| 124 | +| `DescriptiveMetadata` | `content/metadata.py` | `meta: dict[str, str]` | |
| 125 | +| `License` | `content/metadata.py` | `shortname`, `fullname`, `url` | |
| 126 | +| `PythonPackage` | `content/package.py` | `package_name: str` | |
| 127 | +| `RustModule` | `content/package.py` | `name: str` | |
| 128 | +| `NodePackage` | `content/package.py` | `name: str` | |
| 129 | +| `FrictionlessData` | `content/data.py` | `name: str`, `schema: dict` | |
| 130 | +| `IntakeSource` | `content/data.py` | `name: str` | |
| 131 | +| `EnvironmentVariables` | `content/env_var.py` | `variables: dict[str, str \| None]` | |
| 132 | + |
| 133 | +Helper enums used by `Environment`: |
| 134 | + |
| 135 | +- `Stack` (`PIP`, `CONDA`, `NPM`) — packaging technology |
| 136 | +- `Precision` (`SPEC`, `LOCK`) — how precisely the environment is pinned |
| 137 | + |
| 138 | +--- |
| 139 | + |
| 140 | +### 4. `BaseArtifact` (`artifact/base.py:14`) |
| 141 | + |
| 142 | +An executable action or producible output attached to a project. |
| 143 | + |
| 144 | +Constructor signature: `__init__(self, proj: Project, cmd: list[str] | None, **kwargs)` |
| 145 | + |
| 146 | +All extra keyword arguments are stored via `self.__dict__.update(kwargs)`. |
| 147 | + |
| 148 | +Key interface: |
| 149 | + |
| 150 | +| member | description | |
| 151 | +|--------|-------------| |
| 152 | +| `make(**kwargs)` | Execute/produce the artifact. Raises `RuntimeError` for remote projects. | |
| 153 | +| `clean()` | Remove or stop the artifact. Default no-op. | |
| 154 | +| `remake()` | `clean()` then `make()`. | |
| 155 | +| `state` | Property returning `"clean"`, `"done"`, `"pending"`, or `""`. | |
| 156 | + |
| 157 | +Subclasses are auto-registered on import via `__init_subclass__`. |
| 158 | + |
| 159 | +`FileArtifact` (`artifact/base.py:108`) specialises `BaseArtifact` for outputs |
| 160 | +that are one or more files. Constructor adds `fn: str` (glob pattern for |
| 161 | +output path). `_is_done()` / `_is_clean()` check for the file's existence via |
| 162 | +`proj.fs.glob(self.fn)`. |
| 163 | + |
| 164 | +Concrete artifact classes: |
| 165 | + |
| 166 | +| class | module | description | |
| 167 | +|-------|--------|-------------| |
| 168 | +| `Process` | `artifact/process.py` | Subprocess / long-running service | |
| 169 | +| `Server` | `artifact/process.py` | HTTP service (subclass of `Process`) | |
| 170 | +| `Wheel` | `artifact/installable.py` | Python wheel (`dist/*.whl`) | |
| 171 | +| `CondaPackage` | `artifact/installable.py` | Conda `.conda` package | |
| 172 | +| `SystemInstallablePackage` | `artifact/installable.py` | OS installer (deb, msi, dmg, …) | |
| 173 | +| `VirtualEnv` | `artifact/python_env.py` | Python venv directory | |
| 174 | +| `CondaEnv` | `artifact/python_env.py` | Conda environment directory | |
| 175 | +| `LockFile` | `artifact/python_env.py` | Lock-file on disk | |
| 176 | +| `EnvPack` | `artifact/python_env.py` | Packed environment archive | |
| 177 | +| `DockerImage` | `artifact/container.py` | Docker image | |
| 178 | +| `DockerRuntime` | `artifact/container.py` | Running Docker container | |
| 179 | +| `PreCommit` | `artifact/linter.py` | pre-commit hook runner | |
| 180 | + |
| 181 | +--- |
| 182 | + |
| 183 | +## Writing a `parse()` method |
| 184 | + |
| 185 | +`parse()` is the core obligation of every `ProjectSpec` subclass. The base |
| 186 | +implementation simply raises `ParseFailed`, so subclasses override it without |
| 187 | +calling `super().parse()`. |
| 188 | + |
| 189 | +### Contract |
| 190 | + |
| 191 | +1. **Populate `self._contents` and `self._artifacts`** — both must be |
| 192 | + `AttrDict` instances (or remain empty `AttrDict()`). They must not be |
| 193 | + `None` after `parse()` returns. |
| 194 | + |
| 195 | +2. **Grouping convention** — keys inside `_contents` / `_artifacts` are |
| 196 | + snake-case *type names* (`"environment"`, `"wheel"`, `"process"`, …). |
| 197 | + If there are multiple instances of the same type, the value is itself an |
| 198 | + `AttrDict` keyed by an identifying name (e.g. `"default"`, `"test"`, |
| 199 | + `"main"`). |
| 200 | + |
| 201 | +   ```python |
| 202 | +   # single item |
| 203 | +   self._contents["python_package"] = PythonPackage(proj=self.proj, package_name="foo") |
| 204 | + |
| 205 | +   # named items of the same type (only "main" here) |
| 206 | +   self._artifacts["process"] = AttrDict( |
| 207 | +       main=Process(proj=self.proj, cmd=["python", "__main__.py"]), |
| 208 | +   ) |
| 209 | +   ``` |
| 210 | + |
| 211 | +3. **Every content/artifact must receive `proj=self.proj`** — this back- |
| 212 | + reference is required by `BaseContent` / `BaseArtifact`. |
| 213 | + |
| 214 | +4. **Raise `ParseFailed` (or any `ValueError`) on unrecoverable bad state** — |
| 215 | + for example if a required file is malformed and you cannot produce |
| 216 | + meaningful output. Do *not* raise for optional fields that simply aren't |
| 217 | + present. |
| 218 | + |
| 219 | +5. **Read files via `self.proj.get_file(name)` or `self.proj.fs`** — never |
| 220 | + use plain `open()`. This keeps parsing compatible with remote filesystems |
| 221 | + (S3, GCS, HTTP, …). |
| 222 | + |
| 223 | +6. **Use `self.proj.basenames` for existence checks** — it is a |
| 224 | + `{basename: full_path}` dict of the top-level directory, already loaded, |
| 225 | + so it is cheap to query. |
| 226 | + |
| 227 | +7. **Keep it cheap** — `match()` runs for every registered type on every |
| 228 | + directory; `parse()` runs immediately afterwards if `match()` returns |
| 229 | + `True`. Read only the files you actually need and avoid recursive |
| 230 | + directory traversal. |
| 231 | + |
| 232 | +8. **Use `self.proj.pyproject`** for any data in `pyproject.toml` — it is a |
| 233 | + `@cached_property` that is shared with other specs in the same resolve |
| 234 | + pass. |
| 235 | + |
| 236 | +### Minimal example |
| 237 | + |
| 238 | +```python |
| 239 | +from projspec.proj.base import ProjectSpec, ParseFailed |
| 240 | +from projspec.content.environment import Environment, Stack, Precision |
| 241 | +from projspec.artifact.python_env import LockFile |
| 242 | +from projspec.utils import AttrDict |
| 243 | + |
| 244 | + |
| 245 | +class MyTool(ProjectSpec): |
| 246 | +    """Projects managed by mytool (mytool.toml present).""" |
| 247 | + |
| 248 | +    spec_doc = "https://mytool.example.com/spec" |
| 249 | + |
| 250 | +    def match(self) -> bool: |
| 251 | +        return "mytool.toml" in self.proj.basenames  # cheap: already loaded |
| 252 | + |
| 253 | +    def parse(self) -> None: |
| 254 | +        import toml |
| 255 | +        from projspec.utils import PickleableTomlDecoder |
| 256 | + |
| 257 | +        try: |
| 258 | +            with self.proj.get_file("mytool.toml") as f:  # never plain open() |
| 259 | +                meta = toml.load(f, decoder=PickleableTomlDecoder()) |
| 260 | +        except (OSError, ValueError) as exc: |
| 261 | +            raise ParseFailed("Could not read mytool.toml") from exc |
| 262 | + |
| 263 | +        packages = meta.get("dependencies", [])  # optional: no error if absent |
| 264 | +        self._contents = AttrDict( |
| 265 | +            environment=Environment( |
| 266 | +                proj=self.proj, |
| 267 | +                stack=Stack.PIP, |
| 268 | +                precision=Precision.SPEC, |
| 269 | +                packages=packages, |
| 270 | +                channels=[], |
| 271 | +            ) |
| 272 | +        ) |
| 273 | +        self._artifacts = AttrDict( |
| 274 | +            lock_file=LockFile( |
| 275 | +                proj=self.proj, |
| 276 | +                cmd=["mytool", "lock"], |
| 277 | +                fn=f"{self.proj.url}/mytool.lock", |
| 278 | +            ) |
| 279 | +        ) |
| 280 | +``` |
| 281 | + |
| 282 | +### `ProjectExtra.parse()` — additional note |
| 283 | + |
| 284 | +`ProjectExtra.parse()` should write to `self._contents` / `self._artifacts` |
| 285 | +(accessed as `self.contents` / `self.artifacts` via the property) exactly as |
| 286 | +above. `Project.resolve()` merges these into the root project automatically. |
| 287 | + |
| 288 | +--- |
| 289 | + |
| 290 | +## Registry and discovery |
| 291 | + |
| 292 | +All three class families self-register via `__init_subclass__`: |
| 293 | + |
| 294 | +``` |
| 295 | +projspec.proj.base.registry # ProjectSpec subclasses |
| 296 | +projspec.content.base.registry # BaseContent subclasses |
| 297 | +projspec.artifact.base.registry # BaseArtifact subclasses |
| 298 | +``` |
| 299 | + |
| 300 | +Keys are **snake-case class names** produced by `camel_to_snake(cls.__name__)`. |
| 301 | +A new spec is registered the moment its module is imported. All concrete |
| 302 | +specs are imported in `src/projspec/proj/__init__.py`, so a new module must |
| 303 | +also be imported there for its type to be discovered. |
| 304 | + |
| 305 | +`get_cls(name, registry="proj")` (`utils.py:358`) looks up a class by name in |
| 306 | +any of the registries; the `registry` argument selects which one. |
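
The self-registration pattern can be illustrated with a small, self-contained sketch. It is an assumption-laden stand-in, not the project's code: `SpecBase`, the module-level `registry` dict and `camel_to_snake_sketch` are hypothetical, and the real `camel_to_snake` in `utils.py` may treat acronyms or digits differently.

```python
import re

registry = {}  # stand-in for projspec.proj.base.registry


def camel_to_snake_sketch(name):
    """Insert "_" before each interior capital letter, then lower-case."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


class SpecBase:
    """Hypothetical base class; every subclass registers itself on creation."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        registry[camel_to_snake_sketch(cls.__name__)] = cls


class PythonLibrary(SpecBase):
    pass


class GitRepo(SpecBase):
    pass
```

Because `__init_subclass__` runs when the `class` statement executes, importing a module is enough to register every spec defined in it. The base class itself is never added, since the hook fires only for subclasses.
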
| 307 | + |
| 308 | +--- |
| 309 | + |
| 310 | +## `AttrDict` (`utils.py:50`) |
| 311 | + |
| 312 | +A `dict` subclass that also supports attribute-style read access. |
| 313 | + |
| 314 | +```python |
| 315 | +d = AttrDict(foo=42) |
| 316 | +d.foo # → 42 |
| 317 | +d["foo"] # → 42 |
| 318 | +``` |
| 319 | + |
| 320 | +`parse()` always assigns `AttrDict` instances to `self._contents` and |
| 321 | +`self._artifacts`, never plain `dict`. |
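
For orientation, a `dict` subclass with attribute-style reads can be written in a few lines. The class below, `AttrDictSketch`, is a hypothetical illustration and not the implementation at `utils.py:50`; the string values stand in for real artifact objects. It also shows how the grouping convention nests one mapping inside another.

```python
class AttrDictSketch(dict):
    """dict whose keys can also be read as attributes (hypothetical)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods
        # such as .keys() and .items() keep working as usual.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


artifacts = AttrDictSketch(
    wheel="wheel-artifact",  # single item of a type
    process=AttrDictSketch(main="main-process", test="test-process"),  # named group
)
```

With this layout, `artifacts.process.main` and `artifacts["process"]["main"]` refer to the same object.
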
| 322 | + |
| 323 | +--- |
| 324 | + |
| 325 | +## Serialisation |
| 326 | + |
| 327 | +Every content, artifact, and spec class implements `to_dict(compact=True)`. |
| 328 | + |
| 329 | +- `compact=True` — human-readable condensed form (strings, nested dicts). |
| 330 | +- `compact=False` — full form including a `"klass"` key that encodes the |
| 331 | + category and snake-case class name, enabling round-trip deserialisation |
| 332 | + via `utils.from_dict()`. |
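
The round-trip idea can be sketched as follows. Everything here is an assumption made for illustration: the `["content", "license"]` layout of `"klass"`, the `REGISTRY` dict, `ContentSketch` and this `from_dict` are hypothetical and do not reproduce the real encoding or `utils.from_dict()`. The `proj` field is omitted to keep the example self-contained.

```python
from dataclasses import dataclass, fields

REGISTRY = {}  # stand-in for the content registry


@dataclass
class ContentSketch:
    """Hypothetical base: registers subclasses and serialises their fields."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # simplification: lower-cased class name instead of camel_to_snake
        REGISTRY[cls.__name__.lower()] = cls

    def to_dict(self, compact=True):
        data = {f.name: getattr(self, f.name) for f in fields(self)}
        if not compact:
            # assumed encoding: [category, class name]
            data["klass"] = ["content", type(self).__name__.lower()]
        return data


@dataclass
class License(ContentSketch):
    shortname: str
    fullname: str
    url: str


def from_dict(data):
    """Rebuild an instance from the non-compact form."""
    data = dict(data)  # do not mutate the caller's dict
    _category, name = data.pop("klass")
    return REGISTRY[name](**data)


lic = License("MIT", "MIT License", "https://opensource.org/license/mit")
restored = from_dict(lic.to_dict(compact=False))
```

Only the non-compact form carries enough information to rebuild the object, which is why a round-trip test should call `to_dict(compact=False)`.
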
| 333 | + |
| 334 | +The test suite exercises round-trips in `tests/test_roundtrips.py`. |
| 335 | + |
| 336 | +--- |
| 337 | + |
| 338 | +## Testing |
| 339 | + |
| 340 | +```bash |
| 341 | +pytest tests/ |
| 342 | +``` |
| 343 | + |
| 344 | +The `proj` fixture in `tests/conftest.py` returns |
| 345 | +`projspec.Project("/data")` — the repository root itself — which is a |
| 346 | +real `PythonLibrary` + `GitRepo` + `Pixi` project, so most tests work |
| 347 | +against live on-disk data. |
| 348 | + |
| 349 | +New specs should add at minimum: |
| 350 | + |
| 351 | +1. A `match()` unit test verifying the positive and negative cases. |
| 352 | +2. A `parse()` unit test asserting expected keys in `.contents` and |
| 353 | + `.artifacts`. |
| 354 | +3. A round-trip test (`to_dict` → `from_dict`) if the spec introduces new |
| 355 | + content or artifact classes. |