docs(architecture): fix health states, complete IPC protocol, document missing features
Fix health state machine to start at UNKNOWN (not STARTING), add
UNHEALTHY state for user-defined healthchecks. Complete IPC protocol
tables with all message variants (~8 were missing). Document four
previously undocumented features: input spilling, file outputs, custom
metrics, user-defined healthchecks. Expand env var table and Go package
listing. Add hidden CLI commands. Fix crates/README.md duplicate line.
Update skill to prefer important packages over exhaustive listings.
| `Cancel { slot }` | Cancel a running prediction on a slot |
| `Healthcheck { id }` | Request a user-defined healthcheck |
| `Shutdown` | Graceful shutdown |

**Worker → Parent:**

| Message | Purpose |
|---------|---------|
| `Ready { slots, schema }` | Worker initialized, here are the slot IDs and OpenAPI schema |
| `Log { source, data }` | Setup-time log line (stdout or stderr) |
| `WorkerLog { target, level, message }` | Structured log from the worker runtime itself (not user code) |
| `Idle { slot }` | Slot finished a prediction and is available |
| `Cancelled { slot }` | Prediction on slot was cancelled |
| `Failed { slot, error }` | Prediction on slot failed |
| `Fatal { reason }` | Unrecoverable error -- worker is shutting down |
| `DroppedLogs { count, interval_millis }` | Worker dropped log messages due to backpressure |
| `HealthcheckResult { id, status, error }` | Result of a user-defined healthcheck |
| `ShuttingDown` | Worker is shutting down |

### Slot Channel (Unix socket per slot)

Per-prediction data. Using separate sockets per slot avoids head-of-line blocking between concurrent predictions.

**Parent → Worker:**

| Message | Purpose |
|---------|---------|
| `Predict { id, input, input_file, output_dir }` | Run a prediction. `input` is inline JSON; for large payloads (>6MiB) it's `null` and `input_file` points to a spill file on disk |

**Worker → Parent:**

| Message | Purpose |
|---------|---------|
| `Log { source, data }` | Log line from predict() |
| `Output { output }` | Yielded output value (for generators/streaming) |
| `FileOutput { filename, kind, mime_type }` | File produced by predict() -- referenced by path, uploaded by parent |

There's a distinction between internal health state (`Health` enum) and what the HTTP response returns (`HealthResponse`). The HTTP response adds one extra state: `UNHEALTHY`, which is transient -- it's returned when a user-defined healthcheck fails but doesn't change the internal health state. See [User-Defined Healthchecks](#user-defined-healthchecks) below.

## Prediction Flow

### Sync Request (POST /predictions)
@@ -302,13 +337,42 @@ How coglet gets invoked when running a Cog container:
- **Observable**: Easy to monitor slot usage
- **Simple**: No async complexity in worker subprocess

## Input Spilling

When a prediction input exceeds 6MiB, it's too large to send inline through the IPC socket. Instead, the parent writes it to a temporary file and sends the file path in `input_file` (with `input` set to null). The worker reads the file, deletes it, and proceeds normally. This is transparent to the predictor code.
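
The spill-and-read handshake described above can be sketched in a few lines of Python. This is an illustration only, not coglet's implementation: the function names `build_predict_message` and `load_input`, the dict-shaped message, and the choice to measure the threshold on the JSON-encoded size are all assumptions made for the example.

```python
import json
import os
import tempfile

# 6 MiB, the threshold named in the text above.
SPILL_THRESHOLD = 6 * 1024 * 1024


def build_predict_message(pred_id, payload, output_dir, spill_dir=None):
    """Parent side (hypothetical helper): inline small inputs, spill large ones."""
    encoded = json.dumps(payload).encode("utf-8")
    if len(encoded) <= SPILL_THRESHOLD:
        return {"id": pred_id, "input": payload,
                "input_file": None, "output_dir": output_dir}
    # Too large for the socket: write to a temp file and send only its path.
    fd, path = tempfile.mkstemp(suffix=".json", dir=spill_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(encoded)
    return {"id": pred_id, "input": None,
            "input_file": path, "output_dir": output_dir}


def load_input(message):
    """Worker side (hypothetical helper): read the spill file, then delete it."""
    path = message["input_file"]
    if path is None:
        return message["input"]
    with open(path, "rb") as f:
        payload = json.loads(f.read())
    os.remove(path)
    return payload
```

Because `load_input` returns the same value on both paths, code downstream of it never needs to know whether the input was spilled, which is the transparency property the paragraph describes.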

## File Outputs

When predict() produces file outputs (`cog.Path`), the worker sends a `FileOutput` message with the filename and MIME type. The parent handles uploading the file (or base64-encoding it for inline responses). The `output_dir` field in the `Predict` request tells the worker where to write output files.

`FileOutputKind` distinguishes between normal file outputs (`FileType`) and oversized outputs (`Oversized`) that exceeded an inline size limit.
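
As a rough sketch of the worker side, the snippet below writes a file under `output_dir` and builds a dict shaped like the `FileOutput` message. The helper name `write_file_output`, the dict encoding, and the use of `mimetypes.guess_type` to derive the MIME type are assumptions for illustration; the real worker may determine these fields differently.

```python
import mimetypes
import os
import tempfile


def write_file_output(output_dir, name, data, kind="FileType"):
    """Hypothetical helper: write bytes under output_dir and describe them."""
    path = os.path.join(output_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    # guess_type returns (type, encoding); type is None for unknown extensions.
    mime_type, _ = mimetypes.guess_type(path)
    # The file is referenced by path; the parent decides how to deliver it.
    return {"filename": path, "kind": kind, "mime_type": mime_type}


with tempfile.TemporaryDirectory() as output_dir:
    msg = write_file_output(output_dir, "result.png", b"\x89PNG")
```

Only the path and metadata cross the socket here, so the file contents never pass through the slot channel.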

## Custom Metrics

Models can record custom metrics via `self.record_metric(name, value, mode)` in their predict method. These are sent as `Metric` messages on the slot channel. The `mode` controls how metrics aggregate:

- `replace` -- overwrite any existing value
- `increment` -- add to the current value (numeric)
- `append` -- append to a list

Metrics appear in the prediction response's `metrics` object alongside the built-in `predict_time`.
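
The three aggregation modes can be illustrated with a small stand-alone function. This is a sketch of the semantics listed above, not the actual aggregation code: `apply_metric` is a hypothetical name, the `"replace"` default is an assumption, and how the real implementation handles type mismatches (for example, appending to a non-list value) is not specified in the text.

```python
def apply_metric(metrics, name, value, mode="replace"):
    """Hypothetical aggregator mirroring the replace/increment/append modes."""
    if mode == "replace":
        metrics[name] = value  # overwrite any existing value
    elif mode == "increment":
        metrics[name] = metrics.get(name, 0) + value  # numeric add
    elif mode == "append":
        metrics.setdefault(name, []).append(value)  # grow a list
    else:
        raise ValueError(f"unknown metric mode: {mode!r}")
    return metrics


# Built-in predict_time sits alongside the custom metrics.
metrics = {"predict_time": 1.25}
apply_metric(metrics, "tokens", 10, "increment")
apply_metric(metrics, "tokens", 5, "increment")
apply_metric(metrics, "stage", "decode", "replace")
apply_metric(metrics, "latencies", 0.2, "append")
apply_metric(metrics, "latencies", 0.3, "append")
```

After these calls, `metrics` holds `tokens` as 15, `stage` as `"decode"`, and `latencies` as `[0.2, 0.3]`, next to the untouched `predict_time`.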

## User-Defined Healthchecks

Models can implement a custom healthcheck that runs alongside the built-in health state machine. The parent sends `Healthcheck { id }` on the control channel; the worker runs the user's healthcheck and responds with `HealthcheckResult { id, status, error }`.

If the healthcheck fails, the HTTP `/health-check` endpoint returns `UNHEALTHY` -- but this is transient and doesn't change the internal `Health` state. The model stays `READY` and continues accepting predictions.
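
The split between internal state and HTTP response can be sketched as a pure function that derives the response without touching the state. The `Health` members listed here are only the ones named in this document (the real enum may have more), and `health_response` plus the rule that the healthcheck result only matters while `READY` are assumptions for illustration.

```python
from enum import Enum


class Health(Enum):
    # Partial list: only states mentioned in this document.
    UNKNOWN = "UNKNOWN"
    STARTING = "STARTING"
    READY = "READY"


def health_response(internal, healthcheck_ok):
    """Hypothetical mapping from internal state to the HTTP status string.

    healthcheck_ok is True/False for a completed user healthcheck,
    or None when no user-defined healthcheck result is available.
    """
    if internal is Health.READY and healthcheck_ok is False:
        # Transient: reported to the caller, internal state is left as READY.
        return "UNHEALTHY"
    return internal.value


state = Health.READY
status = health_response(state, healthcheck_ok=False)
```

Here `status` is `"UNHEALTHY"` while `state` is still `Health.READY`, so a later passing healthcheck would report `READY` again with no state transition involved.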

## Environment Variables

| Variable | Default | Purpose |
|----------|---------|---------|
| `PORT` | 5000 | HTTP server port |
| `COG_LOG_LEVEL` | INFO | Logging verbosity (ignored if `RUST_LOG` is set) |
| `COG_MAX_CONCURRENCY` | 1 | Number of concurrent prediction slots |
| `COG_SETUP_TIMEOUT` | none | Setup timeout in seconds (0 is ignored) |

There's also a separate `base-image` binary (`cmd/base-image/`) with subcommands for managing Cog base images (`dockerfile`, `build`, `generate-matrix`). This isn't a `cog` subcommand.

## How CLI Commands Interact with Containers

Commands like `predict`, `train`, and `serve` follow the same pattern: build an image, start a container, communicate via HTTP. The CLI never runs model code directly.

skills/updating-architecture-docs/SKILL.md (2 additions, 0 deletions)
@@ -40,6 +40,8 @@ The first gives you the mental model. The second just restates the code. A reade
Reference source locations at the **package/directory level** with a description of what that package owns. Specific file paths and line numbers rot as code moves around. A pointer like "`crates/coglet/src/bridge/` -- IPC protocol and transport" stays accurate through refactors. "`bridge/protocol.rs:69` -- ControlRequest enum" doesn't.

Only document packages that matter for understanding the system's shape. Generic utility packages (`pkg/util/`, `pkg/path/`, etc.) don't need a mention -- their existence is obvious and they don't help a reader build a mental model. If someone needs them, they'll find them.

When a specific file reference is genuinely useful (a key entry point, a non-obvious starting point for understanding a subsystem), include it -- but prefer "the `PredictionService` in `service.rs`" over a line number.