Docs: Add job dispatch and resource tiers documentation #271
erick-GeGe wants to merge 2 commits into main from
Conversation
joaquingx
left a comment
Good coverage of the dispatch system, but I think this reads more like internal code notes than documentation. A few suggestions:
**Too much implementation detail**

- Things like the Redis lock command (`SET spider_jobs_lock 1 NX EX 120`), internal function names (`_get_cluster_resources()`, `_dispatch_single_job()`), and the "Key Files" table are implementation details that will go stale as the code changes. These belong in code comments or docstrings, not in the docs.

Suggestion: Simplify to focus on what users need to know (tiers, statuses, config options) and drop the code-level details. The code should document itself.
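For reference, the `SET spider_jobs_lock 1 NX EX 120` pattern discussed above (set the lock key only if absent, with a 120-second expiry) behaves roughly like the sketch below. This uses an in-memory stand-in rather than a real Redis client, so the `FakeRedis` class is illustrative only; a real implementation would call `redis-py`'s `set(key, value, nx=True, ex=120)`.

```python
import time

class FakeRedis:
    """In-memory stand-in for the two Redis features the lock relies on:
    SET ... NX (set only if the key is absent) and EX (expiry in seconds)."""
    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, nx=False, ex=None):
        now = time.monotonic()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now and nx:
            return None  # lock already held and not yet expired: not acquired
        expires_at = now + ex if ex is not None else float("inf")
        self._data[key] = (value, expires_at)
        return True  # redis-py returns True on success, None on NX failure

r = FakeRedis()
first = r.set("spider_jobs_lock", 1, nx=True, ex=120)   # acquired
second = r.set("spider_jobs_lock", 1, nx=True, ex=120)  # blocked while held
print(first, second)  # True None
```

The expiry acts as a safety valve: if the dispatcher crashes while holding the lock, the key disappears after 120 seconds and the next run can proceed.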
> exits immediately.
>
> 2. **Fetch queued jobs**: Queries jobs with `IN_QUEUE` status, ordered by creation
> date (FIFO), limited to `RUN_JOBS_PER_LOT` (default 100, 1000 in production).
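The fetch step quoted above amounts to an oldest-first filtered slice. The sketch below shows the shape of that query over plain dictionaries rather than a real Django queryset; the `fetch_queued_jobs` helper and the sample data are illustrative, not the project's actual code.

```python
from datetime import datetime, timedelta

RUN_JOBS_PER_LOT = 100  # base.py default; prod.py overrides this to 1000

# Illustrative sample data: odd ids are queued, even ids are already running.
jobs = [
    {"id": i,
     "status": "IN_QUEUE" if i % 2 else "RUNNING",
     "created": datetime(2024, 1, 1) + timedelta(minutes=i)}
    for i in range(10)
]

def fetch_queued_jobs(jobs, limit=RUN_JOBS_PER_LOT):
    """Oldest-first (FIFO) slice of queued jobs, capped at one lot."""
    queued = [j for j in jobs if j["status"] == "IN_QUEUE"]
    queued.sort(key=lambda j: j["created"])
    return queued[:limit]

lot = fetch_queued_jobs(jobs, limit=3)
print([j["id"] for j in lot])  # [1, 3, 5]
```

In Django ORM terms this would be a `filter(status=...)` with `order_by("created")` and a slice, which the database evaluates as a single limited query.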
Is 1000 the recommended value in production?
It's not a recommendation — it's the current value in config/settings/prod.py. The default in base.py is 100, but prod.py overrides it to 1000.
> ## Cluster Resource Checking
>
> The `_get_cluster_resources()` function queries the K8s API to determine available
> capacity on worker nodes. Nodes are selected by label (`role=<SPIDER_NODE_ROLE>`,
What happens if `DEDICATED_SPIDER_NODES` is set to true?
When `DEDICATED_SPIDER_NODES=True`, spider pods are scheduled only on nodes labeled with `role=<SPIDER_NODE_ROLE>`, and `_get_cluster_resources()` checks capacity only on those nodes. When set to `False`, pods have no `nodeSelector` and can land on any node, but the capacity check won't work accurately since it doesn't know which nodes to measure.
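The conditional scheduling described here can be sketched as a pod spec that only gains a `nodeSelector` when dedicated nodes are enabled. The `build_pod_spec` helper and the hard-coded settings below are hypothetical, assembled from the setting names in this thread:

```python
DEDICATED_SPIDER_NODES = True  # setting name taken from the thread
SPIDER_NODE_ROLE = "spider"    # illustrative label value

def build_pod_spec(image):
    """Sketch: attach a nodeSelector only when spiders run on dedicated nodes."""
    spec = {"containers": [{"name": "job", "image": image}]}
    if DEDICATED_SPIDER_NODES:
        # Pods land only on nodes labeled role=<SPIDER_NODE_ROLE>,
        # the same nodes the capacity check measures.
        spec["nodeSelector"] = {"role": SPIDER_NODE_ROLE}
    return spec

print(build_pod_spec("estela/spider:latest")["nodeSelector"])
```

With the flag off, the spec simply omits `nodeSelector`, so the Kubernetes scheduler is free to place the pod anywhere, which is exactly why the capacity check loses accuracy in that mode.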
docs/estela/api/job-dispatch.md (outdated)
> - **`MULTI_NODE_MODE` must be `"True"`**: This is **critical**. When `MULTI_NODE_MODE`
> is enabled, spider pods are scheduled with a `nodeSelector` matching `SPIDER_NODE_ROLE`,
> and `_get_cluster_resources()` queries only those labeled nodes. If `MULTI_NODE_MODE`
> is `"False"`, pods have no `nodeSelector` and the capacity check has no way to
> accurately measure available resources. The sequential dispatch system is designed
> to work with `MULTI_NODE_MODE=True`.
If this is mandatory, why is there an option to deactivate it?
It's not strictly mandatory; it's the recommended setup for production or any infrastructure running many spiders at scale. With `DEDICATED_SPIDER_NODES=True`, you get accurate capacity checking and isolation between spider workloads and system components. However, for smaller setups where everything runs on one or a few nodes, you may want spiders to be scheduled on any available node, so the option exists for that flexibility.
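The capacity check discussed in this thread boils down to summing allocatable resources over the labeled node pool. The sketch below works over plain dictionaries instead of a live Kubernetes API; the `cluster_resources` function, field names, and sample numbers are all illustrative (a real implementation would list nodes via the Kubernetes client with a label selector and parse their `allocatable` quantities):

```python
def cluster_resources(nodes, role):
    """Sum allocatable CPU (millicores) and memory (MiB) over nodes
    carrying the role=<role> label; nodes outside the pool are ignored,
    which is why the check is only accurate with dedicated spider nodes."""
    total = {"cpu_m": 0, "mem_mib": 0}
    for node in nodes:
        if node["labels"].get("role") != role:
            continue  # not part of the spider pool
        total["cpu_m"] += node["allocatable"]["cpu_m"]
        total["mem_mib"] += node["allocatable"]["mem_mib"]
    return total

nodes = [
    {"labels": {"role": "spider"}, "allocatable": {"cpu_m": 4000, "mem_mib": 8192}},
    {"labels": {"role": "system"}, "allocatable": {"cpu_m": 2000, "mem_mib": 4096}},
    {"labels": {"role": "spider"}, "allocatable": {"cpu_m": 4000, "mem_mib": 8192}},
]
print(cluster_resources(nodes, "spider"))  # {'cpu_m': 8000, 'mem_mib': 16384}
```

If spiders can land on any node, the sum over `role=spider` nodes no longer reflects where pods actually run, which is the inaccuracy the reply above describes.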