docs: add DAG and worker system specifications
DAG.md describes the two-tier dependency graph (BOM DAG + feature DAG), node/edge data model, validation states, dirty propagation, forward/backward cone queries, DAG sync payload format, and REST API. WORKERS.md describes the general-purpose async compute job system: YAML job definitions, job lifecycle (pending→claimed→running→completed/failed), runner registration and authentication, claim semantics (SELECT FOR UPDATE SKIP LOCKED), timeout enforcement, SSE events, and REST API.
docs/DAG.md (new file, 246 lines)
# Dependency DAG Specification

**Status:** Draft
**Last Updated:** 2026-02-13

---

## 1. Purpose

The Dependency DAG is a server-side graph that tracks how features, constraints, and assembly relationships depend on each other. It enables three capabilities described in [MULTI_USER_EDITS.md](MULTI_USER_EDITS.md):

1. **Interference detection** -- comparing dependency cones of concurrent edit sessions to classify conflicts as none, soft, or hard before the user encounters them.
2. **Incremental validation** -- marking changed nodes dirty and propagating only through the affected subgraph, using input-hash memoization to stop early when inputs haven't changed.
3. **Structured merge safety** -- walking the DAG to determine whether concurrent edits share upstream dependencies, deciding if auto-merge is safe or manual review is required.

---
## 2. Two-Tier Model

Silo maintains two levels of dependency graph:

### 2.1 BOM DAG (existing)

The assembly-to-part relationship graph already stored in the `relationships` table. Each row represents a parent item containing a child item with a quantity and relationship type (`component`, `alternate`, `reference`). This graph is queried via `GetBOM`, `GetExpandedBOM`, `GetWhereUsed`, and `HasCycle` in `internal/db/relationships.go`.

The BOM DAG is **not modified** by this specification. It continues to serve its existing purpose.

### 2.2 Feature DAG (new)

A finer-grained graph stored in the `dag_nodes` and `dag_edges` tables. Each node represents a feature within a single item's revision -- a sketch, pad, fillet, pocket, constraint, body, or part-level container. Edges represent "depends on" relationships: if Pad003 depends on Sketch001, an edge runs from Sketch001 to Pad003.

The feature DAG is populated by clients (silo-mod) when users save, or by runners after compute jobs. Silo stores and queries it but does not generate it -- the Create client has access to the feature tree and is the authoritative source.

### 2.3 Cross-Item Edges

Assembly constraints often reference geometry on child parts (e.g., "mate Face6 of PartA to Face2 of PartB"). These cross-item dependencies are stored in `dag_cross_edges`, linking a node in one item to a node in another. Each cross-edge optionally references the `relationships` row that establishes the BOM connection.

---
## 3. Data Model

### 3.1 dag_nodes

| Column | Type | Description |
|--------|------|-------------|
| `id` | UUID | Primary key |
| `item_id` | UUID | FK to `items.id` |
| `revision_number` | INTEGER | Revision this DAG snapshot belongs to |
| `node_key` | TEXT | Feature name from Create (e.g., `Sketch001`, `Pad003`, `Body`) |
| `node_type` | TEXT | One of: `sketch`, `pad`, `pocket`, `fillet`, `chamfer`, `constraint`, `body`, `part`, `datum`, `mirror`, `pattern`, `boolean` |
| `properties_hash` | TEXT | SHA-256 of the node's parametric inputs (sketch coordinates, fillet radius, constraint values). Used for memoization -- if the hash hasn't changed, validation can skip this node. |
| `validation_state` | TEXT | One of: `clean`, `dirty`, `validating`, `failed` |
| `validation_msg` | TEXT | Error message when `validation_state = 'failed'` |
| `metadata` | JSONB | Type-specific data (sketch coords, feature params, constraint definitions) |
| `created_at` | TIMESTAMPTZ | Row creation time |
| `updated_at` | TIMESTAMPTZ | Last state change |

**Uniqueness:** `(item_id, revision_number, node_key)` -- one node per feature per revision.
### 3.2 dag_edges

| Column | Type | Description |
|--------|------|-------------|
| `id` | UUID | Primary key |
| `source_node_id` | UUID | FK to `dag_nodes.id` -- the upstream node |
| `target_node_id` | UUID | FK to `dag_nodes.id` -- the downstream node that depends on source |
| `edge_type` | TEXT | `depends_on` (default), `references`, `constrains` |
| `metadata` | JSONB | Optional edge metadata |

**Direction convention:** An edge from A to B means "B depends on A". A is upstream, B is downstream. Forward-cone traversal from A walks edges where A is the source.

**Uniqueness:** `(source_node_id, target_node_id, edge_type)`.

**Constraint:** `source_node_id != target_node_id` (no self-edges).
### 3.3 dag_cross_edges

| Column | Type | Description |
|--------|------|-------------|
| `id` | UUID | Primary key |
| `source_node_id` | UUID | FK to `dag_nodes.id` -- node in item A |
| `target_node_id` | UUID | FK to `dag_nodes.id` -- node in item B |
| `relationship_id` | UUID | FK to `relationships.id` (nullable) -- the BOM entry connecting the two items |
| `edge_type` | TEXT | `assembly_ref` (default) |
| `metadata` | JSONB | Reference details (face ID, edge ID, etc.) |

**Uniqueness:** `(source_node_id, target_node_id)`.

---
## 4. Validation States

Each node has a `validation_state` that tracks whether its computed geometry is current:

| State | Meaning |
|-------|---------|
| `clean` | Node's geometry matches its `properties_hash`. No recompute needed. |
| `dirty` | An upstream change has propagated to this node. Recompute required. |
| `validating` | A compute job is currently revalidating this node. |
| `failed` | Recompute failed. `validation_msg` contains the error. |

### 4.1 State Transitions

```
clean      → dirty       (upstream change detected, or MarkDirty called)
dirty      → validating  (compute job claims this node)
validating → clean       (recompute succeeded, properties_hash updated)
validating → failed      (recompute produced an error)
failed     → dirty       (upstream change detected, retry possible)
dirty      → clean       (properties_hash matches previous -- memoization shortcut)
```
### 4.2 Dirty Propagation

When a node is marked dirty, all downstream nodes in its forward cone are also marked dirty. This is done atomically in a single recursive CTE. Per the transition table in Section 4.1, both `clean` and `failed` nodes return to `dirty` on an upstream change:

```sql
WITH RECURSIVE forward_cone AS (
    SELECT $1::uuid AS node_id
    UNION
    SELECT e.target_node_id
    FROM dag_edges e
    JOIN forward_cone fc ON fc.node_id = e.source_node_id
)
UPDATE dag_nodes SET validation_state = 'dirty', updated_at = now()
WHERE id IN (SELECT node_id FROM forward_cone)
  AND validation_state IN ('clean', 'failed');
```
### 4.3 Memoization

Before marking a node dirty, the system can compare the new `properties_hash` against the stored value. If they match, the change did not affect this node's inputs, and propagation stops. This is the memoization boundary described in MULTI_USER_EDITS.md Section 5.2.
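A minimal SQL sketch of this guard (parameter positions assumed: `$1` item, `$2` revision, `$3` node key, `$4` incoming hash) -- the node is dirtied only when its stored hash actually differs, so a zero-row result means a memoization hit and propagation stops:

```sql
UPDATE dag_nodes
SET properties_hash = $4, validation_state = 'dirty', updated_at = now()
WHERE item_id = $1 AND revision_number = $2 AND node_key = $3
  AND properties_hash IS DISTINCT FROM $4
RETURNING id;  -- no row returned => inputs unchanged, stop here
```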
---

## 5. Graph Queries

### 5.1 Forward Cone

Returns all nodes downstream of a given node -- everything that would be affected if the source node changes. Used for interference detection: if two users' forward cones overlap, there is potential interference.
```sql
WITH RECURSIVE forward_cone AS (
    SELECT target_node_id AS node_id
    FROM dag_edges WHERE source_node_id = $1
    UNION
    SELECT e.target_node_id
    FROM dag_edges e
    JOIN forward_cone fc ON fc.node_id = e.source_node_id
)
SELECT n.* FROM dag_nodes n JOIN forward_cone fc ON n.id = fc.node_id;
```
### 5.2 Backward Cone

Returns all nodes upstream of a given node -- everything the target node depends on.
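The backward cone can be obtained by mirroring the forward-cone query -- the same recursive CTE with the join direction reversed:

```sql
WITH RECURSIVE backward_cone AS (
    SELECT source_node_id AS node_id
    FROM dag_edges WHERE target_node_id = $1
    UNION
    SELECT e.source_node_id
    FROM dag_edges e
    JOIN backward_cone bc ON bc.node_id = e.target_node_id
)
SELECT n.* FROM dag_nodes n JOIN backward_cone bc ON n.id = bc.node_id;
```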
### 5.3 Dirty Subgraph

Returns all nodes for a given item where `validation_state != 'clean'`, along with their edges. This is the input to an incremental validation job.
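One possible shape for this query, shown as a sketch (the edge list is restricted to edges whose endpoints are both in the dirty set):

```sql
-- Dirty nodes for one item/revision
SELECT * FROM dag_nodes
WHERE item_id = $1 AND revision_number = $2
  AND validation_state != 'clean';

-- Edges between those dirty nodes
SELECT e.* FROM dag_edges e
JOIN dag_nodes s ON s.id = e.source_node_id
JOIN dag_nodes t ON t.id = e.target_node_id
WHERE s.item_id = $1 AND s.revision_number = $2
  AND s.validation_state != 'clean'
  AND t.validation_state != 'clean';
```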
### 5.4 Cycle Detection

Before adding an edge, check that it would not create a cycle. Uses the same recursive ancestor-walk pattern as `HasCycle` in `internal/db/relationships.go`.
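Under the direction convention of Section 3.2, adding an edge from A to B creates a cycle exactly when A is already reachable downstream of B. A sketch of that check:

```sql
-- $1 = proposed source (A), $2 = proposed target (B)
WITH RECURSIVE forward_cone AS (
    SELECT target_node_id AS node_id
    FROM dag_edges WHERE source_node_id = $2
    UNION
    SELECT e.target_node_id
    FROM dag_edges e
    JOIN forward_cone fc ON fc.node_id = e.source_node_id
)
SELECT EXISTS (SELECT 1 FROM forward_cone WHERE node_id = $1) AS would_cycle;
```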
---

## 6. DAG Sync

Clients push the full feature DAG to Silo via `PUT /api/items/{partNumber}/dag`. The sync payload is a JSON document:
```json
{
  "revision": 3,
  "nodes": [
    {
      "key": "Sketch001",
      "type": "sketch",
      "properties_hash": "a1b2c3...",
      "metadata": {
        "coordinates": [[0, 0], [10, 0], [10, 5]],
        "constraints": ["horizontal", "vertical"]
      }
    },
    {
      "key": "Pad003",
      "type": "pad",
      "properties_hash": "d4e5f6...",
      "metadata": {
        "length": 15.0,
        "direction": [0, 0, 1]
      }
    }
  ],
  "edges": [
    {
      "source": "Sketch001",
      "target": "Pad003",
      "type": "depends_on"
    }
  ]
}
```
The server processes this within a single transaction:

1. Upsert all nodes (matched by `item_id + revision_number + node_key`).
2. Replace all edges for this item/revision.
3. Compare new `properties_hash` values against stored values to detect changes.
4. Mark changed nodes and their forward cones dirty.
5. Publish `dag.updated` SSE event.
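Step 1 can be sketched as a standard upsert against the uniqueness key from Section 3.1 (column order assumed):

```sql
INSERT INTO dag_nodes (item_id, revision_number, node_key, node_type,
                       properties_hash, metadata)
VALUES ($1, $2, $3, $4, $5, $6)
ON CONFLICT (item_id, revision_number, node_key) DO UPDATE
SET node_type       = EXCLUDED.node_type,
    properties_hash = EXCLUDED.properties_hash,
    metadata        = EXCLUDED.metadata,
    updated_at      = now();
```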
---

## 7. Interference Detection

When a user registers an edit context (MULTI_USER_EDITS.md Section 3.1), the server:
1. Looks up the node(s) being edited by `node_key` within the item's current revision.
2. Computes the forward cone for those nodes.
3. Compares the cone against all active edit sessions' cones.
4. Classifies interference:
   - **No overlap** → no interference, fully concurrent.
   - **Overlap, different objects** → soft interference, visual indicator via SSE.
   - **Same object, same edit type** → hard interference, edit blocked.
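The overlap check in step 3 can be sketched as the intersection of two forward-cone CTEs (`$1`, `$2` = the node ids the two sessions are editing):

```sql
WITH RECURSIVE cone_a AS (
    SELECT $1::uuid AS node_id
    UNION
    SELECT e.target_node_id FROM dag_edges e
    JOIN cone_a a ON a.node_id = e.source_node_id
), cone_b AS (
    SELECT $2::uuid AS node_id
    UNION
    SELECT e.target_node_id FROM dag_edges e
    JOIN cone_b b ON b.node_id = e.source_node_id
)
SELECT node_id FROM cone_a
INTERSECT
SELECT node_id FROM cone_b;
```

An empty result means no interference.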
---

## 8. REST API

All endpoints are under `/api/items/{partNumber}` and require authentication.
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| `GET` | `/dag` | viewer | Get full feature DAG for current revision |
| `GET` | `/dag/forward-cone/{nodeKey}` | viewer | Get forward dependency cone |
| `GET` | `/dag/dirty` | viewer | Get dirty subgraph |
| `PUT` | `/dag` | editor | Sync full feature tree (from client or runner) |
| `POST` | `/dag/mark-dirty/{nodeKey}` | editor | Manually mark a node and its cone dirty |
---

## 9. References

- [MULTI_USER_EDITS.md](MULTI_USER_EDITS.md) -- Full multi-user editing specification
- [WORKERS.md](WORKERS.md) -- Worker/runner system that executes validation jobs
- [ROADMAP.md](ROADMAP.md) -- Tier 0 Dependency DAG entry

docs/WORKERS.md (new file, 364 lines)
# Worker System Specification

**Status:** Draft
**Last Updated:** 2026-02-13

---

## 1. Purpose

The worker system provides async compute job execution for Silo. Jobs are defined as YAML files, managed by the Silo server, and executed by external runner processes. The system is general-purpose -- while DAG validation is the first use case, it supports any compute workload: geometry export, thumbnail rendering, FEA/CFD batch jobs, report generation, and data migration.

---

## 2. Architecture
```
YAML Job Definitions (files on disk, version-controllable)
                          |
                          v
Silo Server (parser, scheduler, state machine, REST API, SSE events)
                          |
                          v
Runners (silorunner binary, polls via REST, executes Headless Create)
```
**Three layers:**

1. **Job definitions** -- YAML files in a configurable directory (default `/etc/silo/jobdefs`). Each file defines a job type: what triggers it, what it operates on, what computation to perform, and what runner capabilities are required. These are the source of truth and can be version-controlled alongside other Silo config.
2. **Silo server** -- Parses YAML definitions on startup and upserts them into the `job_definitions` table. Creates job instances when triggers fire (revision created, BOM changed, manual). Manages job lifecycle, enforces timeouts, and broadcasts status via SSE.
3. **Runners** -- Separate `silorunner` processes that authenticate with Silo via API tokens, poll for available jobs, claim them atomically, execute the compute, and report results. A runner host must have Headless Create and silo-mod installed for geometry jobs.
---

## 3. Job Definition Format

Job definitions are YAML files with the following structure:
```yaml
job:
  name: assembly-validate
  version: 1
  description: "Validate assembly by rebuilding its dependency subgraph"

trigger:
  type: revision_created      # revision_created, bom_changed, manual, schedule
  filter:
    item_type: assembly       # only trigger for assemblies

scope:
  type: assembly              # item, assembly, project

compute:
  type: validate              # validate, rebuild, diff, export, custom
  command: create-validate    # runner-side command identifier
  args:                       # passed to runner as JSON
    rebuild_mode: incremental
    check_interference: true

runner:
  tags: [create]              # required runner capabilities

timeout: 900                  # seconds before job is marked failed (default 600)
max_retries: 2                # retry count on failure (default 1)
priority: 50                  # lower = higher priority (default 100)
```
### 3.1 Trigger Types

| Type | Description |
|------|-------------|
| `revision_created` | Fires when a new revision is created on an item matching the filter |
| `bom_changed` | Fires when a BOM merge completes |
| `manual` | Only triggered via `POST /api/jobs` |
| `schedule` | Future: cron-like scheduling (not yet implemented) |

### 3.2 Trigger Filters

The `filter` map supports key-value matching against item properties:

| Key | Description |
|-----|-------------|
| `item_type` | Match item type: `part`, `assembly`, `drawing`, etc. |
| `schema` | Match schema name |

All filter keys must match for the trigger to fire. An empty filter matches all items.
### 3.3 Scope Types

| Type | Description |
|------|-------------|
| `item` | Job operates on a single item |
| `assembly` | Job operates on an assembly and its BOM tree |
| `project` | Job operates on all items in a project |
### 3.4 Compute Commands

The `command` field identifies what the runner should execute. Built-in commands:

| Command | Description |
|---------|-------------|
| `create-validate` | Open file in Headless Create, rebuild features, report validation results |
| `create-export` | Open file, export to specified format (STEP, IGES, 3MF) |
| `create-dag-extract` | Open file, extract feature DAG, output as JSON |
| `create-thumbnail` | Open file, render thumbnail image |

Custom commands can be added by extending silo-mod's `silo.runner` module.

---
## 4. Job Lifecycle

```
pending → claimed → running → completed
                            → failed
                            → cancelled
```
| State | Description |
|-------|-------------|
| `pending` | Job created, waiting for a runner to claim it |
| `claimed` | Runner has claimed the job. `expires_at` is set. |
| `running` | Runner has started execution (reported via progress update) |
| `completed` | Runner reported success. `result` JSONB contains output. |
| `failed` | Runner reported failure, timeout expired, or max retries exceeded |
| `cancelled` | Admin cancelled the job before completion |
### 4.1 Claim Semantics

Runners claim jobs via `POST /api/runner/claim`. The server uses PostgreSQL's `SELECT ... FOR UPDATE SKIP LOCKED` to guarantee that each pending job is claimed by at most one runner, without concurrent claimers blocking each other:
```sql
WITH claimable AS (
    SELECT id FROM jobs
    WHERE status = 'pending'
      AND runner_tags <@ $2::text[]
    ORDER BY priority ASC, created_at ASC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE jobs SET
    status = 'claimed',
    runner_id = $1,
    claimed_at = now(),
    expires_at = now() + (timeout_seconds || ' seconds')::interval
FROM claimable
WHERE jobs.id = claimable.id
RETURNING jobs.*;
```
The `runner_tags <@ $2::text[]` condition ensures the runner has all tags required by the job. A runner with tags `["create", "linux", "gpu"]` can claim a job requiring `["create"]`, but not one requiring `["create", "windows"]`.
### 4.2 Timeout Enforcement

A background sweeper runs every 30 seconds (configurable via `jobs.job_timeout_check`) and marks expired jobs as failed:

```sql
UPDATE jobs SET status = 'failed', error_message = 'job timed out'
WHERE status IN ('claimed', 'running')
  AND expires_at < now();
```
### 4.3 Retry

When a job fails and `retry_count < max_retries`, a new job is created with the same definition and scope, with `retry_count` incremented.
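A sketch of that re-enqueue, assuming the `jobs` row carries its definition, scope item, and retry counters (column names here are illustrative, not final schema):

```sql
INSERT INTO jobs (definition_id, item_id, status, priority, retry_count)
SELECT definition_id, item_id, 'pending', priority, retry_count + 1
FROM jobs
WHERE id = $1 AND status = 'failed' AND retry_count < max_retries;
```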
---

## 5. Runners

### 5.1 Registration

Runners are registered via `POST /api/runners` (admin only). The server generates a token (shown once) and stores the SHA-256 hash in the `runners` table. This follows the same pattern as API tokens in `internal/auth/token.go`.
### 5.2 Authentication

Runners authenticate via `Authorization: Bearer silo_runner_<token>`. A dedicated `RequireRunnerAuth` middleware validates the token against the `runners` table and injects a `RunnerIdentity` into the request context.
### 5.3 Heartbeat

Runners send `POST /api/runner/heartbeat` every 30 seconds. The server updates `last_heartbeat` and sets `status = 'online'`. A background sweeper marks runners as `offline` if their heartbeat is older than `runner_timeout` seconds (default 90).
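The offline sweep can be sketched the same way as the job-timeout sweep in Section 4.2 (column names assumed):

```sql
UPDATE runners SET status = 'offline'
WHERE status = 'online'
  AND last_heartbeat < now() - make_interval(secs => $1);  -- $1 = runner_timeout
```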
### 5.4 Tags

Each runner declares capability tags (e.g., `["create", "linux", "gpu"]`). Jobs require specific tags via the `runner.tags` field in their YAML definition. A runner can only claim jobs whose required tags are a subset of the runner's tags.
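The subset test is PostgreSQL's array-containment operator, the same `<@` used by the claim query in Section 4.1:

```sql
SELECT ARRAY['create']           <@ ARRAY['create','linux','gpu'];  -- true: claimable
SELECT ARRAY['create','windows'] <@ ARRAY['create','linux','gpu'];  -- false: missing tag
```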
### 5.5 Runner Config

The `silorunner` binary reads its config from a YAML file:
```yaml
server_url: "https://silo.example.com"
token: "silo_runner_abc123..."
name: "worker-01"
tags: ["create", "linux"]
poll_interval: 5                  # seconds between claim attempts
create_path: "/usr/bin/create"    # path to Headless Create binary (with silo-mod installed)
```

Or via environment variables: `SILO_RUNNER_SERVER_URL`, `SILO_RUNNER_TOKEN`, etc.
### 5.6 Deployment

Runner prerequisites:

- `silorunner` binary (built from `cmd/silorunner/`)
- Headless Create (Kindred's fork of FreeCAD) with the silo-mod workbench installed
- Network access to the Silo server API

Runners can be deployed as:

- Bare-metal processes alongside Create installations
- Docker containers with Create pre-installed

To scale horizontally, register multiple runners under different names.
---

## 6. Job Log

Each job has an append-only log stored in the `job_log` table. Runners append entries via `POST /api/runner/jobs/{jobID}/log`:
```json
{
  "level": "info",
  "message": "Rebuilding Pad003...",
  "metadata": {"node_key": "Pad003", "progress_pct": 45}
}
```

Log levels: `debug`, `info`, `warn`, `error`.
---

## 7. SSE Events

All job lifecycle transitions are broadcast via Silo's SSE broker. Clients subscribe to `/api/events` and receive:
| Event Type | Payload | When |
|------------|---------|------|
| `job.created` | `{id, definition_name, item_id, status, priority}` | Job created |
| `job.claimed` | `{id, runner_id, runner_name}` | Runner claims job |
| `job.progress` | `{id, progress, progress_message}` | Runner reports progress (0-100) |
| `job.completed` | `{id, result_summary, duration_seconds}` | Job completed successfully |
| `job.failed` | `{id, error_message}` | Job failed |
| `job.cancelled` | `{id, cancelled_by}` | Admin cancelled job |
| `runner.online` | `{id, name, tags}` | Runner heartbeat (first after offline) |
| `runner.offline` | `{id, name}` | Runner heartbeat timeout |
---

## 8. REST API

### 8.1 Job Endpoints (user-facing, require auth)
| Method | Path | Auth | Description |
|--------|------|------|-------------|
| `GET` | `/api/jobs` | viewer | List jobs (filterable by status, item, definition) |
| `GET` | `/api/jobs/{jobID}` | viewer | Get job details |
| `GET` | `/api/jobs/{jobID}/logs` | viewer | Get job log entries |
| `POST` | `/api/jobs` | editor | Manually trigger a job |
| `POST` | `/api/jobs/{jobID}/cancel` | editor | Cancel a pending/running job |
### 8.2 Job Definition Endpoints

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| `GET` | `/api/job-definitions` | viewer | List loaded definitions |
| `GET` | `/api/job-definitions/{name}` | viewer | Get specific definition |
| `POST` | `/api/job-definitions/reload` | admin | Re-read YAML from disk |
### 8.3 Runner Management Endpoints (admin)

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| `GET` | `/api/runners` | admin | List registered runners |
| `POST` | `/api/runners` | admin | Register runner (returns token) |
| `DELETE` | `/api/runners/{runnerID}` | admin | Delete runner |
### 8.4 Runner-Facing Endpoints (runner token auth)

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| `POST` | `/api/runner/heartbeat` | runner | Send heartbeat |
| `POST` | `/api/runner/claim` | runner | Claim next available job |
| `PUT` | `/api/runner/jobs/{jobID}/progress` | runner | Report progress |
| `POST` | `/api/runner/jobs/{jobID}/complete` | runner | Report completion with result |
| `POST` | `/api/runner/jobs/{jobID}/fail` | runner | Report failure |
| `POST` | `/api/runner/jobs/{jobID}/log` | runner | Append log entry |
| `PUT` | `/api/runner/jobs/{jobID}/dag` | runner | Sync DAG results after compute |
---

## 9. Configuration

Add to `config.yaml`:
```yaml
jobs:
  directory: /etc/silo/jobdefs    # path to YAML job definitions
  runner_timeout: 90              # seconds before marking runner offline
  job_timeout_check: 30           # seconds between timeout sweeps
  default_priority: 100           # default job priority
```
---

## 10. Example Job Definitions

### Assembly Validation
```yaml
job:
  name: assembly-validate
  version: 1
  description: "Validate assembly by rebuilding its dependency subgraph"

trigger:
  type: revision_created
  filter:
    item_type: assembly

scope:
  type: assembly

compute:
  type: validate
  command: create-validate
  args:
    rebuild_mode: incremental
    check_interference: true

runner:
  tags: [create]

timeout: 900
max_retries: 2
priority: 50
```
### STEP Export
```yaml
job:
  name: part-export-step
  version: 1
  description: "Export a part to STEP format"

trigger:
  type: manual

scope:
  type: item

compute:
  type: export
  command: create-export
  args:
    format: step
    output_key_template: "exports/{part_number}_rev{revision}.step"

runner:
  tags: [create]

timeout: 300
max_retries: 1
priority: 100
```
---

## 11. References

- [DAG.md](DAG.md) -- Dependency DAG specification
- [MULTI_USER_EDITS.md](MULTI_USER_EDITS.md) -- Multi-user editing specification
- [ROADMAP.md](ROADMAP.md) -- Tier 0 Job Queue Infrastructure, Tier 1 Headless Create