feat: dependency DAG and YAML-defined compute jobs #92

Merged
forbes merged 13 commits from feat-dag-workers into main 2026-02-14 19:27:19 +00:00

Summary

Implements the server-side dependency DAG and worker/runner system for YAML-defined compute jobs, as specified in MULTI_USER_EDITS.md.

What's Included

Design Documentation

  • docs/DAG.md -- Dependency DAG specification (two-tier model, validation states, graph queries, interference detection)
  • docs/WORKERS.md -- Worker system specification (YAML job definitions, job lifecycle, runner architecture, claim semantics)
  • docs/DAG_CLIENT_INTEGRATION.md -- Client/silo-mod integration contract (DAG sync payload, Python entry points, headless invocation)

Database

  • migrations/014_dag_nodes_edges.sql -- dag_nodes, dag_edges, dag_cross_edges tables
  • migrations/015_jobs_runners.sql -- runners, job_definitions, jobs, job_log tables with job_status enum

Core Libraries

  • internal/jobdef/ -- YAML job definition parser (Load, LoadAll, Validate) with 11 unit tests
  • internal/db/dag.go -- DAG repository: recursive CTE queries (forward/backward cone, dirty propagation), SyncFeatureTree, cycle detection
  • internal/db/jobs.go -- Job + runner repository: SELECT FOR UPDATE SKIP LOCKED atomic claim, lifecycle methods, timeout enforcement
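The claim query itself is not shown in this description. As a rough sketch of how a SELECT FOR UPDATE SKIP LOCKED claim is commonly written in Go against PostgreSQL, something like the following could work. The column names (status, runner_id, claimed_at, priority, created_at), the ordering, and the ClaimNext and ErrNoJob names are assumptions for illustration, not the code in internal/db/jobs.go.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// claimSQL selects one pending job, skipping rows that another runner has
// already locked, and marks it claimed in the same statement.
// Column names and ordering are assumptions, not the schema from migration 015.
const claimSQL = `
UPDATE jobs
SET status = 'claimed', runner_id = $1, claimed_at = now()
WHERE id = (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY priority, created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id`

// ErrNoJob reports that no pending job was available to claim.
var ErrNoJob = errors.New("no pending job")

// ClaimNext claims at most one pending job for runnerID and returns its ID.
// Concurrent callers never block on each other: a locked row is skipped.
func ClaimNext(ctx context.Context, db *sql.DB, runnerID string) (string, error) {
	var jobID string
	err := db.QueryRowContext(ctx, claimSQL, runnerID).Scan(&jobID)
	if errors.Is(err, sql.ErrNoRows) {
		return "", ErrNoJob
	}
	if err != nil {
		return "", fmt.Errorf("claim job: %w", err)
	}
	return jobID, nil
}

func main() {
	// Running ClaimNext needs a registered PostgreSQL driver and a live
	// database, so this sketch only prints the statement.
	fmt.Println(claimSQL)
}
```

Doing the select and the update in one statement means a runner either gets a job that nobody else holds or gets ErrNoJob, without waiting on another runner's transaction.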

Server Wiring

  • JobsConfig in config with defaults (runner timeout, sweep interval, priority)
  • Server struct extended with DAG/job repos, job definitions
  • Background sweepers for job timeout + stale runner expiry
  • Job definitions loaded from YAML on startup, upserted into DB

Authentication

  • internal/auth/runner.go -- RunnerIdentity context helpers
  • RequireRunnerAuth middleware -- validates silo_runner_ tokens via SHA-256 hash lookup
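To illustrate the hash-lookup idea, here is a minimal sketch that generates a silo_runner_ token and resolves it through its SHA-256 digest, so only the digest needs to be stored. The token length, the hex encoding, and the NewRunnerToken, HashToken, and Lookup names are assumptions; a map stands in for the runners table.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

const tokenPrefix = "silo_runner_"

// HashToken returns the hex-encoded SHA-256 digest of a token.
func HashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// NewRunnerToken returns a random prefixed token and the digest to store.
// The plaintext token is shown to the admin once and never persisted.
func NewRunnerToken() (token, hash string, err error) {
	buf := make([]byte, 32)
	if _, err = rand.Read(buf); err != nil {
		return "", "", err
	}
	token = tokenPrefix + hex.EncodeToString(buf)
	return token, HashToken(token), nil
}

// Lookup resolves a presented token to a runner name by hashing it and
// looking the digest up in store (digest -> runner name).
func Lookup(store map[string]string, token string) (string, bool) {
	if !strings.HasPrefix(token, tokenPrefix) {
		return "", false
	}
	name, ok := store[HashToken(token)]
	return name, ok
}

func main() {
	token, hash, err := NewRunnerToken()
	if err != nil {
		panic(err)
	}
	store := map[string]string{hash: "runner-01"} // hypothetical runner name
	name, ok := Lookup(store, token)
	fmt.Println(name, ok) // runner-01 true
}
```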

API Endpoints (~25 new routes)

DAG (nested under /api/items/{partNumber}):

  • GET /dag, GET /dag/forward-cone/{nodeKey}, GET /dag/dirty
  • PUT /dag (sync), POST /dag/mark-dirty/{nodeKey}
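The PR computes cones with recursive CTEs in the database. The same traversal can be sketched in memory to show what a forward-cone query and dirty propagation return. The Graph type, the "dirty" state string, the node keys, and the choice to mark the start node itself are assumptions made for this example.

```go
package main

import "fmt"

// Graph maps a node key to the keys of the nodes that depend on it.
type Graph map[string][]string

// ForwardCone returns every node reachable downstream of start, in
// breadth-first order, excluding start itself.
func ForwardCone(g Graph, start string) []string {
	seen := map[string]bool{start: true}
	queue := []string{start}
	var cone []string
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		for _, next := range g[n] {
			if !seen[next] {
				seen[next] = true
				cone = append(cone, next)
				queue = append(queue, next)
			}
		}
	}
	return cone
}

// MarkDirty marks start and everything in its forward cone as dirty.
func MarkDirty(g Graph, state map[string]string, start string) {
	state[start] = "dirty"
	for _, n := range ForwardCone(g, start) {
		state[n] = "dirty"
	}
}

func main() {
	// Hypothetical feature tree: each key lists its dependents.
	g := Graph{
		"Sketch001": {"Pad001"},
		"Pad001":    {"Pocket001", "Fillet001"},
		"Pocket001": {"Fillet001"},
	}
	fmt.Println(ForwardCone(g, "Pad001")) // [Pocket001 Fillet001]
}
```

The backward cone is the same walk over reversed edges.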

Jobs:

  • GET/POST /api/jobs, GET /api/jobs/{id}, GET /api/jobs/{id}/logs, POST /api/jobs/{id}/cancel

Job Definitions:

  • GET /api/job-definitions, GET /api/job-definitions/{name}, POST /api/job-definitions/reload

Runners (admin):

  • GET/POST /api/runners, DELETE /api/runners/{id}

Runner-facing (runner token auth):

  • POST /api/runner/heartbeat, POST /api/runner/claim
  • POST /api/runner/jobs/{id}/start, PUT /api/runner/jobs/{id}/progress
  • POST /api/runner/jobs/{id}/complete, POST /api/runner/jobs/{id}/fail
  • POST /api/runner/jobs/{id}/log, PUT /api/runner/jobs/{id}/dag
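The claim, start, complete, and fail endpoints line up with the lifecycle that WORKERS.md describes (pending→claimed→running→completed/failed). A small transition table makes the allowed moves explicit. This is a sketch only: the Status type and CanTransition name are assumptions, cancellation is left out, and it assumes a job can fail only once it is running.

```go
package main

import "fmt"

// Status is a job lifecycle state.
type Status string

const (
	Pending   Status = "pending"
	Claimed   Status = "claimed"
	Running   Status = "running"
	Completed Status = "completed"
	Failed    Status = "failed"
)

// transitions lists the states each state may move to.
// Completed and Failed are terminal and therefore have no entry.
var transitions = map[Status][]Status{
	Pending: {Claimed},           // POST /api/runner/claim
	Claimed: {Running},           // POST /api/runner/jobs/{id}/start
	Running: {Completed, Failed}, // .../complete or .../fail
}

// CanTransition reports whether a job may move from one state to another.
func CanTransition(from, to Status) bool {
	for _, s := range transitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(Pending, Claimed), CanTransition(Pending, Running)) // true false
}
```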

Auto-Triggers

  • HandleCreateRevision triggers revision_created jobs
  • HandleMergeBOM triggers bom_changed jobs

Runner Binary

  • cmd/silorunner/main.go -- separate binary with config, heartbeat, poll loop, job execution scaffold
  • Placeholder execution for create-validate, create-export, create-dag-extract, create-thumbnail

Tests

  • 11 jobdef unit tests (YAML parsing, validation, LoadAll)
  • DAG handler integration tests (sync, forward cone, dirty propagation, not-found)
  • Job handler integration tests (CRUD, cancel, filter, runner registration)
  • Runner token generation unit test

Verification

go build ./cmd/silod && go build ./cmd/silorunner && go build ./cmd/silo
go test ./internal/jobdef/... -v
go test ./internal/api/... -v  # integration tests need TEST_DATABASE_URL
forbes added 13 commits 2026-02-14 19:26:23 +00:00
DAG.md describes the two-tier dependency graph (BOM DAG + feature DAG),
node/edge data model, validation states, dirty propagation, forward/backward
cone queries, DAG sync payload format, and REST API.

WORKERS.md describes the general-purpose async compute job system: YAML job
definitions, job lifecycle (pending→claimed→running→completed/failed),
runner registration and authentication, claim semantics (SELECT FOR UPDATE
SKIP LOCKED), timeout enforcement, SSE events, and REST API.

Migration 014: dag_nodes, dag_edges, dag_cross_edges tables for the
feature-level dependency graph with validation state tracking.

Migration 015: runners, job_definitions, jobs, job_log tables for the
async compute job system with PostgreSQL-backed work queue.

Update TruncateAll in testutil to include new tables.

New package internal/jobdef mirrors the schema package pattern:
- Load/LoadAll/Validate for YAML job definitions
- Supports trigger types: revision_created, bom_changed, manual, schedule
- Supports scope types: item, assembly, project
- Supports compute types: validate, rebuild, diff, export, custom
- Defaults: timeout=600s, max_retries=1, priority=100
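The defaults and allowed values above could be applied roughly as follows. This is a sketch of the pattern, not the internal/jobdef code: the Definition struct, its field names, and the ApplyDefaults method are assumptions, and YAML decoding is omitted. MaxRetries is a pointer so that an explicit 0 can be told apart from an unset value.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Definition is a hypothetical in-memory form of a YAML job definition.
type Definition struct {
	Name       string
	Trigger    string
	Scope      string
	Compute    string
	Timeout    time.Duration
	MaxRetries *int
	Priority   int
}

var (
	triggers = map[string]bool{"revision_created": true, "bom_changed": true, "manual": true, "schedule": true}
	scopes   = map[string]bool{"item": true, "assembly": true, "project": true}
	computes = map[string]bool{"validate": true, "rebuild": true, "diff": true, "export": true, "custom": true}
)

// ApplyDefaults fills unset fields: timeout=600s, max_retries=1, priority=100.
func (d *Definition) ApplyDefaults() {
	if d.Timeout == 0 {
		d.Timeout = 600 * time.Second
	}
	if d.MaxRetries == nil {
		one := 1
		d.MaxRetries = &one
	}
	if d.Priority == 0 {
		d.Priority = 100
	}
}

// Validate rejects definitions with a missing name or an unknown type.
func (d *Definition) Validate() error {
	switch {
	case d.Name == "":
		return errors.New("job definition: name is required")
	case !triggers[d.Trigger]:
		return fmt.Errorf("job definition %q: unknown trigger type %q", d.Name, d.Trigger)
	case !scopes[d.Scope]:
		return fmt.Errorf("job definition %q: unknown scope type %q", d.Name, d.Scope)
	case !computes[d.Compute]:
		return fmt.Errorf("job definition %q: unknown compute type %q", d.Name, d.Compute)
	}
	return nil
}

func main() {
	// The scope and compute values chosen here are illustrative guesses.
	d := Definition{Name: "assembly-validate", Trigger: "revision_created", Scope: "assembly", Compute: "validate"}
	d.ApplyDefaults()
	if err := d.Validate(); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println(d.Timeout, *d.MaxRetries, d.Priority) // 10m0s 1 100
}
```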

Example definitions in jobdefs/:
- assembly-validate.yaml: incremental validation on revision_created
- part-export-step.yaml: STEP export on manual trigger

11 unit tests, all passing.
forbes merged commit defb3af56f into main 2026-02-14 19:27:19 +00:00

Reference: kindred/silo#92