# Solver Service Specification **Status:** Phase 3b Implemented (server endpoints, job definitions, result cache) **Last Updated:** 2026-03-01 **Depends on:** KCSolve Phase 1 (PR #297), Phase 2 (PR #298) **Prerequisite infrastructure:** Job queue, runner system, and SSE broadcasting are fully implemented (see [WORKERS.md](WORKERS.md), migration `015_jobs_runners.sql`, `cmd/silorunner/`). --- ## 1. Overview The solver service extends Silo's job queue system with assembly constraint solving capabilities. It enables server-side solving of assemblies stored in Silo, with results streamed back to clients in real time via SSE. This specification describes how the existing KCSolve client-side API (C++ library + pybind11 `kcsolve` module) integrates with Silo's worker infrastructure to provide headless, asynchronous constraint solving. ### 1.1 Goals 1. **Offload solving** -- Move heavy solve operations off the user's machine to server workers. 2. **Batch validation** -- Automatically validate assemblies on commit (e.g. check for over-constrained systems). 3. **Solver selection** -- Allow the server to run different solvers than the client (e.g. a more thorough solver for validation, a fast one for interactive editing). 4. **Standalone execution** -- Solver workers can run without a full FreeCAD installation, using just the `kcsolve` Python module and the `.kc` file. ### 1.2 Non-Goals - **Interactive drag** -- Real-time drag solving stays client-side (latency-sensitive). - **Geometry processing** -- Workers don't compute geometry; they receive pre-extracted constraint graphs. - **Solver development** -- Writing new solver backends is out of scope; this spec covers the transport and execution layer. --- ## 2. Architecture ``` ┌─────────────────────┐ │ Kindred Create │ │ (FreeCAD client) │ └───────┬──────────────┘ │ 1. POST /api/solver/jobs │ (SolveContext JSON) │ │ 4. 
GET /api/events (SSE) │ job.progress, job.completed ▼ ┌─────────────────────┐ │ Silo Server │ │ (silod) │ │ │ │ solver module │ │ REST + SSE + queue │ └───────┬──────────────┘ │ 2. POST /api/runner/claim │ 3. POST /api/runner/jobs/{id}/complete ▼ ┌─────────────────────┐ │ Solver Runner │ │ (silorunner) │ │ │ │ kcsolve module │ │ OndselAdapter │ │ Python solvers │ └─────────────────────┘ ``` ### 2.1 Components | Component | Role | Deployment | |-----------|------|------------| | **Silo server** | Job queue management, REST API, SSE broadcast, result storage | Existing `silod` binary (jobs module, migration 015) | | **Solver runner** | Claims solver jobs, executes `kcsolve`, reports results | Existing `silorunner` binary (`cmd/silorunner/`) with `solver` tag | | **kcsolve module** | Python/C++ solver library (Phase 1+2) | Installed on runner nodes | | **Create client** | Submits jobs, receives results via SSE | Existing FreeCAD client | ### 2.2 Module Registration The solver service is a Silo module with ID `solver`, gated behind the existing module system: ```yaml # config.yaml modules: solver: enabled: true ``` It depends on the `jobs` module being enabled. All solver endpoints return `404` with `{"error": "module not enabled"}` when disabled. --- ## 3. Data Model ### 3.1 SolveContext JSON Schema The `SolveContext` is the input to a solve operation. Currently it exists only as a C++ struct and pybind11 binding with no serialization. Phase 3 adds JSON serialization to enable server transport. 
```json { "api_version": 1, "parts": [ { "id": "Part001", "placement": { "position": [0.0, 0.0, 0.0], "quaternion": [1.0, 0.0, 0.0, 0.0] }, "mass": 1.0, "grounded": true }, { "id": "Part002", "placement": { "position": [100.0, 0.0, 0.0], "quaternion": [1.0, 0.0, 0.0, 0.0] }, "mass": 1.0, "grounded": false } ], "constraints": [ { "id": "Joint001", "part_i": "Part001", "marker_i": { "position": [50.0, 0.0, 0.0], "quaternion": [1.0, 0.0, 0.0, 0.0] }, "part_j": "Part002", "marker_j": { "position": [0.0, 0.0, 0.0], "quaternion": [1.0, 0.0, 0.0, 0.0] }, "type": "Revolute", "params": [], "limits": [], "activated": true } ], "motions": [], "simulation": null, "bundle_fixed": false } ``` **Field reference:** See [KCSolve Python API](../reference/kcsolve-python.md) for full field documentation. The JSON schema maps 1:1 to the Python/C++ types. **Enum serialization:** Enums serialize as strings matching their Python names (e.g. `"Revolute"`, `"Success"`, `"Redundant"`). **Transform shorthand:** The `placement` and `marker_*` fields use the `Transform` struct: `position` is `[x, y, z]`, `quaternion` is `[w, x, y, z]`. **Constraint.Limit:** ```json { "kind": "RotationMin", "value": -1.5708, "tolerance": 1e-9 } ``` **MotionDef:** ```json { "kind": "Rotational", "joint_id": "Joint001", "marker_i": "", "marker_j": "", "rotation_expr": "2*pi*t", "translation_expr": "" } ``` **SimulationParams:** ```json { "t_start": 0.0, "t_end": 2.0, "h_out": 0.04, "h_min": 1e-9, "h_max": 1.0, "error_tol": 1e-6 } ``` ### 3.2 SolveResult JSON Schema ```json { "status": "Success", "placements": [ { "id": "Part002", "placement": { "position": [50.0, 0.0, 0.0], "quaternion": [0.707, 0.0, 0.707, 0.0] } } ], "dof": 1, "diagnostics": [ { "constraint_id": "Joint003", "kind": "Redundant", "detail": "6 DOF removed by Joint003 are already constrained" } ], "num_frames": 0 } ``` ### 3.3 Solver Job Record Solver jobs are stored in the existing `jobs` table. 
The solver-specific data is in the `args` and `result` JSONB columns. **Job args (input):** ```json { "solver": "ondsel", "operation": "solve", "context": { /* SolveContext JSON */ }, "item_part_number": "ASM-001", "revision_number": 3 } ``` **Operation types:** | Operation | Description | Requires simulation? | |-----------|-------------|---------------------| | `solve` | Static equilibrium solve | No | | `diagnose` | Constraint analysis only (no placement update) | No | | `kinematic` | Time-domain kinematic simulation | Yes | **Job result (output):** ```json { "result": { /* SolveResult JSON */ }, "solver_name": "OndselSolver (Lagrangian)", "solver_version": "1.0", "solve_time_ms": 127.4 } ``` --- ## 4. REST API All endpoints are prefixed with `/api/solver/` and gated behind `RequireModule("solver")`. ### 4.1 Submit Solve Job ``` POST /api/solver/jobs Authorization: Bearer silo_... Content-Type: application/json { "solver": "ondsel", "operation": "solve", "context": { /* SolveContext */ }, "priority": 50 } ``` **Optional fields:** | Field | Type | Default | Description | |-------|------|---------|-------------| | `solver` | string | `""` (default solver) | Solver name from registry | | `operation` | string | `"solve"` | `solve`, `diagnose`, or `kinematic` | | `context` | object | required | SolveContext JSON | | `priority` | int | `50` | Lower = higher priority | | `item_part_number` | string | `null` | Silo item reference (for result association) | | `revision_number` | int | `null` | Revision that generated this context | | `callback_url` | string | `null` | Webhook URL for completion notification | **Response `201 Created`:** ```json { "job_id": "550e8400-e29b-41d4-a716-446655440000", "status": "pending", "created_at": "2026-02-19T18:30:00Z" } ``` **Error responses:** | Code | Condition | |------|-----------| | `400` | Invalid SolveContext (missing required fields, unknown enum values) | | `401` | Not authenticated | | `404` | Module not enabled | | `422` | 
Unknown solver name, invalid operation | ### 4.2 Get Job Status ``` GET /api/solver/jobs/{jobID} ``` **Response `200 OK`:** ```json { "job_id": "550e8400-...", "status": "completed", "operation": "solve", "solver": "ondsel", "priority": 50, "item_part_number": "ASM-001", "revision_number": 3, "runner_id": "runner-01", "runner_name": "solver-worker-01", "created_at": "2026-02-19T18:30:00Z", "claimed_at": "2026-02-19T18:30:01Z", "completed_at": "2026-02-19T18:30:02Z", "result": { "result": { /* SolveResult */ }, "solver_name": "OndselSolver (Lagrangian)", "solve_time_ms": 127.4 } } ``` ### 4.3 List Solver Jobs ``` GET /api/solver/jobs?status=completed&item=ASM-001&limit=20&offset=0 ``` **Query parameters:** | Param | Type | Description | |-------|------|-------------| | `status` | string | Filter by status: `pending`, `claimed`, `running`, `completed`, `failed` | | `item` | string | Filter by item part number | | `operation` | string | Filter by operation type | | `solver` | string | Filter by solver name | | `limit` | int | Page size (default 20, max 100) | | `offset` | int | Pagination offset | **Response `200 OK`:** ```json { "jobs": [ /* array of job objects */ ], "total": 42, "limit": 20, "offset": 0 } ``` ### 4.4 Cancel Job ``` POST /api/solver/jobs/{jobID}/cancel ``` Only `pending` and `claimed` jobs can be cancelled. Running jobs must complete or time out. **Response `200 OK`:** ```json { "job_id": "550e8400-...", "status": "cancelled" } ``` ### 4.5 Get Solver Registry ``` GET /api/solver/solvers ``` Returns available solvers on registered runners. Runners report their solver capabilities during heartbeat. 
**Response `200 OK`:** ```json { "solvers": [ { "name": "ondsel", "display_name": "OndselSolver (Lagrangian)", "deterministic": true, "supported_joints": [ "Coincident", "Fixed", "Revolute", "Cylindrical", "Slider", "Ball", "Screw", "Gear", "RackPinion", "Parallel", "Perpendicular", "Angle", "Planar", "Concentric", "PointOnLine", "PointInPlane", "LineInPlane", "Tangent", "DistancePointPoint", "DistanceCylSph", "Universal" ], "runner_count": 2 } ], "default_solver": "ondsel" } ``` --- ## 5. Server-Sent Events Solver jobs emit events on the existing `/api/events` SSE stream. ### 5.1 Event Types Solver jobs use the existing `job.*` SSE event prefix (see [WORKERS.md](WORKERS.md)). Clients filter on `definition_name` to identify solver-specific events. | Event | Payload | When | |-------|---------|------| | `job.created` | `{job_id, definition_name, trigger, item_id}` | Job submitted | | `job.claimed` | `{job_id, runner_id, runner}` | Runner claims work | | `job.progress` | `{job_id, progress, message}` | Progress update (0-100) | | `job.completed` | `{job_id, runner_id}` | Job succeeded | | `job.failed` | `{job_id, runner_id, error}` | Job failed | ### 5.2 Example Stream ``` event: job.created data: {"job_id":"abc-123","definition_name":"assembly-solve","trigger":"manual","item_id":"uuid-..."} event: job.claimed data: {"job_id":"abc-123","runner_id":"r1","runner":"solver-worker-01"} event: job.progress data: {"job_id":"abc-123","progress":50,"message":"Building constraint system..."} event: job.completed data: {"job_id":"abc-123","runner_id":"r1"} ``` ### 5.3 Client Integration The Create client subscribes to the SSE stream and updates the Assembly workbench UI: 1. **Silo viewport widget** shows job status indicator (pending/running/done/failed) 2. On `job.completed` (where `definition_name` starts with `assembly-`), the client fetches the full result via `GET /api/jobs/{id}` and applies placements 3. On `job.failed`, the client shows the error in the report panel 4. 
Diagnostic results (redundant/conflicting constraints) surface in the constraint tree --- ## 6. Runner Integration ### 6.1 Runner Requirements Solver runners are standard `silorunner` instances (see `cmd/silorunner/main.go`) registered with the `solver` tag. The existing runner binary already handles the full job lifecycle (claim, start, progress, complete/fail, log, DAG sync). Solver support requires adding `solver-run`, `solver-diagnose`, and `solver-kinematic` to the runner's command dispatch (currently handles `create-validate`, `create-export`, `create-dag-extract`, `create-thumbnail`). Additional requirements on the runner host: - Python 3.11+ with `kcsolve` module installed - `libKCSolve.so` and solver backend libraries (e.g. `libOndselSolver.so`) - Network access to the Silo server No FreeCAD installation is required. The runner operates on pre-extracted `SolveContext` JSON. ### 6.2 Runner Registration ```bash # Register a solver runner (admin) curl -X POST https://silo.example.com/api/runners \ -H "Authorization: Bearer admin_token" \ -d '{"name":"solver-01","tags":["solver"]}' # Response includes one-time token {"id":"uuid","token":"silo_runner_xyz..."} ``` ### 6.3 Runner Heartbeat and Capabilities The existing heartbeat endpoint (`POST /api/runner/heartbeat`) takes no body — it updates `last_heartbeat` on every authenticated request via the `RequireRunnerAuth` middleware. Runners that go 90 seconds without a request are marked offline by the background sweeper. 
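Since the heartbeat carries no payload, a runner-side loop is just an authenticated empty `POST` on a timer. A minimal sketch (the server URL and token are placeholders in the style of §6.2, and the 30-second interval is an assumption chosen to sit well inside the 90-second offline threshold — this is illustrative, not the `silorunner` implementation):

```python
"""Runner heartbeat loop -- illustrative sketch, not the silorunner implementation."""
import time
import urllib.request

SERVER_URL = "https://silo.example.com"  # placeholder, per runner.yaml
RUNNER_TOKEN = "silo_runner_xyz..."      # placeholder registration token


def build_heartbeat_request() -> urllib.request.Request:
    """Empty-body POST; authentication alone updates last_heartbeat."""
    return urllib.request.Request(
        f"{SERVER_URL}/api/runner/heartbeat",
        data=b"",  # the endpoint takes no body
        method="POST",
        headers={"Authorization": f"Bearer {RUNNER_TOKEN}"},
    )


def heartbeat_loop(interval_s: float = 30.0) -> None:
    # Beat well inside the 90 s offline window so a single dropped
    # request does not get the runner marked offline by the sweeper.
    while True:
        try:
            urllib.request.urlopen(build_heartbeat_request())
        except OSError:
            pass  # transient network error; retry on the next tick
        time.sleep(interval_s)
```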
Solver capabilities are reported via the runner's `metadata` JSONB field, set at registration time:

```bash
curl -X POST https://silo.example.com/api/runners \
  -H "Authorization: Bearer admin_token" \
  -d '{
    "name": "solver-01",
    "tags": ["solver"],
    "metadata": {
      "solvers": ["ondsel"],
      "api_version": 1,
      "python_version": "3.11.11"
    }
  }'
```

> **Future enhancement:** The heartbeat endpoint could be extended to accept an optional body for dynamic capability updates, but currently capabilities are static per registration.

### 6.4 Runner Execution Flow

```python
#!/usr/bin/env python3
"""Solver runner entry point."""

import json
import time

import kcsolve


def execute_solve_job(args: dict) -> dict:
    """Execute a solver job from parsed args."""
    solver_name = args.get("solver", "")
    operation = args.get("operation", "solve")
    ctx_dict = args["context"]

    # Deserialize SolveContext from JSON
    ctx = kcsolve.SolveContext.from_dict(ctx_dict)

    # Load solver
    solver = kcsolve.load(solver_name)
    if solver is None:
        raise ValueError(f"Unknown solver: {solver_name!r}")

    # Execute operation, timing the solve for the job result
    start = time.perf_counter()
    if operation == "solve":
        result = solver.solve(ctx)
    elif operation == "diagnose":
        diags = solver.diagnose(ctx)
        result = kcsolve.SolveResult()
        result.diagnostics = diags
    elif operation == "kinematic":
        result = solver.run_kinematic(ctx)
    else:
        raise ValueError(f"Unknown operation: {operation!r}")
    solve_time_ms = (time.perf_counter() - start) * 1000.0

    # Serialize result (matches the job result schema in section 3.3)
    return {
        "result": result.to_dict(),
        "solver_name": solver.name(),
        "solver_version": "1.0",
        "solve_time_ms": solve_time_ms,
    }
```

### 6.5 Standalone Process Mode

For minimal deployments, the runner can invoke a standalone solver process:

```bash
echo '{"solver":"ondsel","operation":"solve","context":{...}}' | \
  python3 -m kcsolve.runner
```

The `kcsolve.runner` module reads JSON from stdin, executes the solve, and writes the result JSON to stdout. Exit code 0 = success; non-zero = failure, with error JSON on stderr.

---

## 7. Job Definitions

### 7.1 Manual Solve Job

Triggered by the client when the user requests a server-side solve.
> **Note:** The `compute.type` uses `custom` because the valid types in `internal/jobdef/jobdef.go` are: `validate`, `rebuild`, `diff`, `export`, `custom`. Solver commands are dispatched by the runner based on the `command` field. ```yaml job: name: assembly-solve version: 1 description: "Solve assembly constraints on server" trigger: type: manual scope: type: assembly compute: type: custom command: solver-run runner: tags: [solver] timeout: 300 max_retries: 1 priority: 50 ``` ### 7.2 Commit-Time Validation Automatically validates assembly constraints when a new revision is committed: ```yaml job: name: assembly-validate version: 1 description: "Validate assembly constraints on commit" trigger: type: revision_created filter: item_type: assembly scope: type: assembly compute: type: custom command: solver-diagnose args: operation: diagnose runner: tags: [solver] timeout: 120 max_retries: 2 priority: 75 ``` ### 7.3 Kinematic Simulation Server-side kinematic simulation for assemblies with motion definitions: ```yaml job: name: assembly-kinematic version: 1 description: "Run kinematic simulation" trigger: type: manual scope: type: assembly compute: type: custom command: solver-kinematic args: operation: kinematic runner: tags: [solver] timeout: 1800 max_retries: 0 priority: 100 ``` --- ## 8. SolveContext Extraction When a solver job is triggered by a revision commit (rather than a direct context submission), the server or runner must extract a `SolveContext` from the `.kc` file. ### 8.1 Extraction via Headless Create For full-fidelity extraction that handles geometry classification: ```bash create --console -e " import kcsolve_extract kcsolve_extract.extract_and_solve('input.kc', 'output.json', solver='ondsel') " ``` This requires a full Create installation on the runner and uses the Assembly module's existing adapter layer to build `SolveContext` from document objects. 
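The `kcsolve_extract` helper invoked above is illustrative, so the following sketch of such an entry point is too. The FreeCAD document access (`FreeCAD.openDocument`, `doc.Assembly`, `build_solve_context`) and the output payload shape are assumptions, not the shipped API; the two callables are injectable so the flow can be exercised without a Create installation:

```python
"""Sketch of a kcsolve_extract-style entry point (illustrative, not the shipped API)."""
import json


def extract_and_solve(kc_path: str, out_path: str, solver: str = "ondsel",
                      *, open_doc=None, load_solver=None) -> dict:
    """Open a .kc document, build a SolveContext, solve, and write result JSON.

    `open_doc` and `load_solver` default to the FreeCAD and kcsolve APIs but
    are injectable so the flow can run headlessly (e.g. in tests).
    """
    if open_doc is None:
        import FreeCAD  # available only inside `create --console`
        open_doc = FreeCAD.openDocument
    if load_solver is None:
        import kcsolve
        load_solver = kcsolve.load

    doc = open_doc(kc_path)
    ctx = doc.Assembly.build_solve_context()  # assumed adapter accessor
    backend = load_solver(solver)
    if backend is None:
        raise ValueError(f"Unknown solver: {solver!r}")

    payload = {"result": backend.solve(ctx).to_dict(), "solver_name": solver}
    with open(out_path, "w") as f:
        json.dump(payload, f)
    return payload
```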
### 8.2 Extraction from .kc Silo Directory For lightweight extraction without FreeCAD, the constraint graph can be stored in the `.kc` archive's `silo/` directory during commit: ``` silo/solver/context.json # Pre-extracted SolveContext silo/solver/result.json # Last solve result (if any) ``` The client extracts the `SolveContext` locally before committing the `.kc` file. The server reads it from the archive, avoiding the need for geometry processing on the runner. **Commit-time packing** (client side): ```python # In the Assembly workbench commit hook: ctx = assembly_object.build_solve_context() kc_archive.write("silo/solver/context.json", ctx.to_json()) ``` **Runner-side extraction:** ```python import zipfile, json with zipfile.ZipFile("assembly.kc") as zf: ctx_json = json.loads(zf.read("silo/solver/context.json")) ``` --- ## 9. Database Schema ### 9.1 Migration The solver module uses the existing `jobs` table. One new table is added for result caching: ```sql -- Migration: 021_solver_results.sql CREATE TABLE solver_results ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), item_id UUID NOT NULL REFERENCES items(id) ON DELETE CASCADE, revision_number INTEGER NOT NULL, job_id UUID REFERENCES jobs(id) ON DELETE SET NULL, operation TEXT NOT NULL, -- 'solve', 'diagnose', 'kinematic' solver_name TEXT NOT NULL, status TEXT NOT NULL, -- SolveStatus string dof INTEGER, diagnostics JSONB DEFAULT '[]', placements JSONB DEFAULT '[]', num_frames INTEGER DEFAULT 0, solve_time_ms DOUBLE PRECISION, created_at TIMESTAMPTZ NOT NULL DEFAULT now(), UNIQUE(item_id, revision_number, operation) ); CREATE INDEX idx_solver_results_item ON solver_results(item_id); CREATE INDEX idx_solver_results_status ON solver_results(status); ``` The `UNIQUE(item_id, revision_number, operation)` constraint means each revision has at most one result per operation type. Re-running overwrites the previous result. ### 9.2 Result Association When a solver job completes, the server: 1. 
Stores the full result in the `jobs.result` JSONB column (standard job result) 2. Upserts a row in `solver_results` for quick lookup by item/revision 3. Broadcasts `job.completed` SSE event --- ## 10. Configuration ### 10.1 Server Config ```yaml # config.yaml modules: solver: enabled: true default_solver: "ondsel" max_context_size_mb: 10 # Reject oversized SolveContext payloads default_timeout: 300 # Default job timeout (seconds) auto_diagnose_on_commit: true # Auto-submit diagnose job on revision commit ``` ### 10.2 Environment Variables | Variable | Description | |----------|-------------| | `SILO_SOLVER_ENABLED` | Override module enabled state | | `SILO_SOLVER_DEFAULT` | Default solver name | ### 10.3 Runner Config ```yaml # runner.yaml server_url: https://silo.example.com token: silo_runner_xyz... tags: [solver] solver: kcsolve_path: /opt/create/lib # LD_LIBRARY_PATH for kcsolve.so python: /opt/create/bin/python3 max_concurrent: 2 # Parallel job slots per runner ``` --- ## 11. Security ### 11.1 Authentication All solver endpoints use the existing Silo authentication: - **User endpoints** (`/api/solver/jobs`): Session or API token, requires `viewer` role to read, `editor` role to submit - **Runner endpoints** (`/api/runner/...`): Runner token authentication (existing) ### 11.2 Input Validation The server validates SolveContext JSON before queuing: - Maximum payload size (configurable, default 10 MB) - Required fields present (`parts`, `constraints`) - Enum values are valid strings - Transform arrays have correct length (position: 3, quaternion: 4) - No duplicate part or constraint IDs ### 11.3 Runner Isolation Solver runners execute untrusted constraint data. Mitigations: - Runners should run in containers or sandboxed environments - Python solver registration (`kcsolve.register_solver()`) is disabled in runner mode - Solver execution has a configurable timeout (killed on expiry) - Result size is bounded (large kinematic simulations are truncated) --- ## 12. 
Client SDK ### 12.1 Python Client The existing `silo-client` package is extended with solver methods: ```python from silo_client import SiloClient client = SiloClient("https://silo.example.com", token="silo_...") # Submit a solve job import kcsolve ctx = kcsolve.SolveContext() # ... build context ... job = client.solver.submit(ctx.to_dict(), solver="ondsel") print(job.id, job.status) # "pending" # Poll for completion result = client.solver.wait(job.id, timeout=60) print(result.status) # "Success" # Or use SSE for real-time updates for event in client.solver.stream(job.id): print(event.type, event.data) # Query results for an item results = client.solver.results("ASM-001") ``` ### 12.2 Create Workbench Integration The Assembly workbench adds a "Solve on Server" command: ```python # CommandSolveOnServer.py (sketch) def activated(self): assembly = get_active_assembly() ctx = assembly.build_solve_context() # Submit to Silo from silo_client import get_client client = get_client() job = client.solver.submit(ctx.to_dict()) # Subscribe to SSE for updates self.watch_job(job.id) def on_solver_completed(self, job_id, result): # Apply placements back to assembly assembly = get_active_assembly() for pr in result["placements"]: assembly.set_part_placement(pr["id"], pr["placement"]) assembly.recompute() ``` --- ## 13. Implementation Plan ### Phase 3a: JSON Serialization Add `to_dict()` / `from_dict()` methods to all KCSolve types in the pybind11 module. **Files to modify:** - `src/Mod/Assembly/Solver/bindings/kcsolve_py.cpp` -- add dict conversion methods **Verification:** `ctx.to_dict()` round-trips through `SolveContext.from_dict()`. ### Phase 3b: Server Endpoints -- COMPLETE Add the solver module to the Silo server. This builds on the existing job queue infrastructure (`migration 015_jobs_runners.sql`, `internal/db/jobs.go`, `internal/api/job_handlers.go`, `internal/api/runner_handlers.go`). 
**Implemented files:** - `internal/api/solver_handlers.go` -- REST endpoint handlers (solver-specific convenience layer over existing `/api/jobs`) - `internal/db/migrations/021_solver_results.sql` -- Database migration for result caching table - Module registered as `solver` in `internal/modules/modules.go` with `jobs` dependency ### Phase 3c: Runner Support Add solver command handlers to the existing `silorunner` binary (`cmd/silorunner/main.go`). The runner already implements the full job lifecycle (claim, start, progress, complete/fail). This phase adds `solver-run`, `solver-diagnose`, and `solver-kinematic` to the `executeJob` switch statement. **Files to modify:** - `cmd/silorunner/main.go` -- Add solver command dispatch cases - `src/Mod/Assembly/Solver/bindings/runner.py` -- `kcsolve.runner` Python entry point (invoked by silorunner via subprocess) ### Phase 3d: .kc Context Packing Pack `SolveContext` into `.kc` archives on commit. **Files to modify:** - `mods/silo/freecad/silo_origin.py` -- Hook into commit to pack solver context ### Phase 3e: Client Integration Add "Solve on Server" command to the Assembly workbench. **Files to modify:** - `mods/silo/freecad/` -- Solver client methods - `src/Mod/Assembly/` -- Server solve command --- ## 14. Open Questions 1. **Context size limits** -- Large assemblies may produce multi-MB SolveContext JSON. Should we compress (gzip) or use a binary format (msgpack)? 2. **Result persistence** -- How long should solver results be retained? Per-revision (overwritten on next commit) or historical (keep all)? 3. **Kinematic frame storage** -- Kinematic simulations can produce thousands of frames. Store all frames in JSONB, or write to a separate file and reference it? 4. **Multi-solver comparison** -- Should the API support running the same context through multiple solvers and comparing results? Useful for Phase 4 (second solver validation). 5. 
**Webhook notifications** -- The `callback_url` field allows external integrations (e.g. CI). What authentication should the webhook use? --- ## 15. References - [KCSolve Architecture](../architecture/ondsel-solver.md) - [KCSolve Python API Reference](../reference/kcsolve-python.md) - [INTER_SOLVER.md](../../INTER_SOLVER.md) -- Full pluggable solver spec - [WORKERS.md](WORKERS.md) -- Worker/runner job system - [SPECIFICATION.md](SPECIFICATION.md) -- Silo server specification - [MODULES.md](MODULES.md) -- Module system