feat(storage): explode .kc/.fcstd archives for XML-level revision diffing #176

Open
opened 2026-03-03 20:30:49 +00:00 by forbes · 0 comments
Owner

Context

From GAP_ANALYSIS.md Open Question #4: "Exploded FCStd storage for better diffing?"

Decision: Yes. Explode .kc and .fcstd ZIP archives on upload and store the extracted tree alongside the original archive. This enables meaningful XML-level diffs in revision history. Add a retention policy for exploded trees, configured in config.yaml.

Current Behavior

Files are stored as opaque blobs in {root_dir}/{fileKey}. A .kc file is a ZIP bundle containing Document.xml, GuiDocument.xml, BREP geometry files, thumbnails, and the silo/ metadata directory. Currently there is no way to see what changed between two revisions at the file content level — only property/metadata diffs are available via GET /api/items/{pn}/revisions/compare.

Proposed Behavior

Storage Layout

When a .kc or .fcstd file is uploaded, the server:

  1. Stores the original archive as-is (existing behavior)
  2. Extracts the ZIP contents into an adjacent _exploded/ directory:
{root_dir}/items/{partNumber}/rev3.kc              # original archive
{root_dir}/items/{partNumber}/rev3_exploded/        # extracted tree
  Document.xml
  GuiDocument.xml
  *.brp                                             # BREP geometry
  thumbnails/
  silo/
    manifest.json
    metadata.json
    dependencies.json
    ...
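
The mapping from an archive key to its exploded directory can be sketched as below. This is a minimal illustration, not the final implementation: `explodedDirFor` is a hypothetical helper name, keys are assumed to use forward slashes, and the extension is assumed to be stripped so that `rev3.kc` maps to `rev3_exploded/` as in the layout above (the `{key}_exploded/` wording under Files to Modify could also be read as `rev3.kc_exploded/`).

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// explodedDirFor maps an archive key to its adjacent exploded directory,
// e.g. "items/PN-001/rev3.kc" -> "items/PN-001/rev3_exploded".
// It reports false for keys that are not .kc or .fcstd archives.
// Hypothetical helper: the name and the extension-stripping are assumptions.
func explodedDirFor(key string) (string, bool) {
	ext := path.Ext(key)
	switch strings.ToLower(ext) {
	case ".kc", ".fcstd":
		return strings.TrimSuffix(key, ext) + "_exploded", true
	default:
		return "", false
	}
}

func main() {
	keys := []string{"items/PN-001/rev3.kc", "items/PN-001/rev3.FCStd", "items/PN-001/rev3.pdf"}
	for _, key := range keys {
		dir, ok := explodedDirFor(key)
		fmt.Printf("%s -> %q (explode=%v)\n", key, dir, ok)
	}
}
```

The extension match is case-insensitive here because FreeCAD conventionally writes `.FCStd`; whether the server should accept mixed case is an open choice.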

Diff API

Extend the existing revision comparison endpoint:

GET /api/items/{pn}/revisions/compare?from=2&to=3&include_file_diff=true

Response adds a file_changes section:

{
  "property_changes": { ... },
  "file_changes": [
    {
      "path": "Document.xml",
      "status": "modified",
      "diff_type": "xml",
      "diff": "... unified diff or structured diff ..."
    },
    {
      "path": "silo/metadata.json",
      "status": "modified",
      "diff_type": "json"
    },
    {
      "path": "Part__Pad003.brp",
      "status": "added",
      "diff_type": "binary",
      "size": 14320
    }
  ]
}
  • XML files: structured or unified text diff
  • JSON files: key-level diff
  • BREP/binary files: size change only (no content diff)
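
One possible shape for the response entries and the per-type dispatch is sketched below. All Go names (`FileChange`, `diffTypeFor`, `jsonKeyDiff`) are hypothetical; only the JSON field names come from the example response above. The key-level diff here compares top-level keys only and emits `+`, `-`, and `~` lines, which is an assumed output format, not a decided one.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
	"reflect"
	"sort"
	"strings"
)

// FileChange mirrors one entry of the proposed "file_changes" array.
type FileChange struct {
	Path     string `json:"path"`
	Status   string `json:"status"`    // e.g. "added" or "modified"
	DiffType string `json:"diff_type"` // "xml", "json", or "binary"
	Diff     string `json:"diff,omitempty"`
	Size     int64  `json:"size,omitempty"`
}

// diffTypeFor picks the diff strategy from the file extension.
// Anything that is not XML or JSON (BREP, thumbnails) is treated as binary.
func diffTypeFor(name string) string {
	switch strings.ToLower(path.Ext(name)) {
	case ".xml":
		return "xml"
	case ".json":
		return "json"
	default:
		return "binary"
	}
}

// jsonKeyDiff compares the top-level keys of two JSON objects and returns
// one sorted line per key that was added ("+"), removed ("-"), or changed ("~").
func jsonKeyDiff(from, to []byte) ([]string, error) {
	var a, b map[string]any
	if err := json.Unmarshal(from, &a); err != nil {
		return nil, fmt.Errorf("parse from: %w", err)
	}
	if err := json.Unmarshal(to, &b); err != nil {
		return nil, fmt.Errorf("parse to: %w", err)
	}
	var out []string
	for k, av := range a {
		bv, ok := b[k]
		switch {
		case !ok:
			out = append(out, "- "+k)
		case !reflect.DeepEqual(av, bv):
			out = append(out, "~ "+k)
		}
	}
	for k := range b {
		if _, ok := a[k]; !ok {
			out = append(out, "+ "+k)
		}
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	from := []byte(`{"revision": 2, "author": "a", "tags": ["x"]}`)
	to := []byte(`{"revision": 3, "tags": ["x"], "state": "released"}`)
	lines, err := jsonKeyDiff(from, to)
	if err != nil {
		panic(err)
	}
	name := "silo/metadata.json"
	change := FileChange{Path: name, Status: "modified", DiffType: diffTypeFor(name), Diff: strings.Join(lines, "\n")}
	out, _ := json.MarshalIndent(change, "", "  ")
	fmt.Println(string(out))
}
```

A nested or path-based diff (for example `silo.metadata.author`) would need recursion into the decoded maps; the flat version is kept here only to show the dispatch.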

Retention Policy

Exploded trees can be large. Add configurable retention:

storage:
  backend: filesystem
  filesystem:
    root_dir: /opt/silo/data
  explode_archives: true          # enable archive explosion (default true)
  exploded_retention:
    keep_last_n: 10               # keep exploded trees for the N most recent revisions per item
    always_keep_released: true    # never delete exploded trees for released revisions

When an exploded tree is cleaned up, only the _exploded/ directory is deleted — the original archive is retained (subject to the separate revision retention policy from #175).
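
The selection logic a sweeper might apply is sketched below, under stated assumptions: `ExplodedRetention` mirrors the two proposed config keys, `Revision` and `prunable` are hypothetical names, "most recent" is taken to mean highest revision number, and released revisions inside the last N still count toward N. How `keep_last_n: 0` should behave is not specified in this issue; the sketch treats it as "keep none".

```go
package main

import (
	"fmt"
	"sort"
)

// ExplodedRetention mirrors the proposed exploded_retention config keys.
type ExplodedRetention struct {
	KeepLastN          int  `yaml:"keep_last_n"`
	AlwaysKeepReleased bool `yaml:"always_keep_released"`
}

// Revision is a hypothetical minimal view of one item revision.
type Revision struct {
	Number   int
	Released bool
}

// prunable returns, in ascending order, the revision numbers whose exploded
// trees may be deleted under policy p. Original archives are never touched.
func prunable(revs []Revision, p ExplodedRetention) []int {
	// Sort a copy newest-first so the first KeepLastN entries are protected.
	sorted := append([]Revision(nil), revs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Number > sorted[j].Number })

	var out []int
	for i, r := range sorted {
		if i < p.KeepLastN {
			continue
		}
		if p.AlwaysKeepReleased && r.Released {
			continue
		}
		out = append(out, r.Number)
	}
	sort.Ints(out)
	return out
}

func main() {
	revs := []Revision{{1, false}, {2, true}, {3, false}, {4, false}, {5, false}}
	policy := ExplodedRetention{KeepLastN: 2, AlwaysKeepReleased: true}
	fmt.Println("exploded trees eligible for cleanup:", prunable(revs, policy))
}
```

Keeping the selection as a pure function separates it from filesystem deletion, which should make it easier to share with the sweeper from #175.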

Extraction Rules

  • Only .kc and .fcstd files are exploded (check file extension)
  • Extraction happens synchronously during file upload (the ZIP is already in memory)
  • If extraction fails (corrupt ZIP), log a warning but do not fail the upload — store the archive without an exploded tree
  • Max exploded size safety: skip extraction if uncompressed size exceeds a configurable limit (e.g. 500 MB)
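
The rules above could be implemented roughly as follows, using Go's standard `archive/zip`. This is a sketch under assumptions, not the proposed `Explode(key string) error` method itself: `explodeZip`, `extractFile`, `buildSampleZip`, and `errTooLarge` are hypothetical names, the archive is taken as an in-memory byte slice, and the size limit is checked against the sizes declared in the ZIP central directory. The path check guards against entries that would escape the target directory, which the issue does not mention but which any ZIP extraction of uploaded files probably needs.

```go
package main

import (
	"archive/zip"
	"bytes"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

var errTooLarge = errors.New("uncompressed size exceeds limit")

// explodeZip extracts an in-memory ZIP into destDir and returns the names of
// the files it wrote. maxBytes caps the total declared uncompressed size.
func explodeZip(data []byte, destDir string, maxBytes uint64) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, fmt.Errorf("open zip: %w", err)
	}
	// Pre-check the sizes declared in the central directory before writing.
	var declared uint64
	for _, f := range zr.File {
		declared += f.UncompressedSize64
	}
	if declared > maxBytes {
		return nil, errTooLarge
	}
	root := filepath.Clean(destDir) + string(os.PathSeparator)
	var written []string
	for _, f := range zr.File {
		if f.FileInfo().IsDir() {
			continue // parent directories are created on demand below
		}
		target := filepath.Join(destDir, filepath.FromSlash(f.Name))
		// Reject entries such as "../x" that would land outside destDir.
		if !strings.HasPrefix(target, root) {
			return written, fmt.Errorf("unsafe path in archive: %q", f.Name)
		}
		if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil {
			return written, err
		}
		if err := extractFile(f, target); err != nil {
			return written, err
		}
		written = append(written, f.Name)
	}
	return written, nil
}

// extractFile copies a single archive entry to target.
func extractFile(f *zip.File, target string) error {
	rc, err := f.Open()
	if err != nil {
		return err
	}
	defer rc.Close()
	out, err := os.Create(target)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, rc); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

// buildSampleZip creates a small in-memory archive for the demo below.
func buildSampleZip(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, body := range files {
		w, err := zw.Create(name)
		if err == nil {
			_, err = io.WriteString(w, body)
		}
		if err != nil {
			return nil, err
		}
	}
	err := zw.Close()
	return buf.Bytes(), err
}

func main() {
	data, _ := buildSampleZip(map[string]string{"Document.xml": "<Document/>", "silo/manifest.json": "{}"})
	dest := filepath.Join(os.TempDir(), "rev3_exploded")
	defer os.RemoveAll(dest)
	files, err := explodeZip(data, dest, 500<<20)
	fmt.Println(len(files), "files extracted, err =", err)
}
```

In the upload handler, a non-nil error from a function like this would be logged as a warning and any partially written tree removed, while the upload itself still succeeds. A stricter version could also cap the bytes actually read per entry rather than relying on declared sizes alone.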

Files to Modify

  • internal/storage/filesystem.go — add Explode(key string) error method that extracts ZIP to {key}_exploded/
  • internal/api/handlers.go or internal/api/file_handlers.go — call Explode() after successful file upload for .kc/.fcstd files
  • internal/api/handlers.go — extend HandleCompareRevisions() to include file-level diffs when include_file_diff=true
  • internal/config/config.go — add ExplodeArchives, ExplodedRetention fields to StorageConfig
  • New cleanup goroutine or extend the retention sweeper from #175

Dependencies

  • #175 (revision retention policy) — the two retention systems should share the sweeper infrastructure

Related

  • GAP_ANALYSIS.md Section 7, Question 4
  • GAP_ANALYSIS.md C.1: "Compare revisions — Visual + metadata diff / Metadata diff only"
  • ROADMAP.md Tier 2: "Intelligent FCStd Diffing"
  • .kc file format: ROADMAP.md "The .kc File Format" section

Reference: kindred/silo#176