feat(cli): data migration script — MinIO to filesystem #130

Closed
opened 2026-02-17 16:11:27 +00:00 by forbes · 0 comments

Summary

Write a Go CLI command (silod migrate-storage) to migrate all files from MinIO to the local filesystem backend, with checksum verification and database updates.

Context

After the filesystem backend (#127) and metadata columns (#128) are in place, existing deployments need a way to move their files from MinIO to the filesystem. The migration must be:

  • Idempotent — safe to re-run (skips already-migrated files)
  • Verified — SHA-256 checksum compared after transfer
  • Tracked — updates storage_backend column in DB so the application knows where each file lives

Data to migrate

Three categories of stored objects:

  1. Revision files — tracked in revisions table

    • Key pattern: items/{partNumber}/rev{N}.FCStd
    • Metadata: file_key, file_version, file_checksum, file_size
    • New column: file_storage_backend (from #128)
  2. Item file attachments — tracked in item_files table

    • Key pattern: items/{itemID}/files/{uuid}/{filename}
    • Metadata: object_key, size
    • New column: storage_backend (from #128)
  3. Thumbnails — tracked in items.thumbnail_key and revisions.thumbnail_key

    • Key patterns: items/{itemID}/thumbnail.png, thumbnails/{partNumber}/rev{N}.png

MinIO bucket

Bucket name: silo-files (configured in storage.bucket or env var, default from config.example.yaml).

Requirements

CLI entry point

Add a migrate-storage subcommand to the silod binary (or a separate silo-migrate binary):

silod migrate-storage [--dry-run] [--batch-size=100] [--workers=4]
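One possible way to parse the subcommand's flags with the standard library `flag` package is sketched below. The struct and function names are hypothetical, and the project may well use a different CLI library. The defaults mirror the ones given under Flags.

```go
package main

import (
	"flag"
	"fmt"
)

// migrateOptions holds the parsed migrate-storage flags (hypothetical type).
type migrateOptions struct {
	DryRun    bool
	BatchSize int
	Workers   int
}

// parseMigrateFlags parses the arguments that follow "migrate-storage".
func parseMigrateFlags(args []string) (migrateOptions, error) {
	var opts migrateOptions
	fs := flag.NewFlagSet("migrate-storage", flag.ContinueOnError)
	fs.BoolVar(&opts.DryRun, "dry-run", false, "list files that would be migrated without migrating them")
	fs.IntVar(&opts.BatchSize, "batch-size", 100, "files to process before committing DB updates")
	fs.IntVar(&opts.Workers, "workers", 4, "concurrent download/upload workers")
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	// Reject values that would stall the migration.
	if opts.BatchSize < 1 || opts.Workers < 1 {
		return opts, fmt.Errorf("--batch-size and --workers must be at least 1")
	}
	return opts, nil
}
```

In `cmd/silod/migrate.go` this could be called as `parseMigrateFlags(os.Args[2:])` once the subcommand name has been matched.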

Algorithm

1. Connect to both MinIO (source) and filesystem (destination) using config
2. Connect to PostgreSQL

3. For each revision with file_storage_backend = 'minio' and file_key IS NOT NULL:
   a. Download from MinIO using file_key (and file_version if set)
   b. Write to filesystem at same key path
   c. Compute SHA-256 of downloaded content
   d. Compare with file_checksum in DB; on mismatch, log an error and skip step e so the row stays 'minio'
   e. UPDATE revisions SET file_storage_backend = 'filesystem' WHERE id = ?

4. For each item_file with storage_backend = 'minio':
   a. Download from MinIO using object_key
   b. Write to filesystem at same key path
   c. Verify the written file's size matches size in DB
   d. UPDATE item_files SET storage_backend = 'filesystem' WHERE id = ?

5. For each thumbnail (items.thumbnail_key, revisions.thumbnail_key):
   a. Download from MinIO
   b. Write to filesystem
   c. (No separate backend column for thumbnails — they follow the revision/item backend)

6. Print summary: total, migrated, skipped (already migrated), errors

Flags

  • --dry-run: List files that would be migrated without actually migrating
  • --batch-size: Number of files to process before committing DB updates (default 100)
  • --workers: Number of concurrent download/upload workers (default 4)

Error handling

  • Individual file failures should not abort the entire migration
  • Log errors with file key, error message, and continue
  • Exit code 0 if all files migrated, 1 if any errors occurred
  • Re-running skips files whose backend column (file_storage_backend for revisions, storage_backend for item files) is already 'filesystem'
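The summary and exit-code rules could be captured in a small counter type. This is only a sketch: the `summary` type, its field names, and the output format are hypothetical.

```go
package main

import "fmt"

// summary counts outcomes for one migration run (hypothetical type).
type summary struct {
	Total, Migrated, Skipped, Errors int
}

// String renders the one-line report printed at the end of the run.
func (s summary) String() string {
	return fmt.Sprintf("total=%d migrated=%d skipped=%d errors=%d",
		s.Total, s.Migrated, s.Skipped, s.Errors)
}

// exitCode returns 1 if any file failed and 0 otherwise.
func (s summary) exitCode() int {
	if s.Errors > 0 {
		return 1
	}
	return 0
}
```

The command would then finish with `fmt.Println(s)` followed by `os.Exit(s.exitCode())`.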

Files to create/modify

  • cmd/silod/migrate.go (or cmd/silo-migrate/main.go) — CLI entry point
  • Reuses internal/storage/ for both MinIO and filesystem FileStore implementations
  • Reuses internal/db/ for database access
  • Reuses internal/config/ for configuration

Testing

  • Integration test with a temp directory as filesystem root
  • Mock or embedded MinIO for source (or test against real MinIO in CI)
  • Verify idempotency: run twice, second run should skip all files

Acceptance criteria

  • All revision files migrated with verified checksums
  • All item file attachments migrated
  • All thumbnails migrated
  • Database rows updated to 'filesystem' backend
  • Idempotent — safe to re-run
  • --dry-run mode works
  • Summary report printed
  • Non-zero exit code on errors

Priority

P1

Depends on

  • #126 (FileStore interface)
  • #127 (filesystem backend)
  • #128 (metadata columns)

Part of

Storage Migration: MinIO → PostgreSQL + Filesystem


Reference: kindred/silo#130