2 Commits

Author SHA1 Message Date
f29060491e feat(datagen): add dataset generation CLI with sharding and checkpointing
Some checks failed
CI / lint (push) Has been cancelled
CI / type-check (push) Has been cancelled
CI / test (push) Has been cancelled
- Add solver/datagen/dataset.py with DatasetConfig, DatasetGenerator,
  ShardSpec/ShardResult dataclasses, parallel shard generation via
  ProcessPoolExecutor, checkpoint/resume support, index and stats output
- Add scripts/generate_synthetic.py CLI entry point with Hydra-first
  and argparse fallback modes
- Add minimal YAML parser (parse_simple_yaml) for config loading
  without PyYAML dependency
- Add progress display with tqdm fallback to print-based ETA
- Update configs/dataset/synthetic.yaml with shard_size, checkpoint_every
- Update solver/datagen/__init__.py with DatasetConfig, DatasetGenerator
  exports
- Add tests/datagen/test_dataset.py with 28 tests covering config,
  YAML parsing, seed derivation, end-to-end generation, resume,
  stats/index structure, determinism, and CLI integration

Closes #10
2026-02-03 08:44:31 -06:00
363b49281b build: phase 0 infrastructure setup
Some checks failed
CI / lint (push) Has been cancelled
CI / type-check (push) Has been cancelled
CI / test (push) Has been cancelled
- Project structure: solver/, freecad/, export/, configs/, scripts/, tests/, docs/
- pyproject.toml with dependency groups: core, train, freecad, dev
- Hydra configs: dataset (synthetic, fusion360), model (baseline, gat), training (pretrain, finetune), export (production)
- Dockerfile with CUDA+PyG GPU and CPU-only targets
- docker-compose.yml for train, test, data-gen services
- Makefile with targets: train, test, lint, format, type-check, data-gen, export, check
- Pre-commit hooks: ruff, mypy, conventional commits
- Gitea Actions CI: lint, type-check, test on push/PR
- README with setup and usage instructions
2026-02-02 13:26:38 -06:00