feat(jobs): implement auto-retry on job failure #108

Open
opened 2026-02-15 10:54:45 +00:00 by forbes · 0 comments

Ref: docs/WORKERS.md §4

The DB schema supports retry_count and max_retries on jobs, and job definitions specify max_retries, but no auto-retry logic exists. When a job fails, it stays failed regardless of retry budget.

Requirements

  • When a job transitions to failed (via runner FailJob or timeout sweeper), check if retry_count < max_retries
  • If retries remain, create a new pending job with the same definition, item, and scope, with its retry_count set to the failed job's retry_count + 1
  • Log the retry in the original job's log: "Retrying (attempt N of M)"
  • Publish an SSE job.created event for the retry job
  • The timeout sweeper (TimeoutExpiredJobs) should also trigger retries for timed-out jobs
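
The requirements above could be sketched roughly as follows. This is a minimal in-memory illustration, not the actual silo code: the `Job` and `Store` types, their fields, and the `MaybeRetry` helper are all hypothetical stand-ins for the real job row, database layer, and SSE publisher. It also assumes that N in the log message is the retry job's `retry_count` and M is `max_retries`, which the issue does not spell out.

```go
package main

import "fmt"

// Job is a hypothetical, simplified stand-in for a row in the jobs table.
type Job struct {
	ID           int
	DefinitionID string
	ItemID       string
	Scope        string
	Status       string
	RetryCount   int
	MaxRetries   int
	Log          []string
}

// Store is a hypothetical in-memory stand-in for the DB and the SSE publisher.
type Store struct {
	nextID int
	Jobs   []*Job   // retry jobs created so far
	Events []string // names of SSE events that would have been published
}

// MaybeRetry is meant to be called right after a job has been marked failed,
// from both the FailJob path and the timeout sweeper. It returns the new
// pending job, or nil when the retry budget is exhausted.
func (s *Store) MaybeRetry(failed *Job) *Job {
	if failed.Status != "failed" || failed.RetryCount >= failed.MaxRetries {
		return nil // at max retries: the job stays failed
	}
	s.nextID++
	retry := &Job{
		ID:           s.nextID,
		DefinitionID: failed.DefinitionID,
		ItemID:       failed.ItemID,
		Scope:        failed.Scope,
		Status:       "pending",
		RetryCount:   failed.RetryCount + 1,
		MaxRetries:   failed.MaxRetries,
	}
	// Record the retry in the original job's log.
	failed.Log = append(failed.Log,
		fmt.Sprintf("Retrying (attempt %d of %d)", retry.RetryCount, retry.MaxRetries))
	s.Jobs = append(s.Jobs, retry)
	s.Events = append(s.Events, "job.created")
	return retry
}

func main() {
	s := &Store{nextID: 1}
	job := &Job{ID: 1, DefinitionID: "def", ItemID: "item", Scope: "scope", MaxRetries: 2}
	// Simulate a job that fails on every attempt.
	for job != nil {
		job.Status = "failed"
		job = s.MaybeRetry(job)
	}
	fmt.Println(len(s.Jobs)) // prints 2: two retries were created, then the chain stops
}
```

Because each retry job carries a strictly larger `retry_count` and the check is `retry_count < max_retries`, a chain of failures ends after at most `max_retries` retries. In a real implementation the check, the insert, and the log write would presumably share one transaction so that a runner and the sweeper cannot both create a retry for the same failed job.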

Acceptance Criteria

  • Failed jobs with remaining retries automatically create a new pending job
  • Timed-out jobs with remaining retries automatically create a new pending job
  • Retry count is incremented correctly
  • Jobs at max retries stay failed (no infinite loop)
  • Tests cover the retry logic
Reference: kindred/silo#108