Background jobs with retries—without a worker fleet
Background jobs fail: APIs time out, databases lock, networks blip. A retry platform needs per-step backoff, idempotent handlers, and searchable failure history—not a while-loop in a worker process. Inquir pipelines retry failed steps automatically while keeping completed work checkpointed.
Last updated: 2026-06-23
Answer first
Direct answer
Run the job as an Inquir pipeline. Failed steps retry on their own, completed steps stay checkpointed, and retry count and backoff are set in configuration rather than written into application code.
Workload and what breaks
Why background job retries are harder than they look
Most teams start with fire-and-forget: enqueue a job, hope it succeeds. When it fails, someone greps logs and manually replays. At scale, silent failures become revenue leaks and angry customers.
Custom retry loops in worker processes lose state on restart, double-process on crash recovery, and hide retry counts from observability dashboards.
Trade-offs
Where DIY retry logic breaks
Retrying the entire job from scratch re-runs completed steps—double emails, double charges, duplicate records if handlers are not idempotent.
Fixed-interval retries hammer a downstream service that is already struggling. Exponential backoff helps only when it has a delay cap and jitter, details that hand-rolled retry loops often get wrong.
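For a sense of what that boilerplate involves, here is a minimal DIY sketch of exponential backoff with full jitter, where each delay is drawn uniformly between zero and a capped exponential ceiling. It is generic JavaScript, not an Inquir API; `backoffDelayMs`, `withRetries`, and the defaults for `baseMs`, `capMs`, and `maxAttempts` are illustrative assumptions.

```javascript
// DIY sketch (not an Inquir API): exponential backoff with "full jitter".
// The delay for attempt n is random in [0, min(capMs, baseMs * 2^n)).
function backoffDelayMs(attempt, { baseMs = 500, capMs = 30_000, random = Math.random } = {}) {
  // Exponential ceiling, clamped so late attempts do not wait unboundedly.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // Full jitter spreads retries from many clients across the whole window,
  // so they do not hit a recovering service in synchronized waves.
  return Math.floor(random() * ceiling);
}

// Retry an async operation up to maxAttempts times, sleeping between failures.
async function withRetries(operation, { maxAttempts = 5, ...backoff } = {}) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts - 1) break; // no sleep after the final attempt
      const delay = backoffDelayMs(attempt, backoff);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError; // retries exhausted: surface the last error
}
```

The cap keeps late attempts from waiting for minutes, and the injectable `random` makes the schedule testable. This loop still lives inside one process, though: if the worker restarts mid-wait, the retry state is lost, which is the failure mode described above.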
How Inquir helps
Pipeline step retries as a platform primitive
Each pipeline step retries independently on failure. Completed steps stay checkpointed—only the failed step re-runs. Retry policy (count, backoff) is configuration, not application code.
Execution history shows every attempt: input, output, duration, error message, retry count. Alert on failure rates without building a custom dashboard.
What you get
Background job retry patterns
Per-step retry with backoff
Failed steps retry with configurable policy. Completed steps are not re-executed when a later step fails.
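The checkpoint semantics can be modeled in a few lines. This is a conceptual sketch, assuming synchronous steps and an in-memory `Map` for checkpoints; it is not how Inquir implements pipelines, and `runPipeline` and the step shape `{ name, run }` are hypothetical.

```javascript
// Conceptual model only, NOT Inquir's implementation.
// steps: [{ name, run(input) }]; each run() may throw.
// checkpoints: Map of step name -> saved output (durable storage in a real system).
function runPipeline(steps, input, checkpoints, { maxAttemptsPerStep = 3 } = {}) {
  let current = input;
  for (const step of steps) {
    // A step that finished in an earlier run is skipped; its saved output
    // becomes the input for the next step.
    if (checkpoints.has(step.name)) {
      current = checkpoints.get(step.name);
      continue;
    }
    for (let attempt = 1; ; attempt++) {
      try {
        current = step.run(current);
        checkpoints.set(step.name, current); // checkpoint before moving on
        break;
      } catch (err) {
        // Out of attempts for this step: fail the run, keep earlier checkpoints.
        if (attempt >= maxAttemptsPerStep) {
          return { status: "failed", failedStep: step.name, error: err.message };
        }
        // A real runner would wait for a backoff delay here before retrying.
      }
    }
  }
  return { status: "completed", output: current };
}
```

Running it twice against the same `checkpoints` map shows the point: a step that already succeeded is skipped on the second run, and only the step that failed runs again.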
Idempotent job handlers
Design handlers to tolerate retries: upsert by job ID, check-before-write, dedupe keys from trigger payload.
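A minimal sketch of the check-before-write pattern, assuming the dedupe key is built from an order ID in the trigger payload. The in-memory `Map`, `handleOrderPaid`, and `sendReceiptEmail` are hypothetical stand-ins for a real table and a real email call.

```javascript
// Hypothetical stand-ins: the Map plays the role of a table with a unique
// constraint on the dedupe key, and sendReceiptEmail fakes a side effect.
const processed = new Map(); // dedupeKey -> stored result
let emailsSent = 0;

async function sendReceiptEmail(orderId) {
  emailsSent++; // count side effects so duplicates are visible
  return `msg-${orderId}-${emailsSent}`;
}

async function handleOrderPaid(payload) {
  // Build the key from the trigger payload, never from time or randomness,
  // so every retry of the same event maps to the same key.
  const dedupeKey = `order-paid:${payload.orderId}`;

  // Check before write: a retried event returns the stored result
  // instead of sending the email a second time.
  if (processed.has(dedupeKey)) {
    return { ...processed.get(dedupeKey), deduplicated: true };
  }

  const messageId = await sendReceiptEmail(payload.orderId);
  const result = { orderId: payload.orderId, messageId };
  processed.set(dedupeKey, result); // record the result under the dedupe key
  return result;
}
```

In production, replace the `Map` with a unique constraint or a conditional insert so two concurrent retries cannot both pass the check. A crash between sending and recording can still cause a resend, so side effects that must happen exactly once also need an idempotency key on the downstream API, where the provider supports one.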
Trigger from any entry point
HTTP 202, webhook ACK, cron schedule, or another job—all enqueue pipeline work with the same retry semantics.
Searchable failure history
Find failed runs by time range, function, or error text. Replay from stored payload without re-triggering the original HTTP request.
What to do next
How to run background jobs with retries on Inquir
Accept fast, enqueue durable work, let the platform retry failed steps.
Accept and enqueue
The HTTP or webhook handler calls global.durable.startNew() with the job payload, then returns 202 immediately.
Write idempotent step handlers
Each pipeline step checks a dedupe key before side effects. Return structured output for the next step.
Configure retry policy and alert
Set step retry count and backoff in pipeline config. Alert when failure rate exceeds threshold.
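The alert condition itself is small. As a sketch, assuming run records shaped `{ status, finishedAt }` and illustrative defaults (a 15-minute window, a 20% threshold, at least 20 runs), it could look like this; `shouldAlert` is a hypothetical helper, not part of Inquir.

```javascript
// Hypothetical helper: decide whether recent failures warrant an alert.
// runs: [{ status: "failed" | "succeeded", finishedAt: epochMillis }]
function shouldAlert(runs, { windowMs = 15 * 60_000, threshold = 0.2, minRuns = 20, now = Date.now() } = {}) {
  // Only consider runs that finished inside the time window.
  const recent = runs.filter((r) => now - r.finishedAt <= windowMs);
  // Avoid paging on tiny samples: 1 failure out of 2 runs is not a trend.
  if (recent.length < minRuns) return { alert: false, rate: null, sample: recent.length };
  const failed = recent.filter((r) => r.status === "failed").length;
  const rate = failed / recent.length;
  return { alert: rate > threshold, rate, sample: recent.length };
}
```

The minimum sample size trades alert latency for fewer false pages; a low-traffic pipeline may need a longer window instead.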
Code example
Background job with retriable pipeline steps
The HTTP handler enqueues work in one line. Pipeline steps retry independently on downstream failure.
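```javascript
// HTTP entry point: validate, enqueue the durable job, return 202 right away.
export async function handler(event) {
  let body;
  try {
    body = JSON.parse(event.body || '{}');
  } catch {
    // Malformed JSON is a client error, not a 500.
    return { statusCode: 400, body: JSON.stringify({ error: 'invalid JSON body' }) };
  }
  const { userId, format } = body;
  if (!userId) {
    return { statusCode: 400, body: JSON.stringify({ error: 'userId required' }) };
  }
  // Start the 'export-data' pipeline; its instance ID doubles as the job ID.
  const { instanceId: jobId } = await global.durable.startNew('export-data', undefined, { userId, format });
  return { statusCode: 202, body: JSON.stringify({ jobId, status: 'queued' }) };
}
```

```javascript
// Pipeline step (separate function): idempotent export keyed by the instance ID.
// db.findExport, db.saveExport, and generateAndUploadExport are app-specific helpers (not shown).
export async function handler(event) {
  const { userId, format } = event.payload ?? {};
  // A retry after a successful upload finds the saved export and skips the work.
  const existing = await db.findExport(userId, event.instanceId);
  if (existing) {
    return { statusCode: 200, body: JSON.stringify({ url: existing.url }) };
  }
  const url = await generateAndUploadExport(userId, format);
  await db.saveExport(userId, event.instanceId, url);
  return { statusCode: 200, body: JSON.stringify({ url }) };
}
```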
When it fits
Good fit for background jobs with retries
When this works
- Async work triggered from HTTP, webhooks, or cron where downstream failures are expected
- Multi-step jobs where re-running the entire job on failure is expensive or unsafe
When to skip it
- Synchronous work under one second with no failure risk
FAQ
How is this different from BullMQ retries?
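By default, BullMQ retries a failed job by re-running its whole processor in a worker process, with the job and its retry state stored in Redis. Inquir retries individual pipeline steps with checkpointed progress, with no Redis instance or worker fleet to operate.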
Can I set different retry policies per step?
Yes. Configure retry count and backoff per pipeline step. A flaky external API step can retry more aggressively than a local validation step.
What happens after retries are exhausted?
The pipeline run is marked failed with full logs. From there, replay it manually from execution history or raise an alert so an operator can intervene.