Background jobs with retries—without a worker fleet

Background jobs fail: APIs time out, databases lock, networks blip. A retry platform needs per-step backoff, idempotent handlers, and searchable failure history—not a while-loop in a worker process. Inquir pipelines retry failed steps automatically while keeping completed work checkpointed.

Last updated: 2026-06-23

Direct answer

Run background jobs as Inquir pipelines instead of operating a worker fleet. Each pipeline step retries independently on failure. Completed steps stay checkpointed, so only the failed step re-runs. Retry policy (count, backoff) is configuration, not application code.

When it fits

  • Async work triggered from HTTP, webhooks, or cron where downstream failures are expected
  • Multi-step jobs where re-running the entire job on failure is expensive or unsafe

Tradeoffs

  • Retrying the entire job from scratch re-runs completed steps—double emails, double charges, duplicate records if handlers are not idempotent.
  • Fixed-interval retries hammer failing downstream services. Exponential backoff needs careful tuning and jitter—boilerplate most teams copy incorrectly.

Why background job retries are harder than they look

Most teams start with fire-and-forget: enqueue a job, hope it succeeds. When it fails, someone greps logs and manually replays. At scale, silent failures become revenue leaks and angry customers.

Custom retry loops in worker processes lose state on restart, double-process on crash recovery, and hide retry counts from observability dashboards.

Pipeline step retries as a platform primitive

Each pipeline step retries independently on failure. Completed steps stay checkpointed—only the failed step re-runs. Retry policy (count, backoff) is configuration, not application code.

Execution history shows every attempt: input, output, duration, error message, retry count. Alert on failure rates without building a custom dashboard.
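
To make the checkpointing behavior concrete, here is a minimal in-memory model of it. It is not Inquir's implementation: runPipeline, the checkpoints Map, the history array, and the step shape are all assumptions chosen for illustration, and a real platform would persist checkpoints and wait between attempts.

```javascript
// Minimal in-memory model of per-step retries with checkpoints (illustrative only).
// steps: [{ name, maxAttempts, run(input) }]; checkpoints: Map of step name -> output.
async function runPipeline(steps, input, checkpoints = new Map(), history = []) {
  let current = input;
  for (const step of steps) {
    if (checkpoints.has(step.name)) {
      // Step already completed in an earlier run: reuse its output, do not re-execute.
      current = checkpoints.get(step.name);
      continue;
    }
    for (let attempt = 1; ; attempt++) {
      try {
        const output = await step.run(current);
        history.push({ step: step.name, attempt, ok: true });
        checkpoints.set(step.name, output);
        current = output;
        break;
      } catch (err) {
        history.push({ step: step.name, attempt, ok: false, error: String(err.message) });
        // Retries exhausted: surface the failure; earlier checkpoints remain intact.
        if (attempt >= step.maxAttempts) throw err;
      }
    }
  }
  return { output: current, history };
}
```

Given a charge step followed by a flaky upload step, this model runs charge once and upload as many times as needed, up to its maxAttempts. Passing the same checkpoints Map to a later call skips every step that already completed.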

Background job retry patterns

Per-step retry with backoff

Failed steps retry with configurable policy. Completed steps are not re-executed when a later step fails.
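
As a rough illustration of what a backoff policy computes, the sketch below derives the wait before each retry using exponential backoff with full jitter. The function backoffDelayMs and the option names baseMs, capMs, and random are made up for this example; they are not Inquir configuration keys.

```javascript
// Exponential backoff with "full jitter": wait a random time between 0 and
// min(capMs, baseMs * 2^(attempt - 1)). All names here are illustrative.
function backoffDelayMs(attempt, { baseMs = 500, capMs = 30_000, random = Math.random } = {}) {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError('attempt must be an integer >= 1');
  }
  // The upper bound doubles with each attempt until it reaches the cap.
  const ceiling = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  // Randomizing over the whole interval spreads retries from many jobs over time.
  return Math.floor(random() * ceiling);
}

// Pinning random to 1 shows the upper bound of the wait for the first five retries.
const ceilings = [1, 2, 3, 4, 5].map((n) => backoffDelayMs(n, { random: () => 1 }));
console.log(ceilings); // [ 500, 1000, 2000, 4000, 8000 ]
```

The cap keeps late retries from waiting unboundedly long, and the jitter keeps many jobs that failed at the same moment from retrying in lockstep against a recovering service.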

Idempotent job handlers

Design handlers to tolerate retries: upsert by job ID, check-before-write, dedupe keys from trigger payload.
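
These techniques can be combined in a single handler. The sketch below uses an in-memory Map in place of a database; sendReceipt, the processed store, and the payload fields are hypothetical and stand in for whatever side effect and storage your job actually uses.

```javascript
// Idempotent handler sketch. A Map stands in for a table with a unique key.
const processed = new Map(); // dedupe key -> stored result
let emailsSent = 0;

// Hypothetical side effect that must not happen twice for the same job.
async function sendReceipt(orderId) {
  emailsSent += 1;
  return { messageId: `receipt-${orderId}` };
}

async function handleOrderPaid(event) {
  // The dedupe key comes from the trigger payload, so every retry computes the same key.
  const key = `order-paid:${event.payload.orderId}`;

  // Check-before-write: a retry after a successful attempt returns the stored result.
  if (processed.has(key)) return processed.get(key);

  const result = await sendReceipt(event.payload.orderId);

  // Upsert by key: writing the same key again overwrites rather than duplicating.
  processed.set(key, result);
  return result;
}
```

A check followed by a write is not atomic. With a real database, a unique constraint on the key, or an idempotency key passed to the downstream API, narrows the window between the side effect and the record of it.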

Trigger from any entry point

HTTP 202, webhook ACK, cron schedule, or another job—all enqueue pipeline work with the same retry semantics.
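
For example, a webhook entry point could acknowledge the sender and enqueue work much like the HTTP handler does. This sketch reuses the global.durable.startNew call shape from the enqueue example further down this page; the pipeline name 'process-webhook', the payload fields, and the shape of event.body are assumptions.

```javascript
// Webhook entry point sketch: acknowledge quickly, enqueue durable work.
// In a real module this function would be exported as the handler.
async function webhookHandler(event) {
  let body;
  try {
    body = JSON.parse(event.body || '{}');
  } catch {
    return { statusCode: 400, body: JSON.stringify({ error: 'invalid JSON' }) };
  }
  if (!body.id) {
    return { statusCode: 400, body: JSON.stringify({ error: 'event id required' }) };
  }
  // Pass the sender's event id along so step handlers can use it as a dedupe key.
  const { instanceId } = await global.durable.startNew('process-webhook', undefined, {
    eventId: body.id,
    type: body.type,
  });
  return { statusCode: 202, body: JSON.stringify({ jobId: instanceId, status: 'queued' }) };
}
```

Carrying the sender's event id into the payload matters because many webhook senders redeliver events, and the step handlers need a stable key to recognize a repeat.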

Searchable failure history

Find failed runs by time range, function, or error text. Replay from stored payload without re-triggering the original HTTP request.

How to run background jobs with retries on Inquir

Accept fast, enqueue durable work, let the platform retry failed steps.

1. Accept and enqueue

The HTTP handler or webhook calls global.durable.startNew() with the job payload, then returns 202 without waiting for the job to finish.

2. Write idempotent step handlers

Each pipeline step checks a dedupe key before performing side effects and returns structured output for the next step.

3. Configure retry policy and alert

Set each step's retry count and backoff in the pipeline config. Alert when the failure rate exceeds a threshold.
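
The alerting rule in the last step amounts to a threshold check over recent runs. The following sketch assumes run records with a status field and a finishedAt timestamp in milliseconds; that record shape, the option names, and the function name shouldAlert are hypothetical rather than an Inquir API.

```javascript
// Failure-rate check over a sliding time window (illustrative record shape).
function shouldAlert(
  runs,
  { now = Date.now(), windowMs = 15 * 60_000, threshold = 0.05, minRuns = 20 } = {}
) {
  // Keep only runs that finished inside the window ending at `now`.
  const recent = runs.filter((r) => r.finishedAt <= now && now - r.finishedAt <= windowMs);
  // Avoid alerting on tiny samples, where a single failure skews the rate.
  if (recent.length < minRuns) return false;
  const failed = recent.filter((r) => r.status === 'failed').length;
  return failed / recent.length > threshold;
}
```

The minRuns guard is a design choice: without it, one failed run out of two in a quiet period would report a 50% failure rate and page someone.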

Background job with retriable pipeline steps

The HTTP handler enqueues work in one line. Pipeline steps retry independently on downstream failure.

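jobs/enqueue-export.mjs
// HTTP entry point: validate the request, enqueue durable work, return 202 right away.
export async function handler(event) {
  let body;
  try { body = JSON.parse(event.body || '{}'); } catch { return { statusCode: 400, body: JSON.stringify({ error: 'invalid JSON body' }) }; }
  const { userId, format } = body;
  if (!userId) return { statusCode: 400, body: JSON.stringify({ error: 'userId required' }) };
  // Start the 'export-data' pipeline; the returned instanceId is used as the job ID.
  const { instanceId: jobId } = await global.durable.startNew('export-data', undefined, { userId, format });
  return { statusCode: 202, body: JSON.stringify({ jobId, status: 'queued' }) };
}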
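jobs/export-step.mjs (pipeline step, retried on failure)
// Hypothetical application module: db and generateAndUploadExport are your own code, not part of Inquir.
import { db, generateAndUploadExport } from './lib/exports.mjs';

export async function handler(event) {
  const { userId, format } = event.payload ?? {};
  // Dedupe on (userId, instanceId): a retry after a successful upload returns the stored URL.
  const existing = await db.findExport(userId, event.instanceId);
  if (existing) return { statusCode: 200, body: JSON.stringify({ url: existing.url }) };
  const url = await generateAndUploadExport(userId, format);
  // Record the result so later attempts short-circuit at the check above.
  await db.saveExport(userId, event.instanceId, url);
  return { statusCode: 200, body: JSON.stringify({ url }) };
}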

Good fit for background jobs with retries

When this works

  • Async work triggered from HTTP, webhooks, or cron where downstream failures are expected
  • Multi-step jobs where re-running the entire job on failure is expensive or unsafe

When to skip it

  • Synchronous work under one second with no failure risk

FAQ

How is this different from BullMQ retries?

BullMQ retries entire jobs in Redis with a worker process. Inquir retries individual pipeline steps with checkpointed progress—no Redis, no worker fleet.

Can I set different retry policies per step?

Yes. Configure retry count and backoff per pipeline step. A flaky external API step can retry more aggressively than a local validation step.

What happens after retries are exhausted?

The pipeline run is marked failed with full logs. Replay it manually from execution history, or alert an operator to intervene.