Webhook retry platform: survive provider redeliveries
Stripe, GitHub, and Slack can all redeliver a webhook minutes, hours, or days after the first attempt. A webhook retry platform needs three layers: reject forgeries fast, ACK inside provider timeouts, and retry downstream work without duplicating side effects. Inquir gives you idempotency, pipeline retries, and per-delivery traces in one serverless stack.
Last updated: 2026-06-23
Answer first
Direct answer
The webhook function verifies the signature, writes the provider event ID to durable storage, enqueues work to a pipeline, and returns 200 without waiting for that work to finish. Provider retries hit the idempotency check and return 200 without re-processing.
Workload and what breaks
Two kinds of webhook retries—and both hurt if you ignore them
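Provider retries: Stripe retries for up to 72 hours with exponential backoff when your endpoint returns non-2xx or times out. Slack retries a failed Events API delivery up to three times over a few minutes. GitHub does not retry automatically, but a failed delivery can be redelivered manually or through its API for 3 days. Each redelivery is a new HTTP request with the same event ID, so you must detect duplicates before mutating state.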
Downstream retries: your handler ACKed fast, but the fulfillment API failed. Without a retry platform, that failure is silent—or you manually replay from logs. Pipeline step retries solve this without re-triggering the provider.
Trade-offs
Why ad-hoc retry logic fails at scale
Checking "did I see this event ID?" in application memory does not survive restarts or horizontal scaling. You need durable idempotency keys written before any side effect.
Retrying the entire webhook handler on downstream failure re-runs signature verification and risks double-processing if the first attempt partially succeeded.
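To make the durable-key point concrete, here is a minimal sketch of an insert-if-absent check shared by two handler instances. Everything in it is illustrative rather than part of Inquir: `SharedEventStore`, `insertIfAbsent`, and `createInstance` are hypothetical names, and the `Map` only stands in for a real durable table with a unique key on the event ID.

```javascript
// Stand-in for a durable table with a unique key on event_id (assumption).
// With PostgreSQL the same check could be one atomic statement:
//   INSERT INTO webhook_events (event_id, event_type) VALUES ($1, $2)
//   ON CONFLICT (event_id) DO NOTHING;  -- one affected row means "first delivery"
class SharedEventStore {
  constructor() {
    this.rows = new Map();
  }

  // Resolves true only for the first caller that presents a given event ID.
  async insertIfAbsent(eventId, eventType) {
    if (this.rows.has(eventId)) return false;
    this.rows.set(eventId, { eventType, receivedAt: Date.now() });
    return true;
  }
}

// Hypothetical factory: each call models one handler instance (process or container).
function createInstance(store, sideEffect) {
  return async function handleDelivery(evt) {
    const isNew = await store.insertIfAbsent(evt.id, evt.type);
    if (!isNew) return 'duplicate'; // redelivery: acknowledge and do nothing else
    await sideEffect(evt);
    return 'accepted';
  };
}

async function demoDuplicateDelivery() {
  const store = new SharedEventStore();
  const fulfilled = [];
  const fulfil = async (evt) => { fulfilled.push(evt.id); };

  // Two instances share one store, as two containers would share one database.
  const instanceA = createInstance(store, fulfil);
  const instanceB = createInstance(store, fulfil);

  const first = await instanceA({ id: 'evt_123', type: 'invoice.paid' });
  const second = await instanceB({ id: 'evt_123', type: 'invoice.paid' });
  return { first, second, fulfilled };
}
```

The redelivery lands on a different instance and is still recognized, because the key lives in the shared store and not in either instance's memory. A per-instance `Set` would have let the second delivery through.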
How Inquir helps
Idempotent ingress + retriable pipelines
The webhook function verifies the signature, writes the provider event ID to durable storage, enqueues work to a pipeline, and returns 200 without waiting for that work to finish. Provider retries hit the idempotency check and return 200 without re-processing.
Pipeline steps retry independently with exponential backoff. Execution traces show every delivery attempt, every step retry, and the final outcome—so on-call can answer "was this event processed?" in seconds.
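As a rough illustration of what exponential backoff means for a single step, the sketch below computes a capped delay schedule and wraps a step in a retry loop. The function names and the numbers (1 s base, factor 2, 30 s cap, 5 attempts) are assumptions chosen for the example, not Inquir's defaults, and jitter is left out to keep the schedule easy to read.

```javascript
// Delay before the next try, given how many attempts have failed so far (1-based).
function backoffDelayMs(attempt, { baseMs = 1000, factor = 2, maxMs = 30000 } = {}) {
  return Math.min(maxMs, baseMs * factor ** (attempt - 1));
}

// Runs `step` until it succeeds or `maxAttempts` is reached.
// `sleep` is injectable so the loop can be exercised without real waiting.
async function retryStep(step, options = {}) {
  const {
    maxAttempts = 5,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    ...backoff
  } = options;

  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await step(attempt);
    } catch (err) {
      lastError = err;
      // Wait only if another attempt is still allowed.
      if (attempt < maxAttempts) await sleep(backoffDelayMs(attempt, backoff));
    }
  }
  throw lastError; // retries exhausted: surface the last failure
}
```

With these example settings the waits after consecutive failures are 1 s, 2 s, 4 s, 8 s, 16 s, and then 30 s once the cap applies.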
What you get
Webhook retry platform features
Provider idempotency
Upsert event IDs before mutations. Duplicate deliveries from Stripe, GitHub, or Slack return 200 without side effects.
Fast ACK under timeout pressure
Return 200 within Slack's 3s window, Stripe's 30s limit, and GitHub's 10s limit; heavy work runs in pipelines.
Downstream step retries
Pipeline steps retry failed API calls, database writes, and notifications without re-invoking the webhook ingress function.
Per-delivery execution traces
Inspect headers, timing, retry count, and step outputs for every webhook delivery—not a black-box worker log.
What to do next
How to build a webhook retry platform on Inquir
Separate provider retry handling (idempotency at ingress) from downstream retry handling (pipeline step policy).
Verify and record event ID
Verify the HMAC signature over the raw request body, insert the provider event ID into durable storage if it is not already there, and return 200, even on a duplicate delivery.
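For the HMAC check, here is a minimal sketch of Stripe-style verification using Node's built-in crypto module. It assumes Stripe's documented scheme: a `Stripe-Signature` header of the form `t=<timestamp>,v1=<hex signature>`, signed over `<timestamp>.<raw body>` with HMAC-SHA256. Passing the secret as an explicit argument, the 300-second tolerance, and the `nowSec` parameter are choices made for this example; in production you would more likely call `stripe.webhooks.constructEvent` from the official library.

```javascript
const nodeCrypto = require('node:crypto');

// Returns true when the header carries a valid, recent v1 signature for rawBody.
function verifyStripeSignature(rawBody, header, secret, toleranceSec = 300,
  nowSec = Math.floor(Date.now() / 1000)) {
  if (!header) return false;

  // Parse "t=...,v1=...,v1=..." into a timestamp and candidate signatures.
  let timestamp;
  const signatures = [];
  for (const part of header.split(',')) {
    const idx = part.indexOf('=');
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === 't') timestamp = value;
    if (key === 'v1') signatures.push(value);
  }
  if (!timestamp || !/^\d+$/.test(timestamp) || signatures.length === 0) return false;

  // Reject old timestamps to limit replay of captured requests.
  if (Math.abs(nowSec - Number(timestamp)) > toleranceSec) return false;

  // Sign the exact bytes received; re-serialized JSON would not match.
  const expected = nodeCrypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${rawBody}`, 'utf8')
    .digest();

  // Constant-time comparison; the length check avoids timingSafeEqual throwing.
  return signatures.some((sig) => {
    const candidate = Buffer.from(sig, 'hex');
    return candidate.length === expected.length
      && nodeCrypto.timingSafeEqual(candidate, expected);
  });
}
```

The important detail is that the function receives the raw body string. If the framework parses JSON before your handler runs, the bytes you sign may differ from the bytes Stripe signed.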
Enqueue durable work
Call global.durable.startNew() with the parsed event payload. The HTTP response completes before downstream work begins.
Retry failed steps, not the webhook
Configure pipeline step retry policy for downstream failures. Completed steps are not re-run when a later step fails.
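One way to picture "completed steps are not re-run" is a runner that checkpoints each step's output and skips anything already recorded. The sketch below is a toy model under that assumption, not Inquir's pipeline engine: `runPipeline`, the step names, and the `Map` used as run state are all hypothetical.

```javascript
// Hypothetical runner: `checkpoint` is a Map standing in for durable run state.
async function runPipeline(steps, checkpoint, input) {
  for (const [name, run] of steps) {
    if (checkpoint.has(name)) continue; // finished in an earlier attempt
    const output = await run(input, checkpoint);
    checkpoint.set(name, output); // recorded only after the step succeeds
  }
  return checkpoint;
}

async function demoResume() {
  const calls = { charge: 0, ship: 0, notify: 0 };
  const steps = [
    ['charge', async () => { calls.charge += 1; return 'charged'; }],
    ['ship', async () => {
      calls.ship += 1;
      if (calls.ship === 1) throw new Error('shipping API unavailable');
      return 'shipped';
    }],
    ['notify', async () => { calls.notify += 1; return 'notified'; }],
  ];

  const checkpoint = new Map();
  const input = { eventId: 'evt_123' };

  // First attempt fails at "ship"; "charge" is already checkpointed.
  let firstError = null;
  try {
    await runPipeline(steps, checkpoint, input);
  } catch (err) {
    firstError = err.message;
  }

  // The retry resumes at "ship" and does not charge again.
  await runPipeline(steps, checkpoint, input);
  return { calls, firstError, done: [...checkpoint.keys()] };
}
```

In this run `charge` executes once and `ship` executes twice, which is the behavior you want when the first step has a side effect such as billing a customer.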
Code example
Idempotent webhook ingress with pipeline handoff
Provider retries hit the idempotency check. Downstream failures retry at the pipeline step level.
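```javascript
// verifyStripeSignature and db are application helpers defined elsewhere.
export async function handler(event) {
  // Verify against the raw body; re-serialized JSON will not match the signature.
  const rawBody = event.body ?? '';
  if (!verifyStripeSignature(rawBody, event.headers['stripe-signature'])) {
    return { statusCode: 400, body: 'invalid signature' };
  }

  const evt = JSON.parse(rawBody);

  // Record the event ID first; a false result means this is a provider redelivery.
  const isNew = await db.tryInsertWebhookEvent(evt.id, evt.type);
  if (!isNew) return { statusCode: 200, body: 'already processed' };

  // Hand off to the pipeline; downstream work runs outside this request.
  await global.durable.startNew('stripe-fulfillment', undefined, {
    eventId: evt.id,
    type: evt.type,
    object: evt.data.object,
  });

  return { statusCode: 200, body: 'accepted' };
}
```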
When it fits
Good fit for a webhook retry platform
When this works
- SaaS webhooks from Stripe, GitHub, Slack, Shopify, or HubSpot with aggressive retry policies
- Handlers where downstream work can fail after you already returned 200 to the provider
When to skip it
- Webhooks forwarded to a third-party iPaaS with no custom runtime
FAQ
How long do providers retry?
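Stripe: up to 72 hours with exponential backoff. GitHub: no automatic retries, but a failed delivery can be redelivered manually or through the API for 3 days. Slack: expects a response within 3 seconds and retries a failed delivery up to three times. Design for both fast ACK and duplicate delivery.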
What if the pipeline step exhausts retries?
The pipeline run is marked failed with full step logs. Alert on failure rates; replay manually from the execution history using the stored event payload.
Do I need a separate dead-letter queue?
Failed pipeline runs serve as dead letters with searchable execution history. No separate DLQ infrastructure to provision.