How to Handle Stripe Webhooks Reliably on Serverless

Verify the stripe-signature on the raw body, acknowledge fast, dedupe on the event id, and fulfill asynchronously — the four moves that make serverless Stripe webhooks reliable.

A Stripe webhook looks like the easiest endpoint you will ever write. Stripe POSTs some JSON, you return 200, done. Then a customer gets charged twice, or a paid order never ships, and you learn that the reliability of a payment integration lives entirely in four moves you didn’t make.

Those four moves are: verify the signature on the raw body, acknowledge fast, dedupe on the event id, and do the actual fulfillment asynchronously. Get them right and duplicate deliveries, retries, and replay attempts all collapse into no-ops. Get them wrong and every one of them turns into a support ticket or a chargeback.

This is the pragmatic version of “how to handle Stripe webhooks reliably on serverless”: the exact failure modes, a real Stripe webhook handler you can copy, and an honest list of what the platform does and does not guarantee. Everything here is about serverless Stripe webhooks specifically, where short timeouts and at-least-once delivery make the discipline non-optional.

The three ways a Stripe webhook handler fails

Almost every broken Stripe integration is one of three failure modes, and each produces a distinct, expensive symptom.

1. Missed or broken verification. You parse the body before verifying, or you skip verification entirely. Now anyone who can reach your URL can POST a forged payment_intent.succeeded and trip your fulfillment for free. The subtler version is worse: you do verify, but against re-serialized JSON, so valid events fail the check and real payments silently never fulfill.

2. Slow acknowledgement. Your handler fulfills the order, writes the ledger, and sends email inline, and returns 200 only when all of it is done. When a slow dependency stretches that past Stripe’s delivery window, Stripe gives up waiting and retries. Now the same event is in flight twice, and if the handler is not idempotent, both copies run.

3. Duplicate delivery. Stripe guarantees at-least-once delivery, not exactly-once. Retries, network blips, and Stripe-side redelivery all mean the same event id can arrive two or more times, sometimes seconds apart, sometimes hours. Without an idempotency key that is two orders, two emails, two ledger rows.

The fix for all three is the same four-step spine. The rest of this post is that spine.

Verify the stripe-signature on the raw body

Stripe signs every webhook with HMAC-SHA256 and puts the result in the stripe-signature header. That header is not a bare hash — it is a comma-separated list like t=1699999999,v1=abc123..., where t is the Unix timestamp Stripe signed at and v1 is the signature. The signed payload is not the parsed JSON object; it is the literal string ${t}.${rawBody} — the timestamp, a dot, and the exact bytes of the request body.

That last detail is the one that bites everyone who tries to verify a Stripe webhook signature by hand. HMAC is computed over raw bytes. If you JSON.parse the body and re-serialize it, key order and whitespace can change, the bytes no longer match, and verification fails on perfectly valid events. The rule is absolute: read event.body as a string and verify before you parse.
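To make the signed-payload rule concrete, here is a minimal sketch of by-hand verification using Node's built-in crypto module. The function names parseStripeSignature and verifyStripeSignature are hypothetical helpers for illustration, not part of Stripe's SDK or the gateway, and the sketch covers only the HMAC comparison, not the timestamp check.

```javascript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical helper: split "t=...,v1=..." into a timestamp and v1 signatures.
// The header can carry more than one v1 value, so they are collected in a list.
function parseStripeSignature(header) {
  const parsed = { t: null, v1: [] };
  for (const part of header.split(',')) {
    const idx = part.indexOf('=');
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === 't') parsed.t = value;
    if (key === 'v1') parsed.v1.push(value);
  }
  return parsed;
}

// Hypothetical helper: true when any v1 signature equals
// HMAC-SHA256(secret, `${t}.${rawBody}`). rawBody must be the unparsed string.
function verifyStripeSignature(rawBody, header, secret) {
  const { t, v1 } = parseStripeSignature(header);
  if (!t || v1.length === 0) return false;
  const expected = createHmac('sha256', secret)
    .update(`${t}.${rawBody}`, 'utf8')
    .digest();
  return v1.some((candidate) => {
    const given = Buffer.from(candidate, 'hex');
    // timingSafeEqual throws on length mismatch, so check length first.
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```

Passing a re-serialized body to verifyStripeSignature returns false even when the signature is genuine, which is exactly the silent failure described above. In production, prefer the gateway mode or the SDK's constructEvent over a hand-rolled check.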

On Inquir Compute you don’t hand-roll this on the hot path. Set webhookMode: 'stripe' on the gateway route, with webhookSecret set to your whsec_... signing secret. The gateway verifies the raw-body HMAC before your function is ever invoked, and returns 403 { error: { code: 'BAD_SIGNATURE' } } on any mismatch. Your code only runs on already-verified requests, and the body still arrives as an unparsed string, so nothing downstream is disturbed.

Stripe mode also understands the t= timestamp, so it supports replay tolerance: reject a signature whose timestamp is too old. That is what stops an attacker from capturing one valid signed request and replaying it hours later against your endpoint. Stripe’s own SDK does the same thing inside constructEvent, with a default tolerance window.
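The timestamp check itself is small. The sketch below assumes a 300-second window, which matches the default tolerance in Stripe's SDK; isFreshTimestamp is a hypothetical helper name, and rejecting timestamps far in the future is a choice made here for strictness rather than something the source prescribes.

```javascript
// Hypothetical helper: true when the signed timestamp `t` (Unix seconds, as a
// string from the stripe-signature header) is within the tolerance window.
function isFreshTimestamp(t, toleranceSeconds = 300, nowMs = Date.now()) {
  const signedAt = Number(t);
  if (!Number.isInteger(signedAt)) return false;
  const ageSeconds = Math.floor(nowMs / 1000) - signedAt;
  // Math.abs also rejects timestamps too far in the future (clock skew or forgery).
  return Math.abs(ageSeconds) <= toleranceSeconds;
}
```

A request must pass both this check and the HMAC comparison. The timestamp is part of the signed payload, so an attacker cannot refresh t without invalidating the signature.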

If you are not using gateway mode (maybe you want the Stripe SDK’s typed event object, or you’re verifying a different provider), you verify in the handler. The contract is identical: raw body string in, signature header in, secret in.

// Verify in-handler when you are NOT using the gateway's webhookMode: 'stripe'.
// The raw body must be the exact bytes Stripe signed — never JSON.parse first.
import Stripe from 'stripe';

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY);

export async function handler(event) {
  const rawBody = event.body ?? '';                     // string, as delivered
  const sig = event.headers['stripe-signature'] ?? '';
  let evt;
  try {
    // constructEvent recomputes HMAC over `${t}.${rawBody}` and checks the
    // t= timestamp for replay within Stripe's default tolerance.
    evt = stripe.webhooks.constructEvent(rawBody, sig, process.env.STRIPE_WEBHOOK_SECRET);
  } catch (err) {
    return { statusCode: 400, body: `Webhook Error: ${err.message}` };
  }
  // ...then the same idempotency + fast ACK + async handoff as below.
  return { statusCode: 200, body: JSON.stringify({ received: true }) };
}

The gateway feature is turnkey for two providers plus a custom option: github (the x-hub-signature-256 scheme), stripe (above), and custom (a plain hex HMAC over the body, with the header name configurable via signatureHeader). Slack is deliberately not a built-in mode — its v0:timestamp:body scheme differs — so Slack verification stays handler code.
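For the Slack case, handler-side verification might look like the sketch below. It assumes Slack's documented scheme: the base string is v0:timestamp:body, the signature arrives in the x-slack-signature header as v0= followed by a hex digest, and the timestamp arrives in x-slack-request-timestamp. The function name verifySlackSignature is hypothetical.

```javascript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical helper: verify a Slack request signature in handler code.
// rawBody must be the unparsed request body string.
function verifySlackSignature(rawBody, timestamp, signatureHeader, signingSecret) {
  const base = `v0:${timestamp}:${rawBody}`;
  const expected =
    'v0=' + createHmac('sha256', signingSecret).update(base, 'utf8').digest('hex');
  const a = Buffer.from(expected, 'utf8');
  const b = Buffer.from(signatureHeader ?? '', 'utf8');
  // timingSafeEqual throws on length mismatch, so check length first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The same raw-body rule applies: compute the HMAC over the string exactly as delivered, and pair the check with a timestamp window to limit replays.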

Acknowledge fast: Stripe wants a quick 2xx

Stripe expects a fast 2xx. Return within roughly 30 seconds or Stripe treats the delivery as failed and retries — for up to 72 hours. Every second your handler spends charging cards, writing to a slow database, or calling an email API is a second closer to a retry you did not want and a duplicate you now have to defend against.

So the acknowledgement and the work must be decoupled. The webhook handler’s only job is: verify (done at the gateway), dedupe, record intent, return 200. Nothing that can be slow happens before that 200 leaves.

Serverless makes this discipline structural rather than aspirational. On Inquir the default function timeout is 5 seconds, raisable to a hard maximum of 15 minutes per function or step. That 5-second default is a feature here, not a limitation: it is a standing reminder that a webhook handler is not the place to do fulfillment. If your handler wants more than a few hundred milliseconds, the work belongs somewhere else, which is what the handler and durable-job sections below cover.

Make it idempotent: dedupe on the Stripe event id

Because delivery is at-least-once, the single most important line in a Stripe webhook handler is the one that writes the event id before doing anything with side effects. This is webhook idempotency, and it is not optional.

Every Stripe event has a stable id of the form evt_.... Insert it into a table with a unique constraint on that id the moment you receive it. If the insert succeeds, this is the first time you have seen the event, so proceed. If it hits the constraint, you have already handled this delivery, so return 200 and stop. That is the whole mechanism, and it is what makes retries and duplicate deliveries harmless.

Order matters: do the idempotency write before the fulfillment, never after. If you fulfill first and record second, a crash in the gap guarantees a double on the next retry.
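One way to implement that write on Postgres is an insert that does nothing on conflict and reports whether a row was actually added. This is a sketch under assumptions: the webhook_events table and the recordEventOnce helper are hypothetical, and client is assumed to expose a node-postgres style query(text, values) that resolves to an object with rowCount.

```javascript
// Assumed (hypothetical) schema:
//   CREATE TABLE webhook_events (
//     id TEXT PRIMARY KEY,
//     type TEXT NOT NULL,
//     received_at TIMESTAMPTZ NOT NULL DEFAULT now()
//   );

// Resolves to true only for the first delivery of a given event id.
async function recordEventOnce(client, evt) {
  const result = await client.query(
    `INSERT INTO webhook_events (id, type)
     VALUES ($1, $2)
     ON CONFLICT (id) DO NOTHING`,
    [evt.id, evt.type],
  );
  // rowCount is 1 when the row was inserted, 0 when the id already existed.
  return result.rowCount === 1;
}
```

Because the primary key enforces uniqueness inside the database, two concurrent deliveries of the same event cannot both see true, which a read-then-write check in application code would not guarantee.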

Inquir gives you a second, complementary layer for free. When you hand work to a durable job with global.durable.startNew(name, id, payload), the id you pass doubles as an idempotency key that dedupes startNew within a 24-hour window. Pass the Stripe event id as that instance id, and a duplicate delivery inside 24 hours will not enqueue a second fulfillment job — even if your own database write somehow slipped through. Belt and suspenders, and both layers cost nothing.

A realistic serverless Stripe webhook handler

Here is the whole spine in one function. The route is configured with webhookMode: 'stripe', so verification already happened at the gateway; the handler parses, dedupes, hands off, and acks.

// webhooks/stripe.mjs
//
// Gateway route config: { method: 'POST', webhookMode: 'stripe',
//   webhookSecret: 'whsec_...' }.  The gateway verifies the stripe-signature
// header against the RAW body and returns 403 { error: { code: 'BAD_SIGNATURE' } }
// before this handler runs.  event.body still arrives as an unparsed string.
export async function handler(event) {
  // Trust boundary already crossed at the gateway — safe to parse now.
  const evt = JSON.parse(event.body ?? '{}');

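  // Webhook idempotency: dedupe on the Stripe event id. Delivery is
  // at-least-once, so this write, made before any side effect, is what makes
  // a duplicate delivery a no-op.
  // `db` is your own data-access layer (not defined in this file). Despite the
  // name, upsert must resolve to true only when the row was newly inserted.
  const isNew = await db.webhookEvents.upsert({ id: evt.id, type: evt.type });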
  if (!isNew) {
    return { statusCode: 200, body: JSON.stringify({ received: true, duplicate: true }) };
  }

  // Offload slow fulfillment to a durable, Postgres-backed job and return
  // immediately. The event id is passed as the instance id, so it also acts
  // as a 24h idempotency key on startNew.
  if (evt.type === 'payment_intent.succeeded') {
    await global.durable.startNew('stripe-fulfill', evt.id, {
      intentId: evt.data.object.id,
      amount: evt.data.object.amount,
    });
  }

  // Fast 2xx so Stripe marks the delivery done and stops retrying.
  return { statusCode: 200, body: JSON.stringify({ received: true }) };
}

Read the shape, not the line count. Parse only after the gateway’s trust boundary. Dedupe on evt.id. Offload anything slow to a durable job whose instance id is the event id. Return a fast 2xx. Everything expensive — the charge reconciliation, the fulfillment, the receipt email — happens in the job, outside the request.

Do fulfillment as a durable background job

global.durable.startNew('stripe-fulfill', evt.id, {...}) enqueues a durable, Postgres-backed background job. It survives restarts: once the row is written, the work is not lost if the container recycles under it. And it runs outside the HTTP request, so it is completely detached from Stripe’s delivery window — the job can take seconds or minutes without putting any retry pressure back on the webhook.

Retries are available on that job with exponential backoff, and a job that exhausts its attempts dead-letters with its last error recorded, instead of vanishing. Be precise about the default, though: a plain async job runs once unless you opt into more attempts; the built-in resume path uses up to five. Treat retries as something you turn on for a given job, not a guarantee that every job auto-retries forever. Visibility-timeout reaping separately picks up jobs whose worker died mid-flight.

Because each function and job runs in its own isolated container, your Stripe fulfillment gets the same observability as a normal API route: execution traces show each delivery with body (redacted), headers, timing, and retry count. When finance asks “did we actually process evt_1abc?”, that is a trace lookup, not a grep through log files.

One deliberate omission: do not try to model the whole fulfillment as one long, sleeping, human-gated workflow that pretends the HTTP request is still open. Keep the durable unit a plain job — or a pipeline of short steps, each within the 15-minute cap — that you can retry and reason about. Chaining short, idempotent steps beats one clever construct that assumes the caller is still waiting.
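What a short, idempotent job body could look like is sketched below. Everything named here is hypothetical: fulfillPaymentIntent, the orders and mailer services, and their methods stand in for your own code, and how the function is registered as the stripe-fulfill job depends on the platform and is not shown.

```javascript
// Hypothetical job body for 'stripe-fulfill'. Each step checks recorded state
// before acting, so a retry after a partial failure skips completed work.
async function fulfillPaymentIntent(payload, { orders, mailer }) {
  const order = await orders.findByIntentId(payload.intentId);
  // Throwing lets the job fail visibly (and retry, if retries are enabled).
  if (!order) throw new Error(`No order found for ${payload.intentId}`);

  // Step 1: mark the order fulfilled at most once.
  if (order.status !== 'fulfilled') {
    await orders.markFulfilled(order.id);
  }

  // Step 2: send the receipt at most once per recorded flag. A crash between
  // send and flag can still resend, so the mailer should tolerate duplicates.
  if (!order.receiptSent) {
    await mailer.sendReceipt(order.id);
    await orders.markReceiptSent(order.id);
  }

  return { orderId: order.id };
}
```

Injecting orders and mailer keeps the job testable with in-memory fakes and makes each step's precondition explicit.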

Test with the Stripe CLI

You do not need real card traffic to exercise any of this. The Stripe CLI forwards real event shapes to whatever endpoint you point it at.

  • stripe listen --forward-to <url> streams live events from your account to a local or deployed handler and prints a signing secret you use as STRIPE_WEBHOOK_SECRET.
  • stripe trigger payment_intent.succeeded fabricates a realistic event and sends it through, so you can watch the whole verify → dedupe → ack → job path run end to end.

Two tests are worth writing explicitly. First, fire the same event twice and assert the second call returns your duplicate 200 and enqueues nothing — that proves idempotency works. Second, POST a body with a bad or missing stripe-signature and assert a 403 BAD_SIGNATURE comes back from the gateway — that proves verification works. Keep the same event.body string contract locally that the gateway delivers in production, so what you test is exactly what you ship.

What this does NOT do — read before you ship

Honesty is part of reliability. The platform does not quietly paper over any of these, and neither should your mental model.

  • No exactly-once, no guaranteed ordering. Delivery is at-least-once and events can arrive out of order. Your idempotency key is not decoration — it is the mechanism that makes at-least-once safe. Do not remove it because “the gateway already verifies.” Verification and deduplication solve different problems.
  • Replay tolerance is not idempotency. The timestamp check rejects stale signatures; it does nothing about a valid duplicate delivered a few seconds apart. You need both the timestamp window and the event-id write.
  • Slack is not a built-in webhookMode. GitHub, Stripe, and a custom HMAC mode are built in. Slack’s v0:timestamp:body scheme differs, so verify Slack in handler code.
  • Timeouts are real. 5 seconds default, 15 minutes hard maximum per function or step. Nothing runs forever; long work is a chain of jobs or steps, each under the cap — not one unbounded request held open.
  • Cold starts are reduced, not eliminated. Hot container pools cut them, but a first invoke or one after idle can still be cold. Stripe’s generous retry window absorbs the occasional cold start; just don’t design as if latency is always zero.
  • The gateway verifies; it does not dedupe or fulfill. webhookMode: 'stripe' is the signature step only. The event-id idempotency write and the async handoff are still your code, and still your responsibility.

Takeaway

Reliable Stripe webhooks are not a framework you install — they are four moves in order: verify on the raw body, ack fast, dedupe on the event id, fulfill asynchronously. On Inquir Compute the first move is a route setting — webhookMode: 'stripe' — and a bad signature is a 403 BAD_SIGNATURE before your code even runs. The fast 2xx and the event-id idempotency write are a handful of lines. The slow part becomes a durable, Postgres-backed job with retries and traces. Wire those four in order and duplicate deliveries, retries, and replays stop being incidents and start being no-ops.