Why Webhook Handlers Should Return Fast and Process Later

A common mistake in webhook processing is doing too much work inside the request handler.

The provider sends an event. Your endpoint receives it. Then the handler verifies the event, calls a database, calls external APIs, sends an email, generates a report, and maybe calls an LLM. Only after all that does it return 200 OK.

This is fragile.

Webhook handlers should usually return fast and process later.

The webhook provider only needs acknowledgement

Most webhook providers need to know one thing first:

Did you receive the event?

They do not necessarily need your entire downstream workflow to complete before you respond.

A better handler flow is:

receive event
→ verify signature
→ store event or start job
→ return 200
→ process later

The provider gets fast acknowledgement. Your system gets time to complete work safely.
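The flow above can be sketched in a few lines of framework-free Python. The signing secret and the in-memory queue are illustrative stand-ins: a real handler would read the secret from configuration and write to a durable store or job queue.

```python
import hashlib
import hmac
import json
from collections import deque

SECRET = b"whsec_example"   # hypothetical shared signing secret
event_queue = deque()       # stands in for a durable store or job queue

def handle_webhook(body: bytes, signature: str) -> int:
    # 1. verify the signature before trusting the payload
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return 400

    # 2. store the event (or start a job) -- no business logic here
    event = json.loads(body)
    event_queue.append(event)

    # 3. acknowledge immediately; processing happens later
    return 200
```

Note that the handler never touches databases, emails, or external APIs: its only jobs are verify, record, and acknowledge.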

What goes wrong when handlers are slow

Provider timeout

If your endpoint takes too long to respond, the provider may assume delivery failed and retry the same event, causing duplicate processing.

Duplicate side effects

A duplicate webhook can create duplicate records, send duplicate emails, or trigger duplicate billing logic unless your processing is idempotent.
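One common way to make processing idempotent is to key side effects on the provider's event ID. A sketch, where `processed_ids` stands in for a database table with a unique constraint:

```python
processed_ids = set()   # stands in for a persistent store with a unique constraint
emails_sent = []        # an example side effect that must not run twice

def process_event(event: dict) -> bool:
    """Run side effects at most once per event ID. Returns True if work ran."""
    event_id = event["id"]
    if event_id in processed_ids:
        return False    # duplicate delivery: safely ignore
    processed_ids.add(event_id)
    emails_sent.append(f"receipt for {event_id}")   # the real side effect
    return True
```

Delivering the same event twice now produces the side effect exactly once.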

Bad user experience

A user may complete an action in one system but see no result in yours because the handler timed out halfway through its work. Slow handlers are also harder to debug when that happens.

Fragile dependency chain

If your webhook response depends on five downstream services, any one of them failing can make the provider treat the whole delivery as failed.

The fast acknowledgement pattern

The handler should do only the minimum required work:

  1. parse the request;
  2. verify the signature;
  3. validate the event type;
  4. store the event or create a job;
  5. return success.

Everything else can happen in a background job.

Provider request
→ Webhook route
→ Event stored
→ 200 OK

Background job
→ Business logic
→ External APIs
→ Notifications
→ Logs

This pattern makes failures easier to manage because provider delivery and internal processing are separate.
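The split between the two halves can be sketched as a worker that drains stored events independently of the HTTP handler. Names like `event_queue` and `run_business_logic` are illustrative:

```python
from collections import deque

# events the webhook route already stored and acknowledged
event_queue = deque([{"id": "evt_1"}, {"id": "evt_2"}])
results = {}

def run_business_logic(event: dict) -> str:
    # placeholder for DB writes, external API calls, notifications
    return f"processed {event['id']}"

def drain_queue() -> int:
    """Process stored events; the provider already got its 200."""
    count = 0
    while event_queue:
        event = event_queue.popleft()
        results[event["id"]] = run_business_logic(event)
        count += 1
    return count
```

If `run_business_logic` raises, only the worker fails; the original delivery to the provider is unaffected.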

Example: payment webhook

Bad pattern:

Stripe event
→ verify
→ create order
→ generate invoice
→ send email
→ update CRM
→ return 200

Better pattern:

Stripe event
→ verify
→ store event.id
→ start fulfill-order job
→ return 200

fulfill-order job
→ create order
→ generate invoice
→ send email
→ update CRM
→ mark event processed

If the CRM call fails, the provider does not need to resend the event. Your internal job can retry.
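That internal retry can look like the following sketch, where a flaky step (a hypothetical CRM call that fails twice, then succeeds) is retried inside the job while the event stays acknowledged:

```python
import time

attempts = {"crm": 0}

def update_crm(order_id: str) -> None:
    # simulate a CRM call that fails twice, then succeeds
    attempts["crm"] += 1
    if attempts["crm"] < 3:
        raise ConnectionError("CRM unavailable")

def fulfill_order(order_id: str, max_retries: int = 5) -> bool:
    """Retry the flaky step internally; the provider never resends the event."""
    for attempt in range(max_retries):
        try:
            update_crm(order_id)
            return True
        except ConnectionError:
            time.sleep(0)   # real code would back off, e.g. 2 ** attempt seconds
    return False
```

The provider delivered once and moved on; the retries are entirely your system's concern.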

Example: AI webhook workflow

Suppose a webhook starts an AI workflow:

new support ticket
→ classify urgency
→ summarize conversation
→ suggest reply
→ notify team

This can be slow. It may involve multiple model calls and external API requests. Running it directly inside the webhook request is risky.

A better flow:

/support/webhook
→ verify event
→ create classification job
→ return 200

classification job
→ retrieve ticket
→ call LLM
→ store result
→ notify Slack

The webhook endpoint stays fast. The AI work becomes observable.
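A sketch of that split, with the model call stubbed out (the real call would be an external LLM API request with its own retries and logging):

```python
from collections import deque

job_queue = deque()
ticket_results = {}

def classify_ticket(ticket_text: str) -> str:
    # stub for a model call; real code would hit an LLM API
    return "urgent" if "down" in ticket_text.lower() else "normal"

def webhook_handler(ticket_id: str, ticket_text: str) -> int:
    """Fast path: enqueue the classification job and acknowledge."""
    job_queue.append((ticket_id, ticket_text))
    return 200

def run_jobs() -> None:
    """Slow path: model calls happen here, separate from the request."""
    while job_queue:
        ticket_id, text = job_queue.popleft()
        ticket_results[ticket_id] = classify_ticket(text)
```

The handler returns before any model call runs, so a slow or failing LLM cannot time out the webhook.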

Why this improves retries

Provider retries and internal retries should not be the same mechanism.

Provider retry means:

the provider could not deliver the event

Internal retry means:

your system received the event but processing failed

Those are different problems. Mixing them creates confusion.

Fast acknowledgement lets your platform take ownership of the event after receipt.

What to store before returning

Before returning 200, store enough data to process later:

  • provider name;
  • event ID;
  • event type;
  • raw or normalized payload;
  • received timestamp;
  • signature verification result;
  • processing status;
  • tenant or account context.

You do not always need to store the entire raw payload forever, but you need enough context to retry and debug.
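As a sketch, the stored record could be a simple dataclass; the field names here are illustrative, not a prescribed schema:

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class WebhookEvent:
    provider: str           # e.g. "stripe"
    event_id: str           # provider's event ID, used for idempotency
    event_type: str         # e.g. "order.paid"
    payload: dict           # raw or normalized payload
    signature_valid: bool   # result of signature verification
    received_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    status: str = "pending"             # pending -> processing -> done / failed
    tenant_id: Optional[str] = None     # tenant or account context
```

Everything a retry or a debugging session needs is captured before the 200 goes out.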

Where Inquir Compute fits

Inquir Compute supports this pattern with API routes and background jobs or pipelines.

You can expose a webhook route:

POST /webhooks/stripe

Then move slow work into a job:

fulfill-order
send-notification
run-ai-classification
sync-customer

This keeps the public endpoint responsive while giving the internal workflow logs and execution history.

When direct processing is acceptable

Direct processing can be acceptable when the work is very small and safe:

  • log an internal event;
  • update a lightweight counter;
  • trigger a fire-and-forget notification;
  • handle a low-risk internal webhook.

But for payments, provisioning, AI workflows, external API chains, and customer-facing actions, process later.

Checklist

A good webhook handler should answer:

  • Can it return quickly?
  • Is the event verified?
  • Is there an event ID for idempotency?
  • Is slow work moved to a job?
  • Can the job be retried safely?
  • Are logs tied to the event ID?
  • Can duplicate events be ignored?

Conclusion

Webhook handlers should be boring and fast. Their job is to receive, verify, record, and acknowledge.

The real work belongs in background jobs or pipelines where it can be retried, logged, and inspected.

Fast acknowledgement reduces duplicate events, improves reliability, and makes webhook systems easier to operate.