Serverless observability: traces, logs, and execution history

Ephemeral containers can't be SSH'd into. See how per-invocation execution history — traces, structured logs, retry counts, and 30-day retention — lets you debug a failed webhook, alert on failure rates, and answer 'what happened to request X?' as a lookup.

Serverless observability starts where console.log ends

When a function runs on a server you own, debugging has a comfortable fallback: SSH in, tail -f the log file, grep for the request id. Serverless takes that away. Your handler runs in an isolated container that is created on demand, may be recycled after roughly 1,000 invocations, and is evicted once it goes idle. By the time you hear “the Stripe webhook failed at 09:14,” the container that served it is gone. There is no file to tail and no process to attach to.

That is the whole reason serverless observability is a platform feature rather than an afterthought. Instead of writing lines to stdout and hoping something scrapes them, every invocation is recorded as a structured execution history entry — an invocation record with a status, a duration, the input and output, the log lines, and the retry count. Debugging stops being “grep across N container logs” and becomes “open the record for that run.” The container is ephemeral; the trace is not.

The rest of this post is about what that record actually contains on Inquir Compute, how to write handler logs that stay useful inside it, and how to answer the two questions you really ask during an incident: what happened to this specific request, and is the failure rate climbing right now.

What a serverless trace records: the execution history model

A run record is the atomic unit of serverless tracing here. For every invocation — whether it came from the API gateway, a cron trigger, a webhook, or another job — the platform writes one record with:

  • status — one of RUNNING, SUCCEEDED, FAILED, TIMED_OUT, or CANCELLED. This is the first thing you filter on.
  • durationMs — wall-clock execution time, so the slow ones are visible.
  • logs[] — the log lines your handler emitted, each tagged INFO, WARN, ERROR, or DEBUG.
  • steps[] — for pipelines, a nested tree of steps, each with its own input, output, error, and duration.
  • attempts[] — one entry per try, so the retry count and each failure reason are on the record.
  • generation metadata — for LLM steps, the model, provider, input/output/total token counts, and costUsd.

The feature page shows a compact trace.json with runId, functionName, status, durationMs, startedAt, a logs array, and the output; that is the same shape you read when you open a record. Records are kept for 30 days, which is your practical window for “pull up last Tuesday’s failures.” Two storage limits are worth knowing up front: a stored log/invocation record caps at around 16 KB, and the result JSON at around 64 KB of characters. Traces are for diagnosis, not for warehousing large payloads.

Because each record carries durationMs and status, the history is also where aggregate views come from — the console surfaces latency percentiles (p50/p95/p99) and a success-rate figure across recent runs.
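To make the aggregate view concrete, here is a minimal sketch of how a success rate and latency percentiles could be derived from a list of run records. The field names status and durationMs come from the record described above; the nearest-rank percentile method and the choice to skip RUNNING records are assumptions for illustration, not a description of how the console computes its figures.

```javascript
// Nearest-rank percentile over an ascending-sorted array (assumed method).
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return null;
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(rank, 1) - 1];
}

function summarize(runs) {
  // Skip runs that are still in flight: they have no final status or duration.
  const finished = runs.filter((r) => r.status !== 'RUNNING');
  const durations = finished.map((r) => r.durationMs).sort((a, b) => a - b);
  const succeeded = finished.filter((r) => r.status === 'SUCCEEDED').length;
  return {
    count: finished.length,
    successRate: finished.length ? succeeded / finished.length : null,
    p50: percentile(durations, 50),
    p95: percentile(durations, 95),
    p99: percentile(durations, 99),
  };
}

const sample = [
  { status: 'SUCCEEDED', durationMs: 120 },
  { status: 'SUCCEEDED', durationMs: 95 },
  { status: 'FAILED', durationMs: 842 },
  { status: 'SUCCEEDED', durationMs: 110 },
  { status: 'RUNNING', durationMs: 0 },
];

console.log(summarize(sample));
// → { count: 4, successRate: 0.75, p50: 110, p95: 842, p99: 842 }
```

With only four finished runs, p95 and p99 both land on the single slow run, which is a reminder that tail percentiles need a reasonable sample size before they mean much.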

Structured serverless logging from the handler

The single highest-leverage habit for serverless logging is to log structured lines, not prose. A handler has the shape (event, context); the gateway hands you an API-Gateway-style event with httpMethod, path, headers, queryStringParameters, and body (a string or null). Anything you write with console.* is captured into that run’s logs[] with a level, so a little discipline pays off every time you read a trace later.

// stripe-webhook: Node.js 22, handler shape (event, context)
// fulfill() is your own business logic (not shown); it is assumed to return { chargeId, amount }
exports.handler = async (event, context) => {
  // gateway event: httpMethod, path, headers, body (string | null)
  const payload = JSON.parse(event.body ?? '{}');

  // console.* is captured into the run's logs[] with a level
  console.log(JSON.stringify({ at: 'received', type: payload.type, id: payload.id }));

  try {
    const order = await fulfill(payload);
    console.log(JSON.stringify({ at: 'fulfilled', charge: order.chargeId, amount: order.amount }));
    return { statusCode: 200, body: JSON.stringify({ ok: true }) };
  } catch (err) {
    // console.error → ERROR level; throwing records the run as FAILED
    console.error(JSON.stringify({ at: 'fulfill_failed', id: payload.id, error: err.message }));
    throw err;
  }
};

Two things make this pay off. First, returning normally records the run SUCCEEDED while throwing records it FAILED — so your control flow and your trace status agree without extra bookkeeping. Second, sensitive values are redacted on the way into logs and traces: keys and patterns like password, secret, token, api-key, and Bearer … are masked, so structured-logging a whole object won’t leak a credential into 30 days of history. Log the ids and amounts you will want to search on; the platform scrubs the things you don’t want stored.
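If you find yourself repeating JSON.stringify in every handler, a tiny wrapper keeps the lines consistent. The sketch below is a hypothetical helper, not a platform API: makeLogger and its field names are made up for illustration, and the mapping of console.warn to the WARN level is an assumption (the handler above only confirms console.error as ERROR).

```javascript
// Hypothetical helper (not a platform API): one JSON object per log line.
function makeLogger(base = {}) {
  // Shared fields, such as an event or correlation id, are merged into every line.
  const format = (at, fields = {}) => JSON.stringify({ at, ...base, ...fields });
  return {
    format,
    info: (at, fields) => console.log(format(at, fields)),
    warn: (at, fields) => console.warn(format(at, fields)), // assumed to map to WARN
    error: (at, fields) => console.error(format(at, fields)),
  };
}

const log = makeLogger({ eventId: 'evt_123' });
log.info('received', { type: 'charge.succeeded' });
// prints {"at":"received","eventId":"evt_123","type":"charge.succeeded"}
log.error('fulfill_failed', { error: 'HTTP 502 from orders-api' });
```

Binding the id once means every line in the run carries it, so a later search on that id finds the whole story rather than only the lines where someone remembered to include it.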

Reading a trace to debug a failed webhook

Now for the after-the-fact part. The main reason people reach for observability at all is to debug serverless functions they can no longer attach to. A provider reports that a delivery failed; you have the run id, or you filter the execution history to status = FAILED in the relevant time window. You open the record and read it top to bottom:

{
  "runId": "run_01hw9m3k7q8fz...",
  "functionName": "stripe-webhook",
  "status": "FAILED",
  "durationMs": 842,
  "startedAt": "2026-07-01T09:14:02.880Z",
  "attempts": [
    { "n": 1, "status": "FAILED", "error": "HTTP 502 from orders-api" }
  ],
  "logs": [
    { "level": "INFO",  "message": "{\"at\":\"received\",\"type\":\"charge.succeeded\",\"id\":\"evt_1P...\"}" },
    { "level": "ERROR", "message": "{\"at\":\"fulfill_failed\",\"id\":\"evt_1P...\",\"error\":\"HTTP 502 from orders-api\"}" }
  ],
  "output": null,
  "error": "HTTP 502 from orders-api"
}

The record answers the questions in order. Did it even arrive? Yes — there is a run, startedAt 09:14:02. Did we accept it? The first INFO line shows the payload parsed as charge.succeeded. Where did it break? The ERROR line and the top-level error agree: the downstream orders-api returned 502. How hard did we try? attempts[] has a single entry — this function was not configured with retries, so it failed on the first try in 842 ms. That is a five-second diagnosis, and none of it required the container that ran the code to still exist.

Contrast that with the stdout world, where you would be grepping interleaved output from every concurrent invocation for a request id that may or may not have been printed. The per-invocation record is the difference between searching and looking up.
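The same top-to-bottom read can be done in code, which is handy when you want to summarize many failed records at once. This is a minimal sketch that assumes you already have the record as a parsed object shaped like the JSON above; diagnose and parseMessage are illustrative names, and how you fetch the record is left out because it is not covered here.

```javascript
// Structured lines are JSON objects; anything else is kept as raw text.
function parseMessage(message) {
  try {
    const parsed = JSON.parse(message);
    return parsed !== null && typeof parsed === 'object' ? parsed : { raw: message };
  } catch {
    return { raw: message };
  }
}

function diagnose(record) {
  const events = (record.logs ?? []).map((line) => ({
    ...parseMessage(line.message),
    level: line.level, // the record's level wins over any "level" inside the message
  }));
  const firstError = events.find((e) => e.level === 'ERROR');
  return {
    arrivedAt: record.startedAt,
    status: record.status,
    checkpoints: events.map((e) => e.at ?? '(unstructured)'),
    failedAt: firstError?.at ?? null,
    error: record.error ?? firstError?.error ?? null,
    attempts: (record.attempts ?? []).length,
    durationMs: record.durationMs,
  };
}

const record = {
  functionName: 'stripe-webhook',
  status: 'FAILED',
  durationMs: 842,
  startedAt: '2026-07-01T09:14:02.880Z',
  attempts: [{ n: 1, status: 'FAILED', error: 'HTTP 502 from orders-api' }],
  logs: [
    { level: 'INFO', message: '{"at":"received","type":"charge.succeeded"}' },
    { level: 'ERROR', message: '{"at":"fulfill_failed","error":"HTTP 502 from orders-api"}' },
  ],
  output: null,
  error: 'HTTP 502 from orders-api',
};

console.log(diagnose(record));
```

The checkpoints array is only as good as the at fields your handler logs, which is one more argument for logging at the boundaries.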

“What happened to request X?” becomes a lookup

The failed-webhook walk-through generalizes. Because every invocation is its own record keyed by a run id, “what happened to request X?” is a direct lookup rather than a text search across a log stream. You either have the id — return it in your response, or log it as a correlation field — or you filter the execution history by function, status, and time window and open the one you want.
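As a rough illustration of the two access paths, the sketch below works on an in-memory array of records. It assumes you have already obtained the records somehow; the platform's real query interface is not shown in this post, so findRuns, the index, and the run ids are purely illustrative.

```javascript
// Filter by function, status, and a half-open time window [from, to).
function findRuns(runs, { functionName, status, from, to } = {}) {
  const fromMs = from ? Date.parse(from) : -Infinity;
  const toMs = to ? Date.parse(to) : Infinity;
  return runs.filter((r) => {
    const t = Date.parse(r.startedAt);
    return (
      (!functionName || r.functionName === functionName) &&
      (!status || r.status === status) &&
      t >= fromMs &&
      t < toMs
    );
  });
}

const runs = [
  { runId: 'run_a', functionName: 'stripe-webhook', status: 'SUCCEEDED', startedAt: '2026-07-01T09:10:00.000Z' },
  { runId: 'run_b', functionName: 'stripe-webhook', status: 'FAILED', startedAt: '2026-07-01T09:14:02.880Z' },
  { runId: 'run_c', functionName: 'nightly-etl', status: 'FAILED', startedAt: '2026-07-01T09:14:30.000Z' },
  { runId: 'run_d', functionName: 'stripe-webhook', status: 'FAILED', startedAt: '2026-07-01T11:00:00.000Z' },
];

// Path 1: you have the id, so it is a keyed lookup.
const byId = new Map(runs.map((r) => [r.runId, r]));
console.log(byId.get('run_b').status); // → FAILED

// Path 2: you only know roughly what and when, so you filter.
const failed = findRuns(runs, {
  functionName: 'stripe-webhook',
  status: 'FAILED',
  from: '2026-07-01T09:00:00Z',
  to: '2026-07-01T10:00:00Z',
});
console.log(failed.map((r) => r.runId)); // → [ 'run_b' ]
```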

A few capabilities build directly on that:

  • Replay — re-invoke a recorded run with its original input (POST /observability/runs/:id/replay). Ideal for “did my fix actually handle that payload?” Because replay is a fresh execution, keep handlers idempotent — the platform does not promise exactly-once, so a replay must not double-charge anyone.
  • Cancel — stop a run that is still RUNNING.
  • Run compare — put two runs side by side to see what changed between a good one and a bad one.
  • Live streaming — per-run and all-active SSE streams let you watch logs arrive in real time, which is the closest thing to tail -f when you are reproducing an issue on purpose.

The mental-model shift is the whole point: serverless logs are no longer a firehose you sample, they are rows you address.
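Two sketches related to the Replay bullet above. The first only builds the replay request: the path is the one quoted in the bullet, while the base URL, the bearer-token header, and the run id are assumptions for illustration. The second shows one way to keep a handler safe under replay, using the provider's event id as an idempotency key; the in-memory Set is a stand-in for a durable store, since containers do not outlive their invocations.

```javascript
// Build (but do not send) a replay request. Only the path comes from the
// text above; baseUrl and the Authorization scheme are assumptions.
function buildReplayRequest(baseUrl, runId, apiToken) {
  return {
    url: `${baseUrl}/observability/runs/${encodeURIComponent(runId)}/replay`,
    options: { method: 'POST', headers: { Authorization: `Bearer ${apiToken}` } },
  };
}

// Idempotency guard keyed by the provider's event id.
// Stand-in storage: a real handler needs something durable, for example a
// database row with a unique key, because this Set dies with the container.
const claimed = new Set();

async function handleOnce(payload, fulfill) {
  if (claimed.has(payload.id)) {
    return { ok: true, duplicate: true }; // already processed: do nothing
  }
  claimed.add(payload.id);
  try {
    await fulfill(payload);
    return { ok: true, duplicate: false };
  } catch (err) {
    claimed.delete(payload.id); // release so a retry or replay can try again
    throw err;
  }
}

const { url, options } = buildReplayRequest('https://compute.example.test', 'run_123', 'YOUR_TOKEN');
console.log(options.method, url);
// To actually send it (Node.js 18+ has a global fetch): await fetch(url, options);
```

Releasing the claim on failure matters here: the run you are most likely to replay is one that failed, and a guard that permanently remembers failed attempts would turn every replay into a silent no-op.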

Alerting on failure rates and watching latency

Reading traces is reactive. The other half of serverless observability is being told before you go looking. Because every run lands as a record with a status and a durationMs, the platform can watch those aggregates and alert via email, Slack, or webhook. The natural signal is failure rate: a rise in FAILED / TIMED_OUT runs for a given function is exactly the “something regressed” alarm you want at 2 a.m., and routing it to a webhook means you can fan it straight back into a pipeline for automated triage.
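The platform evaluates this for you, but it helps to see what a failure-rate rule amounts to. The sketch below is an illustration only: the 20% threshold, the minimum of five runs, and the five-minute window are assumed values, and failureStats and shouldAlert are made-up names rather than platform settings.

```javascript
const FAILURE_STATUSES = new Set(['FAILED', 'TIMED_OUT']);

// Count finished runs of one function inside a sliding window ending at `now`.
function failureStats(runs, { functionName, windowMs, now = Date.now() }) {
  const recent = runs.filter((r) => {
    const age = now - Date.parse(r.startedAt);
    return r.functionName === functionName && r.status !== 'RUNNING' && age >= 0 && age <= windowMs;
  });
  const failed = recent.filter((r) => FAILURE_STATUSES.has(r.status)).length;
  return { total: recent.length, failed, rate: recent.length ? failed / recent.length : 0 };
}

// Require a minimum sample so one failure out of one run does not page anyone.
function shouldAlert(stats, { threshold = 0.2, minRuns = 5 } = {}) {
  return stats.total >= minRuns && stats.rate >= threshold;
}

const now = Date.parse('2026-07-01T09:15:00Z');
const statuses = ['SUCCEEDED', 'SUCCEEDED', 'FAILED', 'TIMED_OUT', 'SUCCEEDED', 'FAILED'];
const recentRuns = statuses.map((status, i) => ({
  functionName: 'stripe-webhook',
  status,
  startedAt: new Date(now - (i + 1) * 30_000).toISOString(), // one run every 30 s
}));

const stats = failureStats(recentRuns, { functionName: 'stripe-webhook', windowMs: 5 * 60_000, now });
console.log(stats, shouldAlert(stats));
// → { total: 6, failed: 3, rate: 0.5 } true
```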

Latency is the companion signal. Each record’s durationMs rolls up into the percentile view — p50, p95, p99 — that the console shows next to the success rate. p50 tells you the typical experience; p99 tells you the tail a simple synthetic check would miss. When p99 climbs while p50 stays flat, you are usually looking at a cold start or a slow downstream on a subset of requests — and you can confirm it by opening the slowest runs in the history and reading their step durations.

One honest caveat: these are operational signals derived from your run history, not a general-purpose metrics product. You alert on the shape of your executions (failures, timeouts, durations) rather than on arbitrary custom metrics or dashboards that you define yourself.

Serverless tracing across pipeline steps and SDK spans

For anything beyond a single function, the trace becomes a tree. A pipeline run records steps[] as a nested structure, and each step carries its own input, output, error, and duration, written as the run progresses. That is what makes a multi-step job debuggable: when a nightly ETL or a webhook-triggered pipeline fails, you don’t only learn that it failed — you see which step failed, what it received as input, and what it returned or threw, and whether an earlier step quietly passed the wrong data downstream. If a step was configured with per-node retries (maxAttempts, backoffMs, fixed or exponential), each attempt shows up, so you can see how many times it backed off before giving up.
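Reading a step tree by hand works for small pipelines; for larger ones a short walk finds the failing step. The sketch below assumes each step looks like { name, input, output, error, durationMs, steps }, where name and the child key steps are illustrative and not confirmed field names. The second function shows what a retry schedule could look like for maxAttempts and backoffMs; the doubling formula for the exponential strategy is an assumption about the semantics.

```javascript
// Depth-first search for the deepest step that recorded an error, since
// that is usually where the failure originated.
function findFailedStep(steps, path = []) {
  for (const step of steps ?? []) {
    const here = [...path, step.name];
    const deeper = findFailedStep(step.steps, here);
    if (deeper) return deeper;
    if (step.error) {
      return { path: here.join(' > '), error: step.error, input: step.input };
    }
  }
  return null;
}

// Delays between attempts: maxAttempts tries means maxAttempts - 1 waits.
// Assumed semantics: "fixed" repeats backoffMs, "exponential" doubles it.
function backoffDelays({ maxAttempts, backoffMs, strategy }) {
  const delays = [];
  for (let retry = 1; retry < maxAttempts; retry += 1) {
    delays.push(strategy === 'exponential' ? backoffMs * 2 ** (retry - 1) : backoffMs);
  }
  return delays;
}

const run = {
  steps: [
    { name: 'extract', input: { table: 'orders' }, output: { rows: 1200 }, durationMs: 310 },
    {
      name: 'transform',
      error: 'child step failed',
      steps: [
        { name: 'normalize', output: { rows: 1200 }, durationMs: 95 },
        { name: 'dedupe', input: { rows: 1200 }, error: 'missing key: orderId', durationMs: 12 },
      ],
    },
  ],
};

console.log(findFailedStep(run.steps));
// → { path: 'transform > dedupe', error: 'missing key: orderId', input: { rows: 1200 } }
console.log(backoffDelays({ maxAttempts: 4, backoffMs: 500, strategy: 'exponential' }));
// → [ 500, 1000, 2000 ]
```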

For LLM steps the record additionally carries generation metadata — model, provider, input/output/total tokens, and costUsd — so an agent trace tells you not only what the model returned but what the call cost. A surprising share of “why is this pipeline so expensive?” investigations resolve right there.
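A cost investigation usually starts by adding up that metadata across the tree. The sketch below assumes each LLM step carries a generation object with model, inputTokens, outputTokens, and costUsd; those exact key names, the child key steps, and the model names are assumptions for illustration.

```javascript
// Walk the step tree and total the generation metadata, overall and per model.
function sumGenerationCost(
  steps,
  totals = { calls: 0, inputTokens: 0, outputTokens: 0, costUsd: 0, byModel: {} },
) {
  for (const step of steps ?? []) {
    const g = step.generation;
    if (g) {
      totals.calls += 1;
      totals.inputTokens += g.inputTokens ?? 0;
      totals.outputTokens += g.outputTokens ?? 0;
      totals.costUsd += g.costUsd ?? 0;
      totals.byModel[g.model] = (totals.byModel[g.model] ?? 0) + (g.costUsd ?? 0);
    }
    sumGenerationCost(step.steps, totals); // nested steps share the same totals
  }
  return totals;
}

const agentRun = {
  steps: [
    { name: 'classify', generation: { model: 'model-small', inputTokens: 400, outputTokens: 20, costUsd: 0.002 } },
    {
      name: 'draft-and-review',
      steps: [
        { name: 'draft', generation: { model: 'model-large', inputTokens: 3000, outputTokens: 900, costUsd: 0.03 } },
        { name: 'review', generation: { model: 'model-large', inputTokens: 3900, outputTokens: 300, costUsd: 0.03 } },
      ],
    },
    { name: 'send-email' }, // no generation metadata: not an LLM step
  ],
};

const totals = sumGenerationCost(agentRun.steps);
console.log(totals.calls, totals.inputTokens, totals.outputTokens, totals.costUsd.toFixed(4));
console.log(totals.byModel);
```

The per-model breakdown is often the useful part: in this made-up run, two calls to the larger model account for almost all of the spend.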

Finally, tracing does not have to stop at the platform boundary. The observability ingest accepts W3C traceparent, so spans your application or SDK already emits can be stitched into the same run context instead of living in a separate tool. That is the “SDK spans” piece: your existing instrumentation and the platform’s per-invocation records describe one execution, not two.
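For reference, a traceparent header in the W3C Trace Context format is four dash-separated hex fields: version, a 32-character trace id, a 16-character parent (span) id, and 2-character flags. The sketch below parses an incoming header and derives a child header that keeps the trace id. It handles only the version 00 layout, and it does not show how to send spans to the ingest, since this post does not describe that interface.

```javascript
const { randomBytes } = require('node:crypto');

// version 00: 00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(header ?? '');
  if (!match) return null;
  const [, traceId, parentId, flags] = match;
  // All-zero ids are invalid per the specification.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Continue the caller's trace if there is one, otherwise start a new trace.
function childTraceparent(incomingHeader) {
  const parent = parseTraceparent(incomingHeader);
  const traceId = parent ? parent.traceId : randomBytes(16).toString('hex');
  const spanId = randomBytes(8).toString('hex');
  const flags = parent && !parent.sampled ? '00' : '01';
  return `00-${traceId}-${spanId}-${flags}`;
}

const incoming = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
console.log(parseTraceparent(incoming));
console.log(childTraceparent(incoming)); // same trace id, new span id
```

In a handler you would typically read the header from event.headers and pass the child value on any outbound call, so the downstream spans and the run record share one trace id.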

Honest limits and the takeaway

What this system is: durable, structured execution history — per-invocation records with status, duration, structured logs, per-step traces, retry attempts, and LLM cost metadata, queryable for 30 days, streamable live, and wired to alerts. What it is not, and where honesty matters:

  • Retention is 30 days. If you need year-long audit history, export what you care about; the trace store is an operational window, not an archive.
  • Records are size-capped. A stored log/invocation record is around 16 KB and the result JSON around 64 KB of characters. Log identifiers and summaries, not entire request bodies or model contexts.
  • Replay is a re-run, not time travel. It re-invokes with the recorded input; there is no exactly-once and no guaranteed ordering, so idempotent handlers remain your responsibility.
  • It is the observability of invocations, not a durable orchestration timeline. Traces show what each execution did; they are not an event-sourced workflow log you can deterministically resume from.
  • It is opt-in. Observability capture is gated behind a platform flag (ENABLE_OBSERVABILITY), so confirm it is enabled for your workspace.
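On the size caps in the list above: one way to stay well inside them is to summarize large values before logging. This is a minimal sketch; the 2 KB per-line budget and the 200-character preview are arbitrary assumptions chosen to leave headroom under the roughly 16 KB record cap, and compactForLog is an illustrative name.

```javascript
// Return the value unchanged if its JSON is small; otherwise log a summary.
function compactForLog(value, maxBytes = 2048) {
  const json = JSON.stringify(value) ?? 'null';
  const bytes = Buffer.byteLength(json, 'utf8');
  if (bytes <= maxBytes) return value;
  return { truncated: true, originalBytes: bytes, preview: json.slice(0, 200) };
}

const small = { id: 'evt_123', amount: 4200 };
const large = { id: 'evt_456', body: 'x'.repeat(50_000) };

console.log(JSON.stringify({ at: 'received', payload: compactForLog(small) }));
console.log(JSON.stringify({ at: 'received', payload: compactForLog(large) }));
```

The first line logs the payload as-is; the second logs only a size and a short preview, which is usually enough to recognize the request without storing the whole body.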

The takeaway for a senior engineer: treat the run record as the primary debugging artifact and design for it. Return or log a correlation id, log structured JSON at the boundaries (received / decided / failed), let secrets get redacted for you, and put a failure-rate alert on anything that faces a provider or a schedule. Do that, and the ephemeral, no-SSH nature of serverless stops being a debugging handicap and becomes the reason you can answer “what happened to request X?” in a single lookup — which is exactly what you cannot do by grepping stdout.