Hot containers for serverless latency and cold starts
Warm pools keep selected serverless functions ready between calls so the next invoke skips most cold-bootstrap work—measure p95/p99 on your handlers rather than assuming a fixed number. On Linux, Firecracker microVMs are a separate isolation path with their own cold profile.
Workload and what breaks
When cold starts hurt
Interactive flows and tight-SLA endpoints notice sporadic initialization work: dependency imports, JIT warmup, connection pool setup.
Autoscaling from zero saves money but injects variance that shows up in user-visible tails.
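One common mitigation for the initialization costs above is hoisting expensive setup to module scope so a warm container pays it only once. A minimal sketch in TypeScript, where `createPool` is a hypothetical stand-in for any costly bootstrap (a DB pool, a heavy import, a model load):

```typescript
type Pool = { createdAt: number };

let pool: Pool | undefined; // survives between invokes in a warm container

function createPool(): Pool {
  // expensive bootstrap work happens here only once per container
  return { createdAt: Date.now() };
}

export async function handler(event: unknown): Promise<{ warm: boolean }> {
  const wasWarm = pool !== undefined;
  pool ??= createPool(); // lazy init: the cost lands on the first (cold) invoke
  return { warm: wasWarm };
}
```

On a cold invoke the handler pays for `createPool`; every warm invoke in the same container reuses the cached value, which is exactly the variance a warm pool is trying to hide.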
Trade-offs
Why “always scale to zero” is not universal
For some workloads, chasing scale-to-zero costs more in engineering time than a modest always-warm footprint costs in compute.
Edge-only models optimize for geography, not for long-lived language runtimes with heavy imports.
How Inquir helps
How Inquir approaches warmth
When hot lambdas are enabled, the Docker path can keep a small pool of ready runners per function. Before reuse, a quick health check confirms the process is still alive; idle containers age out on configurable timers so you are not leaking memory forever.
Each warm slot is still its own container for that function. If you need the strictest hygiene between requests, you can force dispose-after-use so every call starts fresh—at the cost of cold-start time.
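The reuse, health-check, idle-timeout, and dispose-after-use behavior described above can be sketched as a small pool class. This is illustrative TypeScript under assumed semantics, not Inquir's actual orchestrator code:

```typescript
interface Slot {
  id: number;
  lastUsed: number;
  healthy: () => boolean; // quick liveness probe before reuse
}

class WarmPool {
  private slots: Slot[] = [];
  private nextId = 0;

  constructor(
    private maxSize: number,
    private idleMs: number,
    private disposeAfterUse: boolean,
  ) {}

  /** Acquire a warm slot, or cold-start a new one. Returns [slot, wasWarm]. */
  acquire(now: number): [Slot, boolean] {
    this.evictIdle(now);
    const slot = this.slots.pop();
    if (slot && slot.healthy()) return [slot, true]; // reuse confirmed-alive slot
    // No healthy warm slot: simulate a cold start.
    return [{ id: this.nextId++, lastUsed: now, healthy: () => true }, false];
  }

  /** Return a slot after an invoke; drop it if disposing or over capacity. */
  release(slot: Slot, now: number): void {
    if (this.disposeAfterUse || this.slots.length >= this.maxSize) return;
    slot.lastUsed = now;
    this.slots.push(slot);
  }

  /** Idle slots age out on a timer instead of lingering forever. */
  private evictIdle(now: number): void {
    this.slots = this.slots.filter((s) => now - s.lastUsed < this.idleMs);
  }
}
```

With `disposeAfterUse` set, `release` always drops the slot, so every call cold-starts: the strict-hygiene trade described above.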
What you get
Related controls
Tuning
Pool depth, idle timeouts, invocations per container, and dispose-after-use are all exposed as environment-driven settings on the Node server—tune them against real traffic instead of defaults.
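Reading those settings might look like the sketch below. The variable names (`HOT_POOL_DEPTH` and friends) are hypothetical placeholders, not Inquir's documented settings; check the server's configuration reference for the real names:

```typescript
// Parse a non-negative integer from the environment, with a safe fallback.
function intFromEnv(name: string, fallback: number): number {
  const raw = process.env[name];
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}

const poolConfig = {
  poolDepth: intFromEnv("HOT_POOL_DEPTH", 2),             // warm slots per function
  idleTimeoutMs: intFromEnv("HOT_IDLE_TIMEOUT_MS", 60_000), // age-out timer
  maxInvocations: intFromEnv("HOT_MAX_INVOCATIONS", 500),  // recycle after N calls
  disposeAfterUse: process.env.HOT_DISPOSE_AFTER_USE === "true",
};
```

The fallbacks here are arbitrary; the point stands from the text above — defaults are a starting posture, and real traffic should set the numbers.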
Observability
Compare init time against handler time in execution traces, and track health-check latency reported by the pool.
Cost
Warmth trades idle Docker capacity for tail latency—validate against your traffic bands.
What to do next
How to tune warm containers in Inquir Compute
Measure
Capture p95/p99 with realistic auth and payload sizes.
Adjust
Align pool sizes with traffic bands instead of guessing.
Review
Revisit after major dependency upgrades—warmth does not fix slow imports.
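For the Measure step, a small percentile helper (nearest-rank method) is enough to turn recorded handler latencies into the p95/p99 numbers worth tuning against:

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // nearest-rank: 1-based
  return sorted[Math.min(rank, sorted.length) - 1];
}
```

Feed it samples captured with realistic auth and payload sizes, then compare p50 against p99: the gap between them is roughly the tail the warm pool is being asked to absorb.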
Code example
Latency thinking
Split measurements into connect, init, and handler phases when profiling.
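A minimal phase-split timer, so a slow phase is attributable instead of folded into one opaque total (the phase names mirror the sentence above; the helper itself is a sketch, not a library API):

```typescript
type Phase = "connect" | "init" | "handler";

// Run each phase in order and record its wall-clock duration separately.
async function timePhases(
  steps: Record<Phase, () => Promise<void>>,
): Promise<Record<Phase, number>> {
  const out = {} as Record<Phase, number>;
  for (const phase of ["connect", "init", "handler"] as const) {
    const t0 = performance.now();
    await steps[phase]();
    out[phase] = performance.now() - t0;
  }
  return out;
}
```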
// p50 alone hides cold tails — track p99 alongside error rates after deploys.
When it fits
Fit
When this works
- Steady traffic paths
- Agent tool loops with tight timeouts
When to skip it
- Rare batch jobs where zero idle cost dominates
FAQ
Is warmth guaranteed?
It is a best-effort capacity strategy; load tests on your environment remain the source of truth.
Does this replace profiling?
No. Slow code stays slow—warmth only removes one class of startup overhead.
What about streaming responses from the gateway?
The Docker orchestrator’s streaming path requires hot lambdas to be enabled—without a warm runner it refuses the stream so clients are not left half-connected.
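That fail-fast behavior amounts to a guard clause before the stream is opened. The sketch below is a hypothetical illustration of the idea, not the orchestrator's actual code:

```typescript
interface Runner {
  stream(chunks: string[]): AsyncIterable<string> | string[];
}

// Refuse to open a stream unless hot lambdas are on and a warm runner exists,
// so clients get an immediate error instead of hanging half-connected.
function openStream(
  hotLambdasEnabled: boolean,
  warmRunner: Runner | undefined,
  chunks: string[],
) {
  if (!hotLambdasEnabled || !warmRunner) {
    throw new Error("streaming requires hot lambdas with a warm runner");
  }
  return warmRunner.stream(chunks);
}
```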