AI agent backend on serverless functions
Model tools become HTTP tool routes on the gateway: lock them down with API keys, inject secrets per function, offload long steps to jobs or pipelines, and ship handlers on Node.js, Python, or Go with the same observability as the rest of your stack.
Last updated: 2026-06-28
Answer first
Direct answer
Build the agent backend on serverless functions, one function per tool. That keeps dependencies and deploy risk localized, and the execution history in the console lines up with each tool invocation, which is easier for on-call than a shared monolith.
When it fits
- Multi-step agents
- Tool access to private data
Tradeoffs
- Running privileged tools on end-user devices breaks the moment data is regulated or the user is on a locked-down laptop.
- Throwing every tool into one giant server turns every deploy into a high-risk change and makes log lines impossible to attribute to a specific capability.
Workload and what breaks
What agents actually need
A chat completion is only the headline. Production agents also fetch private data, write to systems of record, escalate to humans, and enforce guardrails—and each of those steps needs clear failure and retry semantics.
When there is no real backend, side effects creep into prompt text or the user’s browser, where they are nearly impossible to audit or revoke cleanly.
How Inquir helps
Composable serverless tools for AI agents
One function per tool keeps dependencies and deploy risk localized; execution history in the console matches each tool invocation—easier on-call than a shared monolith.
Handlers use the same Node.js, Python, or Go runtimes as the rest of the platform. Optional warm pools trim cold-start overhead when the model calls tools in a tight loop—measure under realistic load.
What you get
Implementation rules for AI agent tools
One function per tool
Split functions unless dependencies are tightly coupled. One tool per function keeps deploy risk small and logs attributable to a specific capability.
Validate inputs, return structured JSON
Define required fields and reject invalid payloads early. Return a stable JSON shape the orchestrator can parse without special-casing.
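The rule above can be sketched as a small validator plus a single response helper. This is a minimal illustration, not the platform's API: the field names `customer_id` and `limit` and the helper names `respond` and `validate` are hypothetical, and the only thing taken from this page is the `{statusCode, body}` shape with a JSON string body.

```python
import json

# Hypothetical input contract for one tool: field name -> expected type.
REQUIRED_FIELDS = {"customer_id": str, "limit": int}


def respond(status_code, payload):
    """Build the {statusCode, body} shape with a JSON string body."""
    return {"statusCode": status_code, "body": json.dumps(payload)}


def validate(raw_body):
    """Return (payload, None) on success or (None, error_message) on failure."""
    try:
        payload = json.loads(raw_body or "{}")
    except json.JSONDecodeError:
        return None, "body must be valid JSON"
    if not isinstance(payload, dict):
        return None, "body must be a JSON object"
    for name, expected in REQUIRED_FIELDS.items():
        if name not in payload:
            return None, f"{name} required"
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return None, f"{name} must be {expected.__name__}"
    return payload, None


def handler(event, context=None):
    payload, error = validate(event.get("body"))
    if error:
        # Every failure uses the same {"error": "..."} shape.
        return respond(400, {"error": error})
    return respond(200, {"customer_id": payload["customer_id"], "limit": payload["limit"]})
```

Keeping every error in one `{"error": ...}` shape means the orchestrator needs only one branch for invalid input, whichever field failed.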
Secrets in environment variables, never in prompts
Scope API keys per tool function in workspace secrets. Rotate keys independently of model versions without touching prompt templates—secrets never appear in logs or context windows.
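As a rough sketch of the handler side of this rule, a tool can read its key from the environment at call time and fail fast when it is missing, without echoing any value. The variable name `CRM_API_KEY` and the helper `require_secret` are hypothetical. The only assumption about the platform is that workspace secrets arrive as environment variables, as described above.

```python
import os


class MissingSecret(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_secret(name):
    """Return the secret value, or raise with the variable name only (never a value)."""
    value = os.environ.get(name)
    if not value:
        raise MissingSecret(f"secret {name} is not set")
    return value


def build_auth_headers():
    # CRM_API_KEY is a hypothetical per-tool secret; the key goes into the
    # outbound request headers and never into prompt text or log lines.
    return {"Authorization": f"Bearer {require_secret('CRM_API_KEY')}"}
```

Reading the variable at call time rather than hard-coding it is what lets a key rotate without a prompt or code change.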
Gateway auth before handler
Every tool route requires an API key enforced at the gateway level. Tool functions receive only already-authenticated requests—no ad-hoc auth logic inside handlers, no accidental open endpoints.
Use jobs or pipelines for long-running work
When a tool step exceeds the gateway timeout, return a job ID immediately and continue enrichment or side effects in a background pipeline. The orchestrator polls or receives a webhook when the pipeline completes.
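On the orchestrator side, the polling half of this handoff might look like the sketch below. It assumes a caller-supplied `fetch_status(job_id)` function and the status values `"succeeded"` and `"failed"`. Both are illustrative, since this page does not specify the job status API.

```python
import time


def wait_for_job(fetch_status, job_id, timeout_s=30.0, interval_s=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal state or the deadline passes.

    fetch_status(job_id) is supplied by the caller and is assumed to return a
    dict such as {"status": "running"} or {"status": "succeeded", "result": ...}.
    """
    deadline = clock() + timeout_s
    while True:
        state = fetch_status(job_id)
        if state.get("status") in ("succeeded", "failed"):
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout_s}s")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting. A webhook receiver would replace this loop entirely.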
Warm pools for tight tool loops
When the model calls tools in rapid succession, cold-start latency adds up. Enable warm pools for tool functions with steady traffic—measure p95/p99 before and after to validate the gain.
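One way to run the before-and-after comparison is to time repeated calls from a client and compute tail percentiles. The sketch below uses only the Python standard library. The `call` argument stands for whatever function invokes your tool route and is an assumption of this example.

```python
import statistics
import time


def measure_latencies(call, runs=50):
    """Invoke call() repeatedly and return per-call latencies in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def p95_p99(samples_ms):
    """Return (p95, p99) from a list of latency samples."""
    # quantiles(n=100) yields 99 cut points; index 94 is p95 and index 98 is p99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[94], cuts[98]
```

Collect samples with warm pools disabled and then enabled, under the same load pattern, and compare the two `(p95, p99)` pairs rather than the averages.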
Track failures per tool endpoint
Alert on error rates per tool route, not only per chat session. One failing tool should surface in observability before it silently degrades agent quality.
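To illustrate the per-route view, the sketch below aggregates invocation records by route and flags routes above an error threshold. The record shape (`route`, `status`), the 5% threshold, and the minimum call count are assumptions chosen for this example, not a platform export format.

```python
from collections import Counter


def error_rate_by_route(records):
    """Return {route: failed_calls / total_calls}; 5xx statuses count as failures."""
    totals = Counter()
    failures = Counter()
    for record in records:
        totals[record["route"]] += 1
        if record["status"] >= 500:
            failures[record["route"]] += 1
    return {route: failures[route] / totals[route] for route in totals}


def routes_over_threshold(records, threshold=0.05, min_calls=10):
    """List routes whose error rate exceeds the threshold, skipping low-volume routes."""
    totals = Counter(record["route"] for record in records)
    rates = error_rate_by_route(records)
    return sorted(
        route for route, rate in rates.items()
        if totals[route] >= min_calls and rate > threshold
    )
```

Grouping by route rather than by chat session is the point: a single failing tool stands out even when most sessions still complete.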
What to do next
How to build AI agent tools on Inquir Compute
Define per-tool input contract
Document required fields, validation behavior, and the error shape the orchestrator should handle.
Define output schema and auth model
Keep return shapes stable across versions. Use route-level API key auth so tools are not accidentally open to public traffic.
Define retries, idempotency, and timeout handoff
Decide when to retry in-place, when to return a job ID and continue via pipeline, and how to key writes idempotently so retries do not create duplicates.
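The retry half of this decision can be sketched as a bounded loop with exponential backoff that reuses one idempotency key across attempts. `TransientError`, `call_tool`, and the `idempotency_key` field are hypothetical names for this illustration. A real orchestrator would map its own timeout and 5xx conditions onto them.

```python
import time
import uuid


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx responses)."""


def call_with_retries(call_tool, payload, max_attempts=3, base_delay_s=0.2,
                      sleep=time.sleep):
    """Retry transient failures with backoff, sending the same key every time."""
    key = payload.get("idempotency_key") or str(uuid.uuid4())
    request = dict(payload, idempotency_key=key)
    for attempt in range(1, max_attempts + 1):
        try:
            return call_tool(request)
        except TransientError:
            if attempt == max_attempts:
                raise  # caller decides whether to hand off to a job or pipeline
            sleep(base_delay_s * 2 ** (attempt - 1))
```

Because every attempt carries the same key, the tool can recognize a replay and avoid writing twice. Once attempts are exhausted, the caller can fall back to the job ID handoff.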
Code example
Tool handler patterns — Node.js, Python, Go, and async handoff
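Sync tools share the same gateway event contract: the request body arrives as a string, and the handler returns {statusCode, body}. You can mix languages per tool, for example Python for ML inference, Node.js for API calls, and Go for high-throughput lookups. When a step outlasts the gateway timeout, return 202 with a job ID and continue in a pipeline. The examples below show a Node.js lookup, a Python classifier, a Go lookup, and a Node.js async handoff, in that order.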
// Node.js: synchronous lookup tool.
// `db` is assumed to be a data-access client initialized elsewhere in the module.
export async function handler(event) {
  const { id } = JSON.parse(event.body || '{}');
  if (!id) return { statusCode: 400, body: JSON.stringify({ error: 'id required' }) };

  // API key auth is enforced at the gateway route, so the handler assumes an authenticated caller.
  const row = await db.findById(id);
  if (!row) return { statusCode: 404, body: JSON.stringify({ error: 'not found' }) };
  return { statusCode: 200, body: JSON.stringify({ row }) };
}
import json
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # injected from workspace secrets


def handler(event, context):
    body = json.loads(event.get("body") or "{}")
    text = body.get("text")
    if not text:
        return {"statusCode": 400, "body": json.dumps({"error": "text required"})}
    r = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Classify intent of: {text}"}],
    )
    return {"statusCode": 200, "body": json.dumps({"intent": r.choices[0].message.content})}
package main

import (
	"encoding/json"
)

// parsePayload decodes the JSON string in event["body"]. When there is no body,
// it falls back to treating the event itself as the payload.
func parsePayload(event map[string]interface{}) map[string]interface{} {
	if s, ok := event["body"].(string); ok && s != "" {
		var out map[string]interface{}
		if err := json.Unmarshal([]byte(s), &out); err != nil || out == nil {
			return map[string]interface{}{}
		}
		return out
	}
	return event
}

// Handler: API key auth is enforced at the gateway route.
// `db` is assumed to be a data-access package or client defined elsewhere.
func Handler(event map[string]interface{}, ctx map[string]interface{}) (interface{}, error) {
	payload := parsePayload(event)
	id, _ := payload["id"].(string)
	if id == "" {
		b, _ := json.Marshal(map[string]string{"error": "id required"})
		return map[string]interface{}{"statusCode": 400, "body": string(b)}, nil
	}
	row, err := db.FindByID(id)
	if err != nil {
		b, _ := json.Marshal(map[string]string{"error": "not found"})
		return map[string]interface{}{"statusCode": 404, "body": string(b)}, nil
	}
	b, _ := json.Marshal(map[string]interface{}{"row": row})
	return map[string]interface{}{"statusCode": 200, "body": string(b)}, nil
}
export async function handler(event) {
  const { customerId } = JSON.parse(event.body || '{}');
  if (!customerId) return { statusCode: 400, body: JSON.stringify({ error: 'customerId required' }) };

  // Return fast and continue in a pipeline; the orchestrator polls /jobs/:jobId or receives a webhook.
  const { instanceId: jobId } = await global.durable.startNew('enrich-customer', undefined, { customerId });
  return { statusCode: 202, body: JSON.stringify({ jobId }) };
}
When it fits
Good fit
When this works
- Multi-step agents
- Tool access to private data
When to skip it
- Stateless single-shot completions with no side effects
FAQ
Should agent tools be separate HTTP functions?
Yes for production: one function per tool (or tight group) keeps dependencies isolated, deploy risk small, and logs attributable—easier than a monolith that mixes user sessions and tool IO.
How do I store secrets for tool calls?
Use Inquir workspace secrets and environment injection so API keys never live in prompts or client bundles; rotate keys independently of model versions.
What about streaming responses to the user?
End-user streaming is a gateway concern; many tool-calling stacks still use plain request/response JSON between the orchestrator and each tool because retries and idempotency stay simpler that way.
How do I make tool calls idempotent when the model retries?
Key writes with stable IDs from the tool payload (customer ID, order ID, external record key). Return the same JSON shape on replay so the orchestrator can treat duplicate invocations as safe no-ops.
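A minimal sketch of that answer, assuming a hypothetical refund tool keyed by `orderId`. The in-memory dict stands in for a durable store with a unique key constraint, and `side_effects` exists only to make the single write visible. A production version would need an atomic conditional write rather than a check followed by a set.

```python
import json

_results = {}        # stand-in for a durable store keyed by a stable ID
side_effects = []    # records how many times the real write happened


def handler(event, context=None):
    body = json.loads(event.get("body") or "{}")
    order_id = body.get("orderId")
    if not order_id:
        return {"statusCode": 400, "body": json.dumps({"error": "orderId required"})}

    key = f"refund:{order_id}"
    if key not in _results:
        # First delivery: perform the side effect once and record the outcome.
        side_effects.append(order_id)
        _results[key] = {"orderId": order_id, "status": "refunded"}

    # Replays skip the write and return the recorded outcome in the same shape.
    return {"statusCode": 200, "body": json.dumps(_results[key])}
```

Calling the handler twice with the same `orderId` returns identical responses and performs the write once, so the orchestrator can treat the retry as a no-op.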