LLM pipelines without mystery state machines
Stages you can retry one at a time—cleaner than a single giant prompt when tools touch prod.
Situation
Why giant single prompts fail
You cannot retry one sub-step without redoing everything.
Costs balloon when every branch re-embeds the same context.
Why workarounds fail
Script spaghetti
Notebooks are great for exploration, not for durable production graphs.
How Inquir fits
Stages as functions
Each stage is deployable and loggable; compose with pipelines for async gaps.
Tool calls remain HTTP functions with explicit auth.
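A minimal sketch of that idea, assuming Python and the stdlib `urllib`; the URL, token, and payload shape are illustrative, not Inquir APIs:

```python
# Hedged sketch: a tool call is an ordinary HTTP POST with explicit,
# per-call auth in the headers; nothing ambient, nothing hidden.
import json
import urllib.request

def build_request(url: str, payload: dict, token: str) -> urllib.request.Request:
    """Auth is passed explicitly on every call, so it shows up in logs and reviews."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def call_tool(url: str, payload: dict, token: str) -> dict:
    """One tool call, one HTTP round trip, independently retryable."""
    with urllib.request.urlopen(build_request(url, payload, token)) as resp:
        return json.loads(resp.read())
```

Because each stage is a plain function, it can be deployed, logged, and retried without touching the rest of the graph.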
Capabilities
Stages to consider
Retrieve
Isolate embedding and search calls.
Moderate
Fail fast before expensive generation.
Act
Call tools with tight input validation.
Summarize
Compress for storage or user display.
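Sketched as function contracts (all bodies are stubs, not Inquir APIs), the four stages above might look like:

```python
# Illustrative stage contracts; each stage does one job and can be
# retried or logged on its own. Every body is a stand-in.
def retrieve(query: str) -> list[str]:
    # embedding + search isolated here
    return [f"context for {query}"]

def moderate(text: str) -> bool:
    # cheap policy check so we fail fast before paying for generation
    return "forbidden" not in text.lower()

def act(tool_input: dict) -> dict:
    # tight input validation before any tool call goes out
    if "action" not in tool_input:
        raise ValueError("tool input missing 'action'")
    return {"status": "ok", "action": tool_input["action"]}

def summarize(text: str, limit: int = 80) -> str:
    # compress for storage or user display
    return text if len(text) <= limit else text[: limit - 1] + "…"
```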
Steps
How to stage LLM work with Inquir pipelines
Draw dataflow
Name inputs/outputs per box.
Codify
Implement each box as a function or pipeline step.
Measure cost
Track tokens and wall time per stage.
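One hedged way to do that measurement: wrap each stage in a decorator that records wall time and a rough token count. The whitespace tokenizer and the `METRICS` dict are stand-ins for a real tokenizer and your observability backend.

```python
# Assumption-laden sketch: per-stage wall time + token accounting.
import time
from functools import wraps

METRICS: dict[str, dict[str, float]] = {}  # stand-in for a metrics sink

def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def measured(stage: str):
    def deco(fn):
        @wraps(fn)
        def wrapper(text: str):
            start = time.perf_counter()
            out = fn(text)
            METRICS[stage] = {
                "wall_s": time.perf_counter() - start,
                "tokens_in": count_tokens(text),
                "tokens_out": count_tokens(str(out)),
            }
            return out
        return wrapper
    return deco

@measured("summarize")
def summarize(text: str) -> str:
    return " ".join(text.split()[:5])  # stub stage
```

With one record per stage, cost attribution falls out of the logs instead of guesswork.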
Code example
Pseudo stages
Replace with your orchestrator’s actual calls.
// retrieve → moderate → generate → tool? → summarize → persist
Fit
Use pipelines when…
When to use
- Multi-model flows
- Human-in-the-loop handoffs
- Long-running enrichment
When not to use
- Single prompt demos
FAQ
Why split an LLM workflow into stages?
Retries, cost attribution, and debugging improve when retrieval, moderation, tool calls, and summarization are separate steps with their own logs.
What about streaming tokens to end users?
Keep user-visible streaming at the boundary; internal stages can use request/response for simpler failure handling and replays.
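A minimal sketch of that split, with the chunking purely illustrative: the internal stage returns a complete answer (easy to retry and replay), and only the boundary layer turns it into user-visible chunks.

```python
# Hedged sketch: request/response inside, streaming only at the edge.
def generate(query: str) -> str:
    # internal stage: returns the whole answer, trivially replayable
    return f"full answer to {query}"

def stream_to_user(query: str, chunk_size: int = 8):
    # boundary layer: chunks the complete answer for user-visible streaming
    answer = generate(query)
    for i in range(0, len(answer), chunk_size):
        yield answer[i : i + chunk_size]
```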
How do I control cost across stages?
Measure tokens and wall time per stage in observability; cap expensive steps with budgets and short-circuit when moderation fails.