5. Scaling
Learn how to orchestrate sub-agents to deliver fast, structured, and visual answers using the AI SDK v5.
Router Systems
A router system lets your AI handle complex queries by splitting work across multiple sub-agents.
Instead of one large model doing everything, the router decides who should do what — like a manager assigning tasks to domain experts.
Think of it as a brain with a team:
the brain understands intent, then delegates to the right specialists.
Why Routers Matter
Routers make AI systems feel instant, structured, and reliable — even when queries are messy or multi-part.
For example:
“Show me sales this month and our best-selling products.”
Here’s what happens behind the scenes:
- Router detects intents → “sales” and “analytics”
- Two sub-agents spin up: one for Orders, one for Analytics
- Both work in parallel
- Results stream in as a sales table and a top-products chart
The user just sees an instant, visual answer.
But under the hood, multiple agents cooperated in perfect sync.
How It’s Structured
A router-based system typically has four layers:
- 🧭 Router Layer – understands intent and assigns work
- 🤖 Sub-Agent Layer – each agent handles one domain (Orders, Inventory, etc.)
- 🧠 Data & Cache Layer – validates, caches, and returns clean data
- ⚡ Streaming Output – combines text and visuals for real-time answers
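As a rough mental model, the four layers map onto simple TypeScript shapes. These names are illustrative only, not SDK exports:

// 🧭 Router Layer: turn a raw query into the list of domains it touches.
type RouterLayer = (query: string) => Promise<string[]>

// 🤖 Sub-Agent Layer: one specialist per domain, exposing a generate() call.
type SubAgent = { generate: (opts: { prompt: string }) => Promise<{ text: string }> }

// 🧠 Data & Cache Layer: deterministic, validated fetchers sitting behind the tools.
type DataFetcher<TInput, TOutput> = (input: TInput) => Promise<TOutput>

// ⚡ Streaming Output: interleaved text and renderable artifacts sent to the client.
type OutputChunk =
  | { type: "text"; value: string }
  | { type: "artifact"; value: unknown }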
Router Layer
The router is the brain of your multi-agent system.
It listens to the user, figures out what they mean, and decides which agent should handle the task.
You can think of it as a traffic controller — lightweight, fast, and domain-aware.
How It Works
- User sends a natural language query.
- Router runs a small LLM call or keyword classifier to detect intent.
- Router assigns the right sub-agent(s).
- If more than one domain is detected, agents run in parallel.
Example
“Show me total revenue and inventory for this week.”
| Step | What Happens | Layer |
|---|---|---|
| 1 | Router detects Orders and Inventory | Router |
| 2 | Both agents spin up | Sub-Agent |
| 3 | Orders agent fetches total revenue | Data Layer |
| 4 | Inventory agent checks stock levels | Data Layer |
| 5 | Results stream back together | Output |
Intent Detection
The router’s core job is intent detection. There are two common ways to implement it: a light LLM call or simple keyword rules.
1. Light LLM Call
import { generateText } from "ai"

const { text: intent } = await generateText({
  model: "openai/gpt-4o-mini",
  prompt: `
    You are a router. Given a query, decide which domains apply.
    Respond with JSON in the form { "domains": [...] }.
    Domains: Orders, Inventory, Products, Analytics.
    Query: "Show me revenue and stock this week."
  `,
})

Output:

{ "domains": ["Orders", "Inventory"] }

2. Keyword Rules
function detectDomains(query: string) {
  const domains: string[] = []
  if (query.includes("revenue")) domains.push("Orders")
  if (query.includes("inventory") || query.includes("stock"))
    domains.push("Inventory")
  return domains
}

Multi-Intent Handling
When a query touches more than one domain (“sales and stock”), the router splits the task and runs agents in parallel.
Optional Plan Path
Complex queries can also generate an execution plan:
{
  "domains": ["Orders", "Analytics"],
  "plan": [
    "Fetch last month's orders",
    "Fetch this month's orders",
    "Compare totals and summarize by category"
  ]
}

Sub-Agent Layer
Once the router decides which domains to activate, it hands work off to sub-agents — domain experts built with the Agent class.
Each sub-agent encapsulates:
- its own system prompt
- a small, static toolset
- loop control for multi-step reasoning
Example: Orders Agent
import { Experimental_Agent as Agent, stepCountIs, tool } from "ai"
import { z } from "zod"

export const ordersAgent = new Agent({
  model: "openai/gpt-4o",
  system: `You are an ecommerce orders specialist.
You can fetch, summarize, and analyze order data.`,
  tools: {
    getSalesData: tool({
      description: "Fetch total sales and revenue for a date range",
      inputSchema: z.object({
        startDate: z.string(),
        endDate: z.string(),
      }),
      execute: async ({ startDate, endDate }) => {
        // Static sample values keep the example self-contained; replace with a real orders query.
        const revenue = 124_000
        const totalOrders = 3800
        return { startDate, endDate, revenue, totalOrders }
      },
    }),
    summarizePerformance: tool({
      description: "Summarize performance given sales data",
      inputSchema: z.object({
        revenue: z.number(),
        totalOrders: z.number(),
      }),
      execute: async ({ revenue, totalOrders }) => ({
        summary: `Revenue: $${revenue}, Orders: ${totalOrders}`,
        avgOrderValue: Math.round(revenue / totalOrders),
      }),
    }),
  },
  stopWhen: stepCountIs(10),
})

Example: Inventory Agent
import { Experimental_Agent as Agent, tool } from "ai"
import { z } from "zod"

export const inventoryAgent = new Agent({
  model: "openai/gpt-4o",
  system: `You manage stock data for an ecommerce store.
You know how to check product quantities and restock needs.`,
  tools: {
    getInventoryLevels: tool({
      description: "Get current stock levels for products",
      inputSchema: z.object({
        category: z.string().optional(),
      }),
      execute: async ({ category }) => {
        // Static sample data; replace with a real inventory lookup.
        const data = [
          { product: "Wireless Mouse", stock: 83 },
          { product: "Laptop Stand", stock: 45 },
        ]
        return { category, data }
      },
    }),
  },
})

Parallel Execution
import { inventoryAgent } from "./inventory-agent"
import { ordersAgent } from "./orders-agent"

async function handleMultiIntent() {
  const [orders, inventory] = await Promise.all([
    ordersAgent.generate({ prompt: "Get sales this week" }),
    inventoryAgent.generate({ prompt: "Get stock for top sellers" }),
  ])
  return { orders: orders.text, inventory: inventory.text }
}

Design Principles
- Keep each agent’s toolset small (3–7 tools)
- Validate everything with Zod (see the sketch after this list)
- Keep system prompts short and role-specific
- Run independent agents in parallel when possible
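As a concrete example of the Zod principle, a data helper can validate an external response before anything reaches the model. The endpoint and response shape below mirror the getOrders example in the next section and are assumptions, not a fixed API:

import { z } from "zod"

// Expected shape of the upstream response; anything else is rejected.
const salesResponseSchema = z.object({
  revenue: z.number(),
  count: z.number(),
})

export async function fetchValidatedSales(startDate: string, endDate: string) {
  const response = await fetch(
    `https://api.yourstore.com/orders?start=${startDate}&end=${endDate}`
  )
  const parsed = salesResponseSchema.safeParse(await response.json())

  // Fail loudly instead of letting malformed data reach the model.
  if (!parsed.success) {
    throw new Error(`Invalid sales payload: ${parsed.error.message}`)
  }

  return parsed.data
}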
Data & Cache Layer
This layer connects your agents to real data — safely and predictably.
All tool calls must be deterministic, validated, and cacheable.
Deterministic Tools
import { tool } from "ai"
import { z } from "zod"

export const getOrders = tool({
  description: "Fetch total revenue and order count",
  inputSchema: z.object({
    startDate: z.string(),
    endDate: z.string(),
  }),
  outputSchema: z.object({
    startDate: z.string(),
    endDate: z.string(),
    revenue: z.number(),
    totalOrders: z.number(),
  }),
  execute: async ({ startDate, endDate }) => {
    const response = await fetch(
      `https://api.yourstore.com/orders?start=${startDate}&end=${endDate}`
    )
    const data = await response.json()
    return {
      startDate,
      endDate,
      revenue: data.revenue,
      totalOrders: data.count,
    }
  },
})

Caching for Reliability
// sdkCache can be any cache helper that exposes getOrSet(key, fn); a minimal version is sketched below.
const key = `getOrders:v1:org123:read:${JSON.stringify({ startDate, endDate })}`

const result = await sdkCache.getOrSet(key, async () => {
  return await getOrders.execute({ startDate, endDate })
})

Same call, same key, same result — every time.
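The sdkCache helper above is assumed rather than imported from the SDK. A minimal in-memory getOrSet with a TTL might look like this while prototyping; swap in Redis or another shared store in production:

type CacheEntry = { value: unknown; expiresAt: number }

const store = new Map<string, CacheEntry>()

export const sdkCache = {
  async getOrSet<T>(key: string, fn: () => Promise<T>, ttlMs = 60_000): Promise<T> {
    const hit = store.get(key)

    // Serve the cached value while it is still fresh.
    if (hit && hit.expiresAt > Date.now()) return hit.value as T

    // Otherwise compute once, store, and return.
    const value = await fn()
    store.set(key, { value, expiresAt: Date.now() + ttlMs })
    return value
  },
}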
Safe Write Operations
import { tool } from "ai"
import { z } from "zod"

export const updateInventory = tool({
  description: "Update stock levels for a product",
  inputSchema: z.object({
    productId: z.string(),
    quantity: z.number(),
  }),
  execute: async ({ productId, quantity }) => {
    // Return a preview plus a confirm callback; db stands for your database client.
    return {
      preview: `Stock for ${productId} will be set to ${quantity}.`,
      confirm: async () => {
        await db.products.update({ id: productId, stock: quantity })
        return { success: true }
      },
    }
  },
})

💡 Always require user confirmation for write actions.
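A sketch of how the application layer might consume that preview/confirm pair. The requestApproval function is a stand-in for whatever confirmation UI you use, and the pending value is whatever the write tool's execute returned:

// Stand-in for your confirmation UI (modal, Slack approval, etc.).
declare function requestApproval(message: string): Promise<boolean>

type PendingWrite = {
  preview: string
  confirm: () => Promise<{ success: boolean }>
}

export async function applyWithConfirmation(pending: PendingWrite) {
  // Show the preview and wait for an explicit yes/no from the user.
  const approved = await requestApproval(pending.preview)
  if (!approved) return { cancelled: true }

  // Only now run the actual write.
  return await pending.confirm()
}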
Streaming Output
Once data is ready, the system streams text + artifacts together.
Charts, tables, and KPIs render instantly while the agent keeps generating.
Example: Stream Both Agents
import { inventoryAgent } from "./inventory-agent"
import { ordersAgent } from "./orders-agent"

export async function generateEcommerceReport() {
  const [ordersStream, inventoryStream] = await Promise.all([
    ordersAgent.stream({ prompt: "Show me this week's revenue" }),
    inventoryAgent.stream({ prompt: "Show me low-stock products" }),
  ])

  for await (const chunk of ordersStream.textStream) {
    process.stdout.write(chunk)
  }

  for await (const artifact of ordersStream.artifactStream) {
    renderToCanvas(artifact)
  }

  for await (const artifact of inventoryStream.artifactStream) {
    renderToCanvas(artifact)
  }
}

Artifact Example
import { Artifact, tool } from "ai"
import { z } from "zod"

export const getSalesChart = tool({
  description: "Render a sales chart for a date range",
  inputSchema: z.object({
    startDate: z.string(),
    endDate: z.string(),
  }),
  execute: async ({ startDate, endDate }) => {
    // Static sample data; replace with a real revenue query for the range.
    const data = [
      { day: "Mon", revenue: 4000 },
      { day: "Tue", revenue: 6800 },
      { day: "Wed", revenue: 7200 },
      { day: "Thu", revenue: 5800 },
      { day: "Fri", revenue: 9100 },
    ]
    return Artifact.chart({
      title: "Weekly Revenue",
      data,
      x: "day",
      y: "revenue",
    })
  },
})

Why It Works
This system feels natural but runs on structure.
- Zod contracts ensure correctness
- Small tool catalogs ensure predictability
- Cache keys ensure determinism
- Streaming ensures responsiveness
Each layer has a clear responsibility — together they form a predictable, testable, production-grade architecture.
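Because routers, tools, and agents are plain typed functions and objects, each layer can be unit tested in isolation. A quick sketch with vitest (any test runner works), assuming the detectDomains helper from the keyword-rules example is exported from a router module:

import { describe, expect, it } from "vitest"
import { detectDomains } from "./router"

describe("detectDomains", () => {
  it("detects multiple domains in one query", () => {
    expect(detectDomains("show revenue and stock this week")).toEqual([
      "Orders",
      "Inventory",
    ])
  })

  it("returns no domains for unrelated queries", () => {
    expect(detectDomains("hello there")).toEqual([])
  })
})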
Scaling the System
Scaling the system is simple and linear.
Add a new domain by defining a new sub-agent and registering it with the router.
import { Experimental_Agent as Agent } from "ai"

export const returnsAgent = new Agent({
  model: "openai/gpt-4o",
  system: "You handle returns and refund logic.",
  tools: { processRefund, getReturnStats },
})

router.register("Returns", returnsAgent)

That’s it — your system now supports returns.
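The router object and its register method are assumed here rather than defined earlier; they are application glue, not an SDK export. A minimal registry that supports this call might look like the following sketch:

// Structural type: anything with a generate({ prompt }) that yields text can be routed.
type RoutableAgent = {
  generate: (options: { prompt: string }) => Promise<{ text: string }>
}

class Router {
  private agents = new Map<string, RoutableAgent>()

  // Add or replace the specialist for a domain.
  register(domain: string, agent: RoutableAgent) {
    this.agents.set(domain, agent)
  }

  // Run every registered agent whose domain was detected, in parallel.
  async route(domains: string[], prompt: string) {
    const active = domains.filter((domain) => this.agents.has(domain))
    return Promise.all(
      active.map(async (domain) => ({
        domain,
        text: (await this.agents.get(domain)!.generate({ prompt })).text,
      }))
    )
  }
}

export const router = new Router()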
Core Philosophy
🧠 Let the model think. 🔧 Let the tools act. 🪄 Let the router decide who does what.
By keeping each piece small, typed, and predictable, you build systems that are fast, safe, and production-grade.
A router-based multi-agent setup feels conversational to the user — but internally, it’s structured like software.