5. Scaling

Learn how to orchestrate sub-agents to deliver fast, structured, and visual answers using the AI SDK v5.

Router Systems

A router system lets your AI handle complex queries by splitting work across multiple sub-agents.

Instead of one large model doing everything, the router decides who should do what — like a manager assigning tasks to domain experts.

Think of it as a brain with a team:
the brain understands intent, then delegates to the right specialists.


Why Routers Matter

Routers make AI systems feel instant, structured, and reliable — even when queries are messy or multi-part.

For example:

“Show me sales this month and our best-selling products.”

Here’s what happens behind the scenes:

  1. Router detects intents → “sales” and “analytics”
  2. Two sub-agents spin up: one for Orders, one for Analytics
  3. Both work in parallel
  4. Results stream in as a sales table and a top-products chart

The user just sees an instant, visual answer.
But under the hood, multiple agents cooperated in perfect sync.


How It’s Structured

A router-based system typically has four layers (sketched as types below):

  1. 🧭 Router Layer – understands intent and assigns work
  2. 🤖 Sub-Agent Layer – each agent handles one domain (Orders, Inventory, etc.)
  3. 🧠 Data & Cache Layer – validates, caches, and returns clean data
  4. 📡 Streaming Output – combines text and visuals for real-time answers
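
If it helps to see those boundaries as code, the sketch below names each layer as a TypeScript type. The names (RouterDecision, SubAgent, AgentRegistry, CacheClient) are illustrative for this guide, not AI SDK exports.

layer-types.ts
// Illustrative layer contracts; none of these names come from the AI SDK.

// Router Layer: turns a query into the domains (and optionally a plan) to run.
export interface RouterDecision {
  domains: string[]
  plan?: string[]
}

// Sub-Agent Layer: one domain expert per key ("Orders", "Inventory", ...).
// Structurally, anything with a `generate` method fits.
export interface SubAgent {
  generate(options: { prompt: string }): Promise<{ text: string }>
}
export type AgentRegistry = Record<string, SubAgent>

// Data & Cache Layer: deterministic reads keyed by a stable string.
export interface CacheClient {
  getOrSet<T>(key: string, compute: () => Promise<T>): Promise<T>
}

// Streaming Output: text chunks and visual artifacts interleaved to the client.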

Router Layer

The router is the brain of your multi-agent system.
It listens to the user, figures out what they mean, and decides which agent should handle the task.

You can think of it as a traffic controller — lightweight, fast, and domain-aware.


How It Works

  1. User sends a natural language query.
  2. Router runs a small LLM call or keyword classifier to detect intent.
  3. Router assigns the right sub-agent(s).
  4. If more than one domain is detected, agents run in parallel.

Example

“Show me total revenue and inventory for this week.”

| Step | What Happens                          | Layer      |
| ---- | ------------------------------------- | ---------- |
| 1    | Router detects Orders and Inventory   | Router     |
| 2    | Both agents spin up                   | Sub-Agent  |
| 3    | Orders agent fetches total revenue    | Data Layer |
| 4    | Inventory agent checks stock levels   | Data Layer |
| 5    | Results stream back together          | Output     |

Intent Detection

The router’s core job is intent detection. There are two common approaches: a light LLM call or simple keyword rules.

1. Light LLM Call

router-intent.ts
import { generateText } from "ai"
 
const { text: intent } = await generateText({
  model: "openai/gpt-4o-mini",
  prompt: `
  You are a router. Given a query, decide which domains apply.
  Domains: Orders, Inventory, Products, Analytics.
  Query: "Show me revenue and stock this week."
  Respond with JSON: { "domains": [...] }
  `,
})

Output:

{ "domains": ["Orders", "Inventory"] }

2. Keyword Rules

router-keyword.ts
export function detectDomains(query: string) {
  const domains: string[] = []
  const q = query.toLowerCase()
  if (q.includes("revenue") || q.includes("sales")) domains.push("Orders")
  if (q.includes("inventory") || q.includes("stock")) domains.push("Inventory")
  return domains
}
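
In practice the two approaches combine well: run the cheap keyword rules first and only fall back to the LLM classifier when they match nothing. A rough sketch, reusing the two helpers above:

router-hybrid.ts
import { detectDomains } from "./router-keyword"
import { classifyWithLLM } from "./router-intent-structured"

// Hybrid routing: zero-latency rules for the common cases, a small LLM call
// only for ambiguous queries.
export async function routeQuery(query: string): Promise<string[]> {
  const fromKeywords = detectDomains(query)
  if (fromKeywords.length > 0) return fromKeywords

  return classifyWithLLM(query)
}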

Multi-Intent Handling

When a query touches more than one domain (“sales and stock”), the router splits the task and runs agents in parallel.
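
Here is a rough sketch of that fan-out, building on routeQuery from above. ordersAgent and inventoryAgent are the sub-agents defined in the Sub-Agent Layer below, and SubAgent is the structural type from the earlier layer sketch:

router-dispatch.ts
import type { SubAgent } from "./layer-types"
import { inventoryAgent } from "./inventory-agent"
import { ordersAgent } from "./orders-agent"
import { routeQuery } from "./router-hybrid"

// Registry mapping detected domain names to their sub-agents.
const agents: Record<string, SubAgent> = {
  Orders: ordersAgent,
  Inventory: inventoryAgent,
}

export async function handleQuery(query: string) {
  const domains = await routeQuery(query)

  // Fan out: every detected domain gets its own agent, all running in parallel.
  return Promise.all(
    domains
      .filter((domain) => domain in agents)
      .map(async (domain) => {
        const result = await agents[domain].generate({ prompt: query })
        return { domain, text: result.text }
      })
  )
}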


Optional Plan Path

Complex queries can also generate an execution plan:

{
  "domains": ["Orders", "Analytics"],
  "plan": [
    "Fetch last month's orders",
    "Fetch this month's orders",
    "Compare totals and summarize by category"
  ]
}
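
One way to produce and consume such a plan is to generate it with a Zod schema and then feed each step to an agent as a prompt. The sketch below sends every step to the orders agent (defined in the Sub-Agent Layer below) for brevity; a real router would pick the agent per step:

router-plan.ts
import { generateObject } from "ai"
import { z } from "zod"
import { ordersAgent } from "./orders-agent"

// Plan shape matching the JSON example above.
const planSchema = z.object({
  domains: z.array(z.string()),
  plan: z.array(z.string()),
})

export async function planAndRun(query: string) {
  const { object } = await generateObject({
    model: "openai/gpt-4o-mini",
    schema: planSchema,
    prompt: `Break this query into domains and ordered steps: "${query}"`,
  })

  // Execute the steps in order, feeding each one to an agent as a prompt.
  const outputs: string[] = []
  for (const step of object.plan) {
    const result = await ordersAgent.generate({ prompt: step })
    outputs.push(result.text)
  }
  return { plan: object.plan, outputs }
}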

Sub-Agent Layer

Once the router decides which domains to activate, it hands work off to sub-agents — domain experts built with the Agent class.

Each sub-agent encapsulates:

  • its own system prompt
  • a small, static toolset
  • loop control for multi-step reasoning

Example: Orders Agent

orders-agent.ts
import { Experimental_Agent as Agent, stepCountIs, tool } from "ai"
import { z } from "zod"
 
export const ordersAgent = new Agent({
  model: "openai/gpt-4o",
  system: `You are an ecommerce orders specialist.
  You can fetch, summarize, and analyze order data.`,
 
  tools: {
    getSalesData: tool({
      description: "Fetch total sales and revenue for a date range",
      inputSchema: z.object({
        startDate: z.string(),
        endDate: z.string(),
      }),
      execute: async ({ startDate, endDate }) => {
        const revenue = 124_000
        const totalOrders = 3800
        return { startDate, endDate, revenue, totalOrders }
      },
    }),
    summarizePerformance: tool({
      description: "Summarize performance given sales data",
      inputSchema: z.object({
        revenue: z.number(),
        totalOrders: z.number(),
      }),
      execute: async ({ revenue, totalOrders }) => ({
        summary: `Revenue: $${revenue}, Orders: ${totalOrders}`,
        avgOrderValue: Math.round(revenue / totalOrders),
      }),
    }),
  },
  stopWhen: stepCountIs(10),
})

Example: Inventory Agent

inventory-agent.ts
import { Experimental_Agent as Agent, tool } from "ai"
import { z } from "zod"
 
export const inventoryAgent = new Agent({
  model: "openai/gpt-4o",
  system: `You manage stock data for an ecommerce store.
  You know how to check product quantities and restock needs.`,
 
  tools: {
    getInventoryLevels: tool({
      description: "Get current stock levels for products",
      inputSchema: z.object({
        category: z.string().optional(),
      }),
      execute: async ({ category }) => {
        const data = [
          { product: "Wireless Mouse", stock: 83 },
          { product: "Laptop Stand", stock: 45 },
        ]
        return { category, data }
      },
    }),
  },
})

Parallel Execution

run-sub-agents.ts
import { inventoryAgent } from "./inventory-agent"
import { ordersAgent } from "./orders-agent"
 
async function handleMultiIntent() {
  const [orders, inventory] = await Promise.all([
    ordersAgent.generate({ prompt: "Get sales this week" }),
    inventoryAgent.generate({ prompt: "Get stock for top sellers" }),
  ])
  return { orders: orders.text, inventory: inventory.text }
}

Design Principles

  • Keep each agent’s toolset small (3–7 tools)
  • Validate everything with Zod
  • Keep system prompts short and role-specific
  • Run independent agents in parallel when possible

Data & Cache Layer

This layer connects your agents to real data — safely and predictably.

All tool calls must be deterministic, validated, and cacheable.


Deterministic Tools

tools-with-zod.ts
import { tool } from "ai"
import { z } from "zod"
 
export const getOrders = tool({
  description: "Fetch total revenue and order count",
  inputSchema: z.object({
    startDate: z.string(),
    endDate: z.string(),
  }),
  outputSchema: z.object({
    startDate: z.string(),
    endDate: z.string(),
    revenue: z.number(),
    totalOrders: z.number(),
  }),
  execute: async ({ startDate, endDate }) => {
    const response = await fetch(
      `https://api.yourstore.com/orders?start=${startDate}&end=${endDate}`
    )
    const data = await response.json()
    return {
      startDate,
      endDate,
      revenue: data.revenue,
      totalOrders: data.count,
    }
  },
})

Caching for Reliability

cached-tool-call.ts
const key = `getOrders:v1:org123:read:${JSON.stringify({ startDate, endDate })}`
 
const result = await sdkCache.getOrSet(key, async () => {
  return await getOrders.execute({ startDate, endDate })
})

Same call, same key, same result — every time.
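
sdkCache stands in for whatever cache you already run (Redis, Vercel KV, and so on). For local development, a minimal in-memory getOrSet with a TTL is enough; this sketch is an assumption, not an SDK feature:

memory-cache.ts
// Minimal in-memory stand-in for `sdkCache`; swap for Redis/KV in production.
const store = new Map<string, { value: unknown; expiresAt: number }>()

export const sdkCache = {
  async getOrSet<T>(
    key: string,
    compute: () => Promise<T>,
    ttlMs = 60_000
  ): Promise<T> {
    const hit = store.get(key)
    if (hit && hit.expiresAt > Date.now()) return hit.value as T

    // Miss or expired: compute once, cache, and return.
    const value = await compute()
    store.set(key, { value, expiresAt: Date.now() + ttlMs })
    return value
  },
}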


Safe Write Operations

safe-write-tools.ts
import { tool } from "ai"
import { z } from "zod"
 
// `db` below is assumed to be your application's database client.
export const updateInventory = tool({
  description: "Update stock levels for a product",
  inputSchema: z.object({
    productId: z.string(),
    quantity: z.number(),
  }),
  execute: async ({ productId, quantity }) => {
    return {
      preview: `Stock for ${productId} will be set to ${quantity}.`,
      confirm: async () => {
        await db.products.update({ id: productId, stock: quantity })
        return { success: true }
      },
    }
  },
})

💡 Always require user confirmation for write actions.
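
On the application side, the pattern plays out in two steps: show the preview, then call confirm() only after the user approves. A rough sketch, where askUser is a placeholder for whatever approval UI your app already has:

confirm-write.ts
import { updateInventory } from "./safe-write-tools"

// Preview first, write only after explicit consent. `askUser` is hypothetical.
export async function setStockWithApproval(
  productId: string,
  quantity: number,
  askUser: (message: string) => Promise<boolean>
) {
  // Call the tool directly, as in the caching example above.
  const pending = await updateInventory.execute({ productId, quantity })

  const approved = await askUser(pending.preview)
  if (!approved) return { success: false as const, reason: "User declined" }

  // The actual database write happens only here.
  return pending.confirm()
}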


Streaming Output

Once data is ready, the system streams text + artifacts together.

Charts, tables, and KPIs render instantly while the agent keeps generating.


Example: Stream Both Agents

streaming-report.ts
import { inventoryAgent } from "./inventory-agent"
import { ordersAgent } from "./orders-agent"
 
export async function generateEcommerceReport() {
  const [ordersStream, inventoryStream] = await Promise.all([
    ordersAgent.stream({ prompt: "Show me this week's revenue" }),
    inventoryAgent.stream({ prompt: "Show me low-stock products" }),
  ])
 
  for await (const chunk of ordersStream.textStream) {
    process.stdout.write(chunk)
  }
 
  // `artifactStream` and `renderToCanvas` are assumed to come from your own
  // artifact/rendering layer rather than the core SDK stream result.
  for await (const artifact of ordersStream.artifactStream) {
    renderToCanvas(artifact)
  }
 
  for await (const artifact of inventoryStream.artifactStream) {
    renderToCanvas(artifact)
  }
}

Artifact Example

artifact-tool.ts
import { Artifact, tool } from "ai"
import { z } from "zod"
 
export const getSalesChart = tool({
  description: "Render a sales chart for a date range",
  inputSchema: z.object({
    startDate: z.string(),
    endDate: z.string(),
  }),
  execute: async ({ startDate, endDate }) => {
    const data = [
      { day: "Mon", revenue: 4000 },
      { day: "Tue", revenue: 6800 },
      { day: "Wed", revenue: 7200 },
      { day: "Thu", revenue: 5800 },
      { day: "Fri", revenue: 9100 },
    ]
    // `Artifact.chart` is assumed to be a helper from your artifact layer that
    // packages chart data for streaming to the client.
    return Artifact.chart({
      title: "Weekly Revenue",
      data,
      x: "day",
      y: "revenue",
    })
  },
})

Why It Works

This system feels natural but runs on structure.

  • Zod contracts ensure correctness
  • Small tool catalogs ensure predictability
  • Cache keys ensure determinism
  • Streaming ensures responsiveness

Each layer has a clear responsibility — together they form a predictable, testable, production-grade architecture.


Scaling the System

Scaling the system is linear: to add a new domain, define a new sub-agent and register it with the router.

register-new-agent.ts
import { Experimental_Agent as Agent } from "ai"
 
// `processRefund` and `getReturnStats` are tools you define alongside this
// agent, following the same pattern as the Orders and Inventory agents above.
export const returnsAgent = new Agent({
  model: "openai/gpt-4o",
  system: "You handle returns and refund logic.",
  tools: { processRefund, getReturnStats },
})
 
router.register("Returns", returnsAgent)

That’s it — your system now supports returns.
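
The router.register call assumes the router keeps a simple registry of domain to agent. If you do not already have one, a Map that the dispatch step reads from is enough; this sketch is an assumption, not an SDK feature (SubAgent is the structural type from the earlier layer sketch):

router-registry.ts
import type { SubAgent } from "./layer-types"

// Minimal domain -> agent registry behind `router.register`.
const registry = new Map<string, SubAgent>()

export const router = {
  register(domain: string, agent: SubAgent) {
    registry.set(domain, agent)
  },
  // The dispatch step looks agents up here and runs them in parallel.
  resolve(domains: string[]): SubAgent[] {
    return domains
      .map((domain) => registry.get(domain))
      .filter((agent): agent is SubAgent => agent !== undefined)
  },
}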


Core Philosophy

🧠 Let the model think. 🔧 Let the tools act. 🪄 Let the router decide who does what.

By keeping each piece small, typed, and predictable, you build systems that are fast, safe, and production-grade.


A router-based multi-agent setup feels conversational to the user — but internally, it’s structured like software.