5-MINUTE WOW · VERCEL AI SDK

Vercel AI SDK + NocturnusAI

The problem: streamText() receives the full message history on every request — token cost climbs with every turn.

  • Without NocturnusAI: 1,259 tokens / turn
  • With NocturnusAI: 221 tokens / turn
  • Reduction: 5.7× (82% fewer tokens)

Measured on our 15-turn product-support benchmark against Claude Opus 4. The compression happens at the Nocturnus layer, so it's framework-agnostic. Run bench.mjs against your own workload.
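The headline figures follow directly from the two per-turn numbers; a quick check:

```typescript
// Sanity-checking the benchmark figures quoted above
const baselineTokens = 1_259;  // tokens/turn without NocturnusAI
const compressed = 221;        // tokens/turn with NocturnusAI

const reduction = baselineTokens / compressed;
const percentFewer = Math.round((1 - compressed / baselineTokens) * 100);

console.log(reduction.toFixed(1)); // "5.7"
console.log(percentFewer);         // 82
```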

Copy-paste install

npm install nocturnusai-sdk ai @ai-sdk/openai
docker run -d -p 9300:9300 ghcr.io/auctalis/nocturnusai:latest

2-minute demo

Next.js chat app, 15 turns, flat latency and token cost.


What changes in your API route

Call noct.processTurns() before streamText. Pass the briefingDelta as the system message. streamText receives only the new user message — never the full thread.

// app/api/chat/route.ts
import { streamText, type Message } from 'ai';
import { openai } from '@ai-sdk/openai';
import { NocturnusAIClient } from 'nocturnusai-sdk';

const noct = new NocturnusAIClient({
  baseUrl: process.env.NOCTURNUS_URL ?? 'http://localhost:9300',
  tenantId: 'default',
});

export async function POST(req: Request) {
  const { messages, sessionId } =
    (await req.json()) as { messages: Message[]; sessionId: string };
  const lastUserMessage = messages[messages.length - 1]?.content ?? '';

  // Send only the new turn; Nocturnus tracks the session and returns a short briefing
  const ctx = await noct.processTurns({
    turns: [lastUserMessage],
    sessionId,
    scope: sessionId,
  });

  // streamText sees a short system prompt + the new user message only
  const result = streamText({
    model: openai('gpt-4o-mini'),
    system: ctx.briefingDelta ?? 'Start of conversation.',
    messages: [{ role: 'user', content: lastUserMessage }],
  });

  return result.toDataStreamResponse();
}

What changes in your page

Nothing. Standard useChat UI with a persistent sessionId body param.

// app/page.tsx
'use client';
import { useChat } from 'ai/react';
import { useMemo } from 'react';

export default function Chat() {
  const sessionId = useMemo(() => `session-${crypto.randomUUID()}`, []);
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat',
    body: { sessionId },
  });
  // ...standard chat UI
}

Why it works

On a default Vercel AI SDK chat, streamText receives the full messages array every turn. By turn 10 the prompt is thousands of tokens of repeated history. With NocturnusAI, the prompt stays flat — the model sees only the briefingDelta for this turn.
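To make the growth concrete, here is a toy model of prompt size per turn. The constants are illustrative assumptions, not the benchmark numbers:

```typescript
// Toy model of prompt tokens per turn. TOKENS_PER_MESSAGE and BRIEFING_TOKENS
// are made-up constants for illustration; they are not the benchmark values.
const TOKENS_PER_MESSAGE = 60;
const BRIEFING_TOKENS = 120;

// Default useChat flow: every prior turn contributes a user + assistant pair,
// and the whole messages array is resent with the new user message.
const fullHistoryPrompt = (turn: number): number =>
  (2 * (turn - 1) + 1) * TOKENS_PER_MESSAGE;

// NocturnusAI flow: a flat briefing plus only the new user message.
const compressedPrompt = (_turn: number): number =>
  BRIEFING_TOKENS + TOKENS_PER_MESSAGE;

for (const turn of [1, 5, 10, 15]) {
  console.log(`turn ${turn}: full=${fullHistoryPrompt(turn)} compressed=${compressedPrompt(turn)}`);
}
```

Under these assumptions the full-history prompt grows linearly (60, 540, 1140, 1740 tokens at turns 1, 5, 10, 15) while the compressed prompt stays flat at 180.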

What’s in the repo

  • app/api/chat/route.ts — the compression handler
  • app/page.tsx — standard useChat UI
  • bench.mjs — measure your own tokens with node bench.mjs
  • package.json — Next.js + ai + @ai-sdk/openai + nocturnusai-sdk
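If you want a rough estimate before wiring up the real benchmark, a sketch like the one below approximates per-turn prompt cost with a crude 4-characters-per-token heuristic. This is a hypothetical stand-in, not the actual bench.mjs implementation, which should count tokens from real API usage data:

```typescript
// Hypothetical sketch of the comparison bench.mjs makes, using a rough
// chars/4 token estimate instead of a real tokenizer.
type Turn = { role: 'user' | 'assistant'; content: string };

const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Baseline: the entire message history is sent as the prompt every turn
function fullHistoryCost(history: Turn[]): number {
  return history.reduce((sum, t) => sum + estimateTokens(t.content), 0);
}

// Compressed: a short briefing plus only the newest user message
function compressedCost(briefing: string, newMessage: string): number {
  return estimateTokens(briefing) + estimateTokens(newMessage);
}

// Example session (contents are placeholders)
const history: Turn[] = [
  { role: 'user', content: 'How do I reset my API key?' },
  { role: 'assistant', content: 'Go to Settings > API Keys and click Rotate.' },
  { role: 'user', content: 'And where do I see usage?' },
];
const briefing = 'User asked about API key rotation; answered via Settings.';

console.log(fullHistoryCost(history));
console.log(compressedCost(briefing, history[2].content));
```

The gap between the two numbers widens with every turn, since only the baseline re-pays for the full history.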

Run it yourself