# Vercel AI SDK + NocturnusAI
The problem: streamText() receives the full message history on every request — token cost climbs with every turn.
Measured on our 15-turn product-support benchmark against Claude Opus 4.
The compression happens at the Nocturnus layer — it's framework-agnostic.
Run `bench.mjs` against your own workload.
## Copy-paste install
```sh
npm install nocturnusai-sdk ai @ai-sdk/openai
docker run -d -p 9300:9300 ghcr.io/auctalis/nocturnusai:latest
```
## 2-minute demo
Next.js chat app, 15 turns, flat latency and token cost.
## What changes in your API route
Call `noct.processTurns()` before `streamText`. Pass the `briefingDelta` as the system message. `streamText` receives only the new user message, never the full thread.
```ts
// app/api/chat/route.ts
import { streamText, type Message } from 'ai';
import { openai } from '@ai-sdk/openai';
import { NocturnusAIClient } from 'nocturnusai-sdk';

const noct = new NocturnusAIClient({
  baseUrl: process.env.NOCTURNUS_URL ?? 'http://localhost:9300',
  tenantId: 'default',
});

export async function POST(req: Request) {
  const { messages, sessionId } =
    (await req.json()) as { messages: Message[]; sessionId: string };
  const lastUserMessage = messages[messages.length - 1]?.content ?? '';

  // Compress prior context into a short briefing
  const ctx = await noct.processTurns({
    turns: [lastUserMessage],
    sessionId,
    scope: sessionId,
  });

  // streamText sees a short system prompt + the new user message only
  const result = streamText({
    model: openai('gpt-4o-mini'),
    system: ctx.briefingDelta ?? 'Start of conversation.',
    messages: [{ role: 'user', content: lastUserMessage }],
  });

  return result.toDataStreamResponse();
}
```
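Once the dev server is running you can smoke-test the route directly. A minimal sketch, assuming a standard `next dev` on port 3000 (the helper `buildChatRequest` is hypothetical, added here only to show the payload shape the handler expects):

```typescript
// Build the JSON payload the route handler above destructures.
// This helper is illustrative; it is not part of the SDK.
function buildChatRequest(sessionId: string, content: string) {
  return {
    sessionId,
    messages: [{ role: 'user' as const, content }],
  };
}

// Assumes the Next.js dev server is listening on localhost:3000.
// Call smokeTest() manually once `next dev` is up.
async function smokeTest() {
  const res = await fetch('http://localhost:3000/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatRequest('session-smoke', 'Hello!')),
  });
  // toDataStreamResponse() streams; print status and the first chunk of text.
  console.log(res.status, (await res.text()).slice(0, 120));
}
```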
## What changes in your page
Nothing. A standard `useChat` UI with a persistent `sessionId` body param.
```tsx
// app/page.tsx
'use client';
import { useChat } from 'ai/react';
import { useMemo } from 'react';

export default function Chat() {
  const sessionId = useMemo(() => `session-${crypto.randomUUID()}`, []);
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat',
    body: { sessionId },
  });
  // ...standard chat UI
}
```
## Why it works
On a default Vercel AI SDK chat, `streamText` receives the full `messages` array every turn; by turn 10 the prompt is thousands of tokens of repeated history. With NocturnusAI the prompt stays flat: the model sees only the `briefingDelta` for this turn.
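To make the growth concrete, here is a back-of-the-envelope sketch. The per-turn and briefing token counts are illustrative assumptions, not measurements from the benchmark:

```typescript
// Illustrative token accounting: full-history resend vs. flat briefing.
// Both constants are assumptions for this sketch, not measured values.
const TOKENS_PER_TURN = 150; // assumed average tokens added per turn
const BRIEFING_TOKENS = 200; // assumed size of a compressed briefing

// Default useChat/streamText: every prior turn is resent on each request,
// so the prompt for turn t carries all t turns of history.
function fullHistoryPromptTokens(turn: number): number {
  return turn * TOKENS_PER_TURN;
}

// Briefing approach: the prompt is always briefing + one new message.
function briefingPromptTokens(_turn: number): number {
  return BRIEFING_TOKENS + TOKENS_PER_TURN;
}

// Cumulative input tokens over a 15-turn conversation.
const turns = Array.from({ length: 15 }, (_, i) => i + 1);
const full = turns.reduce((sum, t) => sum + fullHistoryPromptTokens(t), 0);
const flat = turns.reduce((sum, t) => sum + briefingPromptTokens(t), 0);
console.log({ full, flat }); // → { full: 18000, flat: 5250 }
```

Under these assumptions the full-history total grows quadratically with conversation length, while the briefing total grows linearly, which is why the per-turn prompt cost stays flat.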
## What’s in the repo
- `app/api/chat/route.ts` – the compression handler
- `app/page.tsx` – standard `useChat` UI
- `bench.mjs` – measure your own tokens with `node bench.mjs`
- `package.json` – Next.js + `ai` + `@ai-sdk/openai` + `nocturnusai-sdk`