5-MINUTE WOW · MCP

MCP + NocturnusAI

The problem: Claude Desktop, Cursor, and Continue accumulate tool outputs across the session — the context window fills with repetition.

Without NocturnusAI    1,259 tokens / turn
With NocturnusAI         221 tokens / turn
Reduction                5.7× (82% fewer tokens)

Measured on our 15-turn product-support benchmark against Claude Opus 4. The compression happens at the Nocturnus layer — it's framework-agnostic. Run bench.py against your own workload.
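The headline figures can be sanity-checked directly from the two per-turn numbers above:

```python
# Derive the reduction factor and percentage from the per-turn figures.
baseline = 1259   # tokens/turn without NocturnusAI
compressed = 221  # tokens/turn with NocturnusAI

reduction_factor = baseline / compressed            # ≈ 5.7×
percent_fewer = (1 - compressed / baseline) * 100   # ≈ 82%

print(f"{reduction_factor:.1f}x, {percent_fewer:.0f}% fewer tokens")
```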

Copy-paste install

docker run -d -p 9300:9300 -e EXTRACTION_ENABLED=true ghcr.io/auctalis/nocturnusai:latest
cp config.json ~/Library/Application\ Support/Claude/mcp_servers/nocturnus.json

2-minute demo

Claude Desktop session, 10 tool calls, flat token cost per turn.


What changes in your MCP client

Drop this config into your MCP client’s server directory. On next restart, the Nocturnus tools — tell, ask, teach, forget, context, predicates, bulk_assert, fork_scope — appear in your tool list.

{
  "mcpServers": {
    "nocturnus": {
      "url": "http://localhost:9300/mcp/sse",
      "transport": "sse",
      "env": {
        "NOCTURNUS_TENANT_ID": "default",
        "NOCTURNUS_DATABASE": "default"
      }
    }
  }
}
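Before restarting your client, it's worth confirming the file is valid JSON with the fields your client expects. A minimal check against the snippet above (field names taken from that snippet, not from any client's schema):

```python
import json

# The config from above, inlined for a self-contained check;
# in practice, json.load() the file you dropped into the server directory.
config = """
{
  "mcpServers": {
    "nocturnus": {
      "url": "http://localhost:9300/mcp/sse",
      "transport": "sse",
      "env": {
        "NOCTURNUS_TENANT_ID": "default",
        "NOCTURNUS_DATABASE": "default"
      }
    }
  }
}
"""

server = json.loads(config)["mcpServers"]["nocturnus"]
assert server["transport"] == "sse"
assert server["url"].endswith("/mcp/sse")
print("config OK")
```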

How to use it in a session

When your assistant needs working memory — at the start of a task, before a tool call, or when the thread grows long — it calls nocturnus:context. Nocturnus returns a salience-ranked working set instead of the full transcript.

You:  Remember customer_tier(acme, enterprise) and contract_value(acme, 2000000).
      Then call context to see what you know.

Claude (tool call → nocturnus:tell):  customer_tier(acme, enterprise)
Claude (tool call → nocturnus:tell):  contract_value(acme, 2000000)
Claude (tool call → nocturnus:context):  { "goals": [...], "maxFacts": 8 }

Claude (response):  "I know Acme is enterprise tier with a $2M contract.
                    Salience 0.65 on both. Ready for the next step."

Why it works

On a naive MCP client, every turn sends the full prior conversation to the model. Prompt size grows linearly. With Nocturnus as an MCP server, the assistant calls nocturnus:context when it wants working memory — and receives a short goal-filtered briefing. The per-turn cost stays flat regardless of how long the session runs.
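The linear-vs-flat distinction is easy to model. The token counts below are illustrative, not measured values:

```python
NEW_CONTENT = 150   # tokens each turn adds to the transcript (illustrative)
BRIEFING = 220      # flat size of a goal-filtered briefing (illustrative)

def naive_prompt(turn):
    # Naive client: the full prior transcript is resent every turn,
    # so prompt size grows linearly with the turn count.
    return NEW_CONTENT * turn

def briefed_prompt(turn):
    # With a context call: briefing size is independent of history length.
    return BRIEFING

print([naive_prompt(t) for t in (1, 5, 15)])    # [150, 750, 2250]
print([briefed_prompt(t) for t in (1, 5, 15)])  # [220, 220, 220]
```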

What’s in the repo

  • config.json — drop-in for Claude Desktop, Cursor, Continue, Windsurf
  • example-session.md — 10-turn walkthrough with before/after token counts per turn

Run it yourself