5-MINUTE WOW · MCP

MCP + NocturnusAI

The problem: Claude Desktop, Cursor, and Continue accumulate tool outputs across the session — the context window fills with repetition.

Without NocturnusAI    1,259 tokens / turn
With NocturnusAI         221 tokens / turn
Reduction                5.7× (82% fewer tokens)

Measured on our 15-turn product-support benchmark against Claude Opus 4. The compression happens at the Nocturnus layer — it's framework-agnostic. Run bench.py against your own workload.
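The headline figures can be sanity-checked directly from the two per-turn numbers above:

```python
# Derive the reduction factor and percentage from the per-turn figures.
baseline = 1259   # tokens/turn without NocturnusAI
compressed = 221  # tokens/turn with NocturnusAI

reduction_factor = baseline / compressed            # ≈ 5.7×
percent_fewer = (1 - compressed / baseline) * 100   # ≈ 82%

print(f"{reduction_factor:.1f}x, {percent_fewer:.0f}% fewer tokens")
```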

Copy-paste install

docker run -d -p 9300:9300 -e EXTRACTION_ENABLED=true ghcr.io/auctalis/nocturnusai:latest
cp config.json ~/Library/Application\ Support/Claude/mcp_servers/nocturnus.json

2-minute demo

Claude Desktop session, 10 tool calls, flat token cost per turn.


What changes in your MCP client

Drop this config into your MCP client’s server directory. On next restart, the Nocturnus tools — tell, ask, teach, forget, context, predicates, bulk_assert, fork_scope — appear in your tool list.

{
  "mcpServers": {
    "nocturnus": {
      "url": "http://localhost:9300/mcp/sse",
      "transport": "sse",
      "env": {
        "NOCTURNUS_TENANT_ID": "default",
        "NOCTURNUS_DATABASE": "default"
      }
    }
  }
}
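Before restarting your client, it's worth confirming the file is valid JSON with the fields your client expects. A minimal check against the snippet above (field names taken from that snippet, not from any client's schema):

```python
import json

# The config from above, inlined for a self-contained check;
# in practice, json.load() the file you dropped into the server directory.
config = """
{
  "mcpServers": {
    "nocturnus": {
      "url": "http://localhost:9300/mcp/sse",
      "transport": "sse",
      "env": {
        "NOCTURNUS_TENANT_ID": "default",
        "NOCTURNUS_DATABASE": "default"
      }
    }
  }
}
"""

server = json.loads(config)["mcpServers"]["nocturnus"]
assert server["transport"] == "sse"
assert server["url"].endswith("/mcp/sse")
print("config OK")
```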

How to use it in a session

When your assistant needs working memory — at the start of a task, before a tool call, or when the thread grows long — it calls nocturnus:context. Nocturnus returns a salience-ranked working set instead of the full transcript.

You:  Remember customer_tier(acme, enterprise) and contract_value(acme, 2000000).
      Then call context to see what you know.

Claude (tool call → nocturnus:tell):  customer_tier(acme, enterprise)
Claude (tool call → nocturnus:tell):  contract_value(acme, 2000000)
Claude (tool call → nocturnus:context):  { "goals": [...], "maxFacts": 8 }

Claude (response):  "I know Acme is enterprise tier with a $2M contract.
                    Salience 0.65 on both. Ready for the next step."

Why it works

On a naive MCP client, every turn sends the full prior conversation to the model. Prompt size grows linearly. With Nocturnus as an MCP server, the assistant calls nocturnus:context when it wants working memory — and receives a short goal-filtered briefing. The per-turn cost stays flat regardless of how long the session runs.
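The linear-vs-flat distinction is easy to model. The token counts below are illustrative, not measured values:

```python
NEW_CONTENT = 150   # tokens each turn adds to the transcript (illustrative)
BRIEFING = 220      # flat size of a goal-filtered briefing (illustrative)

def naive_prompt(turn):
    # Naive client: the full prior transcript is resent every turn,
    # so prompt size grows linearly with the turn count.
    return NEW_CONTENT * turn

def briefed_prompt(turn):
    # With a context call: briefing size is independent of history length.
    return BRIEFING

print([naive_prompt(t) for t in (1, 5, 15)])    # [150, 750, 2250]
print([briefed_prompt(t) for t in (1, 5, 15)])  # [220, 220, 220]
```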

What’s in the repo

  • config.json — drop-in for Claude Desktop, Cursor, Continue, Windsurf
  • example-session.md — 10-turn walkthrough with before/after token counts per turn

Run it yourself