Features

Cut the prompt bill.

Big threads in. Smaller prompts out. Everything else is backend.

1. Why prompt replay gets expensive

Most of the waste is old context.

The model keeps getting charged to reread things it does not need.

Repeated history

Old messages and old summaries keep showing up long after they stopped mattering.

Tool chatter

Trace output, logs, notes, and retries are useful to systems, not to every next prompt.

Stale state

Resolved issues and dead branches still ride along unless something cuts them out.

2. How Nocturnus reduces it

Three calls. Lower spend.

Cut the prompt. Tighten the next step. Stop replaying stable state.

POST /context with sessionId

Send just the new turns. The server tracks priors, extracts facts under the conversation scope, and returns a briefingDelta — only what's new.

What you send
{
  "turns": [new messages only],
  "scope": "conversation-id",
  "sessionId": "conversation-id"
}
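A minimal sketch of assembling that request in Python. The field names come from the example above; the helper function and the example turn content are illustrative, and the transport is left to your stack:

```python
import json

def build_context_payload(new_turns, conversation_id):
    # Send only the new turns; the server already tracks priors.
    # The conversation id doubles as scope and sessionId.
    return {
        "turns": new_turns,
        "scope": conversation_id,
        "sessionId": conversation_id,
    }

payload = build_context_payload(
    [{"role": "user", "content": "The outage is resolved."}],
    "conv-42",
)
body = json.dumps(payload)  # POST this body to /context
```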

POST /memory/context with goals

Once the next step is clear, narrow the window to just the subset the model needs.
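A sketch of that request body in Python. The field names (goals, sessionId, format) mirror the lifecycle example later on this page; treat the helper itself as illustrative:

```python
def build_memory_context_request(goals, session_id):
    # Narrow the window to a goal-specific subset.
    return {
        "goals": goals,
        "sessionId": session_id,
        "format": "natural",  # request formattedText in the response
    }

req = build_memory_context_request(["resolve billing dispute"], "conv-42")
```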

DELETE /scope/{id}

Conversation done? Promote durable facts to the tenant layer first, then delete the scope.
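One way to sketch the teardown order in Python, as a plan of (method, path, body) tuples rather than live HTTP calls (the endpoints are the ones documented here; the planning helper is hypothetical):

```python
def cleanup_plan(conversation_id, durable_facts):
    # Promote durable facts first (POST /assert/fact, no scope),
    # then delete the conversation scope.
    plan = [("POST", "/assert/fact", fact) for fact in durable_facts]
    plan.append(("DELETE", f"/scope/{conversation_id}", None))
    return plan

plan = cleanup_plan(
    "conv-42",
    [{"predicate": "IsA", "args": ["Acme", "Enterprise"]}],
)
```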

3. What works underneath

Backend mechanics. Not the pitch.

This is the machinery that keeps the cut reliable.

Extraction and ranking

Turn arrays become usable state

Raw text gets extracted, ranked, and deduplicated before it hits the model.
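The general idea, sketched in Python. This is a toy model of dedupe-then-rank, not the actual Nocturnus logic:

```python
def dedupe_and_rank(facts):
    # Keep the highest-scoring copy of each normalized fact,
    # then order by score so the strongest facts come first.
    best = {}
    for fact in facts:
        key = fact["text"].strip().lower()
        if key not in best or fact["score"] > best[key]["score"]:
            best[key] = fact
    return sorted(best.values(), key=lambda f: f["score"], reverse=True)

ranked = dedupe_and_rank([
    {"text": "Acme is enterprise tier", "score": 0.9},
    {"text": "acme is enterprise tier", "score": 0.4},  # duplicate, lower score
    {"text": "Outage reported in eu-west", "score": 0.7},
])
```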

Memory lifecycle

State can decay, compress, and expire

Old context can decay, compress, and expire instead of growing forever.
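A toy decay model illustrates the idea; the actual policy, half-life, and expiry threshold are up to the engine:

```python
def decayed_score(base_score, age_seconds, half_life=3600.0):
    # Toy model: relevance halves every half_life seconds.
    return base_score * 0.5 ** (age_seconds / half_life)

def is_expired(base_score, age_seconds, floor=0.1):
    # Below the floor, a fact is eligible for compression or expiry.
    return decayed_score(base_score, age_seconds) < floor
```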

Logical consistency

Backend mechanics keep stale conclusions from lingering

Truth maintenance and temporal state keep the reducer from drifting.

4. Conversation memory

Three layers. One identifier.

Use the conversation id as both scope and sessionId. Nocturnus handles the rest.

Durable customer knowledge

Facts with no scope: customer profiles, policies, product catalogs. They survive across every conversation. Think CRM state.

Facts scoped to this dialog

Facts tagged with scope=conversationId. Tool results, user statements, agent commitments. Deleted when the ticket closes.

Inferred automatically

Rules fire across both layers. If a customer is enterprise tier and has an outage, the priority escalation rule triggers without you writing glue code.
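A toy version of that rule in Python, with both layers modeled as plain sets of (predicate, args...) tuples; the fact shapes echo the lifecycle example below, but the rule function is illustrative:

```python
def should_escalate(tenant_facts, conversation_facts, customer):
    # Durable layer supplies the tier; conversation layer the live outage.
    enterprise = ("IsA", customer, "Enterprise") in tenant_facts
    outage = ("HasIssue", customer, "Outage") in conversation_facts
    return enterprise and outage

tenant = {("IsA", "Acme", "Enterprise")}
conversation = {("HasIssue", "Acme", "Outage")}
```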

The full turn lifecycle
// 1. Each turn — extract, store, get delta
r = POST /context(turns, scope, sessionId)
next_prompt = r.briefingDelta

// 2. Need a goal-specific subset?
ctx = POST /memory/context(goals, sessionId, format="natural")
next_prompt = ctx.formattedText

// 3. Promote durable facts before cleanup
POST /assert/fact({"predicate":"IsA","args":["Acme","Enterprise"]})

// 4. Conversation done — clean up
DELETE /scope/{conversationId}
POST /context/session/clear({"sessionId":"..."})
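The same lifecycle in runnable Python, with a stub standing in for the HTTP API. The method names and response shapes here are illustrative, not an official SDK surface:

```python
class FakeNocturnus:
    # Stub client; the real calls are the POST/DELETE endpoints above.
    def context(self, turns, scope, session_id):
        return {"briefingDelta": f"{len(turns)} new fact(s) for {scope}"}

    def memory_context(self, goals, session_id, format="natural"):
        return {"formattedText": "goal context: " + "; ".join(goals)}

client = FakeNocturnus()

# 1. Each turn: send only the new messages, prompt with the delta.
r = client.context([{"role": "user", "content": "hi"}], "conv-42", "conv-42")
next_prompt = r["briefingDelta"]

# 2. Goal-specific subset once the next step is clear.
ctx = client.memory_context(["resolve outage"], "conv-42")
next_prompt = ctx["formattedText"]
```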

5. How to integrate it

Keep your stack.

OpenClaw is a clear example: more context means more tokens. Nocturnus cuts the payload before the run.

OpenClaw

Fast MCP now. Full Context Engine later.

Add it with openclaw mcp set now, or replace context assembly completely later.

SDKs

Python and TypeScript

Call windows, optimize, diffs, summaries, and cleanup directly.

REST, MCP, and CLI

Use the reducer where the stack already runs.

Use REST in app code, MCP for agents, or CLI locally.