The problem: Every tool call re-sends the prior conversation into the agent's context.
Without NocturnusAI: 1,259 tokens / turn
With NocturnusAI: 221 tokens / turn
Reduction: 5.7× (82% fewer tokens)
Measured on our 15-turn product-support benchmark against Claude Opus 4.
The compression happens at the Nocturnus layer — it's framework-agnostic.
Run bench.py against your own workload.
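As a sanity check on the headline numbers, the 5.7× and 82% figures follow directly from the two per-turn token counts above; the variable names here are illustrative, not part of bench.py:

```python
# Token figures from the 15-turn benchmark above.
without_tokens = 1_259  # tokens/turn without NocturnusAI
with_tokens = 221       # tokens/turn with NocturnusAI

reduction = without_tokens / with_tokens    # ≈ 5.7x
savings = 1 - with_tokens / without_tokens  # ≈ 0.82, i.e. 82% fewer tokens

print(f"{reduction:.1f}x reduction, {savings:.0%} fewer tokens")
# → 5.7x reduction, 82% fewer tokens
```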
get_nocturnusai_tools() returns a list of Agents-SDK-compatible tools. Pair them with process_turns() before each agent run and the agent sees only what changed.
```python
from nocturnusai import SyncNocturnusAIClient
from nocturnusai.openai_agents import get_nocturnusai_tools
from agents import Agent, Runner

with SyncNocturnusAIClient("http://localhost:9300") as client:
    # Compress the ticket thread once
    client.process_turns(
        turns=[
            "User: Globex asking for SLA credits, claims 14h downtime.",
            "Tool sla_calculator: breach=true, credit_owed=$77500.",
            "Tool contract_check: auto_approval_limit=$100k.",
        ],
        scope="sla-ticket-8812",
        session_id="sla-ticket-8812",
    )

    # Hand the knowledge-query tools to the agent
    agent = Agent(
        name="sla-resolver",
        instructions="Resolve SLA credit tickets. Query the KB (scope='sla-ticket-8812').",
        tools=get_nocturnusai_tools(client),
    )

    result = Runner.run_sync(agent, "Is this ticket eligible for auto-approval?")
    print(result.final_output)
    # → "Yes — $77,500 is under the $100,000 auto-approval threshold."
```
Why it works
The agent never sees the raw ticket thread. It sees facts like credit_owed(globex, 77500) and auto_approval_limit(100000) — and backward-chains from eligible_for_auto_approval(globex) to the answer. The context stays flat turn-over-turn.
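The backward-chaining step can be sketched in a few lines. This is not NocturnusAI's internal representation, just an illustration of how the extracted facts support the eligibility query; the fact encoding and the eligible_for_auto_approval rule are hypothetical:

```python
# Facts extracted from the ticket thread (hypothetical encoding).
facts = {
    ("credit_owed", "globex"): 77_500,
    "auto_approval_limit": 100_000,
}

def eligible_for_auto_approval(account: str) -> bool:
    # Backward chaining: to prove eligible_for_auto_approval(account),
    # resolve the two sub-goals it depends on, then compare them.
    owed = facts[("credit_owed", account)]
    limit = facts["auto_approval_limit"]
    return owed < limit

print(eligible_for_auto_approval("globex"))  # → True
```

The raw thread never enters the agent's context; only these compact facts do, which is why the per-turn token count stays flat.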
What’s in the repo
knowledge_agent.py — full agent wiring against a 5-turn SLA ticket
bench.py — run a 15-turn conversation with and without NocturnusAI against your own OpenAI key