OpenClaw Money-Saving Strategy: Saving Two Thousand a Month - What Am I Doing Right?
Original Article Title: Why My OpenClaw Sessions Burned 21.5M Tokens in a Day (And What Actually Fixed It)
Original Article Author: MOSHIII
Translation: Peggy, BlockBeats
Editor's Note: In the current rapid adoption of Agent applications, many teams have encountered a seemingly anomalous phenomenon: while the system appears to be running smoothly, the token cost continues to rise unnoticed. This article reveals that the reason for cost explosion in a real OpenClaw workload often does not stem from user input or model output but from the overlooked cached prefix replay. The model repeatedly reads a large historical context in each call, leading to significant token consumption.
Using concrete session data, the article demonstrates how large intermediate artifacts (tool outputs, browser snapshots, JSON logs, and the like) are continuously written into the historical context and repeatedly re-read in the agent loop.
Through this case study, the author lays out a clear optimization path, from context structure design and tool output management to compaction configuration. For developers building agent systems, this is not only a troubleshooting record but also a practical money-saving playbook.
Below is the original article:
I analyzed a real OpenClaw workload and discovered a pattern that I believe many Agent users will recognize:
The sessions look "active."
The replies appear normal.
But token consumption suddenly explodes.
Here is a breakdown of the structure, the root cause, and a practical fix path.
TL;DR
The biggest cost driver is not overly long user messages. It is the massive cached prefix being repeatedly replayed.
From the session data:
Total tokens: 21,543,714
cacheRead: 17,105,970 (79.40%)
input: 4,345,264 (20.17%)
output: 92,480 (0.43%)
In other words, the majority of the cost of inference is not in processing new user intent, but in repeatedly reading a massive historical context.
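The split above can be reproduced directly from the aggregated usage counters. A minimal sketch, using the three totals quoted in this article (the field names mirror the usage keys in the session data):

```python
# Reproduce the TL;DR token split from the aggregated usage counters.
totals = {"cacheRead": 17_105_970, "input": 4_345_264, "output": 92_480}
grand_total = sum(totals.values())  # 21,543,714

for kind, n in totals.items():
    # cacheRead dominates at ~79.4% of all tokens consumed
    print(f"{kind}: {n:,} ({n / grand_total:.2%})")
```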
The "Wait, Why?" Moment
I originally thought high token usage came from: very long user prompts, extensive output generation, or expensive tool invocations.
But the predominant pattern is:
input: hundreds to thousands of tokens
cacheRead: 170k to 180k tokens per call
In other words, the model is rereading the same massive stable prefix every round.
Data Scope
I analyzed data at two levels:
1. Runtime logs
2. Session transcripts
It's worth noting that:
Runtime logs are primarily used to observe behavioral signals (e.g., restarts, errors, configuration issues)
Precise token counts come from the usage field in session JSONL
Scripts used:
scripts/session_token_breakdown.py
scripts/session_duplicate_waste_analysis.py
Analysis files generated:
tmp/session_token_stats_v2.txt
tmp/session_token_stats_v2.json
tmp/session_duplicate_waste.txt
tmp/session_duplicate_waste.json
tmp/session_duplicate_waste.png
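The aggregation step behind these stats can be sketched in a few lines. This assumes each JSONL record may carry a `usage` dict of integer counters (`input`, `output`, `cacheRead`), which matches the fields quoted in this article; adapt the keys to your runtime's actual schema:

```python
import json
from collections import Counter
from pathlib import Path

def sum_usage(session_dir: str) -> Counter:
    """Sum per-call usage counters across all session JSONL files.

    Assumed record shape: {"usage": {"input": int, "output": int,
    "cacheRead": int, ...}, ...} -- an illustration, not the exact
    OpenClaw schema.
    """
    totals = Counter()
    for path in Path(session_dir).glob("*.jsonl"):
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                usage = json.loads(line).get("usage") or {}
                for key, value in usage.items():
                    if isinstance(value, int):
                        totals[key] += value
    return totals
```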
Where is the Token Actually Being Consumed?
1) Concentration by Session
There is one session that consumes significantly more than others:
570587c3-dc42-47e4-9dd4-985c2a50af86: 19,204,645 tokens
This is followed by a sharp drop-off:
ef42abbb-d8a1-48d8-9924-2f869dea6d4a: 1,505,038
ea880b13-f97f-4d45-ba8c-a236cf6f2bb5: 649,584
2) Concentration by Behavior
The tokens mainly come from:
toolUse: 16,372,294
stop: 5,171,420
The issue is primarily with tool call chain loops rather than regular chat.
3) Concentration by Time
The token peaks are not random but rather concentrated in a few time slots:
2026-03-08 16:00: 4,105,105
2026-03-08 09:00: 4,036,070
2026-03-08 07:00: 2,793,648
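Hourly peaks like these fall out of a simple bucketing pass. A sketch assuming each JSONL record carries an ISO-8601 `timestamp` field alongside the `usage` counters (both field names are assumptions; adapt to your schema):

```python
import json
from collections import Counter
from datetime import datetime
from pathlib import Path

def hourly_totals(session_dir: str) -> Counter:
    """Bucket token totals by hour to spot concentration peaks."""
    buckets = Counter()
    for path in Path(session_dir).glob("*.jsonl"):
        for line in path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            rec = json.loads(line)
            ts, usage = rec.get("timestamp"), rec.get("usage") or {}
            if not ts:
                continue
            # Collapse to the hour, e.g. "2026-03-08 16:00"
            hour = datetime.fromisoformat(ts).strftime("%Y-%m-%d %H:00")
            buckets[hour] += sum(v for v in usage.values() if isinstance(v, int))
    return buckets
```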
What Exactly Is in the Massive Cache Prefix?
It's not the conversation content but mainly large intermediate artifacts:
Massive toolResult data blocks
Lengthy reasoning/thinking traces
Large JSON snapshots
File lists
Browser fetch data
Sub-Agent conversation logs
In the largest session, the character count is approximately:
toolResult:text: 366,469 characters
assistant:thinking: 331,494 characters
assistant:toolCall: 53,039 characters
Once these contents are retained in the historical context, each subsequent invocation may retrieve them again via a cache prefix.
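Character counts like the ones above can be tallied per `role:type` category with a small pass over one transcript. The record layout assumed here (a `role` plus a list of `content` blocks with `type` and `text`) is an illustration suggested by the article's category labels (toolResult:text, assistant:thinking); adapt the field access to your actual schema:

```python
import json
from collections import Counter

def char_counts(jsonl_path: str) -> Counter:
    """Tally characters per role:type block in one session transcript."""
    counts = Counter()
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            role = rec.get("role", "unknown")
            for block in rec.get("content") or []:
                # Attribute every block's text length to its role:type bucket
                counts[f"{role}:{block.get('type', 'text')}"] += len(block.get("text", ""))
    return counts
```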
Specific Example (from session file)
A significantly large context block repeatedly appears at the following locations:
sessions/570587c3-dc42-47e4-9dd4-985c2a50af86.jsonl:70
Large Gateway JSON Log (approx. 37,000 characters)
sessions/570587c3-dc42-47e4-9dd4-985c2a50af86.jsonl:134
Browser Snapshot + Security Encapsulation (approx. 29,000 characters)
sessions/570587c3-dc42-47e4-9dd4-985c2a50af86.jsonl:219
Large File List Output (approx. 41,000 characters)
sessions/570587c3-dc42-47e4-9dd4-985c2a50af86.jsonl:311
session/status Status Snapshot + Large Prompt Structure (approx. 30,000 characters)
"Duplicate Content Waste" vs. "Cache Replay Burden"
I also measured the duplicate content ratio within a single invocation:
Approximate duplication ratio: 1.72%
It does exist but is not the primary issue.
The real problem is that the absolute size of the cache prefix is too large.
The structure: a massive historical context, re-read on every invocation, with only a small amount of new input stacked on top.
Therefore, the optimization focus is not deduplication but context structure design.
Why Is the Agent Loop Particularly Prone to This Issue?
Three mechanisms overlap:
1. A large amount of tool output is written to historical context
2. Tool looping generates a large number of short interval calls
3. Minimal prefix changes → cache is re-read every time
If context compaction is not stably triggered, the issue will quickly escalate.
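The arithmetic behind the blow-up is worth making explicit. A back-of-the-envelope sketch: take a stable prefix in the 170k-180k range quoted earlier and replay it across a tool loop (the call count and per-round input size here are illustrative assumptions, not measured values):

```python
# Why minimal prefix changes explode cost: the prefix is paid on every call.
prefix_tokens = 175_000       # stable historical context, re-read each round
calls = 100                   # illustrative number of tool-loop invocations
new_input_per_call = 1_000    # the small genuinely-new part each round

replayed = prefix_tokens * calls          # tokens spent re-reading the prefix
fresh = new_input_per_call * calls        # tokens spent on new input
replay_share = replayed / (replayed + fresh)
print(f"replayed={replayed:,} fresh={fresh:,} share={replay_share:.1%}")
```

A hundred rounds over a 175k-token prefix already accounts for 17.5M replayed tokens, the same order of magnitude as the cacheRead total in this dataset.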
Most Critical Remediation Strategies (by impact)
P0—Avoid stuffing massive tool output into long-lived context
For oversized tool output:
· Keep summary + reference path / ID
· Write original payload to a file artifact
· Do not retain the full original text in chat history
Prioritize limiting these categories:
· Large JSON
· Long directory lists
· Browser full snapshots
· Sub-Agent full transcripts
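The P0 rule above can be sketched as a small wrapper around tool output: keep a summary plus a reference path inline, and spill the full payload to a file artifact. The directory, size threshold, and summary format here are assumptions for illustration, not OpenClaw behaviour:

```python
import hashlib
from pathlib import Path

ARTIFACT_DIR = Path("tmp/artifacts")   # assumed location, not an OpenClaw path
MAX_INLINE_CHARS = 2_000               # illustrative truncation threshold

def shrink_tool_output(payload: str) -> str:
    """Keep small tool output inline; spill oversized payloads to a file.

    Returns either the original payload or a short summary carrying a
    reference path to the full artifact, so chat history stays compact.
    """
    if len(payload) <= MAX_INLINE_CHARS:
        return payload
    ARTIFACT_DIR.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    artifact = ARTIFACT_DIR / f"tool-output-{digest}.txt"
    artifact.write_text(payload, encoding="utf-8")
    preview = payload[:200].replace("\n", " ")
    return (f"[truncated: {len(payload):,} chars; full payload at {artifact}]\n"
            f"preview: {preview}...")
```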
P1—Ensure compaction mechanism truly takes effect
In this dataset, a configuration compatibility issue came up repeatedly: an invalid compaction key.
This silently disables the optimization mechanism.
The correct approach: use only version-compatible configuration keys.
Then verify:
openclaw doctor --fix
and check the startup logs to confirm that compaction is actually in effect.
P1—Reduce reasoning text persistence
Avoid replaying long reasoning traces round after round.
In production: persist brief summaries instead of the complete reasoning.
P2—Improve prompt caching design
Goal is not to maximize cacheRead. Goal is to use cache on compact, stable, high-value prefixes.
Recommendations:
· Put stable rules into system prompt
· Avoid putting unstable data under stable prefixes
· Avoid injecting large amounts of debug data each round
Implementation Stop-Loss Plan (if I were to tackle it tomorrow)
1. Identify the session with the highest cacheRead percentage
2. Run /compact on runaway sessions
3. Add truncation + artifacting to tool outputs
4. Rerun token stats after each modification
Focus on tracking four KPIs:
cacheRead / totalTokens
toolUse avgTotal/call
Calls with >= 100k tokens
Maximum session percentage
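The four KPIs can be computed from per-call usage records. The record shape assumed here ({"session", "type", "cacheRead", "input", "output"}) mirrors the fields quoted throughout this article; adapt the keys to your own stats dump:

```python
from collections import Counter

def kpis(calls: list[dict]) -> dict:
    """Compute the four tracking KPIs from per-call usage records."""
    def total(c: dict) -> int:
        return c.get("cacheRead", 0) + c.get("input", 0) + c.get("output", 0)

    grand = sum(total(c) for c in calls) or 1
    tool_calls = [c for c in calls if c.get("type") == "toolUse"]
    per_session = Counter()
    for c in calls:
        per_session[c.get("session")] += total(c)
    return {
        # 1. cacheRead / totalTokens
        "cacheRead_ratio": sum(c.get("cacheRead", 0) for c in calls) / grand,
        # 2. toolUse average total tokens per call
        "toolUse_avg_total_per_call":
            sum(total(c) for c in tool_calls) / len(tool_calls) if tool_calls else 0.0,
        # 3. number of calls at or above 100k tokens
        "calls_over_100k": sum(1 for c in calls if total(c) >= 100_000),
        # 4. largest single session's share of the total
        "max_session_share": (max(per_session.values()) if per_session else 0) / grand,
    }
```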
Success Signals
If the optimization is successful, you should see:
A noticeable reduction in calls with 100k+ tokens
A decrease in cacheRead percentage
A decrease in toolUse call weight
A decrease in the dominance of individual sessions
If these metrics do not change, your context policies are still too loose.
Reproducibility Experiment Command
python3 scripts/session_token_breakdown.py 'sessions' \
--include-deleted \
--top 20 \
--outlier-threshold 120000 \
--json-out tmp/session_token_stats_v2.json \
> tmp/session_token_stats_v2.txt
python3 scripts/session_duplicate_waste_analysis.py 'sessions' \
--include-deleted \
--top 20 \
--png-out tmp/session_duplicate_waste.png \
--json-out tmp/session_duplicate_waste.json \
> tmp/session_duplicate_waste.txt
Conclusion
If your agent system appears to be working fine while costs keep rising, check one thing first: are you paying for new inference, or for large-scale replay of old context?
In my case, the majority of costs actually came from context replays.
Once you realize this, the solution becomes clear: Strictly control the data entering long-lived contexts.