CostPlan – LLM Cost Enforcement Proxy
Open-source circuit breaker for autonomous agent API spend
The Setup
You're running an autonomous coding agent — let's say Moltbot, or Claude Code, or your own agent loop. The agent reads a codebase, plans changes, writes code, runs tests, reads errors, fixes bugs, and repeats. Every iteration is an API call. Every API call costs money.
Here's what a typical agent session looks like in terms of token usage:
| Iteration | Input Tokens | Output Tokens | Cost |
|-----------|--------------|---------------|------|
| 1 | 8,000 | 2,000 | $0.09 |
| 2 | 15,000 | 3,000 | $0.17 |
| 3 | 28,000 | 4,000 | $0.30 |
| 4 | 45,000 | 5,000 | $0.47 |
| 5 | 70,000 | 6,000 | $0.71 |
| ... | ... | ... | ... |
| 20 | 200,000 | 8,000 | $1.88 |
Notice the pattern: input tokens grow every iteration because the agent includes prior context (conversation history, file contents, error logs) in each call. By iteration 20, you're sending 200k input tokens per call.
Sum the cost column over all 20 iterations and that session comes to roughly $20. That's one task. A productive developer might run 30–40 sessions a day. That's $600–$800/day in API costs.
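The compounding is mechanical: every call re-sends the accumulated history, so total tokens sent grow roughly quadratically with iteration count. A toy model makes the session-level volume concrete (the linear context growth here is an illustrative assumption; the table above grows faster still):

```python
# Illustrative model, not CostPlan code: iteration i re-sends the whole
# history, modeled here as base + growth * i input tokens per call.
def total_input_tokens(iterations: int, base: int, growth: int) -> int:
    return sum(base + growth * i for i in range(iterations))

# Across a 20-iteration session the agent sends over 2M input tokens in
# total, even though no single call exceeds 200k.
session_tokens = total_input_tokens(20, 8_000, 10_000)
```

You pay for every one of those re-sent tokens, every iteration.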
Now imagine the agent hits a bug it can't fix. It tries the same approach, gets the same error, retries with a slightly different prompt, gets the same error again. Ten iterations burned on a single failing test. That's $15 wasted in a loop.
The Problem
There's no built-in way to say: "Stop after $10."
The Anthropic API doesn't have a budget parameter. max_tokens limits output length, not cost. There's no session concept. Each API call is independent — the server doesn't know or care that you've already spent $50 today.
Billing alerts exist, but they're:
- Delayed — minutes to hours behind real spend
- Non-blocking — the alert doesn't stop the next call
- Account-level — no per-agent or per-session granularity
If your agent is burning $5/minute in a retry loop, a billing alert that fires hourly won't help.
The Fix: Two Lines
Start the CostPlan proxy with a session budget:
```
costplan proxy --per-call 1.00 --session 10.00 --port 8080
```
Point your agent at it:
```
export ANTHROPIC_BASE_URL=http://localhost:8080
```
That's it. No code changes. No configuration files. The agent doesn't know or care that it's talking to a proxy instead of the real API.
What Happens Now
Normal Operation
The proxy forwards every request to Anthropic, streaming SSE chunks back in real-time. Zero added latency on the response stream. After each response completes, the proxy calculates the actual cost (including cache read and cache creation tokens) and deducts it from the session budget.
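The accounting step can be sketched in a few lines (a minimal sketch with hypothetical names, not CostPlan's actual internals):

```python
# Minimal sketch of post-response budget accounting: price the usage
# reported in the completed response, then deduct it from the session.
class SessionBudget:
    def __init__(self, limit: float):
        self.limit = limit    # e.g. --session 10.00
        self.spent = 0.0

    def charge(self, cost: float) -> float:
        """Deduct the actual cost of a completed call; return what's left."""
        self.spent += cost
        return self.limit - self.spent

    def exhausted(self) -> bool:
        return self.spent >= self.limit

budget = SessionBudget(10.00)
budget.charge(0.09)              # iteration 1 from the table above
remaining = budget.charge(0.17)  # iteration 2: $9.74 left
```

Because the deduction happens after the response has streamed back, the stream itself is never delayed; enforcement only affects the next call.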
Your agent works exactly as before — until it shouldn't.
Budget Approaching
The proxy tracks remaining budget and returns it in response headers:
```
X-Budget-Remaining: 2.35
X-Budget-Session-Total: 10.00
```
If your agent framework reads headers (most don't, but yours could), you can implement graceful wind-down: "You have $2 left, wrap up."
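A wind-down check on those headers takes only a few lines. This sketch is illustrative client code: the header names come from the proxy's responses, but the function and the 25% threshold are arbitrary choices, not part of CostPlan.

```python
# Decide whether to start wrapping up based on CostPlan's budget headers.
def should_wind_down(headers: dict, threshold_fraction: float = 0.25) -> bool:
    if "X-Budget-Remaining" not in headers or "X-Budget-Session-Total" not in headers:
        return False  # proxy not in the path; no budget signal to act on
    remaining = float(headers["X-Budget-Remaining"])
    total = float(headers["X-Budget-Session-Total"])
    return remaining <= total * threshold_fraction

# With $2.35 left of $10.00, it's time to tell the agent to wrap up:
should_wind_down({"X-Budget-Remaining": "2.35", "X-Budget-Session-Total": "10.00"})
```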
Budget Exceeded
When the session budget is exhausted, the next API call gets:
```
HTTP 429 Too Many Requests
```
The agent receives an error. Most agent frameworks handle API errors by stopping or retrying with backoff — either way, the spending stops.
This is the circuit breaker. The session cannot spend more than $10.
Per-Call Protection
The --per-call 1.00 flag catches a different failure mode: a single gigantic call. If your agent accidentally sends a 500k-token context (maybe it read an entire repository into the prompt), the proxy estimates the cost before forwarding and rejects it if over the per-call limit.
The expensive call never reaches Anthropic. You're not charged.
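The pre-flight check has to estimate, because the request hasn't been tokenized or answered yet. Here is one plausible approach (hypothetical names throughout; the ~4 characters/token heuristic and the use of max_tokens as an output bound are assumptions, not CostPlan's documented algorithm):

```python
# Estimate the worst-case cost of a request before forwarding it upstream.
IN_RATE, OUT_RATE = 3.00e-6, 15.00e-6  # Claude Sonnet 4 list prices per token

def estimate_call_cost(body: dict) -> float:
    # Rough heuristic: ~4 characters per token for English prose and code.
    prompt_chars = sum(len(str(m.get("content", ""))) for m in body.get("messages", []))
    est_input_tokens = prompt_chars / 4
    max_output_tokens = body.get("max_tokens", 4096)  # worst case: model uses it all
    return est_input_tokens * IN_RATE + max_output_tokens * OUT_RATE

def admit(body: dict, per_call_limit: float = 1.00) -> bool:
    return estimate_call_cost(body) <= per_call_limit

# A 500k-token context (~2M characters) estimates to about $1.56, so a
# $1.00 per-call limit rejects it before it ever reaches Anthropic:
big = {"messages": [{"role": "user", "content": "x" * 2_000_000}], "max_tokens": 4096}
admit(big)
```

Overestimating slightly is the safe failure mode here: a rejected call costs nothing, while an underestimate lets an expensive call through.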
The Numbers
Here's the same 20-iteration scenario, with CostPlan enforcing a $10 session budget:
| Iteration | Cumulative Cost | Budget Remaining | Status |
|-----------|-----------------|------------------|--------|
| 1 | $0.09 | $9.91 | Forwarded |
| 5 | $1.74 | $8.26 | Forwarded |
| 10 | $5.20 | $4.80 | Forwarded |
| 13 | $8.45 | $1.55 | Forwarded |
| 14 | $9.80 | $0.20 | Forwarded |
| 15 | — | $0.20 | Rejected (429) |
The agent gets 14 productive iterations. The 15th is rejected because the estimated cost exceeds the remaining $0.20. Total spend: $9.80 instead of unbounded.
If the agent was in a retry loop (iterations 10–20 all hitting the same bug), the circuit breaker would have tripped at iteration 13 or 14 instead of letting it burn through $7+ more on hopeless retries.
Cache-Aware Accuracy
A naive cost tracker would look at input tokens and output tokens and multiply by the standard rates. But Anthropic's actual pricing has four token types:
| Token Type | Rate (Claude Sonnet 4) | Description |
|------------|------------------------|-------------|
| Input | $0.003/1K | Regular input tokens |
| Output | $0.015/1K | Generated output tokens |
| Cache Read | $0.0003/1K | Reusing cached context (90% discount) |
| Cache Creation | $0.00375/1K | Writing new context to cache (25% surcharge) |
Agents that use prompt caching (including Claude Code) might have 80% of their input tokens come from cache reads. If you price those at the full input rate, your cost tracker reports three to four times the actual spend, which means your budget limit triggers way too early, killing productive sessions.
CostPlan parses cache token counts from the SSE stream and prices each type correctly. The budget reflects real spend, not estimates.
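The difference the four rates make can be shown with simplified usage fields (the key names here are illustrative shorthand, not the API's exact field names):

```python
# Cache-aware vs naive pricing, using the per-token rates from the table.
RATES = {"input": 3.00e-6, "output": 15.00e-6,
         "cache_read": 0.30e-6, "cache_creation": 3.75e-6}

def cache_aware_cost(usage: dict) -> float:
    return sum(usage.get(kind, 0) * rate for kind, rate in RATES.items())

def naive_cost(usage: dict) -> float:
    # Mistake: price every input-side token at the full input rate.
    input_side = (usage.get("input", 0) + usage.get("cache_read", 0)
                  + usage.get("cache_creation", 0))
    return input_side * RATES["input"] + usage.get("output", 0) * RATES["output"]

# A cache-heavy call: 80% of input-side tokens are cache reads.
usage = {"input": 20_000, "cache_read": 80_000, "output": 2_000}
actual = cache_aware_cost(usage)  # $0.114
naive = naive_cost(usage)         # $0.33, nearly 3x the real spend
```

A budget enforced against the naive figure would trip long before the real money ran out.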
When To Use This
You should use CostPlan if:
- You're running autonomous agents (Moltbot, Claude Code, custom loops)
- You're letting LLMs run unattended (CI/CD, batch jobs, overnight tasks)
- You're building multi-agent systems where total cost is hard to predict
- You want to give team members API access without unlimited spend
You don't need CostPlan if:
- You're making a few manual API calls per day
- Your LLM usage is already predictable and low-cost
- You're fine with billing alerts and manual monitoring
The Broader Point
The AI ecosystem has great tools for building agents. It has almost no tools for constraining agents. The assumption seems to be that developers will be careful, that usage will be reasonable, that loops will terminate.
In production, that assumption costs real money.
CostPlan is infrastructure, not a feature. It's the ulimit for LLM spend — boring, invisible, and exactly the thing you wish you'd had the morning after your agent burned $200 overnight.
CostPlan is open source and takes two lines to set up. Get started on GitHub.

