When the Docs Fall Short: Investigating Claude Code’s Budget Cap

Controversial or not, I use Claude Code quite heavily for testing in my day job. In fact, I designed an internal AI-powered static analysis tool. Claude Code has its pros and cons, which I’m not going to cover today. Instead, I am going to cover something that is not well documented and that I had to test myself to figure out.

Claude Code

For those who don’t know, Claude Code is a command line utility which is essentially a harness and interface to Anthropic’s Claude models. It allows you to use Claude in a codebase for whatever task you need (with caveats I won’t get into today). There is also a “Claude Code” on their website and in the GUI application, but it is the CLI I am focusing on today.

Now, there are two ways to use Claude Code. The first is fixed billing, which comes in tiers of $20, $100 or $200 per month. This gives you a maximum amount of usage, with limits that reset every 5 hours and every week. Priced at API rates, this allowance is typically worth much more than what you are paying.

The second is API billing, which charges you for the tokens each request uses. This is what is used when you go over your fixed usage, and it is what enterprise plans typically use.

Now, the point of this blog post. Claude Code has a non-interactive mode in addition to the interactive mode. The non-interactive mode is very useful for scripting and CI usage. Obviously, in this mode, you don’t want it to run away with your API usage and blow lots of money for no good reason. This is why the non-interactive mode has the option --max-budget-usd.

Max budget option

At the time of writing, the documentation for this option says:

Maximum dollar amount to spend on API calls before stopping (print mode only)

OK, that is great, but then what? Does the caller get an error? How is it determined when the maximum is hit? I asked Claude itself and it didn’t know. I searched around the web and couldn’t find any information.

Therefore, I’m doing today what I always do when I find documentation missing: I research it and write it myself.

I could look at the recent Claude Code source leaks that have been shared everywhere, but that probably wouldn’t be legal and certainly wouldn’t be ethical. So, instead, I decided to test it to see what would happen. I should note that this works whether you are using fixed billing or API billing. The costs are absorbed automatically when you are using fixed billing.

claude -p --max-budget-usd 0.01 --output-format json "write me a 2000 word essay on bees" | jq

This is the response I got (I’ve blanked out session identifiers):

{
  "type": "result",
  "subtype": "error_max_budget_usd",
  "duration_ms": 3807,
  "duration_api_ms": 0,
  "is_error": true,
  "num_turns": 1,
  "stop_reason": "end_turn",
  "session_id": "XXXXX",
  "total_cost_usd": 0.08440724999999999,
  "usage": {
    "input_tokens": 0,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "output_tokens": 0,
    "server_tool_use": {
      "web_search_requests": 0,
      "web_fetch_requests": 0
    },
    "service_tier": "standard",
    "cache_creation": {
      "ephemeral_1h_input_tokens": 0,
      "ephemeral_5m_input_tokens": 0
    },
    "inference_geo": "",
    "iterations": [],
    "speed": "standard"
  },
  "modelUsage": {
    "claude-opus-4-7[1m]": {
      "inputTokens": 6,
      "outputTokens": 48,
      "cacheReadInputTokens": 20117,
      "cacheCreationInputTokens": 11699,
      "webSearchRequests": 0,
      "costUSD": 0.08440724999999999,
      "contextWindow": 1000000,
      "maxOutputTokens": 64000
    }
  },
  "permission_denials": [],
  "fast_mode_state": "off",
  "uuid": "XXXXX",
  "errors": [
    "Reached maximum budget ($0.01)"
  ]
}

In addition to this, the process exit code was 1.
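For scripting, the exit code and the "subtype" field are enough to tell a budget stop apart from other failures. Here is a minimal Python sketch, assuming the JSON shape shown above stays stable, which the documentation does not promise. The helper names classify_result and run_with_budget are hypothetical. The sketch reads the exit code from claude directly, because in a shell pipeline like the one above, $? would normally report the status of jq rather than claude.

```python
import json
import subprocess

# Subtype observed in the output above; assumed to be stable across versions.
BUDGET_SUBTYPE = "error_max_budget_usd"


def classify_result(stdout_text, returncode):
    """Classify one print-mode run as 'ok', 'budget_exceeded' or 'error'.

    Returns a (status, total_cost_usd) tuple; the cost is None when unknown.
    """
    try:
        result = json.loads(stdout_text)
    except json.JSONDecodeError:
        result = None
    if not isinstance(result, dict):
        # No usable JSON object: fall back to the exit code alone.
        return ("ok" if returncode == 0 else "error"), None

    cost = result.get("total_cost_usd")
    if result.get("subtype") == BUDGET_SUBTYPE:
        return "budget_exceeded", cost
    if result.get("is_error") or returncode != 0:
        return "error", cost
    return "ok", cost


def run_with_budget(prompt, max_budget_usd):
    """Run claude in print mode with the same flags as the command above."""
    proc = subprocess.run(
        ["claude", "-p", "--max-budget-usd", str(max_budget_usd),
         "--output-format", "json", prompt],
        capture_output=True,
        text=True,
    )
    return classify_result(proc.stdout, proc.returncode)
```

Keeping the parsing in classify_result, separate from the subprocess call, means the logic can be checked against saved output without spending anything.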

Now, there is a lot we can figure out from this response. First off, we do get an error, and it is clear what the error code and message are, which I think is very useful information. But I think we can also infer from it how the maximum usage is being worked out. This is my theory:

Every prompt or turn from Claude Code to the API server returns a spend. Claude Code adds these up and, if the total goes over the limit, returns the error. This means that you could go well over the limit before it is detected: in the run above, the reported cost was about $0.084 against a $0.01 cap, more than eight times the limit. But to be fair, this makes sense. It would be extremely difficult and costly to figure this out on the fly at the server side for accurate capping.
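To make the theory concrete, here is a small Python model of it. It is a sketch of my guess, not Claude Code’s actual implementation: the function name run_turns is hypothetical, and whether the real comparison is "greater than" or "greater than or equal to" is an assumption.

```python
def run_turns(turn_costs_usd, max_budget_usd):
    """Model of the theory: a turn's cost is only known after it has run."""
    total = 0.0
    for turns_done, cost in enumerate(turn_costs_usd, start=1):
        total += cost  # the spend arrives once the turn is already paid for
        if total >= max_budget_usd:  # assumed comparison, see the note above
            return {"stopped": True, "turns": turns_done, "total_cost_usd": total}
    return {"stopped": False, "turns": len(turn_costs_usd), "total_cost_usd": total}


# The run above: a single turn costing about $0.084 against a $0.01 cap.
outcome = run_turns([0.08440724999999999], 0.01)
overshoot = outcome["total_cost_usd"] / 0.01
print(outcome, f"overshoot: {overshoot:.1f}x")
```

In this model, the cap can never stop a turn that is already in flight, so the smallest possible overshoot is whatever the most expensive single turn costs.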

It is also worth noting that it is much cheaper the second time you run that same command and prompt, likely due to cache usage.
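The token counts in the modelUsage block give a rough way to check the cache explanation. The Python sketch below uses assumed per-million-token prices. They do not appear anywhere in the output; I picked them because they reproduce the reported costUSD, so treat them as an assumption and check current pricing before relying on them. The "warm" figure is purely hypothetical: it reprices the same token counts as if everything written to the cache on the first run were read from it instead.

```python
# Assumed USD prices per million tokens. These are NOT in the output above;
# they are an assumption that happens to reproduce the reported costUSD.
ASSUMED_RATES = {
    "inputTokens": 5.00,
    "outputTokens": 25.00,
    "cacheReadInputTokens": 0.50,
    "cacheCreationInputTokens": 6.25,
}

# Token counts copied from the modelUsage block above.
USAGE = {
    "inputTokens": 6,
    "outputTokens": 48,
    "cacheReadInputTokens": 20117,
    "cacheCreationInputTokens": 11699,
}


def cost_breakdown(usage, rates):
    """Return the USD cost of each token category."""
    return {name: count * rates[name] / 1_000_000 for name, count in usage.items()}


breakdown = cost_breakdown(USAGE, ASSUMED_RATES)
total = sum(breakdown.values())
cache_write_share = breakdown["cacheCreationInputTokens"] / total

# Hypothetical warm run: same token counts, but the cache writes become reads.
warm_usage = dict(
    USAGE,
    cacheCreationInputTokens=0,
    cacheReadInputTokens=USAGE["cacheReadInputTokens"] + USAGE["cacheCreationInputTokens"],
)
warm_total = sum(cost_breakdown(warm_usage, ASSUMED_RATES).values())

print(f"total: ${total:.6f}")
print(f"cache writes: {cache_write_share:.0%} of total")
print(f"hypothetical warm-cache total: ${warm_total:.6f}")
```

Under those assumed rates, cache writes make up most of the first run’s cost, which is consistent with a repeat run being much cheaper.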

I hope this is useful to someone, and I hope Anthropic eventually create better documentation for the non-interactive mode, because it is extremely useful.

Full disclosure: this blog post was written by me, but the cover image was AI generated.