Claude Code Rate Limit Structure and Fundamentals by Pricing Plan

Chronist Team Chronist Team

Claude Code has API rate limits (RPM/ITPM/OTPM), product-side session limits (reset approximately every 5 hours), and weekly usage allowances. When limits are reached, HTTP 429 (Too Many Requests) is returned, with response headers containing retry-after and anthropic-ratelimit-*. Also, Pro/Max offer expected usage time by model and Extra usage (pay-as-you-go) for continuation after reaching limits.

This article organizes the limit system, differences by plan, 429 error determination and countermeasures, retry design, continued operation after exceeding limits, and how to read logs.

Table of Contents

Organizing the Overall Limit Structure (API Rates and Product Limits)

First, let’s organize the limits you may encounter in Claude Code from the perspectives of application target, symptoms, and confirmation points.

Limit TypeApplication TargetMain MetricsTypical SymptomsMain Confirmation Location
API Rate LimitsMessages APIRPM / ITPM / OTPM429 + retry-after, momentary failuresResponse headers, Console
”Acceleration” LimitsMessages APIShort-term surge detectionFrequent 429s immediately after startupTransmission startup method (avoid surges)
Request Length LimitsAPI requestse.g., 32MB413 request_too_largeError message
Product Usage LimitsClaude (Web/Code/Desktop)5-hour session limits + weekly allowancesTemporary unavailability, waitingPlan-specific guide/settings screen
Batch-specific LimitsMessage Batches APIAsync, count/queueBacklog in bulk processingBatch API status/result retrieval

Understanding this categorization makes it easier to distinguish whether 429 is momentary rate excess on the API side or reaching session or weekly allowances on the product side.

The differences in representative error codes are as follows:

  • 429 indicates client-side limit reached (RPM/ITPM/OTPM).

  • 529 indicates service-side congestion, often resolving upon retry after some time.

Understanding Plan-specific Limits and Usage Images (Pro/Max)

Even within Claude Code, easily reached limits and usage time benchmarks vary by plan. Here’s an organization of representative differences.

AspectProMax 5xMax 20x
Expected message volume (5-hour unit)About 45 (Claude) / Claude Code about 10-40 promptsAbout 225About 900
Weekly expected usage time (Sonnet 4)About 40-80 hoursAbout 140-280 hoursAbout 240-480 hours
Weekly expected usage time (Opus 4)Opus usage limited in ProAbout 15-35 hoursAbout 24-40 hours
Model availabilitySonnet-centricSonnet and Opus availableSonnet and Opus available
Limit reset5-hour session refresh5-hour session refresh5-hour session refresh
Continuation after reaching limitsContinuation possible with Extra usage (pay-as-you-go)Continuation possible with Extra usageContinuation possible with Extra usage

The table’s figures and reset specifications are based on official help documentation benchmarks, with actual results varying based on codebase scale, auto-approval settings, and parallelism. Also note differences in weekly allowances (expected time) by model.

Understanding Shared Usage Allowances Between CLI and Web

Claude Code CLI and Web version (Claude and Claude Code on the web), while differing in execution aspects, share the same plan’s usage allowance.

  • Web version and CLI consume the same contract allowance, with both counted together.

  • More parallel executions mean faster reaching of 5-hour session allowances and weekly usage allowances.

  • When limits are reached, confirmation to switch to Extra usage appears according to settings.

  • Execute immediate-response processing on Web/API, and separate bulk or non-real-time processing to Batches API to avoid pressuring allowances.

Understanding this premise allows incorporating the assumption that “the same allowance decreases regardless of where you execute” as a prerequisite, reducing unexpected bottlenecks during operation.

429 Error Determination and Reading Header Information

429 indicates “rate or token limit reached,” with response headers containing recovery benchmarks.

HTTP 429 Response Example (excerpt)

HTTP/1.1 429 Too Many Requests
retry-after: 17
anthropic-ratelimit-requests-limit: 50
anthropic-ratelimit-requests-remaining: 0
anthropic-ratelimit-input-tokens-remaining: 0
anthropic-ratelimit-output-tokens-remaining: 7600
content-type: application/json
{
"type": "error",
"error": {
"type": "rate_limit_error",
"message": "Requests per minute limit exceeded for model."
}
}

From this response, you can confirm the following:

  • retry-after is the wait time in seconds before retry, with retries earlier than this value likely to continue failing.

  • anthropic-ratelimit-* indicates limit values, remaining amounts, restoration timing, etc., providing material to judge which metric (RPM or ITPM/OTPM) ran out first.

  • Related errors include 529 overloaded_error indicating service overload, and 413 request_too_large indicating request size excess.

In practice, always outputting this header information to logs and being able to immediately determine which threshold was hit speeds up cause identification and recovery.

Operational Design to Prevent Stoppage After Reaching Limits (Utilizing Extra Usage)

When needing to continue business after reaching limits, you can enable Extra usage to switch to pay-as-you-go and continue usage.

Extra usage settings are made from Claude’s settings screen at “Settings > Usage,” controlling monthly caps and whether automatic charging occurs. If you don’t want to move to pay-as-you-go, you can also choose operation that refuses pay-as-you-go usage at the presented confirmation screen and waits for session reset or weekly allowance recovery.

On screen, you can confirm pay-as-you-go usage status and charges, making it easy to understand monthly usage outlook.

Extra Usage Settings

Extra usage applies to both Claude’s regular conversation and Claude Code operations, with usage from both UIs combined.

Additional usage applies to both Claude conversation and Claude Code, counted as combined from both UIs.

Retry Design and Concepts of Waiting, Smoothing, and Idempotency

In retry design, two points are important: “how to wait” and “whether repeating the same process is safe.”

  • Prioritize respecting retry-after values, then adjust retry intervals by combining exponential backoff and jitter.

  • Separate queues by model and purpose, smoothing transmission to avoid acceleration limits from short-term surges.

  • Review max_tokens and input length to prevent ITPM/OTPM estimates from becoming excessive. Consider prompt caching for long texts.

  • Design request IDs and state management to ensure idempotency, preventing side effects when re-executing the same process.

Implementing these while understanding header specifications and rate limit concepts allows minimizing overall business delays while recovering when 429 occurs.

Offloading Non-real-time Processing Through Batch Operations

For processing not requiring immediacy, separating to Message Batches API can improve both stability and costs.

Comparison AspectRegular (Online)Message Batches API
LatencyImmediate (SSE/sync)Async (completes within 24 hours max)
ThroughputSusceptible to rate limit effectsUp to 10,000 items per batch possible
CostRegular pricingHalf price of standard (50% off)
Application ExamplesDialogue, sequential responseBulk report generation, batch summarization/evaluation

For bulk processing, clearly separating online and batch purposes makes it easier to avoid peak hour collisions.

Key Points in Logs and Visualization

Bottleneck identification utilizes both response headers and settings/usage screens.

  • On the application side, always save anthropic-ratelimit-* and retry-after to logs and visualize on dashboards, etc.

  • In settings and usage screens, confirm current consumption status, Extra usage costs, and weekly allowance consumption.

  • Understand where bottlenecks occur by model and time period, reflecting in transmission plans (leveling, model switching, max_tokens review).

Continuing this operation enables quantitative understanding of “which limits,” “at which times,” and “with which models” bottlenecks occur, making it easier to balance costs and performance.

Summary

Claude Code limits consist of API momentary rates (RPM/ITPM/OTPM) and product-side session limits (5-hour reset) + weekly allowances. First, when 429 errors occur, it’s important to check response headers and determine which threshold was reached. For design aspects, assume backoff complying with retry-after and transmission smoothing, review of max_tokens and input length, and model selection optimization as prerequisites, separating non-immediate processing to Message Batches API for safety. If operation requiring no stoppage after reaching limits is needed, appropriately use Extra usage (pay-as-you-go) to continue, running cost control in parallel. Since expected time and switching specifications by plan can be updated, we recommend regularly updating operational policies aligned with the latest official help information.