For 429 responses at the edge, see Rate Limits.
| Mechanism | What it limits | Behavior on exceeded | Returns |
|---|---|---|---|
| Rate limits | Requests per second to the API | Request is rejected at the ingress | 429 |
| Concurrency throttle | Parse batches running in parallel for your account | Work queues until a slot frees, then runs | 200 (after wait) |
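
Because the two mechanisms fail differently, a client can treat them differently: retry on 429, but simply wait for a slow 200. The following is a minimal sketch of that idea, not an official client. `send` is a hypothetical callable that performs one HTTP request and returns its status code, and the backoff parameters are illustrative assumptions.

```python
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a status other than 429 or attempts run out.

    `send` is a hypothetical zero-argument callable that issues one request
    and returns the HTTP status code. Retry counts and delays are assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 429
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            # A 200 may arrive late because the work queued; that is not an error.
            return status
        if attempt < max_attempts - 1:
            # Rejected at the ingress: wait, doubling the delay on each retry.
            sleep(base_delay_s * (2 ** attempt))
    return status
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.
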
Your Ceiling
- earned_base: capacity sized for your sustained recent traffic, starting from your tier baseline.
- burst_headroom: short-term slack on top of earned_base so a sudden spike does not immediately queue.
Tier Baselines
The baseline is the starting allocation for earned_base in the region you’re hitting. With little or no recent traffic, your ceiling sits around this baseline; sustained traffic grows it above. Baselines vary per region because shared compute capacity is sized for the typical regional load.
| Tier | US | EU | AU |
|---|---|---|---|
| Standard | 200 | 60 | 10 |
| Growth | 350 | 120 | 20 |
| Enterprise | 500+ (custom) | 275+ (custom) | 115+ (custom) |
How Earned Capacity Grows
Submit consistent traffic and your ceiling grows above the baseline. Reducto measures your submission rate over a short trailing window and sizes your ceiling for that rate plus burst headroom. When you stop submitting, the ceiling decays back toward the baseline over the same window. Bursty traffic gets less headroom than steady traffic at the same average rate.
Tenant Throttling
For multi-tenant applications, you can pass settings.tenant_throttling on parse requests to bound how much of your account’s concurrency a single one of your own customers, workspaces, or organizations can consume. Tag each request with the tenant it belongs to:
- tenant_id: your identifier for the tenant. Requests with the same id share one tenant-level throttle inside your account.
- max_share: the maximum fraction of your account’s concurrency ceiling this tenant may use, between 0 (exclusive) and 1. Optional; defaults to 0.5. You can pass different values for different tenants, for example 0.2 on a backfill tenant’s requests to keep more headroom for interactive traffic.
If tenant_throttling is omitted, Reducto uses the existing account-level behavior only.
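
As a rough illustration of tagging requests per tenant, the sketch below builds the settings fragment for a parse request body. The nested dictionary shape is an assumption inferred from the dotted name settings.tenant_throttling, the helper names are hypothetical, and the range check reads "between 0 (exclusive) and 1" as 0 < max_share <= 1. `tenant_slot_cap` only shows the arithmetic of a share of the ceiling; how the service rounds or enforces it is not specified here.

```python
from typing import Any, Dict, Optional

DEFAULT_MAX_SHARE = 0.5  # default value stated in the text above


def tenant_throttling_settings(
    tenant_id: str, max_share: Optional[float] = None
) -> Dict[str, Any]:
    """Build a settings fragment tagging a parse request with its tenant.

    The nesting under "settings" is an assumed request shape, not confirmed.
    """
    if not tenant_id:
        raise ValueError("tenant_id must be a non-empty string")
    throttle: Dict[str, Any] = {"tenant_id": tenant_id}
    if max_share is not None:
        # Assumed interpretation: greater than 0 and at most 1.
        if not 0 < max_share <= 1:
            raise ValueError("max_share must be greater than 0 and at most 1")
        throttle["max_share"] = max_share
    return {"settings": {"tenant_throttling": throttle}}


def tenant_slot_cap(ceiling: float, max_share: float = DEFAULT_MAX_SHARE) -> float:
    """Illustrative upper bound on one tenant's concurrency: a share of the ceiling."""
    return ceiling * max_share
```

For example, a backfill tenant could be tagged with `tenant_throttling_settings("backfill", 0.2)`, while interactive tenants omit max_share and fall back to the default.
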
Sync vs Async Under Throttle
Requests above your ceiling do not fail. They queue until a slot frees, then run. The wait surfaces differently depending on endpoint type:
- Async (/parse_async, /extract_async, /split_async, /edit_async). The job is accepted immediately, a job_id returned, and the work queued. Latency shows up between submission and webhook delivery, never as a 4xx.
- Sync (/parse, /extract, /split, /edit). The HTTP request blocks until a slot opens and the job completes. Total response time includes queue wait. The edge has a 15-minute (900s) hard timeout, so a sustained burst on sync endpoints risks the HTTP connection timing out before the job finishes.
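
One way to act on the timeout risk is to route work to the async variant whenever the expected queue wait plus run time approaches the 900s edge limit. The sketch below is a hypothetical client-side heuristic, not something the service does for you: `pick_endpoint`, its estimate parameters, and the 60-second safety margin are all assumptions; only the endpoint names and the 900s limit come from the text above.

```python
EDGE_TIMEOUT_S = 900  # 15-minute hard timeout at the edge, from the text above
OPERATIONS = {"parse", "extract", "split", "edit"}


def pick_endpoint(
    operation: str,
    expected_wait_s: float,
    expected_run_s: float,
    safety_margin_s: float = 60.0,
) -> str:
    """Return the sync path when the job should comfortably finish, else the async path.

    The wait and run estimates are caller-supplied guesses; the margin is an
    arbitrary illustrative default.
    """
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    total_s = expected_wait_s + expected_run_s
    if total_s + safety_margin_s >= EDGE_TIMEOUT_S:
        # Too close to the edge timeout: submit async and wait for the webhook.
        return f"/{operation}_async"
    return f"/{operation}"
```

A caller without good estimates can default to the async endpoints for bulk work, since queue wait there never surfaces as a dropped connection.
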
Related
Rate Limits
Per-second request caps at the API edge.
Async Processing
Submit jobs and receive results via webhook.
Batch Processing
Patterns for processing many documents.