Concurrency Throttle

Reducto enforces two independent limit mechanisms. This page covers the concurrency throttle. For the per-second request rate caps that return 429 at the edge, see Rate Limits.

Mechanism	What it limits	Behavior on exceeded	Returns
Rate limits	Requests per second to the API	Request is rejected at the API edge	`429`
Concurrency throttle	Parse batches running in parallel for your account	Work queues until a slot frees, then runs	`200` (after wait)

If you submit more parse work than your account’s concurrency ceiling allows, Reducto queues the excess rather than rejecting it. You see added latency, not a 4xx error.
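To make the queue-not-reject behavior concrete, here is a small local model of a concurrency throttle built on `asyncio.Semaphore`. It is an illustration only, not Reducto's implementation: `ceiling` stands in for your concurrency ceiling, each task stands in for one parse batch, and `simulate_throttle` and `work_s` are hypothetical names. Batches beyond the ceiling wait for a slot and then run; none are rejected.

```python
import asyncio

async def simulate_throttle(ceiling: int, n_batches: int, work_s: float = 0.05) -> list[float]:
    """Run n_batches tasks through `ceiling` slots; return each task's wait in seconds."""
    slots = asyncio.Semaphore(ceiling)  # one permit per concurrent batch
    loop = asyncio.get_running_loop()
    submitted = loop.time()  # every batch is submitted at the same instant

    async def run_batch() -> float:
        # Blocks (never raises) until a slot frees up: queueing, not rejection.
        async with slots:
            waited = loop.time() - submitted
            await asyncio.sleep(work_s)  # stand-in for the actual parsing work
            return waited

    return await asyncio.gather(*(run_batch() for _ in range(n_batches)))

if __name__ == "__main__":
    waits = asyncio.run(simulate_throttle(ceiling=2, n_batches=6))
    print([round(w, 2) for w in waits])
```

With a ceiling of 2 and 6 batches, two batches start immediately and the other four wait roughly one or two `work_s` intervals before running, which is the added latency described above.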

Your Ceiling

ceiling = earned_base + burst_headroom

earned_base: capacity sized for your sustained recent traffic, starting from your tier baseline.
burst_headroom: short-term slack on top of earned_base so a sudden spike does not immediately queue.

The unit is concurrent batches. Reducto splits a parse job into one or more batches, typically around 10 pages each. A 5-page document runs as a single batch. A 200-page document runs as roughly 20 concurrent batches.
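A rough way to estimate how many concurrent batches a document will occupy, assuming the typical split of about 10 pages per batch described above. Reducto decides the exact split, so treat `estimate_batches` (a hypothetical helper) as a back-of-the-envelope figure only.

```python
import math

# Assumption: "typically around 10 pages" per batch; the real split may differ.
PAGES_PER_BATCH = 10

def estimate_batches(page_count: int, pages_per_batch: int = PAGES_PER_BATCH) -> int:
    """Approximate number of concurrent batches a document of page_count pages uses."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return math.ceil(page_count / pages_per_batch)

print(estimate_batches(5))    # 1
print(estimate_batches(200))  # 20
# A burst of several documents is roughly the sum of their batches.
print(sum(estimate_batches(p) for p in [5, 200, 37]))  # 25
```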

Tier Baselines

The baseline is the starting allocation for earned_base in the region you’re hitting. With little or no recent traffic, your ceiling sits around this baseline; sustained traffic grows it above the baseline. Baselines vary per region because shared compute capacity is sized for the typical regional load.

Tier	US	EU	AU
Standard	200	60	10
Growth	350	120	20
Enterprise	500+ (custom)	275+ (custom)	115+ (custom)

All values are in concurrent batches. The actual cap at any moment is higher than the baseline (burst headroom sits on top of it) and grows further with sustained traffic. Enterprise baselines are negotiable upward; contact sales to discuss. Multi-region customers get the tier’s baseline in each region they hit.
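The table can be encoded as a lookup to get a conservative estimate of how much of a burst might wait at baseline. This is a sketch under stated assumptions: it treats the Enterprise "500+", "275+", and "115+" entries as exactly 500, 275, and 115, assumes nothing else from your account is already running, and ignores burst headroom and earned capacity, both of which only raise the real ceiling. The result is therefore an upper bound on queued batches, not a prediction. `BASELINES`, `baseline`, and `max_batches_queued_at_baseline` are hypothetical names.

```python
# Tier baselines in concurrent batches, per region (from the table above).
BASELINES = {
    "standard": {"US": 200, "EU": 60, "AU": 10},
    "growth": {"US": 350, "EU": 120, "AU": 20},
    # Enterprise values are floors; contracts can raise them.
    "enterprise": {"US": 500, "EU": 275, "AU": 115},
}

def baseline(tier: str, region: str) -> int:
    """Look up the baseline for a tier and region (case-insensitive)."""
    return BASELINES[tier.lower()][region.upper()]

def max_batches_queued_at_baseline(total_batches: int, tier: str, region: str) -> int:
    """Worst-case count of batches that would wait if the ceiling were only the baseline."""
    # The real ceiling is baseline + burst headroom (plus earned capacity),
    # so at most this many batches should actually queue.
    return max(0, total_batches - baseline(tier, region))

# Ten 200-page documents at ~10 pages per batch is ~200 batches.
print(max_batches_queued_at_baseline(200, "standard", "EU"))  # 140
print(max_batches_queued_at_baseline(200, "standard", "US"))  # 0
```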

How Earned Capacity Grows

Submit consistent traffic and your ceiling grows above the baseline. Reducto measures your submission rate over a short trailing window and sizes your ceiling for that rate plus burst headroom. When you stop submitting, the ceiling decays back toward the baseline over the same window. Bursty traffic gets less headroom than steady traffic at the same average rate.
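Because steady traffic earns more headroom than bursty traffic at the same average rate, one client-side option is to spread submissions out instead of firing them all at once. The sketch below starts one submission every `1 / per_second` seconds. `submit_paced` is a hypothetical helper, and `submit` is whatever coroutine you use to send a single request (for example an async parse call). How much extra headroom a given pace actually earns is decided by Reducto's trailing-window sizing, which this page does not quantify.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")

async def submit_paced(
    items: Iterable[T],
    submit: Callable[[T], Awaitable[R]],
    per_second: float,
) -> list[R]:
    """Start one submission every 1/per_second seconds; return results in input order."""
    if per_second <= 0:
        raise ValueError("per_second must be positive")
    interval = 1.0 / per_second
    loop = asyncio.get_running_loop()
    start = loop.time()
    pending = []
    for i, item in enumerate(items):
        # Sleep until this item's scheduled start time, keeping a steady rate.
        delay = start + i * interval - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)
        # Start the submission without waiting for earlier ones to finish.
        pending.append(asyncio.ensure_future(submit(item)))
    return await asyncio.gather(*pending)

if __name__ == "__main__":
    async def fake_submit(x: int) -> int:
        return x * 2  # stand-in for a real API call

    print(asyncio.run(submit_paced([1, 2, 3], fake_submit, per_second=10)))  # [2, 4, 6]
```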

Tenant Throttling

For multi-tenant applications, you can pass settings.tenant_throttling on parse requests to cap how much of your account’s concurrency any one of your own customers, workspaces, or organizations can consume. Tag each request with the tenant it belongs to:

{
  "input": "https://example.com/document.pdf",
  "settings": {
    "tenant_throttling": {
      "tenant_id": "workspace_123",
      "max_share": 0.5
    }
  }
}

tenant_id: your identifier for the tenant. Requests with the same id share one tenant-level throttle inside your account.
max_share: the maximum fraction of your account’s concurrency ceiling this tenant may use, between 0 (exclusive) and 1. Optional; defaults to 0.5. You can pass different values for different tenants, for example 0.2 on a backfill tenant’s requests to keep more headroom for interactive traffic.

Your account-level concurrency throttle still applies first; the tenant throttle only divides capacity inside it. If tenant_throttling is omitted, Reducto uses the existing account-level behavior only.
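A minimal sketch of tagging requests per tenant, assuming you assemble the request body yourself as a dict shaped like the JSON above before sending it. `tenant_request`, `approx_tenant_cap`, the `backfill_job` tenant, and the `archive.pdf` URL are hypothetical. The range check treats 1 as allowed, which is an assumption since the page only says "between 0 (exclusive) and 1". The cap helper just multiplies the share by a ceiling you supply; it does not model how Reducto rounds the result or the fact that your ceiling itself moves with traffic.

```python
from typing import Any, Optional

DEFAULT_MAX_SHARE = 0.5  # documented default when max_share is omitted

def tenant_request(input_url: str, tenant_id: str,
                   max_share: Optional[float] = None) -> dict[str, Any]:
    """Build a parse request body tagged with a tenant_throttling setting."""
    throttling: dict[str, Any] = {"tenant_id": tenant_id}
    if max_share is not None:
        # Assumption: valid values are 0 < max_share <= 1.
        if not 0 < max_share <= 1:
            raise ValueError("max_share must be in (0, 1]")
        throttling["max_share"] = max_share
    return {"input": input_url, "settings": {"tenant_throttling": throttling}}

def approx_tenant_cap(account_ceiling: int, max_share: float = DEFAULT_MAX_SHARE) -> float:
    """Rough number of concurrent batches one tenant may hold under a given ceiling."""
    return account_ceiling * max_share

# Interactive tenant keeps the default share; the backfill tenant is held to 20%.
interactive = tenant_request("https://example.com/document.pdf", "workspace_123")
backfill = tenant_request("https://example.com/archive.pdf", "backfill_job", max_share=0.2)
print(approx_tenant_cap(200))       # 100.0
print(approx_tenant_cap(200, 0.2))  # 40.0
```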

Sync vs Async Under Throttle

Requests above your ceiling do not fail. They queue until a slot frees, then run. The wait surfaces differently depending on endpoint type:

Async (/parse_async, /extract_async, /split_async, /edit_async). The job is accepted immediately, a job_id is returned, and the work is queued. Latency shows up between submission and webhook delivery, never as a 4xx error.
Sync (/parse, /extract, /split, /edit). The HTTP request blocks until a slot opens and the job completes. Total response time includes queue wait. The edge has a 15-minute (900s) hard timeout, so a sustained burst on sync endpoints risks the HTTP connection timing out before the job finishes.

For bursty workloads, use async with webhooks so your client doesn’t hold HTTP connections open during queue wait.

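import asyncio
from reducto import AsyncReducto

async def submit_burst(files: list[str]):
    # Assumes your Reducto API key is configured in the environment.
    client = AsyncReducto()
    # Each async submission returns as soon as the job is accepted,
    # even if the work itself waits in the concurrency queue.
    jobs = await asyncio.gather(*[
        client.parse.run_job(input=f, async_={"webhook": {"mode": "svix"}})
        for f in files
    ])
    # Results arrive later via webhook; keep the job IDs to match them up.
    return [job.job_id for job in jobs]

if __name__ == "__main__":
    print(asyncio.run(submit_burst(["https://example.com/document.pdf"])))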

Related

Rate Limits: Per-second request caps at the API edge.
Async Processing: Submit jobs and receive results via webhook.
Batch Processing: Patterns for processing many documents.
