Extract text, tables, and figures from documents using the Reducto API.
This guide walks you through using the Reducto API to parse your first document in about five minutes, extracting structured JSON data that can be passed to LLMs or processed further.
We’ll use a financial statement PDF that contains multiple tables, headers, account summaries, and formatted text. This is the kind of complex document that’s difficult to process manually but straightforward with Reducto. View the sample PDF in Studio or download it directly to follow along.

What we want to extract:
The portfolio value table with beginning and ending values
Account information including account numbers and types
Income summary broken down by tax category
Top holdings with values and percentages
By the end of this guide, you’ll have all of this data in structured JSON that you can use in your application. For structured field extraction (e.g., extracting specific account numbers or values into typed fields), see the /extract endpoint after completing this quickstart.
Now let’s write the code to parse our financial statement. We’ll go through each part step by step.
Python
Node.js
Go
cURL
1
Import the SDK and initialize the client
First, we import the Reducto client. When you create a Reducto() client without passing an API key, it automatically reads from the REDUCTO_API_KEY environment variable you set earlier.
from reducto import Reducto

# The client reads REDUCTO_API_KEY from your environment
client = Reducto()
2
Upload your document
Before parsing, you need to upload the document to Reducto’s servers. The upload() method accepts a file path (a string or pathlib.Path) and returns a reference that you’ll use in the next step. You can download the sample PDF from here.
from pathlib import Path

# Upload the PDF file to Reducto
upload = client.upload(file=Path("fidelity-example.pdf"))
print(f"Uploaded: {upload}")
You can also pass a URL directly to the parse method if your document is already hosted somewhere accessible, like an S3 bucket:
result = client.parse.run(input="https://cdn.reducto.ai/samples/fidelity-example.pdf")
3
Parse the document
Now we call the parse.run() method with the uploaded file reference. This sends the document through Reducto’s processing pipeline, which runs OCR, detects layout, extracts tables, and structures everything into chunks.
# Parse the uploaded document
result = client.parse.run(input=upload)

# Check what we got back
print(f"Job ID: {result.job_id}")
print(f"Pages processed: {result.usage.num_pages}")
print(f"Credits used: {result.usage.credits}")
print(f"Number of chunks: {len(result.result.chunks)}")
4
Access the extracted content
The response contains chunks, which are logical sections of the document. Each chunk has a content field with the full text and a blocks field with individual elements like tables, headers, and paragraphs.
# Loop through each chunk
for i, chunk in enumerate(result.result.chunks):
    print(f"\n=== Chunk {i + 1} ===")
    print(chunk.content[:500])  # First 500 characters

    # Look at individual blocks within this chunk
    for block in chunk.blocks:
        print(f"  [{block.type}] on page {block.bbox.page}")

        # Tables are returned as HTML by default
        if block.type == "Table":
            print(f"  Table content: {block.content[:200]}...")
Each block has a type that tells you what kind of content it is: Title, Section Header, Text, Table, Figure, Key Value, and others. The bbox field contains the bounding box coordinates so you know exactly where on the page this content came from.
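As a sketch of working with these fields downstream, here is a small helper that collects every table block along with its page number. The mock objects below are illustrative stand-ins mirroring the response shape described above (attribute access on chunks, blocks, and bbox), not real API output:

```python
from types import SimpleNamespace

def collect_tables(chunks):
    """Return (page, content) pairs for every Table block in the parsed chunks."""
    tables = []
    for chunk in chunks:
        for block in chunk.blocks:
            if block.type == "Table":
                tables.append((block.bbox.page, block.content))
    return tables

# Illustrative mock data mirroring the response shape (not real API output)
chunks = [
    SimpleNamespace(blocks=[
        SimpleNamespace(type="Section Header", bbox=SimpleNamespace(page=1), content="Portfolio"),
        SimpleNamespace(type="Table", bbox=SimpleNamespace(page=1), content="<table>...</table>"),
    ]),
    SimpleNamespace(blocks=[
        SimpleNamespace(type="Table", bbox=SimpleNamespace(page=2), content="<table>...</table>"),
    ]),
]

print(collect_tables(chunks))  # [(1, '<table>...</table>'), (2, '<table>...</table>')]
```

With a real parse result you would pass result.result.chunks instead of the mock list.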
Complete code:
from pathlib import Path
from reducto import Reducto

client = Reducto()

upload = client.upload(file=Path("fidelity-example.pdf"))
result = client.parse.run(input=upload)

print(f"Processed {result.usage.num_pages} pages")

for chunk in result.result.chunks:
    print(chunk.content)
    for block in chunk.blocks:
        if block.type == "Table":
            print(f"Found table on page {block.bbox.page}")
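Since chunk.content is plain text, assembling an LLM-ready context string is just concatenation. A minimal sketch, using a stand-in Chunk class in place of the SDK's response objects:

```python
def chunks_to_prompt(chunks, max_chars=8000):
    """Join chunk contents into a single context string for an LLM prompt."""
    text = "\n\n".join(chunk.content for chunk in chunks)
    return text[:max_chars]  # crude truncation to stay within a budget

class Chunk:  # stand-in for the SDK's chunk object
    def __init__(self, content):
        self.content = content

chunks = [Chunk("Account Summary: ..."), Chunk("Top Holdings: ...")]
print(chunks_to_prompt(chunks))
```

In practice you would pass result.result.chunks and pick a character budget suited to your model's context window.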
All Node.js examples use await and must be run inside an async function, or in a file with top-level await enabled (ES modules with Node.js 14.8+).
1
Import the SDK and initialize the client
Import the Reducto client and the fs module for reading files. The client automatically uses the REDUCTO_API_KEY environment variable for authentication.
import Reducto from 'reductoai';
import fs from 'fs';

// The client reads REDUCTO_API_KEY from your environment
const client = new Reducto();
2
Upload your document
Use createReadStream to upload the file to Reducto. This returns a reference you’ll use when calling the parse endpoint. You can download the sample PDF from here.
// Upload the PDF file to Reducto
const upload = await client.upload({ file: fs.createReadStream("fidelity-example.pdf") });
console.log(`Uploaded: ${upload}`);
3
Parse the document
Call parse.run() with the uploaded file reference. Reducto processes the document and returns structured content.
// Parse the uploaded document
const result = await client.parse.run({ input: upload });

console.log(`Job ID: ${result.job_id}`);
console.log(`Pages processed: ${result.usage.num_pages}`);
console.log(`Credits used: ${result.usage.credits}`);
console.log(`Number of chunks: ${result.result.chunks.length}`);
4
Access the extracted content
Loop through the chunks and blocks to access the extracted text, tables, and other elements.
// Loop through each chunk
for (let i = 0; i < result.result.chunks.length; i++) {
  const chunk = result.result.chunks[i];
  console.log(`\n=== Chunk ${i + 1} ===`);
  console.log(chunk.content.substring(0, 500)); // First 500 characters

  // Look at individual blocks within this chunk
  for (const block of chunk.blocks) {
    console.log(`  [${block.type}] on page ${block.bbox.page}`);

    if (block.type === "Table") {
      console.log(`  Table content: ${block.content.substring(0, 200)}...`);
    }
  }
}
Complete code:
import Reducto from 'reductoai';
import fs from 'fs';

const client = new Reducto();

async function main() {
  const upload = await client.upload({ file: fs.createReadStream("fidelity-example.pdf") });
  const result = await client.parse.run({ input: upload });

  console.log(`Processed ${result.usage.num_pages} pages`);

  for (const chunk of result.result.chunks) {
    console.log(chunk.content);
    for (const block of chunk.blocks) {
      if (block.type === "Table") {
        console.log(`Found table on page ${block.bbox.page}`);
      }
    }
  }
}

main();
The Go SDK is currently in alpha (v0.1.0-alpha.1). The API may change in future releases.
1
Import the SDK and initialize the client
Import the Reducto client and the option package for configuration. The Go SDK requires you to pass the API key explicitly using option.WithAPIKey().
package main

import (
	"context"
	"fmt"
	"os"

	reducto "github.com/reductoai/reducto-go-sdk"
	"github.com/reductoai/reducto-go-sdk/option"
	"github.com/reductoai/reducto-go-sdk/shared"
)

func main() {
	// Initialize client with API key from environment
	client := reducto.NewClient(option.WithAPIKey(os.Getenv("REDUCTO_API_KEY")))
}
2
Upload your document
Open the file and upload it to Reducto. The upload returns a file ID that you’ll use for parsing. You can download the sample PDF from here.
3
Parse the document
Call Parse.Run() with the file ID. The Go SDK requires you to wrap the file ID with shared.UnionString() and then with reducto.F[...]() because the SDK uses strongly-typed union parameters.
result, err := client.Parse.Run(context.Background(), reducto.ParseRunParams{
	ParseConfig: reducto.ParseConfigParam{
		// The file ID must be wrapped in shared.UnionString() and reducto.F[...]()
		DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
			shared.UnionString(upload.FileID),
		),
	},
})
if err != nil {
	fmt.Printf("Parse error: %v\n", err)
	return
}
fmt.Printf("Job ID: %s\n", result.JobID)
fmt.Printf("Pages: %d\n", result.Usage.NumPages)
// Note: To view in Studio, construct the URL: https://studio.reducto.ai/job/{job_id}
4
Access the extracted content
The result contains chunks with extracted content. The Chunks field is typed as interface{}, so you need to type assert it to []shared.ParseResponseResultFullResultChunk before you can iterate over it. When checking block types, use the SDK constants instead of string comparisons.
if result.Result.Type == shared.ParseResponseResultTypeFull {
	// Type assert Chunks from interface{} to the actual type
	chunks, ok := result.Result.Chunks.([]shared.ParseResponseResultFullResultChunk)
	if ok {
		for _, chunk := range chunks {
			fmt.Println(chunk.Content)
			for _, block := range chunk.Blocks {
				// Use SDK constants for block type comparisons
				if block.Type == shared.ParseResponseResultFullResultChunksBlocksTypeTable {
					fmt.Printf("Found table on page %d\n", block.Bbox.Page)
				}
			}
		}
	}
}
The default settings work well for most documents, but you can customize the parsing behavior for specific use cases.
Python
Node.js
Go
cURL
You can pass configuration options as TypedDict imports from reducto.types or as plain dictionaries:
from reducto.types import EnhanceParam, FormattingParam, SettingsParam

result = client.parse.run(
    input=upload,
    enhance=EnhanceParam(
        # Use AI to clean up OCR errors in scanned documents
        agentic=[{"scope": "text"}],
        # Generate descriptions for charts and images
        summarize_figures=True,
    ),
    formatting=FormattingParam(
        # Get tables as HTML, md, json, or csv
        table_output_format="md"
    ),
    settings=SettingsParam(
        # Only process pages 1-5
        page_range={"start": 1, "end": 5}
    ),
)
You can also pass plain dictionaries instead of TypedDict imports. Both work identically.
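For example, the same configuration expressed as plain dictionaries, with no imports from reducto.types needed:

```python
# The same options as plain dicts; keys mirror the TypedDict fields
parse_options = {
    "enhance": {
        "agentic": [{"scope": "text"}],
        "summarize_figures": True,
    },
    "formatting": {"table_output_format": "md"},
    "settings": {"page_range": {"start": 1, "end": 5}},
}

# Passed the same way as the typed version:
# result = client.parse.run(input=upload, **parse_options)
```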
enhance.agentic: Runs AI-powered cleanup on the specified scope. Use "text" for OCR correction on scanned documents, or "table" to improve table structure detection.
enhance.summarize_figures: Generates natural language descriptions of charts, graphs, and images. Useful for RAG pipelines where you need to search figure content.
formatting.table_output_format: Controls how tables are returned. Options are html, md (markdown), json, csv, dynamic (default, returns markdown for simple tables and HTML for complex ones), or jsonbbox.
settings.page_range: Limits processing to specific pages. Useful for large documents where you only need certain sections.
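Since tables can come back as HTML, here is a sketch of flattening one into rows of cell strings using Python's stdlib html.parser. The table string below is hypothetical, standing in for a block.content value rather than real Reducto output:

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Flatten an HTML table into a list of rows (lists of cell strings)."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Hypothetical table HTML, standing in for a block.content value
html = "<table><tr><th>Holding</th><th>Value</th></tr><tr><td>FXAIX</td><td>$12,000</td></tr></table>"
parser = TableRows()
parser.feed(html)
print(parser.rows)  # [['Holding', 'Value'], ['FXAIX', '$12,000']]
```

For production use, setting table_output_format to "json" or "csv" may save you the parsing step entirely.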
Requests fail with an authentication error
This means your API key is missing or invalid. Check that the REDUCTO_API_KEY environment variable is set correctly and that the key hasn’t expired in Studio.
Tables aren’t structured correctly
Some complex tables need extra help. Enable enhance.agentic with [{"scope": "table"}] for AI-powered table reconstruction, or try formatting.table_output_format set to "html" or "json" for more structured output.
Content is missing or garbled
For scanned documents or low-quality PDFs, enable the agentic text enhancement: enhance.agentic: [{"scope": "text"}]. If the document is password-protected, pass the password in settings.document_password. Garbled output can also be caused by bad metadata in the PDF; if you suspect that, reach out to Reducto support.
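Put together, these troubleshooting fixes are just parse options. A sketch in plain-dict form, assuming multiple agentic scopes can be listed together and using a placeholder password:

```python
# Options combining the troubleshooting fixes above (placeholder password)
troubleshoot_options = {
    "enhance": {"agentic": [{"scope": "text"}, {"scope": "table"}]},
    "formatting": {"table_output_format": "html"},
    "settings": {"document_password": "YOUR_PDF_PASSWORD"},
}

# Applied the same way as any other configuration:
# result = client.parse.run(input=upload, **troubleshoot_options)
```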
Every response includes a studio_link that opens the job in Reducto Studio. Use it to visually inspect what was extracted and debug any issues.