Extract
POST /extract
import requests

url = "https://platform.reducto.ai/extract"

payload = {
    "input": "<string>",
    "parsing": {
        "enhance": {
            "agentic": [],
            "summarize_figures": True
        },
        "retrieval": {
            "chunking": { "chunk_mode": "disabled" },
            "embedding_optimized": False,
            "filter_blocks": []
        },
        "formatting": {
            "add_page_markers": False,
            "include": [],
            "merge_tables": False,
            "table_output_format": "dynamic"
        },
        "spreadsheet": {
            "clustering": "accurate",
            "exclude": [],
            "include": [],
            "split_large_tables": {
                "enabled": True,
                "size": 50
            }
        },
        "settings": {
            "embed_pdf_metadata": False,
            "force_url_result": False,
            "ocr_system": "standard",
            "persist_results": False,
            "return_images": [],
            "return_ocr_data": False
        }
    },
    "instructions": {
        "schema": {},
        "system_prompt": "Be precise and thorough."
    },
    "settings": {
        "include_images": False,
        "optimize_for_latency": False,
        "array_extract": False,
        "citations": {
            "enabled": False,
            "numerical_confidence": True
        }
    }
}
headers = {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.json())
Example response:
{
  "usage": {
    "num_pages": 123,
    "num_fields": 123,
    "credits": 123
  },
  "result": "<unknown>",
  "job_id": "<string>",
  "studio_link": "<string>"
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
  • SyncExtractConfig
  • AsyncExtractConfig
input
required

For parse/split/extract pipelines, the URL of the document to be processed. You can provide one of the following:
  1. A publicly available URL
  2. A presigned S3 URL
  3. A reducto:// prefixed URL obtained from the /upload endpoint after directly uploading a document
  4. A jobid:// prefixed URL obtained from a previous /parse invocation

For edit pipelines, this should be a string containing the edit instructions
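The four accepted input forms above can be distinguished by their prefixes. As a rough sketch (the helper name and return labels here are ours, not part of the API):

```python
def classify_input(value: str) -> str:
    """Illustrative classifier for the input forms accepted by /extract."""
    if value.startswith("reducto://"):
        return "upload"           # obtained from the /upload endpoint
    if value.startswith("jobid://"):
        return "prior job"        # obtained from a previous /parse invocation
    if value.startswith(("http://", "https://")):
        return "url"              # a public URL or a presigned S3 URL
    return "edit instructions"    # for edit pipelines, a plain string
```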
parsing
ParseOptions · object

The configuration options for parsing the document. If you are passing in a jobid:// URL for the file, then this configuration will be ignored.

instructions
Instructions · object

The instructions to use for the extraction.
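The schema in the request sample above is empty. As an assumed example (the field names are illustrative, not prescribed by the API), a JSON-Schema-style object can describe the fields to extract:

```python
# Hypothetical extraction instructions: "invoice_number" and "total_amount"
# are example field names, not part of the Reducto API itself.
instructions = {
    "schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "total_amount": {"type": "number"},
        },
        "required": ["invoice_number"],
    },
    "system_prompt": "Be precise and thorough.",
}
```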

settings
ExtractSettings · object

The settings to use for the extraction.

Response

Successful Response

  • V3ExtractResponse
  • AsyncExtractResponse
usage
ExtractUsage · object
required
result
required

The extracted response in your provided schema. This is a list of dictionaries. If disable_chunking is True (default), then it will be a list of length one.
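Because the result is a list of length one by default, a caller can unpack it directly. A minimal sketch, assuming a response body shaped like the example above (the field values are made up for illustration):

```python
# Assumed response body with the default single-chunk result list.
response_body = {
    "usage": {"num_pages": 1, "num_fields": 2, "credits": 1},
    "result": [{"invoice_number": "INV-001", "total_amount": 42.5}],
    "job_id": "job_123",
}

# Unpacking succeeds only when the list has exactly one element,
# which is the default behaviour described above.
[record] = response_body["result"]
```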

job_id
string | null

studio_link
string | null

The link to the studio pipeline for the document.