import requests

url = "https://platform.reducto.ai/parse_async"

# Request body; "<string>" and "<token>" are placeholders to replace.
payload = {
    "input": "<string>",
    "async": {"priority": False},
    "enhance": {
        "agentic": [],
        "summarize_figures": True
    },
    "retrieval": {
        "chunking": {"chunk_mode": "disabled"},
        "filter_blocks": [],
        "embedding_optimized": False
    },
    "formatting": {
        "add_page_markers": False,
        "table_output_format": "dynamic",
        "merge_tables": False,
        "include": []
    },
    "spreadsheet": {
        "split_large_tables": {
            "enabled": True,
            "size": 50
        },
        "include": [],
        "clustering": "accurate",
        "exclude": []
    },
    "settings": {
        "ocr_system": "standard",
        "force_url_result": False,
        "return_ocr_data": False,
        "return_images": [],
        "embed_pdf_metadata": False,
        "persist_results": False
    }
}

headers = {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())

{
    "job_id": "<string>"
}

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
For parse/split/extract pipelines, the URL of the document to be processed. You can provide one of the following:
1. A publicly available URL
2. A presigned S3 URL
3. A reducto:// prefixed URL obtained from the /upload endpoint after directly uploading a document
4. A jobid:// prefixed URL obtained from a previous /parse invocation
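The four accepted forms can be told apart by URL scheme before a request is sent. The following is a minimal sketch of such a client-side check; the helper name `classify_input` and its return labels are hypothetical and not part of the API.

```python
from urllib.parse import urlparse


def classify_input(value: str) -> str:
    """Label the kind of document reference passed as `input`.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    scheme = urlparse(value).scheme.lower()
    if scheme == "reducto":
        return "uploaded"       # obtained from the /upload endpoint
    if scheme == "jobid":
        return "previous_job"   # obtained from an earlier /parse invocation
    if scheme in ("http", "https"):
        return "url"            # public URL or presigned S3 URL
    raise ValueError(f"unsupported input reference: {value!r}")
```

A check like this only catches malformed references early; whether a given URL is actually reachable is still decided by the service.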
For edit pipelines, this should be a string containing the edit instructions.
The configuration options for asynchronous processing (default synchronous).
JSON metadata included in webhook request body. Defaults to None.
If True, attempts to process the job with priority if the user has priority processing budget available; by default, sync jobs are prioritized above async jobs.
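As an illustration of the asynchronous options, the sketch below assembles the `async` object of the request body. The `priority` key appears in the sample request; the `metadata` key name is an assumption based on the description above, and `build_async_options` is a hypothetical helper.

```python
import json


def build_async_options(priority: bool = False, metadata=None) -> dict:
    """Assemble the `async` section of the request body (hypothetical helper)."""
    options = {"priority": bool(priority)}
    if metadata is not None:
        # Fail early if the metadata cannot be sent as JSON.
        json.dumps(metadata)
        # Assumption: the webhook metadata field is named "metadata".
        options["metadata"] = metadata
    return options
```

Omitting `metadata` leaves the field out of the body entirely, which matches the documented default of None.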
Agentic uses vision language models to enhance the accuracy of the output of different types of extraction. This will incur a cost and latency increase.
If True, summarize figures using a small vision language model. Defaults to True.
Choose how to partition chunks. Variable mode chunks by character length and visual context. Section mode chunks by section headers. Page mode chunks according to pages. Page sections mode chunks first by page, then by sections within each page. Disabled returns one single chunk.
Available options: variable, section, page, disabled, block, page_sections
The approximate size of chunks (in characters) that the document will be split into. Defaults to null, in which case the chunk size varies between 250 and 1500 characters.
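The chunking options can be validated before they are placed under `retrieval.chunking`. This is a sketch under assumptions: `chunk_mode` is the key used in the sample request, while `chunk_size` is an assumed key name for the approximate chunk size, and `build_chunking` is a hypothetical helper.

```python
CHUNK_MODES = {"variable", "section", "page", "disabled", "block", "page_sections"}


def build_chunking(chunk_mode: str = "disabled", chunk_size=None) -> dict:
    """Assemble the chunking options (hypothetical helper)."""
    if chunk_mode not in CHUNK_MODES:
        raise ValueError(f"unknown chunk_mode: {chunk_mode!r}")
    chunking = {"chunk_mode": chunk_mode}
    if chunk_size is not None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be a positive number of characters")
        # Assumption: the size field is named "chunk_size".
        chunking["chunk_size"] = chunk_size
    return chunking
```

Leaving `chunk_size` unset keeps the documented default, where the service chooses a variable size.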
A list of block types to filter out from 'content' and 'embed' fields. By default, no blocks are filtered.
Available options: Header, Footer, Title, Section Header, Page Number, List Item, Figure, Table, Key Value, Text, Comment, Signature
If True, use embedding optimized mode. Defaults to False.
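Because block type names contain spaces and capitals, a typo in `filter_blocks` is easy to make. The sketch below checks a list against the block types listed above and removes duplicates; `clean_filter_blocks` is a hypothetical helper, and exact, case-sensitive matching is an assumption.

```python
BLOCK_TYPES = (
    "Header", "Footer", "Title", "Section Header", "Page Number", "List Item",
    "Figure", "Table", "Key Value", "Text", "Comment", "Signature",
)


def clean_filter_blocks(blocks) -> list:
    """Return `blocks` with duplicates removed, rejecting unknown block types."""
    cleaned = []
    for block in blocks:
        if block not in BLOCK_TYPES:
            raise ValueError(f"unknown block type: {block!r}")
        if block not in cleaned:
            cleaned.append(block)  # keep the first occurrence, preserve order
    return cleaned
```

For example, `clean_filter_blocks(["Header", "Footer", "Header"])` yields a list suitable for the `filter_blocks` field of the sample request.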
If True, add page markers to the output. Defaults to False. Useful for extracting data with page-specific information.
The mode to use for table output. Defaults to dynamic, which returns md for simpler tables and html for more complex tables.
Available options: html, json, md, jsonbbox, dynamic, csv
A flag to indicate if consecutive tables with the same number of columns should be merged. Defaults to False.
A list of formatting elements to include in the output.
Available options: change_tracking, highlight, comments, hyperlinks
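The formatting options can be assembled and checked in one place. The sketch below uses the key names from the sample request (`add_page_markers`, `table_output_format`, `merge_tables`, `include`) and the option lists above; `build_formatting` itself is a hypothetical helper.

```python
TABLE_FORMATS = {"html", "json", "md", "jsonbbox", "dynamic", "csv"}
INCLUDE_OPTIONS = {"change_tracking", "highlight", "comments", "hyperlinks"}


def build_formatting(table_output_format: str = "dynamic",
                     merge_tables: bool = False,
                     add_page_markers: bool = False,
                     include=()) -> dict:
    """Assemble the `formatting` section of the request body."""
    if table_output_format not in TABLE_FORMATS:
        raise ValueError(f"unknown table_output_format: {table_output_format!r}")
    unknown = [item for item in include if item not in INCLUDE_OPTIONS]
    if unknown:
        raise ValueError(f"unknown include options: {unknown}")
    return {
        "add_page_markers": add_page_markers,
        "table_output_format": table_output_format,
        "merge_tables": merge_tables,
        "include": list(include),
    }
```

Called with no arguments, the helper reproduces the `formatting` object shown in the sample request.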
Whether to include cell color and formula information in the output.
Available options: cell_colors, formula
In a spreadsheet containing multiple tables, the tables are split apart by default. Accurate mode applies more powerful models for superior accuracy, at 5× the default per-cell rate. Disabling treats the sheet as one large table.
Available options: accurate, fast, disabled
Whether to exclude hidden sheets, rows, or columns in the output.
Available options: hidden_sheets, hidden_rows, hidden_cols
Standard is our best multilingual OCR system. Legacy only supports Germanic languages and is available for backwards compatibility.
Available options: standard, legacy
Force the result to be returned in URL form.
Force the URL to be downloaded as a specific file extension (e.g. .png).
If True, return OCR data in the result. Defaults to False.
Whether to return images for the specified block types. By default, no images are returned.
Available options: figure, table
If True, embed OCR metadata into the returned PDF. Defaults to False.
If True, persist the results indefinitely. Defaults to False.
The timeout for the job in seconds.
Password to decrypt password-protected documents.
Successful Response
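A successful response carries a `job_id` string, as in the sample response. The sketch below shows one way to read it defensively from the decoded JSON body; `extract_job_id` is a hypothetical helper, and how the job is later retrieved is outside the scope of this example.

```python
def extract_job_id(body: dict) -> str:
    """Return the job_id from a decoded response body, or raise ValueError."""
    job_id = body.get("job_id")
    if not isinstance(job_id, str) or not job_id:
        raise ValueError("response did not contain a job_id")
    return job_id
```

With the `requests` call from the sample, this would be used as `extract_job_id(response.json())` after checking the HTTP status, for instance with `response.raise_for_status()`.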