Extract Response Format

Extract returns your extracted data as structured JSON matching your schema. The response format differs depending on whether citations are enabled.

Response Structure

Without Citations (Default)

When citations are disabled (the default), result is an array of objects that hold your extracted values directly:

{
  "job_id": "9531166f-9725-4854-8096-459785a33972",
  "result": [
    {
      "invoice_number": "INV-2024-001",
      "total": 1575.00,
      "line_items": [
        {
          "description": "Professional Services",
          "quantity": 10,
          "amount": 1500.00
        },
        {
          "description": "Materials",
          "quantity": 1,
          "amount": 75.00
        }
      ]
    }
  ],
  "usage": {
    "num_pages": 1,
    "num_fields": 8,
    "credits": 8.0
  },
  "studio_link": "https://studio.reducto.ai/job/9531166f-..."
}

Top-Level Fields

Field	Type	Description
`job_id`	string	Unique identifier for this extraction job. Use this to retrieve results later or reference in support requests.
`result`	array or object	Without citations: an array containing your extracted data. With citations: an object with wrapped values.
`usage.num_pages`	integer	Number of document pages processed.
`usage.num_fields`	integer	Total number of fields extracted, including nested fields in arrays.
`usage.credits`	number	Credits consumed for this extraction.
`studio_link`	string	Link to view and debug this extraction in Reducto Studio.
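
Because result can be either an array or an object, client code may want to check its shape before reading values. The following is a minimal sketch, assuming the response has already been decoded from JSON into a Python dict. The helper name unwrap_result is hypothetical and is not part of any Reducto SDK.

```python
import json


def unwrap_result(response: dict):
    """Return (records, has_citations) for a decoded Extract response.

    Assumption: a list means citations were disabled, and a dict means
    citations were enabled, as described in the table above.
    """
    result = response["result"]
    if isinstance(result, list):
        return result, False
    # With citations the result is a single object; wrap it in a list so
    # callers can iterate over records in both cases.
    return [result], True


raw = '{"job_id": "example", "result": [{"invoice_number": "INV-2024-001", "total": 1575.0}]}'
records, has_citations = unwrap_result(json.loads(raw))
print(has_citations, records[0]["total"])
```

Normalizing both shapes to a list of records keeps the rest of the calling code identical whether or not citations are enabled.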

Accessing Values

Without Citations

When citations are disabled, access values directly from the result array:

# Access the first (usually only) result object
data = result.result[0]

# Access scalar fields directly
invoice_number = data["invoice_number"]
total = data["total"]

# Access array items
for item in data["line_items"]:
    print(f"{item['description']}: ${item['amount']}")

With Citations

When citations are enabled, values are wrapped in objects with value and citations fields:

# With citations, result is a dict (not an array)
invoice_number = result.result["invoice_number"]["value"]
total = result.result["total"]["value"]

# Access array items; each field inside an item is wrapped the same way
for item in result.result["line_items"]:
    print(f"{item['description']['value']}: ${item['amount']['value']}")

When a field cannot be extracted, it may appear as null or be absent entirely, depending on whether it was marked as required in your schema.
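
Since a field may be null or missing, and may or may not be wrapped, a small accessor can hide those differences. This is a sketch under stated assumptions: field_value is a hypothetical helper, and it treats any dict containing both value and citations keys as a citation wrapper, which would misfire if your own schema defines an object with those two key names.

```python
def field_value(record: dict, name: str, default=None):
    """Return the extracted value for `name`, or `default` if null or absent.

    Handles both plain values (citations disabled) and wrapped values
    of the form {"value": ..., "citations": [...]} (citations enabled).
    """
    field = record.get(name)
    if field is None:
        # Covers both an absent key and an explicit null.
        return default
    if isinstance(field, dict) and "value" in field and "citations" in field:
        value = field["value"]
        return default if value is None else value
    return field


plain = {"invoice_number": "INV-2024-001", "total": None}
wrapped = {"invoice_number": {"value": "INV-2024-001", "citations": []}}

print(field_value(plain, "invoice_number"))
print(field_value(plain, "total", default=0.0))
print(field_value(wrapped, "invoice_number"))
print(field_value(wrapped, "due_date", default="unknown"))
```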

Citations

When settings.citations.enabled is true, the response format changes. The result becomes an object (not an array), and each value is wrapped with citation data:

{
  "result": {
    "total": {
      "value": 1575.00,
      "citations": [
        {
          "type": "Table",
          "content": "Total Due: $1,575.00",
          "bbox": {
            "left": 0.65,
            "top": 0.82,
            "width": 0.25,
            "height": 0.03,
            "page": 1,
            "original_page": 1
          },
          "confidence": "high",
          "granular_confidence": {
            "extract_confidence": 0.95,
            "parse_confidence": 0.91
          },
          "parentBlock": {
            "type": "Table",
            "content": "Invoice Total\nTotal Due: $1,575.00",
            "bbox": {"left": 0.60, "top": 0.78, "width": 0.35, "height": 0.08, "page": 1}
          }
        }
      ]
    }
  }
}

Citation Fields

Field	Description
`type`	Block type where the value was found: `Text`, `Table`, `Key Value`, etc.
`content`	The source text from which the value was extracted. May differ slightly from the extracted value due to formatting normalization.
`bbox`	Bounding box coordinates for the source location.
`confidence`	Overall confidence as `"high"` or `"low"`.
`granular_confidence`	Detailed confidence breakdown with `extract_confidence` (0-1) and `parse_confidence` (0-1).
`parentBlock`	The larger Parse block containing this citation. Useful for context when the citation is very granular.

Bounding Box Coordinates

All coordinates are normalized to the range [0, 1] relative to page dimensions:

Field	Description
`left`	Distance from the left edge. 0 is the left edge of the page, 1 is the right edge.
`top`	Distance from the top edge. 0 is the top, 1 is the bottom.
`width`	Width as a fraction of page width.
`height`	Height as a fraction of page height.
`page`	Page number (1-indexed) in the processed document.
`original_page`	Page number in the original document. Differs from `page` when using `page_range` to process a subset.

To convert to pixel coordinates, multiply by the page dimensions:

# If your page is rendered at 612x792 pixels (US Letter at 72 DPI)
bbox = citation["bbox"]
pixel_left = bbox["left"] * 612
pixel_top = bbox["top"] * 792
pixel_width = bbox["width"] * 612
pixel_height = bbox["height"] * 792
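
Image libraries often expect a box as (left, top, right, bottom) rather than left/top/width/height. The sketch below generalizes the conversion to any rendered page size. The function name bbox_to_pixel_box is hypothetical, and the page dimensions are assumed to come from whatever renderer you use; they are not part of the Extract response.

```python
def bbox_to_pixel_box(bbox: dict, page_width: float, page_height: float):
    """Convert a normalized bbox to integer (left, top, right, bottom) pixels."""
    left = bbox["left"] * page_width
    top = bbox["top"] * page_height
    right = (bbox["left"] + bbox["width"]) * page_width
    bottom = (bbox["top"] + bbox["height"]) * page_height
    return tuple(round(v) for v in (left, top, right, bottom))


# Bounding box taken from the "total" citation shown earlier on this page.
total_bbox = {"left": 0.65, "top": 0.82, "width": 0.25, "height": 0.03, "page": 1}
print(bbox_to_pixel_box(total_bbox, 612, 792))
```

Passing the page size as arguments avoids hard-coding 612x792, which only holds for a US Letter page rendered at 72 DPI.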

Array Citations

For array fields, each item in the array has its own citations. The structure mirrors the data:

{
  "line_items": [
    {
      "description": {
        "value": "Professional Services",
        "citations": [{"bbox": {...}, "content": "Professional Services", ...}]
      },
      "amount": {
        "value": 1500.00,
        "citations": [{"bbox": {...}, "content": "$1,500.00", ...}]
      }
    },
    {
      "description": {
        "value": "Materials",
        "citations": [{"bbox": {...}, "content": "Materials", ...}]
      },
      "amount": {
        "value": 75.00,
        "citations": [{"bbox": {...}, "content": "$75.00", ...}]
      }
    }
  ]
}

Each field within each array item has its own citation pointing to where that specific value was found.
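
Because the citation structure mirrors the data, a recursive walk can visit every wrapped field regardless of nesting depth. The following is a minimal sketch: iter_cited_fields is a hypothetical helper, and it assumes any dict with both value and citations keys is a wrapped leaf.

```python
def iter_cited_fields(node, path=""):
    """Yield (path, value, citations) for every citation-wrapped field."""
    if isinstance(node, dict):
        if "value" in node and "citations" in node:
            yield path, node["value"], node["citations"]
            return
        for key, child in node.items():
            child_path = f"{path}.{key}" if path else key
            yield from iter_cited_fields(child, child_path)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from iter_cited_fields(child, f"{path}[{index}]")


sample = {
    "total": {"value": 1575.0, "citations": [{"content": "Total Due: $1,575.00"}]},
    "line_items": [
        {
            "description": {"value": "Materials", "citations": [{"content": "Materials"}]},
            "amount": {"value": 75.0, "citations": [{"content": "$75.00"}]},
        }
    ],
}

for field_path, value, citations in iter_cited_fields(sample):
    print(field_path, value, [c["content"] for c in citations])
```

The generated paths, such as line_items[0].amount, are only a display convention chosen for this example.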

Spreadsheet Citations

Excel and other spreadsheet formats use a different coordinate system because they have cells, not continuous pages.

Coordinate Differences

Aspect	PDFs/Images	Spreadsheets
Coordinate system	Normalized 0-1 range	Cell positions (1-indexed)
`left`	Fraction of page width	Column number (1 = A, 2 = B, etc.)
`top`	Fraction of page height	Row number
`width`	Fraction of page width	Number of columns spanned
`height`	Fraction of page height	Number of rows spanned
`page`	Page number	Sheet index (1 = first sheet)

Example Spreadsheet Citation

{
  "bbox": {
    "left": 2,       // Column B
    "top": 5,        // Row 5
    "width": 1,      // Single column
    "height": 1,     // Single row
    "page": 1,       // First sheet
    "original_page": 1
  }
}

This citation points to cell B5 on the first sheet. The coordinates map directly to Excel’s A1 notation, making it straightforward to locate the source cell programmatically.
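
The mapping to A1 notation can be sketched as follows. Both function names are hypothetical, and the range calculation assumes that width and height count the columns and rows spanned, so the last cell is at left + width - 1 and top + height - 1.

```python
def column_letters(col: int) -> str:
    """Convert a 1-indexed column number to letters (1 -> A, 27 -> AA)."""
    if col < 1:
        raise ValueError("column numbers are 1-indexed")
    letters = ""
    while col > 0:
        # Subtract 1 first because A1 columns have no zero digit.
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def bbox_to_a1(bbox: dict) -> str:
    """Return an A1 reference (single cell or range) for a spreadsheet bbox."""
    left, top = int(bbox["left"]), int(bbox["top"])
    width, height = int(bbox.get("width", 1)), int(bbox.get("height", 1))
    start = f"{column_letters(left)}{top}"
    if width <= 1 and height <= 1:
        return start
    end = f"{column_letters(left + width - 1)}{top + height - 1}"
    return f"{start}:{end}"


print(bbox_to_a1({"left": 2, "top": 5, "width": 1, "height": 1, "page": 1}))
print(bbox_to_a1({"left": 2, "top": 5, "width": 2, "height": 3, "page": 1}))
```

The page field (sheet index) is left out of the reference here; resolving it to a sheet name requires the workbook itself.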

Confidence Scores

Confidence indicates how certain the extraction is about a value. Each citation includes both summary and detailed confidence information.

Summary Confidence

The confidence field provides a quick assessment:

"confidence": "high"

Values are either "high" or "low" based on internal thresholds.

Granular Confidence

The granular_confidence object provides detailed numerical scores:

"granular_confidence": {
  "extract_confidence": 0.95,
  "parse_confidence": 0.91
}

Score	Description
`extract_confidence`	How confident the extraction LLM is about this value (0-1). May be `null` for array items.
`parse_confidence`	How confident the parsing stage was about the source text (0-1). Reflects OCR and layout detection quality.

Use granular confidence when you need to set custom thresholds or debug extraction issues. Low parse_confidence suggests the source document may have OCR or layout problems. Low extract_confidence suggests the schema description may need refinement.
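
One way to apply custom thresholds is to flag citations for manual review. This is a sketch only: needs_review is a hypothetical helper, the 0.9 cutoffs are arbitrary placeholders rather than recommended values, and a null extract_confidence is skipped rather than treated as low.

```python
def needs_review(citation: dict, min_extract: float = 0.9, min_parse: float = 0.9) -> bool:
    """Return True if a citation falls below the given confidence thresholds."""
    scores = citation.get("granular_confidence") or {}
    extract = scores.get("extract_confidence")
    parse = scores.get("parse_confidence")
    # A score of None (for example, extract_confidence on array items)
    # is ignored instead of being counted as a failure.
    if extract is not None and extract < min_extract:
        return True
    if parse is not None and parse < min_parse:
        return True
    # Fall back to the summary label when the numeric scores pass or are missing.
    return citation.get("confidence") == "low"


citation = {
    "confidence": "high",
    "granular_confidence": {"extract_confidence": None, "parse_confidence": 0.85},
}
print(needs_review(citation))
```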

Usage and Credits

The usage object shows what was processed and what it cost:

{
  "usage": {
    "num_pages": 3,
    "num_fields": 24,
    "credits": 12.0
  }
}

Field	Description
`num_pages`	Document pages that were processed. Affected by `page_range` settings.
`num_fields`	Total leaf fields extracted. A schema with 5 scalar fields and an array of 10 objects with 2 fields each would report 25 fields.
`credits`	Credits charged. Based on pages processed plus complexity factors like agentic modes and latency optimization.
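
The num_fields arithmetic can be reproduced locally by counting leaves in a result without citations. This is an illustrative sketch, not the billing logic itself: count_leaf_fields is a hypothetical helper, and how the service counts null or missing fields is not specified here.

```python
def count_leaf_fields(node) -> int:
    """Count scalar leaves in a nested structure of dicts and lists."""
    if isinstance(node, dict):
        return sum(count_leaf_fields(child) for child in node.values())
    if isinstance(node, list):
        return sum(count_leaf_fields(child) for child in node)
    return 1


# 5 scalar fields plus an array of 10 objects with 2 fields each.
record = {f"field_{i}": i for i in range(5)}
record["rows"] = [{"name": f"row {i}", "amount": i} for i in range(10)]

print(count_leaf_fields([record]))  # 5 + 10 * 2 = 25
```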

Credit calculation varies based on:

Number of pages processed
Whether agentic parsing modes were used
Whether optimize_for_latency was enabled (2x multiplier)
Spreadsheet complexity (cell count for Excel files)

See Credit Usage for detailed pricing.

Complete Example

Full response with citations enabled

{
  "job_id": "543d1950-068c-4e38-981d-98903326b554",
  "result": {
    "invoice_number": {
      "value": "INV-2024-001",
      "citations": [
        {
          "type": "Text",
          "content": "Invoice #INV-2024-001",
          "bbox": {"left": 0.70, "top": 0.08, "width": 0.20, "height": 0.02, "page": 1, "original_page": 1},
          "confidence": "high",
          "granular_confidence": {"extract_confidence": 0.98, "parse_confidence": 0.95}
        }
      ]
    },
    "date": {
      "value": "2024-01-15",
      "citations": [
        {
          "type": "Text",
          "content": "Date: January 15, 2024",
          "bbox": {"left": 0.70, "top": 0.11, "width": 0.15, "height": 0.02, "page": 1, "original_page": 1},
          "confidence": "high",
          "granular_confidence": {"extract_confidence": 0.96, "parse_confidence": 0.94}
        }
      ]
    },
    "total": {
      "value": 1575.00,
      "citations": [
        {
          "type": "Table",
          "content": "Total: $1,575.00",
          "bbox": {"left": 0.75, "top": 0.85, "width": 0.15, "height": 0.02, "page": 1, "original_page": 1},
          "confidence": "high",
          "granular_confidence": {"extract_confidence": 0.97, "parse_confidence": 0.91}
        }
      ]
    },
    "line_items": [
      {
        "description": {
          "value": "Professional Services",
          "citations": [
            {
              "type": "Table",
              "content": "Professional Services",
              "bbox": {"left": 0.10, "top": 0.45, "width": 0.35, "height": 0.02, "page": 1, "original_page": 1},
              "confidence": "high",
              "granular_confidence": {"extract_confidence": null, "parse_confidence": 0.93}
            }
          ]
        },
        "amount": {
          "value": 1500.00,
          "citations": [
            {
              "type": "Table",
              "content": "$1,500.00",
              "bbox": {"left": 0.78, "top": 0.45, "width": 0.12, "height": 0.02, "page": 1, "original_page": 1},
              "confidence": "high",
              "granular_confidence": {"extract_confidence": null, "parse_confidence": 0.93}
            }
          ]
        }
      },
      {
        "description": {
          "value": "Materials",
          "citations": [
            {
              "type": "Table",
              "content": "Materials",
              "bbox": {"left": 0.10, "top": 0.48, "width": 0.20, "height": 0.02, "page": 1, "original_page": 1},
              "confidence": "high",
              "granular_confidence": {"extract_confidence": null, "parse_confidence": 0.91}
            }
          ]
        },
        "amount": {
          "value": 75.00,
          "citations": [
            {
              "type": "Table",
              "content": "$75.00",
              "bbox": {"left": 0.78, "top": 0.48, "width": 0.10, "height": 0.02, "page": 1, "original_page": 1},
              "confidence": "high",
              "granular_confidence": {"extract_confidence": null, "parse_confidence": 0.91}
            }
          ]
        }
      }
    ]
  },
  "usage": {
    "num_pages": 1,
    "num_fields": 8,
    "credits": 6.0
  },
  "studio_link": "https://studio.reducto.ai/job/543d1950-068c-4e38-981d-98903326b554"
}

Related

Extract Overview: Quick start and parameters.
Citations Guide: Working with source locations.
Array Extraction: Handle long documents with repeating data.
Best Practices: Schema design and prompt tips.
