OCR — Document Recognition

Overview

The OCR service converts documents into Markdown text and supports interchangeable recognition providers.

Available Providers:

| Provider | Type | Default | Structured Extraction | Description |
|---|---|---|---|---|
| Surya OCR | Local (GPU) | Yes | No | GPU-accelerated local OCR; no cloud service, no API costs |
| Mistral OCR 3 | Cloud API | No | Yes (JSON Schema) | Mistral AI cloud OCR with structured data extraction |

  • Prefix: /api/v1/ocr/
  • Permissions: ocr.read, ocr.process
  • Provider Architecture: Abstract Factory — new providers can be added without API changes

Supported Formats

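| Format | Surya | Mistral |
|---|---|---|
| PDF | Yes | Yes |
| PNG | Yes | Yes |
| JPEG | Yes | Yes |
| WebP | Yes | Yes |
| GIF | Yes | Yes |
| BMP | Yes | Yes |
| TIFF | Yes | Yes |
| AVIF | No | Yes |
| DOCX | No | Yes |
| PPTX | No | Yes |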

Max. file size: 50 MB
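A client can reject unsupported files before calling the API. The sketch below is a hypothetical pre-check (check_upload is not part of the service); the format sets mirror the supported_formats lists returned by GET /api/v1/ocr/providers, and treating 50 MB as 50 × 1024 × 1024 bytes is an assumption.

```python
import os

# Hypothetical client-side pre-check; sets mirror the provider's supported_formats.
SUPPORTED_FORMATS = {
    "surya": {"pdf", "png", "jpg", "jpeg", "webp", "gif", "bmp", "tiff"},
    "mistral": {"pdf", "png", "jpg", "jpeg", "webp", "gif", "bmp", "tiff",
                "avif", "docx", "pptx"},
}
MAX_FILE_SIZE = 50 * 1024 * 1024  # assumption: 50 MB means binary megabytes


def check_upload(filename: str, size_bytes: int, provider: str = "surya") -> None:
    """Raise ValueError if the file cannot be sent to the given provider."""
    if provider not in SUPPORTED_FORMATS:
        raise ValueError(f"unknown provider: {provider}")
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS[provider]:
        raise ValueError(f"format {ext or '(none)'} not supported by {provider}")
    if size_bytes > MAX_FILE_SIZE:
        raise ValueError(f"file exceeds the 50 MB limit ({size_bytes} bytes)")
```

For example, check_upload("slides.pptx", 1024, "mistral") passes, while the same call with the default provider raises, because Surya does not read PPTX.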

Endpoints

List Providers

GET /api/v1/ocr/providers

Permission: ocr.read

Returns all registered OCR providers with availability status.

Response:

[
  {
    "type": "surya",
    "name": "Surya OCR (Local)",
    "description": "Local document recognition with GPU acceleration — no cloud service",
    "supported_formats": ["pdf", "png", "jpg", "jpeg", "webp", "gif", "bmp", "tiff"],
    "supports_structured": false,
    "available": true,
    "is_default": true
  },
  {
    "type": "mistral",
    "name": "Mistral OCR 3 (Cloud)",
    "description": "Mistral AI document recognition — Cloud API, structured extraction",
    "supported_formats": ["pdf", "png", "jpg", "jpeg", "webp", "gif", "bmp", "tiff", "avif", "docx", "pptx"],
    "supports_structured": true,
    "available": true,
    "is_default": false
  }
]
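A minimal Python sketch of calling this endpoint and choosing a provider from the response, using only the standard library. The host is taken from the cURL examples further down; the helper names fetch_providers and pick_provider are hypothetical, and preferring the default provider is a design choice of this sketch.

```python
import json
import urllib.request

BASE_URL = "https://platform.xynap.tech/api/v1/ocr"  # host from the cURL examples


def fetch_providers(token: str) -> list:
    """GET /providers and return the decoded JSON list (needs ocr.read)."""
    req = urllib.request.Request(
        f"{BASE_URL}/providers",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def pick_provider(providers: list, need_structured: bool = False) -> str:
    """Return the type of an available provider, preferring the default one."""
    candidates = [
        p for p in providers
        if p["available"] and (p["supports_structured"] or not need_structured)
    ]
    if not candidates:
        raise LookupError("no available provider matches the requirements")
    # Stable sort: is_default=True entries move to the front, others keep order.
    candidates.sort(key=lambda p: not p["is_default"])
    return candidates[0]["type"]
```

With the response shown above, pick_provider returns "surya", and "mistral" when need_structured=True.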

Process Document

POST /api/v1/ocr/process

Permission: ocr.process

Runs OCR on a document and returns the recognized content as Markdown, one entry per page.

Request Body:

{
  "document": {
    "type": "url",
    "data": "https://example.com/invoice.pdf",
    "filename": "invoice.pdf",
    "mime_type": "application/pdf"
  },
  "provider": "surya",
  "pages": [0, 1],
  "table_format": "markdown",
  "include_images": false,
  "extract_headers": true,
  "extract_footers": true,
  "language_hint": "de"
}

Input Types (document.type):

| Type | Description | data field |
|---|---|---|
| url | Document URL | HTTPS URL of the document |
| base64 | Base64-encoded document | Base64 string of the document |
| file_id | Mistral file ID | ID of an uploaded file (Mistral only) |

Options:

| Field | Type | Default | Description |
|---|---|---|---|
| provider | string | surya | OCR provider (surya or mistral) |
| pages | int[] | all | Specific pages to process (0-based) |
| table_format | string | null | Table output format: markdown or html |
| include_images | bool | false | Extract images as Base64 (Mistral only) |
| extract_headers | bool | false | Extract page headers (Mistral only) |
| extract_footers | bool | false | Extract page footers (Mistral only) |
| language_hint | string | null | Language hint (e.g. de) |
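A minimal sketch of building a request body for a local file with the base64 input type. It assumes the data field takes a plain Base64 string (no data URI prefix); build_process_request is a hypothetical helper, and it omits unset options so the server-side defaults from the table apply.

```python
import base64
from typing import Iterable, Optional


def build_process_request(content: bytes, filename: str, mime_type: str,
                          provider: str = "surya",
                          pages: Optional[Iterable[int]] = None,
                          table_format: Optional[str] = None,
                          language_hint: Optional[str] = None) -> dict:
    """Build a JSON body for POST /api/v1/ocr/process with a base64 document."""
    if table_format not in (None, "markdown", "html"):
        raise ValueError("table_format must be 'markdown', 'html' or None")
    body = {
        "document": {
            "type": "base64",
            "data": base64.b64encode(content).decode("ascii"),
            "filename": filename,
            "mime_type": mime_type,
        },
        "provider": provider,
    }
    # Only send options that were set, so the documented defaults apply.
    if pages is not None:
        body["pages"] = list(pages)  # 0-based page indices
    if table_format is not None:
        body["table_format"] = table_format
    if language_hint is not None:
        body["language_hint"] = language_hint
    return body
```

The resulting dict can be serialized with json.dumps and sent with Content-Type: application/json.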

Response:

{
  "pages": [
    {
      "index": 0,
      "markdown": "# Invoice No. 2025-001\n\n| Item | Amount |\n|---|---|\n| Hosting | 29.90 EUR |",
      "images": [],
      "header": null,
      "footer": null,
      "dimensions": {
        "dpi": 300,
        "height": 3507,
        "width": 2480
      }
    }
  ],
  "usage": {
    "pages_processed": 1,
    "provider": "surya",
    "model": "surya"
  },
  "document_annotation": null,
  "processed_at": "2026-03-12T14:30:00Z"
}
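Because the result is split per page, clients usually stitch the pages back together. A minimal sketch, assuming pages should be ordered by their index field and separated by a Markdown horizontal rule (both choices are illustrative); pages_to_markdown is a hypothetical helper.

```python
def pages_to_markdown(response: dict, separator: str = "\n\n---\n\n") -> str:
    """Concatenate per-page Markdown from an /ocr/process response in page order."""
    pages = sorted(response["pages"], key=lambda p: p["index"])
    parts = []
    for page in pages:
        text = page["markdown"]
        # header/footer are null unless extracted (Mistral with extract_* enabled).
        if page.get("header"):
            text = page["header"] + "\n\n" + text
        if page.get("footer"):
            text = text + "\n\n" + page["footer"]
        parts.append(text)
    return separator.join(parts)
```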

Structured Extraction

POST /api/v1/ocr/process/structured

Permission: ocr.process

Extracts structured data based on a JSON schema (e.g. invoice data).

Mistral Only

Structured extraction is only available with the Mistral provider. With Surya, document_annotation is returned as null.

Request Body:

{
  "document": {
    "type": "url",
    "data": "https://example.com/invoice.pdf"
  },
  "provider": "mistral",
  "schema_definition": {
    "name": "invoice",
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": {"type": "string"},
        "date": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": {"type": "string"},
              "amount": {"type": "number"}
            }
          }
        }
      }
    }
  },
  "extraction_prompt": "Extract all invoice data"
}
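A minimal sketch of assembling this request body in Python. Since Surya does not fail but simply returns a null document_annotation, this hypothetical helper (build_structured_request) rejects other providers on the client side to surface the mistake early; that check is a design choice of the sketch, not server behavior.

```python
from typing import Optional


def build_structured_request(url: str, name: str, schema: dict,
                             prompt: Optional[str] = None,
                             provider: str = "mistral") -> dict:
    """Build a body for POST /api/v1/ocr/process/structured with URL input."""
    if provider != "mistral":
        # Only Mistral performs structured extraction; fail fast instead of
        # receiving document_annotation = null.
        raise ValueError("structured extraction requires provider='mistral'")
    body = {
        "document": {"type": "url", "data": url},
        "provider": provider,
        "schema_definition": {"name": name, "schema": schema},
    }
    if prompt:
        body["extraction_prompt"] = prompt
    return body
```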

Upload File + OCR

POST /api/v1/ocr/upload

Permission: ocr.process

Uploads a file directly as multipart/form-data and runs OCR on it. Intended for frontend integration.

Form Parameters:

| Field | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | Document (max. 50 MB) |
| provider | string | No | Provider (default: surya) |
| table_format | string | No | markdown or html |
| include_images | bool | No | Extract images |

cURL examples:

# Surya (local, default)
curl -X POST https://platform.xynap.tech/api/v1/ocr/upload \
  -H "Authorization: Bearer <token>" \
  -F "file=@invoice.pdf"

# Mistral (cloud)
curl -X POST https://platform.xynap.tech/api/v1/ocr/upload \
  -H "Authorization: Bearer <token>" \
  -F "file=@invoice.pdf" \
  -F "provider=mistral" \
  -F "table_format=markdown"
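The same upload can be done from Python without third-party packages by encoding the multipart body by hand. A minimal sketch; encode_multipart and upload are hypothetical helpers, the random boundary and MIME guessing via mimetypes are choices of this sketch, and include_images is left out to avoid guessing how the server parses booleans.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def encode_multipart(fields: dict, filename: str, content: bytes,
                     file_type: str = "application/octet-stream"):
    """Encode text fields plus one 'file' part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    body = bytearray()
    for key, value in fields.items():
        body += f"--{boundary}\r\n".encode()
        body += f'Content-Disposition: form-data; name="{key}"\r\n\r\n'.encode()
        body += str(value).encode() + b"\r\n"
    body += f"--{boundary}\r\n".encode()
    body += (f'Content-Disposition: form-data; name="file"; '
             f'filename="{filename}"\r\n').encode()
    body += f"Content-Type: {file_type}\r\n\r\n".encode()
    body += content + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return bytes(body), f"multipart/form-data; boundary={boundary}"


def upload(token: str, path: str, provider: str = "surya", table_format=None):
    """POST a local file to /api/v1/ocr/upload and return the decoded JSON."""
    with open(path, "rb") as fh:
        content = fh.read()
    fields = {"provider": provider}
    if table_format:
        fields["table_format"] = table_format
    file_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    body, content_type = encode_multipart(fields, os.path.basename(path), content, file_type)
    req = urllib.request.Request(
        "https://platform.xynap.tech/api/v1/ocr/upload",
        data=body, method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```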

Architecture

Provider Pattern

BaseOcrProvider (Abstract)
  |
  +-- SuryaOcrProvider        (local, GPU — default)
  +-- MistralOcrProvider      (mistral-ocr-latest, cloud)
  +-- [Future Providers]      (e.g. Google Document AI, PaddleOCR)

New providers subclass BaseOcrProvider and implement two methods:

  • process() — Standard OCR (Markdown output)
  • process_structured() — OCR with JSON schema extraction
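The interface could look roughly like the following abstract base class. The method signatures, return types, and the toy EchoOcrProvider are assumptions for illustration, not the actual code in app/core/ocr/base.py.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class BaseOcrProvider(ABC):
    """Hypothetical shape of the provider interface; real signatures may differ."""

    name: str = "base"

    @abstractmethod
    def process(self, document: bytes, **options: Any) -> List[str]:
        """Run standard OCR and return one Markdown string per page."""

    @abstractmethod
    def process_structured(self, document: bytes, schema: dict,
                           **options: Any) -> Optional[dict]:
        """Run OCR plus JSON-schema extraction; None if unsupported."""


class EchoOcrProvider(BaseOcrProvider):
    """Toy provider: treats the document as UTF-8 text, pages split by form feeds."""

    name = "echo"

    def process(self, document: bytes, **options: Any) -> List[str]:
        return document.decode("utf-8").split("\f")

    def process_structured(self, document: bytes, schema: dict,
                           **options: Any) -> Optional[dict]:
        # Like Surya, this provider has no structured extraction.
        return None
```

Because both methods are abstract, a subclass that forgets one of them cannot be instantiated, which catches incomplete providers early.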

Files

| File | Description |
|---|---|
| app/core/ocr/base.py | Abstract provider interface |
| app/core/ocr/factory.py | Provider factory |
| app/core/ocr/surya.py | Surya OCR (local, GPU) |
| app/core/ocr/mistral.py | Mistral OCR 3 (cloud) |
| app/core/ocr/schemas.py | Pydantic request/response models |
| app/core/ocr/router.py | FastAPI endpoints |

Configuration

Surya (Local)

Surya does not require an API key. GPU settings:

TORCH_DEVICE=cuda              # GPU backend (cuda/cpu)
RECOGNITION_BATCH_SIZE=128     # OCR batch size (RTX 4000: 128)
DETECTOR_BATCH_SIZE=18         # Detection batch size

On first use, the models (~1.5 GB) are downloaded automatically and cached under /root/.cache/datalab. In the Docker container, /var/lib/xynap/surya-models is mounted as a persistent volume, so downloaded models survive container restarts.
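If the service wants to validate these variables at startup, a small loader is enough. A minimal sketch; SuryaSettings and load_surya_settings are hypothetical, and using the example values above as fallbacks is an assumption (Surya reads these variables itself, so this only illustrates a sanity check).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SuryaSettings:
    torch_device: str
    recognition_batch_size: int
    detector_batch_size: int


def load_surya_settings(env: Mapping[str, str] = os.environ) -> SuryaSettings:
    """Read the Surya GPU variables; fallbacks are the example values (assumption)."""
    device = env.get("TORCH_DEVICE", "cuda")
    if device not in ("cuda", "cpu"):
        raise ValueError(f"TORCH_DEVICE must be cuda or cpu, got {device!r}")
    return SuryaSettings(
        torch_device=device,
        recognition_batch_size=int(env.get("RECOGNITION_BATCH_SIZE", "128")),
        detector_batch_size=int(env.get("DETECTOR_BATCH_SIZE", "18")),
    )
```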

Mistral (Cloud)

The Mistral API key is loaded from the following sources, in order of priority:

  1. Secret Store: ocr.mistral_api_key
  2. Environment variable: MISTRAL_API_KEY
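The lookup order amounts to a simple fallback chain. A minimal sketch; the Secret Store is modeled as a hypothetical secret_lookup callable because its real API is not documented here.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_mistral_api_key(
    secret_lookup: Callable[[str], Optional[str]],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the Mistral key: Secret Store first, then MISTRAL_API_KEY."""
    key = secret_lookup("ocr.mistral_api_key")  # hypothetical Secret Store accessor
    if key:
        return key
    key = env.get("MISTRAL_API_KEY")
    if key:
        return key
    raise RuntimeError("no Mistral API key in Secret Store or MISTRAL_API_KEY")
```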

Permissions

| Permission | Roles | Description |
|---|---|---|
| ocr.read | admin, reseller, customer, user | List providers |
| ocr.process | admin, reseller, customer, user | Process documents |

super_admin

Super admins have implicit access to all permissions.
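The permission table plus the super_admin rule can be expressed as a small check. A minimal sketch; ROLE_PERMISSIONS and has_permission are hypothetical and do not reflect how the platform actually stores roles.

```python
from typing import Iterable

# Mirrors the table above: all four roles hold both OCR permissions.
ROLE_PERMISSIONS = {
    role: {"ocr.read", "ocr.process"}
    for role in ("admin", "reseller", "customer", "user")
}


def has_permission(roles: Iterable[str], permission: str) -> bool:
    """super_admin passes every check implicitly; other roles use the table."""
    roles = set(roles)
    if "super_admin" in roles:
        return True
    return any(permission in ROLE_PERMISSIONS.get(r, set()) for r in roles)
```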

Adding a New Provider

  1. Create a provider class (inherits from BaseOcrProvider)
  2. Extend the OcrProviderType enum in schemas.py
  3. Extend the factory in factory.py
  4. Provide API key via Secret Store or environment variable (if cloud provider)
  5. Add provider info to list_providers()
  6. Extend tests
  7. Update service register (/home/admin/docs/services/ocr.md)
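Steps 1 to 3 can be sketched as an enum plus a registry-based factory. The PADDLE member, the stub classes, and get_provider() are hypothetical (PaddleOCR is only named as a possible future provider); the actual schemas.py and factory.py may be organized differently.

```python
from enum import Enum
from typing import Union


class OcrProviderType(str, Enum):
    SURYA = "surya"
    MISTRAL = "mistral"
    PADDLE = "paddle"  # step 2: hypothetical new enum member


class SuryaOcrProvider:
    """Stand-in for app/core/ocr/surya.py."""


class MistralOcrProvider:
    """Stand-in for app/core/ocr/mistral.py."""


class PaddleOcrProvider:
    """Step 1: hypothetical new provider class."""


# Step 3: register the new class with the factory.
_REGISTRY = {
    OcrProviderType.SURYA: SuryaOcrProvider,
    OcrProviderType.MISTRAL: MistralOcrProvider,
    OcrProviderType.PADDLE: PaddleOcrProvider,
}


def get_provider(kind: Union[str, OcrProviderType] = OcrProviderType.SURYA):
    """Instantiate a provider; unknown names raise ValueError from the enum."""
    return _REGISTRY[OcrProviderType(kind)]()
```

Keeping the mapping in one dictionary means the API layer only ever calls get_provider(), which is what lets new providers be added without API changes.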