OCR — Document Recognition¶

Overview¶

The OCR service provides universal document recognition with interchangeable providers.

Available Providers:

Provider	Type	Default	Structured Extraction	Description
Surya OCR	Local (GPU)	Yes	No	GPU-accelerated local OCR — no cloud service, no API costs
Mistral OCR 3	Cloud API	No	Yes (JSON Schema)	Mistral AI Cloud OCR with structured data extraction

Prefix: /api/v1/ocr/
Permissions: ocr.read, ocr.process
Provider Architecture: Abstract Factory — new providers can be added without API changes

Supported Formats¶

Format	Surya	Mistral
PDF	✔	✔
PNG	✔	✔
JPEG	✔	✔
WebP	✔	✔
GIF	✔	✔
BMP	✔	✔
TIFF	✔	✔
AVIF	—	✔
DOCX	—	✔
PPTX	—	✔

Max. file size: 50 MB
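
A client can check a file against the table and size limit above before sending it. The following is a minimal sketch, not part of the service: `check_upload` and the constant names are hypothetical, the extension lists mirror the table, and reading "50 MB" as 50 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Extensions mirror the format table; "jpg" is treated as JPEG.
SURYA_FORMATS = {"pdf", "png", "jpg", "jpeg", "webp", "gif", "bmp", "tiff"}
MISTRAL_FORMATS = SURYA_FORMATS | {"avif", "docx", "pptx"}
FORMATS = {"surya": SURYA_FORMATS, "mistral": MISTRAL_FORMATS}

# Assumption: "50 MB" means 50 * 1024 * 1024 bytes; the service may count differently.
MAX_BYTES = 50 * 1024 * 1024


def check_upload(filename: str, size_bytes: int, provider: str = "surya") -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    if provider not in FORMATS:
        return [f"unknown provider: {provider}"]
    problems = []
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in FORMATS[provider]:
        problems.append(f"{provider} does not accept .{ext} files")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds 50 MB")
    return problems
```

A check like this only saves a round trip; the server remains the authority on what it accepts.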

Endpoints¶

List Providers¶

GET /api/v1/ocr/providers

Permission: ocr.read

Returns all registered OCR providers with availability status.

Response:

[
  {
    "type": "surya",
    "name": "Surya OCR (Local)",
    "description": "Local document recognition with GPU acceleration — no cloud service",
    "supported_formats": ["pdf", "png", "jpg", "jpeg", "webp", "gif", "bmp", "tiff"],
    "supports_structured": false,
    "available": true,
    "is_default": true
  },
  {
    "type": "mistral",
    "name": "Mistral OCR 3 (Cloud)",
    "description": "Mistral AI document recognition — Cloud API, structured extraction",
    "supported_formats": ["pdf", "png", "jpg", "jpeg", "webp", "gif", "bmp", "tiff", "avif", "docx", "pptx"],
    "supports_structured": true,
    "available": true,
    "is_default": false
  }
]
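
The response fields can drive provider selection on the client side. The sketch below assumes the response shape shown above; the helper name `pick_provider` is hypothetical, and preferring the default provider is a design choice of this example rather than documented behavior.

```python
import json
from typing import Optional


def pick_provider(providers: list, extension: str, need_structured: bool = False) -> Optional[str]:
    """Return the `type` of a suitable provider, preferring the default one."""
    candidates = [
        p for p in providers
        if p["available"]
        and extension.lower() in p["supported_formats"]
        and (p["supports_structured"] or not need_structured)
    ]
    # False sorts before True, so the provider with is_default=True comes first.
    candidates.sort(key=lambda p: not p["is_default"])
    return candidates[0]["type"] if candidates else None


# Abbreviated copy of the sample response (name and description omitted).
sample = json.loads("""
[
  {"type": "surya",
   "supported_formats": ["pdf", "png", "jpg", "jpeg", "webp", "gif", "bmp", "tiff"],
   "supports_structured": false, "available": true, "is_default": true},
  {"type": "mistral",
   "supported_formats": ["pdf", "png", "jpg", "jpeg", "webp", "gif", "bmp", "tiff", "avif", "docx", "pptx"],
   "supports_structured": true, "available": true, "is_default": false}
]
""")

print(pick_provider(sample, "pdf"))                         # surya
print(pick_provider(sample, "docx"))                        # mistral
print(pick_provider(sample, "pdf", need_structured=True))   # mistral
```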

Process Document¶

POST /api/v1/ocr/process

Permission: ocr.process

Runs OCR on a document and returns the recognized content as Markdown, one entry per page.

Request Body:

{
  "document": {
    "type": "url",
    "data": "https://example.com/invoice.pdf",
    "filename": "invoice.pdf",
    "mime_type": "application/pdf"
  },
  "provider": "surya",
  "pages": [0, 1],
  "table_format": "markdown",
  "include_images": false,
  "extract_headers": true,
  "extract_footers": true,
  "language_hint": "de"
}

Input Types (document.type):

Type	Description	`data` field
`url`	Document URL	HTTPS URL to the document
`base64`	Base64-encoded	Base64 string of the document
`file_id`	Mistral File ID	ID of an uploaded file (Mistral only)

Options:

Field	Type	Default	Description
`provider`	string	`surya`	OCR provider (`surya` or `mistral`)
`pages`	int[]	all	Specific pages (0-based)
`table_format`	string	null	`markdown` or `html`
`include_images`	bool	false	Extract images as Base64 (Mistral only)
`extract_headers`	bool	false	Extract headers (Mistral only)
`extract_footers`	bool	false	Extract footers (Mistral only)
`language_hint`	string	null	Language hint (e.g. `de`)
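
For a local file, the `base64` input type avoids hosting the document at a URL. The sketch below builds a request body from raw bytes. It rests on two assumptions: that `data` carries plain Base64 without a data-URI prefix, and that `filename` and `mime_type` are accepted alongside `base64` as they are alongside `url`. The function name is hypothetical.

```python
import base64


def build_process_request(content: bytes, filename: str, mime_type: str,
                          provider: str = "surya", pages=None,
                          table_format=None, language_hint=None) -> dict:
    """Build a JSON-serializable body for POST /api/v1/ocr/process."""
    body = {
        "document": {
            "type": "base64",
            # Assumption: plain Base64 text, no "data:...;base64," prefix.
            "data": base64.b64encode(content).decode("ascii"),
            "filename": filename,
            "mime_type": mime_type,
        },
        "provider": provider,
    }
    # Only send options that were set, so server-side defaults apply otherwise.
    if pages is not None:
        body["pages"] = list(pages)  # 0-based page indices
    if table_format is not None:
        body["table_format"] = table_format
    if language_hint is not None:
        body["language_hint"] = language_hint
    return body
```

Passing the result to `json.dumps` yields the request payload; the Mistral-only options can be added to the returned dict in the same way.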

Response:

{
  "pages": [
    {
      "index": 0,
      "markdown": "# Invoice No. 2025-001\n\n| Item | Amount |\n|---|---|\n| Hosting | 29.90 EUR |",
      "images": [],
      "header": null,
      "footer": null,
      "dimensions": {
        "dpi": 300,
        "height": 3507,
        "width": 2480
      }
    }
  ],
  "usage": {
    "pages_processed": 1,
    "provider": "surya",
    "model": "surya"
  },
  "document_annotation": null,
  "processed_at": "2026-03-12T14:30:00Z"
}
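
Since each page carries its own Markdown, a client usually needs to stitch the pages together. A minimal sketch, assuming the response shape shown above; `join_markdown` and the choice of separator are illustrative only.

```python
def join_markdown(response: dict, separator: str = "\n\n---\n\n") -> str:
    """Concatenate per-page Markdown in page order."""
    pages = sorted(response["pages"], key=lambda page: page["index"])
    return separator.join(page["markdown"] for page in pages)


# Two-page response reduced to the fields this helper reads.
response = {
    "pages": [
        {"index": 1, "markdown": "Page two"},
        {"index": 0, "markdown": "# Invoice No. 2025-001"},
    ],
    "usage": {"pages_processed": 2, "provider": "surya", "model": "surya"},
}

print(join_markdown(response, separator="\n\n"))
```

Sorting by `index` guards against pages arriving out of order, which the documentation does not rule out.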

Structured Extraction¶

POST /api/v1/ocr/process/structured

Permission: ocr.process

Extracts structured data based on a JSON schema (e.g. invoice data).

Mistral Only

Structured extraction is only available with the Mistral provider. With Surya, document_annotation is returned as null.

Request Body:

{
  "document": {
    "type": "url",
    "data": "https://example.com/invoice.pdf"
  },
  "provider": "mistral",
  "schema_definition": {
    "name": "invoice",
    "schema": {
      "type": "object",
      "properties": {
        "invoice_number": {"type": "string"},
        "date": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": {"type": "string"},
              "amount": {"type": "number"}
            }
          }
        }
      }
    }
  },
  "extraction_prompt": "Extract all invoice data"
}
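
The extracted data is expected in `document_annotation`, but this page does not state whether it arrives as a JSON object or as a JSON-encoded string. The sketch below therefore handles both, which is an assumption, and adds a small completeness check against the schema. Both helper names are hypothetical.

```python
import json
from typing import Any, Optional


def read_annotation(response: dict) -> Optional[Any]:
    """Return the structured result, or None if the provider produced none."""
    annotation = response.get("document_annotation")
    if annotation is None:
        return None  # e.g. Surya, which has no structured extraction
    if isinstance(annotation, str):
        # Assumption: a string value is JSON-encoded.
        return json.loads(annotation)
    return annotation


def missing_fields(annotation: dict, schema: dict) -> list:
    """List top-level schema properties that are absent from the result."""
    return [name for name in schema.get("properties", {}) if name not in annotation]


schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "date": {"type": "string"},
        "total": {"type": "number"},
    },
}
result = read_annotation({"document_annotation": '{"invoice_number": "2025-001", "total": 29.9}'})
print(missing_fields(result, schema))  # ['date']
```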

Upload File + OCR¶

POST /api/v1/ocr/upload

Permission: ocr.process

Uploads a file directly as multipart form data and runs OCR on it. Ideal for frontend integration.

Form Parameters:

Field	Type	Required	Description
`file`	File	yes	Document (max. 50 MB)
`provider`	string	no	Provider (default: `surya`)
`table_format`	string	no	`markdown` or `html`
`include_images`	bool	no	Extract images as Base64 (Mistral only)

cURL Example:

# Surya (local, default)
curl -X POST https://platform.xynap.tech/api/v1/ocr/upload \
  -H "Authorization: Bearer <token>" \
  -F "file=@invoice.pdf"

# Mistral (cloud)
curl -X POST https://platform.xynap.tech/api/v1/ocr/upload \
  -H "Authorization: Bearer <token>" \
  -F "file=@invoice.pdf" \
  -F "provider=mistral" \
  -F "table_format=markdown"
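
The same upload can be prepared from Python with only the standard library. This is a sketch under assumptions: the helper names are hypothetical, and sending the file part as `application/octet-stream` is a guess, since this page does not say whether the server inspects the part's content type.

```python
import urllib.request
import uuid
from typing import Optional


def encode_multipart(fields: dict, file_name: str, file_bytes: bytes,
                     boundary: Optional[str] = None):
    """Encode form fields plus one `file` part as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'.encode()
    )
    # Assumption: a generic content type is acceptable for the file part.
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(base_url: str, token: str, file_name: str,
                         file_bytes: bytes, **fields) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /api/v1/ocr/upload."""
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        f"{base_url}/api/v1/ocr/upload",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )
```

Passing the prepared request to `urllib.request.urlopen` sends it; extra keyword arguments such as `provider="mistral"` become form fields, matching the `-F` flags in the cURL calls.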

Architecture¶

Provider Pattern¶

BaseOcrProvider (Abstract)
  |
  +-- SuryaOcrProvider        (local, GPU — default)
  +-- MistralOcrProvider      (mistral-ocr-latest, cloud)
  +-- [Future Providers]      (e.g. Google Document AI, PaddleOCR)

New providers implement BaseOcrProvider with two methods:

process() — Standard OCR (Markdown output)
process_structured() — OCR with JSON schema extraction
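
The two-method contract could look roughly like the following. This is an illustrative stand-in, not the contents of `app/core/ocr/base.py`: the real signatures are not shown on this page, so the `dict` arguments, the synchronous methods, and `StubOcrProvider` are all assumptions.

```python
from abc import ABC, abstractmethod


class BaseOcrProvider(ABC):
    """Illustrative stand-in for the abstract provider interface."""

    @abstractmethod
    def process(self, request: dict) -> dict:
        """Standard OCR: return Markdown per page."""

    @abstractmethod
    def process_structured(self, request: dict) -> dict:
        """OCR plus JSON schema extraction."""


class StubOcrProvider(BaseOcrProvider):
    """Hypothetical provider that returns fixed text, useful only for tests."""

    def process(self, request: dict) -> dict:
        return {"pages": [{"index": 0, "markdown": "stub"}], "document_annotation": None}

    def process_structured(self, request: dict) -> dict:
        # Like Surya, this stub has no structured extraction and reports null.
        return self.process(request)
```

Because both methods are abstract, a subclass that omits either one cannot be instantiated, so an incomplete provider fails when it is created rather than at the first request.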

Files¶

File	Description
`app/core/ocr/base.py`	Abstract provider interface
`app/core/ocr/factory.py`	Provider factory
`app/core/ocr/surya.py`	Surya OCR (local, GPU)
`app/core/ocr/mistral.py`	Mistral OCR 3 (cloud)
`app/core/ocr/schemas.py`	Pydantic request/response models
`app/core/ocr/router.py`	FastAPI endpoints

Configuration¶

Surya (Local)¶

Surya does not require an API key. GPU settings:

TORCH_DEVICE=cuda              # GPU backend (cuda/cpu)
RECOGNITION_BATCH_SIZE=128     # OCR batch size (RTX 4000: 128)
DETECTOR_BATCH_SIZE=18         # Detection batch size

Models are automatically downloaded on first call (~1.5 GB) and cached under /root/.cache/datalab. In the Docker container, /var/lib/xynap/surya-models is mounted as a persistent volume.
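
To confirm which of these variables are actually set in a given environment, for example in a startup log, a small helper is enough. The sketch below is hypothetical and only reads the three variables named above; it deliberately reports unset values as `None` instead of guessing at defaults.

```python
import os
from typing import Mapping, Optional


def surya_env_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the Surya-related variables; unset ones are reported as None."""
    env = os.environ if env is None else env

    def as_int(name: str) -> Optional[int]:
        raw = env.get(name)
        return int(raw) if raw is not None else None

    return {
        "TORCH_DEVICE": env.get("TORCH_DEVICE"),
        "RECOGNITION_BATCH_SIZE": as_int("RECOGNITION_BATCH_SIZE"),
        "DETECTOR_BATCH_SIZE": as_int("DETECTOR_BATCH_SIZE"),
    }


print(surya_env_settings({"TORCH_DEVICE": "cuda", "RECOGNITION_BATCH_SIZE": "128"}))
```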

Mistral (Cloud)¶

The Mistral API key is loaded from the following sources (priority order):

1. Secret Store: ocr.mistral_api_key
2. Environment variable: MISTRAL_API_KEY
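
The lookup order can be sketched as follows. The secret store is modeled here as a plain mapping, which is an assumption made for illustration; the real store's interface is not described on this page, and the function name is hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_mistral_api_key(secret_store: Mapping[str, str],
                            env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the key from the secret store, else from the environment, else None."""
    env = os.environ if env is None else env
    # Empty strings count as "not set", so the lookup falls through to the next source.
    return secret_store.get("ocr.mistral_api_key") or env.get("MISTRAL_API_KEY") or None


print(resolve_mistral_api_key({"ocr.mistral_api_key": "from-store"},
                              {"MISTRAL_API_KEY": "from-env"}))  # from-store
```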

Permissions¶

Permission	Roles	Description
`ocr.read`	admin, reseller, customer, user	List providers
`ocr.process`	admin, reseller, customer, user	Process documents

super_admin

Super admins have implicit access to all permissions.
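
The table and the super_admin rule amount to a simple lookup. A minimal sketch, assuming roles and permissions are plain strings; the names `ROLE_PERMISSIONS` and `has_permission` are hypothetical, and the platform's actual authorization code may work differently.

```python
# Role-to-permission mapping copied from the table above.
ROLE_PERMISSIONS = {
    role: {"ocr.read", "ocr.process"}
    for role in ("admin", "reseller", "customer", "user")
}


def has_permission(role: str, permission: str) -> bool:
    """Check a role against the table; super_admin passes every check."""
    if role == "super_admin":
        return True
    return permission in ROLE_PERMISSIONS.get(role, set())


print(has_permission("customer", "ocr.process"))  # True
print(has_permission("guest", "ocr.read"))        # False
```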

Adding a New Provider¶

1. Create a provider class (inherits from BaseOcrProvider)
2. Extend the OcrProviderType enum in schemas.py
3. Extend the factory in factory.py
4. Provide API key via Secret Store or environment variable (if cloud provider)
5. Add provider info to list_providers()
6. Extend tests
7. Update service register (/home/admin/docs/services/ocr.md)
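
The enum and factory steps might look like the sketch below. Only the name `OcrProviderType` comes from this page; the `PADDLE` member, `PaddleOcrProvider`, the registry dict, and `create_provider` are hypothetical, and the real `factory.py` may be organized differently.

```python
from enum import Enum


class OcrProviderType(str, Enum):
    SURYA = "surya"
    MISTRAL = "mistral"
    PADDLE = "paddle"  # hypothetical new member


class PaddleOcrProvider:
    """Hypothetical provider; a real one would inherit from BaseOcrProvider."""
    name = "PaddleOCR (Local)"


# Hypothetical registry mapping enum members to provider classes.
_PROVIDERS = {OcrProviderType.PADDLE: PaddleOcrProvider}


def create_provider(provider: str):
    """Map the request's `provider` string to a provider instance."""
    provider_type = OcrProviderType(provider)  # raises ValueError for unknown names
    try:
        return _PROVIDERS[provider_type]()
    except KeyError:
        raise ValueError(f"provider {provider!r} is declared but not registered") from None
```

Keeping the string-to-class mapping in one place is what lets the request schema stay unchanged when a provider is added: callers only ever pass a new value for `provider`.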