OCR: Document Recognition
Overview
The OCR service converts documents (PDFs, images, and office files) into Markdown text through interchangeable providers.
Available Providers:
| Provider | Type | Default | Structured Extraction | Description |
|---|---|---|---|---|
| Surya OCR | Local (GPU) | Yes | No | GPU-accelerated local OCR — no cloud service, no API costs |
| Mistral OCR 3 | Cloud API | No | Yes (JSON Schema) | Mistral AI Cloud OCR with structured data extraction |
- Prefix: `/api/v1/ocr/`
- Permissions: `ocr.read`, `ocr.process`
- Provider architecture: Abstract Factory; new providers can be added without API changes
Supported Formats
| Format | Surya | Mistral |
|---|---|---|
| PDF | ✔ | ✔ |
| PNG | ✔ | ✔ |
| JPEG | ✔ | ✔ |
| WebP | ✔ | ✔ |
| GIF | ✔ | ✔ |
| BMP | ✔ | ✔ |
| TIFF | ✔ | ✔ |
| AVIF | — | ✔ |
| DOCX | — | ✔ |
| PPTX | — | ✔ |
Max. file size: 50 MB
Endpoints
List Providers
Permission: ocr.read
Returns all registered OCR providers with availability status.
Response:
[
{
"type": "surya",
"name": "Surya OCR (Local)",
"description": "Local document recognition with GPU acceleration — no cloud service",
"supported_formats": ["pdf", "png", "jpg", "jpeg", "webp", "gif", "bmp", "tiff"],
"supports_structured": false,
"available": true,
"is_default": true
},
{
"type": "mistral",
"name": "Mistral OCR 3 (Cloud)",
"description": "Mistral AI document recognition — Cloud API, structured extraction",
"supported_formats": ["pdf", "png", "jpg", "jpeg", "webp", "gif", "bmp", "tiff", "avif", "docx", "pptx"],
"supports_structured": true,
"available": true,
"is_default": false
}
]
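A client usually picks a provider from this list before calling the processing endpoints. The following Python sketch assumes the response has been parsed into a list of dicts with the fields shown above. It prefers an explicitly requested provider and otherwise falls back to the available default. `choose_provider` is a hypothetical client-side helper, not part of the API.

```python
from typing import Any, Optional


def choose_provider(
    providers: list[dict[str, Any]],
    preferred: Optional[str] = None,
    need_structured: bool = False,
) -> str:
    """Return the `type` of a usable provider from the List Providers response.

    Hypothetical helper: the selection rules are illustrative only.
    """
    # Keep providers that are available and, if requested, support structured extraction.
    usable = [
        p for p in providers
        if p["available"] and (p["supports_structured"] or not need_structured)
    ]
    if preferred is not None:
        for p in usable:
            if p["type"] == preferred:
                return p["type"]
        raise ValueError(f"provider {preferred!r} is not usable")
    # Fall back to the server-side default, then to any usable provider.
    for p in usable:
        if p["is_default"]:
            return p["type"]
    if usable:
        return usable[0]["type"]
    raise ValueError("no usable OCR provider")


providers = [
    {"type": "surya", "supports_structured": False, "available": True, "is_default": True},
    {"type": "mistral", "supports_structured": True, "available": True, "is_default": False},
]
print(choose_provider(providers))                        # surya
print(choose_provider(providers, need_structured=True))  # mistral
```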
Process Document
Permission: ocr.process
Runs OCR on a single document and returns the recognized content as Markdown, one entry per page.
Request Body:
{
"document": {
"type": "url",
"data": "https://example.com/invoice.pdf",
"filename": "invoice.pdf",
"mime_type": "application/pdf"
},
"provider": "surya",
"pages": [0, 1],
"table_format": "markdown",
"include_images": false,
"extract_headers": true,
"extract_footers": true,
"language_hint": "de"
}
Input Types (`document.type`):
| Type | Description | `data` field |
|---|---|---|
| `url` | Document URL | HTTPS URL to the document |
| `base64` | Base64-encoded | Base64 string of the document |
| `file_id` | Mistral File ID | ID of an uploaded file (Mistral only) |
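For local files, the `base64` input type avoids hosting the document at a URL. This Python sketch builds the `document` object from a file on disk. It assumes the service expects plain Base64 without a `data:` URI prefix and reads "50 MB" as 50 MiB; both are assumptions, and `document_from_file` is a hypothetical helper.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Any

# Assumption: the documented "50 MB" limit is interpreted as 50 MiB.
MAX_FILE_BYTES = 50 * 1024 * 1024


def document_from_file(path: str) -> dict[str, Any]:
    """Build a `document` object of type "base64" from a local file."""
    p = Path(path)
    raw = p.read_bytes()
    if len(raw) > MAX_FILE_BYTES:
        raise ValueError(f"{p.name} exceeds the maximum file size")
    # mimetypes maps e.g. ".pdf" to "application/pdf"; fall back to a generic type.
    mime_type, _ = mimetypes.guess_type(p.name)
    return {
        "type": "base64",
        # Standard (not URL-safe) Base64, decoded to str so the dict is JSON-serializable.
        "data": base64.b64encode(raw).decode("ascii"),
        "filename": p.name,
        "mime_type": mime_type or "application/octet-stream",
    }
```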
Options:
| Field | Type | Default | Description |
|---|---|---|---|
| `provider` | string | `surya` | OCR provider (`surya` or `mistral`) |
| `pages` | int[] | all pages | Specific pages (0-based) |
| `table_format` | string | `null` | `markdown` or `html` |
| `include_images` | bool | `false` | Extract images as Base64 (Mistral only) |
| `extract_headers` | bool | `false` | Extract headers (Mistral only) |
| `extract_footers` | bool | `false` | Extract footers (Mistral only) |
| `language_hint` | string | `null` | Language hint (e.g. `de`) |
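Several options only take effect with Mistral, and this page does not say whether the service rejects them or silently ignores them for Surya (the request example above sends `extract_headers: true` with `provider: "surya"`). A client-side check can at least flag such requests. This is a sketch based only on the table above; `mistral_only_options_set` is a hypothetical helper.

```python
from typing import Any

# Options marked "Mistral only" in the table above, with their documented defaults.
MISTRAL_ONLY_DEFAULTS: dict[str, Any] = {
    "include_images": False,
    "extract_headers": False,
    "extract_footers": False,
}


def mistral_only_options_set(request: dict[str, Any]) -> list[str]:
    """List Mistral-only settings used by a request that targets another provider."""
    provider = request.get("provider", "surya")  # documented default provider
    if provider == "mistral":
        return []
    flagged = [
        name for name, default in MISTRAL_ONLY_DEFAULTS.items()
        if request.get(name, default) != default
    ]
    # The file_id input type is also documented as Mistral only.
    if request.get("document", {}).get("type") == "file_id":
        flagged.append("document.type=file_id")
    return flagged
```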
Response:
{
"pages": [
{
"index": 0,
"markdown": "# Invoice No. 2025-001\n\n| Item | Amount |\n|---|---|\n| Hosting | 29.90 EUR |",
"images": [],
"header": null,
"footer": null,
"dimensions": {
"dpi": 300,
"height": 3507,
"width": 2480
}
}
],
"usage": {
"pages_processed": 1,
"provider": "surya",
"model": "surya"
},
"document_annotation": null,
"processed_at": "2026-03-12T14:30:00Z"
}
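Because the result is split per page, clients that want a single document have to stitch the pages together. This Python sketch assumes `index` is the 0-based page number (as in the request's `pages` option) and that `header`/`footer` stay `null` unless Mistral extracted them. `response_to_markdown` and its page separator are illustrative choices, not part of the API.

```python
from typing import Any


def response_to_markdown(response: dict[str, Any], separator: str = "\n\n---\n\n") -> str:
    """Concatenate per-page Markdown from a Process Document response, in page order."""
    pages = sorted(response["pages"], key=lambda page: page["index"])
    parts = []
    for page in pages:
        # Header and footer are null unless extracted (Mistral only); skip empty chunks.
        chunks = [page.get("header"), page["markdown"], page.get("footer")]
        parts.append("\n\n".join(c for c in chunks if c))
    # A Markdown thematic break between pages keeps page boundaries visible.
    return separator.join(parts)
```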
Structured Extraction
Permission: ocr.process
Extracts structured data (e.g. invoice fields) according to a JSON schema.
Mistral only: Structured extraction is only available with the Mistral provider. With Surya, `document_annotation` is always `null`.
Request Body:
{
"document": {
"type": "url",
"data": "https://example.com/invoice.pdf"
},
"provider": "mistral",
"schema_definition": {
"name": "invoice",
"schema": {
"type": "object",
"properties": {
"invoice_number": {"type": "string"},
"date": {"type": "string"},
"total": {"type": "number"},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string"},
"amount": {"type": "number"}
}
}
}
}
}
},
"extraction_prompt": "Extract all invoice data"
}
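This page does not show a response for structured extraction. Assuming the result arrives in `document_annotation` (the field that is `null` for Surya), either as a JSON object or as a JSON-encoded string (the page does not say which), a client can normalize it as sketched below. The example schema declares no `required` list, so fields may be missing. The `required` check here is a hypothetical client-side safeguard.

```python
import json
from typing import Any, Optional


def parse_annotation(
    response: dict[str, Any], required: tuple[str, ...] = ()
) -> Optional[dict[str, Any]]:
    """Return `document_annotation` as a dict, or None if the provider produced none."""
    annotation = response.get("document_annotation")
    if annotation is None:
        # e.g. Surya, which does not support structured extraction
        return None
    if isinstance(annotation, str):
        # Assumption: the annotation may be delivered as a JSON-encoded string.
        annotation = json.loads(annotation)
    if not isinstance(annotation, dict):
        raise ValueError("document_annotation is not a JSON object")
    missing = [key for key in required if key not in annotation]
    if missing:
        raise ValueError(f"annotation is missing keys: {missing}")
    return annotation
```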
Upload File + OCR
Permission: ocr.process
Uploads a file directly as `multipart/form-data` and runs OCR on it in a single request. This is the most convenient option for frontend integration.
Form Parameters:
| Field | Type | Required | Description |
|---|---|---|---|
| `file` | File | yes | Document (max. 50 MB) |
| `provider` | string | no | Provider (default: `surya`) |
| `table_format` | string | no | `markdown` or `html` |
| `include_images` | bool | no | Extract images (Mistral only) |
cURL Example:
# Surya (local, default)
curl -X POST https://platform.xynap.tech/api/v1/ocr/upload \
-H "Authorization: Bearer <token>" \
-F "file=@invoice.pdf"
# Mistral (cloud)
curl -X POST https://platform.xynap.tech/api/v1/ocr/upload \
-H "Authorization: Bearer <token>" \
-F "file=@invoice.pdf" \
-F "provider=mistral" \
-F "table_format=markdown"
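The same upload can be made from Python using only the standard library. This sketch builds the multipart body by hand and returns an unsent `urllib.request.Request`. `build_upload_request` is a hypothetical helper, and it does not escape quotes in filenames.

```python
import mimetypes
import urllib.request
import uuid
from typing import Optional


def build_upload_request(
    base_url: str,
    token: str,
    filename: str,
    content: bytes,
    provider: Optional[str] = None,
    table_format: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /api/v1/ocr/upload."""
    boundary = uuid.uuid4().hex
    body = bytearray()
    # Optional text fields; omitted fields fall back to server defaults (e.g. surya).
    for name, value in (("provider", provider), ("table_format", table_format)):
        if value is None:
            continue
        body += f"--{boundary}\r\n".encode()
        body += f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        body += f"{value}\r\n".encode()
    # The file part carries the filename and a guessed MIME type.
    mime_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body += f"--{boundary}\r\n".encode()
    body += f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    body += f"Content-Type: {mime_type}\r\n\r\n".encode()
    body += content + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/ocr/upload",
        data=bytes(body),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Sending it is then `urllib.request.urlopen(req)`. The response body is presumably the same JSON structure as for Process Document.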
Architecture
Provider Pattern
BaseOcrProvider (Abstract)
|
+-- SuryaOcrProvider (local, GPU — default)
+-- MistralOcrProvider (mistral-ocr-latest, cloud)
+-- [Future Providers] (e.g. Google Document AI, PaddleOCR)
New providers implement `BaseOcrProvider` with two methods:
- `process()`: standard OCR (Markdown output)
- `process_structured()`: OCR with JSON schema extraction
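The two-method contract can be expressed with Python's `abc` module. The real signatures in `app/core/ocr/base.py` are not documented here (they may well be async and use the Pydantic models from `schemas.py`), so the plain-dict parameters below and the toy `EchoOcrProvider` are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseOcrProvider(ABC):
    """Hypothetical shape of the provider interface; real signatures may differ."""

    @abstractmethod
    def process(self, request: dict[str, Any]) -> dict[str, Any]:
        """Standard OCR: return a response with Markdown per page."""

    @abstractmethod
    def process_structured(self, request: dict[str, Any]) -> dict[str, Any]:
        """OCR plus JSON schema extraction (fills document_annotation)."""


class EchoOcrProvider(BaseOcrProvider):
    """Toy provider for tests: treats document.data as already-recognized text."""

    def process(self, request: dict[str, Any]) -> dict[str, Any]:
        text = request["document"]["data"]
        return {
            "pages": [{"index": 0, "markdown": text, "images": [], "header": None, "footer": None}],
            "usage": {"pages_processed": 1, "provider": "echo", "model": "echo"},
            "document_annotation": None,
        }

    def process_structured(self, request: dict[str, Any]) -> dict[str, Any]:
        # Like Surya, this provider has no structured extraction and returns null.
        return self.process(request)
```

Because `BaseOcrProvider` is abstract, a subclass that forgets either method fails at instantiation instead of at request time.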
Files
| File | Description |
|---|---|
| `app/core/ocr/base.py` | Abstract provider interface |
| `app/core/ocr/factory.py` | Provider factory |
| `app/core/ocr/surya.py` | Surya OCR (local, GPU) |
| `app/core/ocr/mistral.py` | Mistral OCR 3 (cloud) |
| `app/core/ocr/schemas.py` | Pydantic request/response models |
| `app/core/ocr/router.py` | FastAPI endpoints |
Configuration
Surya (Local)
Surya does not require an API key. GPU settings:
TORCH_DEVICE=cuda # GPU backend (cuda/cpu)
RECOGNITION_BATCH_SIZE=128 # OCR batch size (RTX 4000: 128)
DETECTOR_BATCH_SIZE=18 # Detection batch size
Models are automatically downloaded on first call (~1.5 GB) and cached under /root/.cache/datalab.
In the Docker container, /var/lib/xynap/surya-models is mounted as a persistent volume.
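A misconfigured device or batch size typically surfaces only on the first OCR call, so a preflight check at startup can help. This Python sketch reads the three variables above and validates them. The fallback values are simply the ones from the example above, not Surya's built-in defaults, and `load_surya_settings` is a hypothetical helper that only accepts the two device values listed here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SuryaGpuSettings:
    torch_device: str
    recognition_batch_size: int
    detector_batch_size: int


def load_surya_settings(env: Mapping[str, str] = os.environ) -> SuryaGpuSettings:
    """Read and sanity-check the Surya GPU variables (fallbacks from the example above)."""
    settings = SuryaGpuSettings(
        torch_device=env.get("TORCH_DEVICE", "cuda"),
        recognition_batch_size=int(env.get("RECOGNITION_BATCH_SIZE", "128")),
        detector_batch_size=int(env.get("DETECTOR_BATCH_SIZE", "18")),
    )
    if settings.torch_device not in ("cuda", "cpu"):
        raise ValueError(f"TORCH_DEVICE must be cuda or cpu, got {settings.torch_device!r}")
    if settings.recognition_batch_size <= 0 or settings.detector_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return settings
```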
Mistral (Cloud)
The Mistral API key is loaded from the following sources, in priority order:
- Secret Store: `ocr.mistral_api_key`
- Environment variable: `MISTRAL_API_KEY`
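The priority order amounts to a first-match lookup. In this sketch the Secret Store is modeled as a plain mapping, which is a hypothetical stand-in for the platform's real Secret Store client, and treating an empty value as "not configured" is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_mistral_api_key(
    secret_store: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the Mistral API key, preferring the Secret Store over the environment."""
    for source, key in ((secret_store, "ocr.mistral_api_key"), (env, "MISTRAL_API_KEY")):
        value = source.get(key)
        if value:  # assumption: empty strings count as "not configured"
            return value
    return None
```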
Permissions
| Permission | Roles | Description |
|---|---|---|
| `ocr.read` | admin, reseller, customer, user | List providers |
| `ocr.process` | admin, reseller, customer, user | Process documents |
Super admins (`super_admin`) have implicit access to all permissions.
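The table together with the super-admin rule can be expressed as data plus one check. How the platform actually stores role grants is not described here, so `ROLE_PERMISSIONS` and `has_permission` are hypothetical and cover only the two OCR permissions.

```python
# Role grants from the Permissions table; super_admin bypasses the table entirely.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    role: frozenset({"ocr.read", "ocr.process"})
    for role in ("admin", "reseller", "customer", "user")
}


def has_permission(role: str, permission: str) -> bool:
    """Check a permission for a role as described in the Permissions table."""
    if role == "super_admin":
        return True  # implicit access to all permissions
    return permission in ROLE_PERMISSIONS.get(role, frozenset())
```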
Adding a New Provider
- Create a provider class that inherits from `BaseOcrProvider`.
- Extend the `OcrProviderType` enum in `schemas.py`.
- Extend the factory in `factory.py`.
- Provide the API key via the Secret Store or an environment variable (cloud providers only).
- Add the provider info to `list_providers()`.
- Extend the tests.
- Update the service register (`/home/admin/docs/services/ocr.md`).
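The enum and factory steps can be sketched as an enum-keyed registry. The real `OcrProviderType` and factory in `schemas.py` and `factory.py` may be structured differently. `register`, `create_provider`, and the `PaddleOcrProvider` class (after the PaddleOCR example in the provider diagram) are hypothetical.

```python
from enum import Enum
from typing import Callable, Protocol


class OcrProvider(Protocol):
    """Stand-in for BaseOcrProvider in this sketch."""

    def process(self, request: dict) -> dict: ...


class OcrProviderType(str, Enum):
    SURYA = "surya"
    MISTRAL = "mistral"
    PADDLE = "paddle"  # hypothetical new provider (enum step)


_REGISTRY: dict[OcrProviderType, Callable[[], OcrProvider]] = {}


def register(provider_type: OcrProviderType):
    """Class decorator that makes a provider constructible by the factory."""
    def decorator(cls):
        _REGISTRY[provider_type] = cls
        return cls
    return decorator


def create_provider(name: str) -> OcrProvider:
    """Factory: map the request's `provider` string to a provider instance."""
    try:
        provider_type = OcrProviderType(name)  # ValueError for unknown names
        factory = _REGISTRY[provider_type]     # KeyError if not registered
    except (ValueError, KeyError):
        raise ValueError(f"unsupported OCR provider: {name!r}") from None
    return factory()


@register(OcrProviderType.PADDLE)
class PaddleOcrProvider:
    def process(self, request: dict) -> dict:
        return {"pages": [], "document_annotation": None}
```

With a registry, the factory step reduces to decorating the new class; the router only ever calls `create_provider()` with the string from the request.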