AI Stack¶
Hardware¶
- GPU: NVIDIA RTX 4000 Ada (20 GB VRAM)
- Models: devstral-small-2:24b (15 GB), qwen2.5-coder:14b, and others
Components¶
Ollama (LLM Runtime)¶
- URL: http://ollama:11434 (internal)
- GPU: Exclusive access; the container runs with the nvidia runtime
- Models: Automatically loaded via Open WebUI
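As a rough illustration of how another container on the same network might talk to this runtime, the sketch below sends a single non-streaming prompt to Ollama's `/api/generate` endpoint. The helper names `build_generate_request` and `generate` are hypothetical, and the default model is simply one of the models listed under Hardware; adjust both to your setup.

```python
import json
import urllib.request

# Internal URL from the list above; only reachable inside the container network.
OLLAMA_URL = "http://ollama:11434"


def build_generate_request(prompt: str, model: str = "qwen2.5-coder:14b") -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "qwen2.5-coder:14b", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the 'response' field."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

Calling `generate("Write a Python function that reverses a string.")` from a container that can resolve `ollama` should return the model's answer as a string. The first call after a model switch may be slow while the model is loaded into VRAM.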
Open WebUI (Chat Interface)¶
- URL: https://ai.xynap.tech (auth-protected)
- Features: Multi-model chat, RAG, prompt templates, API keys
Whisper (Speech-to-Text)¶
- URL: http://whisper:9099 (internal)
- Model: whisper-large-v3-turbo
- Endpoint: POST /asr?output=json
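A minimal sketch of an upload to the endpoint above, using only the standard library. The page documents only the URL and the `output=json` query parameter. The multipart field name `audio_file` is an assumption borrowed from the widely used whisper-asr-webservice image, the reply schema is not documented here, and the helper names are hypothetical.

```python
import json
import urllib.request
import uuid

# Internal URL from the list above.
WHISPER_URL = "http://whisper:9099"


def build_asr_request(audio: bytes, filename: str = "audio.wav",
                      field: str = "audio_file") -> urllib.request.Request:
    """Build a multipart/form-data upload for POST /asr?output=json.

    The field name is an ASSUMPTION; check it against the deployed service.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{WHISPER_URL}/asr?output=json",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(audio: bytes, timeout: float = 300.0) -> dict:
    """Upload the audio and return the decoded JSON reply unchanged."""
    with urllib.request.urlopen(build_asr_request(audio), timeout=timeout) as resp:
        return json.load(resp)
```

`transcribe` returns the raw JSON object rather than picking out a field, since the response layout depends on the service image actually deployed.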
Piper TTS (Text-to-Speech)¶
Two services:
| Service | Port | Protocol | Use |
|---|---|---|---|
| piper-tts | 10200 | Wyoming | FreeSwitch |
| piper-http | 5100 | HTTP REST | Web apps |
German voice: thorsten_emotional (medium quality)
LibreTranslate¶
- URL: http://libretranslate:5000 (internal)
- Languages: DE, EN, FR, ES, IT, PT, ...
- API: POST /translate {"q": "text", "source": "de", "target": "en"}
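The request shown above can be wrapped in a small helper, sketched here under the assumption that the instance does not require an API key. The function names are hypothetical; `translatedText` is the field LibreTranslate uses for its result.

```python
import json
import urllib.request

# Internal URL from the list above.
LIBRETRANSLATE_URL = "http://libretranslate:5000"


def build_translate_request(text: str, source: str = "de",
                            target: str = "en") -> urllib.request.Request:
    """Build the JSON request for POST /translate."""
    payload = {"q": text, "source": source, "target": target}
    return urllib.request.Request(
        f"{LIBRETRANSLATE_URL}/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(text: str, source: str = "de", target: str = "en",
              timeout: float = 30.0) -> str:
    """Translate the text and return the 'translatedText' field of the reply."""
    request = build_translate_request(text, source, target)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["translatedText"]
```

For example, `translate("Guten Morgen")` sends the German text with the default `de` to `en` language pair.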
AI Agent (Genesis)¶
The AI Coding Agent uses:
- Ollama LLM for code generation
- Redis Pub/Sub for task communication
- Platform API as control interface
# Create a task
POST /api/v1/coder/tasks
{"prompt": "...", "workspace": "/path/to/code"}
# Get the task status
GET /api/v1/coder/tasks/{token}
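The two calls above could be driven from a client roughly as follows. This is a sketch only: the Platform API base URL, the response schema, and the way the token is returned are not documented here, so `base_url` is a caller-supplied placeholder and the polling helper takes a caller-supplied `is_done` predicate instead of guessing field names. All function names are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable


def build_create_task_request(base_url: str, prompt: str,
                              workspace: str) -> urllib.request.Request:
    """Build POST /api/v1/coder/tasks with the documented JSON body."""
    payload = {"prompt": prompt, "workspace": workspace}
    return urllib.request.Request(
        f"{base_url}/api/v1/coder/tasks",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_status_request(base_url: str, token: str) -> urllib.request.Request:
    """Build GET /api/v1/coder/tasks/{token}, URL-escaping the token."""
    escaped = urllib.parse.quote(token, safe="")
    return urllib.request.Request(f"{base_url}/api/v1/coder/tasks/{escaped}", method="GET")


def poll_until(fetch_status: Callable[[], dict], is_done: Callable[[dict], bool],
               interval: float = 2.0, max_attempts: int = 30) -> dict:
    """Call fetch_status until is_done accepts the result or attempts run out."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if is_done(status):
            return status
        if attempt < max_attempts - 1:
            time.sleep(interval)  # wait before the next status check
    raise TimeoutError("task did not finish within the polling budget")
```

In practice `fetch_status` would open `build_status_request(base_url, token)` with `urllib.request.urlopen` and decode the JSON reply, while `is_done` encodes whatever completion marker the Platform API actually returns.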
Real-time interpreter¶
AI-based telephone interpreter:
Supported: DE↔EN, DE↔TR, DE↔AR, and other language pairs.