AI Stack

Hardware

  • GPU: NVIDIA RTX 4000 Ada (20 GB VRAM)
  • Models: devstral-small-2:24b (15 GB), qwen2.5-coder:14b, and others

Components

Ollama (LLM Runtime)

  • URL: http://ollama:11434 (internal)
  • GPU: Exclusive, container with the nvidia runtime
  • Models: Automatically loaded via Open WebUI
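
As a minimal sketch, the internal Ollama endpoint can be called from another container on the same network using Ollama's `/api/generate` route. The helper names and the default model choice are illustrative assumptions, not part of the stack itself.

```python
import json
import urllib.request

OLLAMA_URL = "http://ollama:11434"  # internal URL listed above


def build_generate_request(prompt: str, model: str = "qwen2.5-coder:14b") -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "qwen2.5-coder:14b", timeout: float = 120.0) -> str:
    """Send the request and return the generated text from the 'response' field."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

Calling `generate("Write a Python hello world")` only works from inside the container network, since the URL is internal. The first call for a given model may be slow while it is loaded into VRAM.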

Open WebUI (Chat Interface)

  • URL: https://ai.xynap.tech (auth-protected)
  • Features: Multi-model chat, RAG, prompt templates, API keys

Whisper (Speech-to-Text)

  • URL: http://whisper:9099 (internal)
  • Model: whisper-large-v3-turbo
  • Endpoint: POST /asr?output=json
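
The following sketch shows one way to upload audio to the `/asr` endpoint using only the standard library. The multipart field name `audio_file` and the file name are assumptions (the source does not state the upload format), so check them against the deployed Whisper service before relying on this.

```python
import json
import urllib.request
import uuid

WHISPER_URL = "http://whisper:9099"  # internal URL listed above


def build_asr_request(audio: bytes, filename: str = "call.wav",
                      field: str = "audio_file") -> urllib.request.Request:
    """Build a multipart/form-data POST to /asr?output=json.

    The form field name "audio_file" is an assumption, not confirmed by the source.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{WHISPER_URL}/asr?output=json",
        data=head + audio + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(audio: bytes, timeout: float = 60.0) -> dict:
    """Send the audio and return the decoded JSON response unchanged."""
    with urllib.request.urlopen(build_asr_request(audio), timeout=timeout) as resp:
        return json.load(resp)
```

`transcribe` returns the raw JSON object rather than a single field, because the exact response schema is not documented here.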

Piper TTS (Text-to-Speech)

Two services:

Service     Port   Protocol   Use
piper-tts   10200  Wyoming    FreeSwitch
piper-http  5100   HTTP REST  Web apps

German voice: thorsten_emotional (medium quality)

LibreTranslate

  • URL: http://libretranslate:5000 (internal)
  • Languages: DE, EN, FR, ES, IT, PT, ...
  • API: POST /translate {"q": "text", "source": "de", "target": "en"}
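
A minimal sketch of the `/translate` call, assuming the service is reachable at the internal URL and needs no API key. The function names are illustrative; the `translatedText` response field is the one LibreTranslate returns for this endpoint.

```python
import json
import urllib.request

LIBRETRANSLATE_URL = "http://libretranslate:5000"  # internal URL listed above


def build_translate_request(text: str, source: str = "de",
                            target: str = "en") -> urllib.request.Request:
    """Build the JSON POST described above for /translate."""
    payload = {"q": text, "source": source, "target": target}
    return urllib.request.Request(
        f"{LIBRETRANSLATE_URL}/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(text: str, source: str = "de", target: str = "en",
              timeout: float = 30.0) -> str:
    """Send the request and return the translated string."""
    request = build_translate_request(text, source, target)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["translatedText"]
```

For example, `translate("Guten Morgen", "de", "en")` would send `{"q": "Guten Morgen", "source": "de", "target": "en"}` to the service.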

AI Agent (Genesis)

The AI Coding Agent uses:

  • Ollama LLM for code generation
  • Redis Pub/Sub for task communication
  • Platform API as control interface

# Create a task
POST /api/v1/coder/tasks
{"prompt": "...", "workspace": "/path/to/code"}

# Get status
GET /api/v1/coder/tasks/{token}
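
The two calls above could be wrapped as follows. This is a sketch under stated assumptions: `PLATFORM_URL` is a hypothetical placeholder (the source does not give the Platform API host), and any authentication the API requires is omitted.

```python
import json
import urllib.parse
import urllib.request

PLATFORM_URL = "http://platform.example"  # hypothetical host, not from the source


def build_create_task_request(prompt: str, workspace: str) -> urllib.request.Request:
    """Build POST /api/v1/coder/tasks with the documented JSON body."""
    payload = {"prompt": prompt, "workspace": workspace}
    return urllib.request.Request(
        f"{PLATFORM_URL}/api/v1/coder/tasks",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_status_request(token: str) -> urllib.request.Request:
    """Build GET /api/v1/coder/tasks/{token}; the token is URL-escaped."""
    safe_token = urllib.parse.quote(token, safe="")
    return urllib.request.Request(
        f"{PLATFORM_URL}/api/v1/coder/tasks/{safe_token}",
        method="GET",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send either request and return the decoded JSON response unchanged."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

The response schemas are not documented here, so `send` returns the JSON as-is. Presumably the create call returns the token used in the status path, but that is an assumption to verify against the actual API.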

Real-time interpreter

AI-based telephone interpreter:

Call → FreeSwitch → Whisper (STT) → LibreTranslate → Piper (TTS) → back to the caller

Supported: DE↔EN, DE↔TR, DE↔AR and other language pairs.
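
The interpreter chain can be sketched as three pluggable stages. This is an illustration of the data flow only, not the actual implementation: the `Interpreter` class and its callable signatures are hypothetical, and the FreeSwitch audio handling is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Interpreter:
    """Chains speech-to-text, translation, and text-to-speech for one utterance."""

    stt: Callable[[bytes], str]                # e.g. a Whisper client
    translate: Callable[[str, str, str], str]  # e.g. a LibreTranslate client
    tts: Callable[[str, str], bytes]           # e.g. a Piper client

    def interpret(self, audio: bytes, source: str, target: str) -> bytes:
        """Return synthesized speech in the target language for the given audio."""
        text = self.stt(audio)
        if not text.strip():
            return b""  # nothing recognized, so nothing to speak
        translated = self.translate(text, source, target)
        return self.tts(translated, target)
```

Keeping each stage behind a plain callable makes it possible to test the chain with stubs and to swap a service without touching the call flow.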