IVR, Voice Bot & Interpreter¶

Overview¶

The IVR system (Interactive Voice Response) forms the first point of contact for incoming calls. It combines classic DTMF navigation with AI-based voice recognition and provides access to the Voice Bot as well as real-time interpreter.

IVR system¶

Call flow¶

Eingehender Anruf
    │
    ▼
DID-Lookup (Platform API)
    │
    ▼
IVR-Menue zugeordnet?
    ├── Ja → Personalisierte Begruessung (TTS)
    │         │
    │         ▼
    │     Warte auf Eingabe (DTMF + Sprache parallel)
    │         │
    │         ▼
    │     Aktion ausfuehren
    │
    └── Nein → Fallback (Extension / Ring Group)

Button	Action
0	Central (Ring Group)
1	Voice Bot (LiveKit Agent)
2	Forwarding to employees
3	Record message
4	Start interpreter
9	Repeat the menu

Parallel to DTMF, the caller's speech input is analyzed:

Anrufer spricht
    │
    ▼
audio_fork → WebSocket /ivr
    │
    ▼
STT (Spracherkennung)
    │
    ▼
Intent-Erkennung → Aktion

The voice recognition runs throughaudio_fork, which forwards the audio stream to a WebSocket endpoint in real time. The Intent detection maps spoken keywords on DTMF actions.

IVR-Menue Administration¶

# IVR-Menues verwalten
GET /api/v1/sip/ivr-menus
POST /api/v1/sip/ivr-menus
{
    "name": "Hauptmenue",
    "greeting_tts": "Willkommen bei xynap. Druecken Sie 1 fuer...",
    "timeout": 10,
    "max_retries": 3,
    "actions": [
        {"digit": "0", "type": "ring_group", "target_id": 1},
        {"digit": "1", "type": "livekit_agent", "target_id": null},
        {"digit": "2", "type": "extension", "target_id": 1000},
        {"digit": "4", "type": "interpreter", "target_id": null}
    ]
}

TTS (Text to Speech)¶

ElevenLabs — IVR statements¶

ElevenLabsis used for IVR greetings and menu announcements (high quality, naturally sounding voices).

Einsatz: Personalized greetings, menu announcements
Cache: /var/lib/xynap/voicebot/tts-audio/ivr_greetings/
Format: WAV, 8kHz/16bit (SIP compatible)

TTS-Caching

Generated audio files are checked so that identical announcements do not have to be generated again. When the greeting text is changed, the cache is automatically invalidated.

Piper — Interpreter-TTS¶

Piper(local TTS) is used for the interpreter service:

Einsatz: Real-time translation output
Vorteil: No API Layer, Run locally
Modelle: German and English voices preinstalled

LiveKit Voice Bot¶

Architecture¶

FreeSwitch
    │ SIP INVITE an *99 oder IVR-Taste 1
    ▼
sofia/external-ipv4/test@127.0.0.1:5070
    │
    ▼
LiveKit SIP Bridge (livekit-sip)
    │ Erstellt LiveKit Room
    ▼
LiveKit Server (livekit)
    │ Dispatcht Agent
    ▼
LiveKit Agent Worker (livekit-agent)
    │ Python, livekit-agents 1.4
    ▼
KI-Interaktion (STT → LLM → TTS)

Configuration¶

Component	Config file	Containers
LiveKit Server	`/etc/xynap/livekit/livekit.yaml`	`livekit`
SIP Bridge	`/etc/xynap/livekit-sip/sip.yaml`	`livekit-sip`
Agent	Source code in container	`livekit-agent`

SIP Trunk and Dispatch¶

Parameters	Value
SIP Trunk ID	`ST_kwNbrEg4YHSv`
Dispatch Rule ID	`SDR_k4XfgYznebZA`
SIP target	`127.0.0.1:5070`
Test extension	`*99`

Agent Worker¶

The Agent Worker is a Python process based onlivekit-agents 1.4:

STT(Speech-to-Text) — Transcription of caller language
LLM— Processing and response generation
TTS— Language output of the answer

The agent will start automatically when a new participant enters the LiveKit-Room (Dispatch Rule).

Interpreter (real-time translation)¶

Overview¶

The interpreter allows bidirectional real-time translation during a call. It is activated viaIVR-Taste 4.

Pipeline¶

Anrufer spricht (Deutsch)
    │
    ▼
Whisper STT (Transkription)
    │ Text (DE)
    ▼
LibreTranslate (Uebersetzung DE → EN)
    │ Text (EN)
    ▼
Ollama (optionale Nachbearbeitung/Kontextverstaendnis)
    │ Text (EN, optimiert)
    ▼
Piper TTS (Sprachausgabe)
    │ Audio (EN)
    ▼
Ausgabe an Gegenpartei

The same pipeline runs in reverse direction for the counterparty.

Technical details¶

Component	Description
Containers	`interpreter-bridge`(host network)
Source code	`/usr/local/xynap/interpreter/`
WebSocket	`ws://127.0.0.1:9001/interpret`
STT	Whisper (OpenAI)
Translation	LibreTranslate (local)
LLM	Ollama (local, optional post-processing)
TTS	Piper (local)

Audio flow¶

Teilnehmer A                    Teilnehmer B
     │                               │
     │  Audio (DE)                    │
     ├──────► audio_fork ────────────►│
     │        │                       │
     │        ▼                       │
     │   Whisper STT                  │
     │        │                       │
     │        ▼                       │
     │   LibreTranslate              │
     │        │                       │
     │        ▼                       │
     │   Piper TTS (EN)              │
     │        │                       │
     │        └──────────► Audio (EN) │
     │                               │
     │  Audio (EN)                    │
     │◄──── gleiche Pipeline ────────┤
     │       (umgekehrt)             │

Latenz

The end-to-end charge of the transmission pipeline is typically 2–4 seconds, depending on the set length and GPU load. Whisper and Piper run on the local RTX 4000 Ada (20 GB VRAM).

GPU resources

Whisper, Ollama and Piper share the GPU. With the simultaneous use of all services, bottlenecks can occur. The interpreter is currently a priority for the GPU resources.

Interplay of components¶

                    ┌─────────────────┐
                    │  Platform API   │
                    │  (SIP-Modul)    │
                    └────────┬────────┘
                             │ xml-curl
                             ▼
┌─────────┐     ┌────────────────────────┐     ┌─────────────┐
│  SIP    │────►│     FreeSwitch          │────►│  LiveKit    │
│Provider │     │                        │     │  SIP Bridge │
└─────────┘     │  IVR ──► DTMF/Sprache │     └──────┬──────┘
                │         │              │            │
                │    ┌────┴────┐         │     ┌──────┴──────┐
                │    │ Taste 1 │─────────┼────►│ Voice Bot   │
                │    │ Taste 4 │─────┐   │     │ (Agent)     │
                │    └─────────┘     │   │     └─────────────┘
                └────────────────────┼───┘
                                     │
                              ┌──────┴──────┐
                              │ Interpreter │
                              │   Bridge    │
                              └──────┬──────┘
                                     │
                     ┌───────┬───────┼───────┐
                     ▼       ▼       ▼       ▼
                  Whisper  Libre   Ollama  Piper
                   STT    Translate  LLM    TTS

IVR, Voice Bot & Interpreter¶

Overview¶

IVR system¶

Call flow¶

DTMF menu¶

Language navigation¶

IVR-Menue Administration¶

TTS (Text to Speech)¶

ElevenLabs — IVR statements¶

Piper — Interpreter-TTS¶

LiveKit Voice Bot¶

Architecture¶

Configuration¶

SIP Trunk and Dispatch¶

Agent Worker¶

Interpreter (real-time translation)¶

Overview¶

Pipeline¶

Technical details¶

Audio flow¶

Interplay of components¶