IVR, Voice Bot & Interpreter

Overview

The IVR (Interactive Voice Response) system is the first point of contact for incoming calls. It combines classic DTMF navigation with AI-based speech recognition and provides access to the Voice Bot and the real-time interpreter.


IVR system

Call flow

Incoming call
DID lookup (Platform API)
IVR menu assigned?
    ├── Yes → Personalized greeting (TTS)
    │         │
    │         ▼
    │     Wait for input (DTMF + speech in parallel)
    │         │
    │         ▼
    │     Execute action
    └── No → Fallback (Extension / Ring Group)
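
The branching in the call flow above can be sketched in a few lines of Python. This is only an illustration: the `DidRecord` and `IvrMenu` types, the step strings, and the default fallback target are hypothetical names, not part of the Platform API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IvrMenu:
    name: str
    greeting_tts: str


@dataclass
class DidRecord:
    """Result of a DID lookup (hypothetical shape)."""
    did: str
    ivr_menu: Optional[IvrMenu] = None      # None: no IVR menu assigned
    fallback_target: str = "ring_group:1"   # hypothetical fallback target


def route_incoming_call(record: DidRecord) -> List[str]:
    """Return the ordered steps for an incoming call."""
    if record.ivr_menu is None:
        # No IVR menu assigned: go straight to the extension or ring group.
        return [f"bridge:{record.fallback_target}"]
    return [
        f"play_tts:{record.ivr_menu.greeting_tts}",
        "wait_for_input:dtmf+speech",
        "execute_action",
    ]
```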

DTMF menu

Key  Action
0    Switchboard (ring group)
1    Voice Bot (LiveKit Agent)
2    Transfer to an employee
3    Record a message
4    Start interpreter
9    Repeat the menu
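
As a rough sketch, the menu above amounts to a lookup table from digit to action type. The type names for keys 0, 1, 2 and 4 are taken from the API payload in this document; `record_message` and `repeat_menu` are assumed names for keys 3 and 9, and returning `None` for unmapped keys is an assumption as well.

```python
from typing import Optional

DTMF_ACTIONS = {
    "0": "ring_group",
    "1": "livekit_agent",
    "2": "extension",
    "3": "record_message",   # assumed type name
    "4": "interpreter",
    "9": "repeat_menu",      # assumed type name
}


def resolve_digit(digit: str) -> Optional[str]:
    """Map a pressed key to its action type, or None if the key is unmapped."""
    return DTMF_ACTIONS.get(digit)
```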

Voice navigation

In parallel with DTMF, the caller's speech input is analyzed:

Caller speaks
audio_fork → WebSocket /ivr
STT (speech recognition)
Intent detection → action

Speech recognition runs through audio_fork, which forwards the audio stream to a WebSocket endpoint in real time. Intent detection maps spoken keywords to DTMF actions.
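
A keyword-based intent mapping of this kind could look roughly as follows. The keyword list is entirely hypothetical and the real intent detection may work differently; the sketch only shows how an STT transcript can be reduced to the same digits the DTMF menu uses.

```python
import re
from typing import Optional

# Hypothetical keywords; the values are the DTMF digits from the menu.
KEYWORD_TO_DIGIT = {
    "zentrale": "0",
    "operator": "0",
    "assistent": "1",
    "assistant": "1",
    "mitarbeiter": "2",
    "nachricht": "3",
    "message": "3",
    "dolmetscher": "4",
    "interpreter": "4",
    "wiederholen": "9",
    "repeat": "9",
}


def detect_intent(transcript: str) -> Optional[str]:
    """Return the digit for the first known keyword in the transcript."""
    for word in re.findall(r"\w+", transcript.lower()):
        digit = KEYWORD_TO_DIGIT.get(word)
        if digit is not None:
            return digit
    return None
```

Because the result is a plain digit, the speech path and the DTMF path can share one action dispatcher.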

IVR menu administration

# Manage IVR menus
GET /api/v1/sip/ivr-menus
POST /api/v1/sip/ivr-menus
{
    "name": "Hauptmenue",
    "greeting_tts": "Willkommen bei xynap. Druecken Sie 1 fuer...",
    "timeout": 10,
    "max_retries": 3,
    "actions": [
        {"digit": "0", "type": "ring_group", "target_id": 1},
        {"digit": "1", "type": "livekit_agent", "target_id": null},
        {"digit": "2", "type": "extension", "target_id": 1000},
        {"digit": "4", "type": "interpreter", "target_id": null}
    ]
}
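
A client for the POST endpoint above might be sketched with the Python standard library as follows. The host name, the bearer-token authentication, and the client-side digit validation are assumptions; only the path and the payload shape come from this document.

```python
import json
import urllib.request

BASE_URL = "https://platform.example"   # hypothetical host
VALID_DIGITS = set("0123456789*#")


def build_create_menu_request(menu: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an IVR menu."""
    digits = [action["digit"] for action in menu["actions"]]
    for digit in digits:
        if digit not in VALID_DIGITS:
            raise ValueError(f"invalid DTMF digit: {digit!r}")
    if len(digits) != len(set(digits)):
        raise ValueError("each digit may be mapped only once")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/sip/ivr-menus",
        data=json.dumps(menu).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",   # assumed auth scheme
        },
    )
```

The returned request can be sent with `urllib.request.urlopen(request)`; keeping construction and sending separate makes the payload easy to inspect first.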

TTS (Text to Speech)

ElevenLabs: IVR announcements

ElevenLabs is used for IVR greetings and menu announcements (high-quality, natural-sounding voices).

  • Use: Personalized greetings, menu announcements
  • Cache: /var/lib/xynap/voicebot/tts-audio/ivr_greetings/
  • Format: WAV, 8 kHz / 16-bit (SIP-compatible)
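
Whether a generated file matches the format listed above can be checked with Python's standard `wave` module. This is a minimal sketch: the function names are hypothetical, and the mono channel layout in the helper is an assumption, since the document only specifies sample rate and bit depth.

```python
import io
import wave


def is_sip_compatible(wav_bytes: bytes) -> bool:
    """Check for 8 kHz sample rate and 16-bit (2-byte) samples."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getframerate() == 8000 and wav.getsampwidth() == 2


def make_silence(seconds: float = 0.1, rate: int = 8000) -> bytes:
    """Create a short silent 16-bit WAV, useful for trying out the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono is an assumption
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()
```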

TTS caching

Generated audio files are cached so that identical announcements do not have to be generated again. When the greeting text changes, the cache is invalidated automatically.
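
One common way to get this behavior is to derive the cache file name from a hash of the announcement text, so that a changed text simply misses the cache. The sketch below assumes that approach; the actual cache key scheme is not documented here, and `synthesize` stands in for whatever function calls the TTS provider.

```python
import hashlib
from pathlib import Path
from typing import Callable

CACHE_DIR = Path("/var/lib/xynap/voicebot/tts-audio/ivr_greetings")


def cache_path(text: str, voice: str = "default", cache_dir: Path = CACHE_DIR) -> Path:
    """Derive a stable file name from voice and text (assumed key scheme)."""
    key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return cache_dir / f"{key}.wav"


def get_or_generate(
    text: str,
    synthesize: Callable[[str], bytes],
    voice: str = "default",
    cache_dir: Path = CACHE_DIR,
) -> Path:
    """Return the cached WAV for this text, generating it only on a miss."""
    path = cache_path(text, voice, cache_dir)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(synthesize(text))
    return path
```

With this scheme, files for outdated texts stay on disk until they are cleaned up separately.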

Piper: interpreter TTS

Piper (local TTS) is used for the interpreter service:

  • Use: Real-time translation output
  • Advantage: No external API round trip, runs locally
  • Models: German and English voices preinstalled

LiveKit Voice Bot

Architecture

FreeSWITCH
    │ SIP INVITE to *99 or IVR key 1
sofia/external-ipv4/test@127.0.0.1:5070
LiveKit SIP Bridge (livekit-sip)
    │ Creates LiveKit room
LiveKit Server (livekit)
    │ Dispatches agent
LiveKit Agent Worker (livekit-agent)
    │ Python, livekit-agents 1.4
AI interaction (STT → LLM → TTS)

Configuration

Component       Config file                        Container
LiveKit Server  /etc/xynap/livekit/livekit.yaml    livekit
SIP Bridge      /etc/xynap/livekit-sip/sip.yaml    livekit-sip
Agent           Source code in container           livekit-agent

SIP Trunk and Dispatch

Parameter         Value
SIP Trunk ID      ST_kwNbrEg4YHSv
Dispatch Rule ID  SDR_k4XfgYznebZA
SIP target        127.0.0.1:5070
Test extension    *99

Agent Worker

The Agent Worker is a Python process based on livekit-agents 1.4:

  1. STT (speech-to-text): transcribes the caller's speech
  2. LLM: processes the transcript and generates a response
  3. TTS: speaks the response

The agent starts automatically when a new participant joins the LiveKit room (dispatch rule).
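
The three steps above form one conversational turn. The following library-free sketch shows the data flow only; it deliberately does not use the livekit-agents API, and the class name, the callables, and the idea of passing the running history to the LLM are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]   # (speaker, text) pairs


@dataclass
class VoiceBotSession:
    stt: Callable[[bytes], str]       # audio -> transcript
    llm: Callable[[History], str]     # conversation so far -> reply text
    tts: Callable[[str], bytes]       # reply text -> audio
    history: History = field(default_factory=list)

    def handle_utterance(self, audio: bytes) -> bytes:
        """Run one STT -> LLM -> TTS turn and record it in the history."""
        transcript = self.stt(audio)
        self.history.append(("caller", transcript))
        reply = self.llm(self.history)
        self.history.append(("agent", reply))
        return self.tts(reply)
```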


Interpreter (real-time translation)

Overview

The interpreter enables bidirectional real-time translation during a call. It is activated via IVR key 4.

Pipeline

Caller speaks (German)
Whisper STT (transcription)
    │ Text (DE)
LibreTranslate (translation DE → EN)
    │ Text (EN)
Ollama (optional post-processing / context understanding)
    │ Text (EN, refined)
Piper TTS (speech output)
    │ Audio (EN)
Output to the other party

The same pipeline runs in the reverse direction for the other party.
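
Structurally, the pipeline is a chain of four stages, one of them optional, that is run once per direction with the languages swapped. The sketch below models that with injected callables; the class and parameter names are hypothetical and the real interpreter-bridge code may be organized differently. The `libretranslate_payload` helper shows the JSON body that LibreTranslate's `/translate` endpoint accepts.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class InterpreterPipeline:
    transcribe: Callable[[bytes, str], str]       # (audio, language) -> text
    translate: Callable[[str, str, str], str]     # (text, source, target) -> text
    synthesize: Callable[[str, str], bytes]       # (text, language) -> audio
    post_process: Optional[Callable[[str, str], str]] = None   # optional LLM step

    def run(self, audio: bytes, source: str, target: str) -> bytes:
        """Translate one utterance from the source to the target language."""
        text = self.transcribe(audio, source)
        translated = self.translate(text, source, target)
        if self.post_process is not None:
            translated = self.post_process(translated, target)
        return self.synthesize(translated, target)


def libretranslate_payload(text: str, source: str, target: str) -> dict:
    """JSON body for a LibreTranslate /translate request."""
    return {"q": text, "source": source, "target": target, "format": "text"}
```

Calling `run(audio, "de", "en")` for one party and `run(audio, "en", "de")` for the other covers both directions with the same object.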

Technical details

Component    Description
Container    interpreter-bridge (host network)
Source code  /usr/local/xynap/interpreter/
WebSocket    ws://127.0.0.1:9001/interpret
STT          Whisper (OpenAI)
Translation  LibreTranslate (local)
LLM          Ollama (local, optional post-processing)
TTS          Piper (local)

Audio flow

Participant A                   Participant B
     │                               │
     │  Audio (DE)                    │
     ├──────► audio_fork ────────────►│
     │        │                       │
     │        ▼                       │
     │   Whisper STT                  │
     │        │                       │
     │        ▼                       │
     │   LibreTranslate              │
     │        │                       │
     │        ▼                       │
     │   Piper TTS (EN)              │
     │        │                       │
     │        └──────────► Audio (EN) │
     │                               │
     │  Audio (EN)                    │
     │◄──── same pipeline ───────────┤
     │       (reversed)              │

Latency

The end-to-end latency of the translation pipeline is typically 2–4 seconds, depending on sentence length and GPU load. Whisper and Piper run on the local RTX 4000 Ada (20 GB VRAM).

GPU resources

Whisper, Ollama and Piper share the GPU. When all services are in use at the same time, bottlenecks can occur. The interpreter currently has priority for GPU resources.
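
How that priority is enforced is not described here. One possible approach, shown purely as an illustration, is a priority queue in front of the GPU workers in which interpreter jobs are always served first. The service names, priority values, and the `GpuJobQueue` class are hypothetical.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple

# Lower number = served first. Values are assumptions for this sketch.
PRIORITY = {"interpreter": 0, "voicebot": 1, "ivr": 2}


class GpuJobQueue:
    """Orders GPU jobs by service priority, FIFO within one priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, Any]] = []
        self._sequence = itertools.count()   # tie-breaker keeps FIFO order

    def submit(self, service: str, job: Any) -> None:
        priority = PRIORITY.get(service, max(PRIORITY.values()) + 1)
        heapq.heappush(self._heap, (priority, next(self._sequence), service, job))

    def next_job(self) -> Optional[Tuple[str, Any]]:
        """Pop the most urgent job, or return None if the queue is empty."""
        if not self._heap:
            return None
        _, _, service, job = heapq.heappop(self._heap)
        return service, job
```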


Interplay of components

                    ┌─────────────────┐
                    │  Platform API   │
                    │  (SIP module)   │
                    └────────┬────────┘
                             │ xml-curl
┌─────────┐     ┌────────────────────────┐     ┌─────────────┐
│  SIP    │────►│     FreeSWITCH          │────►│  LiveKit    │
│Provider │     │                        │     │  SIP Bridge │
└─────────┘     │  IVR ──► DTMF/speech  │     └──────┬──────┘
                │         │              │            │
                │    ┌────┴────┐         │     ┌──────┴──────┐
                │    │ Key 1   │─────────┼────►│ Voice Bot   │
                │    │ Key 4   │─────┐   │     │ (Agent)     │
                │    └─────────┘     │   │     └─────────────┘
                └────────────────────┼───┘
                              ┌──────┴──────┐
                              │ Interpreter │
                              │   Bridge    │
                              └──────┬──────┘
                     ┌───────┬───────┼───────┐
                     ▼       ▼       ▼       ▼
                  Whisper  Libre   Ollama  Piper
                   STT    Translate  LLM    TTS