Voice AI · Production

A receptionist that picks up every call, in under 800 milliseconds.

Avyra Voice AI is a self-hosted phone agent built for dental clinics, restaurants, and any business that misses calls after 5pm. A caller phones in, gets a warm greeting, and books an appointment, and a human picks up only when a human should.


<800ms

End-to-end latency

$0

Per-minute AI cost

1 env var

LLM swap

30+

Languages

Features

What it does in production.

Inbound phone handling

Asterisk 20 + PJSIP terminate the call. Works with Twilio, Bandwidth, Vonage, or any SIP trunk.

Books on Google Calendar

Reads availability, holds slots, books, reschedules, and cancels. PMS adapter is pluggable for clinic-specific systems.
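The hold-then-book flow can be sketched as follows. This is a minimal in-memory illustration, not Avyra's actual adapter: the `SlotBook` class, its method names, and the two-minute hold window are all hypothetical, and a real adapter would talk to Google Calendar or a clinic's PMS instead of a dictionary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class SlotBook:
    """Hypothetical in-memory stand-in for a calendar or PMS adapter."""

    hold_ttl: timedelta = timedelta(minutes=2)    # assumed hold window
    booked: dict = field(default_factory=dict)    # slot start -> patient name
    holds: dict = field(default_factory=dict)     # slot start -> (call_id, expiry)

    def is_free(self, slot: datetime, now: datetime) -> bool:
        """A slot is free if it is not booked and has no unexpired hold."""
        hold = self.holds.get(slot)
        held = hold is not None and hold[1] > now
        return slot not in self.booked and not held

    def hold(self, slot: datetime, call_id: str, now: datetime) -> bool:
        """Reserve a slot for one call while the caller confirms details."""
        if not self.is_free(slot, now):
            return False
        self.holds[slot] = (call_id, now + self.hold_ttl)
        return True

    def book(self, slot: datetime, call_id: str, patient: str, now: datetime) -> bool:
        """Convert a live hold owned by this call into a booking."""
        hold = self.holds.get(slot)
        if hold is None or hold[0] != call_id or hold[1] <= now:
            return False
        del self.holds[slot]
        self.booked[slot] = patient
        return True

    def cancel(self, slot: datetime) -> bool:
        """Remove a booking; returns False if nothing was booked."""
        return self.booked.pop(slot, None) is not None
```

Holding before booking keeps two simultaneous callers from being offered the same slot, and the expiry releases the slot if a caller hangs up mid-conversation.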

RAG for FAQs

ChromaDB indexes your clinic FAQs. The agent answers from your content, not the model's training set.
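To illustrate what "answers from your content" means in practice, here is a rough sketch of retrieval-grounded prompt assembly. It is an assumption-heavy stand-in: a naive word-overlap retriever replaces ChromaDB's vector search so the example runs with the standard library alone, and the function names and prompt wording are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, faqs: list[dict], top_k: int = 1) -> list[dict]:
    """Rank FAQs by word overlap with the question (stand-in for vector search)."""
    q_tokens = tokenize(question)
    ranked = sorted(
        faqs,
        key=lambda f: len(q_tokens & tokenize(f["q"] + " " + f["a"])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, faqs: list[dict]) -> str:
    """Put the retrieved clinic content in front of the caller's question."""
    context = "\n".join(f"- {f['a']}" for f in retrieve(question, faqs))
    return (
        "Answer using only the clinic content below. "
        "If the answer is not there, offer to take a message.\n"
        f"Clinic content:\n{context}\n"
        f"Caller: {question}"
    )
```

The point of the pattern is that the model sees the clinic's own text at answer time, so updating an FAQ changes the answer without retraining anything.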

Claude or local Llama

`LLM_PROVIDER=claude` for hosted, or `ollama` for fully self-hosted on your own GPU. Same prompts, same behavior.
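A one-variable swap like this is typically a small registry lookup. The sketch below assumes that shape: only the `LLM_PROVIDER` variable and its two values come from the description above, while the function names, the `claude` default, and the placeholder completions are hypothetical.

```python
import os
from typing import Callable, Mapping, Optional


def claude_complete(prompt: str) -> str:
    # Placeholder: a real build would call the hosted Claude API here.
    return f"[claude] {prompt}"


def ollama_complete(prompt: str) -> str:
    # Placeholder: a real build would call a local Ollama server here.
    return f"[ollama] {prompt}"


# Registry keyed by the values LLM_PROVIDER is documented to accept.
PROVIDERS: dict[str, Callable[[str], str]] = {
    "claude": claude_complete,
    "ollama": ollama_complete,
}


def get_llm(env: Optional[Mapping[str, str]] = None) -> Callable[[str], str]:
    """Pick the completion function named by LLM_PROVIDER (default is assumed)."""
    env = os.environ if env is None else env
    name = env.get("LLM_PROVIDER", "claude").strip().lower()
    if name not in PROVIDERS:
        raise ValueError(
            f"Unknown LLM_PROVIDER {name!r}; expected one of {sorted(PROVIDERS)}"
        )
    return PROVIDERS[name]
```

Because both providers share one call signature, the prompts and the rest of the pipeline do not need to know which one is active.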

Emergency triage

Detects dental and medical emergencies, follows your written triage protocol, and warm-transfers to staff.
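One simple way to encode a written triage protocol is an ordered list of rules checked against the caller's transcript. The sketch below assumes that approach; the rule labels, trigger phrases, and staff extensions are illustrative only and are not clinical guidance or Avyra's shipped protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TriageRule:
    label: str
    phrases: tuple[str, ...]   # trigger phrases, matched case-insensitively
    transfer_to: str           # hypothetical staff extension for the warm transfer


# Hypothetical protocol; rules are checked in order, most urgent first.
PROTOCOL = (
    TriageRule("medical_emergency", ("chest pain", "trouble breathing"), "PJSIP/100"),
    TriageRule("dental_emergency", ("knocked out", "bleeding", "swelling"), "PJSIP/101"),
)


def triage(utterance: str, protocol: tuple = PROTOCOL) -> Optional[TriageRule]:
    """Return the first matching rule, or None if the call is routine."""
    text = utterance.lower()
    for rule in protocol:
        if any(phrase in text for phrase in rule.phrases):
            return rule
    return None
```

Keeping the protocol as data means the clinic's written rules can be reviewed and changed without touching the matching logic. A production system would likely pair phrase rules with model-based classification.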

Self-hosted, no per-minute fees

Run it on a $380/mo g4dn.xlarge. No Voiceflow, no Vapi, no per-minute AI charges.
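To see when a flat monthly box beats per-minute pricing, a quick break-even calculation helps. The $380/mo figure comes from the line above; the per-minute rate is a purely hypothetical input, not a quote from any vendor.

```python
def break_even_minutes(monthly_infra_cost: float, per_minute_rate: float) -> float:
    """Minutes per month at which flat infrastructure cost equals per-minute billing."""
    if per_minute_rate <= 0:
        raise ValueError("per_minute_rate must be positive")
    return monthly_infra_cost / per_minute_rate


def per_minute_bill(minutes: float, per_minute_rate: float) -> float:
    """Monthly bill under per-minute pricing for a given call volume."""
    return minutes * per_minute_rate
```

For example, at an assumed rate of $0.10 per minute, `break_even_minutes(380, 0.10)` comes to roughly 3,800 minutes per month. Above that volume the flat-cost box is cheaper; below it, per-minute pricing is.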

Full observability

Prometheus metrics for calls, intents, and STT/TTS/LLM latency. Postgres call logs. Redis session state.

Custom voice training

Fine-tune XTTS v2 on 3–4 hours of recorded voice to make Avyra sound like a specific person on your team.

Architecture

How it's built.

Asterisk 20 LTS handles SIP / PJSIP termination and DTMF fallback menus.
ai-engine orchestrates STT → intent classification → slot filling → LLM → TTS.
Postgres for call logs and audit; Redis for real-time session state.
Deploys end-to-end on AWS via included Terraform (g4dn.xlarge ≈ $380/mo).
Prompts live as Markdown files — no redeploy to tweak agent behavior.
Multi-tenant by `clinic_id` — one box can run many practices.
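The last two points (Markdown prompts and per-clinic tenancy) could combine as sketched below. The directory layout (`<prompts_dir>/<clinic_id>/<name>.md` with a shared `default` folder) and the `load_prompt` helper are assumptions for illustration, not the project's documented structure.

```python
import re
from pathlib import Path

# Conservative pattern so a clinic_id or prompt name cannot escape prompts_dir.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def load_prompt(prompts_dir: Path, clinic_id: str, name: str) -> str:
    """Return the clinic-specific prompt if present, else the shared default.

    The file is read on every call, so an edited Markdown file takes effect
    on the next call without a redeploy.
    """
    for part in (clinic_id, name):
        if not _SAFE_NAME.fullmatch(part):
            raise ValueError(f"Unsafe path component: {part!r}")
    clinic_file = prompts_dir / clinic_id / f"{name}.md"
    default_file = prompts_dir / "default" / f"{name}.md"
    path = clinic_file if clinic_file.is_file() else default_file
    return path.read_text(encoding="utf-8")
```

A clinic that needs a different greeting adds one file under its own folder; every other clinic keeps using the default.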

Inbound Call
    │
    ▼
Asterisk 20 (SIP / PJSIP)
    │  ARI WebSocket
    ▼
ai-engine (FastAPI)
    ├─ STT  : faster-whisper distil-large-v3
    ├─ LLM  : Claude Sonnet / Ollama Llama 3.1
    ├─ Cal  : Google Calendar API
    ├─ RAG  : ChromaDB
    └─ TTS  : XTTS v2 (custom voice)
    │
    ▼
Asterisk plays audio back to caller
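The flow in the diagram can be sketched as a list of stages run in order, with each stage timed so the total can be compared against a latency budget. Everything here is a stub under stated assumptions: the `Turn` fields, stage functions, and stage names are hypothetical, and the real stages would call faster-whisper, the LLM, and XTTS v2.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Turn:
    """State carried through one caller utterance (hypothetical shape)."""
    clinic_id: str
    audio_in: bytes
    transcript: str = ""
    intent: str = ""
    slots: dict = field(default_factory=dict)
    reply_text: str = ""
    audio_out: bytes = b""
    timings_ms: dict = field(default_factory=dict)


Stage = Callable[[Turn], None]


def run_turn(turn: Turn, stages: list[tuple[str, Stage]]) -> Turn:
    """Run each stage in order and record how long it took in milliseconds."""
    for name, stage in stages:
        start = time.perf_counter()
        stage(turn)
        turn.timings_ms[name] = (time.perf_counter() - start) * 1000
    return turn


# Stub stages: each stands in for a real model or service call.
def stt(t: Turn) -> None:
    t.transcript = t.audio_in.decode("utf-8")

def classify(t: Turn) -> None:
    t.intent = "book" if "book" in t.transcript.lower() else "faq"

def fill_slots(t: Turn) -> None:
    if "tomorrow" in t.transcript.lower():
        t.slots["day"] = "tomorrow"

def llm(t: Turn) -> None:
    t.reply_text = f"Sure, I can help with that ({t.intent})."

def tts(t: Turn) -> None:
    t.audio_out = t.reply_text.encode("utf-8")


STAGES = [("stt", stt), ("intent", classify), ("slots", fill_slots),
          ("llm", llm), ("tts", tts)]
```

Summing `timings_ms` per turn gives the end-to-end figure, and the per-stage values show which component to tune when a turn runs slow.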

Use cases

Who runs it today.

Dental clinics

Front-office receptionist that books, reschedules, cancels, and warm-transfers emergencies. Indistinguishable from a human voice.

Restaurants

Phone ordering at peak hours when staff can't grab the line. Quotes wait times, takes pickup orders, and hands them off to the POS.

Service businesses

Anything where a missed call means lost revenue: HVAC, salons, vet clinics, auto shops. The engine stays the same; only the prompts change.

Built with: Python 3.11 · FastAPI · Asterisk 20 · faster-whisper · XTTS v2 · ChromaDB · Anthropic Claude · Ollama · PostgreSQL · Redis · Docker · Terraform

FAQ

Questions we get asked.

How is this different from Voiceflow or Vapi?

Those are SaaS products with per-minute pricing and no control over latency, voice, or the model. Avyra runs on your hardware (or your cloud account), incurs no per-minute AI fees, and lets you fine-tune both the voice and the LLM.

Can it sound like a specific person?

Yes. Record 3–4 hours of clean voice, run the included XTTS v2 fine-tuning pipeline, drop in the speaker file, and restart the TTS service.

What's the cheapest fully-local setup?

One GPU box running Asterisk + ai-engine + Ollama with Llama 3.1 8B. No Anthropic key needed, no cloud calls. Costs scale with hardware, not minutes.

Does it handle warm transfers?

Yes — when the LLM emits a transfer intent, ai-engine bridges the call to your staff extension via Asterisk.
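A rough sketch of that hand-off, under several assumptions: the LLM is taken to reply with a small JSON object carrying an `intent` field, and the transfer is expressed as a plan of Asterisk ARI REST calls (create a mixing bridge, add the caller, originate the staff channel) that is built but not sent. The JSON shape, the `ai-engine` app name, the `PJSIP/100` endpoint, and the bridge naming are hypothetical.

```python
import json


def parse_llm_reply(raw: str) -> dict:
    """Parse the model's JSON reply; treat anything else as plain speech."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"intent": "speak", "text": raw}
    if isinstance(data, dict) and "intent" in data:
        return data
    return {"intent": "speak", "text": raw}


def transfer_plan(caller_channel: str, staff_endpoint: str, app: str = "ai-engine") -> list:
    """List the ARI requests for a transfer as (method, path, params) tuples."""
    bridge_id = f"xfer-{caller_channel}"   # hypothetical naming scheme
    return [
        ("POST", "/bridges", {"type": "mixing", "bridgeId": bridge_id}),
        ("POST", f"/bridges/{bridge_id}/addChannel", {"channel": caller_channel}),
        # The staff leg enters the same ARI app, which then adds it to the bridge.
        ("POST", "/channels", {"endpoint": staff_endpoint, "app": app,
                               "appArgs": f"join,{bridge_id}"}),
    ]


def next_action(raw: str, caller_channel: str, staff_endpoint: str = "PJSIP/100") -> dict:
    """Decide whether to keep talking or hand the call to staff."""
    reply = parse_llm_reply(raw)
    if reply["intent"] == "transfer":
        return {"action": "transfer",
                "ari_calls": transfer_plan(caller_channel, staff_endpoint)}
    return {"action": "speak", "text": reply.get("text", "")}
```

Falling back to speech when the reply is not valid JSON means a malformed model output never triggers an unintended transfer.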
