Voice AI stacks for specialty dental: Vapi vs Retell vs Twilio
When DSOs evaluate PracticeIQ, a technical IT lead sometimes asks: 'What's your voice stack, and why did you pick it?' Fair question. Here's our answer, and the honest tradeoffs behind each option.
The three serious players
**Vapi** — we use this. JavaScript/TypeScript-native API, Twilio underneath, straightforward to wire to an LLM of your choice, sub-500ms first-word latency on warm connections.
**Retell** — similar positioning, slightly better tool-use support, slightly worse documentation. Common alternative.
**Twilio Voice + Flex + custom LLM orchestration** — the build-it-yourself path. Maximum control, ~3x the engineering time.
What we optimized for
When we picked the stack, we ranked four things:
1. **Latency** — every 100ms above 500ms makes the AI sound less alive. Specialty dental patients are already nervous on the phone.
2. **HIPAA posture** — the vendor must sign a BAA. Twilio does, Vapi does, Retell does.
3. **Tool-calling reliability** — the AI has to book into your PMS in real time. A tool call that fails silently is a booking that's silently dropped.
4. **Developer UX** — we're a 2-person team. Spending 3 weeks gluing Twilio pipes together was not the trade we wanted.
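The tool-calling concern in point 3 is worth making concrete. A minimal sketch of the pattern we mean — every PMS call either succeeds or surfaces its failure so the call can be escalated, never swallowed. All names here (`callPmsTool`, `ToolResult`) are illustrative, not a real SDK:

```typescript
// Hypothetical wrapper: a PMS tool call either succeeds, or the failure
// is returned explicitly so the caller can route to a human instead of
// silently dropping the booking.
type ToolResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

async function callPmsTool<T>(
  name: string,
  fn: () => Promise<T>,
  retries = 2,
): Promise<ToolResult<T>> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return { ok: true, value: await fn() };
    } catch (err) {
      if (attempt === retries) {
        // Last attempt failed: surface the error, don't swallow it.
        return { ok: false, error: `${name} failed: ${String(err)}` };
      }
      // Otherwise loop and retry.
    }
  }
  return { ok: false, error: `${name}: unreachable` };
}
```

The point is the return type: a failed booking becomes an explicit branch (warm transfer, voicemail, callback) rather than an exception lost inside a voice pipeline.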
Why Vapi won for us
Vapi lets us treat the voice layer as 'give me a WebSocket and a transcript'. We own the LLM call, the prompt, the tool-use loop, and the PMS bridge. Vapi handles text-to-speech (TTS), speech recognition (ASR), voice activity detection (VAD), and telephony — all the stuff we'd spend months on otherwise.
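That division of labor looks roughly like the following. The type and function names are illustrative (not the actual Vapi SDK): the vendor streams transcript turns in; our server returns either speech to synthesize or a tool invocation.

```typescript
// Illustrative boundary: the voice vendor owns audio; we own the turn
// handler. All names here are hypothetical stand-ins.
interface TranscriptTurn {
  role: "patient" | "assistant";
  text: string;
}

type AgentAction =
  | { kind: "say"; text: string }
  | { kind: "tool"; name: string; args: Record<string, unknown> };

function handleTurn(history: TranscriptTurn[]): AgentAction {
  const last = history[history.length - 1];
  // Stand-in for the real server-side LLM call + tool-use loop.
  if (/reschedul/i.test(last.text)) {
    return { kind: "tool", name: "pms.findSlots", args: { window: "7d" } };
  }
  return { kind: "say", text: "How can I help with your appointment?" };
}
```

Everything above the `handleTurn` boundary is ours to version, test, and swap; everything below it is the vendor's problem.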
We evaluated Retell seriously. Similar feature set. We went with Vapi because their SDK was more TypeScript-idiomatic for our stack and their support team replied within an hour on a critical path question.
The LLM choice
Separately from voice: which LLM does Riley run on? Today we route based on intent. Simple intents (confirm appointment, check hours, reschedule) go to a smaller, cheaper model for speed and cost. Complex intents (implant candidate qualification, insurance triage, multi-patient rescheduling) go to a larger reasoning model.
This is the [[Rule 213]] pattern — token efficiency by routing, not by defaulting to the most capable model on every turn.
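The routing itself is almost trivially small — which is the point. A sketch with placeholder model IDs (the real intent set and model names are ours, not shown here):

```typescript
// Hypothetical intent router: cheap, latency-sensitive intents go to a
// small fast model; everything else goes to a larger reasoning model.
// Model IDs are placeholders, not real deployments.
const SIMPLE_INTENTS = new Set([
  "confirm_appointment",
  "check_hours",
  "reschedule",
]);

function pickModel(intent: string): string {
  return SIMPLE_INTENTS.has(intent)
    ? "small-fast-model"
    : "large-reasoning-model";
}
```

The win is that the expensive model only pays its latency and token cost on the turns that actually need reasoning.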
What a DSO should really ask their vendor
If you're evaluating us (or Weave's AI module, or Smith.ai's bot, or a homegrown build), ask these five questions — the answers separate production-grade from demo-ware:
1. What's P95 first-word latency during peak hours?
2. Where is audio stored in transit vs at rest? What encryption?
3. What happens when the LLM tool-call to your PMS fails — does the call fall back to a human or silently fail?
4. How do you handle barge-in / interruption? (Patients talk over AIs.)
5. What's your published incident response time when voice goes down on a Tuesday at 10am?
If a vendor can't answer those specifically, they're not ready for your DSO.
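Question 4 deserves one more beat: barge-in means that the instant the patient starts talking, playback stops, and the partial AI utterance stays in the transcript so the LLM knows what the patient actually heard. A simplified state model (all names hypothetical):

```typescript
// Simplified barge-in handling: patient speech during AI playback cancels
// playback immediately; the partial utterance is kept, not discarded.
// All names here are illustrative.
interface PlaybackState {
  speaking: boolean;
  spokenSoFar: string; // portion of the AI utterance already played
}

function onPatientSpeech(
  state: PlaybackState,
): PlaybackState & { interrupted: boolean } {
  if (state.speaking) {
    // Cancel playback; keep spokenSoFar in context for the next LLM turn.
    return { speaking: false, spokenSoFar: state.spokenSoFar, interrupted: true };
  }
  return { ...state, interrupted: false };
}
```

Vendors that get this wrong either talk over the patient or drop the interrupted sentence from context, and the conversation derails either way.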
What we're watching
OpenAI's Realtime API, Anthropic's Claude voice, and emerging speech-to-speech models (no intermediate text) are worth watching. We've prototyped against all three. For now, the Vapi-on-top-of-Twilio stack has the right reliability, latency, and cost profile for specialty dental. We'll migrate layers as they mature, without customers noticing.
Technical questions always welcome — ben@practiceiq.ai.