Voice AI stacks for specialty dental: Vapi vs Retell vs Twilio
When DSOs evaluate PracticeIQ, a technical IT lead sometimes asks: 'What's your voice stack, and why did you pick it?' Fair question. Here's our answer, and the honest tradeoffs behind each option.
The three serious players
**Vapi** — we use this. JavaScript/TypeScript-native API, Twilio underneath, straightforward to wire to an LLM of your choice, sub-500ms first-word latency on warm connections.
**Retell** — similar positioning, slightly better tool-use support, slightly worse documentation. Common alternative.
**Twilio Voice + Flex + custom LLM orchestration** — the build-it-yourself path. Maximum control, ~3x the engineering time.
What we optimized for
When we picked the stack, we ranked four things:
1. **Latency** — every 100ms above 500ms makes the AI sound less alive. Specialty dental patients are already nervous on the phone.
2. **HIPAA posture** — the vendor must sign a BAA. Twilio does, Vapi does, Retell does.
3. **Tool-calling reliability** — the AI has to book into your PMS in real time. A tool call that fails silently is a booking that's silently dropped.
4. **Developer UX** — we're a 2-person team. Spending 3 weeks gluing Twilio pipes together was not the trade we wanted.
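The tool-calling concern in point 3 is worth making concrete. A minimal sketch of the pattern we mean — every PMS call either succeeds or surfaces its failure so the call can be escalated, never swallowed. All names here (`callPmsTool`, `ToolResult`) are illustrative, not a real SDK:

```typescript
// Hypothetical wrapper: a PMS tool call either succeeds, or the failure
// is returned explicitly so the caller can route to a human instead of
// silently dropping the booking.
type ToolResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

async function callPmsTool<T>(
  name: string,
  fn: () => Promise<T>,
  retries = 2,
): Promise<ToolResult<T>> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return { ok: true, value: await fn() };
    } catch (err) {
      if (attempt === retries) {
        // Last attempt failed: surface the error, don't swallow it.
        return { ok: false, error: `${name} failed: ${String(err)}` };
      }
      // Otherwise loop and retry.
    }
  }
  return { ok: false, error: `${name}: unreachable` };
}
```

The point is the return type: a failed booking becomes an explicit branch (warm transfer, voicemail, callback) rather than an exception lost inside a voice pipeline.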
Why Vapi won for us
Vapi lets us treat the voice layer as 'give me a WebSocket and a transcript'. We own the LLM call, the prompt, the tool-use loop, and the PMS bridge. Vapi handles text-to-speech (TTS), speech recognition (ASR), voice activity detection (VAD), and telephony — all the stuff we'd spend months on otherwise.
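That division of labor looks roughly like the following. The type and function names are illustrative (not the actual Vapi SDK): the vendor streams transcript turns in; our server returns either speech to synthesize or a tool invocation.

```typescript
// Illustrative boundary: the voice vendor owns audio; we own the turn
// handler. All names here are hypothetical stand-ins.
interface TranscriptTurn {
  role: "patient" | "assistant";
  text: string;
}

type AgentAction =
  | { kind: "say"; text: string }
  | { kind: "tool"; name: string; args: Record<string, unknown> };

function handleTurn(history: TranscriptTurn[]): AgentAction {
  const last = history[history.length - 1];
  // Stand-in for the real server-side LLM call + tool-use loop.
  if (/reschedul/i.test(last.text)) {
    return { kind: "tool", name: "pms.findSlots", args: { window: "7d" } };
  }
  return { kind: "say", text: "How can I help with your appointment?" };
}
```

Everything above the `handleTurn` boundary is ours to version, test, and swap; everything below it is the vendor's problem.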
We evaluated Retell seriously. Similar feature set. We went with Vapi because their SDK was more TypeScript-idiomatic for our stack and their support team replied within an hour on a critical path question.
The LLM choice
Separately from voice: which LLM does Riley run on? Today we route based on intent. Simple intents (confirm appointment, check hours, reschedule) go to a smaller, cheaper model for speed and cost. Complex intents (implant candidate qualification, insurance triage, multi-patient rescheduling) go to a larger reasoning model.
This is the [[Rule 213]] pattern — token efficiency by routing, not by defaulting to the most capable model on every turn.
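The routing itself is almost trivially small — which is the point. A sketch with placeholder model IDs (the real intent set and model names are ours, not shown here):

```typescript
// Hypothetical intent router: cheap, latency-sensitive intents go to a
// small fast model; everything else goes to a larger reasoning model.
// Model IDs are placeholders, not real deployments.
const SIMPLE_INTENTS = new Set([
  "confirm_appointment",
  "check_hours",
  "reschedule",
]);

function pickModel(intent: string): string {
  return SIMPLE_INTENTS.has(intent)
    ? "small-fast-model"
    : "large-reasoning-model";
}
```

The win is that the expensive model only pays its latency and token cost on the turns that actually need reasoning.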
What a DSO should really ask their vendor
If you're evaluating us (or Weave's AI module, or Smith.ai's bot, or a homegrown build), ask these five questions — the answers separate production-grade from demo-ware:
1. What's P95 first-word latency during peak hours?
2. Where is audio stored in transit vs at rest? What encryption?
3. What happens when the LLM tool-call to your PMS fails — does the call fall back to a human or silently fail?
4. How do you handle barge-in / interruption? (Patients talk over AIs.)
5. What's your published incident response time when voice goes down on a Tuesday at 10am?
If a vendor can't answer those specifically, they're not ready for your DSO.
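Question 4 deserves one more beat: barge-in means that the instant the patient starts talking, playback stops, and the partial AI utterance stays in the transcript so the LLM knows what the patient actually heard. A simplified state model (all names hypothetical):

```typescript
// Simplified barge-in handling: patient speech during AI playback cancels
// playback immediately; the partial utterance is kept, not discarded.
// All names here are illustrative.
interface PlaybackState {
  speaking: boolean;
  spokenSoFar: string; // portion of the AI utterance already played
}

function onPatientSpeech(
  state: PlaybackState,
): PlaybackState & { interrupted: boolean } {
  if (state.speaking) {
    // Cancel playback; keep spokenSoFar in context for the next LLM turn.
    return { speaking: false, spokenSoFar: state.spokenSoFar, interrupted: true };
  }
  return { ...state, interrupted: false };
}
```

Vendors that get this wrong either talk over the patient or drop the interrupted sentence from context, and the conversation derails either way.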
What we're watching
OpenAI's Realtime API, Anthropic's Claude voice, and emerging speech-to-speech models (no intermediate text) are worth watching. We've prototyped against all three. For now, the Vapi-on-top-of-Twilio stack has the right reliability, latency, and cost profile for specialty dental. We'll migrate layers as they mature, without customers noticing.
Technical questions always welcome — ben@practiceiq.ai.