VAPI Review: Honest Assessment After Running It in Production
VAPI is the voice AI infrastructure platform I use to run Amy, an AI cold caller making 75 calls a day to Australian tradespeople. I've been running it in production since late 2025. This is an honest review — what works, what breaks, how the pricing actually stacks up, and when you should use Bland.ai or Retell instead.
What VAPI Is (and Isn't)
VAPI is infrastructure, not a finished product. You don't sign up and get a working AI caller. You get an API that lets you assemble one from best-of-breed components.
What VAPI handles:
- Outbound and inbound call management (via Twilio, Vonage, or your own SIP trunk)
- Real-time audio streaming to/from your STT, LLM, and TTS providers
- Turn-taking logic — detecting when the human has finished speaking
- Call recording, transcription storage, and webhook events
- Concurrent call scaling — run hundreds of simultaneous calls without managing infrastructure
What VAPI doesn't handle:
- Your script or conversation logic — that's your LLM system prompt
- Lead list management or CRM — you build that
- Compliance (DNC scrubbing, call hour rules) — entirely your responsibility
- The voice itself — you bring ElevenLabs, PlayHT, or similar
The modularity is both the strength and the complexity. You can swap Deepgram for Whisper, GPT-4o for Claude, ElevenLabs for PlayHT — and tune each layer independently. But that also means more configuration surface area where things can go wrong.
Pricing: What You Actually Pay
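VAPI's pricing page shows $0.05/min for infrastructure. Here's the full picture for a production setup:
| Component | Approx. cost per minute |
|---|---|
| VAPI infrastructure | ~$0.05 |
| Deepgram Nova-2 (STT) | ~$0.004 |
| GPT-4o (LLM) | ~$0.015 |
| ElevenLabs (TTS) | ~$0.010 |
| Twilio (carrier) | ~$0.008 |
| Total | ~$0.087 |
Unanswered calls (voicemail, no answer) cost almost nothing: VAPI detects the voicemail greeting and hangs up in 4–6 seconds, so you pay roughly $0.005 per unanswered dial.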
For 75 calls/day at 50% answer rate and 90-second average call duration: ~$5.25/day total. At a 1/20 conversion to booked meeting, that's $105 per booked meeting. For a service selling at $3,000–5,000 AUD, the ROI is obvious.
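The daily figure can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, assuming the per-minute provider rates quoted in the FAQ of this review and a flat ~$0.005 per unanswered dial; the function and constant names are made up for illustration.

```python
# Per-minute rates quoted in this review's FAQ (USD); treat them as approximate.
RATES_PER_MIN = {
    "vapi_infrastructure": 0.05,
    "deepgram_nova_2": 0.004,
    "gpt_4o": 0.015,
    "elevenlabs": 0.010,
    "twilio_carrier": 0.008,
}
UNANSWERED_COST = 0.005  # rough cost of a dial that hits voicemail or rings out


def daily_cost(calls_per_day, answer_rate, avg_call_seconds):
    """Estimate daily spend from dial volume, answer rate, and call length."""
    per_minute = sum(RATES_PER_MIN.values())
    answered = calls_per_day * answer_rate
    unanswered = calls_per_day - answered
    answered_cost = answered * (avg_call_seconds / 60) * per_minute
    return answered_cost + unanswered * UNANSWERED_COST


estimate = daily_cost(calls_per_day=75, answer_rate=0.5, avg_call_seconds=90)
print(f"Estimated daily cost: ${estimate:.2f}")
```

With these inputs the sketch lands at about $5.08/day, in the same range as the ~$5.25 figure; the exact number moves with whichever provider rates you plug in.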
Pros and Cons: Production Reality
What works well
- Full control over every layer of the stack
- Cheapest at scale when you optimise each component
- Excellent webhook system — every call event fires reliably
- Good concurrent call handling — 50+ simultaneous calls without issues
- Strong Discord community with real engineers answering questions
- Swap any provider without rebuilding everything
- Dashboard is clear; logs are detailed enough to debug
What's painful
- Endpointing tuning takes time — default settings cause interruptions
- Latency spikes to 1.5–2s happen occasionally and sound unnatural
- Documentation lags behind features — some things only discoverable via Discord
- No built-in analytics — you log everything yourself via webhooks
- ElevenLabs + VAPI integration has occasional audio glitches under high load
- No no-code interface — non-developers will struggle
Configuration That Actually Works
These are the settings I run in production after months of tuning. Copy this as a starting point:
The single most important setting: silenceDurationMs. The default is too low — the agent cuts in while the human is still mid-sentence. Set it to 700ms and you eliminate 80% of the interruption problem. Go to 900ms if your audience speaks slowly or tends to pause mid-thought.
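As a rough illustration of the shape such a config could take, here is a minimal sketch built as a Python dict. The provider choices (Deepgram Nova-2 with en-AU, GPT-4o, ElevenLabs) and the silenceDurationMs values come from this review; the key names, the nesting, the provider identifiers, the sanity-check bound, and the placeholder voice ID are all assumptions, so check them against VAPI's current API reference before sending anything.

```python
import json


def build_assistant_config(silence_duration_ms=700):
    """Return an illustrative assistant config as a plain dict.

    Key names and nesting are assumptions for illustration, not a verified
    VAPI schema. Only the provider choices and the 700/900 ms values come
    from the review text.
    """
    # The 2000 ms ceiling is an arbitrary guard against typos, not a VAPI limit.
    if not 0 < silence_duration_ms <= 2000:
        raise ValueError("silence_duration_ms should be between 1 and 2000")
    return {
        "transcriber": {"provider": "deepgram", "model": "nova-2", "language": "en-AU"},
        "model": {"provider": "openai", "model": "gpt-4o"},
        "voice": {"provider": "elevenlabs", "voiceId": "YOUR_VOICE_ID"},  # placeholder
        # Where this setting lives in the real schema is an assumption.
        "silenceDurationMs": silence_duration_ms,
    }


config = build_assistant_config()           # 700 ms starting point
slow_talkers = build_assistant_config(900)  # for audiences who pause mid-thought
print(json.dumps(config, indent=2))
```

Keeping the silence value as a single parameter makes it easy to A/B the 700 ms and 900 ms settings without touching the rest of the config.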
VAPI vs Bland.ai vs Retell: Which to Choose
| Criteria | VAPI | Bland.ai | Retell AI |
|---|---|---|---|
| Time to first call | 2–4 hours (dev setup) | 30 minutes (no-code) | 1–2 hours |
| Pricing model | ~$0.05/min + providers | $0.09/min flat | ~$0.07/min + providers |
| Cost at 10K min/month | ~$650 (optimised) | $900 flat | ~$750 |
| Custom LLM | ✓ Full control | ✗ Limited | ✓ Via webhook |
| Voice providers | ElevenLabs, PlayHT, OpenAI, Deepgram | Built-in + cloning | ElevenLabs, OpenAI, custom |
| Analytics dashboard | ~ Basic logs | ~ Moderate | ✓ Best |
| Non-developer friendly | ✗ No | ✓ Yes | ~ Moderate |
| Best for | Developers, high volume, custom stack | Fast setup, non-technical teams | Agencies managing multiple clients |
My routing rule: VAPI if you're technical and running volume. Bland.ai if you need something live today without touching an API. Retell if you're an agency managing calling campaigns for multiple clients and need a clean reporting UI.
How I Score VAPI
Overall: 4/5 for developers, 2/5 for non-technical users. The platform has gotten meaningfully better in the 6 months I've been running it. The team ships fast. The latency issues are less frequent than they were in Q3 2025. The documentation still needs work.
Three Things I Wish I'd Known Before Starting
- Test endpointing with real background noise before going live. The tradies I'm calling are sometimes on a job site — machinery, traffic, wind. Test your VAD threshold with audio that has ambient noise, not just a quiet office recording.
- Build your logging layer before your first real campaign. VAPI fires webhooks for every event. If you don't have a system to capture them from day one, you'll lose call data you can't recover. I log everything to a Google Sheet via a simple FastAPI endpoint.
- ElevenLabs latency varies significantly by voice. The Matilda voice I use is faster than most custom clones. If you clone a voice, test the actual latency before assuming it'll match what you've read in benchmarks.
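The logging point is worth making concrete. Below is a minimal, framework-free sketch of the capture step: it appends every incoming event to a JSON Lines file so nothing is lost before you decide how to analyse it. The payload shape (a top-level `message` object carrying `type` and `call.id`), the function names, and the file name are assumptions for illustration; a FastAPI route could call `log_event` with the parsed request body, with the Google Sheet write sitting behind it.

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("call_events.jsonl")  # hypothetical location


def log_event(payload, path=LOG_PATH):
    """Append one webhook payload to a JSON Lines file and return the stored record."""
    message = payload.get("message") or {}  # assumed envelope; adjust to the real payload
    record = {
        "received_at": time.time(),
        "event_type": message.get("type", "unknown"),
        "call_id": (message.get("call") or {}).get("id"),
        "raw": payload,  # keep the full body so no field is lost
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def load_events(path=LOG_PATH):
    """Read back every logged record, oldest first."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Storing the raw payload alongside a couple of extracted fields means you can re-derive metrics later even if you did not know which fields mattered on day one.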
When VAPI Doesn't Apply
- You're not technical (or don't have a developer). VAPI requires API configuration, webhook handling, and debugging of voice pipeline settings. Without technical capability, use Bland.ai instead — it's designed for non-developers.
- You need analytics without building them. VAPI's built-in dashboard is minimal. If you need conversion rates, call durations, and booking rates without building your own logging system, Retell has this out of the box.
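- You're running under 1,000 minutes a month. At low volume, the cost savings of VAPI over Bland.ai are negligible, but the setup effort doesn't shrink with volume. Going by the comparison table above, Bland.ai costs roughly $0.025/min more than an optimised VAPI stack ($0.04/min more than VAPI's infrastructure fee alone), and its simplicity is worth that premium at low volumes.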
- You need guaranteed SLA uptime. VAPI is robust but doesn't publish enterprise SLAs. For mission-critical calling operations (financial services, high-stakes healthcare scheduling), verify their current SLA terms before committing.
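To put the low-volume point in numbers, here is a small sketch that scales the comparison table's 10K-minute figures to other volumes. Linear scaling is an assumption (real bills may include minimums or volume discounts), and the rates are simply the table's monthly totals divided by 10,000.

```python
# Effective per-minute rates derived from the comparison table's
# "Cost at 10K min/month" row (USD). Linear scaling is an assumption.
EFFECTIVE_RATE = {
    "VAPI (optimised)": 650 / 10_000,
    "Bland.ai": 900 / 10_000,
    "Retell AI": 750 / 10_000,
}


def monthly_cost(minutes):
    """Return estimated monthly cost per platform for a given minute volume."""
    return {name: round(rate * minutes, 2) for name, rate in EFFECTIVE_RATE.items()}


for minutes in (500, 1_000, 10_000):
    costs = monthly_cost(minutes)
    gap = round(costs["Bland.ai"] - costs["VAPI (optimised)"], 2)
    print(f"{minutes:>6} min: {costs} -> Bland.ai premium ${gap:.2f}/month")
```

At 1,000 minutes the gap works out to about $25 a month on these figures, which is the scale of saving being traded against setup time.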
FAQ
What is VAPI?
VAPI is a developer platform for building real-time voice AI agents. It manages the full call pipeline — dialling, STT, LLM inference, TTS — and lets you plug in your own providers at each layer. You pay per minute of call time, with no monthly platform fee.
How much does VAPI cost in practice?
VAPI infrastructure runs ~$0.05/min. Add Deepgram Nova-2 (~$0.004/min), GPT-4o (~$0.015/min), ElevenLabs (~$0.010/min), and Twilio carrier (~$0.008/min). That puts the all-in rate at roughly $0.065–0.09/min, or about $0.10–0.14 for a 90-second answered call. Unanswered calls cost almost nothing (~$0.005).
What are the main problems with VAPI?
Three main issues: endpointing (agent cuts in too early — fix with silenceDurationMs: 700), occasional latency spikes to 1.5–2s, and documentation gaps where features are only discoverable through the Discord community.
Is VAPI better than Bland.ai or Retell?
VAPI is best for developers running high volume who want full stack control and the lowest cost. Bland.ai is the fastest to set up (no-code). Retell has the best analytics dashboard for agencies. There's no universally "better" option; it depends on your technical level and volume.
What STT model works best for Australian English?
Deepgram Nova-2 with language: "en-AU". It handles Australian accents better than Whisper, has lower latency (~150ms vs ~300ms), and supports real-time streaming rather than batch processing. This is a meaningful quality difference on a live call.
Want someone who actually runs VAPI to build your caller?
I run this in production daily. I know the endpointing settings, the voice configs, the CRM hooks. Apply to work with me directly — I'll tell you exactly what your setup looks like and what it costs.
Apply to Work 1-on-1 with Roman
Or join my free community, AI Mastery Genesis on Skool, where I drop the templates I use to build these agents.
Application-only · Roman reviews personally