Last updated: April 22, 2026 · By Roman Stanek

VAPI Review: Honest Assessment After Running It in Production

VAPI is the voice AI infrastructure platform I use to run Amy, an AI cold caller making 75 calls a day to Australian tradespeople. I've been running it in production since late 2025. This is an honest review — what works, what breaks, how the pricing actually stacks up, and when you should use Bland.ai or Retell instead.

  • ~$0.05 per minute: VAPI infrastructure cost (source: VAPI pricing page, 2026)
  • 600ms: typical end-to-end response latency (source: production logs, 2026)
  • 107M+ minutes of voice AI processed by VAPI in 2025 (source: VAPI company blog, 2025)

Verdict: Best-in-class for developers who want full control. Not the fastest to get running, but the most powerful at scale.

What VAPI Is (and Isn't)

VAPI is infrastructure, not a finished product. You don't sign up and get a working AI caller. You get an API that lets you assemble one from best-of-breed components.

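What VAPI handles:

  • Dialling and the live call connection, through a carrier such as Twilio
  • The real-time pipeline: speech-to-text, LLM inference, and text-to-speech
  • Webhooks for every call event

What VAPI doesn't handle:

  • Analytics and reporting
  • Storing your call data (you capture it yourself from the webhooks)
  • A no-code builder for non-developers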

The modularity is both the strength and the complexity. You can swap Deepgram for Whisper, GPT-4o for Claude, ElevenLabs for PlayHT — and tune each layer independently. But that also means more configuration surface area where things can go wrong.

Pricing: What You Actually Pay

VAPI's pricing page shows $0.05/min for infrastructure. Here's the full picture for a production setup:

| Component | Cost per min | Notes |
|---|---|---|
| VAPI infrastructure | ~$0.050 | Base platform fee |
| Deepgram Nova-2 (STT) | ~$0.004 | Real-time streaming model |
| GPT-4o (LLM) | ~$0.015 | Varies with token usage per turn |
| ElevenLabs (TTS) | ~$0.010 | Creator plan; scales with character count |
| Twilio (carrier) | ~$0.008 | Per-minute outbound; varies by country |
| Total per answered minute | ~$0.065–0.09 | Answered calls only; a 90-second call costs 1.5× this, about $0.10–0.14 |
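You can sanity-check the total by summing the per-minute rows. The following is a minimal sketch that uses the table's point estimates as inputs. The dictionary keys and function names are illustrative and are not part of VAPI's API.

```python
# Per-minute component costs in USD, taken from the table's rough estimates.
# Keys are hypothetical labels; LLM and TTS costs vary with usage in practice.
COMPONENT_COST_PER_MIN = {
    "vapi_infrastructure": 0.050,
    "deepgram_nova2_stt": 0.004,
    "gpt4o_llm": 0.015,
    "elevenlabs_tts": 0.010,
    "twilio_carrier": 0.008,
}


def cost_per_minute(components: dict[str, float]) -> float:
    """All-in cost of one answered minute: the sum of every layer's rate."""
    return sum(components.values())


def call_cost(duration_seconds: float, components: dict[str, float]) -> float:
    """Cost of one answered call, assuming every layer bills per second of call time."""
    return cost_per_minute(components) * duration_seconds / 60.0


if __name__ == "__main__":
    print(f"Per answered minute: ${cost_per_minute(COMPONENT_COST_PER_MIN):.3f}")
    print(f"90-second call: ${call_cost(90, COMPONENT_COST_PER_MIN):.2f}")
```

With these point estimates the stack costs about $0.087 per answered minute, so a 90-second answered call lands around $0.13. To re-run the numbers for your own setup, swap in your provider rates.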

Unanswered calls (voicemail, no answer) cost almost nothing. VAPI detects the voicemail greeting and hangs up in 4–6 seconds, so you pay roughly $0.005 per unanswered dial.

For 75 calls/day at a 50% answer rate and a 90-second average call duration, the total is ~$5.25/day. If 1 in 20 answered calls converts to a booked meeting (about 1.9 meetings a day), that's roughly $2.80 in call costs per booked meeting. For a service selling at $3,000–5,000 AUD, the ROI is obvious.
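A small cost model reproduces the daily figure. This sketch rests on a few assumptions. It uses the table's component rates (about $0.087 per answered minute) and the ~$0.005 unanswered-dial estimate. It also assumes answered calls are billed for their full average duration. The `DialPlan` name and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DialPlan:
    """Hypothetical inputs for a daily cost estimate; defaults mirror the scenario above."""
    dials_per_day: int = 75
    answer_rate: float = 0.5                 # share of dials that reach a person
    avg_call_seconds: float = 90.0           # average answered-call length
    cost_per_answered_minute: float = 0.087  # sum of the per-minute component rows
    cost_per_unanswered_dial: float = 0.005  # voicemail detected, hang-up in 4-6 s


def daily_cost(plan: DialPlan) -> float:
    """Estimated spend per day: answered minutes plus the small per-dial charge for misses."""
    answered = plan.dials_per_day * plan.answer_rate
    unanswered = plan.dials_per_day - answered
    answered_cost = answered * (plan.avg_call_seconds / 60.0) * plan.cost_per_answered_minute
    return answered_cost + unanswered * plan.cost_per_unanswered_dial


if __name__ == "__main__":
    print(f"Daily cost: ${daily_cost(DialPlan()):.2f}")
```

With the default inputs the model returns about $5.08/day, close to the ~$5.25 estimate. The quickest way to see where the budget goes is to raise `answer_rate` or `avg_call_seconds`, because answered minutes dominate the cost.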

Pros and Cons: Production Reality

What works well

  • Full control over every layer of the stack
  • Cheapest at scale when you optimise each component
  • Excellent webhook system — every call event fires reliably
  • Good concurrent call handling — 50+ simultaneous calls without issues
  • Strong Discord community with real engineers answering questions
  • Swap any provider without rebuilding everything
  • Dashboard is clear; logs are detailed enough to debug

What's painful

  • Endpointing tuning takes time — default settings cause interruptions
  • Latency spikes to 1.5–2s happen occasionally, sounds unnatural
  • Documentation lags behind features — some things only discoverable via Discord
  • No built-in analytics — you log everything yourself via webhooks
  • ElevenLabs + VAPI integration has occasional audio glitches under high load
  • No no-code interface — non-developers will struggle

Configuration That Actually Works

These are the settings I run in production after months of tuning. Copy this as a starting point:

```jsonc
// VAPI assistant config: production settings for outbound cold calling.
// JSONC: strip the // comments before sending this as plain JSON.
{
  "transcriber": {
    "provider": "deepgram",
    "model": "nova-2",
    "language": "en-AU",          // AU English accuracy boost
    "smartFormat": true
  },
  "voice": {
    "provider": "11labs",
    "voiceId": "<your-elevenlabs-voice-id>",  // Use an AU-friendly voice; test 2-3 before committing
    "stability": 0.5,
    "similarityBoost": 0.75
  },
  "model": {
    "provider": "openai",
    "model": "gpt-4o",
    "temperature": 0.4            // Lower = more consistent responses
  },
  "endpointingConfig": {
    "vadThreshold": 0.6,          // Higher = less sensitive, fewer false positives
    "silenceDurationMs": 700      // Wait 700ms of silence before responding
  },
  "backchannelingEnabled": false,     // Disable "mm-hmm" filler; sounds weird at scale
  "backgroundDenoisingEnabled": true  // Helps with tradie background noise
}
```

The single most important setting: silenceDurationMs. The default is too low — the agent cuts in while the human is still mid-sentence. Set it to 700ms and you eliminate 80% of the interruption problem. Go to 900ms if your audience speaks slowly or tends to pause mid-thought.
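A toy endpointer shows how `vadThreshold` and `silenceDurationMs` interact. This is a simplified model and not VAPI's actual endpointing implementation. It assumes a hypothetical stream of per-frame speech probabilities at 20ms per frame. The agent responds only once the caller has spoken and then stayed below the threshold for the whole silence window.

```python
def detect_end_of_turn(speech_probs, frame_ms=20, vad_threshold=0.6, silence_duration_ms=700):
    """Toy endpointer (illustrative, not VAPI's algorithm).

    speech_probs: per-frame probability (0.0-1.0) that the caller is speaking.
    Returns the frame index at which the agent would start responding,
    or None if the caller never stays quiet for long enough.
    """
    heard_speech = False
    silent_ms = 0
    for i, p in enumerate(speech_probs):
        if p >= vad_threshold:
            heard_speech = True
            silent_ms = 0          # any speech resets the silence timer
        elif heard_speech:
            silent_ms += frame_ms  # count silence only after the caller has spoken
            if silent_ms >= silence_duration_ms:
                return i
    return None


if __name__ == "__main__":
    # 1 s of speech, a 500 ms mid-sentence pause, 0.5 s more speech, then 1 s of silence.
    frames = [0.9] * 50 + [0.1] * 25 + [0.9] * 25 + [0.1] * 50
    print(detect_end_of_turn(frames, silence_duration_ms=300))  # 64: cuts in during the pause
    print(detect_end_of_turn(frames, silence_duration_ms=700))  # 134: waits for the real end
```

With a 300ms window, the toy agent interrupts during the mid-sentence pause. At 700ms it waits for the caller to finish. The threshold matters in noisy audio. Suppose steady machinery noise scores around 0.55. Lowering `vad_threshold` to 0.5 makes that noise count as speech, so the turn never ends. That is one reason to test with job-site audio rather than a quiet recording.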

VAPI vs Bland.ai vs Retell: Which to Choose

| Criteria | VAPI | Bland.ai | Retell AI |
|---|---|---|---|
| Time to first call | 2–4 hours (dev setup) | 30 minutes (no-code) | 1–2 hours |
| Pricing model | ~$0.05/min + providers | $0.09/min flat | ~$0.07/min + providers |
| Cost at 10K min/month | ~$650 (optimised) | $900 flat | ~$750 |
| Custom LLM | Full control | Limited | Via webhook |
| Voice providers | ElevenLabs, PlayHT, OpenAI, Deepgram | Built-in + cloning | ElevenLabs, OpenAI, custom |
| Analytics dashboard | Basic logs | Moderate | Best |
| Non-developer friendly | No | Yes | Moderate |
| Best for | Developers, high volume, custom stack | Fast setup, non-technical teams | Agencies managing multiple clients |
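Every option in the table is priced per minute, so the monthly cost row scales roughly linearly with volume. This sketch backs out the all-in rate implied by each "Cost at 10K min/month" figure and projects it to other volumes. It assumes no volume discounts, minimums, or plan fees, which real contracts may include.

```python
# All-in cost per minute implied by the "Cost at 10K min/month" row
# (monthly figure / 10,000). These are the table's rough estimates, not quotes.
IMPLIED_RATE_PER_MIN = {
    "VAPI (optimised)": 650 / 10_000,
    "Bland.ai": 900 / 10_000,
    "Retell AI": 750 / 10_000,
}


def monthly_cost(rate_per_min: float, minutes_per_month: float) -> float:
    """Linear cost model: no volume discounts, minimums, or platform fees."""
    return rate_per_min * minutes_per_month


if __name__ == "__main__":
    for minutes in (2_000, 10_000, 50_000):
        costs = {name: round(monthly_cost(rate, minutes)) for name, rate in IMPLIED_RATE_PER_MIN.items()}
        print(minutes, costs)
```

At these implied rates, VAPI and Bland.ai differ by $0.025/min, or $250 per 10,000 minutes. Whether that saving pays for the extra setup time depends on your volume.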

My routing rule: VAPI if you're technical and running volume. Bland.ai if you need something live today without touching an API. Retell if you're an agency managing calling campaigns for multiple clients and need a clean reporting UI.

How I Score VAPI

  • Voice quality: 4.5/5
  • Latency: 4/5
  • Pricing: 4.5/5
  • Ease of setup: 3/5
  • Documentation: 3/5
  • Reliability: 4/5

Overall: 4/5 for developers, 2/5 for non-technical users. The platform has gotten meaningfully better in the 6 months I've been running it. The team ships fast. The latency issues are less frequent than they were in Q3 2025. The documentation still needs work.

Three Things I Wish I'd Known Before Starting

  1. Test endpointing with real background noise before going live. The tradies I'm calling are sometimes on a job site — machinery, traffic, wind. Test your VAD threshold with audio that has ambient noise, not just a quiet office recording.
  2. Build your logging layer before your first real campaign. VAPI fires webhooks for every event. If you don't have a system to capture them from day one, you'll lose call data you can't recover. I log everything to a Google Sheet via a simple FastAPI endpoint.
  3. ElevenLabs latency varies significantly by voice. The Matilda voice I use is faster than most custom clones. If you clone a voice, test the actual latency before assuming it'll match what you've read in benchmarks.
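The logging layer in point 2 can be sketched without any dependencies. This is not the FastAPI and Google Sheet setup described above. It is a minimal stand-in that uses Python's standard-library HTTP server and a local CSV file. It assumes each webhook body is JSON with a top-level `message` object that carries a `type` and, where present, a `call.id`. Check VAPI's server-message documentation for the exact payload schema before relying on those field names. The file name, port, and function names are hypothetical.

```python
import csv
import json
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

LOG_PATH = Path("vapi_events.csv")  # hypothetical local log file


def event_row(payload: dict) -> list[str]:
    """Flatten one webhook body into a CSV row (assumed schema: message.type, message.call.id)."""
    message = payload.get("message")
    if not isinstance(message, dict):
        message = {}
    call = message.get("call") or {}
    return [
        datetime.now(timezone.utc).isoformat(),  # when we received the event
        str(message.get("type", "")),
        str(call.get("id", "")),
        json.dumps(payload),  # keep the raw body so no call data is lost
    ]


def append_event(payload: dict, path: Path = LOG_PATH) -> None:
    with path.open("a", newline="") as f:
        csv.writer(f).writerow(event_row(payload))


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)  # reject bodies that aren't a JSON object
            self.end_headers()
            return
        append_event(payload)
        self.send_response(200)
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```

Storing the raw JSON next to the extracted fields is a deliberate choice. If you later need a field you didn't parse on day one, it is still in the log.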

When VAPI Doesn't Apply

FAQ

What is VAPI?

VAPI is a developer platform for building real-time voice AI agents. It manages the full call pipeline — dialling, STT, LLM inference, TTS — and lets you plug in your own providers at each layer. You pay per minute of call time, with no monthly platform fee.

How much does VAPI cost in practice?

VAPI infrastructure runs ~$0.05/min. Add Deepgram Nova-2 (~$0.004/min), GPT-4o (~$0.015/min), ElevenLabs (~$0.010/min), and Twilio carrier (~$0.008/min). That comes to roughly $0.065–0.09 per answered minute, or about $0.10–0.14 for a 90-second answered call. Unanswered calls cost almost nothing (~$0.005).

What are the main problems with VAPI?

Three main issues: endpointing (agent cuts in too early — fix with silenceDurationMs: 700), occasional latency spikes to 1.5–2s, and documentation gaps where features are only discoverable through the Discord community.

Is VAPI better than Bland.ai or Retell?

VAPI is best for developers running high volume who want full stack control and the lowest cost. Bland.ai is the fastest to set up (no-code). Retell has the best analytics dashboard for agencies. There's no universally "better" option; it depends on your technical level and volume.

What STT model works best for Australian English?

Deepgram Nova-2 with language: "en-AU". It handles Australian accents better than Whisper, has lower latency (~150ms vs ~300ms), and supports real-time streaming rather than batch processing. This is a meaningful quality difference on a live call.

Want someone who actually runs VAPI to build your caller?

I run this in production daily. I know the endpointing settings, the voice configs, the CRM hooks. Apply to work with me directly — I'll tell you exactly what your setup looks like and what it costs.


Or join my free community — AI Mastery Genesis on Skool — where I drop the templates I use to build these agents.
