Sarvam AI Bets on Stability With Bulbul V3 Speech Model
New release targets Indian languages, call-center audio and large-scale voice use.
Indian AI startup Sarvam AI has released Bulbul V3, a new version of its text-to-speech (TTS) model designed to deliver more natural, stable and production-ready synthetic voices, particularly for Indian language applications.
Bengaluru-based Sarvam said Bulbul V3 is built on a large language model-based architecture that infers pauses, tone and emphasis from context to produce more naturalistic speech suitable for voice agents, customer support systems and multilingual platforms.
In an independent blind listening study conducted by Josh Talks AI, covering 11 Indian languages and more than 500 human annotators, Bulbul V3 ranked highest in listener preference for 8 kHz telephony-grade audio, a standard for call-centre and voice agent use. It also showed lower error rates for skipped words and mispronunciations than competitors, including models from ElevenLabs and Cartesia's Sonic-3.
The announcement was shared by Pratyush Kumar, co-founder of Sarvam AI, on LinkedIn.
“Bulbul V3 sets a new benchmark for naturalness and robustness in real-world voice agent deployments, not just studio demos but production environments where consistency matters,” Kumar said.
Sarvam also tested the model on challenging inputs such as numerics, technical terms, named entities and code-mixed text, which are common in Indian speech, and said Bulbul V3 demonstrated stronger consistency on these cases.
In full-band, studio-quality audio, the company acknowledged that ElevenLabs’ v3 alpha led in overall audio quality, with Bulbul V3 outperforming some but not all competitors.
The release also includes a library of over 30 voices across 11 Indian languages, sourced from professional voice artists. Sarvam AI said the focus was on stability and predictability at scale, rather than just expressiveness in short demos.
Bulbul V3 is available free of charge to developers and enterprises until the end of February. Commercial terms beyond that date have not been disclosed.
The launch is part of Sarvam AI’s broader rollout of products ahead of the India-AI Impact Summit 2026, where the company is expected to showcase its speech, vision and language models.
