Sarvam takes on ElevenLabs with AI dubbing push for Indian languages
The company introduced Sarvam Dub on February 1, an artificial intelligence system designed to preserve a speaker’s voice while translating audio into multiple languages, with built-in controls to match the timing of the original video.

- Feb 2, 2026
- Updated Feb 2, 2026 3:17 PM IST
Indian AI startup Sarvam AI has launched a new speech model aimed at automating multilingual video dubbing, pitching its technology as a way for creators, educators and broadcasters to translate content across Indian languages in minutes and positioning itself against global players such as ElevenLabs.
The company introduced Sarvam Dub on February 1, an artificial intelligence system designed to preserve a speaker’s voice while translating audio into multiple languages, with built-in controls to match the timing of the original video.
“We’re introducing Sarvam Dub, a state-of-the-art AI dubbing model that helps creators extend the life and reach of a single piece of content, quickly,” the company said in its announcement.
Dubbing has traditionally required translators, voice artists and studio time, a process Sarvam said “worked, but couldn’t scale.” With its new model, the company said, “what used to take weeks of scripting, recording, studio time, and publishing effort can now be dubbed in minutes.”
Sarvam Dub uses zero-shot voice cloning and cross-lingual speech models to keep the original speaker’s identity intact even as language changes, a hurdle that becomes more complex in India, where content often moves across multiple regional languages and accents. The company said its system also integrates duration control directly into speech generation, avoiding the common practice of speeding up or compressing audio after the fact, which can make voices sound unnatural.
“High-quality dubbing requires duration control that is intrinsic to speech generation, where timing is shaped as the voice is produced rather than adjusted afterwards,” Sarvam said.
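The contrast is easiest to see against the usual shortcut. Below is a minimal sketch of that post-hoc approach, using librosa's phase-vocoder time-stretching purely for illustration; it is not Sarvam Dub's pipeline, only the technique the company says it avoids.

```python
# Sketch of the post-hoc timing fix Sarvam says it avoids: generate the
# dubbed speech first, then stretch or compress it to match the original
# clip. librosa is used only to illustrate the technique.
import librosa

def fit_to_duration(dubbed_path: str, target_seconds: float, sr: int = 16000):
    """Time-stretch dubbed audio so it lasts exactly target_seconds."""
    y, _ = librosa.load(dubbed_path, sr=sr)
    current_seconds = len(y) / sr
    rate = current_seconds / target_seconds  # >1 speeds up, <1 slows down
    # Phase-vocoder stretching keeps pitch but smears prosody and timbre
    # when the rate strays far from 1.0 -- the unnatural sound the
    # announcement describes.
    return librosa.effects.time_stretch(y, rate=rate)
```

Stretching by a large factor preserves pitch but distorts rhythm and timbre, which is why Sarvam argues timing has to be decided while the speech is being generated rather than corrected afterwards.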
To benchmark performance, the company said it evaluated more than 700 audio samples across 64 speakers in 10 Indian languages and English, using speaker-similarity scores derived from ECAPA-TDNN embeddings. Sarvam said its model achieved state-of-the-art results, particularly in cross-lingual settings where voice preservation is hardest.
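Sarvam's announcement does not spell out its evaluation harness, but speaker-similarity scoring with ECAPA-TDNN embeddings typically works along these lines; the sketch below uses SpeechBrain's public VoxCeleb model and is an assumption about the general method, not Sarvam's code.

```python
# A typical way to score speaker similarity with ECAPA-TDNN embeddings,
# using SpeechBrain's public VoxCeleb model. Generic sketch of the metric,
# not Sarvam's evaluation code.
import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_ecapa",
)

def speaker_similarity(original_wav: str, dubbed_wav: str) -> float:
    """Cosine similarity between the speaker embeddings of two recordings."""
    embeddings = []
    for path in (original_wav, dubbed_wav):
        signal, sr = torchaudio.load(path)
        signal = signal.mean(dim=0, keepdim=True)  # downmix to mono
        if sr != 16000:  # the model expects 16 kHz input
            signal = torchaudio.functional.resample(signal, sr, 16000)
        embeddings.append(encoder.encode_batch(signal).squeeze())
    return torch.nn.functional.cosine_similarity(
        embeddings[0], embeddings[1], dim=0
    ).item()

# Higher scores mean the dubbed voice stays closer to the original speaker.
print(speaker_similarity("original_english.wav", "dubbed_hindi.wav"))
```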
The company is already testing the system in public communication, education and broadcast workflows. It worked with the Indian Institute of Technology (IIT) Madras to dub technical lectures into multiple languages, and said India’s Union Budget 2026 became the first national budget to be dubbed live using AI, with Finance Minister Nirmala Sitharaman’s speech streamed in Kannada and Hindi.
Live dubbing presents additional challenges around speed, Sarvam said. The startup claims its engineering team achieved a 6.6-times reduction in latency over a base implementation through model tracing, selective and post-training quantisation, and intelligent caching, improvements it said make the system viable for real-time broadcast.
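The company has not published the engineering behind that 6.6-times figure, but the techniques it names are standard; the sketch below shows what tracing, post-training quantisation and caching look like in general with stock PyTorch APIs, using a placeholder model rather than Sarvam's.

```python
# Generic illustration of the three optimisations the article names --
# model tracing, post-training quantisation and caching -- with stock
# PyTorch APIs. TinyAcousticModel is a placeholder, not Sarvam's model.
from functools import lru_cache
import torch

class TinyAcousticModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(80, 256), torch.nn.ReLU(), torch.nn.Linear(256, 80)
        )

    def forward(self, x):
        return self.net(x)

model = TinyAcousticModel().eval()

# 1. Post-training dynamic quantisation: store Linear weights as int8.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# 2. Tracing: freeze the forward pass into a TorchScript graph.
traced = torch.jit.trace(quantized, torch.randn(1, 80))

# 3. Caching: reuse results for segments that repeat during a live stream.
@lru_cache(maxsize=256)
def synthesise(segment_text: str) -> torch.Tensor:
    # In a real pipeline the acoustic features would be derived from
    # segment_text; random features keep the sketch self-contained.
    features = torch.randn(1, 80)
    with torch.no_grad():
        return traced(features)
```

Quantisation and tracing mainly cut per-request compute, while caching helps with segments that repeat, which matters most in a live broadcast setting.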
