Sarvam launches Sarvam Audio, claims to offer better accuracy than GPT-4o, Gemini 3 Flash

Sarvam introduces “Sarvam Audio”, an audio-first large language model that specialises in real-world speech recognition for India’s multilingual population.

Noida,
Updated Feb 3, 2026 1:27 PM IST


Indian startup Sarvam AI has launched a new audio-first large language model (LLM) dubbed “Sarvam Audio”, designed around the linguistic patterns common in India. According to the company, the voice-centric model is built to understand real-world speech across India’s multilingual population.

With its new audio model, Sarvam competes with global audio AI companies such as ElevenLabs, which has recently gained popularity for voice generation. While ElevenLabs focuses on generating expressive voices, Sarvam Audio focuses on understanding and transcribing real-world speech, particularly in Indian languages.

In India’s language environment, people often mix languages, accents, and local dialects, and traditional automatic speech recognition (ASR) systems struggle to deliver consistent accuracy. Sarvam Audio is designed to address this gap: it is built to understand, process, and generate speech directly for a more natural conversational flow.

Sarvam Audio is trained on 22 Indian languages, including Hindi, Tamil, Telugu, Malayalam, Marathi, Bengali, and Indian English. It is built on Sarvam 3B, a 3-billion-parameter language model, and supports multiple transcription formats.

Sarvam AI claims that Sarvam Audio outperforms GPT-4o-Transcribe and Gemini-3-Flash in early benchmark tests across three transcription styles: unnormalised, normalised, and code-mixed. The tests were run on the IndicVoices dataset, which contains real-world Indian speech. Results are reported as word error rate (WER), the proportion of words transcribed incorrectly compared with a reference transcript, so a lower WER indicates higher transcription accuracy.
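For readers unfamiliar with the metric, WER is commonly computed as the word-level edit distance (substitutions + deletions + insertions) between a model’s transcript and a reference transcript, divided by the number of words in the reference. The minimal Python sketch below illustrates that general formula only; it is not Sarvam’s evaluation code or the exact scoring used on IndicVoices. It assumes simple whitespace tokenisation and no text normalisation, and the function name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: (substitutions + deletions + insertions) / reference word count.

    Assumes whitespace tokenisation and no normalisation (hypothetical sketch).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (row for i = 0 to start).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (or exact match if cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical Hinglish example: one misspelt word out of five.
    print(word_error_rate("mujhe kal bank jaana hai", "mujhe kal bank jana hai"))  # 0.2
```

Because the comparison is exact string matching, writing “10” where the reference has “ten” also counts as an error. This is one reason benchmarks often report scores separately for unnormalised and normalised transcripts. Note too that WER can exceed 1.0 when the output contains many extra words.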

It should also be noted that OpenAI’s and Google’s models are designed for global languages and standard transcription tasks, whereas Sarvam Audio is specialised for Indian languages.

The model also comes with a “Diarised Speech Recognition” capability, which identifies who is speaking when. The company says it handles complex multi-speaker audio and natural conversational speech with higher accuracy.

Sarvam, in a press note, said, “With built-in context awareness, diarization, format control, and direct speech-to-command capabilities, Sarvam Audio forms the foundation for a new generation of voice-first applications and agents built for real Indian users.”

How can Sarvam Audio be used in real-world cases?

Since Sarvam Audio supports 22 Indian languages, it can be used for multilingual transcription and multi-speaker conversations across sectors and tools such as call centres, logistics, e-commerce, banking, fintech, and chat and messaging platforms.

It can also be used for long-form audio processing, such as podcasts, meetings, call-centre recordings, and even lectures in Indian languages.


ABOUT THE AUTHOR

Aishwarya Panda

I’m a technology journalist with over four years of experience writing about the constantly evolving tech world. I cover a wide range of topics, from artificial intelligence and consumer tech to the digital trends that quietly shape how we live and work every day.

I’m especially interested in smartphone innovation, particularly how AI is transforming productivity and camera experiences. Whether it’s on-device intelligence, computational photography, or practical AI features, I enjoy breaking down complex technology into stories that are easy to understand and genuinely useful for readers.

Through my work, I like to look beyond what’s new and focus on how technology is actually changing the way we work, create, and connect.

Published on: Feb 3, 2026 1:25 PM IST
