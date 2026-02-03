Indian startup Sarvam AI has launched a new audio-first large language model (LLM) dubbed “Sarvam Audio” that follows linguistic patterns common in India. The company claims it is built on voice-centric AI that understands real-world speech across India’s multilingual population.

Sarvam's new audio model competes with global audio AI companies such as ElevenLabs, which has recently gained popularity for voice generation. While ElevenLabs focuses on generating expressive voices, Sarvam Audio is geared towards understanding and transcribing real-world speech, particularly in Indian languages.

In India’s language environment, where people often mix languages, accents, and local dialects, traditional automatic speech recognition (ASR) systems struggle to deliver consistent accuracy. This is where Sarvam Audio can come into play, as it is built to understand, process, and generate speech directly for improved conversational flow.

Sarvam Audio is trained on 22 Indian languages that include Hindi, Tamil, Telugu, Malayalam, Marathi, Bengali, and Indian English. The model is built on the Sarvam 3B model, a 3-billion-parameter language model, and supports multiple transcription formats.

Sarvam AI claims that Sarvam Audio also outperforms GPT-4o-Transcribe and Gemini-3-Flash in early benchmark tests across three transcription styles: unnormalised, normalised, and code-mixed. The tests were run on the IndicVoices dataset, which includes real-world Indian speech. Results are reported as word error rate (WER), where a lower score indicates higher transcription accuracy.
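For readers unfamiliar with the metric, WER is conventionally the number of word-level substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below shows that standard calculation in Python. It is only an illustration: the function name and the sample sentences are made up for this example, and it does not reproduce Sarvam's evaluation setup or any text normalisation the company may have applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; not Sarvam's benchmark code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up code-mixed (Hindi-English) sentence pair, for illustration only.
    reference = "mujhe kal meeting attend karni hai"
    hypothesis = "mujhe kal meeting attend karna hai"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

In this made-up pair, one of the six reference words is transcribed incorrectly, so the score is one error over six words. A score of 0 means the transcript matches the reference word for word.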

It should also be noted that OpenAI’s and Google’s models are designed for global languages and standard transcription tasks, whereas Sarvam Audio specialises in Indian languages.

The model also comes with a “Diarised Speech Recognition” capability that, according to the company, handles complex multi-speaker audio and natural conversational speech with higher accuracy.

Sarvam, in a press note, said, “With built-in context awareness, diarization, format control, and direct speech-to-command capabilities, Sarvam Audio forms the foundation for a new generation of voice-first applications and agents built for real Indian users.”

How can Sarvam Audio be used in real-world cases?

Because Sarvam Audio supports 22 Indian languages, it can be used for multilingual transcription and multi-speaker conversations across sectors and tools such as call centres, logistics, e-commerce, banking, fintech, chat and messaging platforms, and more.

It can also be used to process long-form audio in Indian languages, such as podcasts, meetings, call-centre recordings, or even lectures.