India-based AI startup Sarvam AI today (February 5, 2026) launched an advanced multimodal AI model dubbed Sarvam Vision. The model offers document intelligence, Optical Character Recognition (OCR), and visual language understanding across India’s diverse languages and scripts. This new AI model surpasses the accuracy of Gemini 3 Pro, GPT 5.2, and other AI models when it comes to document intelligence.

The Sarvam AI press note states that frontier vision-language models are built for processing modern English documents and that, as a result, most global models treat Indian languages as secondary.

The press note said, “Much of India's knowledge remains embedded in physical documents, scanned archives, and historical collections. This is knowledge locked in plain sight.” It added, “Unlocking this material is essential for long-term preservation, access, and reuse across research, governance, and enterprise workflows.”

Sarvam Vision is backed by the company's in-house 3B-parameter state-space vision-language model, which Sarvam AI claims delivers high-fidelity text extraction and semantic understanding, even in documents with mixed content.

In early benchmark tests, the model outperformed leading AI models on OCR tasks in 22 official Indian languages, including Hindi, Bengali, Tamil, Telugu, Marathi, Malayalam, Kannada, Gujarati, Punjabi, Urdu, and Assamese.

Sarvam AI says Sarvam Vision was trained using advanced techniques to improve accuracy, reliability, and understanding across text and visuals. Benchmark results show the model performs competitively with global AI systems and outperforms many of them on Indic OCR tasks.

Sarvam Vision’s capabilities go beyond text recognition: it can also interpret visual elements such as trend lines, nested tables, and complex layouts. With the launch, the company is making the Document Intelligence APIs & Vision experience free for users throughout February 2026.