Google introduced a new capability for Gemini 3 Flash on 28 January, dubbed “Agentic Vision,” that turns image processing from a static look into an active investigation. According to Google's blog post, the new approach combines visual reasoning with automated code execution to analyse images in a “Think, Act, Observe” loop.

Google claims that this approach will reduce hallucinations and provide more accurate responses to visual tasks. “The model formulates plans to zoom in, inspect and manipulate images step-by-step, grounding answers in visual evidence,” Google said in the blog post.

Reportedly, Agentic Vision can also annotate images in real time. Instead of just describing a scene, the model acts as an agent and executes Python code to visualise its findings. Google says this replaces "probabilistic guessing" with verifiable, code-driven execution, and claims it delivers a 5-10% quality boost.

Google said, “Standard LLMs often hallucinate during multi-step visual arithmetic. Gemini 3 Flash bypasses this by offloading computation to a deterministic Python environment.” With this, the company is moving from models that simply "look" toward agents that "investigate."
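
To illustrate the idea of offloading visual arithmetic to code, here is a minimal sketch of the kind of script such a model might run once it has read figures off an image. The receipt items, the `extracted_items` list and the `receipt_total` helper are hypothetical and are not taken from Google's post; the point is only that the sum is computed deterministically rather than estimated.

```python
from decimal import Decimal

# Hypothetical values read off a receipt image: (item, unit price, quantity).
# These are illustrative assumptions, not data from Google's blog post.
extracted_items = [
    ("coffee", "3.50", 2),
    ("bagel", "2.25", 3),
    ("juice", "4.10", 1),
]


def receipt_total(items):
    """Sum unit price times quantity exactly, using Decimal to avoid float drift."""
    total = Decimal("0")
    for _name, unit_price, quantity in items:
        total += Decimal(unit_price) * quantity
    return total


total = receipt_total(extracted_items)
print(f"Total: {total}")
```

Because the arithmetic runs in an interpreter, the same inputs always produce the same total, which is the property the quote describes.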

The company also cited a few real-world examples, including: “PlanCheckSolver.com, an AI-powered building plan validation platform, improved accuracy by 5% by enabling code execution with Gemini 3 Flash to iteratively inspect high-resolution inputs.”

In another example, “the model is asked to count the digits on a hand in the Gemini app. To avoid counting errors, it uses Python to draw bounding boxes and numeric labels over each finger it identifies.”
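
The finger-counting example can be sketched roughly as follows. This is an assumption-laden illustration, not the code Gemini generates: the `Box` class, the `label_boxes` and `to_svg` helpers, and the detection coordinates are all hypothetical, and the overlay is written as SVG text so the sketch needs only the standard library, whereas real annotation code would more likely draw on the image itself.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical detections)."""
    x: int
    y: int
    width: int
    height: int


def label_boxes(boxes):
    """Order boxes left to right and pair each with a label starting at 1."""
    ordered = sorted(boxes, key=lambda box: box.x)
    return list(enumerate(ordered, start=1))


def to_svg(labelled, image_width, image_height):
    """Render the labelled boxes as an SVG overlay string."""
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{image_width}" height="{image_height}">'
    ]
    for number, box in labelled:
        parts.append(
            f'<rect x="{box.x}" y="{box.y}" width="{box.width}" '
            f'height="{box.height}" fill="none" stroke="red"/>'
        )
        # Place the numeric label just above the top-left corner of the box.
        parts.append(f'<text x="{box.x}" y="{box.y - 4}" fill="red">{number}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


# Made-up detections for five digits on a hand, in arbitrary order.
detections = [
    Box(210, 40, 30, 90),
    Box(50, 80, 30, 70),
    Box(130, 30, 30, 100),
    Box(90, 45, 30, 95),
    Box(170, 35, 30, 95),
]

labelled = label_boxes(detections)
overlay = to_svg(labelled, image_width=400, image_height=300)
print(f"Digits counted: {len(labelled)}")
```

The count comes from the number of labelled boxes rather than from a free-form guess, so each counted digit can be checked against its box in the overlay.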

The Agentic Vision capability is currently available to developers via the Gemini API in Google AI Studio and Vertex AI, as well as in the Gemini app.

Google also laid out plans for future updates to Agentic Vision, saying it intends to expand the capability so that the model can automatically decide when to rotate, zoom, or perform visual math without extra prompts.

The company also plans to equip Gemini models with additional tools, such as web and reverse image search. Finally, it intends to extend Agentic Vision to larger, more powerful models beyond Flash.