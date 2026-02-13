OpenAI on February 12 launched GPT-5.3-Codex-Spark, an ultra-fast artificial intelligence (AI) model designed for real-time software development, marking a shift toward instant, interactive coding workflows as competition intensifies in the AI tools market.

The model, a smaller version of OpenAI's GPT-5.3-Codex system, is being released as a research preview to ChatGPT Pro users and select partners. OpenAI said the system is optimised for near-instant responses when deployed on specialised low-latency hardware, delivering more than 1,000 tokens per second.

Codex-Spark is also the first milestone in OpenAI’s partnership with chipmaker Cerebras, announced in January.

Real-time coding focus

Unlike larger models built for long autonomous tasks, Codex-Spark is tuned for rapid iteration: editing code, refining logic, and responding immediately to user input.

“Codex-Spark is our first model designed specifically for working with Codex in real-time—making targeted edits, reshaping logic, or refining interfaces and seeing results immediately,” the company said.

The system supports both quick interactions and more complex projects, allowing engineers to intervene, redirect, or interrupt output as it is generated. OpenAI said the model keeps its working style lightweight by default, making minimal edits and skipping automated tests unless requested.

Speed over size

The model has a 128,000-token context window and is currently text-only. Although it is smaller than frontier models, OpenAI says it performs strongly on software-engineering benchmarks such as SWE-Bench Pro and Terminal-Bench 2.0 while completing tasks in a fraction of the time.

The release reflects a broader industry trend toward specialised models that prioritise responsiveness over maximum reasoning depth, particularly for developer tools where latency directly affects productivity.

OpenAI also redesigned its infrastructure to reduce delays across the entire request pipeline. Improvements include persistent WebSocket connections and optimisations to its Responses API, cutting overhead per client-server roundtrip by 80% and halving time-to-first-token.
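To give a rough sense of what those reductions could mean for a single interaction, the sketch below models response time as per-roundtrip overhead plus time-to-first-token plus generation time. The baseline figures (50 ms of overhead per roundtrip, 400 ms to first token, four roundtrips, 200 output tokens) are hypothetical values chosen for illustration and are not OpenAI measurements; only the 80% and 50% reductions and the 1,000 tokens-per-second rate come from the company's statements.

```python
from dataclasses import dataclass


@dataclass
class LatencyProfile:
    roundtrip_overhead_s: float   # fixed overhead per client-server roundtrip
    time_to_first_token_s: float  # delay before the first token arrives
    tokens_per_second: float      # sustained generation rate


def response_time(profile: LatencyProfile, roundtrips: int, output_tokens: int) -> float:
    """Estimate wall-clock seconds for one interaction under a simple additive model."""
    return (
        roundtrips * profile.roundtrip_overhead_s
        + profile.time_to_first_token_s
        + output_tokens / profile.tokens_per_second
    )


# Hypothetical baseline, for illustration only (not published figures).
baseline = LatencyProfile(
    roundtrip_overhead_s=0.050,
    time_to_first_token_s=0.400,
    tokens_per_second=1000.0,  # "more than 1,000 tokens per second" per OpenAI
)

# Apply the reductions OpenAI cited: 80% less roundtrip overhead, half the TTFT.
improved = LatencyProfile(
    roundtrip_overhead_s=baseline.roundtrip_overhead_s * (1 - 0.80),
    time_to_first_token_s=baseline.time_to_first_token_s / 2,
    tokens_per_second=baseline.tokens_per_second,
)

before = response_time(baseline, roundtrips=4, output_tokens=200)
after = response_time(improved, roundtrips=4, output_tokens=200)
print(f"before: {before:.3f} s, after: {after:.3f} s")
```

Under these assumed numbers the estimate falls from about 0.8 seconds to about 0.44 seconds. Generation time is unchanged in this model, so the gain comes entirely from the pipeline changes, which matter most for short, frequent edits.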

Cerebras hardware partnership

Codex-Spark runs on Cerebras’ Wafer Scale Engine 3, a purpose-built AI accelerator optimised for high-speed inference. The hardware complements traditional GPU infrastructure by focusing on ultra-low latency.

“What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible—new interaction patterns, new use cases, and a fundamentally different model experience,” said Sean Lie, Cerebras’ co-founder and chief technology officer. “This preview is just the beginning.”

OpenAI said GPUs remain central to training and broad deployment, but specialised chips can accelerate workflows where response time is critical.

