Google Shows What Gemini Omni and Gemini 3.5 Can Do in New Videos

Google I/O 2026 put two model names on the table: Gemini Omni and Gemini 3.5. Beyond the on-stage announcements, the Google AI Blog published a collection of nine demonstration videos last Friday, May 29, designed to showcase the models' capabilities in concrete situations, with no slide presentations in sight.

The format matters: rather than benchmarks or comparison tables, Google opts to demonstrate real use cases captured on video. It is a deliberate communication choice that has been standard practice at OpenAI for some time, and Google adopts it clearly here.

What the demos show

The nine videos cover a broad range of scenarios. No detailed official transcripts are available beyond the post, but the clips demonstrate multimodal capabilities: real-time video understanding, reasoning about images, voice interaction with natural responses, and code generation with visual context. Gemini Omni emerges as the model oriented toward integrating multiple input and output modalities simultaneously, while Gemini 3.5 is presented as an update focused on reasoning and precision in complex tasks.

The emphasis on smooth transitions between modalities, passing from voice to text to image within the same conversation, is the thread running through several of the demos. This is the type of integration that various labs have been promising on paper for months, but which still shows notable friction in practice.

Why it matters and for whom

For teams working with language model integrations, the launch of Gemini Omni is relevant for one concrete reason: if simultaneous multimodal capability works as shown, it reduces the need to chain different specialized models for voice, image, and text. That simplifies architectures and can lower operational costs in complex pipelines.
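To make the architectural point concrete, here is a minimal sketch contrasting the two designs. Every function name below is a hypothetical stub invented for illustration; none of it is a Gemini, Vertex AI, or other real API, and the returned strings are placeholders. The only thing the sketch measures is how many separate model services a single request has to touch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallLog:
    """Records which (hypothetical) model services a request touched."""
    hops: List[str] = field(default_factory=list)


# --- Chained design: one specialized model per modality (all stubs) ---

def speech_to_text(audio: bytes, log: CallLog) -> str:
    log.hops.append("speech-model")
    return "what is in this picture?"  # placeholder transcript


def describe_image(image: bytes, log: CallLog) -> str:
    log.hops.append("vision-model")
    return "a cat on a sofa"  # placeholder caption


def answer_text(question: str, context: str, log: CallLog) -> str:
    log.hops.append("text-model")
    return f"question={question!r}, context={context!r}"


def chained_pipeline(audio: bytes, image: bytes, log: CallLog) -> str:
    """Three services, with two intermediate text hand-offs between them."""
    question = speech_to_text(audio, log)
    caption = describe_image(image, log)
    return answer_text(question, caption, log)


# --- Single-model design: one multimodal call (also a stub) ---

def single_call_pipeline(audio: bytes, image: bytes, log: CallLog) -> str:
    """One service receives both inputs directly, with no hand-offs."""
    log.hops.append("multimodal-model")
    return "placeholder answer from one model"


if __name__ == "__main__":
    chained_log, single_log = CallLog(), CallLog()
    chained_pipeline(b"audio", b"image", chained_log)
    single_call_pipeline(b"audio", b"image", single_log)
    print("chained hops:", chained_log.hops)
    print("single hops: ", single_log.hops)
```

In the chained version, each hop is a separate service to deploy, monitor, and pay for, and each hand-off reduces the input to text before the next model sees it. Whether a single multimodal model actually performs well enough to replace that chain is exactly what the demos cannot establish on their own.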

For developers already working within the Google ecosystem (Vertex AI, Google AI Studio, the Gemini APIs), these demos signal where the stack is headed. This is not neutral information: Google is indicating the kinds of use cases it wants you to build on its infrastructure.

For the broader ecosystem, including Claude, the move has direct implications: competition in multimodal capabilities intensifies. Anthropic has its own bets in that space, but Google has structural advantages in integration with proprietary hardware (TPUs) and distribution through its mass-market consumer products.

What the demos don't answer

Demonstration videos have one obvious limitation: they show what the product team chose to show, under the conditions they selected. There is no public information yet about real latencies in production, pricing for accessing Gemini Omni, or when these capabilities will be available outside controlled demos.

It is also unclear how much of an improvement Gemini 3.5 represents over previous versions in mathematical reasoning or coding tasks, the areas where labs compete in the greatest technical detail. For that, we will need to wait for independent evaluations.

Editorial perspective

Nine well-produced videos are a good communication tool, but they do not substitute for technical documentation or real access for developers. When that arrives, we will have a clearer picture of whether Gemini Omni changes anything in practice or remains, for now, an appealing promise.
