OpenAI Bets on ‘Thinking’ Images with ChatGPT Images 2.0

OpenAI is turning its image generator into a reasoning tool, narrowing the gap between text and visual AI.

OpenAI on Tuesday introduced ChatGPT Images 2.0, shifting image generation from prompt-to-picture toward a system that can research, reason and produce multi-image outputs in a single run.

    The model, available via the API as gpt-image-2, is rolling out to all ChatGPT and Codex users.

    Advanced features requiring thinking capabilities are restricted to Plus, Pro and Business subscribers.

    It can generate images up to 2,000 pixels wide across multiple aspect ratios, including formats up to three times as wide as they are tall. 

    The headline capability is what OpenAI calls “thinking” mode. The model can pull reference images and facts mid-generation, which helps with diagram accuracy—charts with real numbers, maps with correct labels.

    The reasoning modes also enable the model to “reason through the structure” of a visual asset before generating it, lowering the risk of output errors and reducing manual revisions. 

    In a demonstration shared by the company, the system produced a series of textbook-style pages explaining Isaac Newton’s scientific contributions, complete with diagrams and consistent formatting. 

    In another example, it generated a weather-based infographic with location-specific details and recognizable city landmarks, reflecting its ability to combine real-world data with visual composition.

    Ayan, a researcher on OpenAI’s ImageN team, said the model represents a step change in how image systems operate. “With thinking enabled, our new ImageN model can research, collect information, find references, and synthesize all of this into its outputs,” he said. 

    The company also emphasized improvements in customization. Users can now specify a wide range of aspect ratios, from ultra-wide formats to tall layouts, giving greater control over how images are composed for different platforms and use cases.
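For developers, that customization surfaces as parameters on the API call. A minimal sketch of how a request to gpt-image-2 might be assembled with the OpenAI Python SDK is below; the model id comes from the article, but the specific `size` values and the `build_request` helper are illustrative assumptions, not confirmed API details.

```python
def build_request(prompt: str, size: str = "1024x1024") -> dict:
    """Assemble keyword arguments for an image-generation call.

    The "gpt-image-2" model id is the API name reported for
    ChatGPT Images 2.0; size strings here are hypothetical examples
    of the wide and tall formats the article describes.
    """
    return {"model": "gpt-image-2", "prompt": prompt, "size": size}


# An ultra-wide request, roughly three times as wide as it is tall:
params = build_request(
    "A labeled weather infographic with city landmarks",
    size="2000x667",
)
```

The resulting `params` dict would then be passed to an image-generation call such as `client.images.generate(**params)` on an authenticated OpenAI client.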

    Text rendering is also improved. The model handles Japanese, Korean, Chinese, Hindi and Bengali more cleanly, along with small UI labels, icons and interface elements that have historically posed difficulties for image models.

The model's built-in general knowledge has a December cutoff; web search, available to paid users in thinking mode, lets it pull in fresher information.

    The caveat: thinking mode means images take longer to produce. And as AI-generated visuals grow harder to distinguish from real imagery, the update is likely to sharpen debates around authenticity and misinformation.
