MIT SMR India

OpenAI Launches GPT-5.4 for Complex Work Tasks

New model can handle longer documents and complex professional tasks, OpenAI says.


[Image source: Diksha Mishra/MITSMR India]

    OpenAI has rolled out GPT-5.4, a new artificial intelligence model designed for professional and enterprise tasks.

The model is available in several versions, including a standard release, a reasoning-focused variant called GPT-5.4 Thinking, and a higher-performance option known as GPT-5.4 Pro.

    OpenAI said the model’s API version supports context windows of up to one million tokens. 

    Context windows determine how much information an AI system can process at once, and the larger limit allows the model to analyze far longer documents or conversations.
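The idea can be illustrated with a minimal sketch: an application keeps only as much conversation history as fits the model's window, dropping the oldest messages first. The whitespace-based token count below is a crude stand-in; real models use subword tokenizers, so actual counts differ.

```python
# Conceptual sketch: fitting conversation history into a fixed context window.
# The token counter is a whitespace proxy, not a real tokenizer.

CONTEXT_LIMIT = 1_000_000  # GPT-5.4 API limit, per the article

def count_tokens(text: str) -> int:
    """Rough stand-in for a real subword tokenizer."""
    return len(text.split())

def trim_history(messages: list[str], limit: int) -> list[str]:
    """Keep the newest messages whose total token cost fits the window."""
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(msg)
        if total + cost > limit:
            break  # this message (and everything older) no longer fits
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order

history = ["old " * 10, "recent question about a long report"]
print(trim_history(history, limit=12))
# -> ['recent question about a long report']  (only the newest message fits)
```

A one-million-token window simply moves `limit` far enough out that whole books or long document sets fit without this kind of trimming.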

    The company also said GPT-5.4 improves efficiency compared with its earlier GPT-5.2 model, completing similar tasks using fewer tokens.

In tests released by the company, GPT-5.4 posted higher scores on computer-use benchmarks known as OSWorld-Verified and WebArena Verified. The model also scored 83% on OpenAI’s GDPval benchmark, which measures performance on knowledge-based work tasks.

    The system also ranked first on Mercor’s APEX-Agents benchmark, which evaluates professional skills in fields such as law and finance. Mercor CEO Brendan Foody said the model performs well on complex work assignments.

    OpenAI said the model also “reduces factual mistakes compared with earlier versions.” According to the company, GPT-5.4 was 33% less likely to make incorrect individual claims than GPT-5.2, while overall responses were 18% less likely to contain errors.

    The company also introduced changes to how the model interacts with external tools through its API. The new system, called Tool Search, allows the model to retrieve tool definitions only when needed, instead of including them in every request. 

OpenAI said the change “can reduce token use and speed up requests in systems that rely on many tools.”
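The mechanism can be sketched in a few lines: instead of attaching every tool definition to every request, the application looks up only the definitions relevant to the current prompt. The registry, matching logic, and request shape below are hypothetical illustrations, not OpenAI's actual Tool Search API.

```python
# Conceptual sketch of on-demand tool retrieval, in the spirit of the
# Tool Search feature described above. All names here are hypothetical.

TOOL_REGISTRY = {
    "get_invoice": {"name": "get_invoice", "description": "Fetch an invoice by ID"},
    "send_email": {"name": "send_email", "description": "Send an email to a recipient"},
    "run_query": {"name": "run_query", "description": "Run a SQL query"},
}

def search_tools(query: str) -> list[dict]:
    """Return only tool definitions whose descriptions share keywords with the query."""
    words = {w for w in query.lower().split() if len(w) > 3}  # skip short stopwords
    return [
        t for t in TOOL_REGISTRY.values()
        if words & {w for w in t["description"].lower().split() if len(w) > 3}
    ]

def build_request(prompt: str) -> dict:
    # Instead of attaching all of TOOL_REGISTRY, attach only the matching
    # definitions -- fewer tokens per request, and the payload stays small
    # even as the registry grows.
    return {"prompt": prompt, "tools": search_tools(prompt)}

req = build_request("fetch an invoice for customer 42")
print([t["name"] for t in req["tools"]])
# -> ['get_invoice']
```

A production system would likely use embedding-based retrieval rather than keyword overlap, but the token-saving shape is the same: the request carries a handful of definitions instead of hundreds.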

    As part of the release, OpenAI said it ran a “new safety evaluation examining chain-of-thought reasoning, the step-by-step explanations models generate while solving complex problems.” Some researchers have raised concerns that AI systems could conceal or manipulate these reasoning traces.

    OpenAI said its testing found that deceptive behavior was less likely in the GPT-5.4 Thinking version of the model.

The results suggest “that the model lacks the ability to hide its reasoning and that CoT monitoring remains an effective safety tool,” the company said.
