Anthropic, OpenAI Push AI Deeper Into Scientific Research

Anthropic launched a research workbench for scientists, while OpenAI introduced a benchmark to test AI reasoning in computational biology.

MIT SMR Editors, an hour ago

Image Credit: Chetan Jha/MIT Sloan Management Review India

Anthropic and OpenAI on Tuesday announced separate tools for scientific research, sharpening their focus on computational biology and life sciences.

Anthropic launched Claude Science, a beta research workbench that combines literature review, data analysis, code execution and scientific computing in one workspace. The app is available to Claude Pro, Max, Team and Enterprise users on macOS and Linux.

Claude Science lets researchers query scientific databases, generate figures and manuscripts, run analysis pipelines and use computing resources from a single interface. It includes more than 60 curated skills and connectors for fields including genomics, single-cell biology, proteomics, structural biology and cheminformatics.

“AI has the potential to dramatically accelerate the pace of scientific discovery and the development of healthcare interventions,” Anthropic said.

The company said Claude Science keeps an auditable record of outputs, including the code, environment and workflow used to produce figures and analyses. It can run on researchers’ existing infrastructure, including laptops, Linux systems and high-performance computing login nodes.

Researchers testing Claude Science have used it for single-cell RNA sequencing analysis, CRISPR screen design, protein structure prediction and cheminformatics, Anthropic said. The company also plans to support up to 50 AI for Science projects with Claude credits and compute resources from Modal.

Separately, OpenAI introduced a benchmark meant to test whether AI models can handle complex computational biology analyses that require scientific judgment, not just predefined workflows.

“Scientific data rarely arrive with instructions,” OpenAI said. “Researchers must decide whether a pattern reflects biology or noise, whether the data can support the question being asked, and how each result should change what they do next.”

GeneBench-Pro has 129 synthetic problems across statistical genetics, cancer genomics, proteomics, pharmacogenomics, clinical diagnostics and other areas. OpenAI said the benchmark tests whether models can analyze datasets, choose suitable methods, revise assumptions and reach scientifically valid conclusions.

The company said 82 of the 129 problems were reviewed by external experts, including graduate students, postdoctoral researchers, industry scientists and professors.

OpenAI said GPT-5.6 Sol reached a 28.7% pass rate at its highest reasoning setting, while GPT-5.6 Sol Pro reached 31.5% in separate Pro runs.

The scores suggest progress but also show the limits of current systems. OpenAI said AI agents remain too unreliable to replace human experts, although even partial automation could have scientific and economic value.

The launches show how major AI companies are moving beyond general-purpose assistants into specialized tools for scientific work, while also building benchmarks to measure whether those systems can reason through messy research problems.
