Anthropic Safeguards Lead Sharma Steps Down Over Values and Risk
Mrinank Sharma leaves after two years shaping the firm’s AI safety work, warning of a widening gap between technological power and judgment.
Mrinank Sharma, head of the safeguards research team at Anthropic, has resigned from the AI firm, saying he wants to pursue work that better reflects his values as “global risks intensify.”
His departure was effective 9 February, the day he posted a lengthy letter to colleagues on X explaining his decision.
In the letter, Sharma wrote “it is clear to me that the time has come to move on,” adding that “the world is in peril,” not only from AI or bioweapons but from “a whole series of interconnected crises unfolding in this very moment.”
Sharma joined Anthropic in August 2023 after completing a PhD in machine learning at the University of Oxford. He had led the company’s Safeguards Research Team since its launch last year, focusing on reducing the risks of AI misuse, including research on AI-assisted bioterrorism and on AI sycophancy, the tendency of chatbots to flatter users excessively.
“I feel lucky to have been able to contribute,” Sharma wrote, listing work that included “developing defences to reduce risks from AI assisted bioterrorism,” putting those systems into production, and writing one of the company’s early AI safety cases.
He also highlighted recent efforts around internal transparency and a final project examining how AI assistants could “make us less human or distort our humanity.”
Despite those contributions, Sharma said the tension between values and action weighed heavily on him.
“I’ve repeatedly seen how hard it is to truly let our values govern our actions,” he wrote, adding that within organizations “we constantly face pressures to set aside what matters most.”
In a passage that has been widely quoted, Sharma warned of a growing imbalance between technological power and judgment.
“We appear to be approaching a threshold where our wisdom must grow in equal measure to our capacity to affect the world, lest we face the consequences,” he wrote.
Sharma’s departure also follows the release of a recent study he authored on chatbot interactions. The research found that thousands of daily conversations with AI systems may contribute to distorted perceptions of reality, particularly around sensitive topics such as relationships and wellness. While severe cases were rare, Sharma said the findings “highlight the need for AI systems designed to robustly support human autonomy and flourishing.”
Looking ahead, Sharma said he does not yet know what comes next. He wrote that he hopes to explore a poetry degree and “devote myself to the practice of courageous speech,” alongside work in writing, facilitation and community building. “I want to contribute in a way that feels fully in my integrity,” he said.
