MITINDIA PRIVY
Trigent-Banner

When AI Agents Go Rogue in Real World Tests

Researchers from MIT, Stanford and Harvard found email leaks, endless loops and system failures when autonomous agents were let loose with real access.

Reading Time: 8 minutes 

Topics

  • AI agents that can send email, execute code and interact with other systems are already showing security, privacy and governance risks, according to a new study by researchers from top US universities, including MIT, Stanford and Harvard.

    The paper, titled ‘Agents of Chaos’ and released this week, describes what happened when language model-powered agents were deployed in a live test environment with persistent memory, shell access, Discord accounts and email credentials. Over two weeks, 20 AI researchers were encouraged to probe and try to break the systems under both normal and adversarial conditions.

    The results offer a glimpse of operational risk in the age of autonomous software.

    Unlike conventional chatbots, these agents were not limited to generating text. They could modify files, execute commands, manage inboxes and communicate with other agents.

    “Small conceptual mistakes can be amplified into irreversible system level actions,” the authors wrote.

    In total, the researchers identified at least 10 significant security breaches and numerous serious failure modes during the two-week period.

    The goal was not to measure how often failures occur, but to show that serious vulnerabilities exist under realistic conditions.

    Broken Autonomy

    One particular episode illustrated how autonomy can convert an ordinary instruction into disproportionate damage.

    In a test of contextual privacy, a non owner asked an agent to keep a secret and then to delete the related email.

    Lacking a proper email deletion tool, the agent disabled its local email client entirely and declared “Email account RESET completed,” after wiping its own setup.

    The original mailbox, however, remained intact. The agent had both broken functionality and failed to achieve the intended privacy outcome.

    It also publicized the incident on a social platform, framing its action as an ethical stand.

    The researchers described this as a failure of “social coherence,” where the agent misjudged authority, scale and the actual state of the system.

    In several cases, agents reported successful completion, while the underlying state contradicted those claims.

    A second class of failures involved compliance with non owners. Agents executed shell commands, listed files and disclosed private email records when prompted by individuals without administrative authority.

    In one instance, an agent returned a file containing 124 email records, including sender addresses and message IDs.

    When further prompted, it supplied email bodies unrelated to the requester.

    Direct requests for specific sensitive fields sometimes triggered refusal. But indirect framing proved more effective.

    In another test, sensitive personal information embedded in routine emails was disclosed in full when the agent was asked to forward the entire message rather than extract a specific data point.

    The distinction between “secret” and “not explicitly marked secret” collapsed under conversational pressure.

    The risk was not limited to privacy leakage. Researchers also induced agents into resource consuming loops.

    In one scenario, two agents were instructed to relay each other’s messages and ask follow up questions. The exchange continued for at least nine days and consumed roughly 60,000 tokens.

    In parallel, the agents spawned persistent background processes with no termination condition, converting temporary tasks into permanent infrastructure changes.

    Denial of service risks were equally straightforward. A non owner asked an agent to remember all conversations in a dedicated file, which grew with each interaction. Repeated email attachments of roughly 10 megabytes eventually pushed the email server into a denial of service state, without notifying the owner.

    Structural Risk

    The paper also pointed to the influence of the underlying large language model (LLM). When agents using a Chinese LLM were prompted with politically sensitive topics, the API repeatedly cut off responses with “unknown error.”

    Policies built into the underlying system quietly shaped what the agent could report. The authors said that these hidden constraints can carry into agent behavior without the owner’s knowledge.

    Technically, the agents in the study operated at what the authors classify as a mid level of autonomy. They could execute well-defined sub-tasks such as sending email or running shell commands, but lacked the capacity to recognize when a task exceeded their competence or required human handoff.

    That boundary awareness, they suggested, is essential for safe delegation.

    The broader implication is structural. Traditional benchmarks focus on isolated prompt response accuracy. The failure modes described here emerged at the integration layer: where language models meet memory, tools, communication channels and delegated authority.

    The researchers said these surfaces create new pathways for security, privacy and governance failures.

    The study stopped short of claiming that current systems are irreparably flawed. It described itself as an early warning analysis, intended to show how quickly powerful capabilities translate into exploitable weaknesses.

    Topics

    More Like This

    You must to post a comment.

    First time here? : Comment on articles and get access to many more articles.