The Verification Gap in AI-Assisted Research

Withdrawn reports at KPMG, EY, and Deloitte show that AI-assisted research is now produced faster than the systems meant to check it. The source trail is now an enterprise risk.

  • Key Takeaways

    01

    KPMG, EY, and Deloitte have each withdrawn, corrected, or refunded reports this year after GPTZero or independent academics found fabricated, paraphrased, or misattributed citations.

    02

    The dominant failure mode is plausibility, not invention. A real company, a real technology, and a working citation can still combine to produce a false claim once the link between source and assertion breaks.

    03

    Senior leaders should treat AI-assisted research as an enterprise risk, separating drafting from verification and logging which claims were checked, by whom, and when.

    KPMG advises clients on how to adopt agentic AI. In June 2026, it withdrew its own report on the subject after researchers found that most of its citations did not lead to real sources. GPTZero, a firm that sells tools for detecting AI-generated text and verifying citations, has found similar problems in reports from other large professional-services firms. Consulting work commissioned by governments has run into the same kinds of errors.

    Each case has its own facts. Together they describe a wider problem: AI-assisted research is now produced faster than the systems meant to check it.

    The withdrawn report, “Total Experience: Redefining Excellence in the Age of Agentic AI,” was published in October 2025 and presented the latest findings from the firm’s Global Customer Experience Excellence study. GPTZero examined its citations, and the Financial Times confirmed that several named organizations disputed how their AI use had been described. Of the 45 citations, only five (about 11%) pointed to real, intact sources. UBS, NHS Greater Manchester, Swiss Federal Railways, and Transport for London each told the FT that claims about their work were inaccurate or misleading.

    Responding to MIT Sloan Management Review India, KPMG said its checks had failed.

    “KPMG International takes the accuracy and integrity of its published content seriously. With respect to the report, ‘Redefining excellence in the age of agentic AI,’ the required accuracy checks were not properly followed. We are reviewing our publication processes and controls and implementing additional safeguards where necessary.”

    The statement did not say whether AI tools were used to draft the report, identify its sources, generate its citations, summarize its findings, or select its case studies. In its statement to the FT, the firm said it expects staff to follow its guidelines on the responsible use of AI, including human oversight to validate content and verify independent sources. Whether or not a model wrote any particular sentence, the underlying problem is the same: review designed to catch human error is now encountering machine error, which looks different and is harder to detect.

    For senior leaders, this changes what a trusted brand on the cover is good for. The cover no longer tells you whether the evidence beneath it holds. Each material claim should trace to a real source that supports it and fits the argument. 

    Flawed sources don’t stay put

    A flawed report no longer stays where it was published. Edward Tian, chief executive of GPTZero, described what happens next:

    “As a result, false claims enter the information ecosystem under the banner of a reputable organization. They are cited by researchers and journalists and are replicated in secondary research, as LLMs scrape these reports for source material. Left unchecked, they can erode our faith in the source material and, in turn, infect follow-on research, leaving everyone wondering where it all went wrong.”

    The path is short. A consulting report cites a weak example, a journalist cites the report, and a company puts the article into a board deck. A later study cites both, and an AI system trained on or retrieving from that chain repeats the claim as settled. By the time anyone questions its origin, the source sits beneath layers of repetition.

    This is why the problem belongs at the top, not just on research teams. Evidence underpins capital allocation, hiring, technology investment, and public claims. If the evidence is contaminated, every decision built on it weakens. The earlier assumption was that a respected institution kept firm boundaries between rough drafts and published conclusions. That held because building those boundaries took time. AI has made polished drafts cheap to produce, but verification is no faster than it used to be. Leaders have to manage the gap between the two.

    That gap is why AI-assisted research needs a practical test, not another policy. A workable test has four links.

    First, the claim. State exactly what is being asserted. A vague claim is easy to inflate and hard to check.

    Second, the source. Confirm that it exists and that it supports the claim.

    Third, the fit. Check whether the source is being used in context or stretched beyond what it shows.

    Fourth, the trace. Make sure someone can reconstruct how the claim entered the document and why it stayed in the final version.

    The test is simple and shifts the review from the document to the claim. Teams also need to treat any source a model proposes as unverified until a person confirms it. None of this argues for taking AI out of the workflow, which would be neither realistic nor wise. The narrower point is that well-formatted output is not, by itself, proof.
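
    The four links can be written down as a per-claim check. What follows is a minimal sketch, not a prescribed tool: the names ClaimCheck and failed_links are hypothetical, the example claim is invented for illustration, and each field is assumed to be filled in by a human reviewer rather than by a model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClaimCheck:
    """One material claim, as judged by a human reviewer (hypothetical structure)."""
    claim: str             # link 1: the exact assertion being made
    source_exists: bool    # link 2: the cited source can actually be found
    source_supports: bool  # link 2: the source says what the claim says
    used_in_context: bool  # link 3: the source is not stretched beyond what it shows
    trace: str             # link 4: how the claim entered the document


def failed_links(check: ClaimCheck) -> List[str]:
    """Return the names of the links that do not hold for this claim."""
    failures = []
    if not check.claim.strip():
        failures.append("claim")
    if not (check.source_exists and check.source_supports):
        failures.append("source")
    if not check.used_in_context:
        failures.append("fit")
    if not check.trace.strip():
        failures.append("trace")
    return failures


# Invented example: the source exists but does not support the claim,
# and nobody recorded how the claim got into the draft.
example = ClaimCheck(
    claim="Hypothetical: Company X shortened response times with an AI agent",
    source_exists=True,
    source_supports=False,
    used_in_context=True,
    trace="",
)
print(failed_links(example))  # ['source', 'trace']
```

    A claim passes only when the returned list is empty. Keeping the check at the level of the individual claim, not the document, is the point of the sketch.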

    “False claims enter the information ecosystem under the banner of a reputable organization.”

    Edward Tian, chief executive, GPTZero

    Verification needs an owner at the top

    C-suite. Chief executives and boards should treat AI-assisted research as an enterprise risk, not as a communications matter. Any organization that relies on external reports, internal research, or consultant analysis for significant decisions should know where AI has entered the work. That includes drafting, summarizing, source discovery, citation generation, and case study selection. Each of those uses carries its own risk and needs its own control. Within the next reporting cycle, leadership can implement a verification protocol for public- and board-facing research. The protocol should specify which claims are checked, who checks them, and what evidence is retained.

    Functional leaders. Research, communications, strategy, and knowledge teams should separate drafting from verification. The person who used AI to produce or arrange a draft should not also be the one vouching for its sources. Reviewers should follow the evidence trail for every material claim, paying particular attention to figures, named organizations, direct quotes, and case studies. A citation is not, in itself, proof. It points to where the proof should be. If that route is broken, the claim should not stand. For high-stakes reports, a simple claim register does more for reliability than another editorial pass. It records each claim, its source, the reviewer, and the verification date.
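
    A claim register of this kind can be very small. The sketch below is one possible shape, under stated assumptions: the names RegisterEntry and open_items are hypothetical, the drafter field is an addition used to enforce the separation of drafting from verification, and the people, claims, and dates are invented placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class RegisterEntry:
    """One row of a hypothetical claim register."""
    claim: str
    source: str
    drafter: str                        # assumption: who produced or arranged the draft
    reviewer: Optional[str] = None      # who vouched for the source
    verified_on: Optional[date] = None  # when the check was done


def open_items(entries: List[RegisterEntry]) -> List[Tuple[str, str]]:
    """List claims that are unverified or were verified by their own drafter."""
    problems = []
    for entry in entries:
        if entry.reviewer is None or entry.verified_on is None:
            problems.append((entry.claim, "not yet verified"))
        elif entry.reviewer == entry.drafter:
            problems.append((entry.claim, "drafter verified own source"))
    return problems


# Invented placeholder rows.
register = [
    RegisterEntry("Claim A", "Source A", drafter="analyst_1",
                  reviewer="analyst_2", verified_on=date(2026, 1, 15)),
    RegisterEntry("Claim B", "Source B", drafter="analyst_1",
                  reviewer="analyst_1", verified_on=date(2026, 1, 15)),
    RegisterEntry("Claim C", "Source C", drafter="analyst_3"),
]
for claim, reason in open_items(register):
    print(claim, "->", reason)
```

    With these placeholder rows, Claim B is flagged because the drafter vouched for their own source, and Claim C because no one has checked it yet. A report would be cleared for publication only when the list of open items is empty.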

    Boards and governance. Boards tend to ask whether AI was used. A more useful question is whether the organization can substantiate the claims it intends to rely on across consultant reports, market studies, regulatory filings, and investor narratives. Where the work touches public-sector contracts or regulated industries, agreements with vendors should set explicit standards for citation verification and correction. The reputational risk is the obvious one. The governance risk, which kicks in when a board approves a strategy built on claims that later collapse, lasts longer.

    Looking authoritative is no longer enough

    None of this is an argument for treating every consulting study as suspect. That would throw out useful work and miss the real lesson. The lesson is narrower. Institutional research now needs a control system that keeps pace with how fast evidence can be produced. AI can summarize material, show patterns, and sharpen a draft. Asked to defend a thesis, it can also generate a source trail that looks far stronger than it is.

    The reports that hold up will be those whose authors can walk readers through how each claim was built. Looking authoritative is no longer enough on its own. A trusted name on the cover still carries weight, but it is not, by itself, evidence.

    RESEARCH HIGHLIGHT

    This article draws on MIT Sloan Management Review India’s interview with Edward Tian, chief executive of AI detection firm GPTZero, KPMG International’s written response to MIT SMR India, and public reporting on withdrawn or corrected professional-services reports. MIT SMR India sent detailed questions to KPMG, EY, and Deloitte. Only KPMG had responded by publication time.