Hallucination Is Not an AI Problem. It’s a Validation Problem.

The AI industry has spent billions trying to solve hallucination.

The wrong profession has been working on it.

Here’s the contradiction nobody talks about: the problem of confident, coherent, wrong output (a system that fabricates answers indistinguishable from correct ones) was solved decades ago. Not by AI researchers. By auditors.

Every chartered accountant, every due diligence professional, every credit analyst has spent their career handling exactly this problem. A management team presents projections. A developer claims a title is clear. An investee says approvals are “in process.” Confident. Plausible. Potentially fabricated.

Finance built an entire infrastructure to handle it. Not to eliminate wrong information — to detect, flag, and contain it.

The people designing AI governance frameworks have, almost without exception, never built one of these systems. They are solving a known problem without knowing it is known.

The industry’s instinct is to reduce hallucination. Better training data. RLHF. Constitutional AI. Chain-of-thought reasoning. All valuable. All pointed at the wrong goal.

You will not get to zero. No model and no training regime will produce a system that is never wrong. More importantly: a system with a 2% error rate and robust validation can be safer than a system with a 0.1% error rate and none, because what matters is the share of errors that reach a decision undetected.
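To make that comparison concrete, here is a minimal arithmetic sketch. The 2% and 0.1% error rates come from the paragraph above; the 97% catch rate is purely an assumption for illustration, and `residual_error_rate` is a hypothetical helper, not an established metric.

```python
def residual_error_rate(model_error_rate: float, catch_rate: float) -> float:
    """Share of all outputs that are wrong AND slip past validation."""
    if not (0.0 <= model_error_rate <= 1.0 and 0.0 <= catch_rate <= 1.0):
        raise ValueError("rates must be between 0 and 1")
    return model_error_rate * (1.0 - catch_rate)


# 2% model error rate, validation assumed (hypothetically) to catch 97% of errors.
validated = residual_error_rate(0.02, 0.97)

# 0.1% model error rate, no validation at all, so nothing is caught.
unvalidated = residual_error_rate(0.001, 0.0)

# Catch rate at which the two systems let the same share of errors through.
break_even = 1.0 - 0.001 / 0.02

print(f"validated system:   {validated:.4%} of outputs are undetected errors")
print(f"unvalidated system: {unvalidated:.4%} of outputs are undetected errors")
print(f"break-even catch rate: {break_even:.0%}")
```

Under these assumptions the validated system lets through roughly 0.06% undetected errors against 0.1% for the unvalidated one. The comparison only favours the validated system when its checks catch more than about 95% of errors, which is why the design of the validation process matters as much as the model.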

The question was never “how do we prevent the model from being wrong?”

It was always “how do we build systems that detect, flag, and handle wrong outputs gracefully?”

That is a process design question. Finance answered it decades ago.

Here is what happens when you don’t answer it.

A corporate legal team deploys AI to review inbound contracts. The system is good: it surfaces 95% of material clauses accurately, which still leaves one in twenty missed or misread. Nobody builds a materiality threshold. Nobody designs a rule that says: any clause affecting financial obligation above X requires human review.

Six months later, a liability clause in a supplier agreement goes undetected. The AI missed it. Fluently. Confidently. With the same tone it uses when it is right.

The clause surfaces in arbitration. The question is not “why did the AI fail?” The question is “why was there no process to catch it?”

A finance professional would have asked that question before deployment. It is the first question in every due diligence process: what does a failure look like, and how would we know?

The framework already exists. Finance professionals use it every day.

The AI Validation Stack — five pillars: Cross-referencing, Source Citation, Materiality, Triangulation, Audit Trails

Cross-referencing — never rely on a single source. If the AI states a number, trace it back to its origin.

Source citation — every AI output should point to the specific document and paragraph it used. An unsourced AI answer is treated the same way as an unsourced number in a financial model: with suspicion.

Materiality thresholds — a meeting summary can tolerate minor errors. A clause that determines a financial obligation cannot. Build different review requirements for different output types, exactly as you build different audit procedures for different account balances.

Triangulation — if the AI extracts a revenue share from one section, check it against the defined terms, the payment schedule, and the financial illustrations. When three independent sources agree, confidence is high. When they diverge, something is wrong.

Audit trails — log every query, every retrieval, every response. Not as surveillance. As governance. When something goes wrong, you need to trace the chain of reasoning to understand where the system failed.
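As a rough illustration of how these pillars might combine in software, here is a minimal sketch. Everything in it is hypothetical: the `Extraction` record, the `review_decision` routing rules, the file and section names, and especially `MATERIALITY_THRESHOLD`, since the "X" in the contract example above is deliberately left unspecified. Cross-referencing appears only as the `cross_checks` values supplied by the caller.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

# Hypothetical threshold; the article does not specify a value for "X".
MATERIALITY_THRESHOLD = 100_000.0


@dataclass
class Extraction:
    """One AI-extracted value and the source the model cites for it."""
    field: str
    value: float
    source_doc: Optional[str]   # source citation: document
    source_para: Optional[str]  # source citation: paragraph or section


def triangulate(values: list[float], tolerance: float = 1e-9) -> bool:
    """True only if there are at least three readings and they all agree."""
    return len(values) >= 3 and max(values) - min(values) <= tolerance


def review_decision(item: Extraction, financial_impact: float,
                    cross_checks: list[float]) -> str:
    """Route an output to human review or automatic acceptance."""
    if item.source_doc is None or item.source_para is None:
        return "human_review: unsourced"
    if financial_impact >= MATERIALITY_THRESHOLD:
        return "human_review: material"
    if not triangulate([item.value, *cross_checks]):
        return "human_review: sources diverge"
    return "auto_accept"


def audit_record(query: str, item: Extraction, decision: str) -> str:
    """Serialize one query/response/decision as a JSON line for the audit log."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "extraction": asdict(item),
        "decision": decision,
    }, sort_keys=True)


# Usage: a revenue share read from one section, checked against three others.
clause = Extraction("revenue_share", 0.30, "supplier_agreement.pdf", "Section 4.2")
decision = review_decision(clause, financial_impact=25_000.0,
                           cross_checks=[0.30, 0.30, 0.25])
print(decision)
print(audit_record("What is the revenue share?", clause, decision))
```

In this example one of the cross-checks reads 0.25 rather than 0.30, so the output is routed to human review even though it falls below the materiality threshold. The point of the design is that acceptance is the last branch, reached only after every check has passed.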

AI governance committees are currently staffed with engineers, ethicists, and policy professionals. These are valuable. They are not sufficient.

They are missing people who think in verification systems — professionals trained to handle outputs that are confident and potentially wrong. A chartered accountant who treats every number as unverified until traced to source. A CFA who prices uncertainty rather than ignoring it. A legal professional who knows that “the system made an error” is not a defence when fiduciary duties are involved.

You do not need all three in every room. You need someone who thinks the way they do.

These people should be in the room where AI governance is designed. Not as compliance functions. As the architects.

Hallucination is not a model problem to be solved with better training. It is a validation problem to be solved with better process.

Finance has been building that process since before the first balance sheet was drafted.

(RLHF: Reinforcement Learning from Human Feedback)
