Raiana Verify
Trust is good, checking is better. Especially in Regulatory Affairs. For all the speed that AI solutions like Raiana have brought, a lot of time can still be spent checking the AI's conclusions, particularly on complex subjects.
The need to check also stems from one of the first properties the general public learned about Large Language Models: they can hallucinate. That is how we all learned to be the human in the loop and to check the outputs of AI tools just as we would check the deliverables of other people in our organisation.
The Raiana Verify feature takes aim at exactly that checking burden: have an AI check the results of another AI.
The idea is straightforward. When you trigger Verify, the entire conversation — your questions, Raiana’s answers, and all the regulatory reference materials cited — is submitted to a second AI model (Claude by Anthropic) with a single mandate: check whether the conclusions are correct and whether every reference actually says what Raiana claims it says. Two independent minds, not one.
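To make the flow concrete, here is a minimal sketch of that review step using the public Anthropic Python SDK. The function name, the mandate wording, and the way the transcript and references are bundled are illustrative assumptions, not Raiana's actual implementation.

```python
# A minimal sketch of the Verify pattern: hand the full conversation and its
# cited sources to a second model with a single reviewing mandate.
# `verify_conversation` and the payload layout are hypothetical; the client
# calls follow the public anthropic Python SDK.
import anthropic

VERIFY_MANDATE = (
    "You are an independent reviewer. You receive a Regulatory Affairs "
    "conversation and the reference materials it cites. Check whether the "
    "conclusions are correct and whether every cited reference actually "
    "says what the answer claims it says. Report every discrepancy you find."
)

def verify_conversation(transcript: str, references: list[str]) -> str:
    """Submit the whole conversation plus its cited sources for review."""
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Bundle the transcript and the cited reference materials together, so the
    # second model reasons independently from the same sources as the first.
    payload = transcript + "\n\n--- CITED REFERENCES ---\n\n" + "\n\n".join(references)

    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumption: any current Claude model works here
        max_tokens=2048,
        system=VERIFY_MANDATE,
        messages=[{"role": "user", "content": payload}],
    )
    return response.content[0].text  # the reviewer's findings
```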
History of hallucinations
Hallucinations are not what they used to be. In the early GPT-3 era (2020), studies found that large language models produced factually incorrect or fabricated content at rates above 40% on open-ended benchmarks.1 By 2023–2024 that picture had changed dramatically: leading models tested on structured factual tasks began showing error rates in the low single digits, and independent leaderboards such as Vectara's Hughes Hallucination Evaluation Model (HHEM) leaderboard tracked consistent year-on-year improvement across the major providers.2 The models got better, fast.
Raiana was built from the ground up to push further. The platform works directly against the authoritative legislative texts of the EU MDR, the IVDR, FDA 21 CFR Part 800, and the EU AI Act, grounding every answer in the source text rather than in a model's paraphrase of it. Structured retrieval, explicit citation, and conservative answer generation all reduce the surface area for confabulation. In practice, Raiana's hallucination rate on regulatory queries is a fraction of what general-purpose models produce on the same material.
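As a rough illustration of what that "structured retrieval, explicit citation" discipline looks like, here is a toy sketch: rank legislative passages against the question, then build a prompt that quotes the winners verbatim and demands a citation after every claim. The scoring, corpus layout, and prompt wording are invented for this sketch; Raiana's actual pipeline is not public.

```python
# A toy illustration of retrieval-grounded prompting. `retrieve` uses naive
# keyword overlap as a stand-in for real retrieval; the corpus is any mapping
# of article IDs (e.g. "MDR Art. 61(1)") to legislative text.

def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Rank (article_id, text) passages by keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda item: -len(terms & set(item[1].lower().split())),
    )
    return scored[:k]

def build_grounded_prompt(query: str, corpus: dict[str, str]) -> str:
    """Quote retrieved articles verbatim so the model answers from the source."""
    passages = "\n\n".join(f"[{cid}]\n{text}" for cid, text in retrieve(query, corpus))
    return (
        f"{passages}\n\nQuestion: {query}\n"
        "Answer only from the passages above, cite the article ID after "
        "every claim, and say so plainly if the passages do not answer it."
    )
```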
But a fraction is not zero.
Raiana Verify closes that gap using the same technology that created it. Because the second model receives the full conversation context and the underlying reference materials, it is not simply rubber-stamping the first answer — it is independently reasoning from the same sources. Disagreements surface as flagged discrepancies; agreements give you justified confidence, not just repetition.
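What a flagged discrepancy might look like can be sketched with a structured reviewer reply. The JSON schema below is our own invention for illustration; Raiana's actual report format may differ.

```python
# A sketch of discrepancy flagging, assuming the reviewer is asked to reply
# with one JSON finding per checked claim. The schema is illustrative.
import json
from dataclasses import dataclass

@dataclass
class Finding:
    claim: str       # the conclusion under review
    reference: str   # the source cited for it
    supported: bool  # did the reviewer find the source actually says this?
    note: str        # the reviewer's independent reasoning

def flag_discrepancies(reviewer_reply: str) -> list[Finding]:
    """Keep only the claims the second model could not ground in the sources."""
    findings = [Finding(**f) for f in json.loads(reviewer_reply)["findings"]]
    return [f for f in findings if not f.supported]
```

An empty list is the agreement case: the second model, reasoning from the same sources, reached the same conclusions.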
Raiana Verify is available now.
1 Maynez et al. (2020), "On Faithfulness and Factuality in Abstractive Summarization," ACL 2020; Lin et al. (2021), "TruthfulQA: Measuring How Models Mimic Human Falsehoods." Early GPT-3 evaluations on TruthfulQA showed truthful answer rates of around 58%, meaning roughly 42% of outputs contained false or unsupported claims.
2 Vectara Hughes Hallucination Evaluation Model (HHEM) Leaderboard, 2023–2024. Frontier models including GPT-4 and Claude 3 scored hallucination rates of 3–6% on the same RAG summarisation benchmark where GPT-3.5 scored ~15%.

