How has accuracy been validated in Raiana? Does it hallucinate?
Accuracy validation in Raiana is a combination of human oversight – the company’s founders checking the answers by hand – and automated checks by an independent AI. The latter approach is known as “LLM as a judge”, where LLM stands for Large Language Model, the kind of AI model we have grown used to with ChatGPT, Claude, and so on.
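For readers who like to see the mechanics, the “LLM as a judge” pattern can be sketched roughly as below. This is a generic illustration, not Raiana’s actual implementation: the prompt wording and the idea of a SUPPORTED/UNSUPPORTED verdict are assumptions, and the actual model call is left out as a stub.

```python
# Sketch of the "LLM as a judge" pattern: a second, independent model
# grades the first model's answer against the source material.
# The prompt text and verdict format here are illustrative assumptions.

def build_judge_prompt(question: str, answer: str, sources: str) -> str:
    """Assemble the grading prompt handed to the independent judge model."""
    return (
        "You are an impartial judge. Using ONLY the source text below, "
        "decide whether the answer is supported.\n\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Sources:\n{sources}\n\n"
        "Reply with exactly SUPPORTED or UNSUPPORTED."
    )

def parse_verdict(judge_reply: str) -> bool:
    """True if the judge found the answer supported by the sources."""
    return judge_reply.strip().upper().startswith("SUPPORTED")
```

In practice the prompt would be sent to a different model (or at least a separate session) than the one that produced the answer, so the judge has no stake in defending it.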
Hallucination, for those unfamiliar with the term, is the phenomenon where an AI ‘makes up’ an answer it does not actually know. It is a consequence of the way these models work: they are probability engines, so they produce the most likely answer. If the answer cannot be found in the context and the model is nevertheless instructed to produce one, it may invent something that sounds plausible. Tell it “The answer is 42. What was the question?” and it may confidently reply “What is 20+22?”.
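The “probability engine” point can be made concrete with a toy illustration: the model always emits the most likely continuation, with no built-in notion of whether that continuation is true. The candidate strings and probabilities below are made up purely for illustration.

```python
# Toy illustration of a probability engine. A real model scores tokens,
# not whole sentences, but the principle is the same: the most probable
# continuation wins, true or not. The numbers here are invented.

def most_likely(continuations: dict) -> str:
    """Return the continuation with the highest assigned probability."""
    return max(continuations, key=continuations.get)

candidate_questions = {
    "What is 20+22?": 0.55,            # plausible-sounding, so it wins
    "I don't know": 0.30,
    "No single question exists": 0.15,
}
```

Calling `most_likely(candidate_questions)` returns the plausible-sounding guess, which is exactly what a hallucination looks like from the outside.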
General-purpose AI (GPAI) models are most prone to this in their standard consumer versions, because they have been instructed to always be helpful. Moreover, while they were trained on vast amounts of data, they can lack context about more niche subjects, such as medical device/IVD regulation. That is when hallucinations tend to happen, and that is why GPAI is less suitable for regulatory work.
In Raiana, hallucinations are reduced in three ways and made detectable in one very important manner. First, the models receive a lot of context from up-to-date regulatory information: roughly 30 A4 pages of text per question. You notice this in the processing time: the model is consuming all of that context, which makes the probability very high that the answer is actually in there. Second, the models and the parameters they run with are chosen for a minimum of ‘creativity’. This is not something you can configure in a consumer GPAI. Finally, and very simply, the instructions tell the model to say “I don’t know” or to ask follow-up questions rather than ‘feel’ compelled to come up with a response.
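The three measures map neatly onto a typical chat-completion request, sketched below. The field names follow the common OpenAI-style request schema; the model name, the system-prompt wording, and the exact layout are assumptions for illustration, not Raiana’s configuration.

```python
# Sketch of the three hallucination-reducing measures as a generic
# chat-completion request. Field names follow the widely used
# OpenAI-style schema; model name and wording are placeholders.

def build_request(question: str, regulatory_context: str) -> dict:
    return {
        "model": "some-llm",    # placeholder; (2) model chosen for low 'creativity'
        "temperature": 0,       # (2) parameters set for minimal randomness
        "messages": [
            {
                "role": "system",
                "content": (
                    "Answer only from the supplied regulatory context. "  # (1)
                    "If the context does not contain the answer, say "
                    "'I don't know' or ask a follow-up question; "        # (3)
                    "never guess."
                ),
            },
            {
                "role": "user",
                # (1) the ~30 pages of regulatory text ride along here
                "content": f"{regulatory_context}\n\nQuestion: {question}",
            },
        ],
    }
```

Numbered comments mark which of the three measures each field implements: (1) supplying the regulatory context, (2) low-creativity model and parameters, (3) permission to say “I don’t know”.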
The short answer is: yes, Raiana, too, can hallucinate. All AI models do, by the very nature of how they work. But even when that happens, Raiana’s outputs come with verifiable references. The human in the loop can immediately, literally with one mouse click, check whether the cited regulatory article exists and whether it supports what Raiana stated. That is the detection strength: Raiana lets you verify.
And as a wise man once said: in God we trust, but everyone else, bring evidence!
