Mechanistic interpretability and observability
We received a very detailed question during our launch webinar: do the models your applications use allow for mechanistic interpretability, and how do you offer observability (insight into what the model is doing) to your end users?
Transparency into how Raiana’s AI models operate is very important to us: it is what enables the human in the loop to verify the AI’s answers.
True mechanistic interpretability is not possible for the kind of powerful AI models that power Raiana, for both a technical and a commercial reason. This type of insight would involve showing which neurons fired in the neural network that makes up the model, or heatmaps of which neuron areas or layers were most activated; the AI equivalent of a functional MRI, if you will.
However, a modern GPT-5.2 or Gemini 3 Pro model has hundreds of billions of parameters, and therefore tens of millions of neurons. What these neurons, or the layers they sit in, actually do is not publicly documented. Even if it were, it would only be interpretable by an advanced data scientist with knowledge of the model architecture, and Raiana customers tend to have neither that expertise nor that time. The other problem is that the frontier AI labs do not make this information available, for competitive reasons.
So which instruments does Raiana offer for observability? First of all, there is the chain of thought: a preamble the model produces before it writes its actual output. You can follow it along on your screen while the model is thinking, and even review it afterwards. You can compare this chain of thought to ‘the model talking to itself’. It will say ‘hmmm, the user wants to know about PEP and PER in IVDR. That sounds like Annex XIII, but what about Article 56? Let me look into that…’ (yes, the model will actually say ‘hmmm’ to itself sometimes).
AI scientists discovered that this kind of reasoning improves results, and have focused their engineering on it ever since. When you follow along, you may catch the model misunderstanding your intent along the way, which helps you define a follow-up prompt (“I actually meant SSP”).
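To make the idea concrete, here is a minimal sketch of how a client application can show the chain of thought live while collecting the final answer separately. This is not Raiana’s actual implementation; the event structure and field names are hypothetical stand-ins for the typed chunks that streaming model APIs emit.

```python
# Hypothetical token stream: real model APIs emit similar typed events,
# but these exact field names are made up for illustration.
stream = [
    {"type": "reasoning", "text": "Hmmm, the user asks about PEP and PER in IVDR. "},
    {"type": "reasoning", "text": "That sounds like Annex XIII; let me check Article 56."},
    {"type": "answer", "text": "The PEP and PER are defined in Annex XIII "},
    {"type": "answer", "text": "and activated by Articles 10 and 56."},
]

def split_stream(events):
    """Collect the chain of thought and the final answer separately,
    so the reasoning can be displayed live and reviewed afterwards."""
    reasoning, answer = [], []
    for event in events:
        target = reasoning if event["type"] == "reasoning" else answer
        target.append(event["text"])
    return "".join(reasoning), "".join(answer)

chain_of_thought, final_answer = split_stream(stream)
```

Keeping the two channels apart is what lets a user scroll back through the reasoning after the answer has arrived, rather than having it scroll away.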
The other tool you have for understanding what the model did and how it came to its conclusion is references. Raiana models are instructed to provide references for everything they do. Not only can you verify that the model did the right thing (as we explained in our post about minimizing hallucinations), you can also see why it decided to include certain considerations in its answer. In the IVDR example above, the model will tell you that the PEP and the PER are defined in Annex XIII, but activated by Articles 10 and 56. By verifying those references, you can confirm that the model’s reasoning is correct.
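As a rough sketch of what reference-based verification can look like (a simple pattern match over the answer text; Raiana’s own mechanism may well differ), one can extract the cited provisions so a human reviewer knows exactly which sources to look up:

```python
import re

# Example answer text, modeled on the IVDR case above.
answer = (
    "The PEP and the PER are defined in Annex XIII of the IVDR, "
    "and activated by Article 10 and Article 56."
)

# Pull out every cited Annex (Roman numerals) or Article (number)
# so a reviewer can look each one up and confirm the reasoning.
citations = re.findall(r"(?:Annex\s+[IVXLC]+|Article\s+\d+)", answer)
```

Here `citations` comes back as `['Annex XIII', 'Article 10', 'Article 56']`: a checklist the human in the loop can walk through against the regulation itself.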
