Nobody knows how AI models work, admits Anthropic CEO – and that should worry us

It’s not every day that the CEO of a large AI company admits something this big – nobody really knows how AI works.
Dario Amodei, CEO of Anthropic (the company behind Claude AI), admitted as much in a recent blog post and vowed to build an “MRI on AI” to figure out what is happening inside these models.
Here’s how he explained the problem:
“If an ordinary software program does something—for example, a character in a video game says a line of dialogue, or my food delivery app allows me to tip my driver—it does those things because a human specifically programmed them in. Generative AI is not like that at all. When a generative AI system does something, like summarize a financial document, we have no idea, at a specific or precise level, why it makes the choices it does—why it chooses certain words over others, or why it occasionally makes a mistake despite usually being accurate,” he wrote.
This honest admission is applause-worthy. It puts the danger of “unknown unknowns” front and center at a time when the hype train is pushing everyone and their aunt to become an AI prompt engineer.
The technology is transformative, certainly, and it’s necessary to adapt and upskill if one wants to stay relevant in the modern economy. (After all, this post’s cover image is a gift from ChatGPT.) However, over-reliance on AI, whether as a work alter ego or a career crutch, could be detrimental.
Here’s why.
Anthropic’s Amodei wrote that the current inexplicability of AI models means they can’t be used in many high-stakes situations, because even a small mistake can have devastating consequences.
So, what happens when you have highly opaque but capable models that everyone’s betting on to be the next big thing? We’ve now entered a paradigm where AI seems to be advancing faster than we can make sense of it.
And it’s not just a problem for the AI companies, which will struggle to scale or refine these capabilities without a real understanding of their own creations.
It’s also an enormous worry for the rest of the world, as everything from our education, work, and finances to economic and foreign policy comes under the grip of inscrutable yet powerful models that could very well dictate our life outcomes.
“I worry that AI itself is advancing so quickly that we might not have even this much time… I am very concerned about deploying such systems without a better handle on interpretability. These systems will be absolutely central to the economy, technology, and national security, and will be capable of so much autonomy that I consider it basically unacceptable for humanity to be totally ignorant of how they work,” Amodei wrote in the post.
This is a rare moment of sombre self-reflection from a technology pioneer, and it deserves praise, a wide audience, and serious consideration. The post itself is worth reading in full: it lays out the complexity of the problem, what happens if it goes unsolved, and a roadmap for improving the interpretability of AI models over the long run.
The problem with AI so far has been one of known unknowns: users know it can make mistakes, and researchers acknowledge it may carry bias. But this adds a new dimension – a problem of unknown unknowns. We don’t know how a model actually works, and so we have no way to predict what it could do next, or exactly how to make its outcomes more predictable.
Imagine an autonomous weapons system firing a missile at a bird because it “thought” the bird was a military drone. The example is deliberately ridiculous, but the fear is real. Improving the system’s ability to distinguish birds from drones is one way to fix the problem; the trouble is that today we often can’t even say why the model mistook the bird for a drone in the first place.
We have reached a fork in the road. The digital overlords will choose a path for us. But it’s important to know where we are in the wild before our outdated compass fully gives up.
Link to the full post by the Anthropic CEO here: The Urgency of Interpretability
This article was originally published by Mayank Jain on LinkedIn.