Posted September 2

A recent paper by six Apple scientists, “The Illusion of Thinking”, has sparked significant debate in the tech community. The work challenges the ability of current language models to perform complex reasoning, arguing that despite appearing to engage in sophisticated thought processes, these models collapse when faced with more complex tasks. Using well-known classical mathematical puzzles like the Tower of Hanoi, River Crossing, Checker Jumping and Blocks World, the researchers observed that model accuracy drops drastically as problem difficulty increases (the Tower of Hanoi alone requires 2^n − 1 moves for n disks, so difficulty grows exponentially), sometimes falling all the way to zero, at which point the model is of no practical use.

This phenomenon raises a fundamental question: what do we mean by intelligence or thinking in the context of AI? Traditionally, we associate intelligence with the ability to reason, understand and adapt to new situations. Current language models, however, for all their impressive ability to generate coherent text, don’t really understand what they produce. They work by statistically predicting the next word, with no grasp of the underlying meaning.

Nevertheless, the amount of information these models digest during training is remarkable: GPT-3 was trained on hundreds of billions of words drawn from books, articles and websites, along with Wikipedia. That is a volume of reading and memorization no human could remotely match; it would take many consecutive lifetimes just to skim through it all. And yet, if it were possible, a human with access to that wealth of knowledge who could also remember it and use it fluently would be considered an unquestionable authority.

Are language models at that level when compared to humans? In terms of information volume, undoubtedly. But when it comes to applying that knowledge with sense, judgment or intentionality, the answer is more nuanced. What models do isn’t thinking as such, but finding statistical patterns amid a sea of data.

Why do we struggle so much to decide whether an LLM is intelligent? Because the people we’ve historically labeled as intelligent, or to whom we grant important positions in society, stand out precisely for their capacity to memorize information. But… are they really intelligent? What makes a judge a better judge, for instance: the ability to memorize ever more rulings, or the capacity to reason about logic, proportionality and other considerations and apply them to a specific case?

The Apple study has focused attention on a key difference between the most popular models: while Anthropic’s Claude 3 Opus handled even the most complex tasks well, models like GPT-4 or Gemini showed much steeper declines. This suggests that LLM reasoning mechanisms aren’t all the same: some are designed to appear to reason, while others have become better at maintaining structural consistency across longer contexts or multi-step tasks.

However, even the most advanced and specialized models, like OpenAI’s deep research agent, still fail extensively. They don’t truly understand what they’re investigating, can’t discern which sources are more reliable or relevant, and fall short even of a research assistant who is both a novice and rather lazy. The difference from humans lies not only in the amount of available knowledge, but in the ability to contextualize it, evaluate it critically and apply common sense.
An agent capable of analyzing scientific papers doesn’t automatically become an expert: it lacks intentionality, lived experience and independent judgment. This disconnect between the appearance of intelligence and the absence of real understanding has led many experts to warn about the dangers of anthropomorphizing AI. In their book “The AI Con”, Emily M. Bender and Alex Hanna critique the hype around AI, arguing that many claims about its capabilities are exaggerated and can lead to misunderstandings about its true nature.

Despite these limitations, the industry continues to develop models that seek to emulate more human aspects of intelligence. OpenAI highlights GPT-4.5’s “emotional intelligence”, claiming it responds more naturally and empathetically and adapts better to users’ emotions. However, it’s crucial to remember that these responses are the result of learned patterns, not a genuine understanding of human emotions.

I have recently argued that judging generative AI by its current capabilities is a mistake. Technology is constantly evolving, and what today might seem like a limitation, such as the fact that these models are fundamentally built on language and its structure, may be overcome in the future. Even so, it’s essential to maintain a critical and realistic perspective about what these models can and cannot do.

Is AI progressing at warp speed? Undoubtedly. Will it transform how we live and work? Of course, thanks to its ability to process and organize information. But we must be cautious about attributing human capabilities like thinking, judgment or emotional understanding to it. Recognizing the current limitations of these models isn’t dismissing them; it’s trying to understand them better so we can use them responsibly.

This post was previously published on Enrique Dans’ blog.