Beyond Tokens: Why Real Perception Is the Missing Key to AGI
New state-of-the-art models are being released every week. The evals keep improving. But ask yourself: What can a person do with AI today that they couldn’t do a year ago? Two? The list is slim.
Frontier AI labs are throwing staggering compute at the problem. GPT-4.5 reportedly required 20x the compute of GPT-4 and took two years to train. OpenAI claims the model is 10x smarter, but users aren't 10x more productive, they aren't making 10x wiser decisions, and they aren't launching 10x better companies.
We’ve hit a plateau.
As Sam Altman has said, there needs to be one more breakthrough before we can say we have AGI.
What is that breakthrough?
In medieval times, there was a method of learning called "scholasticism." It was a way of teaching and disputation that reconciled classical philosophy, above all Aristotle, with Christian doctrine, and it was firmly rooted in text.
The scholastic process:
- Formulate the Question: Pose a paradox or challenge.
- Survey the Authorities: Present key positions from respected thinkers.
- Present the Objections: Lay out the best counterarguments.
- Offer the Response: Defend your position with logic and synthesis.
- Refute the Objections: Rebut each objection directly.
Example:
- Is AI capable of creativity?
- Objection: It lacks consciousness, intention, originality.
- Yet: AI-generated art is sold, judged, and debated as creative.
- Response: Creativity is about novel, valuable output, not inner experience.
- Rebuttal: Humans also remix. Intent isn’t required for impact.
Sound familiar? Ask any LLM a question today, and its response will follow a similar shape. The model surveys a huge corpus of text, weighs context, and synthesizes a response. LLMs are industrial-scale scholastics, trained on centuries of argument, logic, and debate.
But like their medieval counterparts, LLMs operate within the walls of language. They generate insight, not understanding. They know everything that's been said, but they can't originate new ideas. They've read the world. They haven't seen it.
The final breakthrough will come when we can train a model that perceives nature in real time, at full throughput, and reasons about it. Imagine a model that watches the world like a child. Eyes at 60fps. Ears at 24kHz. A mind that doesn't just process text, but observes, remembers, and reasons across time. A machine that learns not from books, but from experience.
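To make the idea a little more concrete, here is a minimal toy sketch, not any lab's actual architecture, of what such a perception loop might look like: simulated 60fps video frames and 24kHz audio chunks are interleaved into one time-ordered event stream that a model consumes and remembers. Every class, rate, and name below is illustrative, and the "model" is a stub standing in for a real multimodal network.

```python
# Illustrative sketch only: interleave simulated 60 fps video frames and
# 24 kHz audio chunks into one time-ordered perception stream.
from dataclasses import dataclass, field
from heapq import merge

@dataclass(order=True)
class Event:
    timestamp: float                        # seconds since the stream started
    modality: str = field(compare=False)    # "vision" or "audio"
    payload: object = field(compare=False)  # placeholder for a frame or audio buffer

def vision_stream(fps: int = 60, seconds: float = 1.0):
    """Simulated camera: one Event per frame at `fps`."""
    for i in range(int(fps * seconds)):
        yield Event(i / fps, "vision", f"frame_{i}")

def audio_stream(sample_rate: int = 24_000, chunk: int = 480, seconds: float = 1.0):
    """Simulated microphone: Events carrying 480-sample buffers (20 ms at 24 kHz)."""
    for i in range(int(sample_rate * seconds) // chunk):
        yield Event(i * chunk / sample_rate, "audio", f"chunk_{i}")

class ToyPerceiver:
    """Stand-in for a perception model: keeps a short rolling memory of events."""
    def __init__(self, memory_size: int = 8):
        self.memory = []
        self.memory_size = memory_size

    def observe(self, event: Event):
        self.memory.append(event)
        self.memory = self.memory[-self.memory_size:]  # crude rolling memory

if __name__ == "__main__":
    model = ToyPerceiver()
    # Merge both streams into a single time-ordered sequence of observations.
    for event in merge(vision_stream(), audio_stream()):
        model.observe(event)
    print([(round(e.timestamp, 3), e.modality) for e in model.memory])
```

The point of the sketch is the ordering, not the stub: sights and sounds arrive as one continuous, timestamped stream the model experiences and remembers, which is exactly the kind of signal a text-only training corpus never provides.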
The final step to AGI won’t come from more tokens. It will come from perception. From models that see and hear like we do. When we give machines the power to observe the world, not just read about it, they won’t just sound smart. They’ll be alive.
The path to AGI won’t be paved with more tokens. It will begin the moment AI opens its eyes.
Written on Apr 14th, 2025