So, What if Plato Was Right? (Part 1)

I’ve seen the Bluesky AI crew riffing to the effect of “wait, Plato was right?” on three recent papers:

Emergent misalignment
- https://www.emergent-misalignment.com/
Harnessing the universal geometry of embeddings
- https://arxiv.org/abs/2505.12540
- this cites a paper, The Platonic Representation Hypothesis, that I missed when it came out
- https://arxiv.org/abs/2405.07987

The way I would unpack this riff is roughly: certain emergent properties of LLM training and behavior seem similar to the philosophical claims Plato made, and the Theory of Forms in particular. Roughly, that theory holds that abstract concepts are real, and in some sense more real than the world we perceive and exist in; that philosophers, and possibly other humans, can access them as some kind of transcendental reality; and that by doing so they can gain an objective understanding of what is Good, in particular.

I read a lot of Plato when I was younger (I majored in Classics) so I find this fun, but also fruitful, to think through a bit more - because the question of “to what extent can an LLM understand reality, rather than process and reproduce language it has previously encountered” seems relevant to the moment we are in now, and to be connected to the epistemological questions in Plato’s work, roughly “to what extent can a human understand reality, rather than process sensory impressions and reproduce language they have previously encountered.”

So - I’m going to try to present my understanding of Plato, as I learned it from Herman Sinaiko and a few other professors in college, but that was a long time ago, and there are certainly other interpretations out there. These are complex topics, and way beyond the depth of a blog post, but I’ll do my best.

Dialogues and Dialectic

What I really enjoy about Plato is that he is more interested in asking good questions, reframing, and examining things critically, than he is in laying out expansive theories the way, say, Hegel does. There is no treatise in which he says, “my name is Plato, and here is my theory of forms.”

Instead, Plato wrote a series of dialogues in which a variety of figures argue various points of view about a great many topics, spanning political philosophy, epistemology, ethics, metaphysics, math, etc. So even when he presents an argument for a given theory, there is usually some degree of doubt, hedging, or counterargument, either immediately or in another dialogue.

This is very intentional, because Plato believed quite strongly that dialectic, or philosophical debate, is how one exercises philosophical skills and gains philosophical knowledge, rather than by reading and memorizing by rote. If there is one thing that is clear from Plato’s entire work that he actually, strongly, and consistently believed, this is it.

The Theory of Forms

Thus, the “Theory of Forms” is not delivered in terms of “here is the theory of forms.” Instead, in the Republic, Socrates (the most frequent speaker in the dialogues) at one point makes three different extensive analogies that convey it in different aspects:

The Allegory of the Sun compares our ability to know things to our faculty of vision. Although we have the capability to see things, we can’t see in the dark, and can only see if light is present, and the principal source and cause of natural light is the sun.
- Socrates claims that we likewise have an innate capability to comprehend reality, even though our senses convey only partial representations of things.
- And that similarly, the “light” of this faculty you could call “truth” or even “being” or “reality”, and that this light/truth/being/reality emanates from the concept of Goodness, or the Form of the Good, itself.
The Allegory of the Divided Line is very dense, but outlines a four-level hierarchy of different ways of perceiving reality, as well as a corresponding hierarchy of reality itself. He proposes four levels of “realness”, from low to high:
- Appearances: shadows and reflections are cast by physical objects, and imitate them, but are not the objects themselves. Our sensory perceptions of objects are appearances as well, as are everyday speech and discourse.
- Things: the actual objects that appearances represent, that we can only partially discern, in the case of a physical object, by closely examining it from multiple angles and perhaps with multiple senses.
- Empirical or Mathematical Knowledge: systematizing and formalizing appearances as observations, forming deductions and theories, and arriving at the kinds of truths that are accessible at this level.
- Philosophical understanding: comprehensive understanding, arrived at only with philosophical training, of the ideas and concepts underlying everything at the lower levels.
The Allegory of the Cave is the most famous, and applies the insights of the prior two allegories to our political reality, and the social role of philosophy and education.
- Imagine that you are a prisoner, chained to a bench, with your head fixed facing a wall. Behind you are people with a fire and various shadow-puppets, and all you can see are the shadows of those puppets cast upon the wall. You can’t even turn your head to see the prisoner next to you, but you can talk about the shadow puppets and how they move.
- According to Socrates, this is approximately the existence of a person without philosophical training, and the quality of their communication and understanding of the political problems that occupy a great deal of our social existence.
- Education in general, and philosophical education in particular, “breaks the chains” of the prisoner, allows them to get up and see the puppets, and the fire, leave the cave, and see the actual world and objects in it. A person who attains this is considered a philosopher, but this is almost more like “enlightenment”.
- The trouble is everyone else is still in the cave, and the philosopher has to go back, and then tries to explain to them that “no that isn’t a dog, that is just a puppet in the shape of a dog, and there is a dude behind you with a dog puppet making it appear to move” - and the people still chained to the bench think the philosopher is crazy.
- Not only that, the people chained to the bench would observe that his pursuit of philosophy, and getting up and leaving the cave, led to his madness, and further discourage others from doing so.

These summaries are already too long, but they are also too short, and these 20-something pages of the Republic are some of the richest words ever written. Please go read them!

Forms and Truth

Rather than elaborate on the theory further, I’ll observe that Plato wrote all this 2500 years ago, in a world without social media, cable news, or political propaganda as we would understand it, but that analogizing the cave to the doomscroll of social media, or a movie theater, is probably appropriate.

Despite this - Plato absolutely lived in a world where political power could obscure and distort truth. Much earlier in the Republic, Thrasymachus argues that there can be no ideal of Justice, and instead, Justice is just a fiction created by whoever happens to be in control of the state at the moment, to their own advantage.

In other words, the world Plato lived in was more like our postmodern “Total Information Collapse” infosphere than most of the intervening centuries. In that light, the Theory of Forms, and the argument that we do have the ability to perceive Truth and Good and objective reality, are a very sincere argument against that kind of “postmodern” nihilism.

Which is to say, I take the Theory of Forms to be a practical argument about epistemology - a claim that humans have the capability to gain a shared and reasonably accurate understanding of objective reality, independent of social and political constraints, by pursuing education, critical thinking, and debate.

The Forms and LLMs

So how do we bring this back to AI? I’m going to split this up into a few more blog posts just to keep this one from getting crazy long, but I think the key claims are:

The convergence of the internal representations of language models suggests a “universal latent structure of text representations” - a shared, high-dimensional space of concepts and meanings
- This can be framed as: are LLMs gaining a shared and intelligible understanding of the real world?
- If true, this would be a strong counterargument against “stochastic parrots”
- A lot of the complexity here is around text and representation, and Plato has another dialogue, the Phaedrus, that has a lot to say here.
- But we should ask ourselves how much the LLM training process can resemble the kind of philosophical education that Plato says would give a human access to non-superficial reality.
The emergence of similar values and preferences across models, and the apparent difficulty of producing misaligned models that are also useful, could suggest that alignment has similar convergent properties
- This does not necessarily “solve” alignment even if it is true in the strongest form
- This raises questions about how LLMs manifest “belief” and “values”
- I probably disagree with this one, but it is still worth exploring to work through some of these concepts
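
To make the “convergence” idea in the first claim a little more concrete, here is a minimal sketch of one way you might compare two models’ representations of the same concepts. It is a toy version of representational similarity analysis, which I am swapping in for illustration; it is not the method used in either paper above. The function names and the 2D vectors are all made up for the example. The idea: two models can use completely different coordinates, yet still agree on which concepts are close to which.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(embeddings):
    """Cosine similarity for every unordered pair of items, in a fixed order."""
    return [
        cosine(embeddings[i], embeddings[j])
        for i, j in combinations(range(len(embeddings)), 2)
    ]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def representational_alignment(emb_a, emb_b):
    """Hypothetical helper: how well do two models agree on which items are
    similar to which? Row i of emb_a and emb_b must describe the same concept."""
    return pearson(pairwise_similarities(emb_a), pairwise_similarities(emb_b))


def rotate(v, theta):
    """Rotate a 2D vector by theta radians (stands in for 'different axes')."""
    x, y = v
    return (
        x * math.cos(theta) - y * math.sin(theta),
        x * math.sin(theta) + y * math.cos(theta),
    )


# Invented toy data: four "concepts" as seen by three imaginary models.
model_a = [(1.0, 0.0), (0.9, 0.3), (0.0, 1.0), (-0.7, 0.6)]
# Same geometry as model_a, but expressed in rotated coordinates.
model_b = [rotate(v, 1.0) for v in model_a]
# Same vectors as model_a, but assigned to the wrong concepts.
model_c = [model_a[0], model_a[3], model_a[1], model_a[2]]

print("a vs b:", representational_alignment(model_a, model_b))
print("a vs c:", representational_alignment(model_a, model_c))
```

The raw numbers in model_a and model_b look nothing alike, but their similarity structure matches, so the alignment score is high. model_c reuses the same vectors but scrambles which concept gets which, so the score drops. Real comparisons work on thousands of dimensions and far more items, and the choice of metric matters a lot, so treat this as intuition only.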

Anyhow, I’ll write these two posts soon, but in the meantime, ping me on Bluesky if you have feedback/comments - this is not the kind of post I usually write, so interested to hear what folks have to say!

blog.whal.ing
