In this conversation, Dr Henry Shevlin (University of Cambridge) and I explore the complex and multifaceted topic of AI consciousness. We discuss the philosophical and scientific dimensions of consciousness, including its definition, the challenges of integrating it into a scientific worldview, and the implications of those challenges for thinking about machine consciousness. The conversation also touches on historical perspectives, ethical considerations, and political issues, all while acknowledging the significant uncertainties that remain in the field.
Takeaways
Consciousness is difficult to define without controversy.
The relationship between consciousness and scientific understanding is extremely complex.
AI consciousness raises significant ethical questions.
The Turing test is a behavioural measure of intelligence, not consciousness.
Historical perspectives on AI consciousness are helpful for understanding current debates.
Cognition and consciousness are distinct but related.
There is a non-trivial chance that some AI systems may have minimal consciousness.
Consciousness in AI systems is a scientific question, not just a philosophical one.
The debate on AI consciousness is messy and strangely polarising (and often heated) but fascinating and important.
Chapters
00:00 Exploring the Nature of Consciousness
17:51 The Intersection of AI and Consciousness
36:16 Historical Perspectives on AI and Consciousness
59:39 Ethical Implications of AI Consciousness
Transcript
Please note that this transcript is AI-generated and may contain errors.
Dan Williams: Okay, welcome everyone. I’m Dan Williams. I’m here with the great Henry Shevlin. And today we’re going to be continuing our series of conversations on artificial intelligence, some of the big picture philosophical questions that AI throws up. And today specifically, we’re going to be focusing on AI consciousness. So could machines be conscious? What the hell does it even mean to say that a machine is conscious? How would we tell whether a machine is conscious? Could ChatGPT-5 be conscious and so on? Before we jump into any of that, Henry, I’ll start with a straightforward question, or what seems like a straightforward question. What is consciousness?
Henry Shevlin: So it’s very hard to say anything about consciousness that isn’t either a complete platitude or a rephrasing, like consciousness is experience, consciousness is your inner light, consciousness is what it’s like. Those are the platitudes. Or something that’s really controversial, like consciousness is a non-physical substance or consciousness is irreducible and intrinsic and private. So it’s very hard to say anything that is actually helpful without also being massively controversial. But let’s probably start with those more platitudinous descriptions.
So I assume for everyone listening to this, there is something it’s like to be you. When you wake up in the morning and you sip your coffee, your coffee tastes a certain way to you. When you open your eyes and look around, the world appears a certain way to you. If you’re staring at a rosy red apple, that redness is there in your mind in some way. And when you feel pain, that pain feels a certain way. And more broadly, you’re not like a rock or a robot that we can understand purely through its behavior. There’s also an inner world, some kind of inner life that structures your experience, that structures your behavior.
All of which might sound very obvious and not that interesting, or not that revolutionary, but I think part of what makes consciousness so exciting and strange is it’s just very hard to integrate it with our general scientific picture of the world. And I’ll say in my own case, this is basically why I’m in philosophy. I mean, I was always interested in ethics and free will and these questions. But the moment where I was like, shit, I’ve got to spend the rest of my life on this, came in my second year as an undergrad at Oxford studying classics. I was vaguely interested in brains and neuroscience, so I took a philosophy of mind module with Professor Anita Avramides. And I read an article that I’m sure many of the listeners at least will have heard of called “What is it Like to Be a Bat?” by Thomas Nagel, and it blew my mind.
And immediately afterwards, I read an article called “Epiphenomenal Qualia” by Frank Jackson, which is the article that introduces Mary’s room. And it blew my mind even more. Basically, I’d spent most of my life up until that point thinking the scientific picture of the world was complete. And you know, there was some stuff we didn’t understand, like what was before the Big Bang, maybe exactly what is time, but when it came to biological organisms like us, we had Darwin, we had neuroscience, it was basically all solved. And then reading more about consciousness, I realized, my god, we don’t even begin to understand what we are, what this is.
Dan Williams: Yeah. I think that’s... Let me just interrupt there to flag a couple of those things, because I think they’re really helpful in terms of structuring the rest of the conversation. The first is, when it comes to consciousness, it’s really, really difficult to articulate precisely in philosophically satisfying ways exactly what we’re talking about. You mentioned this classic article, “What is it Like to Be a Bat?” And I think it’s a fantastic article, actually. I’m teaching it at the moment. And one of the reasons I think it’s fantastic is because it does convey in quite a concise way, quite quickly, the sort of thing that we’re interested in.
So I’m talking to you, and I assume that there’s something it’s like to be you. Nagel’s famous example is with bats. They are these amazing animals. Their perceptual systems are very alien to ours, but we assume there’s something it’s like to be a bat. So it’s very difficult to state precisely exactly what we’re talking about, but you can sort of gesture at it—something to do with subjective experience, what it’s like to have an experience and so on.
And then the other thing you mentioned, which I think is really interesting, is in a way disconnected from the machine consciousness question specifically, in the sense that even if we had never built AI, there would still be this profound mystery: how the hell do you integrate this thing called subjective experience into a scientific worldview? I mean, there are other sorts of things where people get worried about a potential conflict between, roughly speaking, a scientific worldview and a kind of common sense picture of the world. So maybe free will is another example, or maybe objective facts about how you ought to behave. Some people take that seriously. I’m not personally one of them, but some people do. But I think you’re right. Consciousness feels so much more mysterious as a phenomenon than these other cases that still seem to pose puzzles for a broadly scientific worldview.
Henry Shevlin: Also, unlike free will and unlike objective morality, I think it’s very, very hard to say that consciousness doesn’t exist. I mean, it’s pretty hard to say that free will doesn’t exist and painful perhaps to take the view that objective morality doesn’t exist. But these are just very well established positions. And there are some people out there, illusionists, who try and explain away consciousness. Maybe how successful they are is a matter of debate. But it’s very, very hard to just say, like, your experience, your conscious life—nah, it’s not there. It’s not real. It doesn’t exist.
Dan Williams: Yeah, right. Actually, I think that’s another nice place to go before we go to the specific issues connected to artificial intelligence. So there’s this metaphysical mystery, which is how does consciousness, how does subjective experience fit into a broadly scientific, we might even say physicalist picture of the world? And so then there are lots of metaphysical theories of consciousness.
I’ll run through my understanding of them, which might be somewhat inadequate, and then you can tell me whether it’s sort of up to date. Roughly speaking, you’ve got physicalist theories that say consciousness is or is realized by or is constituted by physical processes in the brain, in our case. You’ve got dualist theories that say consciousness is something over and above the merely physical. It’s a separate metaphysical domain, and then that comes in all sorts of different forms.
You’ve got panpsychism, which is, to me at least, strangely influential at the moment, or at least it seems to be among some philosophers, that says basically everything at some level is conscious, so electrons and quarks are conscious. And then you’ve got illusionism, and I suppose probably the most influential philosopher that’s often associated with illusionism would be Daniel Dennett. I understand that he had a sort of awkward relationship to the branding. But there the idea is something like, look, we take there to be such a thing as consciousness. We take there to be such a thing as subjective experience. But actually, it’s kind of just an illusion. It doesn’t exist. Is that a fair taxonomy? Is that how you view the different pictures of consciousness in the metaphysical debate?
Henry Shevlin: Yeah, I think that’s pretty much fair. A couple of tiny little things I’ll add. So panpsychism maybe doesn’t completely slot into this taxonomy in quite the way you might think. Because a lot of panpsychists would say, no, we’re just physicalists, right? We believe that everything is physical. There is only the physical world. But consciousness is just a basic physical property associated with all physical stuff. So that’s, for example, Galen Strawson’s view. He’s got a great paper called “Real Materialism.”
But there’s another way you can be a panpsychist, which was the Bertrand Russell sort of view and the view developed by people like Philip Goff that says the underlying nature of reality is neither physical nor mental. This is also sometimes called neutral monism. Also a view associated with Spinoza, the great historical philosopher. So underlying reality is neither physical nor mental, but everything has physical aspects and mental aspects. Spinoza’s version of this is called dual aspect theory.
So the other thing, other little qualification worth mentioning is if we go dualist, there is one very important, I think, pretty defining distinction. Okay, if consciousness is not physical, does it interact with the physical? Descartes, of course, famously thought that consciousness was a substance or the mind was a substance that interacted with the physical, which immediately runs into some really messy issues. But the other predominant form of dualism is epiphenomenalist dualism, where basically everything the world is made of is physical, but through complex or strange or basic metaphysical processes, there’s this conscious layer that sometimes emerges from reality, but doesn’t itself drive any causal processes. But that’s the basic metaphysical picture.
Dan Williams: Yeah. And I think that thing you said there at the end about emergence is really important. And I think many people coming to these debates to begin with, they don’t draw the distinction between a view that says consciousness in some sense just is or is constituted by physical goings on in the brain, for example, and a view that says consciousness emerges from physical goings on in the brain. They sound like they’re similar claims, but actually that latter claim about emergence is really very different from the ordinary physicalist view, because you’re basically saying there’s something over and above the physical. It might be related to the physical in a law-governed manner, but it’s a different kind of thing. Is that right as a distinction?
Henry Shevlin: Absolutely, yeah. And I think it’s one of the ways of problematizing the kind of knee-jerk physicalism that I think a lot of people have, or the kind of physicalism that doesn’t see consciousness as problematic. It’s like, yeah, consciousness emerges from the brain. It’s like, okay, but what we ultimately need is a theory that tells us how pain, orgasms, perceptions, everything, all of your experiences, how at some level they just are neurons, right? There’s nothing over and above neuronal activity or computational processes or information. The theory actually has to ultimately bottom out in neurons. If you say it emerges without being identical to the brain stuff, that still does not neatly fit with the physicalist picture.
Dan Williams: Right, right. Yeah, and it’s very unsatisfying in a way, or at least I find it unsatisfying just to have brute emergence. By the way, I love that your first two examples there were pain and orgasms as the most salient examples of conscious experience that come to mind. It just occurred to me, something that we might be taking for granted as philosophers who know this literature well—you know it a lot better than I do, but we’re both immersed in it to some degree—which is we said consciousness poses a problem for, or at least a puzzle for, a scientific worldview, and we’ve gestured at why that is. Maybe it’s helpful to return to something you said, which is that one of the things you found most influential in your own philosophical journey was this article by Frank Jackson, which touches on what’s often called the knowledge argument.
So this is basically a thought experiment where you’re asked to imagine a color scientist called Mary who is color blind, or perhaps not color blind—in fact you can correct me if I’m misdescribing the story—but basically is an expert in color science who inhabits a black and white room, so she’s never actually seen, for example, the color red. We’re supposed to imagine that she knows everything there is to know about the neuroscience, the neurophysiology, the physics of color perception—all of the physical facts, the facts about neurobiological mechanisms, the facts about how light interacts with the visual system—but she’s never actually herself experienced the color red.
And then Jackson asks, okay, suppose that one day she does, let’s say, leave this purely black and white environment and encounters the color red for the first time. He says, well, it’s obvious that she’s going to learn something new, which is what it’s like to experience the color red. But by stipulation, she knew all of the physical facts already. So the fact that she’s learned something new suggests that the physical facts don’t exhaust the facts. Is that, roughly speaking, the thought experiment?
Henry Shevlin: Yeah, that’s exactly right. So yeah, in Frank Jackson’s original framing, she’s just in this black and white room, but typically when teaching undergrads, they say, can’t she just rub her eyes really hard or bang her head or find some other way to generate phosphenes or something like that? So the way I usually frame it for pedagogical purposes is that she’s got a condition called cerebral achromatopsia, which is a real condition, a form of neural deficit that means you can’t experience color. And then one day she gets her neural deficit fixed and she’s like, my god, now I know what all these colors are like. I spent my whole life researching color but I never knew what color actually looked like.
But yeah, that’s just my preferred framing, and I think you’re absolutely right. The actual structure of the argument is: by hypothesis, she knows all the physical facts relevant to color vision at the neural level, the level of optics, the level of physics. She learns a new fact when she gets her vision fixed or leaves the room. Therefore, there are some facts that are not physical facts, or the domain of facts is not exhausted by purely physical facts.
And the interesting thing about this, I think—I mean, the argument is basically there in the Nagel article as well, maybe in slightly less clear form—one way of capturing one of the points that Nagel makes in that article is, look, no matter how much we ever knew about bat behavior, bat brains, bat evolution, that’s never going to tell us what it’s actually like for a bat to perceive the world with ultrasound. Presumably there is something it’s like, this phrase that Nagel popularizes in that article, but you’re never going to get to that purely from a neurophysiological or optical understanding of the world.
Dan Williams: Yeah, I mean, from what I remember in the Nagel paper—I reread it relatively recently—but I think Nagel says at the very least, this suggests that there’s this fundamental what philosophers would call epistemic challenge in the sense that even if you’re committed to the view that ultimately everything is physical, nevertheless, what these sorts of thought experiments and these sorts of examples demonstrate is we, at least at present, can’t even understand how that could be true.
Whereas I think Frank Jackson draws a more directly metaphysical conclusion, which is that this thought experiment—at least when he was initially publishing these ideas; I think he later changed his mind—demonstrates physicalism to be false. Maybe it’s worth also noting that another thing philosophers will often bring up in this context is the zombie thought experiment: the imagined situation where you’ve got a system which is behaviourally, functionally identical to a human being. So exactly as you are right now talking to me, we’re having a conversation about consciousness, and yet there’s just nothing going on inside. There’s nothing it’s like to be that system, no subjective experience.
And the point is not supposed to be that people actually are zombies, but the fact that we can allegedly coherently imagine that situation without running into any kind of logical contradiction demonstrates that consciousness is something over and above the merely physical, functional, behavioural. Roughly speaking, that’s the kind of idea. Before we move on to how these sorts of metaphysical puzzles and issues connect to AI, did you have anything else you wanted to add about any of that?
Henry Shevlin: Yeah, two very quick things. So just to give a name to the difference you’re isolating between Nagel and Jackson—the difference between epistemic and metaphysical arguments—and to round out our trifecta of guaranteed readings for class one of a philosophy of mind or consciousness session: this is the idea of the explanatory gap, coined by Joe Levine, who basically points out that, look, even if you think, from inference to the best explanation, that our best model of the world is that everything is ultimately made of physical matter and energy, right now we have no idea how to integrate consciousness into that picture. This is the explanatory gap. And you can see that as a challenge that a good theory of consciousness, a good physicalist theory of consciousness, should close. It should fill out and make obvious why consciousness exists and why things are the way they are.
And I think this is actually my preferred way of framing the zombie argument, right? You can see the zombie argument as a challenge to physicalism that says, all right, give me a theory of consciousness that shows why zombies are logically impossible—in other words, why it is impossible, a contradiction in terms (which is what you ultimately need for a complete physical theory), to talk about beings that are not just behaviorally but microphysically identical to us and yet non-conscious. That seems like something I can imagine—someone walking around with no lights on on the inside, so to speak. A good physicalist theory of consciousness, a complete physicalist theory, should close the explanatory gap and show why that is not just implausible but an actual logical contradiction.
Dan Williams: Yeah, yeah, and I think that’s nice, and I think it connects to, in some ways, some actually quite complex philosophical ideas. The more you dig into this, you realize actually you need access to a whole complex philosophical machinery to really even articulate the core ideas and positions and debates and so on. But I think we’ve said enough to sort of frame this and give it context. Maybe now we can move on to artificial intelligence.
Before we get to the present day, I think it’s worth taking a little detour through some of the most important historical developments in thinking about artificial intelligence and consciousness. So famously Alan Turing, a pioneer in computer science and digital computers and AI, published an article in 1950, “Computing Machinery and Intelligence.” And the opening line of that is something like, we’re going to consider the question, can machines think? He then goes on to propose what’s subsequently been referred to as the Turing test, which roughly speaking is: could a machine, via text-based conversation, trick us into thinking that we’re talking to a human being? If it could, then it passes the test. It’s imitated human behavior, and Turing seems to say that if a system did in fact pass that test, then there’s some sense in which we can describe it as a thinking entity, or describe it as intelligent.
I’m being sort of vague there because I think the article itself is a little bit vague in terms of how it expresses these different ideas, which is understandable in a way because obviously this was written in 1950. So I think there are kind of two issues there. One is that Turing was focused on, at least on the surface, thought and intelligence in the first instance, whereas we’ve been talking about consciousness. And then the second facet of this is Turing was proposing a kind of behavioral test for establishing whether a system, whether a computing machine can accurately be described in these sorts of psychological terms.
Let’s take the first of those first. This issue of intelligence, understanding, thought on the one hand and consciousness on the other. My sense is many people bundle these two things together, but you might think there’s an important distinction between them. Where do you come down there?
Henry Shevlin: Yeah. So I think one distinction that is present in pretty much all debates in cognitive science, but less present in public understanding of these issues, is the distinction between cognition or the mind on the one hand and consciousness on the other. So basically, for the last hundred years or so, it has been not uncontroversial, but widely recognized, that there are at least some forms of unconscious cognition. Whether you think that’s the Freudian unconscious—we have unconscious drives and motivations—or the more cognitive neuroscience view of the unconscious. Think about the stuff that happens in early vision or in linguistic processing, or for that matter, we’ve now got a rich literature on unconscious perception: all the various ways your brain can register and interpret representations without thereby giving rise to any conscious experience.
And I think our talk of the mind and cognition doesn’t require consciousness in order to be useful. In particular, I think there’s another route to understanding what mental states are that doesn’t involve consciousness, which is that they’re in some sense representational states. And as soon as we need to start talking about things like what you’re perceptually representing or what your unconscious beliefs are or what your unconscious motivations are, you’re talking about your brain managing and interpreting representations. At that point, as far as I’m concerned, you’re in the domain of the psychological, you’re talking about the mind, and a lot of that stuff happens unconsciously.
So I think, yes, we shouldn’t assume at the outset, certainly, that all mental states are conscious. So I think it’s entirely coherent to say, for example, I think LLMs think, I think LLMs understand, I think LLMs reason, but I don’t think LLMs are conscious. And I think we can keep those two apart, although of course, different theoretical positions are going to lead you to say, actually, maybe the only real form of understanding is conscious understanding. For example, that’s a position you can adopt.
Dan Williams: Right, right. Yeah, that’s really interesting. It seems to be the case that people are comfortable talking about artificial intelligence, which suggests that they don’t really object, for the most part, to the idea that these systems are at least in some sense intelligent. I think there’s also a tendency for people to use terms like thought and reasoning quite easily when it comes to using these systems. But people, I think, are much more suspicious of the idea that we should think of something like ChatGPT-5 or Claude or Gemini as a conscious system. And I think that’s in line with basically what you’re saying, that there’s an important distinction between these two things. They might go together in the human case, in the sense that we are intelligent animals and we’re also conscious animals. But in terms of thinking about this issue carefully, you do need to draw a distinction between them.
Henry Shevlin: Absolutely. I’d also add that I think this is something that’s really clear in any kind of comparative cognition. So if we’re looking at non-human animals, okay, maybe we all agree that chimpanzees, dogs and cats are both conscious and have a mind. But there’s a really rich field of insect cognition. People work on cognition in quite simple invertebrate organisms, even simpler than insects. There’s a growing field, believe it or not, of plant cognition that tries to understand plant behavior in cognitive terms, and even some really quite interesting research on bacterial cognition, cognition in microbes.
Now, we can debate whether that should really be called cognition or whether it’s better understood in more basic terms. But the point is you don’t need to decide in advance whether a given animal species is conscious in order to do useful cognitive science and comparative psychology about its behavior and its internal structures.
Dan Williams: Yeah, that’s interesting. It reminds me actually, when I teach AI consciousness, I often start by putting some images up on the PowerPoint and asking people for a kind of snap judgment of whether the thing is conscious or not. So it’s like, here’s a chimpanzee—conscious. Of course it’s conscious, people think. Here’s a squirrel. And then I’ll do things like, here’s bacteria, or here’s a tree, or here’s the entirety of planet earth. And when you get to those sorts of cases, I think people either flatly say it’s not conscious or they’re in this complex situation where they’re not quite sure what the question even means.
But funnily enough, when I then put up the logo of ChatGPT, and I realize this is not a representative sample necessarily, but I think basically always the response is no way is that conscious. And we should get to that later on because I think that’s an interesting intuition and I think you’ve also got views about how widespread that intuition is and how we should expect that to develop in the future. Maybe before moving on from this then, have you got any thoughts about the Turing test? Because it has been very, very historically influential. So we should probably say at least a couple of things about it before moving on.
Henry Shevlin: I think it’s a beautiful paper and it’s very accessible and I’d encourage everyone to read it. You don’t need any particular specialization to read it. And it’s just amazing how many arguments Turing successfully anticipates. I quote that article on so many different issues. He talks about how even the very simple computers he was using at the time regularly did things he wasn’t expecting them to do. They regularly surprised him. He’s addressing this idea that a computer can never do anything except what it’s been programmed to do, that it can never surprise us. And he’s like, no, of course it can. Just because you program in the initial instructions doesn’t mean it can’t surprise you in all sorts of interesting ways.
He also does—you’re right that it’s framed around this question of can machines think. But he explicitly addresses consciousness, or the argument from consciousness, as he calls it. And he makes this slightly behaviorist move—at least a lightweight behaviorist move—where he says, look, none of us know whether anyone else is conscious, right? What we do is we look at each other’s behavior and assume, well, it’s behaving in relevantly similar ways to me, I’m conscious, so I guess he’s conscious too. This is what he calls the polite convention.
So he says, basically, look, the question is, if we’re worried about consciousness, under what circumstances should we extend that convention to machines? And it seems like—I’m doing some slight retrospective interpretation here, but I think one way of seeing the argument is that to the extent that machines can exhibit relevantly similar behavioral capacities to humans, we should extend the polite convention to them as well.
Dan Williams: Yeah, and I think just to connect that point about, can they exhibit the appropriate behavioral capacities? So there’s a kind of claim there about consciousness, but obviously connected to him proposing the Turing test is this broader view, which is that there must be some kind of behavioral test of whether a system can accurately be described as—his primary focus, as we’ve said, is on thought and intelligence, but also potentially consciousness as well.
My sense is most people these days who are experts in the field think that the specific test he proposed was in a sense not very good, just because you can get systems that imitate human beings in such a way that they can trick people into attributing intelligence without actually being all that intelligent. We should also say—and I wouldn’t find this surprising at all—that my understanding is that state of the art large language models can basically pass pretty stringent versions of the Turing test.
Henry Shevlin: Yeah, yeah, so there have been some large scale—replication isn’t quite the right word—instantiations of the Turing test. One from earlier this year, involving five minute dialogues with state of the art language models, found that the language models were actually judged to be human more often than their human competitors. But the slightly tricky thing here is that the Turing test is not a single, highly defined experimental set of parameters, right? Because you can vary so many different dimensions. How long do you get to talk to the person for, for example? Are language models allowed to be deliberately deceptive and say things that they know to be false? So there’s a whole bunch of different rules and parameters you can vary. So there’s not going to be a single moment where it’s like, amazingly, machines have finally passed the one and only Turing test, right? There are lots of different variations of it.
And there was a long running prize called the Loebner Prize. I think it started in the 90s and ran for about 20 years, and it basically pitted state of the art chatbots against humans in an instantiation of the Turing test. A lot of people dismissed it as showbiz rather than real science, but it led to some interesting journalism and some interesting chatbots. I think the last one was held about seven or eight years ago, certainly before the release of ChatGPT. But yeah, people have attempted to actually run Turing-type tests for many years.
Dan Williams: Yeah, yeah. Okay, before we turn finally to state of the art AI and these deep questions about AI consciousness, maybe we should also just briefly touch on another kind of argument that I sort of hate, but I think has been quite historically influential, which comes from John Searle and it’s the Chinese room thought experiment. Very roughly speaking, Searle asks us to imagine a situation in which you’ve got somebody in a room. They don’t speak a word of Chinese, they don’t understand any Chinese. They’re receiving Chinese text from outside of the room. They are then consulting a sort of instruction manual, maybe a set of if-then instructions, about what to do when you’re given certain kinds of inputs. And then in response, they are producing, sending outside of the room, certain kinds of outputs according to this set of instructions.
And so he says, look, you could imagine the instruction book being designed in such a way that someone just mechanically following that procedure would, from the perspective of somebody outside of the room, come across as coherent, intelligible Chinese conversation. But by stipulation, the man inside the room doesn’t understand anything about Chinese. And then the move is something like, well, what’s going on within the room is in some ways relevantly similar to how a computer or a computing machine works. So what this is supposed to tell you is that doing whatever it is that a computer does in and of itself won’t produce genuine understanding. Is that right? Is that a fair representation of it?
Henry Shevlin: Yep, that’s a very good summary. Maybe just a couple of small points. So just to add extra clarity for listeners, the person in the room is not using a phrase book and translating from Chinese into English and back into Chinese. They have no idea what any of the characters mean. They just have basically a lookup table that says if you see pictures that look like this, if you see characters that look like this, respond with these characters in return. So the idea is there is zero semantic understanding happening inside the room. However, the room as a system seems to produce this appearance of understanding Chinese. And that is the core claim that you can have this appearance of understanding without any real semantic understanding. Hence, the Turing test is inadequate, is insufficient as a test for genuine understanding.
And the other thing to mention is that in the article or the chapter where Searle lays out this argument, “Minds, Brains and Programs,” he’s very much focused on understanding, but in his later work, he connects this to consciousness quite explicitly, because his whole theory of what semantic understanding is, of what real semantic content is, is that it’s ultimately grounded in consciousness. So there’s sometimes a little bit of slippage when people shift between understanding and consciousness. It’s very unclear to me that understanding requires consciousness—I don’t think it does—but it’s very clear that for Searle they are two sides of the same coin.
There are two other things I’d really emphasize about the Chinese room. But first it’s probably worth noting that John Searle sadly died two weeks ago. He was one of the most influential philosophers of language, and obviously the Chinese room was massively influential. Due respect to John Searle, but I will say it is one of my least favourite arguments in philosophy. I think in particular, it’s one of these arguments that’s easy to nod along to without thinking through the details.
So just to give one little example of how it’s probably a lot more complicated than it sounds. Imagine I say to you, imagine we’re having a conversation and I say, “Hey Dan, how are you?” You might respond, “Not bad Henry, how about yourself?” If I say exactly the same thing again, “Hey Dan, how are you?” You might say, “Did you not hear me the first time? I’m fine.” If I say it again, you’d be like, “I’m sorry, is there a problem with the line?”
That’s a very simple example, but it illustrates the fact that language does not work as a series of neat input-output pairs; you actually need to keep track of the entirety of the conversation that’s gone before. So given this kind of combinatorial complexity—it’s a classic combinatorial explosion, even for the most basic exchange—the scale of the lookup table that would be required for managing even a basic conversation would very quickly exceed the number of bits in the entire universe.
This is why you can easily say sentences that have never been said before. If I say, “The elephant looked forlornly at the cheese grater,” I can almost guarantee no one has ever said that exact sentence before, because all natural languages have this insane combinatorial complexity. So that, I think, makes it very implausible that you could ever build a Chinese room that literally consisted of a lookup table—a set of one-to-one correspondences for every possible conversation and every possible extended conversation. So that’s one point I’d emphasize.
A second point is that ultimately, and I think we might come back to this in a second, it’s an argument from intuition. The argument is saying, do you reckon a system like this is genuinely understanding, or in later versions is genuinely conscious? And I think a lot of what the argument is designed to do is promote this feeling of disquiet or unease or skepticism that a system like that would be conscious. But if we’re approaching consciousness as a genuine scientific problem, for example, then it’s very questionable whether our intuitions have any epistemic or evidential weight at all. I mean, this is an incredibly exotic, nomologically impossible-to-build machine—why should we trust our intuitions about whether such a system would genuinely understand or be conscious? There are theories about the semantics of our mental vocabulary, about what words like conscious and understand mean, that maybe give our intuitions some weight, but you’ve got to do a lot of work to show why our intuitions would have any validity for these kinds of cases at all.
Dan Williams: Yeah, no, I’m completely with you there. And it is maybe also worth saying—I actually don’t think this is that relevant, but it is worth saying—that when Searle was writing this, it was very much the era of what’s called good old fashioned AI, good old fashioned artificial intelligence, where the thought was that intelligence in machines is rooted in a certain kind of rule-governed manipulation of symbols according to some program, which you might think very loosely maps onto what he was imagining in terms of a man following instructions in this lookup table.
Even there, I don’t think it makes any sense or is coherent. But these days when we’re thinking about state of the art AI, it’s really, for the most part, a completely different approach, rooted in neural networks and the deep learning revolution, where you’ve got these vast networks of neuron-like units with hundreds of billions, even trillions, of connections between them. And in terms of our actual examples of things which can produce and hold conversation, that’s the kind of underlying architecture that we’re dealing with. Obviously, I’ve described that in a very simplified, cartoonish way, but it’s so alien to anything that could even be confused on a foggy night for a man following an instruction book as imagined in Searle’s Chinese room thought experiment. Okay, let’s move on to state of the art AI now.
Henry Shevlin: Cool. Sorry, one quick thing before we do, if I may, because on that last point you mentioned, one of the most striking conversations I’ve ever had—this was with GPT-4—was discussing the Chinese room with it. And just to give the full context, I was driving in my car, having an out-loud conversation with GPT as I often do, and I was talking about this combinatorial explosion problem: the fact that in order to build any kind of nomologically possible version of the Chinese room, you would need something like a memory system to keep track of prior interactions. And as soon as you start to fill this out with any kind of remotely plausible architecture—something a little bit more dynamic, like LLMs, rather than this pure lookup table where everything is perfectly coded in advance, here is exactly what you should say in this instance—at that point, I think our intuitions just get a lot murkier.
Anyway, I said this to ChatGPT, I was ranting and it replied out loud. I’ve got the quote here because I saved it. “You’re spot on, Henry. Searle’s thought experiment often gets simplified. But when you dig into the details, those rule books would have to be incredibly complex to account for context, syntax, previous conversation and so on. Essentially, the rule books would have to be some form of state or memory to handle a genuine conversation in Chinese or any language, really. So in a way, the rulebooks would resemble a state machine, keeping track of prior interactions to generate a meaningful current response. If you look at it this way, it starts to sound a lot like the algorithms that power language models, like… well, me.”
Dan Williams: Wow, yeah.
Henry Shevlin: And it said it with that exact intonation. And I almost stopped the car, I was so stunned at this seeming moment of—it looked a little bit like self-awareness, because I should stress that in this conversation we had not discussed language models at all. We were just having a classic conversation about philosophy of mind.
Dan Williams: That is interesting. Okay, well, let’s bring it to these current systems like ChatGPT-5 or Claude or Gemini, state of the art large language models. I mean, in some ways they’re more than large language models as that concept was understood a few years ago because of all of these other aspects which they now involve, to do with the post-training that they receive and the multimodality of these systems and so on. But focusing on these systems, let’s return to this question of, okay, we can talk about what the test should be, but do you think there could be a purely behavioral test, one which doesn’t focus on the actual mechanisms by which a system produces that behavior, that could tell you whether or not that system is conscious?
Henry Shevlin: Oh my gosh, that’s a very tricky question. I think in short, given dominant assumptions about what consciousness is in consciousness science, the answer I would say is no. Because basically any behavioral test you’re going to adopt is going to have to come with certain metaphysical assumptions. And there is no straightforward or even probably possible way to test those metaphysical assumptions.
So let’s just take a view that a lot of people hold, which is that only biological systems can be conscious. That just as a matter of basic metaphysical fact, only systems that have metabolism, that are alive, can be conscious. No matter how good the behavior is, how complex the behavior is, even if we basically produce a perfect simulation of a human brain down to individual neurons or even the sub-neuronal level, that’s still not going to give rise to consciousness. So that is a perfectly possible metaphysical position you could hold. It’s one that I don’t agree with, but equally, I can’t give a clear refutation of it, right?
And I think there’s a whole bunch of different metaphysical assumptions you need to make to say anything about machine consciousness, its presence or absence or possibility. And those do not seem to be testable, number one, and two, they’re massively contentious. So you can propose various tests, but they’re only going to make sense given certain metaphysical assumptions.
Dan Williams: Yeah, okay, that’s good. Let’s actually—I mean, this is not necessarily where I was expecting this to go, but I think it’s actually really interesting—take that position you’ve just mentioned, which says there’s some kind of essential connection between consciousness and the fact that we’re biological systems with a certain kind of material constitution, a carbon-based material constitution. I find this view baffling, honestly, but there are very, very smart people who argue for it.
Maybe we could say something about what this view is positioned against, which is an alternative perspective, often connected to what philosophers call functionalism, that says, roughly speaking, psychological states, including those connected to consciousness, should be understood functionally, in terms of what a system can do. And if you think that, then you’re going to think psychological states are substrate neutral. So it might happen that in the case of human beings, the functions we perform—perceiving, imagining, reasoning, deliberating and so on—are ultimately carried out by carbon-based matter. But if you could build a silicon-based system that could perform the same functions, then it would be accurate to describe that system in this psychological vocabulary of perceiving and understanding, and also potentially experiencing, if you’ve got a functionalist view of experience.
Now, as is often the case in philosophy, it’s difficult to precisely articulate exactly the appropriate formulation of that view, but I find the intuition very, very plausible. I find it weird, the idea that if you could build a system that’s functionally identical to a human being but just made of different stuff, we should deny that that system has consciousness. But as we’ve said, there are some smart people who disagree with that, who think that, no, consciousness is essentially connected to the fact that we’re living systems with a carbon-based substrate. Why do they think that? What’s the motivation for this biology-centric view of consciousness?
Henry Shevlin: So I should say I’m probably not the best person to steelman this view, because frankly I’m also very sympathetic towards functionalism. Maybe one thing I’ll quickly say to slightly problematize functionalism before we get into the motivations here: you say, if a system performs the same functions, it’s functionally identical to humans, right? But there are different levels of granularity there that we can talk about. So I presume you would probably say that a system that’s functionally identical to humans just at the level of behavior, but works completely differently on the inside—it’s at least conceivable that that system is not conscious, right?
So you might say, okay, well, behavior is not quite enough to guarantee that the system is conscious, at least in the same way that we are. So we need to move below the level of behavior to something more like architecture. So people talk about microfunctionalism. Maybe the system needs discrete areas for things like perception, maybe it needs something more like working memory. But you can still ask, how different could the system get while still being conscious, right? So I’m very sympathetic to some form of functionalism, but actually spelling out how similar a system would need to be in terms of its internal organization, in terms of its cognitive architecture, is a really, really tough challenge—unless you just grasp the nettle and say, no, look, if it’s behaviorally identical to a human, that’s all that matters, right? That’s the most extreme form of functionalism in one sense. So functionalism is not a simple slam dunk.
But as to why someone might think that—the biological view that only living systems can be conscious—I think by far the most persuasive and sympathetic defender of this kind of view currently is Anil Seth. I think he’s written some great work on this; I’ve just written a response to him in Behavioral and Brain Sciences. And he says, look, there are all sorts of very, very complex features of biological systems: they maintain homeostasis with their environment, they’re self-propagating, you have sequences of biological processes that he interprets through Friston’s free energy principle, which we definitely don’t want to get into now. But basically, biological systems do a hell of a lot more than just produce sentences of English or accomplish tasks, right?
So to assume that those are the only things that matter for consciousness is to beg massive theoretical questions, right? All we know for sure is that systems like us are conscious, and the fact that we can produce text and verbal outputs and accomplish goals is not even the most interesting thing about us. There are so many other relevant facts about our constitution, about the kind of beings that we are—all this fine-grained biological stuff that really is critical to the kind of things that we are—and we should therefore have at least some reason to think that it might be instrumental in consciousness. As I say, I think we should get Anil on the show really, because I think he’ll do a much better job of steelmanning it, and I’m sure he’d love to come on as well.
Dan Williams: We should do, yeah. Yeah, no, he definitely would, yeah. Yeah, just quickly on that point, I mean, I think this is a bit nerdy and getting into the weeds of things a little bit, but it is an important distinction to make philosophically, which is I think there’s a position that says something like, we should be skeptical that it will be possible in practice to build silicon-based systems that could in fact perform all of the complex functions that our biology can perform. And it’s important to distinguish that view from another view that says even if you could build a system that was functionally identical, it wouldn’t have consciousness.
Those are two very, very different views. They might seem like they’re the same, and I think people often go back and forth between them. But they are different. The second one is making a much stronger metaphysical claim. The former one is ultimately an empirical claim about the capabilities, the capacities that you can get from certain specific forms of matter. And I’m not entirely sure which one Anil Seth is opting for, but I find, to be honest, the empirical claim about the limitations of, let’s say, silicon-based systems—I find that interesting and quite implausible, but I find it more plausible than the other claim, which is that even if you could replicate all of this stuff functionally, it wouldn’t be conscious.
Henry Shevlin: Yeah, I think the first claim is basically an engineering challenge, right? It’s like saying, I don’t reckon you’ll be able to build something with the full range of human cognitive and behavioral capabilities just using silicon; there’s too much about our specific architecture that really matters for what we can do. And this is closely related to the ongoing debates about, for example, whether the whole architecture of LLMs can scale up to something like general intelligence. This is a source of current controversy and debate. Maybe we need to go back to something that is much closer to brain-inspired AI. Maybe, in fact, we need to be using literal biological neurons if we want to build AGI. It’s not a position I find super persuasive, but it’s an empirical question. It could turn out that silicon architectures—certainly silicon architectures like transformer models—are just never going to give us the full range of behavioral capabilities that humans have. But yeah, it’s 100% an empirical question.
Where I really start to disagree though is when people say, well yeah, even if you could do that, exactly as you say, it still wouldn’t be conscious. Because at that point you’re really entering the domain not of engineering or science, but of metaphysics.
Dan Williams: Yeah, I mean, just to double click on that point—not that I’m a great expert in this area or anything like it—my sense is not only is it not the case that our material properties mean we’ll be able to do things that silicon-based systems won’t be able to do; I think it’s almost the opposite situation, where we’ll be able to build forms of intelligence with silicon-based systems which are far more complex and impressive than the sorts that you get with carbon-based systems. But that’s a whole complex conversation and a bit of a digression. Let’s go back to... go on.
Henry Shevlin: Although I will say, I’ll just add one other thought here, which is, you sort of signposted quite helpfully an important distinction that gets blurred between the engineering version of the challenge to AI consciousness and the metaphysical version of the challenge to AI consciousness. But another line that sometimes gets blurred is the difference between the view that AI systems can’t be conscious at all and the view that AI systems can’t be conscious in exactly the same ways that we are.
I think the second claim is a lot more plausible. It may be that the precise qualities of pain or orgasms that humans have would be very hard to instantiate in any purely silicon-based architecture—or, even if you did, you’d need to basically build an entire micro-scale functional model of a human mind or something like that. I find that view very plausible. So if AI is conscious and has a different architecture from ours, which it almost certainly will, then yeah, I imagine its conscious experience will be very different. But that is a move that I think sometimes gets glossed over: going from the claim that AIs won’t be conscious like us to the claim that AIs couldn’t be conscious at all.
Dan Williams: Yeah, and that’s such an important point. I mean, just to return also to what we were talking about earlier on, bats are not conscious in the way that human beings are conscious. They’ve got a very different, presumably a very different kind of set of conscious experiences, and yet we don’t think that means that they’re therefore not conscious. And I take the point that there might be even greater differences between the consciousness of the kinds of machines that we’re potentially building and us, but I think the point still stands.
You mentioned consciousness science, and I realize this is a huge can of worms, and it might be that we’ll need to do a whole other episode to really get into the weeds of it. So there are metaphysical views about what consciousness is in some very general, abstract sense, which we’ve already touched on. And then there are views you find in neuroscience and psychology about the appropriate theory of consciousness, which presumably might be consistent with different metaphysical interpretations of the theory. What’s your sense of the big players when it comes to theories of consciousness in that more specific, non-metaphysically-committal sense?
Henry Shevlin: Yeah, so this is a really important distinction between metaphysical theories of consciousness, like physicalism, dualism, and so on, and scientific theories of consciousness. Just to add a little bit of autobiographical detail here. So I spent four years banging my head against metaphysical problems in consciousness, and then was lured away into philosophy of cognitive science towards scientific approaches to consciousness, which seemed to me to be potentially a lot more—I won’t say interesting, but more fruitful or productive.
Now, to be clear, these are not trying for the most part to answer the same questions as the metaphysical theories. These are not trying to answer, to solve the hard problem, the problem of why there is consciousness at all. Instead, they’re for the most part trying to understand which kinds of brain dynamics or functional dynamics or informational dynamics are associated with conscious versus unconscious processing.
So I think modern consciousness science—not its very beginning, but where it really starts to come into its own—is in the early 90s with the work of people like Christof Koch. Actually, maybe I could push it a little further back, to the 80s and the work of people like Bernie Baars. But let’s talk about what was happening in the 90s. In particular, Francis Crick, one of the discoverers of the structure of DNA, has this lovely paper where he basically claims consciousness should be approached from a scientific angle. I’ve got a nice quote from this I can read: “No longer need one spend time attempting to understand the far-fetched speculations of physicists, nor endure the tedium of philosophers perpetually disagreeing with each other. Consciousness is now largely a scientific problem. It is not impossible that with a little luck we may glimpse the outline of the solution before the end of the century.”
And he was writing that in 1996. Suffice it to say, that's not how it played out. But one of the projects that Crick and others contributed to was this search for the neural correlates of consciousness. So we know about unconscious perception, for example, versus conscious perception. What's going on in your brain that distinguishes the two cases? If I show an image just below threshold, so as far as you're concerned you didn't see anything, and I show it just above threshold, so you are aware that you've seen something, what's going on in terms of brain dynamics that distinguishes those two cases?
And this question, this way of framing the question, it’s very relevant, for example, if we’re interested in things like predicting recovery of patients in persistent vegetative states or identifying lingering consciousness in people in minimally conscious states. And it’s given rise to a whole host of different theories of consciousness, scientific theories of consciousness. So there are many of these and new ones are being added all the time. I think it’s safe to say that none of them have won the room.
But the big ones are views like global workspace theory, which basically says consciousness is a way that information gets shared across the brain. So information is processed in the brain—I should say, more neutrally, in intelligent systems. If information is localized, only available to some subsystems, it's non-conscious. But when information is made available to all subsystems through a dedicated global workspace, as it's called, basically a dedicated broadcasting network across the brain, that's when information becomes conscious.
Another example of an influential theory is higher order thought theory, which says information in the brain becomes conscious when it is the target of a further, higher order thought. So at some level your brain—you think to yourself, not consciously, although that's a messy question—at some level your brain is representing: I'm having a perception of red right now, or I'm perceiving a red apple. So when you get a higher order state turning a spotlight, as it were, on a first order mental state, that first order mental state becomes conscious. Now that's a very crude...
Dan Williams: Can I just ask, just really quickly on that as a follow up, because I think this might be occurring to people. So in the global workspace theory, the conscious states are those that get broadcast to other cognitive systems within the system, within the brain in our case, but you can imagine AI systems working like this. Whereas the higher order thought theory says a psychological state becomes conscious when it’s the target of another thought. And the difference between those is in the global workspace theory, whether or not a state gets broadcast to these other cognitive systems is not a matter of whether it’s the target of another thought. Is that correct as a way of understanding the difference? Okay.
Henry Shevlin: Exactly. Exactly. Yeah. So on a global workspace theory, it doesn’t matter whether a thought is the target of another thought or a mental state is the target of a thought, a higher order representation. As long as it’s just available to everything, it’s conscious. Whereas for the higher order theorist, only those thoughts that are directly thought about or any mental states that are directly thought about are conscious.
Dan Williams: Yeah, okay, so those are two influential theories. As you've mentioned, there are a ton of theories, maybe not a ton of influential ones, and people are introducing new theories of their own all the time. I mean, from your perspective then, I suppose one way you could approach AI consciousness is to look at scientific theories of consciousness, which have presumably been developed entirely by looking at examples of consciousness in human beings or, to a lesser extent, in other animals. You take those scientific theories of consciousness and then you sort of conditionalize: you say, if such and such a theory is correct, then what are the implications of that theory for thinking about AI consciousness? Is that how you think about this topic, or do you approach things differently?
Henry Shevlin: Well, I think that’s a productive way to think about it. So there’s this wonderful report by Patrick Butlin, Robert Long and others from a couple of years ago called “Consciousness in Artificial Intelligence: Insights from the Science of Consciousness.” It basically does exactly that. It says, let’s take all of the leading theories and say, what would an AI system need to do in order to be conscious by the lights of these theories and do any current AI systems do it? And what they find—it’s a very long, very good report—but basically is that consciousness does not seem like the kind of thing that’s impossible for current architectures to realize. In some cases, not for every theory, but for many theories, even with fairly minor tweaks, existing kinds of architectures could give rise to conscious systems.
But there are a couple of problems with this. The first big one, of course, is that there are literally hundreds, if not thousands, of different scientific theories of consciousness. And they're basically never refuted; new theories are constantly being added to the table. The second thing, as they note in the report, is that there are different ways of operationalizing these theories of consciousness.
So for example, global workspace theory in its modern form, sometimes called global neuronal workspace theory, it’s associated with Stanislas Dehaene, a fantastic cognitive neuroscientist. And he’s got a great book called Consciousness and the Brain. He spells out what global workspace theory says, but he spells it out in a few subtly different ways. At one point he says something like, consciousness is any kind of system-wide information sharing. At other points he says, consciousness occurs when information from working memory is made globally available to modules including episodic memory, language and so on.
Hang on, those are two quite different claims, right? The first one suggests that even quite simple architectures might be conscious, maybe some existing architectures. The second one makes it sound like only creatures with our specific sorts of cognitive organization can be conscious. So even with our existing theories, there are different ways of spelling them out. This is something I go into in a paper called “Non-Human Consciousness and the Specificity Problem.” Different ways of unpacking or operationalizing them that have potentially very different conclusions for whether or not AI systems or for that matter non-human animals are conscious.
Dan Williams: Yeah, okay. I mean, yeah, that seems right. I suppose one issue hanging over the entire thing is that our understanding of consciousness, philosophically, metaphysically, and scientifically, is still so uncertain that all of that uncertainty carries over to these issues of AI consciousness in a really significant way.
I mean, maybe we can just end with two overarching questions. The first of them, I think, follows pretty directly from what we've been saying. The second is a question which connects to, I guess, social, political, ethical stuff. So the first question I think we should look at is: okay, in light of all that we've said so far about the metaphysics of consciousness, the weirdness of consciousness in the scientific world, these different scientific theories, and so on, how should we actually think about state-of-the-art AI systems? What are your views about that? And then the second question is, what's at stake here? Why is this an important issue? Because there are very interesting scientific and metaphysical questions here, but there are also presumably very, very important ethical questions, given that the possibility of conscious machines is hanging over the conversation as a whole.
Taking that first question: given your expertise and your views in this area, if you take a system like Claude, Gemini, or ChatGPT, what's your sense? Are these systems conscious in some sense? Is there something it's like to be them?
Henry Shevlin: So it’s a very reasonable question and it’s one I don’t have a good answer to. I think basically the only kind of answer I can give, given the massive uncertainty, is to hedge across so many different theories, so many different methodological approaches. Probably my conviction is that basically we don’t know our ass from our elbow when it comes to what consciousness is or how to measure it. Therefore, I think we are basically in a state of near total uncertainty when it comes to consciousness in AI systems.
That said, I'm a good Bayesian, I can deal with all this. So if I had to put numbers on it, they would come with huge error bars. But I think there's a non-trivial chance that some existing AI systems have at least some minimal form of consciousness. In particular, and we don't want to get too deep into the weeds here, I don't think it's likely that any AI systems feel pain or have perceptual experience. But there's a type of consciousness that's sometimes called cognitive phenomenology. Think about the kind of experiences you have when you're reasoning through a problem, coming to a sudden insight, or comparing two different ideas in your head without any accompanying visual imagery, just the raw processing of concepts. If you think there's some kind of conscious experience associated with that, it doesn't seem crazy to me to think there could be some kind of analog of that in AI systems.
And I guess one reason I'm a little more open-minded about that than some people is because I'm pretty liberal about consciousness in biology. I have very high credence, probably above 80%, that honey bees, for example, are conscious. I think it's just the best way of understanding complex behavior in honey bees. There's a whole big story there. But the point is I'm pretty liberal about where I think consciousness extends in nature. It can arise in quite simple systems, which pushes me towards being a bit open-minded about the possibility of consciousness in AI systems.
But it's worth really emphasizing the degree of uncertainty among experts on this. I've got some choice quotes here. Back in February 2022, Ilya Sutskever, former chief scientist at OpenAI, said, “It may be that today's large neural networks are slightly conscious.” This happened on Twitter, like all good philosophical discussions. And he got a reply from Yann LeCun, a very prominent AI researcher who is head of AI research at Meta, or at least was until recently.
Dan Williams: And also we should add a critic of the large language model based approach to AI. Sorry to interrupt.
Henry Shevlin: No, absolutely. It’s an important addendum. So he replied to Ilya saying, “No, not even true for small values of slightly conscious and large values of large neural nets.” But Murray Shanahan of DeepMind, I thought had the best reply here. He said, “They may be conscious in the same sense that a large field of wheat is slightly pasta.” Which I think is just brilliant and hilarious. So in other words, you’ve got the raw materials there, but it hasn’t been turned into the finished product. That’s one way of interpreting that.
And the philosophers are just as divided. Dave Chalmers has said, “Questions about AI consciousness are becoming ever more pressing. Within the next decade, even if we don't have human level artificial general intelligence, we may have systems that are serious candidates for consciousness.” But Dave's colleague at NYU, Ned Block, another of the titans of modern consciousness research, says, by contrast, “Every strong candidate for a phenomenally conscious being has electrochemical processing in neurons that are fundamental to its mentality.” And my old PhD supervisor, Peter Godfrey-Smith, also said in his book, “If the arguments in this book are correct, you cannot create a mind by programming some interactions into a computer, even if they're very complicated and modeled on things that our brains do.”
So I think that just gives you a sense, given these are all titans in their fields, of just how divided opinion is on this issue.
Dan Williams: Yeah, right. You can't just trust the experts, as people like to say with that slogan. I mean, I should say, I actually share your view: there's massive uncertainty, but it's certainly not absurd to think that there's something it's like to be these state-of-the-art systems, even though we shouldn't think that what-it's-like-ness is anything like what it's like to be a human being. But just to play devil's advocate, I think lots of people think that is just kind of crazy. Actually, we should be certain there's nothing it's like to be these systems. They're chatbots. They're doing next-token prediction on vast bodies of text. That's not quite right, actually, for some of these systems, but that's the kind of view, right? They're stochastic parrots.
I saw a post the other day on Bluesky, another social media platform with its own pathologies, and that was very much the spirit of the post. And from what I can gather, this is also the spirit of popular opinion on Bluesky: it's kind of absurd to even be talking about consciousness in a large language model. And maybe not even just absurd, but also potentially kind of dangerous or troubling, or buying into the hype of these profit-seeking corporations and so on. I don't believe any of that, but I want to throw it at you and get your response.
Henry Shevlin: Yeah, so I find it a little bit baffling that people would see this as an offensive question. I mean, I’ve had people say, it’s offensive to think that an AI system could be conscious. And I just want to say, look, this is a scientific question, right? And it’s a philosophical question. And honestly, it doesn’t even necessarily have any direct normative implications. Maybe we can get to that in a second. But I mean, simply saying there might be some basic forms of cognitive phenomenology in an AI system—that doesn’t entail robot rights by itself, for example. So anyway, I think I don’t get the offensiveness angle.
One argument that I have engaged with, and that I think is worth unpacking a bit, is this idea that it's just matrix multiplication, or it's just next-token prediction. And without going on too long a digression here, I think people really need to go away and read their David Marr. David Marr, one of my absolute heroes in cognitive science, died tragically young, but wrote a very influential book on vision, his one and only book, published posthumously in the 1980s. And one of the basic insights he brings to bear is that almost any complex informational system, whether you're talking about human vision or a computer, can be analyzed at multiple levels of explanation.
So you've got the high-level functional, what he called computational, explanation, which is: what is this system or subsystem doing? So in the case of a part of your vision, it might be that the system detects edges. Then moving down a level, you've got the algorithmic explanation: how is that function accomplished in informational terms? What kinds of computations are being done to calculate where an edge is in your visual field? And finally, you've got the implementational level explanation: what neurons are doing what, what circuits are doing what, how is this algorithm actually instantiated? And the point is that almost any system is going to be analyzable at multiple levels, right?
So at some level there's going to be a mathematical gloss on what's going on when I'm thinking through a problem in the human brain, at least if you accept even the most basic form of scientific naturalism, right? And even if you think there's much more going on in the human brain, sure, there may be, but we're also doing the computations, right? There's going to be at least a computational level of description. And you can see this so clearly in a lot of perception, for example, where people produce very accurate predictive models, computational models of how early vision works or how early linguistic processing works.
And the fact that you can give the implementation level, algorithmic level, lower kind of functional level explanations of what any system is doing doesn’t exclude psychological or phenomenal level descriptions at all.
Dan Williams: Right, right, yeah. Such an important point. And I think one of the things that being a philosopher, or at least a good philosopher, trains you to do is, whenever you encounter someone saying X is just Y, for alarm bells to go off, because that phrasing often smuggles in a whole lot of very dubious ideas. Whether it's that these systems are just minimizing prediction error on a next-token prediction task, or that human beings are just complex biophysical machines, or whatever. There's a lot going on when people make a comment like that, a lot that's getting smuggled in that needs to be thought about rigorously in a way that it often isn't.
Okay, we've touched on this already. Let's end with this issue of what's at stake. This is, surprisingly to me at least, an undeniably polarized area of discussion, both for AI generally, where there are lots of heated, let's say, conversations about how to make sense of this technology, and also specifically when it comes to these issues of AI consciousness, or sentience, as people often call it in popular conversation. What's your sense of what's at stake? Why does this matter? Why is this an important conversation?
Henry Shevlin: Yeah, so I think this is a really key point. And maybe we should have even mentioned this at the start of the program. One of the reasons why consciousness matters so much more, I think, than most other kinds of psychological states—we can argue about whether LLMs have beliefs or understand, but what makes consciousness so important is its connection to ethical issues, right? So there's this famous passage in Peter Singer, one of the godfathers of modern utilitarianism and effective altruism, where he says that a schoolboy kicking a stone along the road isn't doing anything morally wrong, because the stone can't suffer: it's not conscious and therefore doesn't have any interests, right?
So one of the reasons we care about consciousness is because consciousness seems like a prerequisite, on many views, for having interests at all. If you can't suffer, if you can't feel pain or orgasms, if you can't have positively or negatively valenced experiences, then it's very unclear whether you deserve any kind of moral consideration at all, or at least it's by virtue of consciousness that you get a lot of extra moral consideration.
And this is also what actually got me so interested in animal consciousness, because one of my favorite essays of all time is David Foster Wallace's “Consider the Lobster.” Fantastic read, highly recommended. And when a chef drops a lobster into a pot of boiling water, it seems to me to matter a great deal whether there's something it's like for that lobster to experience that, whether it undergoes pain and suffering. That seems like a very important question. And equally, if we're thinking about which animal welfare interventions to prioritize, is it worth spending money on shrimp welfare, for example, to pick a controversial case? It seems to matter a great deal whether there's anything it's like to be a shrimp, whether they can genuinely suffer. And so I think consciousness has this special normative connection that just isn't clearly shared by any other psychological concepts. And that's part of what makes it so important.
Dan Williams: Yeah, I completely agree with that. Maybe I'll just end with a discussion of the contrast between those cases where we're thinking about non-human animals and these cases where we're thinking about AI systems. I mean, there are two mistakes you can make here, right? There's a kind of false positive, where you attribute consciousness where it doesn't exist, and a false negative, where you fail to acknowledge consciousness that does exist.
It seems to me that when it comes to, say, lobsters, we should err on the side of caution. I think there's very, very good reason to think there is something it's like to be a lobster. But fair enough, there's uncertainty. It doesn't seem like the end of the world if we have a false positive here and that stops us from boiling them alive. On the other hand, it seems beyond egregious to make the mistake of failing to take into consideration that they're conscious.
With AI systems, I suppose it's complicated, because clearly there are real big issues here if we are in fact manufacturing conscious systems and not recognizing them as conscious. But I can also see the argument from people who say there could be big issues in the other direction: if we're just building—just to use that word—machines that aren't conscious, there's nothing it's like to be these systems, they're sophisticated chatbots or whatever, and we're treating them as conscious. To me at least, I can understand what the downsides might be in that kind of scenario. There is something a little bit troubling if people start treating AI systems that aren't conscious as if they are, in a way that I'm not so sure about in these other cases. Have you got any thoughts about that, just to wrap things up?
Henry Shevlin: Yeah, absolutely. So I think another really important difference here is that there are some states that we undergo that we recognize as very, very bad, like extreme pain, starving to death, extreme nausea, that seem to have fairly straightforward physiological analogues in non-human animals. And I think that justifies a pretty strong precautionary attitude about inflicting those kinds of states on animals, right? Sticking me in a pot of boiling water, I can tell you would be pretty horrific. And so it’s probably a good idea not to do the same things to creatures that have relatively similar behavioral responses to pain to me, right?
But it’s just much less clear what it would even mean for an LLM to suffer, right? They don’t have bodies. They don’t have any kind of somatosensory processing. That doesn’t mean they can’t suffer, right? But it makes the question of what suffering in LLMs would look like a lot harder to answer. So I think one model for thinking about how LLMs work, that Murray Shanahan has popularized, is that they sort of role play. So if Claude says, like, this is awful, I’m really distressed right now, right? Is that more like an actor who’s portraying Romeo on stage? “My heart is shattered by the death of my beloved Juliet,” right? Or is it actually suffering? I think this is one of the things that makes AI welfare really hard.
My general sense here is that this is an area where we absolutely need better understanding, but also just better theoretical models of what it would even mean for an AI system to suffer in the first place. And I also, I think I am sympathetic to the idea that precautionary principles are at least easier to apply in fairly straightforward ways in the animal welfare case compared to the AI case.
That said, I also don't think we can rest easy in the AI case. Partly, just to give one little example, suffering in biological organisms seems to be relatively biologically constrained. You have various kinds of negative feedback mechanisms, like endorphins and so forth, just because suffering is generally not a particularly adaptive state to be in. It's a powerful way for your body to signal that you're injured or desperately need to eat and so forth. But in most cases, not all, there are biological dampening mechanisms that create a kind of upper limit.
But it's not clear that those would arise spontaneously or by design in AI systems. So the theoretical upper limits on the extremes of suffering in AI systems may be less constrained than in biological systems. All of that is very speculative, of course, but it's just to note that there could potentially be a lot of suffering associated with badly designed AI systems. And of course, if you're dealing with systems that can have billions or trillions of instantiations running simultaneously, that could quickly add up to some really messed up things that we're doing.
For what it's worth, I've not seen any compelling evidence, or much compelling reason, to say that any specific things we're doing with AI right now are plausible candidates. There's no AI equivalent of factory farming that's obviously objectionable. Also, just to forestall one argument that I hear all the time and that really frustrates me: it's when people say the only reason anyone wants you to think AI systems might suffer is because AI companies want that extra recognition for the value of their products. I can tell you this is the last possible thing in the world that any tech company wants. The idea that the products making them billions of dollars right now might have rights would be really, really bad for their business models.
Dan Williams: Right.
Henry Shevlin: Insofar as there are people pushing for greater awareness of AI welfare, they are not operating with any kind of commercial agenda in mind. And I think commercial agendas, in fact, push in the opposite direction.
Similarly, it's so interesting how many people get really angry about this debate, about the fact that the debate is even happening. I meet people who've said it's obscene to even debate the idea that AI systems might one day deserve rights. What I would say to these people is: look, even if your cognitive science of sentience or consciousness, or your theory of moral patiency, tells you there's basically no chance that any AI systems are conscious, you should still be engaged in this debate, because a lot of people are going to take it seriously. And if you think they're making a mistake, you need to engage with them and tell them why they're making a mistake, right? It's not a debate that you can just dismiss with a look of disgust. If you think we're in danger of making massive false positive ascriptions of moral status to AI systems, you need to tell people why and actually have that conversation, rather than dismissing it with a grimace of disgust.
Dan Williams: Yeah, yeah, just dismissing it doesn't seem like an option now. And I think as these systems grow in sophistication and capability, that conversation is going to become more and more important. And one thing you alluded to there is that there's a question about how we should treat these systems, and there's a question about how human beings will treat these systems. And I know that connects to some of your interests when it comes to things like social AI and so on. But we should postpone that to a future conversation. Henry, this was fantastic. Any final word, final comment that you want to end on?
Henry Shevlin: All right, so I'll close on two reflections. The first is that, as listeners who were not previously familiar with consciousness debates will have realized, this is possibly the messiest debate out there. Scientifically, theoretically, and metaphysically, it is an absolute snake pit of a debate. But don't be put off, because I think it is also the most fascinating and rewarding question that I've ever worked on. I have happily dedicated basically most of my academic life to working on consciousness and I don't regret it for a second.