Henry Shevlin and I sat down to discuss a topic that is currently driving both of us slightly insane: the impact of AI on education.
On the one hand, the educational potential of AI is staggering. Modern large language models like ChatGPT offer incredible opportunities for 24/7 personal tutoring on any topic you might want to learn about, as well as many other benefits that would have seemed like science fiction only a few years ago. One of the really fun parts of this conversation was discussing how we personally use AI to enhance our learning, reading, and thinking.
On the other hand, AI has clearly blown up the logic of teaching and assessment across our educational institutions, which were not designed for a world in which students have access to machines that are much better at writing and many forms of problem-solving than they are.
And yet… there has been very little adaptation.
The most obvious example is that many universities still use take-home essays to assess students.
This is insane.
We discuss this and many other topics in this conversation, including:
How should schools and colleges adapt to a world with LLMs?
How AI might exacerbate certain inequalities.
Whether AI-driven automation of knowledge work undermines the value of the skills that schools and colleges teach today.
How LLMs might make people dumber.
Links
John Burn-Murdoch, Financial Times, Have Humans Passed Peak Brain Power?
James Walsh, New York Magazine, Everyone Is Cheating Their Way Through College
Rose Horowitch, The Atlantic, Accommodation Nation
Transcript
Note: this transcript is AI-generated and may contain mistakes.
Dan Williams
Welcome everyone. I’m Dan Williams, and I’m back with my friend and co-conspirator, Henry Shevlin. Today we’re going to be talking about a topic which is close to both of our hearts as academics who have spent far too long in educational institutions: the impact of AI on education and learning in general, but also more specifically on the institutions—the schools and universities—that function to provide education.
There’s a fairly simple starting point for this episode, which is that the way we currently do education was obviously not built for a world in which students have access to these absolutely amazing writing and problem-solving machines twenty-four seven. And yet, for the most part, it seems like many educational institutions are just carrying on with business as usual.
On the one hand, the opportunities associated with AI are absolutely enormous. Every student has access twenty-four seven to a personal tutor that can provide tailored information, tailored feedback, tailored quizzes, flashcards, visualizations, diagrams, and so on. On the other hand, we’ve quietly blown up the logic of assessment and lots of the ways in which we traditionally educate students—most obviously with the fact that many institutions, universities specifically, still use take-home essays as a mode of assessment, which, at least in my view (and I’m interested to hear what Henry thinks), is absolutely insane.
So what we’re going to be talking about in this episode are a few general questions. Firstly, what’s the overall educational potential when it comes to AI, including outside of formal institutions? What are the actual effects that AI is having on students and on these institutions? How should schools and universities respond? And then most generally, should we think of AI as a kind of crisis—a sort of extinction-level threat for our current educational institutions—or as an opportunity, or as both?
So Henry, maybe we can start with an opening question: in your view, what is the educational potential of AI?
Henry Shevlin
I think the educational potential is insane. I almost think that if you were an alien species looking at Earth, looking at these things called LLMs, and asking why we developed these things in the first place—without having the history of it—you’d think, “These have got to be some kind of educational tool.” If you’ve read Neal Stephenson’s The Diamond Age, you see a prophecy of something a little bit like an LLM as an educational tool there.
I think AI in general, but LLMs specifically, are just amazingly well suited to serve as tutors and to buttress learning. Probably one key concept to establish right out the gate, because I find it very useful: some listeners may be familiar with something called Bloom’s two sigma problem. This is the name of an educational finding from the 1980s associated with Benjamin Bloom, one of the most prominent educational psychologists of the 20th century, known for things like Bloom’s taxonomy of learning.
Basically, he did a mini meta-analysis looking at the impact of one-to-one tutoring compared to group tuition. He found that the impact of one-to-one tutoring on mastery and retention of material was two standard deviations, which is colossal—bigger than basically any other educational intervention we know of. Just for context, one of the most challenging and widely discussed educational achievement gaps in the US, the gap between black and white students, is roughly one standard deviation. So this is twice that size.
Now, worth flagging, there’s been a lot of controversy and deeper analysis of that initial paper by Bloom. For example, the studies were mostly comparing students who had a two-week crammer course with students who had been learning all year, so there were probably recency effects. He was only looking at two fairly small-scale studies. Other studies looking at the impact of private tutoring versus group tuition have found big effects, even if not quite two standard deviations. And this makes absolute intuitive sense—there’s a reason the rich and famous and powerful like to get private tutors for their kids.
Dan Williams
Yeah.
Henry Shevlin
There’s a reason why Philip II of Macedon got a tutor for Alexander. And more broadly, I think we can both attest as products of the Oxbridge system: one of the key features of Oxford and Cambridge is that they have one-to-one tutorials (or “supervisions,” as the tabs call them). This is a really powerful learning method.
So even if it’s not two standard deviations from private tuition, it’s a big effect. Now, people might be saying, “Hang on, that’s private tuition by humans. How do we know if LLMs can replicate the same kind of benefits?” It’s a very fair question. In principle, the idea is that if it’s just a matter of having someone deal with students’ individual learning needs, work through their specific problems, figure out exactly what they’re misunderstanding and where they need help, there’s no reason a sufficiently fine-tuned LLM couldn’t do that.
I think this is the reason Bloom called it the “two sigma problem”—it was assumed that obviously you can’t give every child in America or the UK a private tutor. But if LLMs can capture those goods, everyone could have access to a private LLM tutor.
That said, I think the counter-argument is that even if we take something like a two standard deviation effect size on learning and mastery at face value, there are things a human tutor brings to the table that an AI tutor couldn’t. Social motivation, for one. I don’t know about your view, but my view is that a huge proportion of education is about creating the right motivational scaffolds for learning. Sitting there talking to a chat window is a very different social experience from sitting with a brilliant young person who’s there to inspire you. Likewise, I think it’s far easier to alt-tab out of a ChatGPT tutor window and play some League of Legends instead, whereas if you’re sitting in a room with a slightly scary Oxford professor asking you questions, you can’t duck out of that so easily.
So I think there are various reasons why we probably shouldn’t expect LLM tutors to be as good as human private tutors. But I think the potential there is still massive. We don’t know exactly how big the potential is, but I think there’s good reason to be very excited about it. And personally, I find LLMs have been an absolute game-changer in my ability to rapidly learn about new subjects, get up to speed, correct errors. In a lot of domains, we all have questions we’re a little bit scared to ask because we think, “Is this just a basic misunderstanding?”
Dan Williams
Yeah.
Henry Shevlin
Anecdotally, I know so many people—and have experienced firsthand—so many game-changing benefits in learning from LLMs. But at the same time, there’s still a lot of uncertainty about exactly how much they can replicate the benefits of private tutors. Very exciting either way.
Dan Williams
I think there’s an issue here, which is: what is the potential of this technology for learning? And then there’s a separate question about what the real-world impact of the technology on learning is actually going to be. That might be mediated by the social structures people find themselves in, and also their level of conscientiousness and their own motivations. We should return to this later on. Often with technology, you find that it’s really going to benefit people who are strongly self-motivated and really conscientious. Even with the social media age—we live in a kind of informational golden age if you’re sufficiently self-motivated and have sufficient willpower and conscientiousness to seek out and engage with the highest quality content. In reality, lots of people spend their time watching TikTok shorts, where the informational quality is not so great.
But let’s stick with the potential of AI before we move on to the real-world impact and how this is going to interact with people’s actual motivations and the social structures they find themselves in.
So the most obvious respect in which this technology is transformative, as you said, is this personal tutoring component. Maybe we could be a bit more granular. We’re both people who benefit enormously from this technology. I think I would have also benefited enormously from this technology if I’d had it when I was a teenager. I remember, for example, when I was a teenager, I did what many people do when they feel like they want to become informed about the world: I got a subscription to The Economist when I was 14 or 15. And I would work my way through it, dutifully trying to read all of the main articles.
Henry Shevlin
God.
Dan Williams
And at the time I’d think, “What’s fiscal and monetary policy? I don’t fully understand the political system in Germany.” But if I’d had ChatGPT at the time, I could have just asked, “Explain to me the difference between fiscal and monetary policy.”
Henry Shevlin
I’m cringing because I did exactly the same thing. The Economist and New Scientist were the two cornerstones of my teenage education.
Dan Williams
So let’s maybe talk about how we use it, and at least in principle, if people had the motivation and the structure to encourage that motivation, how the technology could be beneficial for the process of learning. For example, how do you personally use this technology to enhance your ability to acquire and integrate and process information?
Henry Shevlin
I think one useful way to introduce this is to think about other kinds of sources of information and what ChatGPT adds. As a kid and as a teenager, I remember very vividly—I think I was about 10 years old when we got our first Microsoft Encarta CD-ROM encyclopedia. It blew my mind; I could do research on a bunch of topics. Some of them even had very grainy pixelated videos. It was great fun. And obviously the internet adds a further layer to your ability to do research. I’m also the kind of person who, even before the launch of ChatGPT, at any given time had about 30 different tabs of Wikipedia open.
So if you’re the kind of person who is interested and curious about the world, we live in an informational golden age. Our ability to learn about things has been improving; our tools for learning about the world have been improving. So what does ChatGPT and LLMs add on top of that?
First, I often find that even Wikipedia entries can be very hard to get my head around, particularly if I’m trying to do stuff in areas I’m not so good at. If I’m looking at some concepts in physics or maths—maths is particularly hilarious here. If you look up a definition of a mathematical concept, it’s completely interdefined in terms of other mathematical concepts. Absolute nightmare. Even philosophy can be the same: “What is constructivism? Constructivism is a type of meta-ethical theory that blah, blah, blah.” It can quickly get lost in a sea of jargon where all the terms are interdefined. Whereas you can just ask ChatGPT, “I’m really struggling. What is Minkowski spacetime? Please explain. ELI5—explain to me like I’m five.”
So in terms of getting basic introductions to complex concepts, being able to ask questions as you go—this is huge. Being able to check your knowledge and say, “Is this concept like this? Am I right? Am I misunderstanding this?” Being able to draw together disparate threads from topics—this is something that’s basically impossible to do prior to LLMs unless you get lucky and find the right article. So if I ask, “To what extent did Livy’s portrayal of Tullus Hostilius in his book on the foundation of Rome draw inspiration from the figure of Sulla?” (This is a specific example because I wrote an essay about it.) These kinds of questions where you’re drawing together different threads and asking, “Is there a connection between these two things, or am I just free-associating? Is this thing a bit like this other thing?”
Dan Williams
Yeah.
Henry Shevlin
These kinds of questions—you can just ask them. Other really good things you can do, getting into more structured educational usage: you can ask for needs analysis. Recently I was trying to get up to speed on chemistry—chemistry was my weakest science at high school. I said, “ChatGPT, I want you to ask me 30 questions about different areas of chemistry. Assume a solid high school level of understanding and identify gaps in my knowledge. On that basis, I want you to come up with a 10-lesson plan to try and plug those gaps.” And then you can just talk through it. I did a little mini chemistry course over about 30 or 40 prompts. So that’s a slightly more profound or interesting use.
Another really powerful domain is language learning. I’m an obsessive language learner; at any one time, I usually have a couple on the go.
Dan Williams
Hmm.
Henry Shevlin
Duolingo—I had a 1,200-day streak at one point—sucks, I’ll be honest, for actually improving fluency. It’s very good for habit formation, but it doesn’t really teach grammar concepts very well. It doesn’t build conversational proficiency very well. It’s okay for learning vocab. But LLMs used in the right way can be fantastically powerful tools for this.
Particularly with grammar concepts, you’ve often got to grok them—intuitively understand them. So being able to say, “Am I right in thinking it works like this? How about this kind of sentence? Does the same rule apply?” Or when learning a language, you’ll often encounter a weird sentence whose grammar you don’t understand. This is something you couldn’t really do prior to ChatGPT in an automated fashion: “Can you explain the grammar of this sentence to me? I just don’t get it.”
Also, Gemini and ChatGPT both have really good voice modes that are polyglot. So you can say, “ChatGPT, for the next five minutes, I want to speak in Japanese” or “I want to speak in Gaeilge. Bear in mind my language level is low, my vocabulary is limited. Try not to use any constructions besides these.” Or even, “Let’s have a conversation practising indirect question formation in German.” You can do these really tailored, specific lessons.
I’ll flag that language learning in particular is one area where the applications and utility of LLMs are just so powerful and so straightforward. But it’s funny—I’ve yet to see the perfect LLM-powered language learning app. Someone might comment on this video, “Have you checked out X?” But I’m sure in the next couple of years, someone is going to make a billion-dollar company on that basis.
Dan Williams
Surely, yes. Just to add another couple of things in terms of how I use it, which actually sounds very close to how you use this technology. One thing is: I think an absolutely essential part of thinking is writing. People often assume that with writing, you’re just expressing your thoughts. Whereas actually, no—in the process of writing, you are thinking.
One of the things that’s really great as a writer—I’m an academic, so I write academic research; I’m also a blogger and I write for general audiences—is to write things and say, “Give me the three strongest objections to what I’ve written.” And often the objections are actually really good. That’s an incredible opportunity because historically, if you wanted to get an external mind to critique and scrutinise what you’ve written, you’d have to find another human being, and they’re going to have limited attention. That’s really challenging. Whereas now you can get that instantly.
I also find that now when I’m reading a book—and I think reading books is absolutely essential if you do it the right way for engaging with the world and learning about the world—I’ll do the thing you’ve already mentioned: if there’s anything I don’t understand or don’t feel like I’ve got a good grip on, I’ll ask ChatGPT to provide a summary or explain it in simpler terms. But I’ll also often upload a PDF of the book when I can get it and think, “Here’s my current understanding of chapter seven. Can you evaluate the extent to which I’ve really understood it and provide feedback on something I’m missing?”
What you can also do—and I find Gemini is much better at this than ChatGPT—is ask it to generate a set of flashcards on the material, then take the flashcards it’s generated and ask it to create a file for Anki (which is a flashcard program) that you can import directly and use to test yourself on the knowledge over time. In principle, you could have done that prior to the age of AI, but the ease and pace with which you can do it today is absolutely transformative in terms of your ability to really quickly master material. So those are just a few things off the top of my head. I’m sure there are many other uses.
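For anyone who wants to try that last Anki step themselves, here is a minimal sketch of the idea. It assumes you have asked the model to output its flashcards as JSON, something like a list of objects with “front” and “back” fields, saved as cards.json; the filenames and field names here are illustrative rather than anything either of us actually used. A few lines of Python then turn that into a tab-separated text file that Anki’s import dialog accepts.

```python
# Minimal sketch: convert model-generated flashcards (JSON) into a
# tab-separated file that Anki's File > Import dialog can read.
# Assumes cards.json looks like: [{"front": "...", "back": "..."}, ...]
import json

with open("cards.json", encoding="utf-8") as f:
    cards = json.load(f)

with open("cards_for_anki.txt", "w", encoding="utf-8") as out:
    # Header lines understood by recent versions of Anki's text importer.
    out.write("#separator:tab\n#html:false\n")
    for card in cards:
        # Collapse tabs/newlines so each card stays on a single line.
        front = " ".join(card["front"].split())
        back = " ".join(card["back"].split())
        out.write(f"{front}\t{back}\n")

print(f"Wrote {len(cards)} cards to cards_for_anki.txt")
```

You can equally ask Gemini or ChatGPT to emit a file in this format directly, which is closer to the workflow described above; the script is just the manual, belt-and-braces version of the same step.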
Henry Shevlin
That Anki suggestion is gold. I use Anki, and just to be clear to anyone not familiar: it is one of the best educational tips I can ever recommend. In any situation where you need to remember a mapping from some X to some Y—that could be learning vocabulary, mapping an English word to a Japanese word; it could be mapping a historical event to a date; mapping an idea to a thinker; or, one use case for me, mapping a face to a name (it doesn’t just need to be words).
One trick I used to do with my students: we’d have 40 students join each term in our educational programs. I’d create a quick flashcard deck with their photos (which they submit to the university system) and their names, and you can memorise their names in half an hour to an hour. It really does feel like, if you’ve never used it before, a cheat code for memory. It’s astonishing.
Dan Williams
Yeah.
Henry Shevlin
But I have not used Gemini for creating Anki decks. This is genius. And I think this illustrates a broader point: we’re still figuring out, and people are stumbling upon, really powerful educational or learning use cases for these things all the time. Even in this conversation—I think we’re both pretty power users of these systems—but I just picked up something from you right there. I have these conversations all the time: “Great, that’s a brilliant use case I hadn’t thought of.”
One thing I’ll also flag that maybe more people could play around with is really leaning into voice mode a bit more. Voice mode is currently not in the best shape—well, it’s the best shape it’s ever been, but I think we’re still ironing out some wrinkles on ChatGPT. Still, if I’m on a long car journey (I drive a lot for work, often to the airport), I’ll basically give a mini version of my talk to ChatGPT as we’re driving along. I’ll say, “Here’s the talk I’m going to be giving. What are some objections I might run into?” And we’ll have a nice discussion about the talk.
Or sometimes I’m just driving back, a bit bored, I’ve listened to all my favourite podcasts. I’ll say, “ChatGPT, give me a brief primer on the key figures in Byzantine history,” or “Give me an introduction to the early history of quantum mechanics.” Then I’ll ask follow-up questions. It’s like an interactive podcast.
Dan Williams
That’s awesome. One thing that’s really coming across in this conversation is the extent to which we’re massive nerds and potentially massively unrepresentative of ordinary people.
Okay, maybe we can move on. We both agree that the potential of this for learning material, mastering material, improving your ability to think, understand, and know the world is immense.
But obviously there’s a big gap between the potential of a technology in principle and how it actually manifests in the world. I mentioned the internet generally, social media specifically, as an obvious illustration of that. Even though I think a lot of the discourse surrounding social media is quite alarmist, at the same time it does seem to have quite negative consequences with certain things and among certain populations—specifically, I think, those people who don’t have particularly good impulse control. Social media is really a kind of hostile technology for such people.
Henry Shevlin
I’d also add online dating as another example of a technology that sounds like it should be so good on paper. And at various points it has been really good—I met my wife on OkCupid back in 2012. But it seems like what’s happened over the last 10 years, speaking to friends who are still using the various dating apps, is it’s almost a tragedy of the commons situation. Something has gone very wrong in terms of the incentives so that it’s now just an unpleasant experience. Straight men who use it say they have to send hundreds of messages to get a response. Women who use it say they just get constantly spammed with low-effort messages. I give that as another example: we’ve built these amazing matching algorithms—why isn’t dating a solved problem right now? It turns out there can be negative, unexpected consequences with these technologies.
Dan Williams
That’s a great example. So what’s your sense of what the actual real-world impact of AI is on students and teachers and educational institutions at the moment?
Henry Shevlin
This is probably a good time for our disclaimers. I’m actually education director of CFI with oversight of our two grad programs. So everything I say in what follows is me speaking strictly in my own person rather than in my professional role.
Dan Williams
Can I just quickly clarify—CFI is the Centre for the Future of Intelligence, where you work at the University of Cambridge. Just for those who didn’t know the acronym.
Henry Shevlin
Exactly, that’s helpful. The Centre for the Future of Intelligence, University of Cambridge. I’m the education director, with ultimate oversight of our 150-odd grad students. But we only have grad students, and I think this means my perspective on the impact of AI on education is quite different from where I think the real catastrophic or chaotic impacts are happening. Grad students are a very special case; they tend to have high degrees of intrinsic motivation. The incentive structures for grad students—where they’re directly paying for the course themselves in many cases, or for our part-time course they’re being paid for by employers who expect results—all of this creates a quite different environment.
So when I talk about impacts on education, I’m going to be mainly focusing on undergrad education and high school. These are areas where I’m not speaking from first-hand experience, but from many conversations with colleagues who teach undergrads. I don’t really teach undergrads, but lots of colleagues do. And several of my very closest friends are teachers in high schools and, in a couple of cases, primary schools. So I’m drawing on what they’re seeing.
Dan Williams
Can I also give my own disclaimer: as with every other topic we focus on, I’m an academic—an assistant professor at the University of Sussex—and I’m giving my personal opinions, not the opinions of the institution I work for. Okay, sorry to cut you off.
Henry Shevlin
No, excellent. Our respective arses thoroughly covered.
So with that in mind, there’s a phenomenal piece by James Walsh in NY Mag from back in May this year called “Everyone is Cheating Their Way Through College.” It’s a beautiful piece of long-form journalism. Can’t recommend it enough—absolutely exhilarating and horrifying—talking about the impact of ChatGPT and other LLMs on education.
“Complex and mostly bad” is the short answer for what the actual short-term impacts of LLMs have been. When ChatGPT launched, I said something similar to what you said at the opening of the show: the take-home essay assignment is dead for high school, and pretty soon it’ll be dead for university. And then, yeah, pretty soon it was dead for university.
So I think that’s the most straightforward initial impact: we can no longer assign graded take-home essay assignments with any real confidence, particularly for high school and undergrad students, because it’s just so easy to get ChatGPT to do it. I believe that people, even with the best intentions, are responsive to incentives. And honestly, these days contemporary language models can produce an essay of very good quality, particularly at high school level, even at undergrad level. If ChatGPT can just do something as good as or better than you can, then why bother putting in the work? If you’re hungover, or there’s a really cool party you want to attend, or you’re working a second job—we shouldn’t assume all students are living lives of leisure; lots of them are struggling to pay the rent.
So with all these incentives in place, no surprise that basically, as the article says, everyone is cheating their way through college. And I’m kind of appalled. I was in California a couple of weeks ago, just chatting to some students at a community college. One of them was a nursing student, and she said, “Yeah, I’m learning nothing at university. ChatGPT writes all my assignments. Ha ha ha ha ha.” And I was like, “Okay, got to be a bit careful about getting medical treatment—which hospital are you planning to work at?” But I think that’s symptomatic of broader problems.
Dan Williams
Of course. Maybe we can break that down point by point. We can say now with basically 100% certainty that large language models of the sort that exist today can write extremely good essays at undergraduate level. And I think there’s basically no way professors are going to be able to detect whether this has happened, at least if students are sufficiently skilful in how they do it.
I constantly come across academics who are still living in 2022, and they think, “Of course there are going to be these obvious hallucinations, and of course it’s going to be this mediocre essay.” I just think that’s not at all the reality of large language models today. If you know at all what you’re doing, you can delegate the task of writing to one of these large language models, which will produce an exceptional essay, and there’s no real way of knowing whether it’s been AI-generated.
There are tools which claim they can determine probabilistically whether an essay has been AI-generated. I don’t think those tools work, and I think they create all sorts of issues. If it’s not going to be 100% certain—which I think it basically never can be when it comes to AI-generated essays—then it becomes an absolute institutional nightmare trying to demonstrate that a student has used AI. I also think the incentives at universities, and indeed at schools more broadly, don’t encourage academics to really pursue this. It’s going to be an enormous amount of hassle, an enormous amount of extra work.
So I think what’s happening at the moment is that, to the extent universities and other institutions of higher education are using take-home essays specifically—but I’d say take-home assessments more broadly—to evaluate students, you’re basically evaluating how well they can cheat with AI. And I think that’s absolutely terrible.
Not just because, as you say, it means students aren’t actually encouraged to learn the material—they don’t really have an incentive to learn it. But one of the main functions of universities, of educational institutions more broadly, is credentialing. These are institutions that provide signals. They evaluate students according to their level of intelligence, their conscientiousness, and so on. The signal you get with a grade, and overall with your credential, is incredibly useful to prospective employers because they know: you got a first from Cambridge, you got a first from Bristol, whatever it might be. That’s a really good signal—not a perfect signal, but a pretty good signal—that you’re likely to be a good employee in certain domains.
To the extent that students are using AI to produce their assessment material, the signalling value of that just dissolves completely. That’s why, unless there’s urgent reform of the system and a move away from those sorts of take-home assessments, the problem here is not just that students aren’t learning things (which would be bad enough). I think it’s a kind of extinction-level threat for these institutions, because once it becomes clear that the grades you’re giving students don’t really provide any information about their intelligence or conscientiousness, then the social function of the institution dissolves completely. So that’s my take—do you disagree?
Henry Shevlin
No, completely agree. To pick up on a few thoughts: I imagine some people listening will say, “Yeah, but I can kind of tell when a student’s essay is written by ChatGPT.” I think a useful idea here is something I’ve heard called the toupee fallacy. People will say, “You can always tell when someone’s wearing a wig.” And you ask, “So what’s your reference set for that?” “Well, I often go out and I see something and it’s an obvious wig.” Okay, you’re seeing the obvious ones.
Dan Williams
Hmm.
Henry Shevlin
In other words, you don’t know. You can tell in cases where something is obviously a wig or obviously AI-generated. But you have no idea what the underlying ground truth is when it comes to the ones you can’t spot. You don’t have a way of identifying your rate of false negatives. I think that’s a really big problem.
Of course, anyone who marks papers will find occasional students who have left in “Sure, I can help you with that query” or “As a large language model trained by OpenAI...” But you know that’s the minority, and lots of essays you might assume are non-AI-generated are almost certainly AI-generated as well.
Relatedly, on the hallucinations point: this is obviously a big topic (we could probably do a whole episode on hallucinations), but rates of hallucinations have gone down dramatically. Particularly since search functionality was added to LLMs—they can go away and check things themselves. And also, you get the analogue of hallucinations just with student writing all the time. Even long before LLMs, students would falsely claim that Kant was a utilitarian or something, because they hadn’t properly understood the material. So hallucinations are not a particularly reliable sign that an essay is AI-generated.
I think it’s basically impossible to tell. And as you emphasise, the false positive problem: even if you’re really confident an essay is AI-generated, good luck proving that. And is it really worth it for you as an educator to fight this tough battle with a student to bust them when everyone else is doing it? We just don’t have the incentives for educators or instructors to really enforce this.
Two other quick points. First, this creates huge problems not just with assessment but also with tracking students. This is something my high school teacher friends have really emphasised. It used to be, before ChatGPT, that essay assignments were a good way to keep track of which students were highly engaged with the class, which students were struggling, which students were really on top of the material. Whereas now we’ve seen a kind of normalisation effect where even the weakest students can turn in pretty solid essays courtesy of ChatGPT.
Dan Williams
Yeah.
Henry Shevlin
You’ve got no way of knowing which students need extra help versus which are already doing fine. That’s a big problem.
The final thing I’ll mention is that although take-home essay assignments are the ground zero of these negative effects, it covers other kinds of assignment as well. A colleague of mine who teaches at a big university (not Cambridge) was saying he’s been doing class presentations. Then he quickly realised students would generate the scripts for their presentations from ChatGPT. So he said, “Okay, we’re also going to partly grade them on Q&A, where students are graded on the questions they ask other presenters but also the responses they give.” And he said it quickly became clear: people would say, “Give me a moment to think about that question,” type into a computer, get the response from ChatGPT. Or people were using ChatGPT to generate questions.
I think there’s almost a generation of students for whom this is just their default way of approaching knowledge work, which I think is potentially a problem.
Dan Williams
The obvious solution, it seems, would be that you need modes of assessment where students can’t use AI—such as in-person pen-and-paper exams, such as oral vivas. And I do think that’s basically the direction these educational institutions are going to have to go.
However, that obviously creates issues. One is that there’s something incredibly valuable about learning how to write essays—not for everyone. Sometimes people like us, because of our interests and our passions and our profession, think it’s really important to have the ability to write long-form essays. And I totally understand that for many people, that’s a skillset which isn’t particularly useful for them. But in general, I do think for people who aspire to be engaged, thoughtful people, the skillset involved in writing long essays is incredibly valuable. So to the extent that the take-home essay and coursework disappear altogether, I think that’s a real issue in the sense that certain kinds of skills won’t be getting incentivised by our educational institutions.
But I also think it’s incredibly important that students learn how to use AI. That should be one of the main things educational institutions are providing to students these days: the ability to use AI effectively. And I think that skill is only going to become more important—in the economy, the labour market, and so on.
So on the one hand, it seems like large language models have made it basically impossible to have any kind of assessment other than in-person pen-and-paper tests or oral examinations. But on the other hand, to the extent we go down that route, many of the skills and knowledge you want students to acquire will no longer be encouraged and incentivised by educational institutions. That seems like a really big issue, and I have absolutely no idea what to do about it.
Henry Shevlin
Really good points. I’d agree you can do in-person essay exams. Most of my finals as an undergrad consisted of three-hour-long exams in which I had to write three essays. But that trains a very specific type of writing—quite an artificial one. It’s training your ability to write essays under tight pressure. If you want to do any kind of writing for a living, that’s only one of many skills you want.
If you’re producing writing you want people to read—whether it’s blogging or writing academic articles or scientific papers—you don’t typically write it under incredible time pressure where you’ve got to put out two and a half thousand words in three hours. You go through multiple drafts. You test those drafts with colleagues. There’s a whole bunch of writing skills that rely on the take-home component, the ability to think things through. And I don’t know how we test those.
Second, I completely agree that one of the things education is for is preparing people for knowledge work, and knowledge work these days is almost always going to involve the use of LLMs. So we should be training people to use them.
As to how we respond, my very flat-footed initial thought is we need to separate quite clearly: courses where LLM usage is trained and developed and built in as part of the assessment—where it’s assumed everyone will be using LLMs at multiple stages of the process and part of the skillset is using them effectively—versus other courses that say, “This is an LLM-free course; all assignments will be in-person vivas or in-person written exams.”
Another downside with in-person vivas and exams, which I hear particularly from high school teacher friends, is they’re just very labour-intensive. Compared to take-home essays, running exams regularly eats up classroom time: you’ve got to have a teacher in the room, and it’s time when students are not learning. That creates problems for resource-scarce education environments—schools and universities. There are also problems around equality or accessibility.
Dan Williams
Yeah.
Henry Shevlin
There was a great piece in The Atlantic a couple of days ago by Rose Horowitch called “Elite Colleges Have an Extra Time on Tests Problem,” talking about the fact that 40% of Stanford undergrads now get extra time on tests because of diagnoses of ADHD and other things. Test-taking has its own set of problems. There are lots of classic complaints that it incentivises, rewards, or caters to certain kinds of thinkers more than others. It’s not great for people who maybe think more slowly or have special educational needs. I think it’s got to be part of the solution, but I don’t think it’s a panacea for the problems LLMs create.
A final point I’ll flag is that I worry a little bit about deeper issues of de-skilling associated with LLMs. On the one hand, yes, we want students to learn how to use them. But particularly earlier in the educational pipeline, there is a danger that easy access to LLMs just means students don’t develop certain core skills to begin with.
I’m going here based on testimony from a friend of mine who’s a high school teacher. He said his sixth formers (17-18 year olds in the UK system) seem to use LLMs really well because they do things like fact-checking, they restructure the text outputs of LLMs, they can use them quite effectively to produce good reports or written work. And he says there’s a really striking disparity between them versus the 13-14 year olds, who basically just turn in ChatGPT outputs verbatim.
Now you might say, “Yeah, of course—17 year olds versus 13 year olds, big difference.” But his worry is that the 17-18 year olds grew up doing their secondary education in a pre-LLM world. They actually learned core research skills, core writing skills. Whereas the 13-14 year olds—all of their secondary education has happened in an LLM world. So they haven’t developed the skills that are ironically needed to get the most out of LLMs: the ability to augment their outputs with critical thinking, human judgment, their own sense of what good writing looks like.
Dan Williams
I think that’s the crucial distinction: AI can be an incredible complement to human cognition—an enhancer—but it can also be a substitute in ways that will, as you say, lead to de-skilling. And there are issues of inequality as well. As we were alluding to earlier, if you know how to use this technology well, and more importantly, you’re motivated to do so, it can be an incredibly beneficial tool for improving your ability to learn, understand, and think. But if you’re not motivated to do so—if you’re motivated to cut corners—it can really be a serious issue, using it as a substitute for developing the skillset and habits which are essential for becoming a thoughtful person.
In general, over the past century or so (if not even longer), as you get the emergence of meritocratic systems in liberal democracies plus the emergence of this really prestigious knowledge economy, basically there have been increasing returns to those who have high cognitive abilities plus those who are conscientious and have good impulse control. I think this has created a lot of political issues, including resentment among those people without formal education and without the skillset and temperament to succeed within educational institutions.
And it really does seem like a risk with AI that it’s going to amplify and exacerbate those issues. For people (and this is also going to be an issue with parents and what they prioritise with their children) who know how to use this technology and can encourage the right motivations to use it as an enhancer and complement to cognition, there are going to be massive returns. But for those without that—either because they don’t have the privilege or opportunities, or just because they don’t have good impulse control, they’re not very conscientious—it could result in really catastrophic de-skilling.
Some people think—and I think the evidence here is not as strong as many people claim—that since smartphones emerged, you’ve seen somewhat of a decline in people’s cognitive abilities, their literacy, their numeracy. There’s an interesting article by John Burn-Murdoch in the Financial Times where he goes into this in some detail; we can put a link to that in the video. But I think that’s potentially a really socially and politically explosive issue which we need to grapple with.
Another thing worth talking about: at the moment, people are going to school and university, and they’re trying to acquire the skills and credential which will make them valuable within the economy and society as it exists today. But AI is likely to transform the economy and the nature of work. One thought might be: up until now, it’s been very beneficial for people to acquire cognitive abilities, the capacity to succeed in the knowledge economy. But if, over the next years and decades, AI results in automation of white-collar work, automation of knowledge economy work (precisely because of the abilities of these systems), that might erode the motivation for learning those skills to begin with.
Have you got any thoughts about that? The way in which attitudes towards education should also be shaped by our understanding of how AI is going to shape the society that people will enter after they’ve left education.
Henry Shevlin
It’s a fantastic and tricky issue. On the one hand, my timelines on economic transformation caused by AI have become a bit longer over the last two or three years. One of the big calls I got really quite badly wrong is when ChatGPT launched, I thought, “This is going to revolutionise the knowledge economy. Three years from now, the knowledge economy is going to be completely different.” That was a very naive view.
Since then, I’ve done more work with different companies and organisations trying to boost AI adoption. And it’s really, really hard to get people to use AI. Not only that, it’s really hard to transform business models to incorporate AI skills effectively.
I’ll give this quick sidebar because I think it’s quite interesting. There’s a great article by the economic historian Paul David called “The Dynamo and the Computer,” looking at the impact of different technologies and how they were rolled out in the workforce. My favourite example from this paper: towards the end of the 19th century, we had what’s sometimes called the second industrial revolution. First industrial revolution: coal, steam, railroads. Second industrial revolution: oil, electricity.
You had this interesting phenomenon where factories (the second industrial revolution mostly started in the US) were using electric lighting, but they were still using coal- and steam-powered drive trains for the actual machines in the factory. This is massively inefficient because you need to ship in coal every day, run a boiler, have big clunky machinery that needs tons of gearboxes. It would be far better to shift to a fully electrified system where all your machines run on electricity. But that transition took another 20 years or so to really get going, partly because it required literally rebuilding factories from scratch.
A lot of factories were designed with a single central drive train—literally a spinning cylinder that all the machines in the factory would draw their power from. It was only when you’d sufficiently amortised the costs of your existing capital and were rebuilding and refurbishing factories that people were able to say, “All right, now we’re in a position to move to a fully electrified factory.”
I think we’ve got an analogy or parallel in terms of the rollout of AI in knowledge work. Most existing firms that do knowledge work—their value chain, their whole sequence of processes—would be completely different if they were building as an AI-first company. I think it could easily be another decade before we start to see the full potential of AI and knowledge work being applied systematically. A lot of firms are going to go bust; a lot of startups are going to scale up and become multi-billion-dollar companies. But it’s going to be a slower process than I naively thought.
All of this is to say: although the economic impacts and transformations of AI in knowledge work are going to be significant and persistent pressures, I no longer think that by 2030 no one is going to be working white-collar jobs. There’s a mismatch between what the technology can do and the actual challenges of application. We’re still going to need knowledge workers in the longer run.
But specifically which domains, what kinds of knowledge work are going to be most valuable or important—really, really hard to judge. One of the questions I get asked most often when I do public engagement work is, “I’ve got two kids in high school. What should they be studying? What should they be learning in order to really succeed in the AI age?” Five years ago, 10 years ago, you would have said coding. “Learn to code” was a meme.
Dan Williams
Yeah.
Henry Shevlin
But that’s a terrible piece of advice for many people these days. Not that we won’t need coders—probably we’ll still need some. But the proportion of jobs in coding is going to be dramatically smaller because a lot of entry-level basic coding can be done perfectly well by AI, and probably fairly soon even expert-level coding.
My slightly wishy-washy answer, but I think it’s the best I can give to that question, is: the higher-order cognitive skills. Cultivating curiosity and openness to new ideas and new tools is probably far more important now than it has been for most of the last few decades, precisely because we’re in a period of such radical change. Cultivating the kind of mindset where you’re actively seeking out new ways to do old processes, seeking out new tools, building that kind of creativity and curiosity—those skills are going to be as important as ever, or more important than they were before, as a result of the AI age.
It’s very hard to say, “If you want to secure a career in knowledge work, this is the line to go into.” As one colleague put it (and I don’t quite agree with this framing, but I think it captures some of the spirit behind your question): we’ve solved education at precisely the place and time where it’s very unclear what the relationship between education and work is going to be.
Dan Williams
That’s interesting. Some of the things you said there bring us back to the conversation we had about AI as normal technology—the idea that there’s a difference between the raw capabilities and potential of a technology and the way it actually diffuses and rolls out throughout society. My sense is AI will have really transformative effects on the economy, but I think it’s very unlikely you’re going to see full automation for several decades.
But what I do think is likely is that the ability to use AI well for the jobs human beings will be doing is going to become really, really important. That connects us back to: if that’s the case, it seems like one of the things educational institutions should be doing is thinking very carefully about how they can prepare students for a world in which AI is going to be centrally embedded in the kind of work they’re doing. And at the moment, my sense is educational institutions are not doing a good job with that at all.
Maybe to start wrapping up: if you had to give a high-level take on this overarching question—is this a crisis for our educational institutions? Is this an opportunity? Is it a bit of both?—what’s your sense?
Henry Shevlin
It’s definitely a crisis. In fact, if you want an example of a single sector in which AI has had devastating effects—some positive, but mostly negative—it’s education. This is one of my go-to responses when people try to push the “AI as a nothing burger” take. I say, “Go speak to a high school teacher. Tell them AI’s nonsense, just a nothing burger.” Their daily lives, their interactions with students, and the way they can teach have been utterly transformed in mostly negative ways so far by AI.
Certainly in the short to medium term, AI has basically broken large parts of our existing educational system—in terms of assessment, in terms of tracking. It’s very demoralising for a lot of educators and teachers. All that said, the potential we discussed earlier is incredible.
But it’s a question of how we rebuild the boat while we’re at sea. We can’t just say, “We’re going to stop education for five years, redesign the whole thing from scratch, and come up with something effective.” Managing that transition, particularly in conditions of massive uncertainty about the kinds of jobs and skills that are going to be necessary, is really hard.
One reason perhaps that I’m less devastated by this—it is a bit of a disaster—is that I think formal education has been accreting so many deleterious problems for several decades now, ranging from credentialism (I think the expansion of higher education, which was seen as an unalloyed good, has had lots of negative effects) to things like grade inflation (a really serious problem) to the ubiquity of smartphones and declining attention spans, the slippage in standards, the shift away from the more traditional model where your professors were these exalted, almost priestly caste and you hung on every word. I realise it was never quite like that, but there was more of this implied hierarchy, versus a model where students regard themselves as customers—they’re paying for a credential and they want that credential.
All of these sociological and institutional shifts have been creating massive problems in higher education in particular, but also high school. Although AI is bringing many of these problems to a head, they were problems we were going to have to deal with at some point anyway. But what’s your take—crisis or opportunity?
Dan Williams
I completely agree with everything you just said. And it’s a nice optimistic note to end on, isn’t it? This is a crisis, but our institutions of education have already been confronting all of these other crises. So it’s just adding something on top of all the other problems our educational institutions confront.
Yes, I think it’s a crisis. I think it’s an emergency in the sense that universities and other institutions of education—schools, colleges—need to be taking this a lot more seriously than they currently are.
You can’t have people in these institutions where the last time they used ChatGPT was in 2022 and they’re completely oblivious to the capabilities of the current technology. I think you’ve got a responsibility, if you’re an academic or you work in a school or college, to know how to use these technologies, because you need to be aware of what they can do. And we need to really quickly fix assessment. As I mentioned at the beginning, the take-home essay, in my view, is absolutely insane—literally insane that this is still happening. And we also have to think carefully about how to reform the way we teach and what we teach to prepare students to use these technologies.
But okay, I’m conscious of the time, so we can end on that really nice happy note: this is a disaster, but these educational institutions are already confronting a disaster. Did you have any final thought you wanted to add before we wrap things up?
Henry Shevlin
Just to build on something you said as a closing note: I think another deeper, structural problem in responding to the challenge of AI is that there’s so much interpersonal variation in how much people like, are open to, or are interested in using AI. This shows up at faculty level. I don’t know about your experience, but mine is that even in Cambridge, a lot of academics have very little interest in AI.
Dan Williams
Mmm.
Henry Shevlin
So the idea that we’re going to be an AI-first institution, or that we expect all our staff members to be at least as familiar with the capabilities of AI as their students—that’s an incredible ask. That’s a massive challenge. At a university, you can’t just fire staff who are doing brilliant research on Shakespeare’s early plays just because they don’t happen to get on with AI.
I think that’s another big structural problem: the fact that there’s so much variation in how comfortable and capable instructors are with AI. The only solution here, going back to a suggestion from earlier, is to really separate the job of education into two different streams. One explicitly builds AI in as both a method of assessment and a skill you’re trying to teach—making AI core to what you’re trying to do. And then a separate stream that is absolutely, strictly AI-free. Maybe the people who hate AI, who are not interested in AI, who are currently teaching courses—maybe they can handle the second stream. And those of us who love AI, who are super excited about it, who know as much about it or more than our students, we can be in charge of the first stream. That’s a very basic suggestion, but I wanted to flag that this is the other side of the problem.
Dan Williams
That’s great. I love that. We can end on a constructive suggestion rather than a note of pessimism. So thanks everyone for tuning in. We’ll be back in a couple of weeks with another episode.