Academics Must Wake Up on AI (with Alexander Kustov)

Is AI already better at many research tasks than humans? And if so, is this a reflection of how good AI is, or how bad much existing research is?

Dan Williams, Alexander Kustov, and Henry Shevlin

Jun 02, 2026

The political scientist Alexander Kustov recently published a Substack post with a provocative claim: that AI can already do social science research better than most professors. The post went viral. It attracted more than a million views and over a thousand responses, many of them very angry. (Some people even demanded that Alex’s university fire him.)

In this conversation, we talk about this controversy and the claims that triggered it, including:

What agentic AI tools like Claude Code and Codex can already do for research, from coding and data analysis to literature reviews, translation, and brainstorming, and why only around 20% of quantitative social scientists currently use them.
What best predicts whether researchers adopt or reject AI: ignorance, openness to experience, methodological background, or the awkward role of self-interest.
How much published academic research is genuinely mediocre, and whether the cause is laziness, lack of skill, or a broken incentive structure, with a detour through the replication crisis and some high-profile fraud cases.
Whether AI will raise the quality of research or simply flood the literature with more slop, and what journal editors could do about it.
Whether AI can be genuinely creative or only recombine what already exists, by way of Margaret Boden’s three kinds of creativity, Thomas Kuhn on paradigm shifts, and AlphaGo’s “Move 37”.
The fight over AI writing and detection tools like Pangram, and why current disclosure norms end up punishing the honest.
The angry response to Alex’s series, and what is really driving reflexive opposition to AI among academics.

Links and further reading

Alexander Kustov — Alex’s homepage, with an overview of his research on immigration, public opinion, and effective governance.
Popular by Design — Alex’s Substack on public opinion, persuasion, and the politics of getting good ideas adopted.
Academics Need to Wake Up on AI — followed by a Part II and Part III
Pangram — the AI-detection tool discussed at length, which labels text as human, AI-assisted, or AI.
AlphaGo versus Lee Sedol — the 2016 match, including the famous “Move 37” that Henry raises as a candidate for genuinely transformative machine creativity.
Margaret Boden — the cognitive scientist whose distinction between combinational, exploratory, and transformative creativity frames part of the discussion.
The Structure of Scientific Revolutions — Thomas Kuhn’s account of normal science and paradigm shifts, referenced in the exchange about AI and discovery.
“AI Is a Better Researcher Than You” — The Chronicle of Higher Education’s account of the controversy around Alex’s series.

Transcript

Please note that this transcript is lightly AI-edited and may contain minor mistakes.

Dan Williams: Welcome back. I’m Dan Williams, and I’m back with my co-host, Henry Shevlin. Today we are honoured to be joined by Bluesky’s favourite academic, Alexander Kustov. Alex is a political scientist at the University of Notre Dame and the author of one of my favourite Substacks, Popular by Design. His primary research is on immigration and public opinion, but that’s not really what we’re going to be talking about today. We’re going to be talking about a fascinating and hugely viral series he published at his Substack titled “Academics Need to Wake Up on AI,” about what AI can already do when it comes to research, and what that means for the academics who are not paying attention, which is many of them. It was very widely read, and it generated, let’s say, a somewhat polarised response. So Alex, to kick us off: what’s the central thesis of this series, and what motivated you to write it?

Alexander Kustov: Thanks, Dan, for having me. I’m a huge fan of the Substack and the whole podcast series with you and Henry. So, like some of us, I’ve been using some of these AI tools. I’ve been reading some of the other folks like yourself, and it really transformed everything I do in my life. And I should say I was also on sabbatical, so I had a little bit more time than some of my colleagues to try some of these tools. I just hadn’t really seen any of my colleagues talk about it. And when they did talk about it, they usually tried not to be vocal about it. I just didn’t think it was a good equilibrium, where basically people were using these tools to be ten times more productive and not talking about it. It really heightened this sense of inequality for me, which I do care about. You’d have a situation where someone would publish ten papers in a year and someone else would publish one, and the only difference is that the person publishing more is the one using Codex or whatever. I just wanted to write about it. And I saw that the prevailing academic discourse on the issue, especially on platforms like Bluesky, was very counterproductive.

I didn’t really say much, to be honest. I didn’t think it would be that controversial. But the biggest thesis that really rubbed people the wrong way was that right now a lot of these tools are better at a lot of the tasks that we do as professors. I’ve refined this idea a little bit, going back and forth with some of my critics, but I feel comfortable right now saying that if you look at it globally, and think about what professors do around the world, in social science and adjacent fields especially, AI agentic tools can do most of the tasks they do in terms of literature review, data analysis, and even coming up with some research questions, better than those professors on average. I think that’s a pretty uncontroversial statement at this point, but obviously a lot of people were very, very upset about it.

Dan Williams: Empirically speaking, it is a controversial statement, in the sense that it provokes controversy when you say it. In a minute we can get to the question of what AI can actually do in the context of research. But for what it’s worth, I completely agree with you that on many tasks AI is clearly better than what human beings can do. Is your sense that lots of people just weren’t aware of that fact, that they literally didn’t have exposure to these tools? Or was your sense that the reason people weren’t really talking about it is because of all the controversy surrounding the use of these tools, not just mere ignorance?

Alexander Kustov: I think it’s both, for sure. There was recent research done by Anthropic. They tried to do, not a representative survey, because obviously the population is very hard to define here, but they surveyed something like 1,200 quantitative social scientists, and the estimate right now is that about 20% of folks use agentic tools. That doesn’t seem like much at all, and if anything it’s probably an overestimate, because they’re more likely to tap into well-resourced universities. So I do think it’s both: the little uptake we have, and the fact that people who do use these tools don’t want to talk about it.

There are two things here. First, you want to maintain your comparative advantage. This moment right now is exactly the moment where, if you’re one of the few people using these tools, you can write a bunch of papers and get tenure while the tenure system is still in existence. And the other thing is that if people are very upset about anything AI-related, you don’t want to talk about it and be shamed by your colleagues. Just to give you one funny anecdote: at the height of the vitriol I experienced, where hundreds of people literally were quoting me and trying to tag my employer to get me fired, the exact same people were often DMing me and asking for my setup and prompts. So it’s very crazy to me that you have this big disconnect between what people say publicly and what they actually do privately.

Dan Williams: I find it crazy that it’s only 20% of social scientists, or whatever the exact number is, that’s actually using agentic AI. Just before moving on, maybe we should explicitly address: in your view, what is it that agentic AI, as it exists right now, can do? What are the kinds of tasks it can do better than human beings, and how can it improve the workflow of an average social scientist?

Alexander Kustov: Coding is the first thing. It’s literally in the name, Claude Code. That’s what these tools were designed for. If you talk to any coding person, a computer scientist, or even someone who isn’t a computer scientist but does a lot of coding for their work, I don’t think anyone would doubt that it’s a huge productivity improvement tool. And the vast majority of quantitative social scientists who do any kind of data analysis do a lot of coding, so they have to be very receptive to this by definition. And I think they often are.

What happens is that social science comprises a bunch of different tasks and topics that people can disagree over, depending on the field. Economics is pretty homogeneously quantitative and formal, so there you can definitely see the biggest uptake. But a lot of disciplines, like political science or sociology, are a mix of qualitative and quantitative folks. And a lot of this AI polarisation overlapped with that pre-existing divide. People who didn’t like stats, who didn’t believe in positivism, the idea that you can learn something about the social world using evidence, were also more reluctant to believe that AI is helpful for them. Which is funny, because, as I also mentioned in some of my writing, if anything those people are going to benefit from these tools, because Claude cannot really interview people and do ethnography yet. So in a way there will be more demand for very high-quality qualitative work. And there are some good examples of qualitative people I respect who embraced AI completely.

You can still use a lot of these tools to boost productivity outside the coding realm. You can write emails. One thing I think anyone would acknowledge, including the critics of AI, is that it’s definitely helping them respond to administrator emails, which no one likes, and which isn’t considered an unethical thing to do. I remember someone on a big account on Bluesky posted that AI should be banned except for transcription purposes, because they do a lot of interviews and it’s good for transcription, but everything else is off limits. And it’s interesting how the goalposts are changing right now. The biggest fight I’m getting involved in again, because that’s kind of what I do, is this idea of AI detection and disclosure, and we can talk more about it later. But there’s this interesting consensus forming that AI is good for research now, which was not the case half a year ago, from the same people. Now those people are saying AI is obviously good for research, duh, but it’s not good for writing, for a bunch of different reasons.

So we talked about coding and data analysis. There’s a lot of other things a normal researcher would need help with: getting a basic summary, a literature review, translation. It’s above my pay grade, I’m not a machine learning person, but my understanding is that LLMs are exceptionally good at translation, and the fact that a lot of people deny they can translate things well is insane to me. You can transcribe your interviews, translate your survey questionnaire into different languages, whatever you need. I also used AI a lot for public engagement recently. You can translate your website, you can create your website from scratch in a day or two. You’d be surprised how few academics actually have good functioning websites. And I’m not talking about very old people who reject technology, but also young PhD students, who you’d think would have an interest in making sure people can find them online. But no. You can just install Claude and do it in a day. The fact that people are not doing it is insane to me, and I’m trying to spread the word. I’ve convinced a lot of folks to do it, but there’s only so much I can do as one person.

What predicts whether an academic uses AI?

Henry Shevlin: It really resonates, hearing about your experiences with some academics being completely oblivious to AI and others enthusiastically adopting it. When I’ve visited different businesses and universities, I sometimes literally see the same people doing exactly the same job, maybe even sitting at the same desk, one of them doing amazing things with AI and the other one not using it at all. I’m curious whether you’ve got a sense of what the best predictors are for whether someone is an AI user. If you could only know one thing about someone in the social sciences in order to predict whether they were a big user of agentic AI, what kind of things do you think predict it?

Alexander Kustov: That’s a very interesting question. I’m pretty sure there’s some kind of deep personality thing. Everything goes back to personality, whether it’s socialisation, upbringing, or even genes. Openness to experience probably jumps out to me as one of the first predictors. My initial thought was that, since this AI debate overlaps with the qualitative–quantitative debate, people who are methodologists, econometricians, or psychometricians would be much more likely to adopt AI. On average that’s true, but it’s not completely lopsided, for some reason. In fact, there are some very interesting examples of people who were very good methodologists, developing their own regression models, doing machine learning, who then got very sceptical about LLMs. One way to think about it is that those people actually know more about these tools, and so their scepticism has more value, and I’m trying to be attuned to that.

But there’s also something about self-interest. Previously you were this privileged person, where the whole department would come to you to help with methods or regressions. I was a person like that in my department back in North Carolina, where people would come to my office and say, “Alex, can you help me with this game theory model?” And I’m not even really a methodologist. So it really depends on the comparative advantage people have. Now basically anyone can go to Claude Code and try to do a very fancy analysis. Obviously you have to know something, you have to know what to ask, but these models are exceptionally good at giving you the basics. If I’m not really good at geospatial statistics, for example, I can go to Claude, do some spatial regressions, and learn about it on the spot. Previously I would have had to go to some spatial colleague in the geography department for that. Now it’s just much easier to do it myself with my computer. And it’s probably going to be as good or even better.

Dan Williams: I think for all of these areas within quantitative social science, from coding to data analysis to literature reviews to writing, which we can return to in a bit, the quality of writing you can get from these models is really exceptional. But I’d also point to things like brainstorming. I’m a philosopher, Henry’s a philosopher. I don’t do quantitative social science, so I do research that’s constrained and informed by empirical research, but I don’t actually collect data. In terms of having access to a very smart interlocutor who you can literally prod, telling it, “give me the three strongest objections to these ideas I have for a paper,” and use that as the basis for thinking through an idea, that’s such a huge advantage, even when it comes not to quantitative social science but to theory construction and many aspects of qualitative research. I’m really baffled by, well, I somewhat understand people who just haven’t used these tools, or whose last use was in 2023, so they’re just ignorant. But I’m really baffled by anyone who’s actually used the paid version of Claude or ChatGPT and doesn’t understand the extent to which they can improve your ability to think through topics, understand things, and get information. Henry, are there any ways you use AI in your research, and in how you think about topics, that we haven’t touched on already?

Henry Shevlin: For me, the primary use case for AI systems is always just learning, and learning about new topics. Being able to ask questions and verify my own knowledge has been a game changer. Although I will say it’s also been a massive time sink. I’ve gone down so many rabbit holes that I probably would not have prioritised if I’d had to dig out articles. But in some sense it’s been good for my education in the round, even going down those rabbit holes. Otherwise, I find it useful for summarising and making sense of my data: getting a whole bunch of research papers and using Claude Cowork to create summaries of them. NotebookLM is also very useful in its own right for dealing with defined archives. The thing I really need to do this year, and the thing I’m most looking forward to, is getting a good agentic workflow for dealing with email. I already use Claude for drafting quick emails that require a certain degree of precision but don’t involve any warmth or human feeling. But being able to have an email assistant is something I’m looking to build in the next couple of months.

Alexander Kustov: A few thoughts on that. Brainstorming is a big one. I did mention it, because that’s the default way you should use these tools: to have a very smart person to talk to, especially when you don’t have access to your colleagues. It’s a really good substitute. Even setting aside the question of whether LLMs can generate great novel ideas, you really just want a conversation partner who can rehash old ideas and tell you why you’re wrong. The reason people don’t realise this is that a lot of folks don’t do the very simple thing of paying for a premium subscription and installing one of these agentic tools. It’s happened to me several times: I’d talk to people about an agentic tool, I’d specify Codex or Claude Code, “do you have it, have you used it?”, and people would nod, and then five minutes later in the conversation it turns out they completely missed that part and still think about the chatbot thing. So a lot of people are confused about this.

What I find helpful, at some of the workshops I’ve done and that others have done, is that you just sit with folks and install one of these tools for them, and ask them to do one simple task that’s good for their career. Like create a slide deck. A lot of people are still amazed that you can create a slide deck much better than the average academic slide deck in a minute. That really changes people’s minds; it’s mind-blowing for a lot of folks. Or, I don’t know if you’ve had this experience, there’s this Refined service where they do peer reviews for papers; I think an economist created it. I’m not a huge fan of it, because it costs $50, but the first one is free, and there are a lot of free systems that can imitate this exact functionality. I’ve seen several of my colleagues at Notre Dame use the free upload for one of their papers, and they received the best feedback they’d ever had in their lives, the kind you’d never get at an average academic conference, and they got completely converted overnight. It really takes one magical event for people to understand that something is definitely going to change very, very soon.

How much academic research is actually any good?

Dan Williams: Just to double-click on one thing, this question of what drives the differences in how people view AI. One thing we haven’t really touched on, but which I think is very important, is how you view the nature of research and what you’re even doing as a researcher. Whether you view it fundamentally as being about producing the best output possible, or whether you view it as some journey of self-discovery, exploration, and authentic engagement with the material. I think the latter model of research is very threatened by the idea that you would integrate these AI tools into it. If you’re ruthlessly focused on how to produce the best outputs possible, as evaluated according to relatively objective metrics, then I’d speculate you’d be much more disposed to make use of whatever tools help you do that, including AI.

But this connects to another thing that’s just come up in what you said, Alex, which is something you write about in this series. So far we’ve been focusing on how good AI is. There’s this other side to it all, which is how bad much actually existing human research is, even before we talk about anything to do with AI. You get into this a lot in the second and third essays in the series. Do you want to say a little about that, about how that other side factors into how you’re thinking about this topic?

Alexander Kustov: The third installment of the series basically came to me while I was at a political science conference, or rather an interdisciplinary conference called ISA, for international studies specialists who study relationships between countries. And it was really bad. Big academic conferences are always bad; that’s something you expect. There’s so much money, including public funds, spent on all these conferences and travel for people from around the world. But if you ask a regular academic, forget about AI, they would tell you they don’t expect to get good feedback, their panel is going to be completely empty, and the reason they do it is because they have to spend their $2,000 travel fund and potentially hang out with some friends and do some networking, which is not bad. Networking is a huge part of conferencing.

So I was sitting at this conference, seeing really bad presentations where people would have tons of grammatical and logical mistakes and senseless research questions. It’s bad both in terms of substance and execution. And exactly at that moment I was getting all this vitriol for saying that AI can do better stuff, like slides. I was like, no, this is just a huge disconnect. I started thinking about that. The issue I see is very pronounced in the conversation around self-driving cars, where people compare them to some ideal in which there are no accidents and no one dies. When a self-driving car runs over a cat, it’s a huge news story, but humans do that every single day, in their hundreds, and we don’t care, because we accept that humans are fallible and bad and not doing good work. I think it’s the same with academic work. The vast majority of what professors produce globally is just not good and does not contribute to human knowledge.

For some people this can be even more controversial for me to say than anything I said on AI. A lot of people view this world from their own parochial angle of being a research professor in a top-tier American or British school, or Cambridge. But the vast majority of folks are not like that. I experienced academia in the post-Soviet world where I grew up, and in most cases people just want to get by. They publish in some predatory journal with a random, rehashed argument that probably reinvents the wheel and doesn’t really contribute. No one’s going to read it. We know that 80% of published papers in the humanities are never cited, and probably never read either, except by your editors or reviewers. And as an associate editor of a journal, I can tell you I doubt the reviewers actually read some of the papers they review. So compared to the actual status quo of what’s happening right now, automating it all and using AI tools mindfully and responsibly is going to be a big win.

Another problem is that we have this binary thinking that it’s either/or: we either do one-shot papers that aren’t good, or we don’t do anything. But you can write your own paper, do your own slides, and then ask your AI agent to help you brainstorm, create a graphic, or redesign your graph. People might disagree on the details of what’s more acceptable and useful, but at the end of the day there are so many use cases for these tools that are completely uncontroversial at this point.

Henry Shevlin: Just very briefly, I think it might matter whether academia’s problems are due to things like laziness or just not caring, versus a lack of skill. I’m curious whether you have a theory about where these problems in academia come from. Is it the fact that a lot of people are just really bad, for the most part, at doing data analysis, for example? If so, then AI is amazing; it’ll lift the floor. But if it’s that people just want to commit fraud and do whatever it takes to get ahead, then maybe AI isn’t going to make the situation better, or could even make it worse.

Dan Williams: Or a third thing: it could just be the nature of the institutional incentives. I feel like a lot of what’s behind the replication crisis, the reproducibility crisis, the generalisability crisis, and so on, is not so much that people are lazy or unskilled. It’s that you can get ahead and win the status game within academic research by engaging in shoddy research practices, and as a consequence that’s what you get: a lot of shoddy research practices. But a lot of those findings that don’t replicate were done by really brilliant, energetic, ambitious scientists. It’s just within this flawed incentive structure.

Henry Shevlin: Brilliant, energetic, but perhaps not fully scrupulous.

Dan Williams: Yeah, but you can’t rely on human beings to be scrupulous. You need the incentives set up in such a way that even by default unscrupulous people will be driven to act in pro-social, beneficial ways. That’s my cynical perspective. What do you think, Alex?

Alexander Kustov: I’m going to say something very controversial: I want to believe that people are good by nature. At least, my knowledge of evolutionary psychology tells me that even those people who commit all these bad practices at least want to believe they’re doing something good. They’re often motivated by good things, with some exceptions; there are some people who are truly evil. But even if we take some of the most famous fraud cases in academia, like Francesca Gino at Harvard, I think the way it probably works is that you start by doing some research you care about, it gets picked up by the public, you’re very successful, there’s a lot of demand for what you do, and then you get some uncomfortable result and you tweak it a little bit. There’s all this literature about p-hacking, where you have some theory you want to prove, and when you have to make a choice between presenting model A and model B, you unconsciously choose the model more in line with the result. You can even justify it to yourself, that this model makes more sense, that it’s obviously much better. And any individual case might be right. But in aggregate it doesn’t lead to good outcomes.

I also think there’s a lot to say about the incentive structure in academia. Right now you really have to publish or perish, still, despite the fact that we can talk about whether the journal model is going to be sustainable in the near future. You have to publish a lot, no matter what your field is. Which means that if you have to decide between doing a better job with data analysis and spending a year on it, you’d probably spend less time on it and publish as soon as possible. You’re not really incentivised to replicate data. It’s very hard to publish critical responses and replication studies, and we have all this evidence that failed replications are usually much less popular and less cited than the original studies that have been disproven. Another thing I’ve been talking a lot about is public engagement, where you’re very rarely rewarded for actually spreading the knowledge of what you do, because that’s not something your dean would appreciate. So people default to publishing shoddy papers no one’s going to read. And peer reviewers don’t really check your data in most cases. When I submit my paper to a political science journal, people take it for granted that my analysis is legit, and they quibble about the framing or some other superficial thing.

That’s one of the reasons I’m so concerned right now about this whole Pangram hysteria, because people are going to be looking for em-dashes or whatever instead of the substance of the underlying claims. I see the Bayesian argument that if something is clearly AI slop, it probably also doesn’t have good data in it. But knowing modern AI tools, if you ask Opus 4.8, which just came out, to create a report on some topic with publicly available data, I’m pretty sure it’s going to be able to download things and create a chart that’s probably more legit than a chart you saw published in an academic paper four or five years ago. Even if the prose isn’t as good, and we don’t usually have good writing in academia anyway, it’s going to be more human than em-dashes or “it’s not X, it’s Y.” So I think it’s a combination of all those things, but I do want to believe that very few people actually want to commit fraud.

Dan Williams: Let’s definitely talk about this writing thing. But just on this previous point about incentives, I agree that people aren’t sadistic and don’t go out there thinking they want to do bad things. I just think academia is a status game with certain norms and institutional procedures. People are often ferociously ambitious, and they do whatever’s going to get them status, prestige, and recognition, as that’s defined and understood within academia. All the human slop produced in the context of academia is just because the incentive structure is messed up. You can rack up lots of status by churning out a load of crappy, non-replicable findings that don’t add anything to the academic literature. But if that’s your model of what’s going on, you might think: well, if the problem ultimately is not to do with human beings being lazy or unskilled, but to do with the incentive structure, then why would merely giving us access to AI improve things? You might think all that’s going to happen is people will play the same status game, but do it a lot quicker and at lower cost, and we’re not actually going to advance the frontier of knowledge, because all the same structural causes of bad research are still in play.

Alexander Kustov: I think that’s a key question. We’re facing a forking path of some sort. You can imagine a scenario in which the future is as bleak as you just described, but with more slop. That’s the problem I see with what might happen: take all these bad incentives, give this miraculous tool to researchers, and instead of one paper per year they’d produce ten that don’t lead to anything productive. It just inflates everyone’s expectations and creates more problems. But there’s an alternative scenario, and I think it’s still in our hands to do something about it. Instead of increasing productivity in terms of quantity, we can use these tools to increase productivity in terms of the quality of research. Since you can now generate something very simple in a minute, you really have to do something better than a shoddy regression with no account for endogeneity concerns, or rehashing the same exact philosophical argument people have been making for years and years. So there’s a way to do better with these tools, and it’s in the hands of current journal editors to raise the standards, do more desk rejects, and say the quality bar is now much higher. I think it’s already happening somewhat, and it’s something we can consciously decide to change.

I also have some hope for the frontier models. There’s been some interesting research showing that when you explicitly ask a model to p-hack, it doesn’t do that. You can jailbreak it, so to speak, and say “please, please, I really need that,” and it’ll do it sometimes. But with those basic guardrails, they’re going to help people, because no one is consciously justifying p-hacking; people don’t like that. When the model refuses to do it, it’ll make them think that maybe they should do something different. So I have some hope. But obviously it depends on what happens to academia in five to ten years and how the models develop. We’ll definitely have to redesign the incentive structure, because I’m not sure the number of papers you have is the best indicator of what you’re trying to do. The paper itself as a format is a weird thing, because now you can also have updated dashboards with new data. It seems like a very outdated format, at least for some arguments, but it’s not like we have a better equilibrium yet. A lot of things are in flux right now, and I don’t have a simple solution.

That’s the whole point of my series: I wanted people to start talking about it. I think it did help a little. I’ve gotten calls from deans around the country, and I’ve participated in panels where people have a university-wide conversation about these things, and a lot comes to the surface that people aren’t aware of. There’s definitely some hope, because a lot of the people in positions of power right now, the older, tenured, full professors, don’t use these tools. According to that poll we discussed, it was 20% overall, and I think it was about 9% among full professors. Some of those folks might not be reachable, or they might not care; they just want things to continue the old way. So we definitely have to do something about that.

Will AI make academic inequality worse?

Henry Shevlin: Do you think there’s a risk that we see growing academic inequality, a kind of rich-get-richer effect, where the most prestigious, maybe not the older generation but certainly rising scholars with their own brands, use AI to put out twenty times the number of papers? We’re living in a tide of slop, but those with good reputations or good brands dominate. That might not be disastrous in every way, but it might lead to highly unequal outcomes within academia, with less well-known or less skilled researchers being completely left behind.

Alexander Kustov: There are several things that pull in opposite directions here. In theory, and I don’t think I’m making an original argument, a lot of people have written about this, there are certain equalising things coming out of all this. For instance, the ability of these tools to translate things. If you’re a non-English speaker, it’s much easier for you to write those papers now, which is a huge productivity boost, and from the perspective of science it means we’re going to be able to get all those talented people and their arguments from all over the world, regardless of where they come from. And at least for now, the premium subscription is one or two hundred dollars, and people in most major universities can afford it, even in developing countries. It’s not equal, but whether you’re at Harvard or a community college, you can afford a $100 tool, at least for some time, and presumably you can do exactly the same thing with it. Compared to the status quo, where as a community college professor you have to teach five classes a semester with no research budget, while at Harvard you don’t have to teach at all for the first two years and have a $200,000 startup, that’s a very big difference. So there are some equalising things going on, and it’s important to acknowledge that.

But you’re right that it’s also the case that the people able to use these tools most productively and efficiently are the people who already have a lot going on. Even though I’m very sceptical of the idea that LLMs can’t come up with new ideas, because in general new ideas are recombinations of old ideas, I do think you have to have a coherent set of ideas and goals of your own to be able to utilise these tools. It’s really all about your creativity and imagination. Every single day I see someone post something they did with Claude and think, “wow, I hadn’t thought of that.” Just yesterday someone posted about this idea of making your papers machine-readable, and I converted all my PDFs and my website to Markdown with all the figures. I think everyone should do this. I could have done it last year, I just hadn’t thought of it. There are a lot of things like that where you really have to have good ideas to begin with. So people who already have a whole research pipeline and some budget are now able to execute it much faster. This rich-get-richer dynamic is definitely going to happen. And it could intensify in the future, if those models become much more expensive. My understanding is that right now it’s all subsidised, and the $200 model is actually going to be a $2,000 model. Then only the Harvard people are going to be able to afford it. So I just hope Notre Dame is going to be part of that.

Can AI be genuinely creative?

Dan Williams: This point you made, Alex, also in one of the essays, about creativity and what’s really going on when it comes to coming up with new ideas in science, I was a little sceptical of. It’s a surprising feature of state-of-the-art AI today that, given how smart these models are in some sense, and given the vast knowledge base they have, they don’t really seem to make discoveries of a really new and impressive character. There are potentially some counterexamples, but my sense is you might think of this roughly in terms of the philosopher Thomas Kuhn’s distinction between normal science and paradigm shifts. Normal science happens within the context of a paradigm, where you have relatively well-defined problems and puzzles; maybe the Erdős problems fall into that category in maths. I suspect that for that kind of thing, AI as it exists today, if you prompt it the right way, can be used to help make progress. But when it comes to true creativity, the sort you find in really bringing about paradigm shifts, moving outside the space of predefined problems, reconceptualising an entire domain, and coming up with radically novel theoretical insights, I actually think AI as it exists today doesn’t really seem to have that capability. And that potentially tells us something interesting about the limitations of the models. I’m interested in what you think, and also in what Henry thinks about that view.

Alexander Kustov: Henry, you can start.

Henry Shevlin: On one hand, you might point to something like transformative creativity. Margaret Boden has this breakdown of creativity into three categories: combinatorial creativity, recombining existing ideas or elements to create new things; exploratory creativity, where you’ve got a predefined dimensional space and you’re going to bits of it that haven’t been mapped out yet; and transformative creativity, which is completely upsetting the apple cart, developing new dimensions. People point to Picasso or Einstein as examples of that kind of transformative creativity, and often say AI can definitely do the first thing, maybe can do the second thing, but it’s not clear it can do the third thing. That’s maybe one way of putting your point, Dan. It’s certainly true that we’ve not seen any dramatic scientific breakthroughs that have been primarily AI-driven as opposed to AI-assisted.

One reason I am a little optimistic here, though, is that in other domains, most notably Go, there’s the famous “Move 37” in game two. In case anyone doesn’t know, and I think we’ve talked about it before on the show, this is in the second game between AlphaGo and Lee Sedol, the Go world champion, back in 2016. AlphaGo made this bizarre move that no human player would make or had made in the past, and yet it was really effective. The system knew what it was doing, and this has now been incorporated into the way human players actually play Go. So I think that’s probably a pretty strong candidate for a genuinely transformative piece of creativity, at least if we’re classifying it by its impact rather than its process. That’s obviously a very different domain; you’re operating with very well-constrained rules and goals that maybe allow for that kind of transformative creativity. But I am optimistic those kinds of transformative leaps could eventually come from AI systems, even general-purpose ones like LLMs. What do you think, Alex?

Alexander Kustov: I really like this distinction between combinatorial creativity and the other types. Combinatorial creativity is definitely something LLMs are really, really good at. It’s kind of similar to translation: you mix and match different things. I’ve definitely seen a lot of really cool ideas come out, on the immigration stuff I work on, from LLMs, when I was doing brainstorming. This is undeniable at this stage. When it comes to transformative creativity, I wonder whether the reason we don’t really see it much is because we don’t really have AGI yet. I know you’ve talked about AI consciousness and all those questions. Maybe if we let the model think for itself and live in the wild, it’s going to happen. But right now, for most people, they set up a goal themselves for these models. Maybe that’s exactly why we don’t see transformative creativity, because you can’t just set up a goal and have it come up with something transformative. You have to specify the goals, and the goals are usually specified by people who can’t really do the transformation themselves.

But going back to Dan’s point about the paradigm shift, I do think we’re in this stage right now where, even if you concede that AI can’t have transformative creativity, just because we can now offload all this grunt work to AI, including email and all the other stuff that takes a lot of time, we can do other things that are creative and potentially transformative. That’s what I see with myself: I’m spending less time on administrative stuff and email, and more time brainstorming my ideas, talking to people, and doing really valuable networking and public engagement, which I’d never be able to do otherwise.

Dan Williams: We’re in this great space at the moment where you’ve got incredibly smart, helpful AI tools, but you don’t have truly transformative AGI. So there’s still a role for human insight, judgment, and creativity. If that gets taken away over the next several years, that’s a very different kind of situation. I think there’s definitely a chance that by 2030 we have AI systems that can substitute for everything human beings can do cognitively. And then that’s a very different kind of world, and a very demotivating kind of world in some ways.

AI writing, detection, and disclosure

Dan Williams: Let’s talk about writing. We’ve touched on this a few times already, but I know you’ve got interesting things to say about it, Alex, and potentially quite heterodox views. At the moment, more and more people are using AI to write. There are also these AI detectors. I think Pangram is the one which seems to be used the most, or that people trust the most. It’s got a very low false positive rate, as I understand it, although I’m not entirely sure how they go about establishing that. Many people think that if you use AI to write something, whether it’s a blog post, a novel, a poem, or an academic article, and it’s found out that you’ve done that, you’ve done something really bad and discrediting. My understanding is you don’t see it that way, Alex. So what’s your view?

Alexander Kustov: A lot of it goes back to this idea of disgust sensitivity, talking about personality traits. There are some things people just think are “yuck” for whatever reason. It’s totally subjective. I don’t think you can really rationalise it; I think it’s some ground truth. I should say I’m coming to this from the perspective of someone born in the Soviet Union, where the Russian culture is very literate and people take a lot of pride in using proper grammar and speaking properly. I see a lot of parallels here with the previous wave of grammar Nazism, where people would ignore the substance of what you’re trying to do and point out typos, or “whom” instead of “who,” or the other way around. Obviously it has some function and might be useful in some respects, especially when you’re in school, but it takes up a lot of energy. My worry is that this whole new AI detection situation is going to be similar, where people spend a lot of time on very superficial pattern recognition. Right now you look for em-dashes and some other patterns and try to decide whether something is worth reading. That’s the common justification for Pangram use, that you want to make sure what you’re reading is worth it.

The problem is that even within the realm of human-made writing there’s a lot of slop, and you’re not going to be exposed to and won’t read 99.9% of it. Given the trajectory of the tools, I’m not sure that knowing something is AI-generated is necessarily worse. A lot of it is about the status signals people have. I personally don’t like very clear AI tells either; it rubs me the wrong way. But who am I to judge? What if it’s a non-native speaker, and the counterfactual to me reading their AI-generated text, which is potentially thoughtful, is just not reading it at all, because they can’t speak English well? People don’t think about it this way. They compare AI-written text to the best, to Shakespeare. I don’t think that’s the relevant comparison. Most of the text people write is not good, and to the extent that some people can improve it using AI, I think that’s good.

Practically speaking, if you’re an academic and you want to write more and you’re afraid of others calling you out for using AI, just use a style guide. Use a CLAUDE.md or AGENTS.md file to tell it not to use those phrases. Tell it multiple times, because it still adds em-dashes. But there’s a way to use AI for writing in your own voice, and I think it should be morally justifiable, depending on the realm. One thing I’ve been thinking about, and I’m going to workshop this idea with you, is that there’s a spectrum of the ethical justification of whether it’s okay to use AI for writing.

Clearly we can think of some examples, like a student assignment that needs to be human-made; when it’s AI-written, it’s a failed assignment. That’s a pretty clear case. The way professors think about this mostly comes from detecting their students cheating, and that’s why they think about it that way. But it’s a very rare scenario. In fact, a lot of professors right now encourage their students to use AI. I recently talked to some colleagues who teach stats classes; they produce a regression paper in ten minutes on their computer and tell their students, “that’s something I can do in ten minutes, so you should do something better than this,” with AI or not. That’s a pretty good educational approach for some situations.

Another example I mentioned in one of my posts is that when you go to a live concert, there’s an implicit presumption that it’s going to be a live event and people are going to be singing themselves. If you notice and catch them not singing and using some device, that’s not cool. The same thing here: if you’re paying for someone to write you a human-made letter, a condolence email, it’s totally fine to be upset if they use AI for it. That’s totally justifiable. But on the opposite side of the spectrum, when you get a very formulaic email from your administrator, I think it’s totally justifiable to outsource that to AI, to your agent who knows your schedule and what you’re going to do, and no one’s going to be upset about it. People disagree on the margins of what’s acceptable. When you create a graph with data you worked on and understand, and you ask AI to describe it, I don’t see the problem; it’s probably going to be more accurate than most humans. Maybe we can have a social norm where if you say “I feel,” then it should be you who says that, as opposed to Claude. We’re still in this limbo where the norms aren’t clear, but we should be clear about what’s good and what’s not. It’s very hard for me to make a blanket statement that AI writing is good or bad; it really depends on the particular scenario. There are scenarios where it’s totally uncontroversial to say it’s okay to use AI, and scenarios where it’s totally uncontroversial to say it’s not. But the middle ground is what we’re trying to figure out right now as a community of knowledge.

Henry Shevlin: I’m curious: Dan, how much of a hatred for obviously AI-generated text do you have? I have to say, I’m generally pretty AI-positive. I’m a very heavy user of AI. But I do definitely downgrade my assessment of text when I realise it’s just obviously AI-written. There are a few things going on there. One is that it’s not even so much that the text is AI-written, it’s the AI voice, the very specific voice. I just think it’s such a boring voice at this point. It’s so homogeneous. If someone wrote a brilliant comment or a brilliant reply to me on Substack or Twitter, or sent me a brilliant email, and I subsequently found out it was AI-generated, I don’t think I would care. But this one specific, overfitted, “it’s not X, it’s Y” just drives me up the wall.

I guess I’m focusing just on the question of whether there are, even setting aside those stylistic issues, specific contexts in which AI usage itself might be problematic. Alex, I love your example of bands lip-syncing. Another silly one is that a handwritten note does mean a lot more than a generic email, so sometimes it is precisely the effortfulness that makes the difference. But I also wonder whether, to some extent, we’re misled into thinking the average quality of AI-generated writing is worse than it is, because of what I’ve heard called the “toupée phenomenon.” Everyone thinks wigs look so bad, and that’s because the only wigs in your sample are the ones you can tell are wigs. If they’re good toupées, they don’t even make it into your sample. So in the same way, I think probably all of us are reading tons of AI-generated text that we’re not clocking as AI-generated.

Alexander Kustov: Yeah, there’s definitely survivorship bias. With the first post in my AI series, one of the reasons it got so controversial is that I used Claude to generate 99% of it, and I didn’t disclose it right away, and then I did after the fact, and Pangram gave it 100% human. So that’s a false negative, which is not a huge deal, but it’s interesting. A lot of good writing is AI-assisted right now; we should just take that for granted. When we see something bad that’s clearly AI-written, it’s just those particular instances. The strongest argument I’ve heard for being upset about it is that if someone doesn’t bother editing the text, or even creating a style sheet to make sure they don’t use all those constructions at the same time, it probably means the underlying substance isn’t good either. But I’m not sure how true that is; it really depends on the context.

The problem with social media comments, when you see something clearly AI-generated, is that it’s also not clear whether it’s a bot or a real person using AI to voice their opinion. But if you know this person and their account isn’t hacked, and they have some AI writing tells, I think it’s fine. I’m also not very happy to see a lot of clearly AI-written stuff, but I’m trying to rationalise it in a different direction and think about why it’s actually a problem. I’m not sure.

Dan Williams: I think that ultimately, in contexts that have to do with academic writing, or people publishing their views and participating in debates, you should just be judging things on the quality of the contribution rather than its provenance. It just so happens that, at least when it comes to the AI writing that I detect, the quality is bad, for the reasons we’ve discussed. I just hate the style of writing you find with these models. I find there’s something really cringe and annoying about it. But that’s not a necessary feature of AI writing; it’s just the way the current models have been post-trained to produce a particular kind of style. And to Henry’s point, if I discovered that, for example, my favourite blogger, Scott Alexander of Astral Codex Ten, who I think is the king of Substack, had been generating his posts with AI over the past two years, well, I think those posts have been amazing. So I wouldn’t think, “now that I know it’s AI-generated, I’m going to retract that assessment.” That would be ridiculous to me. So in principle we should be judging things based on the quality of the output, not the provenance.

But then there’s this separate question about disclosure norms and what they should be, and you make this really important point, Alex, which is that at the moment there’s a problem with disclosure norms: they end up just punishing honest people. Because if you come out and say you used AI to write something, as you did with your first post in the series, there’s a massive backlash. So if you’re honest, you get this huge reputational damage associated with doing it, which is going to discourage people from being honest, which means the dishonest people get access to the benefits of AI-generated writing without any of the reputational costs. As an equilibrium, asking for disclosure norms just doesn’t really seem either desirable or possible. Firstly, is that an accurate summary of your point of view? And secondly, do you still think that’s basically the correct point of view when it comes to disclosure norms?

Alexander Kustov: As a newly minted associate editor at a journal, where we’re probably going to expect a surge in AI slop that we have to deal with, I’m very cognisant of the potential problems. Right now the go-to move among people doing journal editing, and probably what we’re going to do in our journal, is to introduce checkboxes for AI use. My sense is that we’re going to do that, but no one’s going to care, because no one’s going to report it truthfully. This is one thing where I strongly disagree with Kelsey Piper, who I deeply respect: I really don’t think it works out, at least for academics, especially in this environment where people feel very strongly and viscerally about this. Coming out and saying you use AI is just not going to do any good for anyone.

Another issue is that some people are completely anti-AI. If you disclose that you used AI for, say, research assistance with data collection, as opposed to writing, which is going to be worse for them? Any checkbox you have there is probably not going to satisfy them. So it’s strange to me that this is the solution we came up with. I can see how honesty can be rewarded in some contexts, and I’ve seen people on Substack say they used AI for help with data collection or writing. As a quantitative social scientist who primarily cares about data and quality, I’m surprised people think it’s more okay to use AI to collect and analyse data but not to write about it, because the first part is much more important. So I’m really going back and forth on it. But it’s hard for me to come up with a scenario where AI disclosure is actually going to work and solve anything.

What needs to happen is for us to change some of those norms. The same way we’re upset with AI tells, we’re also upset with how Gen Z, or whatever the new generation is, writes without capital letters. I can’t stand it, but that’s how people write, and who am I to judge? So I understand why people want to judge the quality of the substance, but they use the shortcut of the style of the prose to substitute for the quality. Going back to Henry’s point, what’s going to happen is that people are going to use your previous reputation as the main marker of whether you do something valuable. That’s why I feel for incoming grad students and newly minted professors, because it’s really hard to establish your reputation now with all this stuff happening with AI. Whereas if you were Daron Acemoglu, the most cited economist in the world, who wrote a hundred papers before AI was cool, there’s literally nothing he can do with AI or without AI that’s going to change your opinion about him. So people are going to be using these shortcuts more, which means that, from a certain ethical perspective, people are going to be discriminating more based on hopefully mutable but also immutable traits. That’s another thing to consider: people are going to be more trustful of their ethnic in-groups, or people who went to Harvard or work at Cambridge, than minorities. So there are interesting questions coming up about all that. I don’t have any solutions, unfortunately.

The backlash

Dan Williams: Should we come full circle? We touched on this at the beginning, but there’s the content of what you wrote in the series, and then there’s the response to what you wrote, a lot of which was very angry. You mentioned you had people calling for you to be fired. A lot of that came from people on Bluesky. We’ve talked a little about the Bluesky intelligentsia previously on the show, and I’ve written about it on my blog as well. Firstly, do you want to say a bit more about what the reaction has been in general? You’ve touched on it here and there, but summarise it. And do you think there’s a way of steelmanning it? What’s the best possible case for why some people get so furious, so angry, with this kind of stuff?

Alexander Kustov: I had some conversations. I haven’t lost any friends, so that’s one thing I should say; I haven’t gotten cancelled, because I’m tenured. I specifically waited for all my hot takes to happen after I got tenured, and maybe that was a good idea after all. But I did have some conversations with some really good friends who disagree with me on AI, and it definitely helped me refine some of my points. The biggest criticism I received that I see some relevance in is the idea, going back to our conversation, that humans are really not that good, and the concern that giving them this AI tool is just going to amplify all the bad stuff. To the extent you want to encourage norms of no p-hacking and doing really good, careful work, just telling people you can produce a paper with AI easily is not a good thing to talk about.

There was a recent big thing on Twitter about the practices of academic citations, where someone was saying that in practice academics don’t really read the stuff they cite well, and a lot of people interpreted it in a moralistic way, saying no, you should read what you cite. So you have the same thing here, where people interpreted my arguments in a normative way, that I’m saying they should do something or not do something, and it was against what they were trying to do. It’s also about this idea you mentioned about the role of academics as a kind of vocation, where you explore the world and self-actualise. I don’t think that, when people actually think it through, they would defend it on the merits, but implicitly that’s how a lot of academics think about their job, and to the extent that we now have tools that threaten that, it’s just not going to end well.

Trying to steelman the concerns people generally had, some people thought strategically it’s not a good idea to be vocal about it right now, in this moment. As someone who does a lot of work on immigration, I very much disagree with that, because I think we lost voter trust, as liberals and mainstream institutions, on immigration exactly because we were not saying certain things, and the same thing can happen on AI. It’s never a good idea to have a strategy where you do something that’s supposed to be good but that you don’t want other people to know about. I just don’t see how it works out in equilibrium. But I also see some argument that maybe in this particular moment it was not the best time to talk about it. That’s what I got from a lot of that.

Dan Williams: Henry, your microphone’s not on.

Henry Shevlin: Sorry, I keep making that mistake. I was going to ask whether there could just be a straightforward economic analysis here about why the current anti-AI coalition has the shape it does, namely that elite knowledge workers are overwhelmingly liberal, and AI predominantly threatens elite knowledge workers. You could maybe draw a parallel with the way most of the opposition to climate action is concentrated on the right: speaking very crudely, manual workers in the US context skew a bit more to the right and maybe work in more energy-intensive industries. But I guess the question I’m asking is, is this just about the economics with a social gloss over the top? What does that explanation miss?

Alexander Kustov: Some of it, for sure. But a lot of my work in public opinion says that a lot of people’s preferences are sociotropic, based on their ideas about what’s good for society, not necessarily their self-interest, unless it’s really in your face that it’s going to be bad for you. One interesting contingent of haters I had on Bluesky was professional translators who were very upset with my takes on the fact that AI can translate things. I had this silly example, which is a true thing, that my mom wasn’t sure about a prescription she got from the doctor, because it was all in English, and she translated it with AI. Someone was saying that I’m exposing my mom to potential harm because she didn’t use a qualified certified translator, and the person saying that was a certified translator. So you see some connection there. But for the vast majority of folks, when it comes to academics who produce a lot of critical theory slop or DEI slop, AI can do this much better than them, but I don’t think they realise it. So there is an objective threat to their self-interest, but the reason they oppose AI is because they have a lot of other bad ideas.

Dan Williams: There’s also this thing that I think Dean Ball calls the “omni-cause.” You’re against AI, but that means you have to be against AI in every possible respect. And if you point out one area where AI can actually be quite good, people draw inferences about you, that you’re not on the right team. I found this a couple of months ago, when I wrote some essays, and I was at a workshop where I argued that, relative to the actually existing alternatives, like social media pundits and a lot of legacy media, large language models actually are a pretty good source of high-quality information, that they’re a force for truth. There was a lot of negative response to this, which in my view is a fairly obvious thesis. Afterwards I was getting this response: “so you’re pro-AI.” To me that’s just such an unsophisticated way of thinking about it. I’m really worried about many aspects when it comes to AI, when it comes to power concentration, the economic impact, and how we’re going to cope with it. It doesn’t mean that with every single question you have to think AI is bad in every single way. I sense that, especially on the left, there’s this reflexive opposition to AI, this view that any claim that AI can actually do anything useful or have positive consequences is viewed as a betrayal. Okay, we’re coming to the end, Alex. Was there anything else you wanted to talk about that you didn’t mention?

Alexander Kustov: Yeah, related to the last thing you mentioned. I don’t know if you saw it, but after getting all this vitriol on Bluesky, there were a few days where I got positively retweeted by hundreds of folks, because the thing I said was that we should ban electronic devices in all classes. When it comes to teaching, I’m much more pessimistic about AI. A lot of people were like, “what? This guy is an AI booster, how can he use AI but not allow his students to use AI? What’s going on?” You can have a complex opinion on a difficult issue. So there’s definitely this omni-cause, binary thinking, and also moral contamination, where once you start doing something you’re not supposed to be doing, you’re a bad person in all other respects.

To wrap up, I feel like we need to move beyond that. In line with my immigration research, we have to meet people where they are. If people have concerns about AI, they might be mistaken, but they probably have some ground truth in them. So we shouldn’t just say they’re mistaken and wrong and stupid. We should explain to them that they can actually use AI for good, for whatever they want to do. You can make slides with AI, and when professors learn about that, they forget about all the bad stuff they wrote just a few days ago.

Dan Williams: Fantastic. Well, thanks, Alex, and thanks everyone for listening. We’ll be back soon with another episode of Conspicuous Cognition.

Alexander Kustov: Thank you.