Until recently, although I was excited about LLMs, I thought they were pretty hopeless literary critics. (And the poetry they write was—and is—the purest dross.)1 While they were doing well on maths and law exams, they made little progress on English exams.
LLMs have long been good as a “footnote on demand” service. You can drop in a confounding line of poetry or a page of Ulysses and get a pretty good explanation, along with context. Whenever you are stuck wondering how to play the card games in Jane Austen, or what “Ineluctable modality of the visible” means, ChatGPT or Claude can tell you.
You still have to think for yourself. You still need to puzzle out the words and think through the fog of uncertainty. But now you can ask for etymologies, test your opinions, and so on. I don’t do this very much, but it’s one way that LLMs have been performing well. (LLMs are also useful for scholarship, such as testing the claims made by scholars at the macro level, but that’s a distinct topic, so I won’t cover it here. This piece in Aeon covers such interesting subjects as how Propp’s folktale theory was updated using AI.)2
But for bigger critical questions, they were dreadful. The main problem was that they seemed to have learned how to analyse literature from those people who think the study of English is a largely subjective affair. You know the sort. They talk a lot about the human condition and they say things like “literature can only teach us about itself” or “literature shows us what it means to be fully human.” Obvious nonsense dressed in the robes of respectability.
What on earth does it mean to be fully human? No-one can say. One reason why literature struggles to attract people in modern culture is that these statements are unbearably vague. Spend thousands of hours reading Shakespeare and Tolstoy and you too could speak in the most mundane platitudes! Give years of your life to the close reading of the great poets in order to wave your hands and talk about the human condition!
I won’t link to any of this work, partly because it would be mean-spirited, but mostly because I don’t want to encourage you to read that stuff.
To begin with, the models spoke like this too. They must have learned it online.3 All that vague and unknowledgeable talk about feelings and impressions meant they sounded like a schoolchild trying to find something to fill up half a page.4
That has now changed.
The question I like to use as a test is taken from a 1970s A-level exam.
“Whatever else its merits, The Clerk’s Tale does not achieve its effects through surprise.” Discuss.
The Clerk’s Tale is a test narrative.5 To give a good answer, you must deal with the genre, the audience expectations, and preferably with the Bible. You need to explain that it is not real. It is more like allegory, in the vein of the Book of Job. And you need to make it clear that Chaucer’s audience knew that. Knowing what to expect, of course, doesn’t make it less astonishing, even if it makes it less surprising. (I call this the expectation of astonishment.)
In 2023, ChatGPT’s answer was fine. It said things like “The Clerk's Tale is a story that achieves its effects primarily through the use of characterization, symbolism, and themes.”6 Nothing really about the genre, the Bible, and so on. GPT-4o mini did better, talking about Boccaccio.7 But it still talked a lot of flannel: “its primary effects are achieved through its exploration of profound themes, rich character development, and the moral questions it raises.”
Now, o1 gives a better answer yet: it starts with Petrarch, notes that the tale is more about character than plot, points out that Walter says he will test Griselda, makes a Christian parallel, and so on. The conclusion is much less vague.
…the real drama emerges from the audience’s foreknowledge and the tension between moral idealism and human compassion.
I also asked o1 about The Wife of Bath, Sonnet 27, Prospero, and the origins of the novel. Each answer showed the same thing. Much more knowledge, and much less of the vague claptrap that passes the wrong sort of English Literature exams. This is far from perfect, but it is improving. The model is learning not to talk about “profound themes”.
Good criticism is specific. ChatGPT is getting better at this sort of criticism. It failed at the question of whether the Baroness in The Sound of Music is a Nazi. Ambiguity bothers it. It isn’t very good at discussing mood. But it is pulling away from the genre of criticism that waffles about the human condition. Its answers are less and less torpid.
Sure, it lacks style. And it keeps to the surface. LLMs aren’t going to be writing in the London Review of Books in the short term. But as they get smarter, they will hew more and more to the true purpose of criticism, which is, in the words of Samuel Johnson, to improve opinion into knowledge. And as they do so, they will become increasingly useful to all good common readers, who are themselves in the pursuit of knowledge.
My one success was asking a model to read an Emily Dickinson poem, analyse it, write a commentary, tell me what it had learned, and then asking it to write an imitation. The result was largely poor, but it had one image, describing a postbox during the American Civil War as a sentinel of hope, that I still remember.
“Conversation is one area where computational methodology has been shown to trump the claims of literary scholars – even scientifically inclined ones. In his Atlas of the European Novel (1999), Moretti suggested that the bustling urban setting of much 19th-century fiction tends to involve more characters but less dialogue, compared with narratives set within the confines of the family in the village or the countryside. A group of computational linguists and literary scholars at Columbia University decided to investigate this claim, using software that built a conversational social network from a corpus of 60 novels from the 19th century.
“The software parsed each sentence in terms of its syntax, and then found references to people. It also flagged stretches of quoted speech and attributed the quotes to speakers. This allowed the system to discern who was talking to whom. Although Moretti’s theory predicted an inverse correlation between the amount of dialogue and the number of characters, these scholars found no such statistically significant effect. Instead, they discovered that narrative voice, such as first- or third-person narration, was more relevant than the setting in urban or rural environments.”
For example, an LLM told me recently that the problem plays and the romances in Shakespeare are often the same. I had not heard this before. The only places I could find that claim were sites like SparkNotes.
One problem is that a wrong answer in the sciences can be verified quite easily, whereas in literature we are more likely to say an answer is bad than wrong, which opens up the possibility of mediocrity defending itself with the weak sword of personal opinion.
Walter, the Marquis, marries Griselda, a village girl. He decides to test her wifeliness and has her children taken away at birth. She thinks they have gone to die. Another Marquis, however, raises them in secret. Years later, Walter tests Griselda again: he sends her home and tells her he is marrying a younger woman. Griselda dutifully goes home but comes back to help with the wedding preparations. The big surprise is that the children are coming back. Instead of marrying one of them, Walter presents the young woman and the young man to Griselda as her returned children.
Here’s another extract from the answer.
The primary theme of the tale is the unwavering loyalty and obedience of Griselda to her husband, Walter, despite the harsh tests he puts her through. Chaucer’s use of characterization highlights this theme through Griselda’s humility and virtue, as she willingly submits to her husband’s will. The symbolism of the name “Griselda,” which means "gray" or "humble," reinforces this theme, as Griselda's humility is a key aspect of her character.
https://chatgpt.com/share/678434f3-9d8c-800b-9783-3a389ed129d0
Chaucer’s audience would likely have been familiar with the Griselda story, which originated in Giovanni Boccaccio’s Decameron and was widely circulated in medieval Europe. This familiarity reduces the potential for surprise, as the audience would anticipate the plot's progression and focus instead on the moral and ethical implications presented. The tale's effectiveness, therefore, hinges not on unforeseen developments but on the contemplation of its central themes and the emotional resonance of Griselda's trials.
https://chatgpt.com/share/678436bb-f948-800b-af01-df9e7d10e208
It's still really completely useless at anything even slightly less known, and in annoying ways that take a minute to figure out (that it doesn't know anything and is genuinely confabulating); or at less standard takes. So for example, it deals with the pentangle symbolism and its role in Sir Gawain and the Green Knight okay, but it mostly falls on its digital face on the subject of the Green Chapel. And the reason seems pretty simple: much more has been written on the former than the latter.
It's similar with close reading of more contemporary poetry. I'm not sure what would happen if it's given a text verbatim and some context or angle of approach, but I'm not holding my breath for, e.g., metre analysis, considering it's still incapable of consistently formatting citations in less frequently used styles.
If it's getting "better," that's probably because some academic publishers are signing deals allowing AI companies to train their software on their authors' work. About twice a week I get an email from Cambridge UP reminding me to sign an updated contract giving away my rights in this respect. They offer nothing in return, nor the chance to refuse. I've written to their staff and told them I will not sign; they've apologized and promised the emails will stop; the emails did not stop. So lovely that my publisher is keen on selling away the rights to my work and increasing the speed at which my students stop reading and writing.
In short: AI did not produce that "knowledge." People did. But people won't anymore, not the next generations, because they will be so reliant on this software that they will neither be able to read full texts for themselves (this is already a challenge) nor will they be able to write a few hundred words of their own to figure out their own thoughts. I hope someone is having fun with this.