Jane Austen, computer algorithms, and the enduring importance of the literary expert

February 1, 2013

To the literary critic’s toolbox, which includes concepts such as mimesis, irony, and the unreliable narrator, it might soon be necessary to add stylometry and culturomics. The former refers to the quantitative analysis of a writer’s style, including vocabulary and syntax, and the latter to a similar quantitative analysis applied more broadly across the humanities. What is significant about both is that they are carried out by computers running sophisticated algorithms of the kind used by Google or Amazon.

A recent New York Times article points to the way these computer algorithms were employed to determine that Jane Austen and Sir Walter Scott are the two most influential writers of the 19th century. The study, undertaken by Matthew L. Jockers, found that Austen and Scott “had the greatest effect on other authors, in terms of writing style and themes.” To some extent, it is unsurprising that the authors of romantic social comedy on the one hand and mass-appeal adventure stories on the other should be influential: these are still the kinds of novels that dominate bestseller lists today. (It is important to note that when people talk about their affection for Jane Austen, it is usually Pride and Prejudice they’re thinking of, not Northanger Abbey.)

Here’s the NYT on Jockers’ project:

He based his conclusion on an analysis of 3,592 works published from 1780 to 1900. It was a lot of digging, and a computer did it.

The study, which involved statistical parsing and aggregation of thousands of novels, made other striking observations. For example, Austen’s works cluster tightly together in style and theme, while those of George Eliot (a.k.a. Mary Ann Evans) range more broadly, and more closely resemble the patterns of male writers. Using similar criteria, Harriet Beecher Stowe was 20 years ahead of her time, said Mr. Jockers, whose research will soon be published in a book, Macroanalysis: Digital Methods and Literary History (University of Illinois Press).

Although it is unclear precisely what “the patterns of male writers” means, this is interesting information, “an intriguing sign that Big Data technology is steadily pushing beyond the Internet industry and scientific research into seemingly foreign fields like the social sciences and the humanities.” It is probably overstating the case, however, to compare (as the NYT goes on to do) the impact of statistical, algorithmic literary analysis to that of the microscope or the telescope.

In any literary endeavour, statistics will only get you part way. Human beings are still needed to effect a more nuanced investigation into literary history and the traditions that inform it, something the NYT article points out: “Quantitative tools in the humanities and the social sciences, as in other fields, are most powerful when they are controlled by an intelligent human. Experts with deep knowledge of a subject are needed to ask the right questions and to recognize the shortcomings of statistical models.”

While unarguably true, this is not good news in a world that seems to devalue the role of “experts with deep knowledge of a subject.” In an editor’s note in The Walrus, John Macfarlane bemoans exactly this problem, noting that in the digital age, expert analysis has been forced to take a back seat to popular opinion:

A people’s choice award was once a consolation prize for not winning something more estimable, like an Oscar or an Emmy, but in the age of Facebook and Twitter popularity rules.

This egalitarian impulse is the cultural assertion of the neo-liberal belief – itself increasingly popular – that the market should determine nearly anything. But more alarming is the flip side: a growing disrespect for knowledge and expertise. In contemporary North America, one person’s opinion is as good as the next, no matter how uninformed.

Popularity is paramount, as Macfarlane notes, and frequently in matters that don’t carry a whole lot of substance or import. More people in 2013 are likely to vote for the winner of So You Think You Can Dance? than are likely to vote in a federal election. And when people who do vote are asked what quality most attracts them in a potential leader, the answer is frequently, “The person I’d most like to have a beer with.” While conviviality and approachability are certainly admirable traits, it is devoutly to be hoped that substantial intellect and sober judgment would be more desirable attributes.

There isn’t much in the current culture to bolster such hope, however, and certainly not in the literary sphere. Substantial book review sections are shrinking or disappearing for want of readers, who would rather give a quick thumbs up or thumbs down to a book on Goodreads than work through 1,500 words of carefully crafted analysis by a knowledgeable critic like James Wood or Rohan Maitzen. While computers are busy counting the number of times authors use certain words, and making quantitative judgments about their relative influence as a result, it would be good if we did not forget the importance of having human experts capable of parsing the data and placing it into a broader, deeper context.