How Forensic Linguistics Revealed J. K. Rowling's Secret

What can statistical analysis of a text tell us about its author?

You’ve probably heard of forensic handwriting analysis, where experts use quirks of penmanship to link chicken-scratch to an individual, but what about forensic linguistics? Here’s Virginia Hughes, writing at Only Human:

With computers and sophisticated statistical analyses, researchers are mining all sorts of famous texts for clues about their authors. Perhaps more surprising: They’re also mining not-so-famous texts, like blogs, tweets, Facebook updates and even Amazon reviews for clues about people’s lifestyles and buying habits. The whole idea is so amusingly ironic, isn’t it? Writers choose words deliberately, to convey specific messages. But those same words, it turns out, carry personal information that we don’t realize we’re giving out.

The United Kingdom’s Sunday Times received a tip via Twitter earlier this week that Robert Galbraith, the alleged author of a fairly obscure crime novel titled The Cuckoo’s Calling, was actually J. K. Rowling. In response, reporter Cal Flyn contacted two experts in forensic linguistics. One of them was Patrick Juola of Duquesne University, whom Flyn handed the text of five different books: Cuckoo, Rowling’s The Casual Vacancy, and three British crime novels.

Juola ran each book (or, more precisely, the sequence of tens of thousands of words that make up a book) through a computer program that he and his students have been working on for more than 10 years, dubbed JGAAP. He compared Cuckoo to the other books using four different analyses, each focused on a different aspect of writing.

For example, one test focused on concepts; another centered on adjacent words. Even a book’s most common words (“a,” “and,” “of,” and “the”) leave characteristic marks.
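To make the idea of common-word and adjacent-word "marks" concrete, here is a minimal Python sketch. It is not JGAAP's code and not necessarily how Juola's tests work; the tokenizer, the function names, and the choice to report relative frequencies are all assumptions made for illustration.

```python
import re
from collections import Counter

# The four common words named in the article; a real analysis would
# likely use a much longer list (assumption).
FUNCTION_WORDS = ("a", "and", "of", "the")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_rates(text, vocabulary=FUNCTION_WORDS):
    """Return each vocabulary word's share of all tokens in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (counts[w] / total if total else 0.0) for w in vocabulary}


def adjacent_pair_counts(text):
    """Count every pair of adjacent words (word bigrams) in the text."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))
```

Running both functions on two books gives profiles that can be compared side by side; how those profiles are scored against each other is a separate choice that this sketch leaves open.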

Juola’s final test completely separates a word from its meaning, by sorting words simply by their length. What fraction of a book is made of three-letter words, or eight-letter words? These distributions are fairly similar from book to book, but statistical analyses can dig into the subtle differences. And this particular test “was very characteristically Rowling,” Juola says. “Word lengths was one of the strongest pieces of evidence that [Cuckoo] was Rowling.”
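The word-length test is simple enough to sketch in a few lines of Python. The version below is an illustration under stated assumptions, not JGAAP's implementation. The tokenizer, the pooling of long words into one bucket, and the distance measure (total variation distance) are all choices made here for the example.

```python
import re
from collections import Counter


def word_length_distribution(text, max_len=12):
    """Return the fraction of words at each length 1..max_len.

    Words longer than max_len are pooled into the last bucket.
    Apostrophes are dropped before measuring length (assumption).
    """
    raw = re.findall(r"[A-Za-z']+", text)
    words = [w.replace("'", "") for w in raw]
    words = [w for w in words if w]
    counts = Counter(min(len(w), max_len) for w in words)
    total = sum(counts.values())
    return [counts.get(n, 0) / total if total else 0.0
            for n in range(1, max_len + 1)]


def distribution_distance(p, q):
    """Total variation distance between two equal-length distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

Given the text of a disputed book and of several candidate authors' books, one would compute a distribution for each and see which candidate sits at the smallest distance from the disputed text. A small distance is evidence of similarity, not proof of authorship.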

Hughes’s article is chock-full of juicy details on this and other authorship disputes. It’s not to be missed.

