The other day, Harry Potter author J.K. Rowling was outed as the real writer behind “Robert Galbraith” and his debut novel, The Cuckoo’s Calling. After receiving an anonymous tip, an editor at the Sunday Times teamed up with two computer scientists to compare the two authors’ writing and connected Rowling to her pen name.
Virginia Hughes talked to both of those scientists and learned what they looked for in their writing analyses.
[Duquesne University computer scientist Patrick Juola’s] final test completely separates a word from its meaning, by sorting words simply by their length. What fraction of a book is made of three-letter words, or eight-letter words? These distributions are fairly similar from book to book, but statistical analyses can dig into the subtle differences. And this particular test “was very characteristically Rowling,” Juola says. “Word lengths was one of the strongest pieces of evidence that [Cuckoo] was Rowling.”
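As a rough illustration of the word-length test described above, here is a minimal Python sketch. It is not Juola's actual software: the function names, the crude letters-only tokenizer, the 12-letter cap, and the absolute-difference comparison are all assumptions made for the example.

```python
import re
from collections import Counter


def word_length_distribution(text, max_len=12):
    """Return the fraction of words at each length 1..max_len.

    Words longer than max_len are counted in the last bucket.
    The tokenizer is a crude assumption: any run of ASCII letters is a word.
    """
    words = re.findall(r"[A-Za-z]+", text)
    total = len(words)
    if total == 0:
        return [0.0] * max_len
    counts = Counter(min(len(w), max_len) for w in words)
    return [counts.get(n, 0) / total for n in range(1, max_len + 1)]


def distribution_distance(p, q):
    """Sum of absolute differences between two distributions.

    This metric is an illustrative choice, not the statistic used in the article.
    """
    return sum(abs(a - b) for a, b in zip(p, q))
```

Computing one distribution per book and comparing the mystery book against each candidate author would give a small distance for similar styles. A real analysis would need far more text and a proper statistical test.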
It took Juola about an hour and a half to do all of these word-crunchings, and all four tests suggested that Cuckoo was more similar to Rowling’s Casual Vacancy than the other books. And that’s what he relayed back to Flyn. Still, though, he wasn’t totally confident in the result. After all, he had no way of knowing whether the real author was somebody who wasn’t in the comparison set of books who happened to write like Rowling does. “It could have been somebody who looked like her. That’s the risk with any police line-up, too,” he says.
Meanwhile, the other scientist, Peter Millican, was doing his own digging in Europe.
Millican found a few potentially distinctive words in his Rowling investigation. The other authors tended to use the words “course” (as in, of course), “someone” and “realized” a bit more than Rowling did. But the difference wasn’t statistically significant enough for Millican to run with it. So, like Juola, he turned to the most common words. Millican pulled out the 500 most common words in each book, and then went through and manually removed the words that were subject-specific, such as “Harry”, “wand”, and “police”.
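The common-word step can be sketched in a few lines of Python. This is a hypothetical illustration, not Millican's program: the function names, the lowercase letters-only tokenizer, and the distance measure are assumptions, and the list of subject-specific words is still supplied by hand, as in the article.

```python
import re
from collections import Counter


def common_word_profile(text, top_n=500, subject_specific=()):
    """Relative frequencies of the top_n most common words in a text.

    Words are taken from the top_n list first, and hand-picked
    subject-specific words (e.g. "Harry", "wand") are then dropped.
    """
    words = re.findall(r"[a-z]+", text.lower())
    total = len(words)
    excluded = {w.lower() for w in subject_specific}
    return {
        word: count / total
        for word, count in Counter(words).most_common(top_n)
        if word not in excluded
    }


def profile_distance(a, b):
    """Sum of absolute frequency differences over the combined vocabulary.

    An illustrative metric only; the article does not say how the
    profiles were actually compared.
    """
    vocab = set(a) | set(b)
    return sum(abs(a.get(w, 0.0) - b.get(w, 0.0)) for w in vocab)
```

Building one profile per book and computing pairwise distances gives numbers that could then be plotted, with books by the same author expected to sit closer together.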
Of all of the tests he can run with his program, Millican finds these word usage comparisons most compelling. “You end up with a graph, and on the graph it’s absolutely clear that Cuckoo’s Calling is lining up with Harry Potter. And it’s also clear that the Ruth Rendell books are close together, the Val McDermid books are close together, and so on,” he says. “It is identifying something objective that’s there. You can’t easily describe in English what it’s detecting, but it’s clearly detecting a similarity.” [Only Human]