How we solved it: Stylometric Analysis

By The History Detectives Team
3 May 2010
Category: DIY Investigations

During Season 6, Tukufu investigated the book Female Life Among the Mormons, Maria Ward’s epic 1856 tale of her encounters with polygamy in Utah Territory. After reading the scandalous narrative, the owner of the book wanted to know who Maria Ward was and whether her story was true. To find out, we turned to a technique called stylometric analysis.

Put simply, stylometric analysis is a way to identify an object’s creator by studying the style in which the object was created. While it’s been used to connect paintings to painters and songs to musicians, it’s most commonly applied to who-wrote-it mysteries. The technique gained notoriety in 1996 when FBI agents used it to compare the Unabomber’s manifesto against the writings of their suspect, Ted Kaczynski. This form of stylometric analysis, called authorship attribution analysis, is what we used to solve the Mormon book case.

How it works

Authorship attribution analysis rests on the assumption that every literate person possesses a unique literary fingerprint. The fingerprint is the frequency with which we write common words: the, of, but, if, and though, to name a few. For example, you might use the word “how” 12 times for every 1,000 words that you write. Abraham Lincoln might write “how” 7 times per 1,000 words. One word on its own reveals nothing. But when we use a computer to calculate the usage frequency of all the common words (there are nearly 900 of them), a literary fingerprint begins to take shape. The fingerprints of different texts can then be compared to find a match.
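To make the idea concrete, here is a minimal sketch in Python of how a per-1,000-word fingerprint could be computed. It is not the program used in our investigation. The short COMMON_WORDS list and the function names are assumptions made for illustration (a real analysis would use the full list of nearly 900 words), and the tokenizer is deliberately simple.

```python
import re
from collections import Counter

# Hypothetical short list for illustration; a real analysis would use
# a much longer list of common words.
COMMON_WORDS = ["the", "of", "but", "if", "though", "how"]

def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def fingerprint(text, words=COMMON_WORDS):
    """Return each listed word's frequency per 1,000 words of the text."""
    tokens = tokenize(text)
    if not tokens:
        # An empty text has no usable frequencies.
        return {w: 0.0 for w in words}
    counts = Counter(tokens)
    return {w: 1000 * counts[w] / len(tokens) for w in words}

sample = "How the west was won, and how the east was lost"
print(fingerprint(sample))
```

On a sample this short the numbers are meaningless. The frequencies only settle into something like a fingerprint when the text runs to thousands of words.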

Finding suspects

Every authorship investigation begins by assembling a lineup of suspects. This is where research comes in. Based on historical inaccuracies in the narrative, and a lack of genealogical information on Maria Ward, we determined that the book was a work of fiction written under a pseudonym. We also knew that the author had a pointedly anti-Mormon stance, so we sought out authors from the period who had demonstrated a similar prejudice. We then added another suspect who had been proposed by a historian in the 1970s. Once we had our lineup, we needed a group of control authors who had written in the same voice during the same time period. Theoretically, these control authors would be weak matches for our mystery book, and as such would help us gauge the strength of similarity among the suspect authors.

Gathering evidence

After we’d identified all of our suspects and control authors, we started to collect digital samples of their texts. Google Books, the Internet Archive, and university and library websites had most of the texts we needed available digitally. But some we had to digitize ourselves. This was an arduous process of scanning books, page by page, into PDF or TIFF files until we had captured around 10,000 words from each book. We then converted the PDF and TIFF files into text format by running them through an Optical Character Recognition (OCR) program. The processed text was then proofread to catch the mistakes inevitably made by the computer. For instance, every “be” in one text was interpreted as “loe.”
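A systematic OCR mistake like that one can be partly cleaned up by a script once a proofreader has spotted it. The sketch below is an illustration only, not the workflow we actually followed. The OCR_FIXES table and both function names are hypothetical, and a blind find-and-replace like this is only safe when the misreading is not itself a real word.

```python
import re

# Hypothetical correction table: OCR misreading -> intended word.
OCR_FIXES = {"loe": "be"}

def fix_ocr_errors(text, fixes=OCR_FIXES):
    """Replace known whole-word OCR misreadings and leave other words alone."""
    def replace(match):
        word = match.group(0)
        fixed = fixes.get(word.lower())
        if fixed is None:
            return word
        # Preserve a leading capital, e.g. at the start of a sentence.
        return fixed.capitalize() if word[0].isupper() else fixed
    return re.sub(r"[A-Za-z]+", replace, text)

def first_n_words(text, n=10_000):
    """Keep the first n whitespace-separated words as the sample."""
    return " ".join(text.split()[:n])

raw = "It would loe better to loe silent. Aloe grows here."
print(fix_ocr_errors(raw))
```

Because only whole words are matched, "Aloe" in the example is left untouched. A script like this supplements human proofreading rather than replacing it.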

Crunching the text

The final step was the actual computer-driven analysis. For the Mormon book we turned to linguistics expert Dr. David Hoover of New York University’s English department. He employed a program that he designed himself to organize the texts by author and count the individual words. Once texts were fed into the program, the data was crunched to display the most likely matches. After months of searching for suspects and processing texts, this was our moment of truth. The results surprised us all. According to the test, the autobiography had been written by that prolific and enigmatic author… “A. Nonymous.” As is often the case, the analysis did not yield any strong matches, because we had not included the right suspect. But the process did lay to rest a long-standing myth that suggested an anti-Mormon politician had penned the novel. Stylometric analysis can be highly accurate, but ultimately, it must be supported by historical and circumstantial evidence to solidify any conclusions about authorship.
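For readers who want to try the comparison step themselves, here is a rough sketch of how fingerprints might be ranked against a mystery text. Dr. Hoover’s program is not described here, so this is a simplified stand-in and not his method: the distance measure (the average absolute difference in per-1,000-word frequency) is our assumption, and the names and numbers are made up for illustration.

```python
def distance(fp_a, fp_b):
    """Mean absolute difference in per-1,000-word frequency over shared words."""
    words = fp_a.keys() & fp_b.keys()
    if not words:
        raise ValueError("fingerprints share no words")
    return sum(abs(fp_a[w] - fp_b[w]) for w in words) / len(words)

def rank_candidates(mystery, candidates):
    """Return (name, distance) pairs sorted from closest to farthest."""
    scored = [(name, distance(mystery, fp)) for name, fp in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Made-up frequencies per 1,000 words, for illustration only.
mystery = {"the": 60.0, "of": 30.0, "how": 7.0}
candidates = {
    "Suspect A": {"the": 58.0, "of": 31.0, "how": 8.0},
    "Suspect B": {"the": 70.0, "of": 22.0, "how": 12.0},
    "Control C": {"the": 50.0, "of": 40.0, "how": 3.0},
}

for name, score in rank_candidates(mystery, candidates):
    print(f"{name}: {score:.2f}")
```

The control authors matter when reading a ranking like this. A suspect only looks like a strong match if they sit clearly closer to the mystery text than the controls do. If every candidate lands about as far away as the controls, the likeliest explanation is the one we ran into: the real author is not in the lineup.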

Do you have a burning who-wrote-it mystery? Tell us about it on our story submission page.
