Who’s the real main character in Shakespearean tragedies? Here’s what the data say

In Shakespeare’s 37 plays, more than 1,200 characters speak more than 880,000 words to each other.

The numbers are staggering. But for Martin Grandjean, a data scientist based in Geneva, they’re also an opportunity.

Grandjean examined how many and which characters appear on stage together at different points in each play to produce data visualizations that graphically represent these numbers. The graphics reconsider the relationships between the characters in each play, including who plays the most central role in the script, a result that Grandjean said sometimes surprised him.

I asked Grandjean about his process to create the visualizations.

Can you describe to me what these graphics show?

These graphics are a network analysis of the characters in Shakespeare’s tragedies. Two characters are connected if they are present in the same scene at the same time. This graphical representation helps convey the scope of the “social network” behind each play. It shows that some tragedies involve only small groups of characters, while others bring almost all the characters together in key scenes. The size of each circle indicates how many connections a character has, showing, for example, that Caesar is not the main character in Julius Caesar.

How did you create them?

First, I create a file listing the characters who appear in every scene of the play, taking into account that some characters enter after a scene has begun or leave before it ends. Then I convert this spreadsheet into a file in which every character is linked to each character who was on stage at the same time in the first file.
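For readers who want to try this themselves, the two steps described above could be sketched roughly as follows in Python. This is only an illustration under assumptions, not Grandjean's actual files or tooling: the scene lists are an invented toy example with placeholder names, and the function names `build_edges` and `degree` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Invented toy input (not taken from any real play): each entry lists the
# characters on stage together. A scene in which someone enters late or
# leaves early would be split into several entries, one per group present.
scenes = [
    ["A", "B", "C"],
    ["A", "B"],
    ["C", "D"],
]

def build_edges(scenes):
    """Map each unordered pair of characters to the number of entries they share."""
    edges = Counter()
    for present in scenes:
        # sorted(set(...)) removes duplicates and gives each pair one fixed order
        for pair in combinations(sorted(set(present)), 2):
            edges[pair] += 1
    return edges

def degree(edges):
    """Count how many distinct characters each character is linked to."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

edges = build_edges(scenes)
deg = degree(edges)
```

In this toy example, character C ends up with the most connections, which in the graphics would correspond to the largest circle. A list of pairs like this could then be written out as a CSV with Source and Target columns, a format Gephi can import.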

I visualize the file in network analysis software (here, Gephi) and lay out the graph with a force-directed algorithm, which arranges the circles like magnets on a table: connected circles attract each other, and unconnected ones repel each other. This produces a graph in which communities are clearly visible where strong communities exist, as in Timon of Athens, Romeo and Juliet, or Antony and Cleopatra.
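The "magnets on a table" idea can be illustrated with a small simulation. The sketch below is a simplified scheme in the spirit of the Fruchterman-Reingold layout, written for illustration only. It is not Gephi's code, and the interview does not say which of Gephi's layouts was used. The function name `force_layout`, the spacing constant `k`, the step size and the cooling rate are all assumptions.

```python
import math
import random

def force_layout(nodes, edges, pos=None, iterations=200, seed=0):
    """Return {node: [x, y]} after a simple attract/repel simulation."""
    if not nodes:
        return {}
    rng = random.Random(seed)
    if pos is None:
        # Random starting positions unless the caller supplies some
        pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    else:
        pos = {n: list(pos[n]) for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # assumed "comfortable" spacing
    step = 0.1                       # assumed maximum move per round
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of circles pushes apart, more strongly when close
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[a][0] += dx / dist * push
                disp[a][1] += dy / dist * push
                disp[b][0] -= dx / dist * push
                disp[b][1] -= dy / dist * push
        # Connected circles additionally pull together
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull
        # Move each circle along its net force, capped by the current step
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            move = min(length, step)
            pos[n][0] += dx / length * move
            pos[n][1] += dy / length * move
        step *= 0.98  # cool down so the layout settles
    return pos

# Toy example: A and B are linked, C is linked to no one
nodes = ["A", "B", "C"]
start = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.5, 1.0)}
layout = force_layout(nodes, [("A", "B")], pos=start)
```

In this toy run, the linked pair A and B settle close together while the unlinked C drifts away from both, which is the behavior that makes tightly knit groups stand out in the finished graphics.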

I repeat this process for every play, and then export everything to a vector file to design the labels and colors.

Data visualization by Martin Grandjean

Why did you decide to use Shakespeare texts as subject matter?

I’m currently developing an interface with three colleagues from the University of Lausanne and the EPFL for reading plays alongside a dynamic network showing the interactions between the characters. This experimental project will be presented at the Digital Humanities international conference in Kraków, Poland, this summer, and we are particularly testing this model on French-language works (Jean Racine, Molière, Jean de La Fontaine and others).

I also chose to map the well-known works of William Shakespeare. The main interest of this analysis is its comparative aspect. I initially wanted to map a larger corpus, but I decided to limit myself to the tragedies.

Data visualization by Martin Grandjean

What can we conclude from these visualizations?

Above all, the purpose is to compare the structure and density of the whole corpus. The longest tragedy, Hamlet, is not the most structurally complex and is less dense than King Lear, Titus Andronicus or Othello. Some plays clearly reveal the groups that shape the drama: Montagues and Capulets in Romeo and Juliet, Trojans and Greeks in Troilus and Cressida, the triumvirs and Egyptians in Antony and Cleopatra, the Volscians and the Romans in Coriolanus, or the conspirators in Julius Caesar.
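The interview does not spell out how density was calculated. A common definition in network analysis, assumed here, is the share of all possible character pairs that are actually linked: 2m / (n(n - 1)) for n characters and m links. A minimal sketch, with the hypothetical function name `density` and invented toy inputs:

```python
def density(edges):
    """Share of possible character pairs that are linked (0.0 to 1.0)."""
    # Treat (A, B) and (B, A) as the same link and ignore self-links
    pairs = {frozenset(p) for p in edges if p[0] != p[1]}
    # Only characters with at least one link are counted here; a character
    # who never shares the stage would need to be added separately.
    characters = {c for p in pairs for c in p}
    n = len(characters)
    if n < 2:
        return 0.0
    return 2 * len(pairs) / (n * (n - 1))

# Toy examples with placeholder names, not data from any play
everyone_meets = [("A", "B"), ("A", "C"), ("B", "C")]
chain = [("A", "B"), ("B", "C")]
```

Under this definition, a play in which every character meets every other would score 1.0, while a play built around separate groups that rarely mix would score much lower.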

Of course, a literary scholar will know the plot of every play very precisely. But giving this type of overview to non-academic readers is an important challenge for me.

What are some other projects you’re working on?

As I’m not a literary scholar, my research is mostly about the application of social network analysis and data visualization to historical sciences. I’m mapping exchanges of 30,000 letters between scientists and intellectuals after the First World War, trying to understand the structure of this international field.

See more of Grandjean’s work below.

Data visualization by Martin Grandjean
