From ashes to AI: How technology puts a new lens on ancient texts

Recent breakthroughs in scanning, image processing, and machine learning are helping researchers read historic documents once considered lost to time.

By Katherine J. Wu, NOVA Next

One of the many Herculaneum scrolls damaged by the eruption of Mount Vesuvius in 79 CE. Image Credit: The Digital Restoration Initiative, University of Kentucky

Age is kind to no one.

Unfortunately, that includes the books, scrolls, and scrawlings of ages past. Thanks to centuries of floods, fires, volcanic eruptions, and plain old wear and tear, many ancient texts have degraded past the point of legibility—leaving countless chapters of human history as inscrutable as the words that once chronicled them.

But the key to deciphering the past might just lie in the technologies of the future. Thanks to recent advances, researchers can now digitally unravel, read, and translate texts once considered lost to time. Here are three of the most powerful methods scientists are using to put a new lens on the past.

Unveiling the innards of a scroll

Nowadays, the 1,700-year-old En-Gedi Scroll—one of the most ancient snippets of the Old Testament ever uncovered—isn’t much to look at. Ravaged by a fire that consumed a Jewish synagogue around 600 CE, the document transformed from a supple scroll into a charred, brittle cylinder of charcoal. The artifact is so delicate that, for decades, scholars dared not even attempt to peel back its layers for fear of destroying it for good.

But several years ago, a team of computer scientists led by the University of Kentucky’s Brent Seales managed to read the scroll’s contents without physically unfurling it. The non-invasive method, called “virtual unwrapping,” extracted text from the burnt scroll with a combination of scans and image-processing algorithms.

The scroll itself made only a brief cameo in the team’s investigation, when X-rays were shot through its layers from several angles to visualize what was inside. These computerized tomography (CT) scans, Seales says, are the same ones radiologists perform on human patients with internal injuries—only instead of searching for bone breaks and tissue damage, researchers hunt for hidden text.

“Tomography is really powerful,” Seales says. “It gives you the ability to infer what’s inside something by taking views from all the way around, with a signal that goes all the way through.”
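
To make the idea of “views from all the way around” concrete, here is a minimal Python sketch of unfiltered back-projection using just two views of a toy cross section. The grid, its values, and the function names are invented for illustration; real CT reconstruction uses many more angles and more careful math than this.

```python
# Toy 2-D cross section: 0 = air, 1 = parchment, 5 = a dense blob.
# All values are made up for illustration.
slice_2d = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 5, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

def project_rows(grid):
    """View from the side: total absorption along each row."""
    return [sum(row) for row in grid]

def project_cols(grid):
    """View from above: total absorption along each column."""
    return [sum(col) for col in zip(*grid)]

def back_project(row_sums, col_sums):
    """Smear each view back across the grid and add the smears together."""
    return [[r + c for c in col_sums] for r in row_sums]

rows = project_rows(slice_2d)
cols = project_cols(slice_2d)
estimate = back_project(rows, cols)

# The brightest cell is where both views agree that something dense sits.
size = len(slice_2d)
brightest = max(
    ((i, j) for i in range(size) for j in range(size)),
    key=lambda ij: estimate[ij[0]][ij[1]],
)
print(brightest)  # (2, 2)
```

Neither view alone can say where the dense blob is, but combining them pins it to the center cell.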

The charred En-Gedi Scroll. Image courtesy of the Leon Levy Dead Sea Scrolls Digital Library, IAA. Photo: S. Halevi. Pictured in Seales et al. Science Advances 2016

After the document was imaged, it went back into storage, and the team switched to a computational approach. What’s produced by a CT scan, Seales says, isn’t the type of picture you’d get from a typical camera. Instead, these machines generate cross sections—2-D “slices” of the inside of an object, like a front-on view of a cut loaf of bread. After collecting a series of these images, the team fed them into an algorithm that determined where one layer of the animal-skin scroll ended and another began.

Accomplishing this is especially difficult when a document’s pages have been warped, deformed, and smashed up against each other, says Yukun Lai, a visual computing expert at Cardiff University who, together with his colleagues, has published a series of papers that have digitally recovered the contents of other damaged historic texts. One way to deal with this, he says, is to use the typical thickness of parchment and other materials as a guide to digitally peel pages apart.
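
The thickness trick Lai describes can be sketched in a few lines of Python. This is a simplified illustration, not the published method: it assumes a single line of brightness samples taken through the scroll, and the threshold, sample values, and function names are all hypothetical.

```python
def find_runs(profile, threshold):
    """Return (start, end) index pairs for each stretch of material along the line."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def split_by_thickness(runs, typical_thickness):
    """Cut any run that is too thick into equal layers of about the typical thickness."""
    layers = []
    for start, end in runs:
        count = max(1, round((end - start) / typical_thickness))
        step = (end - start) / count
        for k in range(count):
            layers.append((round(start + k * step), round(start + (k + 1) * step)))
    return layers

# Made-up samples: 9 = material, 0 = gap. The middle run is two pages stuck together.
profile = [0, 9, 9, 9, 0, 0, 9, 9, 9, 9, 9, 9, 0, 9, 9, 9, 0]
runs = find_runs(profile, threshold=5)
layers = split_by_thickness(runs, typical_thickness=3)
print(layers)  # [(1, 4), (6, 9), (9, 12), (13, 16)]
```

The six-sample run in the middle is twice the expected thickness, so the sketch treats it as two pages pressed together and divides it in half.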

Interpreting (almost) invisible ink

Piecing together a page isn’t the same as reading what’s on it—and it’s not always easy to identify where ink has been laid down.

Relatively speaking, the text of the En-Gedi Scroll was fairly straightforward to detect, Seales says. Different materials block X-rays to different extents, and dense, metal-rich inks tend to pop when they’re printed on something carbon-based, like plant fibers or, in the case of the scrolls, the skin of an animal.

A cross section of the En-Gedi Scroll, as imaged by a CT scanner. Image Credit: Seales et al. Science Advances 2016

Things get more complicated when you’re dealing with a document that’s carbon on carbon, as in the case of the Herculaneum scrolls—a trove of papyrus that survived the fateful eruption of Mount Vesuvius in 79 CE. The scrolls’ scribes used a substance called carbon black as their ink of choice, which, from an X-ray’s perspective, is just as dense as the stuff on which it sits, Seales says.
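
A rough way to see why metal-rich ink stands out while carbon ink vanishes is the Beer-Lambert law, the standard textbook model for how a beam weakens as it passes through material. The Python sketch below uses invented attenuation coefficients and thicknesses, chosen only to show the trend; they are not measurements from either scroll.

```python
import math

def transmitted(i0, layers):
    """Beer-Lambert law: intensity left after passing through (mu, thickness) layers."""
    return i0 * math.exp(-sum(mu * t for mu, t in layers))

def contrast(background, inked):
    """Fractional drop in signal where ink is present."""
    return (background - inked) / background

# Illustrative attenuation coefficients (per mm). NOT measured values.
MU_SUBSTRATE = 0.05
MU_METAL_INK = 2.0
MU_CARBON_INK = 0.05  # assumed equal to the substrate: carbon on carbon

bare = transmitted(1.0, [(MU_SUBSTRATE, 0.3)])
metal = transmitted(1.0, [(MU_SUBSTRATE, 0.3), (MU_METAL_INK, 0.02)])
carbon = transmitted(1.0, [(MU_SUBSTRATE, 0.3), (MU_CARBON_INK, 0.02)])

print(f"metal ink contrast:  {contrast(bare, metal):.4f}")
print(f"carbon ink contrast: {contrast(bare, carbon):.4f}")
```

With these made-up numbers, the metal-rich ink dims the beam dozens of times more than the carbon ink does, which is the gap a scanner has to work with.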

But subtle differences still exist between material that’s been written on and material that hasn’t. For instance, the addition of carbon ink alters the shape of plant fibers, creating tiny bumps on the surface of a sheet of papyrus. Though these mini-mountains can’t be picked out by human eyes, Seales says, they can be made obvious to a well-trained machine. For the past few months, Seales and his team have been training an algorithm to search for the structural signatures of ink in data produced from CT scans of scrolls.
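
The article does not describe the algorithm itself, so the following Python sketch is only a toy stand-in for the general idea: learn from labeled examples what an inked surface “feels” like, then apply that to new patches. It scores each patch by how bumpy it is and picks a cutoff halfway between the two labeled groups. The height samples and function names are hypothetical.

```python
from statistics import mean, pvariance

def roughness(patch):
    """Variance of surface heights in a patch; bumpier patches score higher."""
    return pvariance(patch)

def fit_threshold(inked, blank):
    """Pick the midpoint between the average roughness of the two labeled groups."""
    inked_avg = mean([roughness(p) for p in inked])
    blank_avg = mean([roughness(p) for p in blank])
    return (inked_avg + blank_avg) / 2

def has_ink(patch, threshold):
    """Flag a patch as inked if it is rougher than the learned cutoff."""
    return roughness(patch) > threshold

# Made-up height samples (arbitrary units) for labeled training patches.
inked_patches = [[1.0, 1.4, 0.9, 1.5], [1.1, 1.6, 1.0, 1.5]]
blank_patches = [[1.0, 1.1, 1.0, 1.1], [1.2, 1.2, 1.1, 1.2]]

threshold = fit_threshold(inked_patches, blank_patches)
print(has_ink([1.0, 1.5, 0.9, 1.4], threshold))  # True
print(has_ink([1.0, 1.0, 1.1, 1.0], threshold))  # False
```

A real system would learn far richer features from three-dimensional scan data, but the train-then-classify shape is the same.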

The researchers have yet to apply the method to the Herculaneum papyri. But ideally, that’s where this research is headed, Seales says. With a bit more tinkering, he says, the “lost” texts of Herculaneum might soon be found.

Even with CT scans, Herculaneum scrolls are particularly challenging to analyze because the ink and paper block X-rays to the same extent. Image Credit: The Digital Restoration Initiative, University of Kentucky

Reviving a dead language

The En-Gedi Scroll is emblazoned with Hebrew, and the Herculaneum papyri with Greek and occasionally Latin. But many other texts were inscribed in languages that have since disappeared from common use.

“Just because you can read the letters, that doesn’t mean you know what they mean,” says Regina Barzilay, a computer scientist at the Massachusetts Institute of Technology.

Barzilay, whose work in artificial intelligence has spanned applications from early cancer detection to drug discovery, is now working with her graduate student Jiaming Luo to decipher lost languages.

Their strategy hinges on the predictable way languages change over time, a process that’s similar to how species split on the tree of life. Like organisms, languages can be grouped into families that share common traits, including the way their alphabets, vocabularies, and grammar look and sound. Even if words manifest differently between languages, they often remain recognizable: the French terre and the Spanish tierra, for instance, share the Latin root for “earth.”

These similarities aren’t coincidences, Barzilay explains. If two languages share a point of origin, there are only so many evolutionary paths they can take, even as they’re diverging away from each other.

Feeding these rules into a machine can give it a structured way to translate a language, as long as it’s given appropriate familial context, Barzilay says. So far, she and Luo have successfully tested their strategy on two long-extinct cases: Ugaritic, an ancient Semitic language closely related to Hebrew, and Linear B, a script that records an early form of Greek.
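
One simple way to put a number on how “recognizable” related words are is edit distance, which counts the single-letter changes needed to turn one word into another. The Python sketch below uses plain Levenshtein distance as a stand-in; it is not Barzilay and Luo’s model, and the word list and function names are chosen only for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-letter insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete a letter from a
                curr[j - 1] + 1,           # insert a letter from b
                prev[j - 1] + (ca != cb),  # substitute, or keep if the letters match
            ))
        prev = curr
    return prev[-1]

def closest(word, candidates):
    """Return the candidate that takes the fewest edits to reach from word."""
    return min(candidates, key=lambda c: edit_distance(word, c))

print(edit_distance("terre", "tierra"))               # 2
print(closest("terre", ["agua", "fuego", "tierra"]))  # tierra
```

Two small edits separate terre from tierra, so even this crude measure pairs the French word with its Spanish cousin rather than with the unrelated words for water or fire.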

The digitally unwrapped En-Gedi scroll. Image Credit: Seales et al. Science Advances 2016

These were proof-of-concept experiments, as both languages had already been meticulously deciphered by (human) linguists before any algorithms got involved, Barzilay says. Now, she and Luo are interested in teaching a machine to do what a person hasn’t: decode a totally lost language—perhaps even one with deeply contested evolutionary roots.

One example is Northeastern Iberian script, which has yet to be firmly classified into a language family. That means the algorithms will have to reverse engineer yet another piece of the puzzle, predicting the script’s origins before tackling its translation.

Once that’s possible, these techniques and more could even enhance the identification of ink in damaged texts, Barzilay says. “Even if letters are missing, or you don’t understand all of them, maybe you can complete it,” she says. “With all this, it’s about tying the whole process together.”

To learn more about how researchers are uncovering and analyzing ancient texts, watch “Dead Sea Scroll Detectives,” premiering on PBS at 9/8c on November 6.

Funding for NOVA Next is provided by the Eleanor and Howard Morgan Family Foundation.

Major funding for NOVA is provided by the David H. Koch Fund for Science, the Corporation for Public Broadcasting, and PBS viewers. Additional funding is provided by the NOVA Science Trust.