Replicating Life in Code

A few years ago, Michael Levin faced a conundrum. He and his colleagues at the Tufts Center for Regenerative and Developmental Biology, just outside Boston, wanted to find a model that would explain why the flatworm, a model organism used throughout biology, looks the way it does. At a fundamental level, they wanted to be able to describe the cascade of events that leads to the growth of a head at one end and a tail at the other.

In fact, it was almost the same problem that Thomas Hunt Morgan, a Nobel Prize-winning evolutionary biologist and one of the founders of the modern study of genetics, faced more than a century ago. Back then, he was busying himself making careful cuts into flatworms. He sliced them lengthwise. He cut them in half. From each segment, the worm grew a complete body. Eventually, Morgan carved off 1/279th of the worm, a bit selected from its mid-section, and demonstrated that it could regenerate an entirely new animal. He was trying, unsuccessfully, to understand why and how certain body parts developed where they did: Why did a head appear at one end of the worm after a cut?

Thomas Hunt Morgan, years before his famous flatworm research was published, reading papers at Columbia University.

Over the next 100 years, using new tools and insights, researchers replicated Morgan’s efforts in increasing detail: Make a cut here, a fully formed trunk and head will reappear. Tinker with this particular chemical or knock down this gene, and create a worm with a tail at either end. There are more than a thousand such papers. Yet, still, nobody has been able to fully explain why a head forms where it does.

Levin and his colleagues at the center, where he’s the director, have been tackling this problem for years. In that time, they have helped explain some basic questions about development and regeneration. By tweaking certain signals within flatworms, for example, he has been able to grow a worm with four heads or one with no head but a tail on either end. He and his team have even grown an eye on a tadpole’s belly. But those experiments didn’t help them understand how it all fit together.

“We have a massive literature of results saying ‘I did this to the worm and this happened,’ ” Levin says. “And we’re increasingly drowning in ever higher resolution genetic data sets. And yet, since Morgan cut his first worm, we still don’t have any model that explains more than a couple of different cuts.” And so, “after 120 years of really smart people going at it, I started to wonder, maybe this is beyond the ability of us to come up with off the top of our heads, to create a model that fits all the data.”

Developmental biologists use the flatworm Schmidtea mediterranea to study regeneration.

Computers, however, now offer enough power that Levin thought it might be possible to create such a model for a flatworm, one that would detail, in silico, the cascade of events that leads to the growth of a head at one end and a tail at the other. Still, even with a supercomputer’s worth of computational power, it wouldn’t be easy.

Yet Levin’s ambitions didn’t stop there. Full, in-depth models remain elusive in nearly every area of biology, and Levin hoped to create a tool that could be employed beyond flatworm development, one that could eventually model the vastly more complex world of human diseases. If he did it right, it might even help develop cures for those diseases. All he needed first was the right computer program.

Enter Evolution

It’s fitting that Levin’s computer models of biology are inspired by one of the fundamental tenets of biology itself, evolution. He builds his models around the idea that computer algorithms can meet and mate, with the fittest versions selected to mate with other fit algorithms. These so-called genetic algorithms were first proposed by computer scientist John Holland in the 1960s. But even earlier, in the 1940s and ’50s, “people were already thinking about using inspiration from biology to build life-like computer programs,” says Melanie Mitchell, a professor of computer science at Portland State University and author of the book An Introduction to Genetic Algorithms. For instance, John von Neumann, one of the earliest computer scientists, envisioned in the 1940s computers that could replicate themselves, with code serving as what we now call DNA. Mitchell says that Holland saw the field of genetic algorithms as a mathematical tool that could help explain how adaptation occurs in evolution.

What Holland saw as theory, others took as a practical tool. For instance, one of his students, David Goldberg, used these new genetic algorithms to optimize plans for gas pipelines by mating different models until the algorithm came up with the best design. But while early computer scientists were limited by memory and speed, today’s more powerful computers can run increasingly complex models to process millions of possible combinations and save the best chunks of code before passing them on to the next model, just as in natural evolution. Mitchell says these models have applications in engineering, big data, drug design, banking, and ever more realistic computer graphics and animations.

Unlike biological evolution, where individuals meet, mate, and pass on useful traits that best fit the environment, computer-based evolution starts with a goal or a set of rules. Then the computer generates millions, even billions, of models to try to meet those goals. The ones that solve part of the problem or meet some aspect of the goal have a higher likelihood of passing on that relevant code to the next generation.
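The loop described above, in which a goal is set, candidates are generated, and the better ones pass on their code, can be sketched in a few lines of Python. This is a minimal, generic genetic algorithm and not the software used by any of the researchers in this article. The bit-string genome, the toy goal of maximizing the number of 1s, and the names `fitness`, `crossover`, `mutate`, and `evolve` are all assumptions chosen for illustration.

```python
import random

def fitness(genome):
    # Hypothetical goal: a genome scores one point for every 1 it contains.
    return sum(genome)

def crossover(a, b, rng):
    # "Mating": the child takes the start of one parent and the end of the other.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(genome, rng, rate=0.02):
    # Flip each bit with a small probability.
    return [1 - g if rng.random() < rate else g for g in genome]

def evolve(genome_len=32, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    history = []  # best score seen in each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # the fitter half may reproduce
        children = [population[0]]              # keep the current best unchanged
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            children.append(mutate(crossover(mom, dad, rng), rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve()
print(history[0], "->", history[-1])
```

Because the best candidate of each generation is copied unchanged into the next, the best score in `history` can never go down. Real applications replace the toy `fitness` function with a measure of how well a candidate meets the actual goal.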

Robotics researchers have employed this approach as well. Josh Bongard at the University of Vermont modeled robots that evolved the ability to walk. Columbia University’s Hod Lipson used this approach to simulate machines that learned how to crawl on a table. He spun off a company called Nutonian that allows scientists to input their data, and then the program evolves equations until one explains the data. That equation could allow researchers to optimize designs, model what might happen in the future, or show how changes in one part of a system might affect the final result. “It can be used anywhere,” Lipson says, “from finance to rainfall in the Amazon.”

“This approach—reverse engineering—it’s like a Russian spy movie,” says Johannes Jaeger, a developmental geneticist and scientific director of the Konrad Lorenz Institute for Evolution and Cognition Research in Austria. “You have some kind of gadget, like in the Cold War, some kind of Russian technology. You don’t know what it does. Here, you have an organism, you don’t know how it works, and you’re trying to infer the networks that make the pattern that you see in animals.” Jaeger began working in this field more than a decade ago and has used such algorithms to model the genetic network that created a segmented body pattern in fruit flies.

But none of the models are as complex as the development of an entire creature’s shape.

Billions of Experiments

When Levin first proposed his modeling project a few years ago, his colleagues in biology found the proposal absurd. “Pretty much nobody I talked to thought it was going to work,” Levin says. His critics had two overarching reactions. Some thought it would be impossible to find any model that worked. Biologists, he says, would tell him: “ ‘You’re telling me this program is going to take random models and by recombining random changes to random models you’re going to find the right model? That’s ridiculously impossible.’ ” Levin disregarded that criticism. That was how evolution had worked, he thought, and computers finally seemed powerful enough to try.

The second criticism he heard was that they’d find many models that explained the data, maybe 10, maybe 1,000. How would they know which one was the correct one? “In theory,” Levin says, “that was a possible outcome. But we didn’t have any. It’s actually very difficult to find a model that does what it needs to. I wasn’t worried. If we found more than one—fabulous.”

To lead the flatworm modeling project, Levin hired postdoc Daniel Lobo, a computer science PhD who had worked with Hod Lipson and whose research was also inspired by Johannes Jaeger. (Lobo now heads his own lab at the University of Maryland, Baltimore County.) Lobo had the right combination of computer expertise and interest in biology, and he’d written a paper about applying genetic algorithms to the evolution of shape that caught Levin’s attention. He’d used such algorithms to automatically design structures optimized for unmanned landings, such as those of the rovers sent to Mars, and Levin was impressed with the way Lobo combined a deep knowledge of the field with an interest in making it practical.

The first challenge was to take the more than 1,000 experiments that had been done on flatworm shape and create one language to describe those results. That was no small challenge. Natural language, as opposed to computer code, is ambiguous, even in scientific papers. At the same time, the team had to decide what to encode. They didn’t need exact dimensions of a flatworm head, but they did need to include relative scale, for instance. Eventually, Lobo created a standardized language and a standardized mathematical approach that represents the shapes of the worm’s regions, its organs, and how they’re interconnected. The end result was a searchable database of results, which is now available to all flatworm biologists.

Next, Lobo designed the simulation itself, a virtual worm on which candidate models would test their results. The computer compares the results of the simulated experiments to the real-world results expected from the database. The models receive scores based on how well they predict the outcomes seen in flesh-and-blood flatworms. Those with high scores reproduce; those with bad scores are discarded.
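The scoring step can be pictured with a small Python sketch. It is only an illustration of the idea, not the simulator or the database format Lobo built: the `Experiment` record, the outcome labels, the four database entries, and the three candidate models are all invented here, and a real candidate model would simulate the worm rather than return a fixed label.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Experiment:
    manipulation: str  # what was done to the worm (hypothetical label)
    observed: str      # outcome recorded in the real animal (hypothetical label)

# A candidate model maps a manipulation to a predicted outcome.
Model = Callable[[str], str]

def score(model: Model, experiments: List[Experiment]) -> float:
    # Fraction of database entries whose outcome the model predicts correctly.
    hits = sum(1 for e in experiments if model(e.manipulation) == e.observed)
    return hits / len(experiments)

def select(models: List[Model], experiments: List[Experiment], keep: int) -> List[Model]:
    # High scorers survive to reproduce; the rest are discarded.
    ranked = sorted(models, key=lambda m: score(m, experiments), reverse=True)
    return ranked[:keep]

# Invented stand-in for the curated database of published results.
DATABASE = [
    Experiment("cut_in_half", "head-trunk-tail"),
    Experiment("cut_midsection_fragment", "head-trunk-tail"),
    Experiment("perturb_signal_a", "tail-trunk-tail"),
    Experiment("perturb_signal_b", "head-trunk-head"),
]

def always_normal(manipulation: str) -> str:
    return "head-trunk-tail"

def always_two_tails(manipulation: str) -> str:
    return "tail-trunk-tail"

def signal_aware(manipulation: str) -> str:
    if manipulation == "perturb_signal_a":
        return "tail-trunk-tail"
    return "head-trunk-tail"

survivors = select([always_two_tails, always_normal, signal_aware], DATABASE, keep=2)
print([m.__name__ for m in survivors])  # ['signal_aware', 'always_normal']
```

In this toy database, `signal_aware` matches three of the four entries, `always_normal` matches two, and `always_two_tails` matches one, so only the first two survive the cut.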

Simulated flatworms can help anticipate the results of experiments before they happen.

After four years of work, they’d come up with a common language for scientific papers, distilled the most crucial aspects of the worm to put in a model, developed a simulator, and created the algorithm to find the shape model. Levin felt fairly sure that the project would work—but they needed one more piece of equipment. Common lab computers, no matter how powerful, can’t yet quickly process the massive computations needed to evolve an entire biological model, in effect replicating millions of years of evolution. They needed a supercomputer. So the team rented time on Stampede, the University of Texas supercomputer that can perform up to 10 quadrillion mathematical operations per second.

Levin says the first models performed terribly; they didn’t get anything right. But by the 100th generation, some of the models started to predict some of the correct results. By the 1,000th, the models were increasingly matching the real-world experimental results.

In the end, it took 6 billion simulated experiments, 26,727 generations of models, and about 42 hours of processing by the Stampede computer before the computer came up with one result. This was what they’d been waiting for: one model that could explain the 1,000 existing experiments that generate the head-trunk-tail pattern in a flatworm.

To test the model, the team introduced data from two papers on shape formation that had purposefully not been included in the dataset when developing the models. The model accurately explained the results of both papers.
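Holding data back in this way is a standard check that a model has captured something general rather than memorized its inputs. A minimal Python sketch of the idea follows. The records, paper labels, and the `rule_model` function are invented for illustration and do not come from the study.

```python
from typing import Callable, List, Set, Tuple

# Each record: (source paper, manipulation, observed outcome). All values are invented.
Record = Tuple[str, str, str]

RECORDS: List[Record] = [
    ("paper_A", "cut_middle", "head-trunk-tail"),
    ("paper_A", "block_signal", "tail-trunk-tail"),
    ("paper_B", "cut_front", "head-trunk-tail"),
    ("paper_C", "block_signal_then_cut", "tail-trunk-tail"),
]

def split_by_source(records: List[Record], held_out: Set[str]):
    # Keep every result from the held-out papers away from model development.
    development = [r for r in records if r[0] not in held_out]
    validation = [r for r in records if r[0] in held_out]
    return development, validation

def accuracy(model: Callable[[str], str], records: List[Record]) -> float:
    # Fraction of records whose observed outcome the model predicts.
    return sum(model(m) == observed for _, m, observed in records) / len(records)

def rule_model(manipulation: str) -> str:
    # Hypothetical model produced using the development set only.
    return "tail-trunk-tail" if "block_signal" in manipulation else "head-trunk-tail"

development, validation = split_by_source(RECORDS, held_out={"paper_C"})
print(accuracy(rule_model, development), accuracy(rule_model, validation))
```

The number that matters is the accuracy on `validation`, since none of those results were available while the model was being developed.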

So Levin and Lobo took it one step further. The model predicted the outcomes of experiments that had never been done before. The team tested those experiments in the real world, and they worked. The new results were published in May 2016, describing the activity of a previously unknown gene that played a role in shape formation.

Intriguingly, the model suggests the existence of a second node that hasn’t yet been explained by current scientific knowledge; it could be a protein, it could be a particular chemical. “The computer knows there’s a product that should be there, that seems to be important. In a way, it’s predicting a product that we don’t yet know,” Lobo says.

From Silicon Models to Hard Data

The shape investigation a success, Levin and Lobo turned their attention to modeling disease. They started with melanoma—skin cancer—and did so by focusing on pigmentation cells in tadpoles.

They conducted experiments in which tadpoles were exposed to particular chemicals during their development. The tadpoles had no obvious chromosomal damage or genetic trigger, but for some of them, the chemicals would spark a change in all of their pigmentation cells. Those cells then turned metastatic and invaded tissues throughout their bodies. But for other tadpoles under exactly the same conditions, nothing changed. Could the algorithm determine why?

Graduate student Maria Lobikin conducted dozens of experiments, knocking out genes or applying drugs and determining what percentage of the tadpoles became hyper-pigmented. Then she combined those results with other research published in the last decade. The team followed the same approach, creating a standardized language to describe the experiments and using a supercomputer to evolve a model to understand how, in some tadpoles, under certain biological circumstances, the cells flipped to a hyper-pigmented, cancerous state.

The computer-generated model came quite close to the results of the existing experiments. The model predicted the results of all papers but one; perhaps the algorithm had even caught inaccurate data, a finding that was published in the journal Science Signaling in October 2015. “I said, you know what, go back and redo this experiment just to make sure, and sure enough, the data were slightly off,” Levin says. “It’s almost like a verification step. If it’s having trouble matching the results of one experiment, maybe the problem’s not the model, maybe the problem’s your data.”

More recently, they asked the model a question. Is there any way to create a scenario where only some of the pigment cells become cancerous, where it’s not an all-or-none response? The computer generated an unusual three-step combination of drugs. When the team tried the experiment suggested by the computer model, they were able to create the first partially-pigmented animals.

While the tadpole model is far from an ideal surrogate for human disease, Levin points out that this research supports what other scientists have claimed, that cancer is not always a result of specific DNA damage. Rather, it may also be a systems disorder, where exactly the right set of circumstances in the system generates conditions for cancer to grow.

The simulated experiments demonstrate how artificial intelligence can augment human abilities, both Lobo and Levin say. “I see it definitely not as a way to replace biologists,” Jaeger says with a laugh. He says it’s nearly impossible for humans to process all the relevant parameters to generate a model, but computer power can do what our brains simply can’t.

The duo see many more such experiments in the future. At his Baltimore lab, Lobo is now focusing on bacteria, as modeling the ways in which the microbes create different compounds could be useful for the field of synthetic biology. He’s also trying to reverse-engineer cancer tumors to attempt to discover the best possible treatments to cause them to collapse. Levin sees applications in many fields: drug development, regenerative medicine, and understanding metabolism and disease. (Levin recently was awarded one of the first two Paul Allen Frontiers Group grants, a $30 million grant over eight years, to support risky, unconventional research; these computer-generated models are only a portion of his lab’s research.)

In any case, employing computer algorithms to wrestle with as-yet-unanswered questions in biology will, researchers say, only become more mainstream. Lipson says this approach is crucial: “We’re in a stage where biology is producing lots and lots of data,” he says. “But magically it’s not going to make sense out of nowhere. You need these types of systems to make sense of the data we have.” In other words, systems that mimic evolution, and that might help us evolve solutions as well.