Breaking the Human Genome Code
JIM LEHRER: Now today’s news on mapping the human genetic code. Susan Dentzer of our health unit begins. The unit is a partnership of the Henry J. Kaiser Family Foundation.
SUSAN DENTZER: Today’s announcement by a private company, Celera Genomics, that it had completed the sequencing of one human being’s genome caught many by surprise. Celera’s president, J. Craig Venter, made the announcement at a congressional hearing.
J. CRAIG VENTER: This is a very exciting milestone in Celera’s history and in science. We’re going to have now the complete repertoire of human genes, which is the beginning of the next phase of science.
SUSAN DENTZER: With that, Celera seemed to have passed a major milestone in its effort to win a scientific horse race. For the past two years, that race has pitted Celera against the Human Genome Project — a consortium of university researchers funded primarily by the U.S. National Institutes of Health and a private British charity, the Wellcome Trust. Both have been rushing to decode the human genome, or the sum total of all the DNA that makes up a human being. Sometimes called “The Book of Life,” the genome consists of roughly 3 billion chemical units of DNA arranged along 23 pairs of chromosomes that contain an estimated 80,000 or more individual genes. Identical copies of all these genes are contained in almost every cell of the human body, where they serve as the recipe for producing the proteins that perform the body’s essential work. Knowing how all the chemical units of DNA are arranged — in other words, the sequence — is a monumental breakthrough in understanding how the entire body really works. David Botstein heads the Genetics Department at Stanford University Medical School.
DAVID BOTSTEIN: It’s like having the entire parts list for some project that you want to put together, a bicycle or we used to have kits for making amplifiers. So every transistor, every wire, every resistor, every little piece is there and accounted for. So, for example, if one of them goes wrong, you can actually figure out which one is the one that’s missing or isn’t doing what it’s supposed to. So it’s really major progress.
SUSAN DENTZER: Yet at the same time, genomics experts quickly raised red flags about the significance of Celera’s announcement. In reality, they said, Celera hadn’t truly determined the order of the entire genome. Instead, the company had identified the chemical composition of millions of individual chunks of DNA – each about 600 units long — but it hadn’t gone the final important step of understanding how all these jigsaw pieces really fit together as chromosomes. Celera agreed and said that critical step would take another three to six weeks. Some of the company’s critics also charged that Celera was trying to steal some limelight from the public Human Genome Project. The Project recently announced that it had completed the actual sequence of three-fourths of the genome and would finish its so-called “rough draft” by late May. Unlike the private Celera, which hopes to earn substantial profits from much of its sequencing information, the public project has been posting its findings on the Internet for the benefit of genetics researchers around the world.
JIM LEHRER: And to Margaret Warner.
MARGARET WARNER: And with me is Craig Venter, the president and chief scientific officer of Celera Genomics.
Welcome, Mr. Venter.
CRAIG VENTER: Thank you.
MARGARET WARNER: At the risk of repeating what’s in the set-up, this is a hard concept to get. Let me see if I can get this. We’re talking about the human genome is the whole set of human genes in the body, 80- or 100,000.
CRAIG VENTER: All our chromosomes.
MARGARET WARNER: All arranged on these 23 chromosomes, all in every single cell almost in the body.
CRAIG VENTER: That’s correct.
MARGARET WARNER: So what is it that your company has done?
CRAIG VENTER: The difficulty with trying to determine the sequence of every one of our chromosomes is the current technology gives us about 500 letters at a time. So the question is, how do you sequence something that’s 100 million letters long?
MARGARET WARNER: When you say “sequence,” do you mean identifying all the bits or arranging them in the right order?
CRAIG VENTER: Determining the exact genetic code of something that is 100 million letters.
MARGARET WARNER: Of all these bits.
CRAIG VENTER: There’s two different approaches that have been taken. The public approach has been to map small segments, called BAC clones (bacterial artificial chromosomes), which are about 100,000 letters long, to line those up along the chromosomes, and then to sequence the individual clones. What we showed with the first three genomes in history, which we did at the Institute for Genomic Research, is that we could use mathematical algorithms and large computing capacity to solve the jigsaw puzzle of whole genomes. We did this first with the key pathogen Haemophilus influenzae, which causes meningitis in children and ear infections. The challenge was to see if that could be done on larger chromosomes. So we break chromosomes down, in the case of the human genome, into millions of pieces and we determine the actual sequence, the actual genetic code of those pieces.
MARGARET WARNER: And that’s where you are now.
CRAIG VENTER: That’s where we are now. That’s the phase that just got finished, a major phase of all the sequencing. And as it happens with the different projects at different phases, with the whole genome, you do everything at once. You do all the sequencing. And over the next few weeks we’ll be feeding this to our super computer and trying to assemble it until we get the complete sequence of each of the chromosomes.
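[Editor's illustration: the "jigsaw puzzle" assembly described above can be sketched in a few lines of Python. This is a toy greedy overlap merge, assuming error-free reads from a single strand with no repeated regions. It is not Celera's assembler. The names overlap and greedy_assemble, the example sequence, and the minimum overlap of 3 letters are all hypothetical choices made for this sketch.]

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 when the best match is shorter than `min_len` letters.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of fragments until no overlaps remain.

    Toy assumptions (not true of real sequencing data): reads are error-free,
    come from one strand, and the genome has no long repeats.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop any read that is wholly contained inside another read.
    contigs = [r for r in contigs if not any(r != s and r in s for s in contigs)]
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# A made-up 20-letter "genome" cut into three overlapping 10-letter reads.
genome = "ATGCGTACCTGAAGTCTTGG"
reads = [genome[10:20], genome[0:10], genome[5:15]]  # deliberately out of order
print(greedy_assemble(reads))
```

[In this toy case the three shuffled reads merge back into the original 20-letter string. With millions of real reads, sequencing errors and repeated stretches of DNA make the problem far harder, which is why the assembly step is described here as needing large computing capacity.]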
MARGARET WARNER: Let me read you one description that was in the Washington Post, how they describe it. They said essentially that you sheared this genetic material into bits tiny enough to be read by a computer or machine.
CRAIG VENTER: By the machine.
MARGARET WARNER: That’s what you’ve done. If we use the analogy of the man in the set-up, it’s as if you now have all the parts of that transistor or whatever out there. And then what you’re going to do is put them in the order that makes sense, that makes….
CRAIG VENTER: That’s right. It’s like a giant jigsaw puzzle, only this jigsaw puzzle has about 30 million pieces to it. Each of those pieces are strings of letters of genetic code 500 or 600 letters long. You can see why people thought it was impossible to do and to have computers — this approach — solve it. Last week we published the largest genome in history, that of the fruit fly. They said that was impossible to do with this technique. It worked fantastically. And that was actually harder to do than it looks like the human is in the assembly phases.
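[Editor's illustration: the round numbers quoted in this transcript (about 30 million pieces of 500 to 600 letters each, and a genome of roughly 3 billion units) imply how many times over, on average, each letter of the genome has been read. The short calculation below works that out. The function name fold_coverage is a hypothetical label for this sketch, and the inputs are the broadcast's approximate figures, not Celera's reported statistics.]

```python
def fold_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average number of times each genome position is covered by a read."""
    return num_reads * read_length / genome_length


GENOME_LENGTH = 3_000_000_000  # "roughly 3 billion chemical units" (from the set-up piece)
NUM_READS = 30_000_000         # "about 30 million pieces" (Venter's figure)

low = fold_coverage(NUM_READS, 500, GENOME_LENGTH)
high = fold_coverage(NUM_READS, 600, GENOME_LENGTH)
print(f"Approximate coverage: {low:.0f}x to {high:.0f}x")
```

[On these figures each position is read about five to six times on average. That redundancy is what lets overlapping pieces be matched to one another during assembly.]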
MARGARET WARNER: Now is this shotgun strategy as you call it and everyone else calls it, is it as accurate or complete as the approach taken by the publicly-funded Genome Project?
CRAIG VENTER: In fact, it’s more accurate and more complete because we have to have the highest standard of sequence quality in order to do these massive computer assemblies. So if we were sequencing smaller clones, we could accept slightly lower quality sequences. We have to have nearly perfect sequences in the raw data or the computer, the biggest super computer built by a civilian group, couldn’t deal with the information. We published — between Celera and the work I did at the Institute for Genomic Research now — 15 complete chromosomes, including the one for tuberculosis, malaria, cholera, meningitis, the drosophila genome. These are the most accurate sequences that are out there; they’re of outstanding quality.
MARGARET WARNER: Of what use… Is it fair to call this right now, you call it a milestone or a breakthrough — if all the pieces are still out there and they haven’t been reassembled, is it of any use yet?
CRAIG VENTER: Yes, it is. We have lots of pharmaceutical subscribers that are, in fact, even using the pieces, and have been while we’ve been generating them, to make key medical breakthroughs. There’s some very exciting work now already going into the drug development process to help deal with key diseases. But in a matter of weeks it will be put together in the long strings of sequences that we know as the chromosomes. That’s when the exciting phase will begin. That’s where we begin to really interpret the genetic code, where we define how many genes there are. Nobody knows. We still guess, you know, it’s 80 to 100,000 genes. Why do we guess? Because nobody knows. That’s the phase we’ll begin in a few weeks, of deciphering this information, knowing for the first time the complete repertoire of genes.
MARGARET WARNER: Now, you mentioned that your subscribers could get this information. And that does raise the issue about how accessible your data is. Can only people who have paid to subscribe get this data?
CRAIG VENTER: At the present time, yes. I mean that’s, you know, we’re not using taxpayer money to sequence the human genome. We’re using private investment capital to do this. We couldn’t wait for the other effort. It was not going to be done until the end of this decade until Celera announced what it was doing. We spurred forth the government program and the Celera program. It’s good for everybody. We’re going to get the genome much faster that way. We just published the drosophila genome; that is on the Internet. Anybody can download.
MARGARET WARNER: It’s the fruit fly one.
CRAIG VENTER: The fruit fly. As soon as the human genome is of the same quality standard that we just published for the fruit fly genome, it will be in the same place. It can be downloaded and used by anybody on the Internet. There’s different styles in science. Most scientists don’t publish their findings until they’re really complete and accurate. The Wellcome Trust made the U.S. Government labs dump its data nightly because that’s what they were doing. That’s not what most scientists do with their information, putting out the raw data. They interpret it. They make sure it’s accurate. I’m not complaining about what the government is doing. In fact, it’s terrific. We’re helping make good use out of that information like most of the other groups in the world are but Celera is going to publish the complete finished thing when it’s done to that quality.
MARGARET WARNER: So though President Clinton and Prime Minister Blair three weeks ago called on companies like yours — I imagine they were thinking of your company — to put out the raw data just the way the public does. That doesn’t faze you?
CRAIG VENTER: They changed that two days ago. They said they weren’t really talking about the private company that’s sequencing the genome. There’s only one and it’s Celera. It’s not like there’s a long string of them. We are making our data… It’s pretty extraordinary what we’re doing with our own investment capital. We’re sequencing the genome for the world for free, not at taxpayer expense, and we’re giving it to the world because it’s going to drive discovery that will make people need our databases and our software and our computer capacity much more than they would now.
MARGARET WARNER: All right. But then at what point… How do your investors recover their investment? You’re obviously going to have to make serious money off this. What do you charge for?
CRAIG VENTER: The general public and scientists actually don’t usually use the kind of database services that people in the news industry, lawyers use — Lexis-Nexis, Bloomberg, we say we’re going to be the Bloomberg’s of the scientific world; somebody from Bloomberg told me their goal is to become the Celera of the business world.
These are big, complex databases that software tools are put together to help people interpret. Instead of having to go gather all that information yourself, you can sit down at your computer terminal and generate whatever information you want to very quickly out of this vast data source. We’re doing the same for genomics, only it will be a far bigger database. Celera’s database is already 80 terabytes of data. That’s five or six times the national Library of Congress. It’s a massive amount of information. Bloomberg’s has I think over $1 billion a year revenue. That’s how you get it is by subscriptions that people want to help in interpreting the information, it’s not because we’re keeping it secret.
MARGARET WARNER: Somebody… We were discussing this, this afternoon, compared this. The fact that your company, little known, fairly small, took on this huge project of real basic research and took on — and said you were going to race the U.S. Government project, it would be a private company saying they’re going to beat NASA getting on the Moon or something. Is that more possible today and if so why? Is this a new model for a new approach to really kind of basic research?
CRAIG VENTER: It was an exciting set of opportunities. Our parent company, P.E. Biosystems, are the ones that developed the new sequencing technology, the instruments, that in fact the federally funded scientists use as well as ourselves. So, you know, basically we are combined P.E. Biosystems and Celera, a tool company. But major breakthroughs in technology, you can go back 400 years. Galileo had a telescope. That changed science and changed our view of the world. Having new scientific instruments like Mike Hunkapiller developed from P.E. Biosystems gave us a new tool set that we could approach genomics with. That coupled with the new algorithms to deal with the massive compute structure and the high end computing… We’re dealing with the limits of computing power right now. We’re waiting for the next generation computers already because Biology’s needs are greater than any other in computing right now. All these came together: high-end computing, exciting new automated instruments, and new mathematical algorithms, all together to solve the genetic code.
MARGARET WARNER: With the potential pay-off of real pharmaceuticals and everything else in the end.
CRAIG VENTER: And understanding everything we can about our new scientific heritage.
MARGARET WARNER: All right. Well, thank you.