
A combination image shows FBI sketches of an unknown individual known as the East Area Rapist/Golden State Killer describe...

DNA ancestry searches can now identify most white Americans. Here’s why that’s legally questionable

When Sacramento police captured Joseph James DeAngelo, accused of being the notorious “Golden State Killer,” their chief described the event as a “perfectly executed arrest.” The episode drew nationwide attention not only because it allegedly solved a string of cold-case murders and rapes, but also because of how detectives settled on DeAngelo after 44 years.

The investigative team created a fake profile on GEDmatch, a public genealogy database, using DNA from the crime scene. They searched the database for distant relatives and built a family tree of thousands of individuals, one branch of which pointed to a 72-year-old retiree living in a Sacramento suburb. Deputies put DeAngelo under surveillance and waited until he discarded something bearing his DNA, which matched their crime-scene sample.
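The tree-building step works as a process of elimination. The Python sketch below is a simplified illustration under stated assumptions, not the investigators’ actual workflow: the family tree is a hypothetical parent-to-children mapping with invented names, and each DNA match is assumed to have already pointed to one ancestor shared with the unknown person. Anyone descended from all of those ancestors stays on the candidate list.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical family tree: each person maps to a list of their children.
Tree = Dict[str, List[str]]

def descendants(tree: Tree, ancestor: str) -> Set[str]:
    """Return everyone descended from `ancestor`, walking down the tree breadth-first."""
    found: Set[str] = set()
    queue = deque(tree.get(ancestor, []))
    while queue:
        person = queue.popleft()
        if person not in found:
            found.add(person)
            queue.extend(tree.get(person, []))
    return found

def candidates(tree: Tree, common_ancestors: List[str]) -> Set[str]:
    """People who descend from every ancestor shared with a DNA match."""
    sets = [descendants(tree, a) for a in common_ancestors]
    return set.intersection(*sets) if sets else set()

# Toy tree with invented names: two family lines joined by one couple (Child A1 and Child B1).
tree: Tree = {
    "Ancestor A": ["Child A1", "Child A2"],
    "Ancestor B": ["Child B1"],
    "Child A1": ["Grandchild X"],
    "Child B1": ["Grandchild X", "Grandchild Y"],  # Grandchild Y has a different other parent
}
print(candidates(tree, ["Ancestor A", "Ancestor B"]))  # {'Grandchild X'}
```

In this toy tree only Grandchild X descends from both ancestors. Real trees usually leave several candidates, such as full siblings, who then have to be ruled in or out by other evidence, like the discarded-DNA test described above.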

DeAngelo was an extraordinary case. But it turns out the police—or just about anyone—could use the same tactics to find the identities of millions of Americans, according to two studies published Thursday.

The first, published in the journal Science, shows that most Americans of European descent can already be found this way, simply because so many of their relatives have sent their DNA to genealogy databases like GEDmatch.

If your family hails from Europe and your DNA record is public, even if that record is anonymous because your name has been removed from it, these scientists can find a third cousin or closer relative 60 percent of the time, which is often enough to figure out who you are.

The second study, published in Cell, shows that customer profiles on these genealogy websites, regardless of their race, could be matched with DNA records stored in law enforcement databases. This cross-pollination undermines a common assumption that these forensic databases do not contain sensitive information, such as health data.

These studies expand what’s possible with genealogical searches, but employing such searches for law enforcement or other purposes could violate federal privacy laws. Here’s what you need to know.

Expanding the ‘Golden State Killer’ method

The first study was led by Yaniv Erlich, a computer scientist at Columbia University and chief science officer at MyHeritage, a genealogy website based in Israel.

Much like GEDmatch, MyHeritage allows its customers to upload DNA data from tests they took with direct-to-consumer companies, like 23andMe or Ancestry.com, into a publicly searchable database.

In theory, people add their information to these databases so they can find long-lost relatives. But MyHeritage became concerned by the Golden State Killer case and its aftermath: since April, at least 13 cases have reportedly been solved by finding suspects through relatives in genealogy databases. Erlich said MyHeritage wanted to quantify how often criminals could be caught via their relatives, so that the company could build a strategy to prevent misuse of its customers’ genetic data.

With a database of more than 1 million individuals, one of the largest in the industry, Erlich said that MyHeritage “felt that we had a responsibility to our users and the commercial ecosystem in general.”

When you take a direct-to-consumer test, it typically scans about 700,000 locations in your DNA code, otherwise known as your genome. Think of your genome as a railway line and these locations as train stations. The stations on my line, known as single nucleotide polymorphisms, form a particular sequence, and much of that sequence overlaps with my immediate family’s.

Let’s say the first three stops on my genome are Washington, D.C., Baltimore and Philadelphia. My cousins’ genomes are similar (Washington, D.C., Baltimore and Pittsburgh) but not quite the same. Someone outside our family is on another line altogether: New York City, New Haven and Boston.
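The railway analogy can be made concrete. This Python sketch is a toy, not how any genealogy company actually compares DNA: it treats each genome as a list of stations read at the same positions and reports the unbroken stretches where two lists agree. Real comparisons run over hundreds of thousands of single nucleotide polymorphisms, account for each person carrying two copies of every chromosome and tolerate genotyping errors; all of that is left out here.

```python
from typing import List, Optional, Sequence, Tuple

def shared_runs(a: Sequence[str], b: Sequence[str], min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, length) for each run of consecutive positions where a and b agree."""
    if len(a) != len(b):
        raise ValueError("both genomes must be read at the same stations")
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            if start is None:
                start = i  # a new identical stretch begins here
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i - start))  # stretch ended just before position i
            start = None
    if start is not None and len(a) - start >= min_len:
        runs.append((start, len(a) - start))  # stretch runs to the end of the line
    return runs

# The three-stop example from the text.
me = ["Washington, D.C.", "Baltimore", "Philadelphia"]
cousin = ["Washington, D.C.", "Baltimore", "Pittsburgh"]
stranger = ["New York City", "New Haven", "Boston"]

print(shared_runs(me, cousin))    # [(0, 2)]
print(shared_runs(me, stranger))  # []
```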

“By quantifying the length of these identical segments, we can basically infer the genealogical relationship between people,” Erlich said.
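Erlich’s point can be sketched as a back-of-the-envelope calculation. On average, each extra degree of separation between two relatives halves the share of DNA they are expected to have in common (about 50 percent for a parent, 12.5 percent for a first cousin, under 1 percent for a third cousin), so the degree can be roughly estimated from the shared fraction. The Python below is an illustrative assumption, not MyHeritage’s method: it sums the lengths of identical segments, divides by the total length compared and names the closest relationship. Real tools also weigh the number and length of individual segments and the wide natural variation around these averages, which this toy ignores.

```python
import math
from typing import List

# Commonly cited average shares of DNA: each extra degree of separation halves the expected share.
RELATIONSHIPS = {
    1: "parent/child or full sibling (~50%)",
    2: "grandparent, aunt/uncle or half sibling (~25%)",
    3: "first cousin (~12.5%)",
    5: "second cousin (~3.1%)",
    7: "third cousin (~0.8%)",
}

def estimate_degree(segment_lengths: List[int], total_length: int) -> int:
    """Return the degree d for which 0.5 ** d best matches the fraction of the genome shared."""
    shared = sum(segment_lengths) / total_length
    if not 0 < shared <= 1:
        raise ValueError("shared length must be positive and at most the total length")
    return round(-math.log2(shared))

def describe(segment_lengths: List[int], total_length: int) -> str:
    """Name the relationship for the estimated degree, or just report the degree."""
    d = estimate_degree(segment_lengths, total_length)
    return RELATIONSHIPS.get(d, f"relationship of degree {d}")

print(describe([100, 25], 1000))  # first cousin (~12.5%)
print(describe([5, 3], 1000))     # third cousin (~0.8%)
```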

Erlich’s team acquired a DNA profile that had been submitted anonymously to a research project. Within a day, the team found the person’s identity. Photo by Robert Brook/Science Photo Library/via Getty Images


Using this kind of matching, individual searches could locate third cousins or closer relatives for 60 percent of European Americans in their database, the study found. Given the popularity and near-exponential growth of these ancestry databases, scientists could use this method to identify close to every European American in the United States within two to three years, Erlich said.
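Why database growth matters so much can be seen with a simple probability model, a hypothetical simplification rather than the model used in the Science paper. If a person has some number of relatives at the third-cousin level or closer, and each one is independently in a database with probability equal to the share of the population that has uploaded DNA, the chance that at least one is there is one minus the chance that all of them are missing. The relative count and coverage values below are invented for illustration.

```python
def chance_of_match(relatives: int, coverage: float) -> float:
    """P(at least one relative is in the database), assuming each is present independently.

    `relatives` is a hypothetical count of close-enough relatives and `coverage` the
    fraction of the population in the database; both are illustrative assumptions.
    """
    if not 0 <= coverage <= 1:
        raise ValueError("coverage must be a fraction between 0 and 1")
    return 1 - (1 - coverage) ** relatives

# Hypothetical person with 100 close-enough relatives, databases of growing size.
for coverage in (0.005, 0.01, 0.02):
    print(f"{coverage:.1%} coverage -> {chance_of_match(100, coverage):.0%} chance of a match")
```

In this toy, going from 0.5 to 1 to 2 percent coverage lifts the chance of a match from roughly 39 to 63 to 87 percent, which illustrates how a database holding only a small slice of the population could still make most people findable. Actual figures depend on family sizes and on who buys the tests.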

The search works best for European Americans in the U.S. because they are the biggest consumers of direct-to-consumer DNA tests. Likewise, family tree records for people of European heritage tend to be more comprehensive than the records for other ethnic groups like African Americans.

What’s significant about the paper is that it gives a concrete measurement of these databases’ reach, said Natalie Ram, a law professor at the University of Baltimore who specializes in the ethics of biotechnology. “It basically says that almost all of us are already findable.”

As a next step, Erlich’s team acquired a DNA profile that had been submitted anonymously to a research project. Within a day, the team found the person’s relatives in a genealogy database and built enough family trees to track down the identity of the anonymous subject.

Many research projects, both public ones led by universities and private ones led by direct-to-consumer companies, remove names and aggregate genetic profiles so their subjects cannot be identified.

“They say that you can delete your data, but they might still keep a hold of your biological info,” Ram said. “In some instances, if you’ve consented to research, your DNA cannot be withdrawn if it’s already being used.”

But prior research showed that these anonymized profiles can be re-identified. Erlich’s new findings suggest that many of these people, who might not want to be revealed, can be identified through genetic genealogy services.

Cleaning up forensic databases

The second study could be viewed as a necessary step in modernizing forensic DNA databases kept by law enforcement.

Rather than using our “train stations” (single nucleotide polymorphisms), such databases rely on a small set of DNA markers called non-coding microsatellites, also known as short tandem repeats. In effect, they build low-resolution genetic fingerprints of suspects.

These microsatellites, by design, cannot tell you very much about an individual person. For one, police use only 20 of these markers for each profile. Compare that to the 700,000 genetic markers used by genetic services like 23andMe, Ancestry.com, GEDmatch and MyHeritage, which can expose family connections, health status and even physical features like height.
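To see why a forensic profile is “low resolution,” it helps to look at what one contains. In this hypothetical Python sketch, each microsatellite marker records two numbers, the repeat counts inherited from each parent, and two profiles match only if every marker agrees. The marker names are real loci used in U.S. forensic databases, but the repeat counts are invented, only four of the 20 markers are shown, and real comparisons must also handle partial or mixed samples.

```python
from typing import Dict, FrozenSet

# A microsatellite profile: marker name -> the pair of repeat counts at that marker.
# Marker names are real forensic loci; every number below is made up for illustration.
Profile = Dict[str, FrozenSet[int]]

def genotype(a: int, b: int) -> FrozenSet[int]:
    """Store the two repeat counts unordered, since which parent gave which is unknown."""
    return frozenset((a, b))

def profiles_match(p: Profile, q: Profile) -> bool:
    """Exact match: same markers typed and the same repeat counts at every one of them."""
    return p.keys() == q.keys() and all(p[m] == q[m] for m in p)

crime_scene: Profile = {
    "D8S1179": genotype(13, 14),
    "FGA": genotype(21, 24),
    "TH01": genotype(6, 9),
    "vWA": genotype(16, 17),
}
suspect: Profile = {
    "D8S1179": genotype(14, 13),  # same counts, listed in the other order
    "FGA": genotype(21, 24),
    "TH01": genotype(6, 9),
    "vWA": genotype(16, 17),
}
relative: Profile = dict(suspect, vWA=genotype(16, 18))  # differs at one marker

print(profiles_match(crime_scene, suspect))   # True
print(profiles_match(crime_scene, relative))  # False
```

A full profile of 20 such markers amounts to a few dozen numbers, compared with the roughly 700,000 positions a consumer test reads.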

In the past, law enforcement could not match genetic profiles in their forensic databases with DNA records stored in ancestry sites. Photo by KTSDESIGN/SCIENCE PHOTO LIBRARY/via Getty Images


Law enforcement forensics experts want to switch from their older microsatellite system to a newer system based on single nucleotide polymorphisms (which offers more data points), said Michael Edge, a population geneticist at the University of California, Davis, who co-authored the second study.

Because so many people use genealogy services, Edge and his colleagues found law enforcement could already start making this switch. Some microsatellite markers land close enough to the “train stations” mentioned earlier that you could technically match 30 to 36 percent of people in forensic files to individuals in genetic genealogy databases, their study found.
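The linking idea can be sketched in a few lines under heavy simplifying assumptions. Suppose a hypothetical model, trained on people typed both ways, predicts from each genealogy customer’s nearby single nucleotide polymorphisms how likely each microsatellite genotype is. Each forensic profile is then linked to the customer whose predictions make it most probable. The customers, genotype labels and probabilities below are invented, and the Cell study’s actual statistical method is more involved.

```python
import math
from typing import Dict

# Hypothetical predictions for one customer: marker -> {genotype label: probability},
# imagined as output of a model that reads SNPs lying near each microsatellite marker.
Predictions = Dict[str, Dict[str, float]]

def log_likelihood(observed: Dict[str, str], predicted: Predictions, floor: float = 1e-6) -> float:
    """Sum of log-probabilities that this customer carries the observed genotypes.

    Genotypes the model never predicted get a small `floor` probability instead of zero.
    """
    return sum(
        math.log(max(predicted.get(marker, {}).get(gt, 0.0), floor))
        for marker, gt in observed.items()
    )

def best_link(observed: Dict[str, str], customers: Dict[str, Predictions]) -> str:
    """Name of the customer whose predicted genotypes make the forensic profile most likely."""
    return max(customers, key=lambda name: log_likelihood(observed, customers[name]))

forensic = {"D8S1179": "13/14", "vWA": "16/17"}
customers = {
    "customer_1": {"D8S1179": {"13/14": 0.6, "13/13": 0.4}, "vWA": {"16/17": 0.7, "17/17": 0.3}},
    "customer_2": {"D8S1179": {"12/12": 0.8, "13/14": 0.2}, "vWA": {"16/17": 0.1, "18/18": 0.9}},
}
print(best_link(forensic, customers))  # customer_1
```

Picking the highest score always returns someone, so a real analysis would also need to judge whether the best candidate is convincingly better than the rest; this sketch skips that step.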

“We’re not lawyers, but we thought as geneticists, it was our job to question the limits of this technology,” Edge said.

By the way, how are any of these searches legal?

Great question. Using genetic genealogy services like GEDmatch and MyHeritage to hunt down criminals raises the question of whether such searches violate the Fourth Amendment, which protects against unreasonable searches and seizures.

However, as Ram explained in a policy paper published in June, customers give up their Fourth Amendment protections when they willingly share their DNA with direct-to-consumer genetic services or public genealogy websites. Police do not need permission to search platforms like GEDmatch, though they would need a warrant to probe a service like 23andMe.

Health privacy laws, such as the Genetic Information Nondiscrimination Act (GINA) and the Health Insurance Portability and Accountability Act (HIPAA), restrict the uses of identifiable genetic information, but they generally apply to employers, health insurers and health care providers. Genetic genealogy databases typically state that they are not in the health care business.

Ram questioned whether law enforcement could keep asserting that forensic DNA databases contain no sensitive information, in light of the second study and the Supreme Court’s 2018 decision in Carpenter v. United States. That ruling held, on privacy grounds, that police generally need a warrant to obtain a person’s historical cellphone location records.

“This ruling is likely to apply more broadly to deeply sensitive information shared with a third party, such as genetic data,” Ram said. She said some state laws already restrict these types of searches but argued that broader regulations are needed.

Until they arrive, everyone should expect to see more investigations like the one used to find the “Golden State Killer” suspect, given that some companies are now offering to conduct forensic searches in genealogy databases.

“The use of DNA databases to identify family members who are not themselves in the database raises serious and troubling questions about genetic privacy,” Ram said. “Individuals who are not directly a part of a database are findable as a byproduct of biology, and not through any voluntary conduct of their own.”