Are we being too free with our genetic information? What if you started receiving targeted ads for Prozac because of the depression risk revealed by your publicly accessible genome? As increasing amounts of genetic information are placed online, many researchers believe that guaranteeing donors' privacy has become an impossible task.
The first major genetic data collection began in 2002 with the International HapMap Project – a collaborative effort to sequence genomes from families around the world. Its aim was to develop a public resource that would help researchers find genes associated with human disease and drug response.
While its consent form assured participants that their data would remain confidential, it had the foresight to mention that with future scientific advances, a deliberate attempt to match a genome with its donor might succeed. "The risk was felt to be very remote," says Laura Lyman Rodriguez of the US government's National Human Genome Research Institute in Bethesda, Maryland.
Those fears proved well founded: in a paper published in Science this week, a team led by Yaniv Erlich of the Whitehead Institute in Cambridge, Massachusetts, used publicly available genetic information and an algorithm they developed to identify some of the people who donated their DNA to HapMap's successor, the 1000 Genomes Project.
Anonymity not guaranteed
Erlich says the research was inspired by a New Scientist article in which a 15-year-old boy successfully used unique genetic markers called short tandem repeats (STRs) on his Y chromosome to track down his father, who was an anonymous sperm donor. Erlich and his team used a similar approach.
First they turned to open-access genealogy databases, which attempt to link male relatives using matching surnames and similar STRs. The team chose a few surnames from these sources, such as "Venter", and then searched for the associated STRs in the 1000 Genomes Project's collection of whole genomes. This allowed them to identify which complete genomes were likely to be from people named Venter.
Although the 1000 Genomes Project's database, which at last count had 1092 genomes, does not contain surname data, it does contain demographic data such as the ages and locations of its donors. By searching online phonebooks for people named Venter and narrowing those down to the geographic regions and ages represented in the whole genomes, the researchers were able to find the specific person who had donated his data.
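The two-step approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's actual algorithm: the surnames, repeat counts, ages and regions are invented placeholders, the record types and function names are hypothetical, and a simple mismatch count stands in for whatever matching rule the researchers used. Step one matches a genome's Y-chromosome STRs against genealogy records to guess a surname; step two filters a directory by that surname plus the age and location attached to the genome.

```python
from dataclasses import dataclass


@dataclass
class GenealogyRecord:
    """Hypothetical entry in a genealogy database: surname plus Y-STR profile."""
    surname: str
    y_str: dict  # marker name -> repeat count


@dataclass
class DirectoryEntry:
    """Hypothetical entry in a public directory such as an online phonebook."""
    name: str
    surname: str
    age: int
    region: str


def count_mismatches(a, b):
    """Return (mismatching markers, shared markers) for two STR profiles."""
    shared = a.keys() & b.keys()
    return sum(1 for m in shared if a[m] != b[m]), len(shared)


def infer_surname(genome_strs, records, max_mismatches=0, min_shared=3):
    """Step 1: pick the surname whose STR profile best matches the genome."""
    best = None
    for rec in records:
        mismatches, shared = count_mismatches(genome_strs, rec.y_str)
        if shared >= min_shared and mismatches <= max_mismatches:
            if best is None or mismatches < best[0]:
                best = (mismatches, rec.surname)
    return best[1] if best else None


def narrow_candidates(surname, age, region, directory):
    """Step 2: keep directory entries matching surname, age and region."""
    return [p for p in directory
            if p.surname == surname and p.age == age and p.region == region]


# Invented toy data; none of these values come from any real database.
records = [
    GenealogyRecord("Exampleson", {"DYS19": 14, "DYS390": 24, "DYS391": 11}),
    GenealogyRecord("Placeholder", {"DYS19": 15, "DYS390": 22, "DYS391": 10}),
]
anonymous_genome = {
    "y_str": {"DYS19": 14, "DYS390": 24, "DYS391": 11},
    "age": 52,
    "region": "Region A",
}
directory = [
    DirectoryEntry("A. Exampleson", "Exampleson", 52, "Region A"),
    DirectoryEntry("B. Exampleson", "Exampleson", 31, "Region A"),
    DirectoryEntry("C. Placeholder", "Placeholder", 52, "Region A"),
]

surname = infer_surname(anonymous_genome["y_str"], records)
candidates = narrow_candidates(
    surname, anonymous_genome["age"], anonymous_genome["region"], directory
)
print(surname, [p.name for p in candidates])
```

In this toy case the surname guess alone leaves two candidates, and it is the age that singles out one person, which is consistent with the article's point that age data was the piece later removed from the project's website.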
In total, the researchers identified 50 individuals who had donated whole genomes. Some of these were women, whose identities were given away because their location and age matched those of a known donor's wife.
Matter of time
Before publishing their findings, the team warned the US National Institutes of Health (NIH) and other institutions involved in the project about the vulnerability in their data. Rodriguez says that they had been anticipating that someone would identify donors, "although we didn't know how or when".
To prevent Erlich's method from being used successfully again, age data has been removed from the project's website. Erlich says that this makes it difficult, although not impossible, to narrow the surnames down to an individual.
"The genie's out of the bottle," says Jeffrey Kahn of Florida State University in Tallahassee. "It's a harbinger of a changing paradigm of privacy." A zeitgeist driven by companies such as Facebook has led to more information sharing than anyone would have thought possible back in 2002 when HapMap began, he says.
Recurring problem
This is not the first time genome confidentiality has been compromised. When James Watson made his genome public in 2007, he blanked out a gene related to Alzheimer's. But a group of researchers successfully inferred whether he carried the risky version of this gene by examining the DNA sequences on either side of the redacted gene.
While someone is bound to find another way to identify genetic donors, says Rodriguez, the NIH believes it would be wrong to remove all of their genome data from the public domain. She says that full accessibility is "very beneficial to science", but acknowledges that the project needs to strike a careful balance between confidentiality and open access.
That balance is especially pertinent, says Kahn, because genetic data does not just carry information about the person from whom it was taken. It can also reveal the genetic details of family members, some of whom might not want that information to be public. A relative's genome might reveal your own disease risk, for example, which you might not want to know or have an employer learn about. While laws prohibit health insurers and employers from discriminating against people based on their genetic data, it would not be difficult for an employer to give another reason for denying you a job.
An individual's relatives could not prevent that individual from learning about themselves, says Rodriguez, but researchers should encourage would-be genome donors to discuss the risks and benefits with their families.