Saturday, March 04, 2006

Hash the DNA

Last year seven EU member countries, including Finland, agreed to share access to DNA, vehicle, and fingerprint databases, and I've heard several suggestions to construct a DNA sample database of all Finns. This worries me.

Gathering DNA is typically justified by referring to 9/11 or the relatively small threat of global terrorism, but the logic and evidence behind this justification are as solid as the plans of evidence ... oops, I mean the evidence of plans of WMD in Iraq. But it is definitely true that DNA, just like fingerprints, is a valuable forensic tool for identifying a victim or a suspect and for connecting the suspect physically to a crime scene. DNA is not only a tool for positive evidence; more importantly, it has been used in several cases to exonerate a falsely convicted person.

But there is a significant distinction: DNA carries much more information about a person than fingerprints, surveillance camera footage, etc. Based on DNA one can, or will in the future be able to, determine much of a person's pedigree, race, health, appearance, personality, and physical and mental capabilities, among other things. Most of this has very limited forensic value, but it is of great interest to, for example, companies hiring employees or selling life insurance, or marketing departments trying to find the customers most easily addicted to a given product. This is a much bigger threat to Joe Average's everyday life than occasional acts of terrorism. And there's little doubt security agencies share their information with private companies - in fact economic espionage in its various forms was the primary task of many security agencies from the end of the cold war at least up to 9/11. Now it's probably the second most important task.

I suggest we could reconcile the desire for privacy and the need for evidence by sharing cryptographic hashes of DNA instead of the actual genome data. The hash would still serve for identification, but would be of no value for anything else. In fact, storing identified DNA data of any human other than yourself (and possibly your immediate family) should be outlawed as comparable to having, say, child pornography or downloaded music on your hard disk.
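To make the idea concrete, here is a minimal sketch in Python, assuming a DNA profile can be written down as a small table of marker values. The marker names (M1, M2, M3), the profile format and the function name profile_digest are made up for illustration; they are not taken from any real forensic system. The profile is put into a canonical order and only its SHA-256 digest is kept:

```python
import hashlib

def profile_digest(profile):
    """Return a SHA-256 hex digest of a marker profile.

    `profile` maps a marker name to a pair of allele values.
    This format is a hypothetical one chosen for the example.
    """
    parts = []
    # Sort the markers and the two values of each marker so that the
    # same profile always produces the same text, however it was entered.
    for marker in sorted(profile):
        low, high = sorted(profile[marker])
        parts.append(f"{marker}:{low},{high}")
    canonical = ";".join(parts)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Made-up values: the same profile recorded twice in a different order.
crime_scene = {"M1": (12, 15), "M2": (7, 9), "M3": (30, 31)}
suspect = {"M3": (31, 30), "M1": (15, 12), "M2": (9, 7)}

# Only the digests need to be stored or exchanged for the comparison.
print(profile_digest(crime_scene) == profile_digest(suspect))
```

Two parties holding the same profile get the same digest and can compare digests without exchanging the underlying values. This is only an illustration of the comparison step; a real scheme would need more care than a plain hash of a short record.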

I know there's a technical challenge: DNA data is rarely perfectly correct and complete, and a traditional cryptographic hash matches only when the input is exactly identical, so it cannot be applied as such. If hashing turns out to be impossible, the second best alternative would be to never ship identified DNA data - instead evidence would be shipped to a few trusted repositories for identification, and destroyed immediately after a positive match has been found.
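The difficulty with imperfect data can be shown with a few lines of Python. This is a sketch with made-up values, and the helper name digest is hypothetical: a reading with one wrong value, or a degraded sample with some values missing, hashes to something completely unrelated to the hash of the clean reading.

```python
import hashlib

def digest(values):
    """SHA-256 hex digest of a list of marker values (hypothetical format)."""
    text = ",".join(str(v) for v in values)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

clean = [12, 15, 7, 9, 30, 31]    # made-up reference reading
noisy = [12, 15, 7, 9, 30, 32]    # same person, one value misread
partial = [12, 15, 7, 9]          # degraded sample, last marker missing

# Neither imperfect reading matches the clean one, although a human
# comparing the raw values would call both a near match.
print(digest(clean) == digest(noisy))
print(digest(clean) == digest(partial))
```

A cryptographic hash is designed so that similar inputs do not give similar outputs, which is exactly what makes the digest useless for "close enough" comparisons.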


Anonymous said...

It is currently very, very expensive to sequence ("read") a person's full DNA - this is essentially what was done by the two famous genome projects. The DNA found at a crime scene and the DNA of a suspect are compared by inspecting a small number (9?) of markers (locations in the genome with a high degree of polymorphism among humans). If they are going to create a database of people's DNA, these markers are most likely what they will store; of course, they could store physical DNA samples, but that wouldn't be a database any more.

cessu said...

Naturally there wouldn't be any point in sequencing markers with little or no genetic variation between humans - I guess it is already public knowledge that most people are humans instead of frogs, for example. But that doesn't change anything: unless we know these interesting markers in no way affect our health and character (and are thus of no value in the wrong hands), I definitely suggest cryptographically hashing them before any storage or exchange.

Most public discussion doesn't address whether only the sequenced markers or real DNA samples would be stored. On some occasions I've interpreted comments as implying that samples would be stored, because that would be of higher value, for example, when studying genetic diseases.

I don't think it is clear that something called a "database" implies storage of only these markers. In the Finnish media, words corresponding to "registry" have also been used.

