For the past several years, a highly technical but very important debate has raged among privacy experts: How easy is it to identify an individual from a collection of data that supposedly lacks personally identifiable information? Those who say that it is relatively easy argue for greater restrictions on the release of data and more stringent efforts to anonymize it. Their opponents argue that worrying too much about the risk of “re-identification” deprives researchers of valuable data in fields such as epidemiology.
A centrepiece of the debate is a 1997 incident in which Latanya Sweeney, then an MIT graduate student and now a computer scientist at Harvard, identified the medical records of Massachusetts Governor William Weld from information publicly available in a state insurance database. The incident led to important changes in privacy rules for medical information, especially under the Health Insurance Portability and Accountability Act (HIPAA), and 15 years later it is still influencing the debate over data privacy.