Unique Identifiers

As I was completing the Collaborative Institutional Training Initiative (CITI) program for work the other day, I came across the following quote:
Latanya Sweeney (2000), director of the Data Privacy Lab at Harvard University, has demonstrated that 87 percent of Americans can be uniquely identified by only three bits of demographic information: five-digit zip code, gender, and date of birth.

I found this percentage surprisingly high and went looking for the original paper, which was published in 2000 and supported in part by Carnegie Mellon University and the U.S. Bureau of the Census.

Reading further, the paper reports that this 87 percent drops precipitously to 0.04 percent when only the year of birth, rather than the full date of birth, is included in the analysis:
Experiment D reported that 0.04% of the population in the United States had characteristics that were likely made them unique based only on {5-digit ZIP, gender, Year of birth}.

For privacy advocates, this might be comforting. However, these experiments were conducted before the advent of Facebook.

Every morning, I routinely wish people a Happy Birthday via Facebook. In order for me to send these wishes, Facebook must show me their day and month of birth, although not necessarily the year. In this way, the birth date attribute is flipped: the day and month are exposed while the year is withheld.

While I could not find a study that uses the identifiers {5-digit ZIP, gender, day and month of birth} instead, I would expect a sharp increase in unique identification compared with Experiment D.
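To get a feel for how the choice of birth attribute changes uniqueness, here is a minimal Python sketch. It is not Sweeney's method and it uses no census data: the population is synthetic, with gender and birth date drawn uniformly at random for a single hypothetical ZIP code, and the names fraction_unique and synthetic_zip are my own. Real populations are not uniform, so treat the printed numbers as a toy illustration only.

```python
import random
from collections import Counter
from datetime import date, timedelta


def fraction_unique(records, key):
    """Share of records whose key value is shared by no other record."""
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1) / len(records)


def synthetic_zip(n, seed=0):
    """Hypothetical residents of one ZIP code: (gender, birth date) pairs.

    Assumption: gender and birth date are uniform and independent,
    which real demographic data is not.
    """
    rng = random.Random(seed)
    start = date(1920, 1, 1)
    span = (date(1999, 12, 31) - start).days
    return [
        (rng.choice("MF"), start + timedelta(days=rng.randint(0, span)))
        for _ in range(n)
    ]


# The ZIP code is held fixed, so each key below stands for {ZIP, gender, ...}.
people = synthetic_zip(1000)
full_dob = fraction_unique(people, lambda p: (p[0], p[1]))
year_only = fraction_unique(people, lambda p: (p[0], p[1].year))
day_month = fraction_unique(people, lambda p: (p[0], p[1].month, p[1].day))

print(f"gender + full date of birth: {full_dob:.1%} unique")
print(f"gender + year of birth:      {year_only:.1%} unique")
print(f"gender + day and month:      {day_month:.1%} unique")
```

Changing the population size passed to synthetic_zip shows how strongly the result depends on how many people share a ZIP code: the fewer the residents, the more of them a day and month of birth singles out.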

In short, de-identified data is not anonymous. Therefore, we are no longer anonymous. For better or for worse, most Americans can be uniquely identified with just 3 pieces of publicly available information. And this is only the beginning.
