I didn’t get a post in yesterday as I had wanted, but for good reason. The past few days have been packed wall to wall with activities both at work and at home, and yesterday completely got away from me. One of those activities was attending a conference on risk and privacy, two concepts that I spend a great deal of time thinking about. One of the presenters spoke about de-anonymizing small data sets. I know it doesn’t sound very interesting yet, but stay with me and I think you’ll be intrigued.
As a data scientist, this presenter spends a great deal of time looking at how machine learning algorithms can be used to identify patterns we haven’t seen previously. As part of his research, he became intrigued with the idea of hospitals and doctors’ offices redacting patient records and providing research data that includes age, gender, diagnosis, location, and date of birth. What he discovered is that, with very little effort, it is possible to de-anonymize those records and determine which individuals were seen for which medical issue, and when. He then walked us through an example in which analyzing just 6-8 movies a person has watched on Netflix, correlated with other social media information, could be enough to determine that person’s identity.
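To make the idea concrete, here is a minimal sketch of how that kind of linkage might work. It is not the presenter’s method, just my own illustration: the records, names, and the `link_records` function are all made up, and I am assuming the attacker has some public list (a voter roll, say) that shares date of birth, gender, and zip code with the redacted data.

```python
from collections import defaultdict

# Fields assumed to appear in both the redacted data and a public list.
QUASI_IDENTIFIERS = ("date_of_birth", "gender", "zip_code")


def link_records(redacted, public, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for redacted rows that match
    exactly one person in the public list on the given keys."""
    # Index the public list by its combination of quasi-identifiers.
    index = defaultdict(list)
    for person in public:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = []
    for row in redacted:
        candidates = index.get(tuple(row[k] for k in keys), [])
        # Only an unambiguous match counts as a re-identification.
        if len(candidates) == 1:
            matches.append((candidates[0], row["diagnosis"]))
    return matches


# Entirely synthetic data, for illustration only.
redacted_records = [
    {"date_of_birth": "1980-03-14", "gender": "F", "zip_code": "02139", "diagnosis": "asthma"},
    {"date_of_birth": "1975-07-02", "gender": "M", "zip_code": "02139", "diagnosis": "diabetes"},
    {"date_of_birth": "1990-11-30", "gender": "M", "zip_code": "02140", "diagnosis": "migraine"},
]
public_records = [
    {"name": "Alice Example", "date_of_birth": "1980-03-14", "gender": "F", "zip_code": "02139"},
    {"name": "Bob Example", "date_of_birth": "1975-07-02", "gender": "M", "zip_code": "02139"},
    {"name": "Carl Example", "date_of_birth": "1990-11-30", "gender": "M", "zip_code": "02140"},
    {"name": "Dan Example", "date_of_birth": "1990-11-30", "gender": "M", "zip_code": "02140"},
]

reidentified = link_records(redacted_records, public_records)
for name, diagnosis in reidentified:
    print(f"{name}: {diagnosis}")
```

In this toy data, two of the three “anonymous” patients are matched to a name. The third is protected only because two people happen to share the same birth date, gender, and zip code.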
To me this type of activity is both intriguing and, frankly, frightening. As he described different data analytics projects at his university, I became convinced all over again that there is no true anonymity anymore, because we all leave unique digital trails across our environment every day. In fact, we leave so many unique trails that only a handful of data points are needed to identify us. Whether it be the style of your writing, the routes you take to different locations, the food you purchase, the binge-watching you do, or the search you did on a specific item, your digital fingerprint becomes more and more defined, making you less and less anonymous and potentially more susceptible to abuses of your right to privacy.
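A rough way to see why so few data points are needed is to count how many people in a data set become one of a kind as you combine attributes. The sketch below is my own illustration with made-up data; `unique_fraction` and the field names are hypothetical, not something from the talk.

```python
from collections import Counter


def unique_fraction(records, keys):
    """Fraction of records whose combination of values for `keys`
    is shared with no other record in the data set."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Synthetic "digital trails" for four imaginary people.
trails = [
    {"grocery": "store_a", "commute": "route_1", "show": "drama"},
    {"grocery": "store_a", "commute": "route_1", "show": "comedy"},
    {"grocery": "store_a", "commute": "route_2", "show": "drama"},
    {"grocery": "store_b", "commute": "route_2", "show": "drama"},
]

one_attribute = unique_fraction(trails, ["grocery"])
two_attributes = unique_fraction(trails, ["grocery", "commute"])
three_attributes = unique_fraction(trails, ["grocery", "commute", "show"])
print(one_attribute, two_attributes, three_attributes)
```

Each attribute on its own is shared by several people, but in this toy example the share of people who stand alone grows with every attribute added, until all four are unique.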
This was demonstrated pretty clearly in the CNN threat to “dox” the Reddit user who posted a video of the President body-slamming a wrestler with the CNN logo over his face. By correlating a number of small pieces of information, CNN was able to identify the user in question, at which point it threatened to release his information online, until he subsequently apologized. CNN’s own report included this statement:
“CNN is not publishing “HanA**holeSolo’s” name because he is a private citizen who has issued an extensive statement of apology, showed his remorse by saying he has taken down all his offending posts, and because he said he is not going to repeat this ugly behavior on social media again. In addition, he said his statement could serve as an example to others not to do the same.