K-anonymity.


I have read a bunch about this, but I am not a data scientist, mathematician, or programmer. I am doing a report on de-identification and this is one of the methods I am researching, but I cannot for the life of me wrap my head around it.

In: Technology

2 Answers

Anonymous 0 Comments

A table has k-anonymity when every row, looking only at the columns left after suppression, is part of a group of at least k identical rows. Often you get there by replacing values that can’t helpfully be suppressed, like age or weight, with a range. Say you’re doing a study where weight is important. After you suppress the obviously unique fields like name, some rows may still sit in groups smaller than the k your study’s regulators require before the data can be published. In that case you might replace weights with ranges: 83 => 80-89. Rows that weren’t identical before (like 83 and 86) are now identical, which increases k.
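If it helps to see the range trick with actual numbers, here is a minimal Python sketch. The sample rows, the (weight, sex) columns, and the helper names `k_of` and `to_range` are all made up for illustration; real de-identification tools do a lot more than this.

```python
from collections import Counter

def k_of(rows):
    """Return k: the size of the smallest group of identical rows."""
    return min(Counter(rows).values())

def to_range(weight, width=10):
    """Replace an exact weight with a bucket, e.g. 83 -> '80-89'."""
    low = (weight // width) * width
    return f"{low}-{low + width - 1}"

# Hypothetical data: (weight, sex) rows left after names were suppressed.
rows = [(83, "F"), (86, "F"), (81, "F"), (72, "M"), (75, "M"), (78, "M")]

k_before = k_of(rows)                       # every row is unique
generalized = [(to_range(w), s) for w, s in rows]
k_after = k_of(generalized)                 # rows now fall into two groups

print(k_before)  # 1
print(k_after)   # 3
```

Before the change every row is one of a kind (k = 1). After bucketing, each row has two twins, so the table is 3-anonymous, at the cost of less precise weights.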

Anonymous 0 Comments

Say you have a database of people: age, gender, city of birth, postal code, occupation. If there’s a row where a person’s combination of values across those columns is unique, anyone who knows all those values can identify that person with 100% certainty. Now say there are 2 women who share an age, city of birth, postal code and occupation: knowing all those columns, you’d only have a 50% chance of picking the right one. In general, if n people share the same values, the chance is 1 in n, so you can calculate these probabilities from how many rows have all the columns in common.
Usually you’d set a limit for the probability of identifying a person, say 25% (which works out to groups of at least 4). Any cell that lets someone narrow down a person’s identity with more than 25% probability, you mask. So back to the 2 women: you could mask their city of birth, for example, so that only the remaining columns can be used to try to identify them. If other people share those remaining columns, it’s no longer just 2 people in the group, and the probability of finding either woman drops below 50%.
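Here is a rough Python sketch of that probability idea, assuming the chance of picking the right person is simply 1 divided by the number of rows that look the same. The four people, the column names, and the `worst_case_risk` helper are invented for the example.

```python
from collections import Counter

def worst_case_risk(rows, columns):
    """Highest chance of singling out one person using only `columns`.

    Assumes risk = 1 / (number of rows sharing the same values).
    """
    keys = [tuple(row[c] for c in columns) for row in rows]
    smallest_group = min(Counter(keys).values())
    return 1 / smallest_group

# Hypothetical database: two pairs of women who differ only in birth city.
people = [
    {"age": 34, "gender": "F", "birth_city": "Leeds", "postal": "AB1", "job": "nurse"},
    {"age": 34, "gender": "F", "birth_city": "Leeds", "postal": "AB1", "job": "nurse"},
    {"age": 34, "gender": "F", "birth_city": "York", "postal": "AB1", "job": "nurse"},
    {"age": 34, "gender": "F", "birth_city": "York", "postal": "AB1", "job": "nurse"},
]

all_columns = ["age", "gender", "birth_city", "postal", "job"]
risk_all = worst_case_risk(people, all_columns)

# "Masking" birth city here just means leaving it out of the comparison.
masked_columns = [c for c in all_columns if c != "birth_city"]
risk_masked = worst_case_risk(people, masked_columns)

print(risk_all)     # 0.5
print(risk_masked)  # 0.25
```

With every column visible, each woman hides in a group of 2, so the risk is 50%. Once birth city is masked, all 4 rows look the same and the risk falls to 25%, which is the limit used in the example above.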