153 — Blind De-anonymization Attacks using Social Networks
Even if a dataset undergoes anonymization steps — such as removing names, social security numbers, and other sensitive information — it’s often still possible to de-anonymize the data using external datasets. As a simple example, if you remove the real names of users but leave in the usernames, it’s often possible to find real names by looking up reused usernames elsewhere on the internet.
Many of the existing de-anonymization systems, however, require “seeds”: In other words, the attacker should know something about someone in order to anchor the association of the anonymized graph with the constructed, de-anonymized one. This paper introduces a way to automatically generate the de-anonymized graph without requiring these manually placed “seeds.”
The authors accomplish this by implementing a pseudo-relevance feedback support vector machine (PRF-SVM), a machine learning approach that optimizes for the similarity of structure between the overlaid graphs $G_a$ (anonymized) and $G_u$ (the auxiliary graph). Because this can be run without any prior knowledge about the graphs, it is considered “seed-free,” or “blind.”