SPEAR Anonymization Techniques

10/6/2021

Ensuring user privacy is of utmost importance in the context of SPEAR. Therefore, to achieve anonymity of the exchanged data, three well-known anonymization techniques are being reviewed for use: k-anonymity, ℓ-diversity and group signatures. Group signatures enable the anonymous upload of data, while k-anonymity is used to verify the anonymity of the data and ℓ-diversity is used for group anonymization. In a group signature scheme, a group of users is formed in which each member can sign a message on behalf of the group using their own private key. A group public key is used to verify the signature, but it is computationally infeasible to identify the signer from it. The group manager is responsible for adding and removing users and for revealing the identity of the signer in case of legal disputes. However, uploading data anonymously with a group signature scheme is not sufficient on its own to ensure anonymity: members of the group may still be identified by matching the uploaded data against other data, or by recognizing unique characteristics in it.

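k-anonymity is a property of anonymized data that is intended to ensure that the data's owners cannot be re-identified. It requires that each record in a dataset be indistinguishable from at least k-1 other records with respect to its quasi-identifiers (attributes such as age or ZIP code that could be linked with external data to identify an individual). k-anonymity therefore protects against identity disclosure, but it does not provide sufficient protection against attribute disclosure: if all records in a group share the same sensitive value, that value is revealed even though no individual record is singled out. The two common methods for achieving k-anonymity are suppression (removing values or whole records) and generalization (replacing values with less specific ones, e.g. an exact age with an age range).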
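The following Python sketch illustrates both methods on a small, made-up table. The quasi-identifiers (`age`, `zip`), the generalization rules (10-year age bands, 3-digit ZIP prefixes) and all function names are hypothetical choices for illustration, not part of SPEAR; a real deployment would choose a generalization hierarchy per attribute.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("age", "zip")  # hypothetical quasi-identifiers for this example

def class_sizes(records, qi=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[a] for a in qi) for r in records)

def is_k_anonymous(records, k, qi=QUASI_IDENTIFIERS):
    """True if every equivalence class contains at least k records."""
    return all(size >= k for size in class_sizes(records, qi).values())

def generalize(record):
    """Generalization: exact age -> 10-year band, 5-digit ZIP -> 3-digit prefix."""
    low = (record["age"] // 10) * 10
    return {**record,
            "age": f"{low}-{low + 9}",
            "zip": record["zip"][:3] + "*" * (len(record["zip"]) - 3)}

def k_anonymize(records, k, qi=QUASI_IDENTIFIERS):
    """Generalize every record, then suppress records whose class is still smaller than k."""
    generalized = [generalize(r) for r in records]
    sizes = class_sizes(generalized, qi)
    return [r for r in generalized if sizes[tuple(r[a] for a in qi)] >= k]

records = [
    {"age": 23, "zip": "13053", "condition": "flu"},
    {"age": 27, "zip": "13068", "condition": "cancer"},
    {"age": 25, "zip": "13012", "condition": "flu"},
    {"age": 41, "zip": "14850", "condition": "heart disease"},
    {"age": 45, "zip": "14853", "condition": "heart disease"},
    {"age": 48, "zip": "14801", "condition": "heart disease"},
    {"age": 62, "zip": "90210", "condition": "flu"},
]

released = k_anonymize(records, k=3)
print(is_k_anonymous(records, 3))   # False: every raw record is unique
print(is_k_anonymous(released, 3))  # True: two classes of 3; the 62-year-old was suppressed
```

The released table is 3-anonymous, yet every record in the `40-49`/`148**` class has the same condition, so anyone who knows a 45-year-old from ZIP 14853 is in the data learns that person's condition. This is the attribute disclosure that k-anonymity alone does not prevent.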

ℓ-diversity is a form of group-based anonymization that preserves privacy in data sets by reducing the granularity of the data representation. The method was created to extend k-anonymity by additionally maintaining the diversity of sensitive fields. The principle behind ℓ-diversity is that a q-block (the set of tuples in a published table T whose non-sensitive attribute values generalize to q) is ℓ-diverse if it contains at least ℓ “well-represented” values for the sensitive attribute S. A table is ℓ-diverse if every q-block is ℓ-diverse.

The main advantage of ℓ-diversity is that it provides privacy even when the data publisher does not know what kind of knowledge the adversary possesses. Additionally, the values of the sensitive attribute are well-represented in each group: each equivalence class must contain enough distinct sensitive values, and those values must be distributed evenly enough. Under entropy ℓ-diversity, for example, the entropy of the distribution of sensitive values in each equivalence class must be at least log(ℓ). This can only be achieved if the entropy of the entire table is itself at least log(ℓ); when some sensitive values are very common, the table may have very low entropy, which makes the requirement too restrictive and leads to a less conservative notion of ℓ-diversity (recursive (c, ℓ)-diversity) [1].

The limitations of ℓ-diversity appear when there is a single sensitive attribute whose values have very different degrees of sensitivity, e.g. a table in which 1% of the patients are HIV positive and 99% are HIV negative. In such a case ℓ-diversity may be both difficult to achieve and unnecessary: in a table with 10,000 records, distinct 2-diversity requires every equivalence class to contain at least one of the 10,000 × 1% = 100 positive records, so there can be at most 100 equivalence classes [1].
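A minimal Python sketch of two ways to make “well-represented” concrete: distinct ℓ-diversity (at least ℓ distinct sensitive values per q-block, the notion behind the distinct 2-diversity example) and entropy ℓ-diversity (entropy of each q-block at least log(ℓ)). The attribute names, the already generalized table and the function names are hypothetical. Natural logarithms are used on both sides of the comparison, so the choice of base does not change the result.

```python
import math
from collections import Counter, defaultdict

QUASI_IDENTIFIERS = ("age", "zip")  # assumed (already generalized) quasi-identifiers
SENSITIVE = "condition"             # assumed sensitive attribute S

def q_blocks(records, qi=QUASI_IDENTIFIERS, sensitive=SENSITIVE):
    """Map each q-block (shared quasi-identifier values) to counts of its sensitive values."""
    blocks = defaultdict(Counter)
    for r in records:
        blocks[tuple(r[a] for a in qi)][r[sensitive]] += 1
    return blocks

def is_distinct_l_diverse(records, ell):
    """Every q-block contains at least ell distinct sensitive values."""
    return all(len(counts) >= ell for counts in q_blocks(records).values())

def entropy(counts):
    """Shannon entropy (natural log) of one q-block's sensitive-value distribution."""
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def is_entropy_l_diverse(records, ell, eps=1e-12):
    """Every q-block has entropy >= log(ell); eps absorbs floating-point rounding."""
    return all(entropy(counts) >= math.log(ell) - eps for counts in q_blocks(records).values())

def row(age, zip_code, condition):
    return {"age": age, "zip": zip_code, "condition": condition}

table = [
    row("20-29", "130**", "flu"), row("20-29", "130**", "cancer"), row("20-29", "130**", "flu"),
    row("30-39", "131**", "flu"), row("30-39", "131**", "cancer"),
    row("30-39", "131**", "gastritis"), row("30-39", "131**", "heart disease"),
]

print(is_distinct_l_diverse(table, 2))  # True: each q-block has at least 2 distinct conditions
print(is_entropy_l_diverse(table, 2))   # False: the 20-29 block (2 flu, 1 cancer) has entropy ~0.64 < log 2 ~0.69
```

The `20-29` block passes the distinct check but fails the entropy check because its values are skewed towards `flu`, which is the “distributed evenly enough” requirement in action.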

A group signature is a signing scheme for groups that allows any member of the group to sign on behalf of the whole group. Only members of the group can sign messages, and a receiver can verify that a signature is a valid group signature but cannot discover which group member produced it. Additionally, if necessary, the signature can be “opened” so that the signer's identity is revealed. The group manager therefore plays the most crucial role in a group signature scheme, since they both manage the group membership and can reveal the identity of an anonymous signer.
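As a deliberately naive illustration of these roles (issuing keys, signing, verifying, opening), the Python sketch below follows the idea of the first group signature construction by Chaum and van Heyst: the manager generates a pool of one-time key pairs for each member, publishes all public keys in random order as the group public key, and keeps the key-to-member table secret so it can later open signatures. Lamport one-time signatures built from `hashlib` stand in for an ordinary signature scheme. This is a toy under these assumptions, not the scheme SPEAR uses: the group public key grows with every issued key, the manager knows every member's secret keys, members cannot be added without re-issuing the list, and all names (`GroupManager`, `Member`, `group_verify`) are hypothetical.

```python
import hashlib
import secrets

# Lamport one-time signature over SHA-256 (stdlib only). Each key pair may sign ONE message.

def lamport_keygen():
    """Secret key: 256 pairs of random 32-byte values; public key: their SHA-256 hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in sk]
    return sk, pk

def digest_bits(message):
    """The 256 bits of SHA-256(message), most significant bit first."""
    value = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return [(value >> (255 - i)) & 1 for i in range(256)]

def lamport_sign(sk, message):
    """Reveal, for each digest bit, the secret value selected by that bit."""
    return [sk[i][bit] for i, bit in enumerate(digest_bits(message))]

def lamport_verify(pk, message, signature):
    """Each revealed value must hash to the public value selected by the same bit."""
    bits = digest_bits(message)
    return len(signature) == 256 and all(
        hashlib.sha256(signature[i]).digest() == pk[i][bit] for i, bit in enumerate(bits))

class GroupManager:
    """Hypothetical toy manager: issues one-time keys, publishes them shuffled, can open."""

    def __init__(self, members, keys_per_member):
        entries = []  # (public key, secret key, owner)
        for member in members:
            for _ in range(keys_per_member):
                sk, pk = lamport_keygen()
                entries.append((pk, sk, member))
        secrets.SystemRandom().shuffle(entries)  # list position must not reveal the owner
        self.group_public_key = [pk for pk, _, _ in entries]
        self._opening_table = [owner for _, _, owner in entries]  # kept secret by the manager
        # Keys to hand to each member over a private channel: (public index, secret key).
        self.issued_keys = {m: [] for m in members}
        for index, (_, sk, owner) in enumerate(entries):
            self.issued_keys[owner].append((index, sk))

    def open(self, signature):
        """Reveal the signer, e.g. in a legal dispute, from the key index in the signature."""
        index, _ = signature
        return self._opening_table[index]

class Member:
    def __init__(self, issued_keys):
        self._unused = list(issued_keys)

    def sign(self, message):
        """Use a fresh one-time key, so two signatures by the same member are unlinkable."""
        if not self._unused:
            raise RuntimeError("no unused keys left; the manager must issue more")
        index, sk = self._unused.pop()
        return index, lamport_sign(sk, message)

def group_verify(group_public_key, message, signature):
    """Accept if the signature is valid under SOME published key; the owner stays hidden."""
    index, sig = signature
    return 0 <= index < len(group_public_key) and lamport_verify(group_public_key[index], message, sig)

manager = GroupManager(["alice", "bob", "carol"], keys_per_member=2)
alice = Member(manager.issued_keys["alice"])
signature = alice.sign(b"upload #1")
print(group_verify(manager.group_public_key, b"upload #1", signature))  # True
print(group_verify(manager.group_public_key, b"tampered", signature))   # False
print(manager.open(signature))                                           # alice
```

The verifier only learns that the signature matches one of the published keys; because the list is shuffled and each key is used once, it cannot tell which member signed or whether two uploads came from the same member. Practical schemes replace the growing key list with a constant-size group public key.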

References

[1] Machanavajjhala, A., Gehrke, J., Kifer, D., & Venkitasubramaniam, M. (2006). ℓ-Diversity: Privacy beyond k-anonymity. Proceedings - International Conference on Data Engineering, 2006, 24. https://doi.org/10.1109/ICDE.2006.1