Easy-to-use GDPR guide for Data Scientists. Part 2/2

De-identification vs Anonymization

De-identification is the process used to prevent a person’s identity from being connected with information.


Removing personally identifiable information (PII) is a form of anonymization.

Masking or suppression

Masking or suppression is an anonymization technique that hides an important or unique part of the data, either by replacing it with random characters or other data (masking) or by removing it altogether (suppression).
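
As a minimal sketch of the idea, the snippet below masks all but the last characters of a value and drops a whole field from a record. The helper names `mask_value` and `suppress_columns`, the sample record, and the choice to keep four characters visible are illustrative assumptions, not a prescribed method.

```python
def mask_value(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Replace all but the last `visible` characters with `mask_char`."""
    # Values too short to keep a visible tail are masked completely.
    if visible <= 0 or len(value) <= visible:
        return mask_char * len(value)
    hidden = len(value) - visible
    return mask_char * hidden + value[hidden:]


def suppress_columns(record: dict, columns: set) -> dict:
    """Return a copy of the record without the given fields (suppression)."""
    return {k: v for k, v in record.items() if k not in columns}


# Hypothetical record used only for illustration.
record = {"name": "Alice Example", "card": "4111111111111111", "city": "Paris"}
masked = dict(record, card=mask_value(record["card"]))
suppressed = suppress_columns(masked, {"name"})
```

Here the card number keeps only its last four digits and the name is removed entirely; which fields to mask and which to suppress depends on the dataset.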


Generalization is an anonymization technique that replaces individual field values with a broader category, for example an exact age with an age range.
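
A short sketch of generalization, assuming two hypothetical helpers: `generalize_age` maps an exact age to a fixed-width range, and `generalize_postcode` keeps only a prefix of a postal code. The bucket width and prefix length are arbitrary choices for illustration.

```python
def generalize_age(age: int, width: int = 10) -> str:
    """Map an exact age to a range such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def generalize_postcode(postcode: str, keep: int = 2) -> str:
    """Keep the first `keep` characters and blank out the rest."""
    return postcode[:keep] + "*" * (len(postcode) - keep)


age_range = generalize_age(37)
area = generalize_postcode("75011")
```

Wider ranges and shorter prefixes give more privacy but less analytical detail, so the parameters are a trade-off to tune per dataset.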


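K-anonymization is an anonymization technique that combines generalization and masking/suppression so that each record is indistinguishable from at least k - 1 other records on its identifying attributes.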
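
The sketch below shows only the suppression half of the idea: it assumes the records have already been generalized and drops every record whose combination of quasi-identifiers appears fewer than k times. The function name `k_anonymize`, the field names, and the sample data are hypothetical.

```python
from collections import Counter


def k_anonymize(records: list, quasi_identifiers: list, k: int) -> list:
    """Keep only records whose quasi-identifier combination occurs at least k times."""

    def group_key(record: dict) -> tuple:
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(group_key(r) for r in records)
    # Records in groups smaller than k are suppressed (dropped).
    return [r for r in records if counts[group_key(r)] >= k]


# Already-generalized sample records, for illustration only.
records = [
    {"age": "30-39", "postcode": "75***", "diagnosis": "flu"},
    {"age": "30-39", "postcode": "75***", "diagnosis": "asthma"},
    {"age": "40-49", "postcode": "13***", "diagnosis": "flu"},
]
released = k_anonymize(records, ["age", "postcode"], k=2)
```

With k = 2, the single record in the "40-49" group is suppressed because nobody else shares its quasi-identifiers. In practice one would usually try coarser generalization before dropping data.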


Scrambling is an anonymization technique that mixes or obfuscates the characters of a value.
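
One simple way to picture scrambling is a random permutation of the characters of a value. The sketch below assumes a hypothetical `scramble` helper. Note that every original character is still present in the output, so this illustrates the mechanism and is not a strong protection on its own.

```python
import random
from typing import Optional


def scramble(value: str, rng: Optional[random.Random] = None) -> str:
    """Return the characters of `value` in a random order."""
    # SystemRandom draws from the OS entropy source; pass a seeded
    # random.Random instead if reproducible output is needed.
    rng = rng or random.SystemRandom()
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


original = "AB-123-CD"
scrambled = scramble(original)
```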


Data blurring is an anonymization technique that replaces data values with approximations, so that the exact values are lost and individuals can no longer be identified from them.
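
Two common ways to approximate a numeric value are rounding it to a coarser step and adding bounded random noise. The sketch below assumes hypothetical helpers `blur_round` and `blur_noise`; the step size and noise spread are arbitrary illustration values.

```python
import random
from typing import Optional


def blur_round(value: float, base: float = 1000) -> float:
    """Round a value to the nearest multiple of `base`."""
    return base * round(value / base)


def blur_noise(value: float, spread: float, rng: Optional[random.Random] = None) -> float:
    """Add uniform random noise in the interval [-spread, +spread]."""
    rng = rng or random.SystemRandom()
    return value + rng.uniform(-spread, spread)


salary = 52340
rounded = blur_round(salary)            # nearest thousand
noisy = blur_noise(salary, spread=500)  # within 500 of the true value
```

Aggregate statistics over many blurred values stay roughly usable, while any single value no longer matches the original exactly.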


GDPR defines pseudonymization as the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information.


Tokenization is a pseudonymization technique that substitutes a sensitive data element with a non-sensitive equivalent, referred to as a token, that has no extrinsic or exploitable meaning or value.
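
A minimal in-memory sketch of tokenization follows. The `TokenVault` class is a hypothetical stand-in for a real token vault, which would normally be a separately secured service. The token is a random hex string, so it carries no information about the original value, and the mapping is the "additional information" needed to re-identify.

```python
import secrets


class TokenVault:
    """Illustrative in-memory mapping between sensitive values and random tokens."""

    def __init__(self) -> None:
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        """Return the token for `value`, creating a new random one if needed."""
        if value not in self._token_by_value:
            token = secrets.token_hex(8)
            # Regenerate in the unlikely case of a collision.
            while token in self._value_by_token:
                token = secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token: str) -> str:
        """Look up the original value; only the vault holder can do this."""
        return self._value_by_token[token]


vault = TokenVault()
token = vault.tokenize("4111111111111111")
```

The same input always maps to the same token, so joins across tables still work on tokenized data.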


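Hashing is a pseudonymization technique that returns a fixed-size output from an input of any size and is not designed to be reversed. When the set of possible inputs is small, however, an attacker can hash every candidate and compare the results, so a secret key or salt is usually added.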
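
The sketch below uses a keyed hash (HMAC-SHA256 from the Python standard library) rather than a bare hash, on the assumption that identifiers such as e-mail addresses are guessable. The function name `pseudonymize` and the sample address are illustrative.

```python
import hashlib
import hmac
import secrets


def pseudonymize(value: str, key: bytes) -> str:
    """Return a fixed-size hex pseudonym for `value` using HMAC-SHA256."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# The key must be stored separately from the pseudonymized dataset.
key = secrets.token_bytes(32)
pseudonym = pseudonymize("alice@example.com", key)
```

The output is always 64 hex characters whatever the input length, and the same value with the same key always yields the same pseudonym.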


Encryption with a secret key is a pseudonymization technique. In this case, the holder of the key can trivially re-identify each data subject by decrypting the dataset, because the personal data are still contained in the dataset, albeit in an encrypted form.

Key deletion or crypto-shredding

Key deletion, or crypto-shredding, is a pseudonymization technique that may be equated to selecting a random number as a pseudonym for each attribute in the database and then deleting the correspondence table.
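
The analogy above can be sketched directly: assign a random pseudonym to each distinct value, then throw the correspondence table away. The function `pseudonymize_and_shred` and the sample records are hypothetical. Clearing a Python dictionary only illustrates the idea, and a real system would also have to erase the keys or table from storage and backups.

```python
import secrets


def pseudonymize_and_shred(records: list, field: str) -> list:
    """Replace `field` with random pseudonyms, then discard the mapping."""
    table = {}  # correspondence table: original value -> pseudonym
    output = []
    for record in records:
        value = record[field]
        if value not in table:
            table[value] = secrets.token_hex(16)
        output.append({**record, field: table[value]})
    # "Shredding": once the table is gone, the pseudonyms can no longer
    # be linked back to the original values.
    table.clear()
    return output


# Sample records for illustration only.
records = [
    {"user": "alice", "purchase": "book"},
    {"user": "bob", "purchase": "pen"},
    {"user": "alice", "purchase": "lamp"},
]
shredded = pseudonymize_and_shred(records, "user")
```

Records belonging to the same person still share a pseudonym, so the data stays linkable internally.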




