Pseudonymization with keyed-hash function in Python and AWS
A salted-hash function is not enough to secure personally identifiable information (PII). I strongly recommend a keyed-hash function as a pseudonymization technique to support GDPR compliance.

Read more about pseudonymization hashing technique…
Disclaimer: I do not represent my current/previous employers on my personal Medium blog.
The guide below covers the main steps for GDPR compliance in the data science area. Do not use this guide in operations or security areas. Reusing a salt or key is a common mistake.
Step 1: AWS KMS
Navigate to https://console.aws.amazon.com/kms/ and click the Create a key button.

Enter an Alias (e.g. medium) and a Description (e.g. Key for AWS Secrets Manager). Click the Next button.

Provide tags like Team, Owner, and Impact. Click the Next button.

Click the Next button twice. Review the policy and click the Finish button.
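The console steps above can also be scripted. The following is a minimal sketch, assuming a boto3 KMS client with permission to create keys. The helper name create_kms_key and the plain-dict tags argument are hypothetical, chosen for illustration only.

```python
def create_kms_key(kms, alias, description, tags):
    """Create a KMS key with default settings and attach an alias to it.

    kms is assumed to be a boto3 KMS client, e.g. boto3.client('kms').
    tags is a plain dict such as {'Team': 'data-science'} (hypothetical shape).
    """
    response = kms.create_key(
        Description=description,
        # KMS expects tags as a list of TagKey/TagValue pairs
        Tags=[{'TagKey': k, 'TagValue': v} for k, v in tags.items()],
    )
    key_id = response['KeyMetadata']['KeyId']
    # Alias names must start with the 'alias/' prefix
    kms.create_alias(AliasName='alias/' + alias, TargetKeyId=key_id)
    return key_id
```

With real credentials, a call might look like create_kms_key(boto3.client('kms'), 'medium', 'Key for AWS Secrets Manager', {'Team': 'data-science'}).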

Step 2: AWS Secrets Manager
Navigate to https://console.aws.amazon.com/secretsmanager/ and click the Store a new secret button.

Select the Other type of secrets secret type. Enter a variable name for the hash key (e.g. hash_key). Enter a value for the hash key (e.g. passwd). Select the KMS key from Step 1 (e.g. medium). Click the Next button.

Enter a name (e.g. Medium) and a description (e.g. Secret key for keyed-hash function). Provide tags like Team, Owner, and Impact. Click the Next button.

Select the Disable automatic rotation option and click the Next button.

Review and click the Store button.
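The same secret can be created from Python. This is a rough sketch, assuming a boto3 Secrets Manager client. The helper name store_hash_key is hypothetical, and generating the key with secrets.token_hex is my assumption rather than part of the console steps above, which use passwd as an example value.

```python
import json
import secrets


def store_hash_key(secretsmanager, name, kms_key_id, description, tags):
    """Store a freshly generated hash key as a JSON secret and return its ARN.

    secretsmanager is assumed to be boto3.client('secretsmanager').
    kms_key_id may be a key ID, a key ARN, or an alias such as 'alias/medium'.
    """
    # 32 random bytes rendered as 64 hex characters (assumption: a random
    # key instead of a hand-typed value such as 'passwd')
    hash_key = secrets.token_hex(32)
    response = secretsmanager.create_secret(
        Name=name,
        Description=description,
        KmsKeyId=kms_key_id,
        # Same JSON layout as the console form: {"hash_key": "<value>"}
        SecretString=json.dumps({'hash_key': hash_key}),
        # Secrets Manager expects tags as a list of Key/Value pairs
        Tags=[{'Key': k, 'Value': v} for k, v in tags.items()],
    )
    return response['ARN']
```

A secret created this way has no automatic rotation unless rotation is configured separately, which matches the option selected in the console.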

Step 3: Python
Pseudonymize the ruslan@korniichuk.com e-mail address with a keyed-hash function in Python.
Import the hashlib and json Python standard libraries:
import hashlib
import json
Install the boto3 Python library. Set up AWS credentials and a region for development, or configure the AWS CLI. Import the boto3 Python library:
import boto3
Initialize the email variable with the ruslan@korniichuk.com value:
email = 'ruslan@korniichuk.com'
Get your secret (e.g. Medium), created in Step 2:
secretsmanager = boto3.client('secretsmanager')
response = secretsmanager.get_secret_value(SecretId='Medium')
secret_string = response['SecretString']
hash_key = json.loads(secret_string)['hash_key']
Use the sha3_512() function from the standard hashlib Python library. Append the secret key to the e-mail address. Hash the e-mail address with the keyed-hash function:
sha3 = hashlib.sha3_512()
data = email + hash_key
sha3.update(data.encode('utf-8'))
digest = sha3.hexdigest()
print(digest)
'fab8b7051dfe55b84c702e24611b2bd7e4564f217eb43deb8292d1afc1548766b2000b2e67b9fac54bcb0598d410c34f3b0adb5deed122798d8bf8697eda4056'
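The code above builds the keyed hash by appending the key to the message. Another widely used construction is HMAC, available in the standard hmac module. The sketch below is an alternative, not the method used in this guide. The function name pseudonymize_hmac is hypothetical, and the digest it produces differs from the one printed above.

```python
import hashlib
import hmac


def pseudonymize_hmac(value, hash_key):
    """Return the hex HMAC-SHA3-512 digest of value under hash_key."""
    mac = hmac.new(
        hash_key.encode('utf-8'),   # secret key
        value.encode('utf-8'),      # message to pseudonymize
        hashlib.sha3_512,           # same hash function as in the guide
    )
    return mac.hexdigest()
```

The same value and key always give the same 128-character digest, so records can still be joined on the pseudonym, while a different key gives unrelated digests.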
See all parts of the code in one file below:
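A minimal sketch of how the steps above might be assembled into one file. The function names get_hash_key and pseudonymize, the optional client argument, and the lazy boto3 import are my own assumptions for illustration. The hashing itself follows Step 3 (e-mail address plus key, hashed with SHA3-512).

```python
import hashlib
import json


def get_hash_key(secret_id='Medium', client=None):
    """Read the hash key from the secret created in Step 2."""
    if client is None:
        # Imported lazily so the hashing part can be used without AWS access
        import boto3
        client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response['SecretString'])['hash_key']


def pseudonymize(value, hash_key):
    """Append the key to the value and hash the result with SHA3-512."""
    sha3 = hashlib.sha3_512()
    data = value + hash_key
    sha3.update(data.encode('utf-8'))
    return sha3.hexdigest()
```

With AWS credentials configured, usage might look like digest = pseudonymize('ruslan@korniichuk.com', get_hash_key()).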