Pseudonymization with keyed-hash function in Python and AWS

Ruslan Korniichuk
3 min read · Apr 18, 2019

A salted-hash function is not enough to secure personally identifiable information (PII)?! I strongly recommend a keyed-hash function as a pseudonymization technique to be GDPR compliant.

Read more about pseudonymization hashing technique…
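
To see why a salt alone falls short, consider that a salt is normally stored next to the data, so anyone holding the dataset can hash candidate e-mail addresses and compare the results. The sketch below illustrates this with made-up values. The function names, the salt, the key, and the candidate list are all assumptions for illustration. Note that the two hash functions compute the same thing; the only difference is whether the extra value is secret.

```python
import hashlib


def salted_hash(value: str, salt: str) -> str:
    """Hash value with a salt that is stored alongside the data."""
    return hashlib.sha3_512((value + salt).encode('utf-8')).hexdigest()


def keyed_hash(value: str, key: str) -> str:
    """Hash value with a secret key that is kept outside the dataset."""
    return hashlib.sha3_512((value + key).encode('utf-8')).hexdigest()


def reidentify(digest, candidates, hash_guess):
    """Return the candidate whose hash equals digest, or None."""
    for candidate in candidates:
        if hash_guess(candidate) == digest:
            return candidate
    return None


# Illustrative values only (assumptions, not taken from a real system).
salt = 'public-salt'           # readable by anyone who can read the dataset
key = 'secret-key-from-vault'  # known only to the key holder
candidates = ['alice@example.com', 'ruslan@korniichuk.com']

stored_salted = salted_hash('ruslan@korniichuk.com', salt)
stored_keyed = keyed_hash('ruslan@korniichuk.com', key)


def attacker_guess(candidate):
    """The attacker knows the salt but not the key."""
    return salted_hash(candidate, salt)


# With the salt in hand, guessing candidate e-mails recovers the identity.
found_salted = reidentify(stored_salted, candidates, attacker_guess)
# Without the key, the same guessing attack finds no match.
found_keyed = reidentify(stored_keyed, candidates, attacker_guess)

print(found_salted)
print(found_keyed)
```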

Disclaimer: I do not represent my current/previous employers on my personal Medium blog.

The guide below covers the main steps for GDPR compliance in the data science area. Do not use this guide in the operations or security areas. Salt/key reuse is a common mistake.

Step 1: AWS KMS

Navigate to https://console.aws.amazon.com/kms/ and click the Create a key button.

Enter an Alias (e.g. medium) and a Description (e.g. Key for AWS Secrets Manager). Click the Next button.

Provide tags like Team, Owner, and Impact. Click the Next button.

Click the Next button on each of the next two screens. Review the policy and click the Finish button.
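
The console steps above can also be scripted. The sketch below assumes boto3's KMS client and its create_key and create_alias calls. The helper name create_kms_key and the tag values are hypothetical, and the key gets the default key policy rather than one reviewed in the console.

```python
def create_kms_key(kms_client, alias='medium',
                   description='Key for AWS Secrets Manager', tags=None):
    """Create a KMS key, attach an alias to it, and return the key ID.

    kms_client is expected to be boto3.client('kms').
    """
    kwargs = {'Description': description}
    if tags:
        # KMS tags use the TagKey/TagValue field names.
        kwargs['Tags'] = [{'TagKey': k, 'TagValue': v}
                          for k, v in tags.items()]
    response = kms_client.create_key(**kwargs)
    key_id = response['KeyMetadata']['KeyId']
    # Alias names must start with the 'alias/' prefix.
    kms_client.create_alias(AliasName='alias/' + alias, TargetKeyId=key_id)
    return key_id


# Usage (needs AWS credentials, so it is left commented out):
# import boto3
# key_id = create_kms_key(boto3.client('kms'),
#                         tags={'Team': 'data', 'Owner': 'me', 'Impact': 'low'})
```

Passing the client in as an argument keeps the helper easy to try out against a stub before it touches a real AWS account.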

Step 2: AWS Secrets Manager

Navigate to https://console.aws.amazon.com/secretsmanager/ and click the Store a new secret button.

Select the Other type of secrets secret type. Enter a variable name for the hash key (e.g. hash_key) and a value for it (e.g. passwd). Select the KMS key from Step 1 (e.g. medium). Click the Next button.

Enter a name (e.g. Medium) and a description (e.g. Secret key for keyed-hash function). Provide tags like Team, Owner, and Impact. Click the Next button.

Select the Disable automatic rotation option and click the Next button.

Review and click the Store button.
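
Step 2 can be scripted in a similar way. The sketch below assumes boto3's Secrets Manager client and its create_secret call; the helper name store_hash_key is hypothetical. It also generates a random key with the standard secrets module instead of a short placeholder such as passwd, which is a suggestion rather than something the steps above require.

```python
import json
import secrets


def store_hash_key(sm_client, name='Medium', kms_key_id='alias/medium',
                   hash_key=None):
    """Store a hash key as a JSON secret encrypted with the given KMS key.

    sm_client is expected to be boto3.client('secretsmanager').
    """
    if hash_key is None:
        # 32 random bytes, hex-encoded to a 64-character string.
        hash_key = secrets.token_hex(32)
    return sm_client.create_secret(
        Name=name,
        Description='Secret key for keyed-hash function',
        KmsKeyId=kms_key_id,
        # JSON object with a single 'hash_key' field.
        SecretString=json.dumps({'hash_key': hash_key}),
    )


# Usage (needs AWS credentials, so it is left commented out):
# import boto3
# store_hash_key(boto3.client('secretsmanager'))
```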

Step 3: Python

Pseudonymize the ruslan@korniichuk.com e-mail address with a keyed-hash function in Python.

Import the hashlib and json modules from the Python standard library:

import hashlib
import json

Install the boto3 library (e.g. pip install boto3). Set up AWS credentials and a region for development, or configure the AWS CLI. Import boto3:

import boto3

Initialize the email variable with the ruslan@korniichuk.com value:

email = 'ruslan@korniichuk.com'

Get your secret (e.g. Medium), created in Step 2:

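# Create a Secrets Manager client
secretsmanager = boto3.client('secretsmanager')
# Fetch the secret by its name
response = secretsmanager.get_secret_value(SecretId='Medium')
# The secret is a JSON string; parse it and extract the hash key
secret_string = response['SecretString']
hash_key = json.loads(secret_string)['hash_key']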

Use the sha3_512() function from the standard hashlib library. Append the secret key to the e-mail address and hash the result:

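# Create a SHA3-512 hash object
sha3 = hashlib.sha3_512()
# Append the secret key to the e-mail address
data = email + hash_key
# Hash the UTF-8 encoded bytes and print the hex digest
sha3.update(data.encode('utf-8'))
digest = sha3.hexdigest()
print(digest)
fab8b7051dfe55b84c702e24611b2bd7e4564f217eb43deb8292d1afc1548766b2000b2e67b9fac54bcb0598d410c34f3b0adb5deed122798d8bf8697eda4056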
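
Appending the key to the input is one way to build a keyed hash. Python's standard hmac module provides HMAC, a standardized keyed-hash construction, which can be combined with SHA3-512. The sketch below is an alternative, not the method used above, and it produces a different digest for the same input. The function name pseudonymize_hmac is an assumption, and passwd is only the placeholder key from Step 2.

```python
import hashlib
import hmac


def pseudonymize_hmac(value: str, hash_key: str) -> str:
    """Return the HMAC-SHA3-512 hex digest of value under hash_key."""
    mac = hmac.new(hash_key.encode('utf-8'),
                   value.encode('utf-8'),
                   hashlib.sha3_512)
    return mac.hexdigest()


hmac_digest = pseudonymize_hmac('ruslan@korniichuk.com', 'passwd')
print(len(hmac_digest))  # 128 hex characters, i.e. 512 bits
```

Whichever construction is chosen, it has to stay the same for the lifetime of a dataset, since switching it changes every pseudonym.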

See all parts of code in one file below:
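
A minimal sketch that gathers the Step 3 snippets into one file might look like this. The function names get_hash_key, pseudonymize, and main are assumptions added for illustration, and boto3 is imported inside get_hash_key so that the hashing part can be used without AWS access.

```python
import hashlib
import json


def get_hash_key(secret_id='Medium', field='hash_key'):
    """Fetch the hash key from AWS Secrets Manager (needs AWS credentials)."""
    import boto3  # imported lazily so pseudonymize() works without boto3

    secretsmanager = boto3.client('secretsmanager')
    response = secretsmanager.get_secret_value(SecretId=secret_id)
    return json.loads(response['SecretString'])[field]


def pseudonymize(value, hash_key):
    """Return the SHA3-512 hex digest of value with hash_key appended."""
    sha3 = hashlib.sha3_512()
    sha3.update((value + hash_key).encode('utf-8'))
    return sha3.hexdigest()


def main():
    """Pseudonymize one e-mail address with the key stored in Step 2."""
    email = 'ruslan@korniichuk.com'
    hash_key = get_hash_key()
    print(pseudonymize(email, hash_key))


# Call main() once AWS credentials and the secret from Step 2 are in place.
```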
