Pseudonymization with keyed-hash function in Python and AWS

Ruslan Korniichuk
3 min read · Apr 18, 2019

Is a salted hash function enough to secure personally identifiable information (PII)? It is not. I strongly recommend a keyed-hash function as the pseudonymization technique when you need to be GDPR compliant.

Read more about pseudonymization hashing technique…
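To make the difference concrete, here is a minimal sketch under stated assumptions: the salt is stored next to the data (as salts usually are), so whoever obtains the data can hash a list of candidate addresses and compare, while the key is kept elsewhere. The helper name hash_with, the salt, the key, and the candidate list are all made up for illustration.

```python
import hashlib


def hash_with(value, suffix):
    """SHA3-512 of value + suffix (the suffix is a salt or a secret key)."""
    return hashlib.sha3_512((value + suffix).encode('utf-8')).hexdigest()


# Addresses the attacker wants to test (hypothetical list)
candidates = ['alice@example.com', 'ruslan@korniichuk.com']

# Salted hash: assume the attacker reads the salt together with the data
salt = 'salt-stored-with-data'
stored_salted = hash_with('ruslan@korniichuk.com', salt)
recovered_salted = [c for c in candidates
                    if hash_with(c, salt) == stored_salted]
print(recovered_salted)  # ['ruslan@korniichuk.com']

# Keyed hash: the key lives elsewhere, so the attacker can only guess it
secret_key = 'key-kept-in-a-secrets-store'
stored_keyed = hash_with('ruslan@korniichuk.com', secret_key)
wrong_guess = 'attacker-guess'
recovered_keyed = [c for c in candidates
                   if hash_with(c, wrong_guess) == stored_keyed]
print(recovered_keyed)  # []
```

With the salt in hand, the dictionary attack recovers the address; without the key, no candidate matches.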


Disclaimer: I do not represent my current/previous employers on my personal Medium blog.

The guide below covers the main steps for GDPR compliance in the data science area. Do not use this guide in the operations or security area. Reusing a salt or key is a common mistake.

Step 1: AWS KMS

Navigate to https://console.aws.amazon.com/kms/ and click the Create a key button.

Enter an Alias (e.g. medium) and a Description (e.g. Key for AWS Secrets Manager). Click the Next button.

Provide tags like Team, Owner, and Impact. Click the Next button.

Click the Next button twice more to move through the following two screens. Review the policy and click the Finish button.
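If you prefer a script to the console, the same key could be created with boto3. This is a sketch, not an exact equivalent of the wizard: create_key_with_alias is a hypothetical helper name, the tag values are placeholders, and because no policy is passed, KMS applies its default key policy, which may differ from what you would pick on the console screens.

```python
def create_key_with_alias(kms, alias, description, tags=None):
    """Create a KMS key, attach an alias to it, and return the key ID.

    kms is a boto3 KMS client, e.g. boto3.client('kms').
    """
    # KMS expects tags as a list of TagKey/TagValue pairs
    tag_list = [{'TagKey': key, 'TagValue': value}
                for key, value in (tags or {}).items()]
    response = kms.create_key(Description=description, Tags=tag_list)
    key_id = response['KeyMetadata']['KeyId']
    # Alias names must start with the 'alias/' prefix
    kms.create_alias(AliasName='alias/' + alias, TargetKeyId=key_id)
    return key_id


# Usage (needs AWS credentials and KMS permissions):
# import boto3
# key_id = create_key_with_alias(
#     boto3.client('kms'), 'medium', 'Key for AWS Secrets Manager',
#     tags={'Team': 'data-science', 'Owner': 'ruslan', 'Impact': 'low'})
```

Passing the client in as an argument keeps the helper easy to try against a stub before pointing it at a real account.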

Step 2: AWS Secrets Manager

Navigate to https://console.aws.amazon.com/secretsmanager/ and click the Store a new secret button.

Select the Other type of secrets secret type. Enter a variable name for the hash key (e.g. hash_key) and a value for it (e.g. passwd). In practice, use a long random value rather than a short word like passwd. Select the KMS key from Step 1 (e.g. medium). Click the Next button.

Enter a name (e.g. Medium) and a description (e.g. Secret key for keyed-hash function). Provide tags like Team, Owner, and Impact. Click the Next button.

Select the Disable automatic rotation option and click the Next button.

Review and click the Store button.
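The value passwd is only a placeholder. As a rough sketch, a random key could be generated with the standard secrets module and stored with boto3 instead of the console. The helper names generate_hash_key and store_hash_key are hypothetical, the 32-byte length is my assumption rather than a requirement, and the sketch leaves rotation unconfigured.

```python
import json
import secrets


def generate_hash_key(nbytes=32):
    """Return a hex string built from nbytes of OS-level randomness."""
    return secrets.token_hex(nbytes)


def store_hash_key(secretsmanager, name, hash_key, kms_alias):
    """Store the key as a JSON secret with a single hash_key field.

    secretsmanager is a boto3 client, e.g. boto3.client('secretsmanager').
    """
    secret_string = json.dumps({'hash_key': hash_key})
    return secretsmanager.create_secret(
        Name=name,
        Description='Secret key for keyed-hash function',
        KmsKeyId='alias/' + kms_alias,  # KMS key from Step 1
        SecretString=secret_string,
    )


# Usage (needs AWS credentials and Secrets Manager permissions):
# import boto3
# store_hash_key(boto3.client('secretsmanager'), 'Medium',
#                generate_hash_key(), 'medium')
```

The JSON layout matches what the console produces for a key/value secret, so the retrieval code in Step 3 can read the hash_key field either way.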

Step 3: Python

Pseudonymize the ruslan@korniichuk.com e-mail address with a keyed-hash function in Python.

Import the hashlib and json Python standard libraries:

import hashlib
import json

Install the boto3 Python library. Set up AWS credentials and a region for development, or configure the AWS CLI. Import the boto3 Python library:

import boto3

Initialize the email variable with the ruslan@korniichuk.com value:

email = 'ruslan@korniichuk.com'

Get the secret created in Step 2 (e.g. Medium) and read the hash key from it:

secretsmanager = boto3.client('secretsmanager')
response = secretsmanager.get_secret_value(SecretId='Medium')
secret_string = response['SecretString']
hash_key = json.loads(secret_string)['hash_key']

Use the sha3_512() function from the standard hashlib Python library. Append the secret key to the e-mail address, then hash the combined string:

sha3 = hashlib.sha3_512()
data = email + hash_key
sha3.update(data.encode('utf-8'))
digest = sha3.hexdigest()
print(digest)
fab8b7051dfe55b84c702e24611b2bd7e4564f217eb43deb8292d1afc1548766b2000b2e67b9fac54bcb0598d410c34f3b0adb5deed122798d8bf8697eda4056
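Concatenating the key and the value is one way to key a hash. Python's standard hmac module provides HMAC, a standardized keyed-hash construction, and the sketch below shows how it could stand in for the manual concatenation. This is an alternative, not part of the steps above: pseudonymize_hmac is a hypothetical helper name, and its digests differ from the concatenation digests, so the two should not be mixed in one dataset.

```python
import hashlib
import hmac


def pseudonymize_hmac(value, hash_key):
    """Return the hex HMAC-SHA3-512 of value under hash_key."""
    mac = hmac.new(hash_key.encode('utf-8'),
                   value.encode('utf-8'),
                   hashlib.sha3_512)
    return mac.hexdigest()


digest = pseudonymize_hmac('ruslan@korniichuk.com', 'passwd')
print(len(digest))  # 128 hex characters, as with plain SHA3-512
```

The same address with the same key always maps to the same pseudonym, which keeps joins across tables possible; a different key gives an unrelated pseudonym.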

See all parts of code in one file below:
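Put together, the snippets from Step 3 could be arranged as the following sketch. The function names get_hash_key, pseudonymize, and main are my own for illustration, and boto3 is imported inside get_hash_key so that the hashing helper can be tried without AWS access.

```python
import hashlib
import json


def get_hash_key(secret_id='Medium', field='hash_key'):
    """Fetch the hash key from the AWS Secrets Manager secret of Step 2."""
    import boto3  # imported here so pseudonymize() works without boto3

    secretsmanager = boto3.client('secretsmanager')
    response = secretsmanager.get_secret_value(SecretId=secret_id)
    secret_string = response['SecretString']
    return json.loads(secret_string)[field]


def pseudonymize(value, hash_key):
    """Hash value + hash_key with SHA3-512, as in the snippets above."""
    sha3 = hashlib.sha3_512()
    data = value + hash_key
    sha3.update(data.encode('utf-8'))
    return sha3.hexdigest()


def main():
    """Pseudonymize the example address; needs AWS credentials."""
    email = 'ruslan@korniichuk.com'
    hash_key = get_hash_key()
    print(pseudonymize(email, hash_key))
```

Calling main() with valid AWS credentials and the secret from Step 2 in place prints the digest for the example address.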

Written by Ruslan Korniichuk

Python Developer and Artificial Intelligence Engineer
