Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
Story
As a data scientist, I want to perform anonymization operations on my data, so that I can prepare it for input to predictive analytics algorithms. I also want to be able to de-anonymize my data.
This feature is especially relevant given the recent GDPR regulation:
https://eugdpr.org/
Proposed functionality:
- Create a conversion table for anonymization.
- Create an anonymized version of a table.
- Create a de-anonymized version of a table.
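To make the three proposed operations concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the proposed implementation: the function names (build_conversion_table, anonymize, deanonymize), the in-memory list-of-dicts table representation, and the use of random tokens as masked values are all hypothetical choices that do not come from this issue.

```python
import secrets


def build_conversion_table(rows, columns):
    """Map each distinct value in each listed column to a random token.

    Hypothetical helper: returns {column: {original_value: token}}.
    """
    table = {col: {} for col in columns}
    for row in rows:
        for col in columns:
            value = row[col]
            if value not in table[col]:
                # 64-bit random token; collisions are unlikely but not
                # ruled out in this sketch.
                table[col][value] = secrets.token_hex(8)
    return table


def anonymize(rows, table):
    """Return a copy of rows with the mapped columns replaced by tokens."""
    return [
        {col: (table[col][val] if col in table else val)
         for col, val in row.items()}
        for row in rows
    ]


def deanonymize(rows, table):
    """Invert anonymize() using the same conversion table."""
    reverse = {
        col: {token: value for value, token in mapping.items()}
        for col, mapping in table.items()
    }
    return [
        {col: (reverse[col][val] if col in reverse else val)
         for col, val in row.items()}
        for row in rows
    ]


# Example data (made up for illustration).
patients = [
    {"name": "alice", "city": "Paris", "age": 34},
    {"name": "bob", "city": "Berlin", "age": 51},
    {"name": "alice", "city": "Paris", "age": 35},
]
table = build_conversion_table(patients, ["name", "city"])
masked = anonymize(patients, table)
restored = deanonymize(masked, table)
```

In this sketch the conversion table is the only way back to the original values, so it would have to be stored with stricter access controls than the anonymized table.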
Requirements:
- Multiple columns in a table can be anonymized.
- Datasets still join correctly, even on masked columns.
- Aggregates on masked columns match those on the original columns.
- A salt can be added to the hash function for better security.
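The requirements above can be illustrated with a salted, deterministic hash. The following Python sketch rests on assumptions that are not part of this issue: SHA-256 as the hash function, a salt simply prepended to the value, and the hypothetical helpers mask, mask_columns, and join operating on in-memory rows. Because the same salt and value always give the same masked value, equality joins and group-by aggregates behave as they do on the original data.

```python
import hashlib
from collections import Counter


def mask(value, salt):
    """Salted SHA-256 of a value; deterministic for a fixed salt."""
    data = salt + str(value).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def mask_columns(rows, columns, salt):
    """Mask one or more columns of a table given as a list of dicts."""
    return [
        {col: (mask(val, salt) if col in columns else val)
         for col, val in row.items()}
        for row in rows
    ]


def join(left, right, key):
    """Simple inner equi-join on a single key column."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in index.get(lrow[key], [])
    ]


# Example data (made up for illustration).
orders = [
    {"customer_id": 1, "amount": 10},
    {"customer_id": 2, "amount": 5},
    {"customer_id": 1, "amount": 7},
    {"customer_id": 3, "amount": 2},
]
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
]

salt = b"example-secret-salt"  # assumption: kept secret, shared by all tables
masked_orders = mask_columns(orders, ["customer_id"], salt)
masked_customers = mask_columns(customers, ["customer_id"], salt)

# Joins on the masked column pair up the same rows as on the original.
original_join = join(orders, customers, "customer_id")
masked_join = join(masked_orders, masked_customers, "customer_id")

# Group sizes per customer are unchanged; only the group labels differ.
original_counts = sorted(Counter(r["customer_id"] for r in orders).values())
masked_counts = sorted(Counter(r["customer_id"] for r in masked_orders).values())
```

Note that joins only survive if every table is masked with the same salt. A per-table or per-row salt would be stronger against dictionary attacks but would break the join requirement, so in this sketch the salt acts as a shared secret.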
References
[1] PDL tools
http://pivotalsoftware.github.io/PDLTools/group__grp__anonymization.html
[2] General information on anonymization
https://en.wikipedia.org/wiki/Data_anonymization
[3] Blog on hashing
https://crackstation.net/hashing-security.htm