Details
- Improvement
- Status: Resolved
- Major
- Resolution: Won't Fix
- 1.7.1
- None
Description
There has been discussion on the mailing lists regarding the use of the existing HashAttribute (HA) processor and the introduction of CryptographicHashAttribute (CHA). The behavior of these processors does not currently overlap, but CHA can be made to cover HA's documented use cases (generating a unique identifier over a set of specific attributes and values), if not its exact behavior.
From the discussion:
Given your well-described use cases for HA, I think I may be able to provide that in CHA as well. I would expect to add a dropdown PD for “attribute enumeration style” and offer “individual” (each hash is generated on a single attribute), “list” (each hash is generated over an ordered, delimited list of literal matches), and “regex” (each hash is generated over an ordered list of all attribute names matching the provided regex). Then the dynamic properties would describe the output, as happens in the existing PR. Maybe a custom delimiter property is needed too, but for now ‘’ could be used to join the values. I’ll write up a Jira for this, and hopefully you can both let me know if this meets your requirements.
Example:
Incoming Flowfile
attributes: [username: “alopresto”, role: “security”, email: “alopresto@apache.org”, git_account: “alopresto”]
CHA Properties (Individual)
attribute_enumeration_style: “individual”
(dynamic) username_sha256: “username”
(dynamic) git_account_sha256: “git_account”
Behavior (Individual)
username_sha256 = git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) = 600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23
Resulting Flowfile (Individual)
attributes: [username: “alopresto”, role: “security”, email: “alopresto@apache.org”, git_account: “alopresto”, username_sha256: “600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”, git_account_sha256: “600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]
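The individual style above can be sketched in Python. This is an illustration of the proposed behavior only, not the processor's implementation: `hash_individual` is a hypothetical name, and both the UTF-8 encoding of attribute values and the skipping of missing attributes are assumptions.

```python
import hashlib


def hash_individual(attributes, dynamic_properties):
    """Return a copy of `attributes` with one SHA-256 hash per dynamic property.

    `dynamic_properties` maps new_attribute_name -> existing_attribute_name,
    following the ordering proposed in this ticket.
    """
    result = dict(attributes)
    for new_name, source_name in dynamic_properties.items():
        if source_name not in attributes:
            continue  # assumption: a missing source attribute is skipped
        value = attributes[source_name]
        # Assumption: values are hashed as UTF-8 bytes.
        result[new_name] = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return result


attributes = {
    "username": "alopresto",
    "role": "security",
    "email": "alopresto@apache.org",
    "git_account": "alopresto",
}
out = hash_individual(
    attributes,
    {"username_sha256": "username", "git_account_sha256": "git_account"},
)
print(out["username_sha256"])
print(out["git_account_sha256"])
```

Because `username` and `git_account` hold the same value, both new attributes carry the same digest, as in the example above.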
CHA Properties (List)
attribute_enumeration_style: “list”
(dynamic) username_and_email_sha256: “username, email”
(dynamic) git_account_sha256: “git_account”
Behavior (List)
username_and_email_sha256 = $(echo -n "aloprestoalopresto@apache.org" | shasum -a 256) = 22a11b7b3173f95c23a1f434949ec2a2e66455b9cb26b7ebc90afca25d91333f
git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) = 600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23
Resulting Flowfile (List)
attributes: [username: “alopresto”, role: “security”, email: “alopresto@apache.org”, git_account: “alopresto”, username_and_email_sha256: “22a11b7b3173f95c23a1f434949ec2a2e66455b9cb26b7ebc90afca25d91333f”, git_account_sha256: “600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]
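A minimal Python sketch of the list style follows, under stated assumptions: `hash_list` and its `delimiter` parameter are hypothetical names, the property value is split on commas with surrounding whitespace trimmed, and values are encoded as UTF-8. None of this is confirmed processor behavior.

```python
import hashlib


def hash_list(attributes, dynamic_properties, delimiter=""):
    """Hash the joined values of an ordered, comma-separated list of attribute names.

    `dynamic_properties` maps new_attribute_name -> "name_1, name_2, ...".
    """
    result = dict(attributes)
    for new_name, name_list in dynamic_properties.items():
        # Assumption: names are comma-separated and whitespace is not significant.
        names = [name.strip() for name in name_list.split(",")]
        # The order of the list is preserved; missing attributes are skipped (assumption).
        joined = delimiter.join(attributes[n] for n in names if n in attributes)
        result[new_name] = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return result


attributes = {
    "username": "alopresto",
    "role": "security",
    "email": "alopresto@apache.org",
    "git_account": "alopresto",
}
out = hash_list(
    attributes,
    {
        "username_and_email_sha256": "username, email",
        "git_account_sha256": "git_account",
    },
)
print(out["username_and_email_sha256"])
```

One consequence of joining with an empty delimiter is that different value splits produce the same input: the values ("ab", "c") and ("a", "bc") both hash "abc". This may be a reason to consider the custom delimiter property mentioned in the discussion.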
CHA Properties (Regex)
attribute_enumeration_style: “regex”
(dynamic) all_sha256: “.*”
(dynamic) git_account_sha256: “git_account”
Behavior (Regex)
all_sha256 = sort(attributes_that_match_regex) = [email, git_account, role, username] = $(echo -n "alopresto@apache.orgaloprestosecurityalopresto" | shasum -a 256) = b370fdf0132933cea76e3daa3d4a437bb8c571dd0cd0e79ee5d7759cf64efced
git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) = 600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23
Resulting Flowfile (Regex)
attributes: [username: “alopresto”, role: “security”, email: “alopresto@apache.org”, git_account: “alopresto”, all_sha256: “b370fdf0132933cea76e3daa3d4a437bb8c571dd0cd0e79ee5d7759cf64efced”, git_account_sha256: “600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]
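The regex style can be sketched the same way. In this hypothetical `hash_regex` helper, the pattern must match the whole attribute name (an assumption), matching names are sorted before their values are joined, and only the incoming attributes are matched, so hashes generated in the same pass never feed into another hash.

```python
import hashlib
import re


def hash_regex(attributes, dynamic_properties, delimiter=""):
    """Hash the joined values of all attributes whose names match a regex.

    `dynamic_properties` maps new_attribute_name -> regular expression.
    """
    result = dict(attributes)
    for new_name, pattern in dynamic_properties.items():
        regex = re.compile(pattern)
        # Match against the incoming attributes only, then sort the names
        # so the hash does not depend on attribute iteration order.
        names = sorted(n for n in attributes if regex.fullmatch(n))
        joined = delimiter.join(attributes[n] for n in names)
        result[new_name] = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return result


attributes = {
    "username": "alopresto",
    "role": "security",
    "email": "alopresto@apache.org",
    "git_account": "alopresto",
}
out = hash_regex(
    attributes,
    {"all_sha256": ".*", "git_account_sha256": "git_account"},
)
print(out["all_sha256"])
```

Sorting gives the order [email, git_account, role, username] for this flowfile, so the hashed input is the same concatenated string as in the example above.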
This will necessitate switching the "order" of dynamic properties in CryptographicHashAttribute: rather than a dynamic property of the form existing_attribute_name - new_attribute_name_containing_hash, the ordering will be new_attribute_name_containing_hash - existing_attribute_name, so that the property value can hold other expressions such as attribute_.* or attribute_1, attribute_2.
There will also be a boolean flag to include the attribute name in the hashed value.
Example:
existing_attribute_name - "some value"
If true, the value in new_attribute_name_containing_hash would be hash("existing_attribute_namesome value"). If false, it would be hash("some value"). Because no one is using the new CryptographicHashAttribute in the field yet, now is the only time this change can be made.
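The effect of the flag can be illustrated with a short Python sketch. The function name `hash_attribute` and the parameter `include_name` are hypothetical, SHA-256 stands in for whichever algorithm is configured, and the name is prepended directly to the value with no separator, as in the example above.

```python
import hashlib


def hash_attribute(name, value, include_name=False):
    """Hash an attribute value, optionally prefixed with the attribute name."""
    # With the flag set, the hashed material is name + value with no separator.
    material = name + value if include_name else value
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


with_name = hash_attribute("existing_attribute_name", "some value", include_name=True)
without_name = hash_attribute("existing_attribute_name", "some value")
print(with_name)
print(without_name)
```

With the flag enabled, two attributes holding the same value no longer produce the same hash, because their names differ.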
Issue Links
- depends upon
  - NIFI-5147 Improve HashAttribute processor (Resolved)
  - NIFI-5566 Bring HashContent inline with HashService and rename legacy components (Resolved)
- is related to
  - NIFI-11174 Remove Deprecated Processors from nifi-standard-bundle (Resolved)