  1. Apache NiFi
  2. NIFI-5582

Integrate legacy behavior of HashAttribute into CryptographicHashAttribute


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7.1
    • Fix Version/s: None
    • Component/s: Extensions

      Description

      There has been discussion on the mailing lists regarding the use of the existing HashAttribute processor and the introduction of CryptographicHashAttribute. The behavior of these processors does not currently overlap, but CHA can be made to include HA's functionality in the documented use cases (to generate a unique identifier over a set of specific attributes and values), if not its exact behavior.

      From discussion

      Given your well-described use cases for HA, I think I may be able to provide that in CHA as well. I would expect to add a dropdown PD for “attribute enumeration style” and offer three options: “individual” (each hash is generated on a single attribute), “list” (each hash is generated over an ordered, delimited list of literal attribute names), and “regex” (each hash is generated over an ordered list of all attribute names matching the provided regex). The dynamic properties would then describe the output, as happens in the existing PR. A custom delimiter property may be needed too, but for now an empty string (‘’) could be used to join the values. I’ll write up a Jira for this, and hopefully you can both let me know if this meets your requirements.

      Example:

      Incoming Flowfile

      attributes: [username: “alopresto”, role: “security”, email: “alopresto@apache.org”, git_account: “alopresto”]

      CHA Properties (Individual)

      attribute_enumeration_style: “individual”
      (dynamic) username_sha256: “username”
      (dynamic) git_account_sha256: “git_account”

      Behavior (Individual)

      username_sha256 = git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) = 600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23

      Resulting Flowfile (Individual)

      attributes: [username: “alopresto”, role: “security”, email: “alopresto@apache.org”, git_account: “alopresto”, username_sha256: “600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”, git_account_sha256: “600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]
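
The “individual” style above can be sketched in a few lines of Python. This is only an illustration of the proposed behavior, not NiFi code: the function name `hash_individual`, the use of UTF-8 encoding, and the choice to skip a dynamic property whose source attribute is missing are all assumptions made for the example.

```python
import hashlib


def hash_individual(attributes, dynamic_properties):
    """Return a copy of attributes with one SHA-256 hex digest per dynamic property.

    dynamic_properties maps new_attribute_name -> existing_attribute_name.
    """
    result = dict(attributes)
    for output_name, source_name in dynamic_properties.items():
        # Assumption: a missing source attribute is skipped rather than hashed as "".
        if source_name in attributes:
            value = attributes[source_name]
            result[output_name] = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return result


flowfile = {
    "username": "alopresto",
    "role": "security",
    "email": "alopresto@apache.org",
    "git_account": "alopresto",
}
properties = {"username_sha256": "username", "git_account_sha256": "git_account"}
hashed = hash_individual(flowfile, properties)

# Both source attributes hold the same value, so both digests are identical.
print(hashed["username_sha256"] == hashed["git_account_sha256"])
```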

      CHA Properties (List)

      attribute_enumeration_style: “list”
      (dynamic) username_and_email_sha256: “username, email”
      (dynamic) git_account_sha256: “git_account”

      Behavior (List)

      username_and_email_sha256 = $(echo -n "aloprestoalopresto@apache.org" | shasum -a 256) = 22a11b7b3173f95c23a1f434949ec2a2e66455b9cb26b7ebc90afca25d91333f
      git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) = 600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23

      Resulting Flowfile (List)

      attributes: [username: “alopresto”, role: “security”, email: “alopresto@apache.org”, git_account: “alopresto”, username_and_email_sha256: “22a11b7b3173f95c23a1f434949ec2a2e66455b9cb26b7ebc90afca25d91333f”, git_account_sha256: “600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]
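
A minimal Python sketch of the “list” style follows, assuming the dynamic property value is a comma-separated list of attribute names and that the values are joined in the listed order. The function name `hash_list`, the `delimiter` parameter, and the whitespace trimming are hypothetical details chosen for the example.

```python
import hashlib


def hash_list(attributes, dynamic_properties, delimiter=""):
    """Hash the joined values of each comma-separated list of attribute names."""
    result = dict(attributes)
    for output_name, name_list in dynamic_properties.items():
        # Assumption: names are separated by commas and surrounding spaces are ignored.
        names = [name.strip() for name in name_list.split(",")]
        # Values are joined in the order the names were listed, not sorted.
        joined = delimiter.join(attributes[name] for name in names if name in attributes)
        result[output_name] = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return result


flowfile = {
    "username": "alopresto",
    "role": "security",
    "email": "alopresto@apache.org",
    "git_account": "alopresto",
}
properties = {
    "username_and_email_sha256": "username, email",
    "git_account_sha256": "git_account",
}
hashed = hash_list(flowfile, properties)
print(hashed["username_and_email_sha256"])
```

With an empty delimiter, different value sets can produce the same input string (for example "ab" + "c" and "a" + "bc"), which is one reason a custom delimiter property may be worth adding.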

      CHA Properties (Regex)

      attribute_enumeration_style: “regex”
      (dynamic) all_sha256: “.*”
      (dynamic) git_account_sha256: “git_account”

      Behavior (Regex)

      sort(attributes_that_match_regex) = [email, git_account, role, username]
      all_sha256 = $(echo -n "alopresto@apache.orgaloprestosecurityalopresto" | shasum -a 256) = b370fdf0132933cea76e3daa3d4a437bb8c571dd0cd0e79ee5d7759cf64efced
      git_account_sha256 = $(echo -n "alopresto" | shasum -a 256) = 600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23

      Resulting Flowfile (Regex)

      attributes: [username: “alopresto”, role: “security”, email: “alopresto@apache.org”, git_account: “alopresto”, all_sha256: “b370fdf0132933cea76e3daa3d4a437bb8c571dd0cd0e79ee5d7759cf64efced”, git_account_sha256: “600973dc8f2b7bb2a20651ebefe4bf91c5295afef19f4d5b9994d581f5a68a23”]
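
The “regex” style can be sketched the same way. The example below assumes the regex must match the whole attribute name, that matching names are sorted lexicographically before their values are joined, and that a property matching no attributes produces no output. These choices, along with the name `hash_regex`, are assumptions for illustration only.

```python
import hashlib
import re


def hash_regex(attributes, dynamic_properties, delimiter=""):
    """Hash the joined values of all attributes whose names match each regex."""
    result = dict(attributes)
    for output_name, pattern in dynamic_properties.items():
        regex = re.compile(pattern)
        # Match against the incoming attributes only, so hashes added by earlier
        # dynamic properties never feed into later ones.
        names = sorted(name for name in attributes if regex.fullmatch(name))
        if not names:
            continue  # Assumption: no matching attributes means no hash is written.
        joined = delimiter.join(attributes[name] for name in names)
        result[output_name] = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return result


flowfile = {
    "username": "alopresto",
    "role": "security",
    "email": "alopresto@apache.org",
    "git_account": "alopresto",
}
properties = {"all_sha256": ".*", "git_account_sha256": "git_account"}
hashed = hash_regex(flowfile, properties)
print(hashed["all_sha256"])
```

Sorting the matched names makes the hash independent of the order in which the attributes happen to be stored.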

      This will necessitate switching the "order" of dynamic properties in CryptographicHashAttribute: rather than a dynamic property of the form existing_attribute_name - new_attribute_name_containing_hash, the ordering will be new_attribute_name_containing_hash - existing_attribute_name. This allows the property value to hold other expressions, such as attribute_.* or attribute_1, attribute_2.
      There will also be a boolean flag to include the attribute name in the hashed value:

      Example:

      existing_attribute_name - "some value"

      If true, the value in new_attribute_name_containing_hash would be hash("existing_attribute_namesome value"). If false, it would be hash("some value"). Because no one is using the new CryptographicHashAttribute in the field yet, now is the only point at which this change can be made without affecting existing users.
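
The effect of the flag can be sketched as follows. The function name `hash_attribute` and the parameter `include_name` are hypothetical. The example assumes SHA-256 and that the name is prepended to the value with no separator, as in the description above.

```python
import hashlib


def hash_attribute(name, value, include_name=False):
    """Hash the value alone, or the attribute name immediately followed by the value."""
    material = name + value if include_name else value
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


with_name = hash_attribute("existing_attribute_name", "some value", include_name=True)
without_name = hash_attribute("existing_attribute_name", "some value")

# The two digests differ because the hashed input differs.
print(with_name != without_name)
```

Including the name means two attributes with the same value no longer hash to the same digest, which may or may not be desirable depending on the use case.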

      Mailing list discussion

              People

              • Assignee:
                alopresto Andy LoPresto
                Reporter:
                alopresto Andy LoPresto