Apache NiFi / NIFI-5147

Improve HashAttribute processor



    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0, 1.7.0, 1.7.1
    • Fix Version/s: 1.8.0
    • Component/s: Extensions


      The HashAttribute processor currently has surprising behavior. A user who is not already familiar with the processor would expect HashAttribute to generate a hash value over one or more attributes. Instead, the processor as implemented groups incoming flowfiles based on regular expressions which match attribute values, and then generates a (non-configurable) MD5 hash over the concatenation of the matching attribute keys and values.

      In addition:

      • the processor throws an error and routes to failure any incoming flowfile which does not have all attributes specified in the processor
      • the use of MD5 is widely deprecated
      • no other hash algorithms are available
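
The current behavior described above can be sketched roughly as follows. This is a simplified model, not the processor's actual code: the class and method names, the `key=value` concatenation format, the sorted key order, and the handling of a non-matching value as if the attribute were missing are all assumptions made for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical model of the existing grouping behavior (not NiFi source code).
final class GroupingHashSketch {

    // Returns the MD5 "group" hash, or empty when the flowfile would go to failure.
    static Optional<String> groupHash(Map<String, String> attributes,
                                      Map<String, Pattern> attributePatterns) {
        StringBuilder joined = new StringBuilder();
        // Sort by attribute name so the hash does not depend on map ordering (assumption).
        for (Map.Entry<String, Pattern> entry : new TreeMap<>(attributePatterns).entrySet()) {
            String value = attributes.get(entry.getKey());
            // A missing attribute routes the flowfile to failure.
            if (value == null) {
                return Optional.empty();
            }
            // Assumption: a value that does not match is treated like a missing attribute.
            if (!entry.getValue().matcher(value).matches()) {
                return Optional.empty();
            }
            // Assumption: keys and values are concatenated as "key=value" lines.
            joined.append(entry.getKey()).append('=').append(value).append('\n');
        }
        return Optional.of(md5Hex(joined.toString()));
    }

    // MD5 is fixed here, mirroring the non-configurable algorithm described above.
    private static String md5Hex(String input) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(input.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

Under this model, two flowfiles whose configured attributes have identical values receive the same hash regardless of their other attributes, which is why the result acts as a grouping key rather than a hash of a single attribute.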

      I am unaware of community use of this processor, but I do not want to break backward compatibility. I propose the following steps:

      • Implement a new CalculateAttributeHash processor (an awkward name, but the existing processor already has the desired name, HashAttribute)
        • This processor will perform the "standard" use case – identify an attribute, calculate the specified hash over the value, and write it to an output attribute
        • This processor will have a required property descriptor allowing a dropdown menu of valid hash algorithms
        • This processor will accept arbitrary dynamic properties, where each property name identifies an attribute to be hashed and the property value is the name of the resulting attribute
        • Example: I want to generate a SHA-512 hash on the attribute username, and a flowfile enters the processor with username value alopresto. I configure algorithm with SHA-512 and add a dynamic property named username with value username_SHA512. The resulting flowfile will have attribute username_SHA512 with value 739b4f6722fb5de20125751c7a1a358b2a7eb8f07e530e4bf18561fbff93234908aa9d2577770c876bca9ede5ba784d5ce6081dbbdfe5ddd446678f223b8d632
      • Improve the documentation of the existing HashAttribute processor to explain its goal and expected use case
      • Link from the HashAttribute documentation to the new processor for standard use cases
      • Remove the error alert raised when an incoming flowfile does not contain all expected attributes. I propose changing the log severity to INFO and still routing to failure
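
The "standard" use case proposed for the new processor could look something like the sketch below. It is not the processor itself: the class and method names are hypothetical, UTF-8 encoding of attribute values is an assumption, and silently skipping a missing source attribute is an assumption as well, since the proposal does not specify that case for the new processor.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper modeling the proposed attribute hashing (not NiFi source code).
final class AttributeHashSketch {

    // Hex-encodes the digest of one attribute value; UTF-8 is an assumption.
    static String hashValue(String algorithm, String value) {
        try {
            MessageDigest digest = MessageDigest.getInstance(algorithm);
            byte[] hashed = digest.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hashed.length * 2);
            for (byte b : hashed) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
    }

    // dynamicProperties maps source attribute name -> output attribute name,
    // e.g. "username" -> "username_SHA512".
    static Map<String, String> hashAttributes(String algorithm,
                                              Map<String, String> attributes,
                                              Map<String, String> dynamicProperties) {
        Map<String, String> result = new LinkedHashMap<>(attributes);
        for (Map.Entry<String, String> property : dynamicProperties.entrySet()) {
            String sourceValue = attributes.get(property.getKey());
            if (sourceValue == null) {
                continue; // assumption: a missing source attribute is skipped
            }
            result.put(property.getValue(), hashValue(algorithm, sourceValue));
        }
        return result;
    }
}
```

With the example configuration above, `AttributeHashSketch.hashAttributes("SHA-512", Map.of("username", "alopresto"), Map.of("username", "username_SHA512"))` would return the original attributes plus a `username_SHA512` entry holding a 128-character hex digest.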





              Assignee: Andy LoPresto (alopresto)
              Reporter: Andy LoPresto (alopresto)

