  1. Apache NiFi
  2. NIFI-13715

StandardProvenanceEventRecord.hashCode() is not consistent with equals() in handling Parent/Child FlowFiles

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.28.0, 2.0.0
    • Component/s: None
    • Labels: None

    Description

      StandardProvenanceEventRecord may hold Child FlowFile UUIDs in one list and Parent FlowFile UUIDs in another. The equals() method sorts these UUID lists before comparing them, so two event objects are considered equal if they have the same Child (or Parent) FlowFiles in a different order. hashCode(), however, does not sort the lists and produces different hashes for such equal objects. This breaks the equals/hashCode contract: if two objects are equal according to equals(), then hashCode() must return the same value for both.
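      The mismatch can be reproduced in isolation. The sketch below is not the NiFi class: ForkEventSketch, its fields, and orderSensitiveHash() are hypothetical names chosen for illustration. It pairs an equals() that sorts the child UUIDs with two hash variants, one order-sensitive (the kind of behavior described above) and one that sorts first so that it agrees with equals().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-in for an event that carries child FlowFile UUIDs.
// It is NOT StandardProvenanceEventRecord; it only mirrors the behavior described in the issue.
final class ForkEventSketch {
    private final String eventType;
    private final List<String> childUuids;

    ForkEventSketch(String eventType, List<String> childUuids) {
        this.eventType = eventType;
        this.childUuids = new ArrayList<>(childUuids);
    }

    // Returns a sorted copy so the caller's list order is left untouched.
    private static List<String> sorted(List<String> uuids) {
        List<String> copy = new ArrayList<>(uuids);
        Collections.sort(copy);
        return copy;
    }

    // Order-insensitive equality: the UUID lists are sorted before comparison.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof ForkEventSketch)) {
            return false;
        }
        ForkEventSketch other = (ForkEventSketch) obj;
        return eventType.equals(other.eventType)
                && sorted(childUuids).equals(sorted(other.childUuids));
    }

    // Inconsistent variant: List.hashCode() depends on element order,
    // so two equal events can produce different values here.
    int orderSensitiveHash() {
        return Objects.hash(eventType, childUuids);
    }

    // Consistent variant: sort before hashing, mirroring equals().
    @Override
    public int hashCode() {
        return Objects.hash(eventType, sorted(childUuids));
    }

    public static void main(String[] args) {
        ForkEventSketch a = new ForkEventSketch("FORK", Arrays.asList("child-a", "child-b"));
        ForkEventSketch b = new ForkEventSketch("FORK", Arrays.asList("child-b", "child-a"));
        System.out.println("equals: " + a.equals(b));
        System.out.println("order-sensitive hashes match: "
                + (a.orderSensitiveHash() == b.orderSensitiveHash()));
        System.out.println("sorted hashes match: " + (a.hashCode() == b.hashCode()));
    }
}
```

      Sorting inside hashCode() is only one way to restore consistency; any order-independent combination of the UUID hashes would also satisfy the contract.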

      Real-life flow example where the inconsistent hashCode() method causes an issue:

      QueryRecord with multiple queries and output relationships. The processor's code emits a FORK provenance event with two or more Child FlowFiles (one per output). The framework (StandardProcessSession) can also generate the FORK event automatically; before doing so, it checks whether the component has already emitted the event and, if so, skips the automatic one. Because of the inconsistent hashCode() method, this check may fail, in which case two FORK events are saved in the Provenance repository. Opening the next event after the FORKs then fails with the error "Unable to generate Lineage Graph because multiple events were registered claiming to have generated the same FlowFile".
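      The failure mode can be illustrated with a hash-based lookup. The issue does not say how StandardProcessSession performs its "already emitted" check, so the sketch below assumes a HashSet lookup purely for illustration. OrderSensitiveEvent, DuplicateForkCheckSketch, and storedEventCount() are hypothetical names, not NiFi APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical event whose equals() ignores child order but whose hashCode() does not.
final class OrderSensitiveEvent {
    private final List<String> childUuids;

    OrderSensitiveEvent(String... childUuids) {
        this.childUuids = new ArrayList<>(Arrays.asList(childUuids));
    }

    private static List<String> sorted(List<String> uuids) {
        List<String> copy = new ArrayList<>(uuids);
        Collections.sort(copy);
        return copy;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof OrderSensitiveEvent
                && sorted(childUuids).equals(sorted(((OrderSensitiveEvent) obj).childUuids));
    }

    // Deliberately inconsistent with equals(): depends on element order.
    @Override
    public int hashCode() {
        return childUuids.hashCode();
    }
}

final class DuplicateForkCheckSketch {

    // Assumed duplicate check: the automatic event is skipped only if the set of
    // component-emitted events already contains an equal one.
    static int storedEventCount(OrderSensitiveEvent fromProcessor, OrderSensitiveEvent fromFramework) {
        Set<OrderSensitiveEvent> emittedByComponent = new HashSet<>();
        emittedByComponent.add(fromProcessor);

        List<OrderSensitiveEvent> stored = new ArrayList<>(emittedByComponent);
        if (!emittedByComponent.contains(fromFramework)) {
            // The lookup missed, so the automatic event is stored as well.
            stored.add(fromFramework);
        }
        return stored.size();
    }

    public static void main(String[] args) {
        OrderSensitiveEvent processorEvent = new OrderSensitiveEvent("child-a", "child-b");
        OrderSensitiveEvent frameworkEvent = new OrderSensitiveEvent("child-b", "child-a");
        System.out.println("events equal: " + processorEvent.equals(frameworkEvent));
        System.out.println("events stored: " + storedEventCount(processorEvent, frameworkEvent));
    }
}
```

      In this sketch the two events are equal, yet the lookup misses whenever their hash codes differ, because a HashSet compares hashes before it ever calls equals(). Both events are then stored, which is the duplicate-FORK situation described above.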

          People

            turcsanyip Peter Turcsanyi

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h 10m