Uploaded image for project: 'Apache NiFi'
  1. Apache NiFi
  2. NIFI-9390

MergeContent does not work properly in Kafka Connect Stateless flows

    XMLWordPrintableJSON

Details

    Description

      MergeContent does not work in Kafka Connect Stateless flows if the processor is not the first one in the flow.

      NIFI-8469 solved this issue for flows similar to the ones mentioned in that jira:
      GetFile --> SplitText --> ReplaceText --> MergeContent --> PutS3Object
      In this scenario GetFile reads a file and creates a FF from it. SplitText creates multiple FFs. ReplaceText processes all the FFs first and MergeContent will be triggered only after ReplaceText finished with all FFs. So it is able to merge the FFs (but only splits coming from the same input file read by GetFile).

      A Kafka Connect Stateless Sink flow may look like this:
      Input port --> "Process/Transform messages" --> MergeContent --> PutS3Object
      The Kafka Connect framework polls some messages from the Kafka topic that will be enqueued in the stateless flow. Then the first processor gets triggered with one FF. This FF is sent downstream till the end of the flow. MergeContent can only see one FF at a time so it cannot merge multiple files.

      So the issue is that the first processor gets triggered for each flowfile separately (which is fine for GetFile but not for the KC Stateless flow). Only subsequent processors get triggered in "triggerWhileReady" way:
      https://github.com/apache/nifi/blob/18fc492e4ce97d2bca4f96df1a1f1eb2b3e80899/nifi-stateless/nifi-stateless-bundle/nifi-stateless-engine/src/main/java/org/apache/nifi/stateless/flow/StandardStatelessFlowCurrent.java#L68-L77

      Attachments

        Activity

          People

            markap14 Mark Payne
            turcsanyip Peter Turcsanyi
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 3h
                3h