  1. Apache NiFi
  2. NIFI-7476

Allow users to configure FlowFile Concurrency on a Process Group




      The Wait/Notify processors are used quite heavily. These processors are very powerful and allow for many different use cases. However, offering this power is done at the expense of making the processors difficult to configure.

      The most common use case, it seems, is to simply allow a Process Group to process only a single FlowFile at a time. We see questions about how to accomplish this fairly frequently in Slack and on the mailing list.

      I propose that we add a new feature to NiFi so that when a user configures a Process Group, they can configure the FlowFile Concurrency: either unbounded (which is the current behavior) or a single FlowFile at a time on each node. In the latter case, only a single FlowFile will be ingested by a Local Input Port, and no more FlowFiles will be ingested as long as there is data queued in the Process Group. Once all data has left the Process Group, the next FlowFile will be allowed through.
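      The gating rule described above can be illustrated with a small model. This is only a sketch of the proposed behavior on a single node, not NiFi code: the class, enum, and method names below are hypothetical and chosen for illustration.

```python
from collections import deque
from enum import Enum


class FlowFileConcurrency(Enum):
    # Hypothetical names for the two proposed settings.
    UNBOUNDED = "unbounded"
    SINGLE_FLOWFILE_PER_NODE = "single flowfile per node"


class ProcessGroupModel:
    """Toy model of a Process Group's Local Input Port on one node."""

    def __init__(self, concurrency):
        self.concurrency = concurrency
        self.waiting = deque()  # FlowFiles queued upstream of the Input Port
        self.in_group = []      # FlowFiles currently queued inside the group

    def offer(self, flowfile):
        """Queue a FlowFile in front of the Input Port."""
        self.waiting.append(flowfile)

    def ingest(self):
        """Admit as many FlowFiles as the setting allows; return those admitted."""
        admitted = []
        while self.waiting:
            single = self.concurrency is FlowFileConcurrency.SINGLE_FLOWFILE_PER_NODE
            if single and self.in_group:
                # Data is still queued in the group, so nothing more is ingested.
                break
            flowfile = self.waiting.popleft()
            self.in_group.append(flowfile)
            admitted.append(flowfile)
        return admitted

    def leave(self, flowfile):
        """Record that a FlowFile has left the Process Group."""
        self.in_group.remove(flowfile)
```

      In this model, a group configured for a single FlowFile admits the next FlowFile only after `leave` has emptied the group, while an unbounded group admits everything that is waiting.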

      This has several advantages over the Wait/Notify pair of Processors. Firstly, there's no need to create a pair of Processors and ensure that they are used together properly. Secondly, there are few properties to configure. Thirdly, implementing this at the framework level and with limited features means the implementation can be much simpler than that of Wait/Notify, which makes it much easier to maintain.

      Additionally, a related concept can be easily introduced: the notion of a FlowFile Outbound Policy. This is analogous to the FlowFile Concurrency but applies to Output Ports. Here, the user could configure the group so that data is transferred out of the Process Group as soon as it is available (which is the current behavior) or is transferred as a batch. In batch mode, the Output Ports would not transfer any data out of the Process Group until all FlowFiles are queued up at an Output Port (i.e., all processing has finished).

      This allows for very simple configuration for an oft-requested capability: the ability to perform some action only after processing of a batch of data has completed.
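      The two outbound behaviors can be sketched in the same spirit. Again, this is a toy model under assumed names (`FlowFileOutboundPolicy`, `OutboundModel`, and their members are hypothetical), meant only to show when the Output Ports would release data.

```python
from enum import Enum


class FlowFileOutboundPolicy(Enum):
    # Hypothetical names for the two proposed settings.
    STREAM_WHEN_AVAILABLE = "stream when available"
    BATCH_OUTPUT = "batch output"


class OutboundModel:
    """Toy model of when a Process Group's Output Ports release FlowFiles."""

    def __init__(self, policy):
        self.policy = policy
        self.processing = set()  # FlowFiles still being processed in the group
        self.at_output = []      # FlowFiles queued at an Output Port

    def admit(self, flowfile):
        """A FlowFile enters the group and starts processing."""
        self.processing.add(flowfile)

    def finish(self, flowfile):
        """Processing is done; the FlowFile is now queued at an Output Port."""
        self.processing.remove(flowfile)
        self.at_output.append(flowfile)

    def release(self):
        """Return the FlowFiles the Output Ports may transfer out right now."""
        batch = self.policy is FlowFileOutboundPolicy.BATCH_OUTPUT
        if batch and self.processing:
            # Some FlowFiles have not reached an Output Port yet: hold everything.
            return []
        released, self.at_output = self.at_output, []
        return released
```

      With the batch setting, `release` returns nothing until every admitted FlowFile has reached an Output Port, which is the point at which a follow-on action for the whole batch can safely run.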





              • Assignee:
                markap14 Mark Payne



                  Time Tracking

                  Original Estimate - Not Specified
                  Remaining Estimate - 0h
                  Time Spent - 40m