  1. Apache NiFi
  2. NIFI-9239

Create processor to run Stateless dataflow, enabling Kafka's exactly once semantics

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.15.0
    • Component/s: Extensions, NiFi Stateless
    • Labels: None

    Description

      There have been many requests for the ability to build a NiFi dataflow that uses Kafka's "Exactly Once Semantics" (EOS). While doing so has clear benefits, the requirements that Kafka imposes do not fit well with NiFi's architecture.

      However, it would make a lot of sense to run such a dataflow using Stateless NiFi, in a manner that supports EOS.

      To do so, we would need to update the Kafka consume and publish processors to support exactly-once semantics. The consumer would need to be able to skip committing its offsets, and the publisher would need to know that the offsets are still uncommitted and commit them as part of its own transaction.
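
      The hand-off described above, where the consumer leaves its offsets uncommitted and the publisher commits them together with the outgoing messages, can be sketched in memory as follows. This is an illustration only: `ExactlyOnceSketch`, `Consumed`, and `publishTransactionally` are hypothetical names, the "cluster" is a pair of Java collections, and none of this is NiFi or Kafka code. In the real Kafka client, the corresponding role is played by the producer's transactional API (`sendOffsetsToTransaction` followed by `commitTransaction`).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// In-memory stand-in for a Kafka cluster. Every name here is hypothetical.
class ExactlyOnceSketch {

    // One message handed over by the consumer, which has NOT committed its offset.
    static final class Consumed {
        final int partition;
        final long offset;
        final String value;

        Consumed(int partition, long offset, String value) {
            this.partition = partition;
            this.offset = offset;
            this.value = value;
        }
    }

    // Committed state: what other readers of the cluster would observe.
    final List<String> outputTopic = new ArrayList<>();
    final Map<Integer, Long> committedOffsets = new HashMap<>(); // partition -> next offset to read

    // Publisher side: stage the outgoing messages and the source offsets, then
    // apply both together. If the transform throws, nothing has been applied.
    void publishTransactionally(List<Consumed> batch, Function<String, String> transform) {
        List<String> stagedOutput = new ArrayList<>();
        Map<Integer, Long> stagedOffsets = new HashMap<>();
        for (Consumed message : batch) {
            stagedOutput.add(transform.apply(message.value)); // an exception here aborts the batch
            stagedOffsets.merge(message.partition, message.offset + 1, Long::max);
        }
        // "Commit": output and consumed offsets become visible in the same step.
        outputTopic.addAll(stagedOutput);
        stagedOffsets.forEach((partition, next) -> committedOffsets.merge(partition, next, Long::max));
    }
}
```

      If the transform fails for any message in the batch, both `outputTopic` and `committedOffsets` are left untouched, so the same batch would simply be consumed again.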

      This would require that all messages for the transaction be sent to PublishKafka(Record) as a single group, which is possible with the Batch Output mode of Process Groups.

      That leaves a concern about the ease of running a Stateless flow with NiFi. A Stateless flow can be run from the command line, but we should also build a Processor that can fetch a dataflow (from a file, a registry, etc.) and run that flow as a Processor within NiFi. This offers additional advantages as well, such as the ability to perform a file listing in NiFi, whose state is persisted, and then process the listed files with Stateless NiFi.
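
      The idea of hosting a Stateless flow inside a Processor can be pictured roughly as below. This is only a sketch under stated assumptions: `StatelessRunnerSketch`, `runOnce`, and the representation of a fetched flow as a list of functions are hypothetical and are not the NiFi API. Signalling failure with an empty result, so that the host can keep its input queued for a retry, is likewise an assumption rather than something the issue specifies.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical host-side runner; a "flow" is modelled as an ordered list of steps.
class StatelessRunnerSketch {

    // flowSource stands in for "fetch the dataflow from a file, a registry, etc.".
    // The whole flow runs in memory; output is returned only if every step succeeds.
    static Optional<String> runOnce(Supplier<List<Function<String, String>>> flowSource, String input) {
        try {
            String current = input;
            for (Function<String, String> step : flowSource.get()) {
                current = step.apply(current);
            }
            return Optional.of(current);
        } catch (RuntimeException e) {
            // Assumed policy: no partial output; the caller decides whether to retry.
            return Optional.empty();
        }
    }
}
```

      A host that lists files and persists that listing could call `runOnce` once per listed file and mark the file as done only when a result is present.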

            People

              Assignee: Mark Payne (markap14)
              Reporter: Mark Payne (markap14)
              Votes: 1
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 6h 10m