  1. Apache NiFi
  2. NIFI-12700

PutKudu memory optimization for unbatched flush mode (AUTO_FLUSH_SYNC)

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.26.0, 2.0.0-M3
    • Component/s: None
    • Labels: None

    Description

      The PutKudu processor's existing implementation uses a Map of KuduOperation -> FlowFile to keep track of which FlowFile was being processed when each KuduOperation was created. This mapping is later used to associate FlowFiles with a RowError (if any occurs), which is necessary for transferring FlowFiles to the success/failure relationships and for logging failures, among other things.
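      The bookkeeping described above can be pictured with a minimal sketch. It is not the PutKudu source: FlowFile, Operation and RowError here are hypothetical stand-in classes rather than the NiFi and Kudu types, and failedFlowFiles is an invented helper name. It only illustrates why every map entry has to stay alive until the errors are reported.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class OperationTrackingSketch {
    // Hypothetical stand-ins (assumptions of this sketch), not the NiFi/Kudu classes.
    static final class FlowFile {
        final String name;
        FlowFile(String name) { this.name = name; }
    }

    static final class Operation {
        final String row;
        Operation(String row) { this.row = row; }
    }

    static final class RowError {
        final Operation operation;
        RowError(Operation operation) { this.operation = operation; }
    }

    /** Resolves each reported row error to the FlowFile recorded for its operation. */
    static Set<FlowFile> failedFlowFiles(Map<Operation, FlowFile> tracked, List<RowError> errors) {
        Set<FlowFile> failed = new LinkedHashSet<>();
        for (RowError error : errors) {
            FlowFile owner = tracked.get(error.operation);
            if (owner != null) {
                failed.add(owner);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        FlowFile first = new FlowFile("first");
        FlowFile second = new FlowFile("second");
        Operation a = new Operation("row-1");
        Operation b = new Operation("row-2");
        Operation c = new Operation("row-3");

        // One entry per operation: the map grows with the number of rows written.
        Map<Operation, FlowFile> tracked = new HashMap<>();
        tracked.put(a, first);
        tracked.put(b, first);
        tracked.put(c, second);

        // Errors arrive only after the batch is flushed, so no entry can be dropped earlier.
        for (FlowFile f : failedFlowFiles(tracked, Arrays.asList(new RowError(c)))) {
            System.out.println(f.name);
        }
    }
}
```

      In this sketch the map holds one entry per operation until the errors have been resolved, which is the memory cost this issue is concerned with.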

      For very large inputs, the Kudu Operation objects held in this map can take up a lot of memory. There is no memory leak, but this can still cause OutOfMemory errors when the input data is very large. The KuduOperation -> FlowFile map may not be required for unbatched flush modes (e.g. the AUTO_FLUSH_SYNC flush mode, where KuduSession.apply() has already flushed the buffer before returning; see https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html).
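      A rough sketch of the unbatched idea follows, under stated assumptions. The applySync function is a hypothetical stand-in for a session in AUTO_FLUSH_SYNC mode and returns an error message (or null on success) instead of a real Kudu response object. Operation and writeFlowFile are likewise invented for illustration, and this is not the change that was merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class UnbatchedFlushSketch {
    // Hypothetical stand-in for a Kudu operation; not the real client class.
    static final class Operation {
        final String row;
        Operation(String row) { this.row = row; }
    }

    /**
     * Applies one operation per row and inspects each result immediately.
     * applySync models a synchronous apply: it returns an error message for a
     * rejected row, or null on success (an assumption of this sketch).
     */
    static List<String> writeFlowFile(List<String> rows, Function<Operation, String> applySync) {
        List<String> errors = new ArrayList<>();
        for (String row : rows) {
            Operation op = new Operation(row);   // not stored anywhere after this iteration
            String error = applySync.apply(op);  // the outcome is already known here
            if (error != null) {
                errors.add(error);
            }
        }
        // The caller knows which FlowFile it is processing, so no lookup map is needed.
        return errors;
    }

    public static void main(String[] args) {
        List<String> errors = writeFlowFile(
                Arrays.asList("a", "", "c"),
                op -> op.row.isEmpty() ? "empty row key" : null);
        System.out.println(errors);
    }
}
```

      Because each result is checked while the owning FlowFile is still in scope, only the errors are retained, not the operations themselves.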

      This Jira captures the effort to refactor the PutKudu processor to reduce its memory usage.

          People

            Assignee: Emilio Setiadarma
            Reporter: Emilio Setiadarma
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h