Apache NiFi / NIFI-1008

NiFi should swap out FlowFiles to disk even before the session is committed

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component/s: Core Framework

    Description

      NiFi currently swaps FlowFiles out to disk when a FlowFile Queue holds a large number of them, which keeps the JVM from running out of heap space. However, in a simple flow like GetFile -> SplitText, if GetFile pulls in a large file, SplitText can quickly cause an OutOfMemoryError. This is not because it buffers the content of the FlowFile in memory, but because the session holds millions of FlowFile objects in memory until it is committed. We can do better.

      When session.transfer is called for the FlowFiles and the number transferred hits some threshold (say 10,000), we should swap those FlowFiles to disk, and the session should hand them to the queue as "swapped out" FlowFiles. This avoids buffering all of them in memory and swapping them out only once they land in the queue.
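
The proposal above can be illustrated with a minimal toy model. This is only a sketch under stated assumptions, not NiFi code: `SwapSketch`, `Session`, `transfer`, `readBack`, and the text-file swap format are all hypothetical names and choices made for illustration, and FlowFiles are reduced to single-line strings. It shows the one idea in the description: the session spills each full batch to disk at transfer time, so the heap never holds more than one threshold's worth of records.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Toy model only: none of these types are NiFi classes. */
public class SwapSketch {

    /** Hypothetical stand-in for a session that spills transferred FlowFiles to disk. */
    public static final class Session {
        private final int swapThreshold;                          // 10,000 in the proposal; tiny here
        private final List<String> inMemory = new ArrayList<>();  // records still held on the heap
        private final List<Path> swapFiles = new ArrayList<>();   // batches already written to disk

        public Session(int swapThreshold) {
            if (swapThreshold < 1) {
                throw new IllegalArgumentException("swapThreshold must be at least 1");
            }
            this.swapThreshold = swapThreshold;
        }

        /** Buffers one record; once the threshold is reached, the whole batch goes to disk. */
        public void transfer(String flowFileRecord) {
            inMemory.add(flowFileRecord);
            if (inMemory.size() >= swapThreshold) {
                swapOut();
            }
        }

        /** Writes the buffered batch to a temp file (one record per line) and frees the heap copy. */
        private void swapOut() {
            try {
                Path file = Files.createTempFile("swap-sketch-", ".txt");
                file.toFile().deleteOnExit();
                Files.write(file, inMemory, StandardCharsets.UTF_8);
                swapFiles.add(file);
                inMemory.clear();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public int heldInMemory() {
            return inMemory.size();
        }

        public int swapFileCount() {
            return swapFiles.size();
        }

        /** Reads every record back in transfer order: swapped batches first, then the remainder. */
        public List<String> readBack() {
            List<String> all = new ArrayList<>();
            try {
                for (Path file : swapFiles) {
                    all.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            all.addAll(inMemory);
            return all;
        }
    }

    public static void main(String[] args) {
        Session session = new Session(3);
        for (int i = 0; i < 7; i++) {
            session.transfer("flowfile-" + i);
        }
        System.out.println(session.swapFileCount() + " swap files, "
                + session.heldInMemory() + " records in memory");
    }
}
```

In this model, a commit would only need to hand the queue the list of swap files plus the small in-memory remainder. How a real implementation would handle rollback of already-swapped batches is left open here, as the description does not address it.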

          People

            markap14 Mark Payne
            Votes: 0
            Watchers: 6
