Apache NiFi / NIFI-10817

Stateless NiFi does not release FlowFile content until flow completes


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.19.0
    • Component/s: NiFi Stateless
    • Labels: None

    Description

      When a stateless flow is run, the content of intermediate FlowFiles is not cleaned up from the content repository until the flow completes.

      For example, consider the following flow:

      ConsumeKafka -> ReplaceText -> MergeContent (1000 FlowFile bucket) -> MergeContent (1000 FlowFile bucket) -> PutS3

      The intent here is to pull data, transform it, merge many records together, and put the result to S3. The expectation is that we'd have no more than 1,000,000 Kafka messages in the content repo at a time, with two copies of each (1,000 FlowFiles, each containing 1,000 Kafka messages, waiting to be merged, PLUS the merged result).

      However, what we actually see is the final merged content plus the 1,000 bundles ahead of it still in the repo (expected), PLUS the 1,000,000 individual transformed messages, PLUS the 1,000,000 original messages. The content of these intermediate FlowFiles should be purged as aggressively as possible. This is particularly important when using an in-memory content repository, since the retained content then occupies memory.
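      The arithmetic behind "expected" versus "observed" can be checked with a small back-of-the-envelope model. This is a hypothetical sketch, not NiFi code: it counts content in units of one Kafka message, assumes every message (original or transformed) is the same size, and uses illustrative names (ContentCopies, BUCKET) that do not appear in NiFi.

```java
// Hypothetical model of the flow described above:
// ConsumeKafka -> ReplaceText -> MergeContent (1,000) -> MergeContent (1,000) -> PutS3.
// Content is counted in "message units"; all messages are assumed to be the same size.
final class ContentCopies {
    static final long BUCKET = 1_000;             // FlowFiles per MergeContent bin
    static final long MESSAGES = BUCKET * BUCKET; // messages in one final merged FlowFile

    // Expected: the 1,000 bundles waiting to be merged plus the final merged result.
    static long expectedCopies() {
        long bundles = BUCKET * BUCKET; // 1,000 bundles x 1,000 messages each
        long merged = MESSAGES;         // the final merged FlowFile
        return bundles + merged;
    }

    // Observed: the expected copies, plus every transformed and every original message.
    static long observedCopies() {
        long transformed = MESSAGES;    // ReplaceText output that is never released
        long original = MESSAGES;       // ConsumeKafka output that is never released
        return expectedCopies() + transformed + original;
    }

    public static void main(String[] args) {
        System.out.println("expected = " + expectedCopies());
        System.out.println("observed = " + observedCopies());
    }
}
```

      Under these assumptions the model gives 2,000,000 message units expected and 4,000,000 observed, i.e. the retained intermediate content doubles the footprint.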

      The in-memory content repository does not actually store the content within the repo; rather, it provides a mechanism by which the content is written to the content claim held by the FlowFileRecord. Once that claim is no longer referenced, we rely on garbage collection to clean it up. However, it appears that the ProcessSession holds on to all of these intermediate claims in its records member variable, and we can purge them much more aggressively.
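      The retention problem can be illustrated with a toy model in which a session keeps strong references to every record it has seen. This is a hypothetical sketch only: SketchSession, Claim, Rec, derive, retainedBytes, and simulate are invented for illustration, are not NiFi's ProcessSession, FlowFileRecord, or ContentClaim API, and do not represent the actual fix. It shows that, as long as the records map references a claim, its heap bytes cannot be garbage collected, and that dropping superseded inputs leaves only the live output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model only: NOT NiFi's ProcessSession / FlowFileRecord / ContentClaim.
final class SketchSession {
    // Stand-in for an in-memory content claim: the bytes live on the Java heap.
    static final class Claim {
        final byte[] content;
        Claim(byte[] content) { this.content = content; }
    }

    // Stand-in for a FlowFile record: an id plus the claim it references.
    static final class Rec {
        final long id;
        final Claim claim;
        Rec(long id, Claim claim) { this.id = id; this.claim = claim; }
    }

    // Strong references held here keep every claim reachable, so the GC cannot free it.
    private final Map<Long, Rec> records = new HashMap<>();
    private final boolean purgeEagerly;
    private long nextId = 0;

    SketchSession(boolean purgeEagerly) { this.purgeEagerly = purgeEagerly; }

    Rec create(byte[] content) {
        Rec r = new Rec(nextId++, new Claim(content));
        records.put(r.id, r);
        return r;
    }

    // Produce one output from 'inputs', as a transform or a merge would.
    Rec derive(List<Rec> inputs, byte[] output) {
        Rec out = create(output);
        if (purgeEagerly) {
            // The inputs are superseded; dropping them makes their claims unreachable.
            for (Rec in : inputs) {
                records.remove(in.id);
            }
        }
        return out;
    }

    // Bytes of content still referenced by this session.
    long retainedBytes() {
        long total = 0;
        for (Rec r : records.values()) {
            total += r.claim.content.length;
        }
        return total;
    }

    // Tiny pipeline: 4 x 10-byte messages -> transform each -> merge into one 40-byte bundle.
    static long simulate(boolean purgeEagerly) {
        SketchSession s = new SketchSession(purgeEagerly);
        List<Rec> transformed = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            Rec msg = s.create(new byte[10]);                      // like ConsumeKafka
            transformed.add(s.derive(List.of(msg), new byte[10])); // like ReplaceText
        }
        s.derive(transformed, new byte[40]);                       // like MergeContent
        return s.retainedBytes();
    }
}
```

      With purgeEagerly set to false, simulate returns 120 (originals + transformed + merged); with true, it returns 40 (only the merged bundle). The sketch deliberately ignores concerns such as session rollback, which a real session has to account for before discarding records.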


People

    • Assignee: Mark Payne (markap14)
    • Reporter: Mark Payne (markap14)
    • Votes: 0
    • Watchers: 2


Time Tracking

    • Original Estimate: Not Specified
    • Remaining Estimate: 0h
    • Time Spent: 40m