  1. Apache NiFi
  2. NIFI-7874

S3List processor in v1.12.1 uses lots of CPU power and RAM

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.12.1
    • Fix Version/s: 1.13.0
    • Component/s: Core Framework
    • Labels: None
    • Environment: CentOS 7, Amazon Cloud, 8 CPU cores, 64 GB RAM

    Description

      We are using the S3List processor to collect our log data from S3 and process it further. In NiFi version 1.11.4 the processor reads a log file entry from S3, creates a flow file from it, routes it to success, and then repeats the loop from the beginning. This is fast and does not need a lot of resources. We can operate NiFi with the default 512 MB of RAM and 8 CPU cores, which are utilized at roughly 50%.

      With the new version of the S3List processor (v1.12.1), the flow files seem to be cached in memory while the files on S3 are enumerated. Because of this, we raised the Xmx and Xms parameters in bootstrap.conf to 4 GB, which is still not enough (at some point we get an exception from AWS). While the S3 entries are being collected, all 8 CPU cores are utilized at 100% and the RAM gets eaten up. This is especially bad because NiFi then does not have the resources to contact its external ZooKeeper and gets kicked out of the cluster. The web UI also becomes unusable.

      This behavior won't show up if you have only a few objects in S3, because they easily fit in memory. We have millions of entries in S3, however, and caching them eats up the machine's RAM.

      Maybe it would be a good idea to add a parameter to the processor that sets how many flow files may be created before they are routed to success.
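      The batching idea can be sketched roughly as follows. This is only an illustration of the suggestion, not NiFi's actual code: `list_in_batches`, `fake_pages`, `emit` and `batch_size` are hypothetical names, and the listing is simulated rather than fetched from S3. The point it shows is that entries are handed off every `batch_size` items, so memory use stays bounded no matter how many objects are listed.

```python
from typing import Callable, Iterable, Iterator, List


def list_in_batches(
    pages: Iterable[List[str]],
    emit: Callable[[List[str]], None],
    batch_size: int = 1000,
) -> int:
    """Walk a paginated listing and hand entries to `emit` every `batch_size` items.

    At most `batch_size` entries are buffered at once, regardless of how
    many objects the listing contains. Returns the number of entries emitted.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pending: List[str] = []
    total = 0
    for page in pages:
        for key in page:
            pending.append(key)
            if len(pending) >= batch_size:
                emit(pending)  # hypothetical stand-in for "route to success"
                total += len(pending)
                pending = []
    if pending:  # flush the final, partially filled batch
        emit(pending)
        total += len(pending)
    return total


def fake_pages(n_objects: int, page_size: int) -> Iterator[List[str]]:
    """Hypothetical stand-in for a paginated S3 listing; yields lists of keys."""
    for start in range(0, n_objects, page_size):
        end = min(start + page_size, n_objects)
        yield [f"logs/{i}.gz" for i in range(start, end)]


# Record only the size of each emitted batch to show the memory bound.
batches: List[int] = []
total = list_in_batches(
    fake_pages(2500, 1000),
    lambda batch: batches.append(len(batch)),
    batch_size=600,
)
print(total, batches)
```

      With 2,500 simulated entries and a batch size of 600, no emitted batch is larger than 600 entries, and the final partial batch is flushed at the end of the listing.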

       

      If you need any more log files, I would be happy to provide them!

       

      BTW: NiFi is great! Very easy to use and (normally) very economical with resources.

            People

              Assignee: Mark Payne (markap14)
              Reporter: Dominik Dresel (dreseldo)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m