Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 1.12.1
- Fix Version/s: None
- Environment: CentOS 7, Amazon Cloud, 8 CPU cores, 64 GB RAM
Description
We are using the S3List processor to collect our log data from S3 and process it further. In NiFi 1.11.4 the processor reads a log file from S3, creates a flow file from it, routes it to success, and repeats the loop from the beginning. This is fast and needs few resources: we can run NiFi with the default 512 MB heap on 8 CPU cores at roughly 50% utilization.
With the new version of the S3List processor (1.12.1), flow files appear to be cached in memory while the objects on S3 are enumerated. Because of this, we raised the Xmx and Xms parameters in bootstrap.conf to 4 GB, which still does not suffice (at some point we get an exception from AWS). While the collection of the S3 entries is in progress, all 8 CPU cores run at 100% and the RAM is eaten up. This is especially bad because NiFi then does not have the resources to contact its external ZooKeeper and gets kicked out of the cluster; the web UI also becomes unusable.
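For reference, the heap settings we changed are the standard `java.arg` entries in conf/bootstrap.conf (the 4 GB values below are what we tried, raised from the 512 MB defaults; they were still insufficient):

```
# conf/bootstrap.conf -- JVM heap, raised from the 512 MB defaults
java.arg.2=-Xms4g
java.arg.3=-Xmx4g
```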
This behavior won't show up if you have only a few objects in S3, because they fit easily in memory, but our bucket contains millions of entries, which eats up the RAM of the machine.
Maybe it would be a good thing to add a parameter to the processor that sets after how many created flow files they are routed to success.
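The suggested parameter could work along the lines of the following sketch (Python, purely illustrative; `iter_batches` and `batch_size` are hypothetical names, not actual NiFi API): instead of accumulating every listed S3 entry before routing, the listing would be split into fixed-size batches that are routed to success and committed one batch at a time, keeping memory use bounded.

```python
def iter_batches(entries, batch_size=1000):
    """Group an arbitrarily long stream of listed S3 entries into
    fixed-size batches, so each batch can be routed to success and
    committed immediately instead of being held in memory."""
    batch = []
    for entry in entries:
        batch.append(entry)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly partial batch
        yield batch

# Example: 2,500 simulated S3 keys, committed in batches of 1,000
keys = (f"logs/file-{i}.log" for i in range(2500))
batches = list(iter_batches(keys, batch_size=1000))
```

With `batch_size=1000`, at most one thousand entries are held in memory at once, regardless of how many millions of objects the bucket contains.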
If you need any more logfiles I would be happy to provide them!
BTW: NiFi is great! Very easy to use and (normally) very economical with resources.