Description
When tailing a large number of files, the processor is slow because it repeatedly loops over all tailed files and performs several expensive operations on each pass.
- In the onTrigger method, a loop (loop 1) iterates over all tailed files in the state object.
- Inside this loop, for each tailed file, the recoverRolledFiles method is called (loop 2), which then leads to consumeFilesFully and finally triggers cleanup.
- In the cleanup method, another loop (loop 3) iterates over all tailed files in the state again.
- During cleanup, persistState is invoked, which removes any legacy state variables from the NiFi state. These legacy state variables originate from NiFi 1.0, when "Multiple Tailed Files" was not yet supported, so state keys didn’t have the "file.x." prefix. As cleanup iterates over and persists each tailed file's state, the overall state size grows by six entries per tailed file, so every pass of the legacy cleanup loop has more entries to scan and becomes progressively slower.
This can lead to hours of execution time.
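To give a feel for how the nesting described above scales, here is a rough cost model in Java. It is not NiFi code: the class and method names are hypothetical, and it assumes that cleanup is reached once per tailed file in loop 1, that loop 3 calls persistState once per tailed file, and that each persistState call scans a state map already holding six entries per file.

```java
/** Rough cost model for the nested loops described above; illustrative only, not NiFi code. */
final class TailFileCostModel {

    // Six state entries per tailed file, as stated in the description.
    static final long ENTRIES_PER_FILE = 6;

    /**
     * Estimates how many state keys the legacy-key scan visits during one trigger,
     * assuming the state map is already fully populated.
     */
    static long estimatedKeyVisits(long tailedFiles) {
        long stateEntries = ENTRIES_PER_FILE * tailedFiles;
        long visits = 0;
        for (long i = 0; i < tailedFiles; i++) {        // loop 1: files in onTrigger
            for (long j = 0; j < tailedFiles; j++) {    // loop 3: files in cleanup
                visits += stateEntries;                 // one scan of every state key
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        for (long n : new long[] {10, 100, 1000}) {
            System.out.println(n + " files: " + estimatedKeyVisits(n) + " key visits");
        }
    }
}
```

Under these assumptions the work grows with the cube of the number of tailed files, which would be consistent with run times stretching to hours for large file counts.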
Suggestion for improvement:
- Move the loop that removes old state entries out of cleanup. The removal of legacy entries should run once at startup instead.
for (String key : oldState.toMap().keySet()) {
    // These states are stored by older version of NiFi, and won't be used anymore.
    // New states have 'file.<index>.' prefix.
    if (TailFileState.StateKeys.CHECKSUM.equals(key)
            || TailFileState.StateKeys.FILENAME.equals(key)
            || TailFileState.StateKeys.POSITION.equals(key)
            || TailFileState.StateKeys.TIMESTAMP.equals(key)) {
        getLogger().info("Removed state {}={} stored by older version of NiFi.", new Object[]{key, oldState.get(key)});
        continue;
    }
    updatedState.put(key, oldState.get(key));
}
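A minimal sketch of what the one-time startup cleanup could look like, written against a plain Map so it runs without NiFi. The class name, the method name, and the literal key strings are assumptions standing in for the TailFileState.StateKeys constants. In the processor, something like this would presumably be called once from a lifecycle method such as one annotated with @OnScheduled, rather than from cleanup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: strips legacy (un-prefixed) state keys in a single pass. */
final class LegacyStateCleanup {

    // Assumed values of the CHECKSUM, FILENAME, POSITION and TIMESTAMP state keys.
    static final Set<String> LEGACY_KEYS =
            Set.of("checksum", "filename", "position", "timestamp");

    /** Returns a copy of oldState without legacy keys; the input map is not modified. */
    static Map<String, String> removeLegacyKeys(Map<String, String> oldState) {
        Map<String, String> updatedState = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : oldState.entrySet()) {
            if (LEGACY_KEYS.contains(entry.getKey())) {
                continue; // stored by an older version, no longer used
            }
            updatedState.put(entry.getKey(), entry.getValue());
        }
        return updatedState;
    }

    public static void main(String[] args) {
        Map<String, String> state = new LinkedHashMap<>();
        state.put("filename", "/var/log/app.log");        // legacy key
        state.put("file.0.filename", "/var/log/app.log"); // current key
        state.put("file.0.position", "42");               // current key
        System.out.println(removeLegacyKeys(state).keySet());
    }
}
```

Because the filter runs once over the stored state, its cost no longer depends on how often cleanup is reached while tailing; persistState could then write the per-file entries without rescanning every key.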