[NIFI-9704] Improve verbose diagnostics and change default value of "nifi.content.claim.max.appendable.size" from 1 MB to 50 KB - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.16.0
Component/s: Configuration, Core Framework
Labels:
- ageoff
- content-repo
- content-repository
- disk
- disk-space
- full
- space

Description

We sometimes see users (especially those with a mix of flows where some produce very large FlowFiles and some produce tons of tiny FlowFiles) run into an issue where the UI shows very little space is used up by FlowFiles but the content repository fills up.

Yesterday I was on a call with such a team. Their NiFi UI showed that one node had about 200,000 FlowFiles totaling dozens of MB. However, the content repository was using 300 GB of disk, which was the entire capacity of the content repo. As a result, their NiFi instance stopped processing data because the content repo was completely full.

We did some analysis to check whether "orphaned" FlowFiles were filling the content repository, but there were none. Instead, the nifi.sh diagnostics --verbose command showed us that a handful of queues were causing the content repo to retain those hundreds of GB of data, even though the FlowFiles themselves only amounted to a few MB.

This is a known issue and is caused by how we write FlowFile content to disk, using the same file on disk for many content claims. A file cannot be removed until every content claim in it has been released, so one small FlowFile that is still queued can keep a much larger file on disk. By default, we allow up to 1 MB to be written to a file before we conclude that we should no longer write additional FlowFiles to it. This is controlled by the "nifi.content.claim.max.appendable.size" property.
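To illustrate the retention effect, here is a minimal Python sketch of a simplified model, not NiFi's actual implementation. It assumes FlowFiles are appended to the current file until that file's size reaches the threshold, and that a file stays on disk while any FlowFile written to it is still queued. The names `ClaimFile` and `retained_bytes` are hypothetical.

```python
from dataclasses import dataclass

KB = 1024
MB = 1024 * KB


@dataclass
class ClaimFile:
    """One file on disk that holds the content of many FlowFiles (simplified)."""
    size: int = 0
    live_claims: int = 0  # FlowFiles in this file that are still queued


def retained_bytes(flowfiles, max_appendable):
    """Return the disk space held by files that still have a queued FlowFile.

    flowfiles: iterable of (size_in_bytes, still_queued) in write order.
    max_appendable: once a file reaches this size, a new file is started.
    """
    files = []
    current = None
    for size, still_queued in flowfiles:
        # Assumption: stop appending once the file has reached the threshold.
        if current is None or current.size >= max_appendable:
            current = ClaimFile()
            files.append(current)
        current.size += size
        if still_queued:
            current.live_claims += 1
    # A file is retained in full as long as one claim in it is live.
    return sum(f.size for f in files if f.live_claims > 0)


# 600 tiny FlowFiles (1 KB each), only the first still queued,
# followed by one 10 MB FlowFile that has already been processed.
workload = [(1 * KB, i == 0) for i in range(600)] + [(10 * MB, False)]

print(retained_bytes(workload, 1 * MB))   # 11100160 bytes (the 10 MB file is pinned)
print(retained_bytes(workload, 50 * KB))  # 51200 bytes (only the first 50 KB file)
```

In this model, a lower threshold closes files sooner, so a large FlowFile is less likely to share a file with a small FlowFile that remains queued.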

The support team indicates that this happens frequently. We need to change the default value of this property from "1 MB" to "50 KB". This will dramatically decrease the incidence rate.

I set up a flow to test this locally. I queued up 5,000 FlowFiles totaling 610 KB, and the content repo was taking 45 GB of disk space. I then dropped all data, changed this property from the default 1 MB to 50 KB, and repeated the test. As expected, I queued up the same number of files (610 KB worth), and the content repo occupied 2.6 GB of disk space. I.e., making the value roughly 5% of the original value resulted in occupying only about 5% as much "unnecessary" disk space.
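As a quick check of the arithmetic, the following Python sketch compares the two ratios using the figures reported above. It assumes 1 MB is 1024 KB.

```python
# Ratio of the new threshold to the old one (assuming 1 MB = 1024 KB).
threshold_ratio = (50 * 1024) / (1024 * 1024)

# Ratio of disk space used in the two test runs (GB figures from the test).
disk_ratio = 2.6 / 45

print(f"threshold: {threshold_ratio:.1%}")  # threshold: 4.9%
print(f"disk use:  {disk_ratio:.1%}")       # disk use:  5.8%
```

The two ratios are close, which is consistent with retained disk space scaling roughly with the threshold in this test.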

Performance tests indicate that throughput was approximately the same regardless of whether I used "1 MB" or "50 KB".

Additionally, when running the nifi.sh diagnostics --verbose command, the information that was necessary for tracking down the root cause of this was made available but took tremendous effort to decipher. We should update the diagnostics output when scanning the content repo to show the amount of data in the content repo that is being retained by each queue in the flow.
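The per-queue figure requested here could be computed along the following lines. This is only a sketch of the aggregation idea in Python, not the diagnostics implementation or its output format. The function `retained_by_queue` and its inputs (queue names, file identifiers, file sizes) are hypothetical.

```python
from collections import defaultdict


def retained_by_queue(queued_claims, file_sizes):
    """Sum the size of every on-disk file that each queue keeps alive.

    queued_claims: iterable of (queue_name, file_id), one per queued FlowFile.
    file_sizes: mapping of file_id to the file's size in bytes.
    """
    files_per_queue = defaultdict(set)
    for queue, file_id in queued_claims:
        # A set counts each file once per queue, however many FlowFiles share it.
        files_per_queue[queue].add(file_id)
    return {
        queue: sum(file_sizes[file_id] for file_id in file_ids)
        for queue, file_ids in files_per_queue.items()
    }


# Hypothetical example: two queues, two files on disk.
claims = [("q1", "a"), ("q1", "a"), ("q2", "b"), ("q1", "b")]
sizes = {"a": 100, "b": 200}
print(retained_by_queue(claims, sizes))  # {'q1': 300, 'q2': 200}
```

Note that a file referenced from several queues is counted once for each of them, so the per-queue totals can add up to more than the repository size.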

Attachments

- Use_Excessive_Disk_Space.json, 17/Feb/22 15:08, 11 kB, Mark Payne

Issue Links

is related to: NIFI-3376 Content repository disk usage is not close to reported size in Status Bar (Open)

links to: GitHub Pull Request #5780

People

Assignee: Mark Payne

Reporter: Mark Payne

Votes: 0

Watchers: 2

Dates

Created: 17/Feb/22 15:02

Updated: 01/Sep/22 20:54

Resolved: 22/Feb/22 17:07

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 0.5h