[NIFI-8633] Content Repository can be improved to make fewer disk accesses on read - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.14.0
Component/s: Core Framework
Labels: None

Description

When FileSystemRepository.read(ContentClaim) or FileSystemRepository.read(ResourceClaim) is called, the repository determines the file path for the claim via getPath(claim, true), where true indicates that we should verify that the file exists.

This is done so that if we were to pass in a ContentClaim that does not exist, we throw a more meaningful ContentNotFoundException instead of just letting a FileNotFoundException fly.

However, this call to Files.exists(Path) is fairly expensive, as it requires a disk access. For a flow that uses a lot of small files, this can be extremely expensive.

We can improve this by removing the call to Files.exists altogether. Instead, just create the FileInputStream in a try/catch block without checking first, catch FileNotFoundException, and wrap it in a ContentNotFoundException. This results in the same API and the same contracts as before but avoids the overhead of additional disk accesses/seeks.
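
A minimal sketch of the idea follows. It is not the actual FileSystemRepository code: the class ClaimReadSketch, the method open(Path), and the nested ContentNotFoundException (including its constructor and its IOException parent) are hypothetical stand-ins used only to keep the example self-contained. The point is that the stream is opened directly and a missing file is translated into the more meaningful exception, with no prior Files.exists check.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;

class ClaimReadSketch {

    // Hypothetical stand-in for NiFi's ContentNotFoundException; the real
    // class has its own constructors and class hierarchy.
    static class ContentNotFoundException extends IOException {
        ContentNotFoundException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Opens the file backing a claim without first calling Files.exists.
    // If the file is missing, the FileNotFoundException raised by the open
    // itself is wrapped, so callers still see ContentNotFoundException.
    static InputStream open(Path claimPath) throws ContentNotFoundException {
        try {
            return new FileInputStream(claimPath.toFile());
        } catch (FileNotFoundException e) {
            throw new ContentNotFoundException(
                    "Could not find content for " + claimPath, e);
        }
    }
}
```

Note that FileInputStream also throws FileNotFoundException when the path is a directory or cannot be opened for reading, so in this sketch those cases surface as ContentNotFoundException as well.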

Issue Links

links to GitHub Pull Request #5104

People

Assignee: Mark Payne
Reporter: Mark Payne
Votes: 0
Watchers: 2

Dates

Created: 26/May/21 16:45
Updated: 14/Jun/21 07:38
Resolved: 11/Jun/21 20:12

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 20m