[HDFS-16864] HDFS advisory caching should drop cache behind block when block closed - ASF JIRA


Details

Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.3.4
Fix Version/s: None
Component/s: hdfs
Labels:
- pull-request-available

Description

One of the comments in HDFS-4817 describes the behavior in BlockReceiver.manageWriterOsCache:

"The general idea is that there isn't much point in calling sync_file_pages twice on the same offsets, since the sync process has presumably already begun. On the other hand, calling fadvise(FADV_DONTNEED) again and again will tend to purge more and more bytes from the cache. The reason is because dirty pages (those containing un-written-out-data) cannot be purged using FADV_DONTNEED. And we can't know exactly when the pages we wrote will be flushed to disk. But we do know that calling FADV_DONTNEED on very recently written bytes is a waste, since they will almost certainly not have been written out to disk. That is why it purges between 0 and lastCacheManagementOffset - CACHE_WINDOW_SIZE, rather than simply 0 to pos."
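The purge range described in the quoted comment can be sketched as a small Java helper. This is a hypothetical illustration, not BlockReceiver's code: the class and method names are invented, and CACHE_WINDOW_SIZE is set to the 8 MB window size discussed in this issue.

```java
/**
 * Hypothetical helper: computes the range a drop-behind pass would hand to
 * fadvise(FADV_DONTNEED), following the quoted comment. Not HDFS source.
 */
final class PurgeRangeSketch {
    // Assumed window size (8 MB), taken from the issue text.
    static final long CACHE_WINDOW_SIZE = 8L * 1024 * 1024;

    /**
     * Returns {start, end} of the range to purge, or null when nothing is old
     * enough. Bytes in [end, lastCacheManagementOffset) were written recently
     * and are assumed to still be dirty, so this pass skips them.
     */
    static long[] purgeRange(long lastCacheManagementOffset) {
        long end = lastCacheManagementOffset - CACHE_WINDOW_SIZE;
        if (end <= 0) {
            return null; // nothing lies a full window behind the last offset
        }
        return new long[] {0L, end};
    }
}
```

For example, with lastCacheManagementOffset at 24 MB the helper yields [0, 16 MB); the 8 MB immediately behind that offset is deliberately left for a later pass.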

Looking at the code, I'm wondering whether at least the last 8 MB (the size of CACHE_WINDOW_SIZE) of each block is left without a corresponding fadvise(FADV_DONTNEED) call. We're having a discussion in #accumulo about the file caching feature, and a test that we wrote produced some interesting results. Specifically, for a multi-block file written using setDropBehind with either hsync or CreateFlag.SYNC_BLOCK, part of every block remained in the page cache, rather than only part of the last block.
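A simplified simulation makes the concern concrete. It assumes, hypothetically, that the drop-behind pass runs whenever the write offset has advanced a full CACHE_WINDOW_SIZE since the previous pass, and it uses a 128 MB block written in 64 KB packets as example sizes. The class and method names are invented; none of this is taken from BlockReceiver itself.

```java
/**
 * Simplified, hypothetical model of the drop-behind policy. Names mirror the
 * issue text, but the trigger condition is an assumption, not HDFS code.
 */
final class DropBehindModel {
    static final long CACHE_WINDOW_SIZE = 8L * 1024 * 1024; // 8 MB window

    private long lastCacheManagementOffset = 0;
    // Largest end offset passed to a simulated FADV_DONTNEED call so far.
    private long droppedUpTo = 0;

    /** Called after each packet is written; offsetInBlock is the new end of data. */
    void afterPacket(long offsetInBlock) {
        // Assumption: cache management runs once the writer has advanced a full window.
        if (offsetInBlock - lastCacheManagementOffset < CACHE_WINDOW_SIZE) {
            return;
        }
        long end = lastCacheManagementOffset - CACHE_WINDOW_SIZE;
        if (end > 0) {
            // Simulated fadvise(0, end, FADV_DONTNEED).
            droppedUpTo = Math.max(droppedUpTo, end);
        }
        lastCacheManagementOffset = offsetInBlock;
    }

    long droppedUpTo() {
        return droppedUpTo;
    }

    /** Writes a whole block in fixed-size packets; returns bytes never advised away. */
    static long undroppedTail(long blockSize, long packetSize) {
        DropBehindModel m = new DropBehindModel();
        for (long off = packetSize; off <= blockSize; off += packetSize) {
            m.afterPacket(off);
        }
        return blockSize - m.droppedUpTo();
    }
}
```

In this model a 128 MB block ends with its final 16 MB never advised away: the last pass, at the block boundary, only reaches the previous lastCacheManagementOffset (120 MB) minus CACHE_WINDOW_SIZE, i.e. 112 MB, and no pass runs after the final packet.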

I'm wondering whether there is a reason not to call fadvise(FADV_DONTNEED) on the entire block when the block is closed and dropCacheBehindWrites is true.
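The proposal amounts to one extra advisory call at close time. This is a hypothetical sketch: PageCacheAdvisor stands in for the native fadvise binding and BlockCloseSketch is an invented name; only dropCacheBehindWrites comes from the issue. Because, per the quoted comment, FADV_DONTNEED cannot purge dirty pages, the sketch assumes the block's data has already been synced (as with hsync or CreateFlag.SYNC_BLOCK) before close.

```java
/** Hypothetical stand-in for the native fadvise(FADV_DONTNEED) call. */
interface PageCacheAdvisor {
    void dontNeed(long offset, long length);
}

/** Sketch of the proposed close-time drop; not actual BlockReceiver code. */
final class BlockCloseSketch {
    private final boolean dropCacheBehindWrites;
    private final PageCacheAdvisor advisor;
    private long bytesWritten = 0;

    BlockCloseSketch(boolean dropCacheBehindWrites, PageCacheAdvisor advisor) {
        this.dropCacheBehindWrites = dropCacheBehindWrites;
        this.advisor = advisor;
    }

    /** Records that n more bytes of the block were written. */
    void write(long n) {
        bytesWritten += n;
    }

    /** On close, advise away the whole block instead of only offsets behind the window. */
    void close() {
        if (dropCacheBehindWrites && bytesWritten > 0) {
            // Assumption: data is already synced; dirty pages would not be purged.
            advisor.dontNeed(0L, bytesWritten);
        }
    }
}
```

Covering [0, blockLength) at close may repeat ranges earlier passes already advised; the quoted comment notes that repeated FADV_DONTNEED calls tend to purge more bytes rather than cause harm.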


Issue Links

links to

GitHub Pull Request #5204

GitHub Pull Request #5216


People

Assignee: Unassigned

Reporter: Dave Marion

Votes: 0

Watchers: 3

Dates

Created: 07/Dec/22 19:05

Updated: 15/Aug/23 23:06