Sure. My anecdotal experience was with a use case where the goal was to write a lot of data to disk (effectively copying a large file) without evicting other data from the page cache, which a service on the same host was actively relying on. The goal was to take only a one-time hit on cache misses when switching to the new data, rather than having the process of distributing the data severely affect performance.
In order to do this, not only was DONTNEED required, but the relevant data also had to be synced before the call. At the time I never looked at the kernel implementation, but judging by the cross-reference linked in the post you cited, DONTNEED results in a call to invalidate_mapping_pages(), which is defined in truncate.c.
As the documentation of that function notes, it won't invalidate dirty pages (among others), which is why the data has to be synced first.
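A minimal sketch of the fsync()+DONTNEED pattern for writes, assuming Linux and Python 3.3+ (where `os.posix_fadvise` exists). The helper name `copy_without_polluting_cache` and the 1 MiB chunk size are hypothetical choices for illustration, not the original setup. Each chunk is written, synced so its pages are no longer dirty, and only then advised away, since invalidate_mapping_pages() skips dirty pages.

```python
import os

CHUNK = 1 << 20  # 1 MiB per step; an arbitrary size chosen for this sketch


def copy_without_polluting_cache(src: str, dst: str, chunk: int = CHUNK) -> int:
    """Copy src to dst, asking the kernel to drop the written pages as we go.

    Hypothetical helper for illustration (Linux only). Returns bytes copied.
    """
    in_fd = os.open(src, os.O_RDONLY)
    try:
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            offset = 0
            while True:
                data = os.read(in_fd, chunk)
                if not data:
                    break
                view = memoryview(data)
                written = 0
                while written < len(data):  # os.write may write less than asked
                    written += os.write(out_fd, view[written:])
                # Dirty pages are not invalidated, so flush this chunk to disk first...
                os.fsync(out_fd)
                # ...then ask the kernel to drop the now-clean pages of this chunk.
                os.posix_fadvise(out_fd, offset, len(data), os.POSIX_FADV_DONTNEED)
                offset += len(data)
            return offset
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)
```

Calling fsync() after every chunk is expensive; a real implementation might sync and advise every few chunks instead, trading a little more cache pollution for fewer flushes.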
I have never actually tested whether DONTNEED has an effect on reads (I made an untested assumption). I could experiment some more and report back if you think a working posix_fadvise() solution would be preferable to direct I/O (personally, I don't have a problem with the direct I/O solution).
The man page's description of the intent of DONTNEED seems to match this:
"POSIX_FADV_DONTNEED attempts to free cached pages associated with the specified region. This is useful, for example, while streaming large files. A program may periodically request the kernel to free cached data that has already been used, so that more useful cached pages are not discarded instead."
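The streaming use case from the man page can be sketched for reads: consume a file chunk by chunk and advise away each range once it has been used. This assumes Linux and Python 3.3+; `checksum_streaming` is a hypothetical helper, and since the effect of DONTNEED on reads is untested, treat this as illustrating the pattern only, not a verified improvement.

```python
import hashlib
import os


def checksum_streaming(path: str, chunk: int = 1 << 20) -> str:
    """Return the SHA-256 of a file, advising the kernel to drop pages already read.

    Hypothetical helper for illustration (Linux only).
    """
    digest = hashlib.sha256()
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            data = os.read(fd, chunk)
            if not data:
                break
            digest.update(data)
            # Pages that were only read are clean, so no fsync() is needed before
            # DONTNEED; whether this actually frees them for reads is untested here.
            os.posix_fadvise(fd, offset, len(data), os.POSIX_FADV_DONTNEED)
            offset += len(data)
    finally:
        os.close(fd)
    return digest.hexdigest()
```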
In terms of the documented API of posix_fadvise(), I see where you're coming from, and my initial reaction is that yes, NOREUSE seems like the obvious choice. But on further thought I'm not so sure. Imagine the kernel actually implemented NOREUSE (and the other advice values) by remembering the advice and adjusting its behavior after the call. What would the expected behavior be with respect to:
(1) different threads in the same process
(2) different file descriptors associated with the same file
The man page doesn't say much explicitly, but my informal suspicion would be that any such implementation would tend to be global to a process and to a file, independent of threads or file descriptors. That is pure speculation, but I suppose my point is that we don't really know (also, I just recently looked at a Python wrapper which even made the assumption that it wasn't per-fd).
If an implementation were like that, NOREUSE would actually be less suitable than DONTNEED: DONTNEED would only evict pages just after they were read or written, while NOREUSE might cause the kernel to avoid retaining pages for the file on all accesses (including live traffic), permanently (or at least during the compaction window, assuming one changes the advice afterwards).
My assumption with posix_fadvise() and fsync()+DONTNEED has been that it is only an attempt to improve cache behavior, and it won't be perfect. In particular, at the two ends of the spectrum:
- For smaller data sets that mostly or completely fit in memory, where the page cache is relied on for performance, a compaction using fsync()+DONTNEED would not help much: the entire data set is evicted from memory very quickly, and you end up with a performance impact roughly equal to what you would expect anyway strictly from flipping the sstable switch and switching over to "cold" sstables.
- For very large data sets, the compaction takes a long time, and the data touched in any given interval of a few minutes (choose some arbitrary period) is a very small subset of the total data set. Thus, assuming the cluster does not depend on very long warm-up periods for performance, the impact should be quite limited, because the continuous live traffic keeps re-warming whatever data is slowly (relative to the total size) and incrementally evicted from the page cache. The hit at the point of switching to the cold sstables will still be taken, but until then the long-running compaction should have a much more limited impact.
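To put rough numbers on the two cases, the fraction of the data set cycled through in one window is approximately compaction throughput × window length / data set size, capped at the whole set. The figures below (100 MB/s, a 5-minute window, 20 GB vs. 2 TB data sets) are hypothetical illustrations, not measurements.

```python
def fraction_rewritten(throughput_mb_s: float, window_s: float, dataset_gb: float) -> float:
    """Rough fraction of the data set a compaction touches within one window.

    Uses decimal units (1 GB = 1000 MB) and caps at 1.0: once the window covers
    the whole data set, all of it has been cycled through. Inputs are hypothetical.
    """
    touched_gb = throughput_mb_s * window_s / 1000
    return min(1.0, touched_gb / dataset_gb)


# Small data set: 100 MB/s for 5 minutes touches 30 GB, i.e. all of a 20 GB set.
small = fraction_rewritten(100, 300, 20)
# Large data set: the same 30 GB is only 1.5% of a 2 TB set.
large = fraction_rewritten(100, 300, 2000)
```

Under these assumed numbers, the small data set is fully cycled within a single window, while live traffic on the large one only has to re-warm about 1.5% of the data per window.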
The nice thing about direct I/O, provided that other concerns (such as alignment, which was mentioned on the mailing list) don't outweigh it, is that its semantics with respect to the page cache seem more obvious. I would expect a given OS+fs combination either to support direct I/O or not, and when it does, to truly not interact with the page cache. By contrast, I would not be surprised if posix_fadvise() behavior varied a lot across future kernel versions (or other OSes).
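For comparison, a minimal Linux-only sketch of a direct I/O write in Python. O_DIRECT generally requires the buffer, file offset, and length to be suitably aligned; the 4096-byte `ALIGN` here is an assumption, since the real requirement depends on the device and filesystem. An anonymous `mmap` is used as the buffer because it is page-aligned. Filesystems that don't support O_DIRECT reject it with `EINVAL`, so the hypothetical `write_direct` helper falls back to an ordinary buffered write and reports which path it took.

```python
import errno
import mmap
import os

ALIGN = 4096  # assumed alignment; the real requirement is device/filesystem specific


def _write_o_direct(path: str, data: bytes) -> None:
    # Round the length up to a multiple of ALIGN; the tail is zero padding.
    padded_len = -(-len(data) // ALIGN) * ALIGN
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT, 0o644)
    try:
        if padded_len:
            buf = mmap.mmap(-1, padded_len)  # anonymous mapping, so page-aligned
            try:
                buf.write(data)  # remaining bytes of the mapping stay zero
                if os.write(fd, buf) != padded_len:
                    raise OSError(errno.EIO, "short O_DIRECT write")
            finally:
                buf.close()
        # Trim the zero padding so the file ends up with the exact length.
        os.ftruncate(fd, len(data))
        os.fsync(fd)
    finally:
        os.close(fd)


def write_direct(path: str, data: bytes) -> bool:
    """Write data, bypassing the page cache if the filesystem allows it.

    Returns True if O_DIRECT was used, False if it was rejected with EINVAL
    and an ordinary buffered write was done instead.
    """
    try:
        _write_o_direct(path, data)
        return True
    except OSError as exc:
        if exc.errno != errno.EINVAL:
            raise
    with open(path, "wb") as f:
        f.write(data)
    return False
```

Here whether the page cache is bypassed is decided once, when the file is opened, instead of depending on advice the kernel may or may not honor; the cost is the alignment bookkeeping.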