The more I think about this, the more I feel that ignoring deleted files is the wrong thing to do.
Yes, deleted files are a red herring (deleting them happens to be how we secure the files away from other users).
I think the original problem of YARN killing a process needs to be fixed (the original SMAPS fix was for HDFS Zero Copy read via mmap).
Math.min(info.sharedDirty, info.pss) + info.privateDirty
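To make the expression above concrete, here is a minimal sketch of how it would be applied to a single mapping. The `MappingInfo` holder and the `countedKb` method are hypothetical names used only for illustration; the fields mirror the names in the expression and are assumed to be in kB, as smaps reports them.

```java
public class SmapsRssSketch {
    /** Hypothetical holder for the per-mapping smaps values used in the expression (kB). */
    static final class MappingInfo {
        final long sharedDirty;
        final long pss;
        final long privateDirty;

        MappingInfo(long sharedDirty, long pss, long privateDirty) {
            this.sharedDirty = sharedDirty;
            this.pss = pss;
            this.privateDirty = privateDirty;
        }
    }

    /** Applies the quoted expression: shared dirty pages are capped at the mapping's PSS. */
    static long countedKb(MappingInfo info) {
        return Math.min(info.sharedDirty, info.pss) + info.privateDirty;
    }

    public static void main(String[] args) {
        // 100 kB shared dirty, 40 kB PSS, 10 kB private dirty
        MappingInfo info = new MappingInfo(100, 40, 10);
        System.out.println(countedKb(info)); // prints 50
    }
}
```

With these numbers the shared dirty pages contribute only the 40 kB that PSS attributes to the process, plus the 10 kB of private dirty pages.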
If, as Nathan Roberts suggests, YARN counted only the "anonymous" pages as the "will be freed on a kill" memory, it would give me a better way.
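A rough sketch of what counting only anonymous pages could look like, assuming the input is the text of `/proc/<pid>/smaps` and that each mapping reports an `Anonymous:` field in kB. The class and method names are hypothetical, and this is not how YARN currently does its accounting.

```java
import java.util.List;

public class AnonymousPagesSketch {
    /** Sums the "Anonymous:" fields (kB) across all mappings in smaps-formatted lines. */
    static long anonymousKb(List<String> smapsLines) {
        long total = 0;
        for (String line : smapsLines) {
            // Matches lines such as "Anonymous:           128 kB";
            // "AnonHugePages:" does not match this prefix.
            if (line.startsWith("Anonymous:")) {
                String[] parts = line.trim().split("\\s+");
                total += Long.parseLong(parts[1]);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hand-written sample: one anonymous mapping and one mmap'd deleted file.
        List<String> sample = List.of(
            "7f0000000000-7f0000100000 rw-p 00000000 00:00 0",
            "Rss:                 256 kB",
            "Anonymous:           128 kB",
            "7f0000200000-7f0000300000 rw-s 00000000 08:01 1234 /tmp/file (deleted)",
            "Rss:                 512 kB",
            "Anonymous:             0 kB");
        System.out.println(anonymousKb(sample)); // prints 128
    }
}
```

In the sample, the mmap'd deleted file contributes 512 kB of Rss but nothing to the anonymous total, so only the 128 kB of the first mapping is counted.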
The write() case is eventually going to be throttled by the OS, because it will only allow so many dirty buffer cache pages in the system. I don't believe that's the case for the mmap'd file.
Once you exceed the dirty_ratio, the only way you can avoid a page fault is by modifying an existing dirty page over and over again.
If I understand page-writeback.c correctly, the blocking operation would be the page fault on a memory block which is missing in memory.
that significant memory use needs to be associated with that process in the accounting.
Accounting isn't the problem, killing processes is the problem.