Description
Currently, the JHS scans a directory only if the directory's modification time has changed:
public synchronized void scanIfNeeded(FileStatus fs) {
  long newModTime = fs.getModificationTime();
  if (modTime != newModTime) {
    <... omitted some logics ...>
    // reset scanTime before scanning happens
    scanTime = System.currentTimeMillis();
    Path p = fs.getPath();
    try {
      scanIntermediateDirectory(p);
This logic relies on the assumption that a directory's modification time is updated whenever a file is placed under that directory.
However, the semantics of a directory's modification time are not consistent across FS implementations. For example, MAPREDUCE-6680 fixed some issues with truncated modification times, and HADOOP-12837 mentions that on S3 a directory's modification time is always 0.
I think we need to revisit this logic so that it works more robustly on different file systems.
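One possible direction, shown here only as a minimal sketch and not as the actual JHS code or the eventual fix, is to stop treating the modification time as the sole trigger and to also rescan once the last scan is older than some bound. The class name DirScanPolicy, the method shouldScan, and the maxStalenessMs parameter are hypothetical names introduced for illustration; only modTime and scanTime mirror fields from the snippet above.

```java
/**
 * Hypothetical sketch: decides whether a directory should be rescanned.
 * A scan is triggered when the modification time changed OR when the
 * previous scan is older than maxStalenessMs (an assumed tuning knob).
 */
class DirScanPolicy {
  private final long maxStalenessMs; // assumed bound, not an existing JHS setting
  private boolean scannedOnce = false;
  private long modTime = 0L;
  private long scanTime = 0L;

  DirScanPolicy(long maxStalenessMs) {
    this.maxStalenessMs = maxStalenessMs;
  }

  /**
   * @param newModTime modification time reported by the file system
   * @param nowMs      current time in milliseconds
   * @return true if the caller should scan the directory now
   */
  synchronized boolean shouldScan(long newModTime, long nowMs) {
    boolean modTimeChanged = modTime != newModTime;
    boolean stale = nowMs - scanTime >= maxStalenessMs;
    if (!scannedOnce || modTimeChanged || stale) {
      // Record state before the scan starts, as the original code does for scanTime.
      scannedOnce = true;
      modTime = newModTime;
      scanTime = nowMs;
      return true;
    }
    return false;
  }
}
```

With a policy like this, a file system that always reports 0 for a directory's modification time would still be rescanned at least once per maxStalenessMs, at the cost of some redundant scans on file systems where the modification time is reliable.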
Attachments
Issue Links
- relates to
  - MAPREDUCE-6251 JobClient needs additional retries at a higher level to address not-immediately-consistent dfs corner cases (Closed)
  - MAPREDUCE-6680 JHS UserLogDir scan algorithm sometime could skip directory with update in CloudFS (Azure FileSystem, S3, etc.) (Closed)