  1. Hadoop Map/Reduce
  2. MAPREDUCE-6680

JHS UserLogDir scan algorithm can sometimes skip an updated directory on cloud file systems (Azure FileSystem, S3, etc.)

    Details

    • Hadoop Flags:
      Reviewed

      Description

      In our cluster, which runs on a cloud file system, we noticed that the JHS sometimes skips a directory containing a .jhist file during scanning.
      The behavior is as follows.
      The first scan round does not find the .jhist file:

      16/04/13 11:14:34 DEBUG azure.NativeAzureFileSystem: Found path as a directory with 6 files in it.
      16/04/13 11:14:34 DEBUG hs.HistoryFileManager: Found 0 files
      ...
      

      After that, we see "Scan not needed of ..." logged for the same directory every 3 minutes until the application fails with a timeout.

      From our analysis, the root cause is that most cloud file systems (Azure FS, S3, etc.) truncate file and directory modification times to seconds instead of milliseconds, which could be due to a limitation of the HTTP protocol (see the discussion at https://forums.aws.amazon.com/thread.jspa?messageID=476615).

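      The issue can appear with the following time sequence: the latest non-.jhist file modification in the directory happens at T1, the directory scan happens at T2, and the .jhist file is added to the directory at T3. If T1 < T2 < T3 and T1 equals T3 after truncation to seconds, the directory's modification time looks unchanged since the scan, so later rounds skip the directory and the new .jhist file is never picked up.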
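
The T1 < T2 < T3 sequence can be reproduced with a small model. This is only a sketch under stated assumptions: `DirScanner`, `scanIfNeeded` and `truncateToSeconds` are hypothetical names, the timestamps are made up, and the scanner is assumed to rescan a directory only when its reported modification time differs from the last value it saw. It is not the actual HistoryFileManager code.

```java
public class TruncatedMtimeScanDemo {

    // Hypothetical helper: models a store that keeps modification times
    // at one-second granularity (assumption for illustration).
    static long truncateToSeconds(long millis) {
        return (millis / 1000L) * 1000L;
    }

    // Simplified model of a scanner that rescans a directory only when
    // the reported directory modification time has changed.
    static final class DirScanner {
        private long lastSeenModTime = -1L;

        // Returns true if a scan was performed, false if it was skipped.
        boolean scanIfNeeded(long reportedDirModTime) {
            if (reportedDirModTime != lastSeenModTime) {
                lastSeenModTime = reportedDirModTime;
                return true;
            }
            return false; // corresponds to "Scan not needed of ..."
        }
    }

    public static void main(String[] args) {
        // Made-up timestamps, all within the same wall-clock second.
        long t1 = 1_460_546_074_100L; // last non-.jhist write to the directory
        long t2 = 1_460_546_074_400L; // directory scan
        long t3 = 1_460_546_074_800L; // .jhist file added

        DirScanner scanner = new DirScanner();

        // At T2 the directory still reports the truncated T1.
        boolean firstScan = scanner.scanIfNeeded(truncateToSeconds(t1));
        // After T3 the directory reports the truncated T3, which equals the truncated T1.
        boolean scanAfterJhist = scanner.scanIfNeeded(truncateToSeconds(t3));

        System.out.println("T1 < T2 < T3: " + (t1 < t2 && t2 < t3));
        System.out.println("first scan ran: " + firstScan);
        System.out.println("scan after .jhist ran: " + scanAfterJhist);
    }
}
```

With millisecond-precision timestamps the second call would see a different value and rescan; with truncation it sees the same value and skips the directory, which matches the repeated "Scan not needed of ..." messages.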

        Attachments

        1. MAPREDUCE-6680.patch
          2 kB
          Junping Du
        2. MAPREDUCE-6680-v2.patch
          2 kB
          Junping Du
        3. MAPREDUCE-6680-v3.patch
          2 kB
          Junping Du

              People

              • Assignee:
                djp Junping Du
                Reporter:
                djp Junping Du