Hadoop Map/Reduce / MAPREDUCE-6436

JobHistory cache issue

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.3, 2.6.4, 3.0.0-alpha1
    • Component/s: None
    • Labels: None

    Description

      Problem:
      HistoryFileManager.addIfAbsent produces a large volume of log output when the
      number of cached entries younger than mapreduce.jobhistory.max-age-ms far
      exceeds mapreduce.jobhistory.joblist.cache.size.

      Example:
      If the cache contains 50000 entries in total, 10000 of them newer than
      mapreduce.jobhistory.max-age-ms, and mapreduce.jobhistory.joblist.cache.size
      is 20000, the HistoryFileManager.addIfAbsent method produces
      50000 - 20000 = 30000 lines of the message "Waiting to remove <key> from
      JobListCache because it is not in done yet".

      A stack trace is attached (see Attachments).
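To illustrate why a single call can emit so many lines, here is a minimal Java sketch of an over-capacity eviction pass that logs once per entry it cannot evict. It is a simplified model, not the Hadoop source: the class name, the Integer keys, and the Boolean "move pending" flag are assumptions made for illustration, and the sketch does not try to reproduce the exact figure in the example above. It only shows that the number of lines grows with the number of non-evictable entries the pass visits.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListMap;

// Simplified, hypothetical model of an over-capacity eviction pass; not the Hadoop source.
class EvictionLogSketch {

    // Walks keys from oldest to newest while the cache is over maxSize.
    // A value of true means "move to done still pending", i.e. not evictable yet.
    // Returns how many warning lines a log-per-entry policy would emit.
    static int evictAndCountWarnings(ConcurrentSkipListMap<Integer, Boolean> cache, int maxSize) {
        int warnings = 0;
        Iterator<Integer> keys = cache.navigableKeySet().iterator();
        while (cache.size() > maxSize && keys.hasNext()) {
            Integer key = keys.next();
            Boolean movePending = cache.get(key);
            if (movePending == null) {
                continue; // entry disappeared in the meantime
            }
            if (movePending) {
                warnings++; // stands in for one "Waiting to remove <key> ..." line
            } else {
                cache.remove(key);
            }
        }
        return warnings;
    }

    // Builds a cache whose oldest `pending` keys cannot be evicted, followed by
    // `evictable` keys that can, then runs one eviction pass over it.
    static int warningsFor(int pending, int evictable, int maxSize) {
        ConcurrentSkipListMap<Integer, Boolean> cache = new ConcurrentSkipListMap<>();
        for (int i = 0; i < pending + evictable; i++) {
            cache.put(i, i < pending);
        }
        return evictAndCountWarnings(cache, maxSize);
    }

    public static void main(String[] args) {
        // Five non-evictable entries sit in front of five evictable ones.
        System.out.println(warningsFor(5, 5, 3));
    }
}
```

In this model the pass never shrinks the cache while it is walking non-evictable entries, so every such entry costs one log line on every call that finds the cache over capacity.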

      Impact:
      In addition to consuming a large amount of disk space, this issue blocks
      JobHistory.getJob for a long time and slows job execution down significantly,
      because getJob is called by RPCs such as
      HistoryClientService.HSClientProtocolHandler.getJobReport.
      The blocking occurs because HistoryFileManager.UserLogDir.scanIfNeeded
      eventually calls HistoryFileManager.addIfAbsent inside a synchronized block.
      When multiple threads call scanIfNeeded simultaneously, one of them acquires
      the lock and the others are blocked until the first thread completes its
      long-running HistoryFileManager.addIfAbsent call.
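The blocking described above can be reproduced in isolation. The following Java sketch is a hypothetical stand-in, not Hadoop code: the class, the latch parameters, and the helper secondCallerBlocks are all assumptions made for illustration. One thread holds a synchronized method open to simulate a long scan, and the sketch checks that a second caller sits in the BLOCKED state until the first one finishes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an object with a synchronized scan; not Hadoop code.
class ScanContentionSketch {

    // Simulates a long scan: signals that it is inside the monitor, then
    // keeps holding the monitor until `finish` is released.
    synchronized void scanIfNeeded(CountDownLatch entered, CountDownLatch finish) {
        entered.countDown();
        try {
            finish.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns true if a second caller was observed in state BLOCKED while
    // the first caller was still inside scanIfNeeded.
    static boolean secondCallerBlocks() {
        ScanContentionSketch dir = new ScanContentionSketch();
        CountDownLatch firstInside = new CountDownLatch(1);
        CountDownLatch finishScan = new CountDownLatch(1);
        Thread first = new Thread(() -> dir.scanIfNeeded(firstInside, finishScan));
        Thread second = new Thread(() -> dir.scanIfNeeded(new CountDownLatch(1), finishScan));
        boolean blocked = false;
        try {
            first.start();
            firstInside.await(); // the first thread now owns the monitor
            second.start();
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (!blocked && System.nanoTime() < deadline) {
                blocked = second.getState() == Thread.State.BLOCKED;
                if (!blocked) {
                    Thread.sleep(1);
                }
            }
            finishScan.countDown(); // let the simulated scan finish
            first.join();
            second.join();
        } catch (InterruptedException e) {
            finishScan.countDown();
            Thread.currentThread().interrupt();
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.println("second caller blocked: " + secondCallerBlocks());
    }
}
```

In the real service the second caller would be an RPC handler thread, so the time it spends blocked is added directly to the latency of the request it is serving.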

      Solution:

      • Reduce the amount of logging so that HistoryFileManager.addIfAbsent doesn't take too long.
      • Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
        scanning if another thread is already scanning. This changes the semantics of
        some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
        because scanIfNeeded may then leave outdated state in place.
      • Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls are
        not blocked by a loop over tens of thousands of entries.
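The second item in the list above, skipping the scan when another thread is already scanning, could look roughly like the following Java sketch. This is only one possible shape under stated assumptions: the class name, the use of ReentrantLock.tryLock, and the boolean return value are illustrative choices, not taken from any patch on this issue.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of "skip if another thread is scanning"; not Hadoop code.
class SkipIfScanningSketch {
    private final ReentrantLock scanLock = new ReentrantLock();

    // Runs `scan` only if no other thread is currently scanning.
    // Returns true if this call ran the scan, false if it was skipped.
    boolean scanIfNeeded(Runnable scan) {
        if (!scanLock.tryLock()) {
            return false; // skipped: the caller continues with possibly outdated state
        }
        try {
            scan.run();
            return true;
        } finally {
            scanLock.unlock();
        }
    }

    // Starts a second caller from inside a running scan and records both outcomes:
    // index 0 is the outer call, index 1 is the concurrent call.
    static boolean[] demo() {
        SkipIfScanningSketch dir = new SkipIfScanningSketch();
        boolean[] results = new boolean[2];
        results[0] = dir.scanIfNeeded(() -> {
            Thread other = new Thread(() -> results[1] = dir.scanIfNeeded(() -> { }));
            other.start();
            try {
                other.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return results;
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("outer ran: " + r[0] + ", concurrent ran: " + r[1]);
    }
}
```

The false return path is exactly where the semantic change mentioned in the list shows up: a caller that skips the scan reads whatever state the last completed scan left behind.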

      The patch implements the first item.
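As a rough illustration of the first item, the Java sketch below counts the entries that cannot be evicted and emits one summary line per pass instead of one line per entry. It is a sketch under stated assumptions and not necessarily what the attached patches do: the class name, the Integer keys, the Boolean "move pending" flag, and the wording of the summary message are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch of a summarized eviction log; not the Hadoop source.
class SummarizedEvictionSketch {

    // Evicts from the oldest key while the cache is over maxSize.
    // A value of true means the entry is still pending and cannot be evicted.
    // Returns the log lines this pass would emit, which is at most one.
    static List<String> evict(ConcurrentSkipListMap<Integer, Boolean> cache, int maxSize) {
        Integer firstPendingKey = null;
        int pendingCount = 0;
        Iterator<Integer> keys = cache.navigableKeySet().iterator();
        while (cache.size() > maxSize && keys.hasNext()) {
            Integer key = keys.next();
            Boolean movePending = cache.get(key);
            if (movePending == null) {
                continue; // entry disappeared in the meantime
            }
            if (movePending) {
                if (pendingCount == 0) {
                    firstPendingKey = key; // remember one example key for the summary
                }
                pendingCount++;
            } else {
                cache.remove(key);
            }
        }
        List<String> logLines = new ArrayList<>();
        if (pendingCount > 0) {
            logLines.add("Waiting to remove " + pendingCount + " entries (e.g. "
                + firstPendingKey + ") from JobListCache because they are not in done yet");
        }
        return logLines;
    }

    // Builds a cache of `pending` non-evictable entries and runs one pass over it.
    static List<String> logLinesFor(int pending, int maxSize) {
        ConcurrentSkipListMap<Integer, Boolean> cache = new ConcurrentSkipListMap<>();
        for (int i = 0; i < pending; i++) {
            cache.put(i, true);
        }
        return evict(cache, maxSize);
    }

    public static void main(String[] args) {
        logLinesFor(50, 20).forEach(System.out::println);
    }
}
```

The pass still walks every entry, so this reduces logging cost only; it does not by itself shorten the loop or remove the lock contention covered by the other two items.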

      Attachments

        1. MAPREDUCE-6436.4.patch (3 kB, Kai)
        2. MAPREDUCE-6436.3.patch (3 kB, Kai)
        3. MAPREDUCE-6436.2.patch (3 kB, Kai)
        4. MAPREDUCE-6436.1.patch (4 kB, Ryu Kobayashi)
        5. stacktrace3.txt (60 kB, Ryu Kobayashi)
        6. stacktrace2.txt (58 kB, Ryu Kobayashi)
        7. stacktrace1.txt (59 kB, Ryu Kobayashi)

            People

              Assignee: Kai (lewuathe)
              Reporter: Ryu Kobayashi (ryu_kobayashi)
              Votes: 0
              Watchers: 9
