  1. Hadoop Map/Reduce
  2. MAPREDUCE-6351

Reducer hung in copy phase.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels:
      None

      Description

      Problem
      The reducer gets stuck in the copy phase and makes no progress for a very long time. After the task is killed manually a couple of times, it completes.

      Observations

      • Verified the GC logs and found no memory-related issues (logs attached).
      • Verified the thread dumps and found no thread-related problems.
      • The logs show that the fetcher threads are not copying map outputs; they are just waiting for a merge to happen.
      • The merge thread is alive and in a wait state.
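
      The report does not say how the thread dumps were taken. As one way to check the "alive and in a wait state" observation from inside a JVM, the following is a minimal sketch using the standard `ThreadMXBean` API. The class name `WaitingThreadReport`, the method `waitingThreads`, and the name filter are hypothetical and are not part of Hadoop.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not Hadoop code): lists waiting threads in the current JVM.
class WaitingThreadReport {

    // Returns one line per live thread that is WAITING or TIMED_WAITING
    // and whose name contains nameFilter (an assumed, caller-chosen substring).
    static List<String> waitingThreads(String nameFilter) {
        List<String> lines = new ArrayList<>();
        ThreadInfo[] infos =
            ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip threads that vanished mid-dump
            }
            Thread.State state = info.getThreadState();
            boolean waiting = state == Thread.State.WAITING
                || state == Thread.State.TIMED_WAITING;
            if (waiting && info.getThreadName().contains(nameFilter)) {
                // getLockName() names the monitor the thread is waiting on, if any.
                lines.add(info.getThreadName() + " " + state
                    + " on " + info.getLockName());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // An empty filter matches every thread name.
        for (String line : waitingThreads("")) {
            System.out.println(line);
        }
    }
}
```

      For a remote reducer container, an external thread dump gives the same information; this sketch only applies when the check can run inside the process being inspected.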

      Analysis
      After careful examination of the logs, thread dumps, and code, this looks to me like a classic multi-threading issue: the thread goes into the wait state after it has already been notified.

      Here is the suspected code flow.
      Thread #1
      Fetcher thread - notification comes first
      org.apache.hadoop.mapreduce.task.reduce.MergeThread.startMerge(Set<T>)

            synchronized(pendingToBeMerged) {
              pendingToBeMerged.addLast(toMergeInputs);
              pendingToBeMerged.notifyAll();
            }
      

      Thread #2
      Merge thread - goes into wait state (notification goes unconsumed)
      org.apache.hadoop.mapreduce.task.reduce.MergeThread.run()

              synchronized (pendingToBeMerged) {
                while(pendingToBeMerged.size() <= 0) {
                  pendingToBeMerged.wait();
                }
                // Pickup the inputs to merge.
                inputs = pendingToBeMerged.removeFirst();
              }
      
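
      To exercise the ordering described above (notification first, consumer arriving later) outside Hadoop, the following is a minimal standalone sketch of the same enqueue/notifyAll and guarded-wait handoff. `PendingMergeSketch`, `takeNext`, `runNotifyBeforeWait`, and the string payload are hypothetical stand-ins; the real `MergeThread` is generic and carries state that is not shown in the two excerpts.

```java
import java.util.LinkedList;

// Hypothetical reduction of the fetcher/merge-thread handoff (not Hadoop code).
class PendingMergeSketch {
    private final LinkedList<String> pendingToBeMerged = new LinkedList<>();

    // Producer side (fetcher): enqueue an input and wake any waiting consumer.
    void startMerge(String toMergeInputs) {
        synchronized (pendingToBeMerged) {
            pendingToBeMerged.addLast(toMergeInputs);
            pendingToBeMerged.notifyAll();
        }
    }

    // Consumer side (merge thread): wait while the queue is empty, then take the head.
    String takeNext() throws InterruptedException {
        synchronized (pendingToBeMerged) {
            while (pendingToBeMerged.isEmpty()) {
                pendingToBeMerged.wait();
            }
            return pendingToBeMerged.removeFirst();
        }
    }

    // Runs the suspected ordering: notify with no waiter, then start the consumer.
    // Returns the consumed input, or null if the consumer is still waiting
    // (or the caller was interrupted) after timeoutMillis.
    static String runNotifyBeforeWait(long timeoutMillis) {
        PendingMergeSketch sketch = new PendingMergeSketch();
        String[] result = new String[1];
        sketch.startMerge("map-output-1"); // notification fires before any wait()
        Thread merger = new Thread(() -> {
            try {
                result[0] = sketch.takeNext();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "merge-sketch");
        merger.setDaemon(true); // do not keep the JVM alive if the consumer hangs
        merger.start();
        try {
            merger.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        return merger.isAlive() ? null : result[0];
    }

    public static void main(String[] args) {
        String consumed = runNotifyBeforeWait(2000);
        System.out.println(consumed == null
            ? "consumer still waiting"
            : "consumed " + consumed);
    }
}
```

      A null result means the consumer is still blocked in `wait()` after the timeout; a non-null result means it found the queued input through the emptiness check made under the lock.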

        Attachments

        1. jstat-gc.log
          0.7 kB
          Laxman
        2. reducer-container-partial.log.zip
          1.15 MB
          Laxman
        3. thread-dumps.out
          246 kB
          Laxman

              People

              • Assignee:
                lakshman Laxman
                Reporter:
                lakshman Laxman