Hadoop Map/Reduce / MAPREDUCE-6981

Map Progress is misleading for Distcp job


    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.7.3
    • Fix Version/s: None
    • Component/s: distcp
    • Labels:
      None

      Description

      The progress displayed by the client when running a Distcp job is misleading: map progress reaches 100% before the map tasks finish. The issue can be reproduced simply by running Distcp with multiple large files.

      JobImpl reports progress 1.0 for a task once the task either finishes or reports progress 1.0. A Distcp MapTask takes its progress from SequenceFileRecordReader, which appears to update progress as it reads the list of files and therefore does not account for the time taken to copy those files to the destination.
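A minimal simulation of the mechanism described above, assuming (as the report suggests) that a Distcp map task's progress tracks how far it has read through its list of files, so it reaches 1.0 as soon as the last entry is read even though that file is still being copied. Listing-based progress is approximated here by records read, since the real reader tracks its position in the listing file. The class and method names (`DistcpProgressSketch`, `recordProgress`, `byteProgress`) are hypothetical and are not Hadoop APIs; the copy-based figure only illustrates a measure that would include copy time, not an existing Distcp feature.

```java
import java.util.Locale;

/**
 * Hypothetical sketch, not Hadoop code: contrasts a progress figure driven by
 * reading the list of files with one driven by bytes actually copied.
 */
public class DistcpProgressSketch {

    /**
     * Listing-based progress: fraction of listing records read so far.
     * Assumption: approximates a reader that reports its position in the listing.
     */
    static float recordProgress(int recordsRead, int totalRecords) {
        if (totalRecords == 0) {
            return 1.0f;
        }
        return (float) recordsRead / totalRecords;
    }

    /**
     * Copy-based progress (illustrative alternative): bytes of fully copied
     * files plus the bytes copied so far of the file currently in progress.
     */
    static float byteProgress(long[] fileSizes, int filesCompleted, long currentFileBytesCopied) {
        long total = 0;
        for (long size : fileSizes) {
            total += size;
        }
        if (total == 0) {
            return 1.0f;
        }
        long done = currentFileBytesCopied;
        for (int i = 0; i < filesCompleted; i++) {
            done += fileSizes[i];
        }
        return (float) done / total;
    }

    public static void main(String[] args) {
        final long gib = 1L << 30;
        // Hypothetical mapper assigned three 4 GiB files.
        long[] sizes = {4 * gib, 4 * gib, 4 * gib};
        // All three listing records have been read (the third file is being
        // copied), but only 1 GiB of the third file has been written so far.
        float byRecords = recordProgress(3, 3);
        float byBytes = byteProgress(sizes, 2, 1 * gib);
        System.out.println(String.format(Locale.ROOT,
                "listing-based: %.2f, copy-based: %.2f", byRecords, byBytes));
    }
}
```

For this hypothetical mapper it prints `listing-based: 1.00, copy-based: 0.75`: the listing-based figure already claims completion while a quarter of the data is still to be copied, which matches the symptom in the logs below.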

      17/10/11 13:33:29 INFO mapreduce.Job:  map 100% reduce 0%
      17/10/11 13:34:47 INFO mapreduce.Job: Job job_1506610341926_0016 completed successfully
      

      Map progress of 100% is displayed at 17/10/11 13:33:29, whereas the last map task does not finish until 2017-10-11 13:34:45, about 76 seconds later:

      2017-10-11 13:34:45,159 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1506610341926_0016_m_000002 Task Transitioned from RUNNING to SUCCEEDED
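The size of the gap can be checked by parsing the two timestamps above, which use different formats (the client's `yy/MM/dd HH:mm:ss` and the ApplicationMaster's `yyyy-MM-dd HH:mm:ss,SSS`). This is a small sketch using `java.time`; the class name `ProgressGap` is hypothetical, and it assumes the client and ApplicationMaster clocks are in sync and in the same time zone.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ProgressGap {
    // Client log timestamp format, e.g. "17/10/11 13:33:29"
    static final DateTimeFormatter CLIENT = DateTimeFormatter.ofPattern("yy/MM/dd HH:mm:ss");
    // ApplicationMaster log timestamp format, e.g. "2017-10-11 13:34:45,159"
    static final DateTimeFormatter AM = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Time between the client reporting 100% and the last map task succeeding. */
    static Duration gap(String clientTimestamp, String amTimestamp) {
        LocalDateTime reported = LocalDateTime.parse(clientTimestamp, CLIENT);
        LocalDateTime finished = LocalDateTime.parse(amTimestamp, AM);
        return Duration.between(reported, finished);
    }

    public static void main(String[] args) {
        Duration d = gap("17/10/11 13:33:29", "2017-10-11 13:34:45,159");
        System.out.println(d.toMillis() + " ms");
    }
}
```

For the two log lines above it prints `76159 ms`, i.e. the client showed 100% map progress roughly 76 seconds before the last map task actually succeeded.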
      

      Attaching the client and application logs.

        Attachments

        1. yarnlog
          200 kB
          Prabhu Joseph
        2. clientlog
          7 kB
          Prabhu Joseph


            People

            • Assignee:
              Unassigned
              Reporter:
              Prabhu Joseph
            • Votes:
              0
              Watchers:
              2
