  1. Hadoop Map/Reduce
  2. MAPREDUCE-6003

Resource Estimator suggests huge map output in some cases


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.1
    • Fix Version/s: None
    • Component/s: jobtracker

    Description

      In some cases, ResourceEstimator can return a map output estimate that is far too large. This happens when the input size is not calculated correctly.

      A typical case is a join of two Hive tables, one stored in HDFS and the other in HBase. The maps that process the HBase table finish first, and they report an input length of 0 because of TableInputFormat. Then, for a map that processes the HDFS table, the estimated output size is very large because of the wrong input size, so the map task cannot be assigned.

      There are two possible solutions to this problem:
      (1) Compute the input size correctly for each input type, e.g. HBase.
      (2) Use another algorithm to estimate the map output, or at least bring the estimate closer to reality.

      I prefer the second option, since the first would require handling every possible input type, which is not easy for some inputs such as URIs.

      In my opinion, we could add a second estimate that is independent of the input size:
      estimationB = (completedMapOutputSize / completedMaps) * totalMaps * 10

      Here, multiplying by 10 makes the estimate more conservative, so that the task is less likely to be assigned somewhere that is not big enough.

      The existing estimate is computed as follows:
      estimationA = (inputSize * completedMapOutputSize * 2.0) / completedMapInputSize
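Rearranging the formula above makes the failure mode easier to see. This only restates estimationA; the symbol r is shorthand introduced here for the output-to-input ratio observed on completed maps, and is not a name from the patch:

```latex
\[
\mathrm{estimationA} = 2 \cdot \mathrm{inputSize} \cdot r,
\qquad
r = \frac{\mathrm{completedMapOutputSize}}{\mathrm{completedMapInputSize}}
\]
```

When the completed maps report an input length near 0 but a nonzero output, as in the HBase case described above, r grows without bound and estimationA grows with it.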

      My suggestion is to take the minimum of the two estimates:
      estimation = min(estimationA, estimationB)
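The three formulas above can be sketched in Java as follows. This is a minimal illustration, not the attached patch and not the actual ResourceEstimator code: the class name, method names, and parameter names are hypothetical, and the numbers in main are made up to show the two regimes. It assumes completedMaps is greater than 0.

```java
// Hypothetical sketch of the proposed estimate; names are illustrative only.
final class MapOutputEstimate {

    // Existing approach: extrapolate from the output/input ratio of completed maps.
    static double estimationA(long inputSize, long completedMapOutputSize,
                              long completedMapInputSize) {
        return (inputSize * (double) completedMapOutputSize * 2.0) / completedMapInputSize;
    }

    // Proposed approach: average output per completed map, scaled to all maps,
    // with a factor of 10 to stay conservative. Independent of input size.
    static double estimationB(long completedMapOutputSize, int completedMaps, int totalMaps) {
        return ((double) completedMapOutputSize / completedMaps) * totalMaps * 10;
    }

    // Combined estimate: the smaller of the two.
    static double estimation(long inputSize, long completedMapOutputSize,
                             long completedMapInputSize, int completedMaps, int totalMaps) {
        return Math.min(
            estimationA(inputSize, completedMapOutputSize, completedMapInputSize),
            estimationB(completedMapOutputSize, completedMaps, totalMaps));
    }

    public static void main(String[] args) {
        // Made-up case 1: completed maps recorded almost no input, so A explodes
        // and the combined estimate falls back to B.
        System.out.println(estimation(1_000_000_000L, 50_000_000L, 5L, 5, 20));
        // Made-up case 2: input sizes were recorded properly, so A is the smaller value.
        System.out.println(estimation(1_000_000_000L, 50_000_000L, 250_000_000L, 5, 20));
    }
}
```

Taking the minimum means estimationB only acts as a cap: when input sizes are recorded correctly and estimationA is the smaller value, the result is unchanged from the existing behaviour.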

      Attachments

        1. MAPREDUCE-6003-branch-1.2.patch
          1.0 kB
          Chengbing Liu

          People

            Assignee: Chengbing Liu (chengbing.liu)
            Reporter: Chengbing Liu (chengbing.liu)
