Hadoop Map/Reduce: MAPREDUCE-4400

Fix performance regression for small jobs/workflows

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.203.0, 1.0.3
    • Fix Version/s: 1.1.0
    • Component/s: performance, task
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed
    • Tags:
      task, performance

      Description

      The Hadoop 1.x series has a significant performance regression for small jobs and workflows compared with 0.20.2. It is most noticeable with Hive and Pig jobs: PigMix shows an average 40% regression against 0.20.2.

          Activity

          vicaya Luke Lu added a comment -

          Thanks to John Poelman and Shreyas Subramanya of IBM BigInsights performance QA for noticing the issue and verifying my fix.

          vicaya Luke Lu added a comment -

          MAPREDUCE-2450 fixed the communication race condition that could cause an occasional 1-minute timeout, but it made the PROGRESS_INTERVAL sleep effectively mandatory: every task, including setup and cleanup tasks, had to sleep at least 3 seconds before it could finish.

          This patch makes the wait interruptible when tasks finish.

          With this patch, MAPREDUCE-4399, and out-of-band heartbeats, PigMix is about 20% faster than 0.20.2. These patches put the Hadoop 1.x series on par with Hadoop 2.x in terms of general performance.
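
The idea described above, letting the progress wait return as soon as the task finishes instead of always sleeping out PROGRESS_INTERVAL, can be illustrated with a minimal Java sketch. It is not the actual MAPREDUCE-4400 patch: the class and method names (InterruptibleProgressWait, taskDone, awaitNextReport) are hypothetical, PROGRESS_INTERVAL_MS = 3000 is assumed from the 3-second figure above, and it uses a plain monitor wait/notify, so it does not reproduce the reduced synchronization discussed later in the thread.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not the MAPREDUCE-4400 patch): a reporter that normally
 * waits PROGRESS_INTERVAL_MS between status updates, but wakes up early as
 * soon as the task signals completion.
 */
public class InterruptibleProgressWait {
    // Assumed value, matching the 3-second minimum mentioned in the issue.
    static final long PROGRESS_INTERVAL_MS = 3000;

    private final Object lock = new Object();
    private boolean taskDone = false; // guarded by lock

    /** Called by the task thread when its work is finished. */
    public void taskDone() {
        synchronized (lock) {
            taskDone = true;
            lock.notifyAll(); // wake the reporter instead of letting it sleep out the interval
        }
    }

    /**
     * Waits up to PROGRESS_INTERVAL_MS, returning early if the task finishes.
     * Returns true if the task is done, false if the interval simply elapsed.
     */
    public boolean awaitNextReport() throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(PROGRESS_INTERVAL_MS);
        synchronized (lock) {
            while (!taskDone) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    break; // interval elapsed without completion
                }
                lock.wait(remainingMs); // loop also guards against spurious wakeups
            }
            return taskDone;
        }
    }

    public static void main(String[] args) throws Exception {
        InterruptibleProgressWait w = new InterruptibleProgressWait();
        long start = System.nanoTime();
        // Simulate a short task (e.g. a setup task) that finishes after 100 ms.
        Thread task = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException ignored) {
            }
            w.taskDone();
        });
        task.start();
        boolean done = w.awaitNextReport();
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        task.join();
        System.out.println("done=" + done + " after ~" + elapsedMs + " ms");
    }
}
```

Running main should report done=true after roughly 100 ms rather than 3 seconds; without the notifyAll() in taskDone(), the reporter would only notice completion once the full interval had elapsed.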

          tomwhite Tom White added a comment -

          The same issue was fixed in trunk and branch-2 in MAPREDUCE-3809 in much the same way. How about backporting that code to branch-1?

          vicaya Luke Lu added a comment -

          Thanks for the pointer to MAPREDUCE-3809, Tom. IMO, this patch is slightly better as it minimizes synchronization.

          tomwhite Tom White added a comment -

          Maybe have a patch for trunk/branch-2 to bring the two into line then? I think it's good to minimize the number of differences where possible.

          jshrinivas Shrinivas Joshi added a comment -

          Hi Luke, in our experiments your patch achieved the same performance effect that MAPREDUCE-4381 was aiming for. We saw a gain of about 4% on the Mahout KMeans clustering workload. It would be nice to get the branch-1 version of your change reviewed and checked in in the meantime. Thanks.

          vicaya Luke Lu added a comment -

          @Shrinivas: have you tried this with mapreduce.tasktracker.outofband.heartbeat=true? (It needs a cluster restart, of course.)

          jshrinivas Shrinivas Joshi added a comment -

          @Luke: I have not tried the out-of-band heartbeat property. Do you expect it to give further performance gains together with your patch?

          jshrinivas Shrinivas Joshi added a comment -

          Can I request a code review and commit of this patch, so that it gets integrated into the MRv1 branch while it is being ported to MRv2? Thanks.

          vicaya Luke Lu added a comment -

          Yes. The speedup is more pronounced with out-of-band heartbeats, which have an effect similar to MAPREDUCE-1906 (not in branch-1). MRv2 doesn't need this patch, as the issue was addressed there by MAPREDUCE-3809. Tom, can we file a separate JIRA to improve the change in trunk?

          Shrinivas, you're encouraged to review and +1 the patch.
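
Why out-of-band heartbeats make the speedup more pronounced can be shown with a deliberately simplified model. The sketch below is hypothetical and not Hadoop code: it assumes a fixed 3000 ms heartbeat interval with heartbeats at exact multiples of that interval, a chain of sequential short tasks where the next task is only assigned in a heartbeat response, and, when out-of-band heartbeats are enabled, a heartbeat sent as soon as a task completes. It ignores the task-side PROGRESS_INTERVAL wait entirely; the names HeartbeatLatencyModel, reportDelayMs, and chainMs are illustrative.

```java
/**
 * Simplified, hypothetical model (not Hadoop code): a chain of sequential
 * tasks where the next task can only be assigned in a heartbeat response.
 */
public class HeartbeatLatencyModel {

    /** Delay between a task finishing and the next heartbeat that reports it. */
    static long reportDelayMs(long finishMs, long intervalMs, boolean outOfBand) {
        if (outOfBand) {
            return 0; // assumed: a heartbeat is sent immediately on task completion
        }
        // Regular heartbeats assumed at t = 0, interval, 2 * interval, ...
        return (intervalMs - finishMs % intervalMs) % intervalMs;
    }

    /** Wall-clock time to run `tasks` sequential tasks of `taskMs` each. */
    static long chainMs(int tasks, long taskMs, long intervalMs, boolean outOfBand) {
        long now = 0; // the first task is assigned by the heartbeat at t = 0
        for (int i = 0; i < tasks; i++) {
            long finish = now + taskMs;
            // The next task starts only once the completion has been reported.
            now = finish + reportDelayMs(finish, intervalMs, outOfBand);
        }
        return now;
    }

    public static void main(String[] args) {
        long interval = 3000; // assumed fixed heartbeat interval in ms
        System.out.println("regular heartbeats: " + chainMs(10, 500, interval, false) + " ms");
        System.out.println("out-of-band:        " + chainMs(10, 500, interval, true) + " ms");
    }
}
```

For ten 500 ms tasks, this model gives 30000 ms with regular heartbeats versus 5000 ms with out-of-band heartbeats, because each completion otherwise waits 2500 ms for the next heartbeat boundary. Real clusters vary the heartbeat interval and overlap tasks, so the numbers only illustrate the direction of the effect.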

          tomwhite Tom White added a comment -

          Luke - yes, please do.

          jshrinivas Shrinivas Joshi added a comment -

          As I said earlier, I verified that this patch works as expected, and it does minimize synchronization compared with MAPREDUCE-3809. This patch looks good to me for commit.

          tomwhite Tom White added a comment -

          +1

          vicaya Luke Lu added a comment -

          Thanks for the review and +1, Shrinivas and Tom! Created MAPREDUCE-4477 to track the trunk improvement. Committed the change to branch-1 and branch-1.1. It's not in 1.1.0-rc1, though.

          mattf Matt Foley added a comment -

          Due to the delay in 1.1.0, brought this into 1.1.0 from 1.1.1.

          mattf Matt Foley added a comment -

          Closed upon release of Hadoop-1.1.0.


            People

            • Assignee:
              vicaya Luke Lu
              Reporter:
              vicaya Luke Lu
            • Votes:
              0
              Watchers:
              18
