Hadoop Map/Reduce
  MAPREDUCE-4400

Fix performance regression for small jobs/workflows

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.203.0, 1.0.3
    • Fix Version/s: 1.1.0
    • Component/s: performance, task
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Tags:
      task, performance
    • Target Version/s:

      Description

      There is a significant performance regression for small jobs/workflows in the Hadoop 1.x series relative to 0.20.2. It is most noticeable with Hive and Pig jobs: PigMix shows an average 40% regression against 0.20.2.

        Issue Links

          Activity

          Luke Lu added a comment -

          Thanks to John Poelman and Shreyas Subramanya of IBM BigInsights performance QA for noticing the issue and verifying my fix.

          Luke Lu added a comment -

          MAPREDUCE-2450 fixed the communication race condition that could cause an occasional 1-minute timeout, but it made the PROGRESS_INTERVAL sleep pretty much mandatory. Any task, including setup and cleanup tasks, would need to sleep at least 3 seconds to finish.

          The patch makes the wait interruptible when tasks finish.

          With this patch, MAPREDUCE-4399, and out-of-band heartbeats, PigMix is about 20% faster than 0.20.2. These patches put the Hadoop 1.x series on par with Hadoop 2.x in terms of general performance.
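
          The idea of an interruptible wait can be illustrated with a minimal sketch. This is not the code from the patch: the class, method, and field names below are hypothetical, and the 3000 ms interval simply mirrors the 3-second figure mentioned above. Instead of an unconditional sleep, the reporter thread waits on a lock with a timeout, and the task thread notifies that lock as soon as it finishes.

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; names are hypothetical and not taken from Hadoop.
public class InterruptibleProgressWait {

    /** Stand-in for a task's periodic progress-reporting loop. */
    static final class Reporter implements Runnable {
        private final Object lock = new Object();
        private final long progressIntervalMs;
        private boolean taskDone = false; // guarded by lock

        Reporter(long progressIntervalMs) {
            this.progressIntervalMs = progressIntervalMs;
        }

        /** Called by the task thread when its work is finished; wakes the reporter early. */
        void markDone() {
            synchronized (lock) {
                taskDone = true;
                lock.notifyAll();
            }
        }

        @Override
        public void run() {
            try {
                while (true) {
                    synchronized (lock) {
                        if (taskDone) {
                            return;
                        }
                        // Unlike Thread.sleep, this returns early when markDone() notifies.
                        lock.wait(progressIntervalMs);
                        if (taskDone) {
                            return;
                        }
                    }
                    // A real reporter would send a progress update here, outside the lock.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Starts a reporter, "finishes" the task after taskMs, and returns the total elapsed ms. */
    static long timeShutdown(long progressIntervalMs, long taskMs) throws InterruptedException {
        Reporter reporter = new Reporter(progressIntervalMs);
        Thread thread = new Thread(reporter, "progress-reporter");
        long start = System.nanoTime();
        thread.start();
        Thread.sleep(taskMs);   // simulated task work
        reporter.markDone();    // task finished: cut the wait short
        thread.join();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        long elapsed = timeShutdown(3000, 100);
        System.out.println("reporter stopped after ~" + elapsed + " ms");
    }
}
```

          With a plain sleep in place of the timed wait, a 100 ms task would still take about one full interval to shut down. In this sketch the reporter exits shortly after the task finishes, which is the effect that matters for short setup and cleanup tasks.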

          Tom White added a comment -

          The same issue was fixed in trunk and branch-2 in MAPREDUCE-3809 in much the same way. How about backporting that code to branch-1?

          Luke Lu added a comment -

          Thanks for the pointer to MAPREDUCE-3809, Tom. IMO, this patch is slightly better as it minimizes synchronization.

          Tom White added a comment -

          Maybe have a patch for trunk/branch-2 to bring the two into line then? I think it's good to minimize the number of differences where possible.

          Shrinivas Joshi added a comment -

          Hi Luke - In our experiments your patch achieved the same performance effect that MAPREDUCE-4381 was trying to achieve. We noticed good performance gains on the Mahout KMeans clustering workload (~4%). It would be nice if we could get the branch-1 version of your change reviewed and checked in in the meantime. Thanks.

          Luke Lu added a comment -

          @Shrinivas: have you tried this with mapreduce.tasktracker.outofband.heartbeat=true? (Needs a cluster restart, of course.)
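
          For reference, a minimal sketch of enabling that property, assuming it is set in the usual Hadoop 1.x mapred-site.xml on each TaskTracker node (the file location is an assumption; only the property name and value come from the comment above):

```xml
<!-- mapred-site.xml fragment (assumed location); place inside <configuration> -->
<property>
  <name>mapreduce.tasktracker.outofband.heartbeat</name>
  <value>true</value>
</property>
```

          As the comment notes, the setting only takes effect after a restart.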

          Shrinivas Joshi added a comment -

          @Luke: I have not tried it with the out-of-band heartbeat property. Do you expect this to show more perf gains along with your patch?

          Shrinivas Joshi added a comment -

          Can I request a code review and commit of this patch so that it gets integrated into the MRv1 branch while it is being ported to MRv2? Thanks.

          Luke Lu added a comment -

          Yes. The speedup is more pronounced with out-of-band heartbeats, which have a similar effect to MAPREDUCE-1906 (which is not in branch-1). MRv2 doesn't need this patch, as the issue was addressed there by MAPREDUCE-3809. Tom, can we file a separate JIRA to improve the change in trunk?

          Shrinivas, you're encouraged to review and +1 the patch.

          Tom White added a comment -

          Luke - yes, please do.

          Shrinivas Joshi added a comment -

          As I said earlier, I verified that this patch works as expected. It does minimize synchronization compared with MAPREDUCE-3809. This patch looks good to me to commit.

          Tom White added a comment -

          +1

          Luke Lu added a comment -

          Thanks for the review and +1, Shrinivas and Tom! Created MAPREDUCE-4477 to track trunk improvement. Committed the change to branch-1 and branch-1.1. It's not in 1.1.0-rc1 though.

          Matt Foley added a comment -

          Due to the delay in 1.1.0, moved this from 1.1.1 into 1.1.0.

          Matt Foley added a comment -

          Closed upon release of Hadoop-1.1.0.


            People

            • Assignee: Luke Lu
            • Reporter: Luke Lu
            • Votes: 0
            • Watchers: 18