Pig
  1. Pig
  2. PIG-3044

Trigger POPartialAgg compaction under GC pressure

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0, 0.11, 0.10.1
    • Fix Version/s: 0.11, 0.12.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      Hide
      When pig.exec.mapPartAgg, previously the memory reserved for in-memory aggregation was roughly 1/3 the value of pig.cachedbag.memusage per POPartialAgg operator. Reserving that much RAM could cause problems for jobs that required a lot of memory. POPartialAgg is now spillable, and regardless of the setting of pig.cachedbag.memusage, will happily shrink when memory is needed.
      Show
      When pig.exec.mapPartAgg, previously the memory reserved for in-memory aggregation was roughly 1/3 the value of pig.cachedbag.memusage per POPartialAgg operator. Reserving that much RAM could cause problems for jobs that required a lot of memory. POPartialAgg is now spillable, and regardless of the setting of pig.cachedbag.memusage, will happily shrink when memory is needed.

      Description

      If partial aggregation is turned on in pig 10 and 11, 20% (by default) of the available heap can be consumed by the POPartialAgg operator. This can cause memory issues for jobs that use all, or nearly all, of the heap already.

      If we make POPartialAgg "spillable" (trigger compaction when memory reduction is required), we would be much nicer to high-memory jobs.

      1. PIG-3044.2.diff
        7 kB
        Dmitriy V. Ryaboy
      2. PIG-3044-hotfix.patch
        9 kB
        Jonathan Coveney
      3. PIG-3404.diff
        7 kB
        Dmitriy V. Ryaboy

        Activity

        Hide
        Dmitriy V. Ryaboy added a comment -

        Since this is preventing some jobs from migrating from Pig 9 to Pig 11 in our environment, I would like to add it to 11, but an argument can be made that this is not a critical bug and should therefore not be applied to the 0.11 branch. Committers, please weigh in if you have an opinion on this.

        Show
        Dmitriy V. Ryaboy added a comment - Since this is preventing some jobs from migrating from Pig 9 to Pig 11 in our environment, I would like to add it to 11, but an argument can be made that this is not a critical bug and should therefore not be applied to the 0.11 branch. Committers, please weigh in if you have an opinion on this.
        Hide
        Dmitriy V. Ryaboy added a comment -

        Attaching an initial pass at this.

        Also, mixed into this diff, is a change to drastically reduce the number of times we need to iterate over a large databag to estimate memory. I had to do that to reduce contention and required synchronization on the bag contents.

        Show
        Dmitriy V. Ryaboy added a comment - Attaching an initial pass at this. Also, mixed into this diff, is a change to drastically reduce the number of times we need to iterate over a large databag to estimate memory. I had to do that to reduce contention and required synchronization on the bag contents.
        Hide
        Julien Le Dem added a comment -

        in DefaultAbstractBag.getMemorySize():

        • You use avgTupleSize first to sum the aggregated tuple size and then you divide it to get the average. Could you use different names for clarity?
          something like:
          for (j = 0; i.hasNext() && j < 100; j++) {
             totalTupleSize += i.next().getMemorySize()
          }
          averageTupleSize = totalTupleSize / j;
          
        • Here we take the floor() of the average estimate. Do you want to make it a ceil() by adding + 1 ?
          That way we would slightly overestimate instead of slightly underestimate.

        otherwise it looks all good to me.
        I would even say we remove the % memory budget as the Spillable mechanism is more reliable and much simpler.

        +1

        Show
        Julien Le Dem added a comment - in DefaultAbstractBag.getMemorySize(): You use avgTupleSize first to sum the aggregated tuple size and then you divide it to get the average. Could you use different names for clarity? something like: for (j = 0; i.hasNext() && j < 100; j++) { totalTupleSize += i.next().getMemorySize() } averageTupleSize = totalTupleSize / j; Here we take the floor() of the average estimate. Do you want to make it a ceil() by adding + 1 ? That way we would slightly overestimate instead of slightly underestimate. otherwise it looks all good to me. I would even say we remove the % memory budget as the Spillable mechanism is more reliable and much simpler. +1
        Hide
        Dmitriy V. Ryaboy added a comment -

        attached a version with Julien's comments incorporated. Will commit to 0.11 and trunk.

        Show
        Dmitriy V. Ryaboy added a comment - attached a version with Julien's comments incorporated. Will commit to 0.11 and trunk.
        Hide
        Thejas M Nair added a comment -

        I would even say we remove the % memory budget as the Spillable mechanism is more reliable and much simpler.

        The reason why % memory budget was introduced for SelfSpillBag, was because the spillable mechanism didn't always work well. The cleanup often was getting triggered too late. So I think it is better use the Spillable mechanism here to spill earlier if necessary, as the patch is doing.

        Show
        Thejas M Nair added a comment - I would even say we remove the % memory budget as the Spillable mechanism is more reliable and much simpler. The reason why % memory budget was introduced for SelfSpillBag, was because the spillable mechanism didn't always work well. The cleanup often was getting triggered too late. So I think it is better use the Spillable mechanism here to spill earlier if necessary, as the patch is doing.
        Hide
        Cheolsoo Park added a comment -

        Hi Dmitriy,

        This patch broke trunk and branch-0.11. Several test cases are failing, for example, TestDataBag, TestExampleGenerator, TestLogicalPlanBuilder, etc:

        Testcase: testDataBagIterIdempotent took 0.011 sec 
            Caused an ERROR
        / by zero
        java.lang.ArithmeticException: / by zero
            at org.apache.pig.data.DefaultAbstractBag.getMemorySize(DefaultAbstractBag.java:160)
            at org.apache.pig.data.DefaultAbstractBag.markSpillableIfNecessary(DefaultAbstractBag.java:100)
            at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:131)
            at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:159)
            at org.apache.pig.test.TestDataBag.processDataBag(TestDataBag.java:1193)
            at org.apache.pig.test.TestDataBag.testDataBagIterIdempotent(TestDataBag.java:1145)
        

        It looks like "/ by zero" is happening when j == 0 in the following code:

        for (j = 0; i.hasNext() && j < 100; j++) { 
            avgTupleSize += i.next().getMemorySize();
        }
        avgTupleSize /= j;
        
        Show
        Cheolsoo Park added a comment - Hi Dmitriy, This patch broke trunk and branch-0.11. Several test cases are failing, for example, TestDataBag, TestExampleGenerator, TestLogicalPlanBuilder, etc: Testcase: testDataBagIterIdempotent took 0.011 sec Caused an ERROR / by zero java.lang.ArithmeticException: / by zero at org.apache.pig.data.DefaultAbstractBag.getMemorySize(DefaultAbstractBag.java:160) at org.apache.pig.data.DefaultAbstractBag.markSpillableIfNecessary(DefaultAbstractBag.java:100) at org.apache.pig.data.InternalCachedBag.addDone(InternalCachedBag.java:131) at org.apache.pig.data.InternalCachedBag.iterator(InternalCachedBag.java:159) at org.apache.pig.test.TestDataBag.processDataBag(TestDataBag.java:1193) at org.apache.pig.test.TestDataBag.testDataBagIterIdempotent(TestDataBag.java:1145) It looks like "/ by zero" is happening when j == 0 in the following code: for (j = 0; i.hasNext() && j < 100; j++) { avgTupleSize += i.next().getMemorySize(); } avgTupleSize /= j;
        Hide
        Jonathan Coveney added a comment -

        I've attached what should fix this. Should be quick to review. Please let me know what you think. test-commit passes.

        Show
        Jonathan Coveney added a comment - I've attached what should fix this. Should be quick to review. Please let me know what you think. test-commit passes.
        Hide
        Cheolsoo Park added a comment -

        +1 for the hotfix patch. I also verified test-commit passes. Thanks Jonathan!

        Show
        Cheolsoo Park added a comment - +1 for the hotfix patch. I also verified test-commit passes. Thanks Jonathan!
        Hide
        Jonathan Coveney added a comment -

        Thanks for the review. Committed to trunk and to 0.11

        Show
        Jonathan Coveney added a comment - Thanks for the review. Committed to trunk and to 0.11

          People

          • Assignee:
            Dmitriy V. Ryaboy
            Reporter:
            Dmitriy V. Ryaboy
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development