Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: tez-branch
    • Component/s: tez
    • Labels:
      None

      Description

      There are two sets of skewed join e2e tests-

      1. tez.conf: Join7 and 8
      2. nightly.conf: SkewedJoin

      We need to get both passing.

      1. PIG-3719-1.patch
        12 kB
        Cheolsoo Park
      2. PIG-3719-2.patch
        13 kB
        Cheolsoo Park
      3. PIG-3719-3.patch
        18 kB
        Cheolsoo Park
      4. PIG-3719-4.patch
        21 kB
        Cheolsoo Park
      5. PIG-3719-5.patch
        21 kB
        Cheolsoo Park

        Activity

        Hide
        Cheolsoo Park added a comment -

        Committed to tez branch. Thank you Daniel for the review!

        Show
        Cheolsoo Park added a comment - Committed to tez branch. Thank you Daniel for the review!
        Hide
        Cheolsoo Park added a comment -

        Attaching a new patch that fixes right and full outer join cases.

        Show
        Cheolsoo Park added a comment - Attaching a new patch that fixes right and full outer join cases.
        Hide
        Cheolsoo Park added a comment -
        Show
        Cheolsoo Park added a comment - RB link- https://reviews.apache.org/r/17375/
        Hide
        Rohini Palaniswamy added a comment -

        Had to create a new instance for the current as moving to the next record in Tez, resets fields on the current instead of creating a new object. Can you make it cleaner by making PigNullableWritable implement Cloneable and remove the newInstance() method in favor of clone(). NullablePartitionWritable can then override the clone. I should have done that in the first place.

        Show
        Rohini Palaniswamy added a comment - Had to create a new instance for the current as moving to the next record in Tez, resets fields on the current instead of creating a new object. Can you make it cleaner by making PigNullableWritable implement Cloneable and remove the newInstance() method in favor of clone(). NullablePartitionWritable can then override the clone. I should have done that in the first place.
        Hide
        Cheolsoo Park added a comment -

        Attaching a 3rd patch that fixes nightly.conf SkewedJoin_1,2,3,4,5,7,8. Now only remaining failures are 1) self join, 2) right outer, and 3) full outer.

        Show
        Cheolsoo Park added a comment - Attaching a 3rd patch that fixes nightly.conf SkewedJoin_1,2,3,4,5,7,8. Now only remaining failures are 1) self join, 2) right outer, and 3) full outer.
        Hide
        Cheolsoo Park added a comment -

        Attaching 2nd patch with the following change-

        @@ -109,7 +110,10 @@ public class POShuffleTezLoad extends POPackage implements TezLoad {
                                 cur = readers.get(i).getCurrentKey();
                                 if (min == null || comparator.compare(min, cur) > 0) {
                                     min = PigNullableWritable.newInstance((PigNullableWritable)cur);
        -                            cur = min;
        +                            if (isSkewedJoin) {
        +                                ((NullablePartitionWritable)min).setKey(
        +                                        ((NullablePartitionWritable)cur).getKey());
        +                            }
                                 }   
                             }   
                         }
        

        Basically, I explicitly set key after copying NullableWritable in case of skewed join. This lets both tez.conf e2e tests and TestAccumulator pass.

        Show
        Cheolsoo Park added a comment - Attaching 2nd patch with the following change- @@ -109,7 +110,10 @@ public class POShuffleTezLoad extends POPackage implements TezLoad { cur = readers.get(i).getCurrentKey(); if (min == null || comparator.compare(min, cur) > 0) { min = PigNullableWritable.newInstance((PigNullableWritable)cur); - cur = min; + if (isSkewedJoin) { + ((NullablePartitionWritable)min).setKey( + ((NullablePartitionWritable)cur).getKey()); + } } } } Basically, I explicitly set key after copying NullableWritable in case of skewed join. This lets both tez.conf e2e tests and TestAccumulator pass.
        Hide
        Cheolsoo Park added a comment -

        I realized that reverting PigNullableWritable.newInstance() makes TestAccumulator fail. So I will need to add some special handling for NullablePartitionWritable (skewed join) instead.

        Show
        Cheolsoo Park added a comment - I realized that reverting PigNullableWritable.newInstance() makes TestAccumulator fail. So I will need to add some special handling for NullablePartitionWritable (skewed join) instead.
        Hide
        Cheolsoo Park added a comment - - edited

        Attaching 1st patch that fixes NPE in tez.conf e2e tests.

        The root cause of NPE was that POShuffleTezLoad was constructing a new instance of NullableWritable for minimum key, but PigNullableWritable.newInstance() didn't set key for NullablePartitionWritable.

        Looks like there is no need to construct a new instance in the first place, so I am reverting the change made by PIG-3626 as follows-

        @@ -108,8 +108,7 @@ public class POShuffleTezLoad extends POPackage implements TezLoad {
                                 hasData = true;
                                 cur = readers.get(i).getCurrentKey();
                                 if (min == null || comparator.compare(min, cur) > 0) {
        -                            min = PigNullableWritable.newInstance((PigNullableWritable)cur);
        -                            cur = min;
        +                            min = (PigNullableWritable) cur;
                                 }
                             }
                         }
        

        I am going to commit this change first so that tez.conf will be fixed. Rohini Palaniswamy, please let me know if you have any concerns.

        But nightly.conf SkewedJoin tests still fail because results differ between MR and Tez run. This problem happens when pig.skewedjoin.reduce.maxtuple is set. I will upload another patch for this problem.

        Show
        Cheolsoo Park added a comment - - edited Attaching 1st patch that fixes NPE in tez.conf e2e tests. The root cause of NPE was that POShuffleTezLoad was constructing a new instance of NullableWritable for minimum key, but PigNullableWritable.newInstance() didn't set key for NullablePartitionWritable. Looks like there is no need to construct a new instance in the first place, so I am reverting the change made by PIG-3626 as follows- @@ -108,8 +108,7 @@ public class POShuffleTezLoad extends POPackage implements TezLoad { hasData = true ; cur = readers.get(i).getCurrentKey(); if (min == null || comparator.compare(min, cur) > 0) { - min = PigNullableWritable.newInstance((PigNullableWritable)cur); - cur = min; + min = (PigNullableWritable) cur; } } } I am going to commit this change first so that tez.conf will be fixed. Rohini Palaniswamy , please let me know if you have any concerns. But nightly.conf SkewedJoin tests still fail because results differ between MR and Tez run. This problem happens when pig.skewedjoin.reduce.maxtuple is set. I will upload another patch for this problem.

          People

          • Assignee:
            Cheolsoo Park
            Reporter:
            Cheolsoo Park
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development