Pig / PIG-4377

Skewed outer join produces wrong result if a key is oversampled

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15.0
    • Component/s: impl
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

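      Skewed outer join produces more rows than expected when a key is oversampled, that is, when the sampler assigns the key more reducers than actually receive rows for it. The extra rows contain a null left relation. This can be reproduced reliably with reproduce.patch (run SkewedJoin_11).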

      1. PIG-4377-1.patch
        16 kB
        Daniel Dai
      2. PIG-4377-2.patch
        28 kB
        Daniel Dai
      3. PIG-4377-3.patch
        1 kB
        Daniel Dai
      4. PIG-4377-4.patch
        10 kB
        Daniel Dai
      5. reproduce.patch
        6 kB
        Daniel Dai

          Activity

          daijy Daniel Dai added a comment -

          Attach a fix.

          Here is what happens:
          1. A certain key x is sampled (by PoissonSampleLoader/PartitionSkewedKeys) and assigned y reducers
          2. In practice, only y1 < y records carry key x
          3. Some of the reducers that are supposed to receive key x therefore get no row with key x
          4. Each reducer that gets no row for x generates a redundant row with an empty left relation (CompilerUtils.addEmptyBagOuterJoin)

          What the patch does:
          Only generate the empty left relation in the first reducer of key x
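
The steps above can be illustrated with a small simulation. This is a toy model under stated assumptions, not Pig's implementation: the function name `skewed_right_outer_join`, the `first_reduce_only` flag, and the round-robin distribution of left rows are all hypothetical, and the model assumes that right-side rows are replicated to every reducer assigned to a key.

```python
from collections import defaultdict


def skewed_right_outer_join(left, right, reducers_for_key, first_reduce_only):
    """Toy model of the reduce side of a skewed right outer join.

    left, right: lists of (key, value) rows.
    reducers_for_key: key -> number of reducers assigned by sampling
        (keys not listed get a single reducer).
    first_reduce_only: if True, emit the null-left padding row only in
        the first reducer of a key, as the fix describes.
    """
    # Assumption: left rows of a key are spread round-robin over its reducers.
    left_parts = defaultdict(list)  # (key, reducer index) -> left values
    next_slot = defaultdict(int)
    for key, value in left:
        n = reducers_for_key.get(key, 1)
        left_parts[(key, next_slot[key] % n)].append(value)
        next_slot[key] += 1

    out = []
    for key, rvalue in right:
        n = reducers_for_key.get(key, 1)
        # Assumption: each right row is sent to every reducer of its key.
        for r in range(n):
            lvalues = left_parts.get((key, r), [])
            if lvalues:
                out.extend((key, lv, rvalue) for lv in lvalues)
            elif r == 0 or not first_reduce_only:
                # This reducer saw no left row for the key: pad with null.
                out.append((key, None, rvalue))
    return out


left = [("x", "a"), ("x", "b")]    # only 2 left rows carry key x
right = [("x", "r"), ("z", "q")]
oversampled = {"x": 4}             # but x was assigned 4 reducers

buggy = skewed_right_outer_join(left, right, oversampled, first_reduce_only=False)
fixed = skewed_right_outer_join(left, right, oversampled, first_reduce_only=True)
```

In this model the two reducers of x that receive no left row each emit a spurious null-left row when `first_reduce_only` is False. With the flag set, a null-left row can only come from the first reducer of a key, so a key that is genuinely absent on the left (z here) is still padded exactly once.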

          daijy Daniel Dai added a comment -

          There is still an issue in Tez mode. Attaching another fix.

          rohini Rohini Palaniswamy added a comment -

          +1.

          Daniel Dai,
          Can you put more details about the problem in the description in place of "under certain condition", and also add a short description of what the fix does? That would make future reference easier than having to read through the patch.

          daijy Daniel Dai added a comment -

          Changed the title to be more restrictive. Patch committed to both 0.15 branch and trunk. Thanks Rohini for review!

          daijy Daniel Dai added a comment -

          TestTezCompiler is broken by the patch. I need to revert part of the patch. The change seems unnecessary, and I don't remember how it came into the picture. Attaching PIG-4377-3.patch.

          rohini Rohini Palaniswamy added a comment -

          +1

          daijy Daniel Dai added a comment -

          PIG-4377-3.patch committed to 0.15 and trunk. Thanks Rohini!

          daijy Daniel Dai added a comment - - edited

          Actually, the change reverted by PIG-4377-3.patch is needed, since IsFirstReduceOfKey in the join vertex needs the sample input. Attaching PIG-4377-4.patch to bring it back, add comments, and fix the TestTezCompiler failures.

          rohini Rohini Palaniswamy added a comment -

          +1

          daijy Daniel Dai added a comment -

          PIG-4377-4.patch committed to both 0.15 branch and trunk. Thanks Rohini!


            People

            • Assignee:
              daijy Daniel Dai
            • Reporter:
              daijy Daniel Dai
            • Votes:
              0
            • Watchers:
              4
