Pig
  1. Pig
  2. PIG-2673

Allow Merge join to follow an ORDER statement

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:
      None

      Description

      Currently, we insist that data for a merge join must come from an OrderedLoadFunc.
      We can relax this condition and allow explicit ordering operations to precede a MergeJoin.

      1. PIG-2673.2.patch
        21 kB
        Dmitriy V. Ryaboy
      2. PIG-2673_1.patch
        273 kB
        Dmitriy V. Ryaboy
      3. PIG-2673_1_noprefix.patch
        272 kB
        Dmitriy V. Ryaboy
      4. PIG-2673_1_noprefix_now_with_merge.patch
        17 kB
        Dmitriy V. Ryaboy
      5. PIG-2673_0.patch
        11 kB
        Dmitriy V. Ryaboy

        Issue Links

          Activity

          Dmitriy V. Ryaboy created issue -
          Hide
          Dmitriy V. Ryaboy added a comment -

          Attaching patch that relaxes the sanity checks and updates the documentation.

          Note that while working on this patch I discovered we never enforced a stated condition for the merge join, which is that you cannot call UDFs on the data between a load and a merge join. This is also fixed in this patch. This fix may break people's script in cases when they were using UDFs and unwittingly violating the documented constraint.

          Show
          Dmitriy V. Ryaboy added a comment - Attaching patch that relaxes the sanity checks and updates the documentation. Note that while working on this patch I discovered we never enforced a stated condition for the merge join, which is that you cannot call UDFs on the data between a load and a merge join. This is also fixed in this patch. This fix may break people's script in cases when they were using UDFs and unwittingly violating the documented constraint.
          Dmitriy V. Ryaboy made changes -
          Field Original Value New Value
          Attachment PIG-2673_0.patch [ 12524947 ]
          Dmitriy V. Ryaboy made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Ashutosh Chauhan added a comment -

          PIG-959 relaxes the condition for all non-blocking operators too. May be these too should be done in conjunction.

          Show
          Ashutosh Chauhan added a comment - PIG-959 relaxes the condition for all non-blocking operators too. May be these too should be done in conjunction.
          Hide
          Ashutosh Chauhan added a comment -

          Aah.. blocking.. not non-blocking.

          Show
          Ashutosh Chauhan added a comment - Aah.. blocking.. not non-blocking.
          Hide
          Dmitriy V. Ryaboy added a comment -

          I don't think you can relax the conditions around distinct, since that involves hashing the input across reducers?

          Show
          Dmitriy V. Ryaboy added a comment - I don't think you can relax the conditions around distinct, since that involves hashing the input across reducers?
          Dmitriy V. Ryaboy made changes -
          Link This issue is related to PIG-959 [ PIG-959 ]
          Hide
          Dmitriy V. Ryaboy added a comment -

          Ashutosh, can you review this? Think you are the person most familiar with the Merge join code.

          Show
          Dmitriy V. Ryaboy added a comment - Ashutosh, can you review this? Think you are the person most familiar with the Merge join code.
          Hide
          Ashutosh Chauhan added a comment -

          Yeah, will take a look at it.

          Show
          Ashutosh Chauhan added a comment - Yeah, will take a look at it.
          Hide
          Ashutosh Chauhan added a comment -

          I don't think you can relax the conditions around distinct, since that involves hashing the input across reducers?

          Yeah, thats right. For other blocking operators:

          • Order-by: obv we can relax it.
          • Join/Cogroup: We can relax for map-side joins & cogroups i.e., FR,Merge Join and cogroup, but not the ones having reducers.
          Show
          Ashutosh Chauhan added a comment - I don't think you can relax the conditions around distinct, since that involves hashing the input across reducers? Yeah, thats right. For other blocking operators: Order-by: obv we can relax it. Join/Cogroup: We can relax for map-side joins & cogroups i.e., FR,Merge Join and cogroup, but not the ones having reducers.
          Hide
          Ashutosh Chauhan added a comment -

          I took a quick look at your patch. For chaining together of MR jobs (i.e., order-by MR job feeding into merge-join MR job), you will need changes in MRCompiler which are outlined in PIG-959 patch. I don't see any of that in your patch. There must be something which I am missing here.

          Show
          Ashutosh Chauhan added a comment - I took a quick look at your patch. For chaining together of MR jobs (i.e., order-by MR job feeding into merge-join MR job), you will need changes in MRCompiler which are outlined in PIG-959 patch. I don't see any of that in your patch. There must be something which I am missing here.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Patch is incomplete right now.. only fixes the LP blocks, but it does need the MRCompiler changes. Will need some more work. Hackday!

          Show
          Dmitriy V. Ryaboy added a comment - Patch is incomplete right now.. only fixes the LP blocks, but it does need the MRCompiler changes. Will need some more work. Hackday!
          Hide
          Dmitriy V. Ryaboy added a comment -

          This seems to work (verified manually in addition to running tests). Ready for review again

          Show
          Dmitriy V. Ryaboy added a comment - This seems to work (verified manually in addition to running tests). Ready for review again
          Dmitriy V. Ryaboy made changes -
          Attachment PIG-2673_1.patch [ 12530112 ]
          Hide
          Ashutosh Chauhan added a comment -

          Seems like you ran git pull after a long while.

          Show
          Ashutosh Chauhan added a comment - Seems like you ran git pull after a long while.
          Hide
          Dmitriy V. Ryaboy added a comment -

          Now, without git prefixes.

          Show
          Dmitriy V. Ryaboy added a comment - Now, without git prefixes.
          Dmitriy V. Ryaboy made changes -
          Attachment PIG-2673_1_noprefix.patch [ 12530114 ]
          Hide
          Dmitriy V. Ryaboy added a comment -

          sigh.

          Show
          Dmitriy V. Ryaboy added a comment - sigh.
          Dmitriy V. Ryaboy made changes -
          Hide
          Julien Le Dem added a comment -

          for some reason the review board did not post my review back to the JIRA.
          FYI: https://reviews.apache.org/r/5280/

          Show
          Julien Le Dem added a comment - for some reason the review board did not post my review back to the JIRA. FYI: https://reviews.apache.org/r/5280/
          Jonathan Coveney made changes -
          Link This issue is part of PIG-2632 [ PIG-2632 ]
          Dmitriy V. Ryaboy made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          Dmitriy V. Ryaboy added a comment -

          Addressed review comments. TestMergeJoin passes (and takes 30 minutes...)

          Show
          Dmitriy V. Ryaboy added a comment - Addressed review comments. TestMergeJoin passes (and takes 30 minutes...)
          Dmitriy V. Ryaboy made changes -
          Attachment PIG-2673.2.patch [ 12533231 ]
          Dmitriy V. Ryaboy made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Jonathan Coveney added a comment -

          LGTM. I'm +1. Julien, you satisfied with the responses?

          Show
          Jonathan Coveney added a comment - LGTM. I'm +1. Julien, you satisfied with the responses?
          Hide
          Julien Le Dem added a comment -

          LGTM.
          +1

          Show
          Julien Le Dem added a comment - LGTM. +1
          Hide
          Dmitriy V. Ryaboy added a comment -

          Committed to 0.11 (trunk)

          Show
          Dmitriy V. Ryaboy added a comment - Committed to 0.11 (trunk)
          Dmitriy V. Ryaboy made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.11 [ 12318878 ]
          Resolution Fixed [ 1 ]
          Bill Graham made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Dmitriy V. Ryaboy
              Reporter:
              Dmitriy V. Ryaboy
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development