Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: tez-branch
    • Component/s: tez
    • Labels:
      None

      Description

      Need to establish order in shuffle inputs

        Activity

        Cheolsoo Park added a comment -

        +1.

        Just a minor comment. Can we change "pig.shuffled.inputs" to something more generic since inputs can be shuffle, broadcast, or 1-1? What do you think?

        Daniel Dai added a comment -

        How about "pig.popackage.inputs"?

        Cheolsoo Park added a comment -

        Sounds good to me.

        Rohini Palaniswamy added a comment -

        Why do we have to serialize this as a separate pig.popackage.inputs config? Can setInputKeys() of POShuffleTezLoad be used instead, since POShuffleTezLoad is being serialized anyway as part of the plan?

        Cheolsoo Park added a comment -

        Daniel Dai, I am doing what Rohini suggests here as part of PIG-3604. I need to set inputKeys in POShuffleTezLoad to handle the case where both scatter/gather and broadcast edges are attached to the same vertex. For example:

        a = LOAD 'foo' AS (x:int, y:chararray);
        a1 = GROUP a BY x;
        b = LOAD 'bar' AS (x:int, y:chararray);
        d = JOIN a1 BY group, b BY x USING 'replicated'; -- replicated join in reducer
        DUMP d;
        

        Let me post a new patch in PIG-3604 that includes the fix for this jira.
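
A minimal sketch of the idea discussed above, not the actual Pig code: if the operator itself carries an ordered list of input keys and is serialized with the plan, the order survives the round trip and no separate config property is needed. The class `ShuffleLoadSketch` and the helper `roundTrip` are hypothetical stand-ins; only the name `setInputKeys` comes from the discussion, and its real signature in POShuffleTezLoad is an assumption here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InputOrderSketch {

    // Hypothetical stand-in for an operator that is serialized as part of the plan.
    static class ShuffleLoadSketch implements Serializable {
        private static final long serialVersionUID = 1L;

        // An ordered list: position i is the i-th input of the operator.
        private final List<String> inputKeys = new ArrayList<>();

        void setInputKeys(List<String> keys) {
            inputKeys.clear();
            inputKeys.addAll(keys);
        }

        List<String> getInputKeys() {
            return new ArrayList<>(inputKeys);
        }
    }

    // Serialize the operator and read it back, as a stand-in for shipping the plan to a task.
    static List<String> roundTrip(List<String> keys) throws IOException, ClassNotFoundException {
        ShuffleLoadSketch op = new ShuffleLoadSketch();
        op.setInputKeys(keys);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(op);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            ShuffleLoadSketch copy = (ShuffleLoadSketch) in.readObject();
            return copy.getInputKeys();
        }
    }

    public static void main(String[] args) throws Exception {
        // The key names are made up for illustration.
        System.out.println(roundTrip(Arrays.asList("input-b", "input-a")));
    }
}
```

Because the keys are held in a list rather than a set or map, the deserialized operator sees its inputs in the same order in which they were set.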

        Daniel Dai added a comment -

        Yes, I also see setInputKeys; that should be better.

        Cheolsoo Park added a comment -

        Fixed as part of PIG-3604. Closing the jira.


          People

          • Assignee:
            Daniel Dai
            Reporter:
            Daniel Dai
          • Votes:
            0
            Watchers:
            3
