Pig / PIG-3446 Umbrella jira for Pig on Tez / PIG-3618

Replace broadcast edges with scatter/gather edges in union

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: tez-branch
    • Component/s: tez
    • Labels: None

      Description

      Previously, I implemented union using OnFileUnorderedKVOutput + broadcast edge. This is a misuse of the broadcast edge: it delivers each source task's full output to every downstream task, so union creates duplicate records when parallel is set to more than 1. We should replace them with ShuffledMergedInput + scatter/gather edge, using the entire record as the key.
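
      The duplication can be illustrated with a small simulation. This is only a sketch in plain Python, not Tez or Pig code: the functions `broadcast`, `scatter_gather`, and `union_result` are hypothetical names made up for the example, and the edges are modeled as simple list routing, assuming a broadcast edge copies every source record to every downstream task while a scatter/gather edge sends each record to exactly one task chosen by hashing the whole record.

```python
from collections import Counter

def broadcast(source_outputs, num_dest_tasks):
    """Model of a broadcast edge: every downstream task gets every record."""
    all_records = [rec for out in source_outputs for rec in out]
    return [list(all_records) for _ in range(num_dest_tasks)]

def scatter_gather(source_outputs, num_dest_tasks):
    """Model of a scatter/gather edge: each record goes to exactly one
    downstream task, picked by hashing the entire record (record == key)."""
    dest = [[] for _ in range(num_dest_tasks)]
    for out in source_outputs:
        for rec in out:
            dest[hash(rec) % num_dest_tasks].append(rec)
    return dest

def union_result(dest_inputs):
    """Count how often each record appears across all downstream tasks."""
    return Counter(rec for task in dest_inputs for rec in task)

# Two union inputs and a downstream parallelism of 2 (illustrative data).
a = [(1, "x"), (2, "y")]
b = [(3, "z")]
parallel = 2

bc = union_result(broadcast([a, b], parallel))
sg = union_result(scatter_gather([a, b], parallel))

print("broadcast:     ", dict(bc))  # each record counted `parallel` times
print("scatter/gather:", dict(sg))  # each record counted once
```

      With parallel set to 1 the two models agree, which is presumably why the problem only shows up at higher parallelism.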

      Ideally, we should implement union using OnFileUnorderedKVOutput + scatter/gather edge with a round-robin partitioner, but Tez does not support this yet (TEZ-661).
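
      As a rough illustration of why a round-robin partitioner would be attractive here, the sketch below compares it with hashing the entire record. Everything in it is hypothetical: `RoundRobinPartitioner`, `hash_partition`, and `distribute` are names invented for this example and do not correspond to any Tez or Pig class. It assumes only that a partitioner maps each record to a partition index.

```python
class RoundRobinPartitioner:
    """Hypothetical partitioner: ignores the record and cycles through
    partitions, so output is balanced regardless of record content."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._next = 0

    def get_partition(self, record):
        partition = self._next
        self._next = (self._next + 1) % self.num_partitions
        return partition

def hash_partition(record, num_partitions):
    """Partition by hashing the entire record (record used as the key)."""
    return hash(record) % num_partitions

def distribute(records, num_partitions, choose):
    """Return the number of records that land in each partition."""
    sizes = [0] * num_partitions
    for rec in records:
        sizes[choose(rec)] += 1
    return sizes

# Six identical records: hashing sends them all to one partition,
# while round robin spreads them evenly.
records = [(1, "x")] * 6
num_partitions = 3

rr = RoundRobinPartitioner(num_partitions)
rr_sizes = distribute(records, num_partitions, rr.get_partition)
hash_sizes = distribute(records, num_partitions,
                        lambda r: hash_partition(r, num_partitions))

print("round robin:", rr_sizes)
print("hash:       ", hash_sizes)
```

      Round robin needs no key at all, so records would not have to be hashed or sorted just to be spread across downstream tasks.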

      1. PIG-3618-1.patch
        17 kB
        Cheolsoo Park
      2. PIG-3618-2.patch
        18 kB
        Cheolsoo Park

        Activity

        Cheolsoo Park added a comment -

        Committed to tez branch.

        Cheolsoo Park added a comment -

        Uploading a new patch that addresses Daniel's comments in the RB.

        Cheolsoo Park added a comment - https://reviews.apache.org/r/16165/

          People

          • Assignee:
            Cheolsoo Park
            Reporter:
            Cheolsoo Park
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:
