Project: Pig
Issue: PIG-3618 (sub-task of PIG-3446, Umbrella jira for Pig on Tez)

Replace broadcast edges with scatter/gather edges in union

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: tez-branch
    • Component/s: tez
    • Labels: None

    Description

      Previously, I implemented union using OnFileUnorderedKVOutput + a broadcast edge. This is a misuse of the broadcast edge: it delivers the full output to every downstream task, so union creates duplicate records whenever parallel is set to more than 1. We should replace this combination with ShuffledMergedInput + a scatter/gather edge that uses the entire record as the key.
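
      The duplication can be illustrated with a small stand-alone simulation. This is only a sketch under stated assumptions: `UnionEdgeDemo` and its methods are hypothetical names, not Pig or Tez classes, and the downstream tasks are modeled as plain lists. It contrasts a broadcast-style delivery (every task gets everything) with a scatter/gather-style delivery keyed on the hash of the whole record (every record lands in exactly one task).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation; none of these names come from Pig or Tez.
public class UnionEdgeDemo {

    // Broadcast-style delivery: each downstream task receives a full copy
    // of the upstream output.
    public static List<List<String>> broadcast(List<String> records, int parallelism) {
        List<List<String>> tasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            tasks.add(new ArrayList<>(records));
        }
        return tasks;
    }

    // Scatter/gather-style delivery: each record is routed to exactly one
    // task, chosen here by hashing the entire record.
    public static List<List<String>> scatterGather(List<String> records, int parallelism) {
        List<List<String>> tasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            tasks.add(new ArrayList<>());
        }
        for (String record : records) {
            int partition = Math.floorMod(record.hashCode(), parallelism);
            tasks.get(partition).add(record);
        }
        return tasks;
    }

    // Total number of records seen across all downstream tasks.
    public static int total(List<List<String>> tasks) {
        int count = 0;
        for (List<String> task : tasks) {
            count += task.size();
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c", "d");
        System.out.println(total(broadcast(records, 3)));     // 12: every record seen 3 times
        System.out.println(total(scatterGather(records, 3))); // 4: every record seen once
    }
}
```

      With a parallelism of 1 both strategies yield the same record count, which is consistent with the problem only appearing once parallel exceeds 1.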

      Ideally, we would implement union using OnFileUnorderedKVOutput + a scatter/gather edge with a round-robin partitioner, but Tez does not support this yet (TEZ-661).
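
      As a rough illustration of what a round-robin partitioner would do, the sketch below assigns records to partitions in rotation and compares the resulting distribution with hashing on the whole record. `RoundRobinDemo` and `RoundRobinPartitioner` are hypothetical names made up for this example; they do not implement any Tez or Pig interface, and the sketch assumes the partition count stays fixed across calls.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not a Tez or Pig API.
public class RoundRobinDemo {

    // Hands out partition numbers 0, 1, ..., n-1, 0, 1, ... regardless of
    // record content.
    public static final class RoundRobinPartitioner {
        private int next = 0;

        public int getPartition(int numPartitions) {
            int partition = next % numPartitions;
            next = (partition + 1) % numPartitions;
            return partition;
        }
    }

    // Records per partition when records are assigned in rotation.
    public static int[] roundRobinCounts(List<String> records, int parallelism) {
        int[] counts = new int[parallelism];
        RoundRobinPartitioner partitioner = new RoundRobinPartitioner();
        for (String ignored : records) {
            counts[partitioner.getPartition(parallelism)]++;
        }
        return counts;
    }

    // Records per partition when the whole record is hashed as the key.
    public static int[] hashCounts(List<String> records, int parallelism) {
        int[] counts = new int[parallelism];
        for (String record : records) {
            counts[Math.floorMod(record.hashCode(), parallelism)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Repeated records hash to the same partition; rotation ignores content.
        List<String> records = Arrays.asList("x", "x", "x", "x", "y", "z");
        System.out.println(Arrays.toString(roundRobinCounts(records, 3))); // [2, 2, 2]
        System.out.println(Arrays.toString(hashCounts(records, 3)));       // [4, 1, 1]
    }
}
```

      Rotation spreads records evenly even when many of them are identical, whereas keying on the whole record sends all copies of a repeated record to the same task. Both still deliver each record exactly once.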

      Attachments

        1. PIG-3618-1.patch
          17 kB
          Cheolsoo Park
        2. PIG-3618-2.patch
          18 kB
          Cheolsoo Park

          People

            Assignee: Cheolsoo Park (cheolsoo)
            Reporter: Cheolsoo Park (cheolsoo)
            Votes: 0
            Watchers: 1
