
Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: tez-branch
    • Component/s: tez
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      PIG-3743 implements union using VertexGroup. But there are a couple of optimizations that we can apply to it.

      • Union followed by store
        Union is a blocking operator, meaning that a new vertex is added for the operators that follow it. But if the succeeding vertex contains only a single store, MROutput could be attached directly to the VertexGroup instead of adding a new vertex for it. Each union source vertex would then write directly to the destination, which should be faster.
      • Replace POLocalRearrangeTez with POValueOutputTez
        Union uses POLocalRearrange, setting the whole record as the key. But since union only needs to partition records evenly across tasks, it might make more sense to use POValueOutputTez with a round-robin (RR) partitioner instead.
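
      To illustrate the second point, here is a minimal standalone sketch of the difference between partitioning by a hash of the whole record and partitioning round-robin. It does not use Pig or Tez classes; `UnionPartitionSketch`, `hashPartition`, and `RoundRobin` are hypothetical names chosen for this example, and the real POLocalRearrange / POValueOutputTez code paths are more involved. The assumption being illustrated is that a round-robin scheme spreads records evenly regardless of their content, whereas keying on the whole record sends identical records to the same task.

```java
import java.util.Arrays;
import java.util.List;

public class UnionPartitionSketch {

    /** Hash partitioning: the whole record acts as the key, so equal records land together. */
    static int hashPartition(String record, int numPartitions) {
        return Math.floorMod(record.hashCode(), numPartitions);
    }

    /** Round-robin partitioning: ignores record content and cycles through the tasks. */
    static final class RoundRobin {
        private final int numPartitions;
        private int next = 0;

        RoundRobin(int numPartitions) {
            this.numPartitions = numPartitions;
        }

        int partition() {
            int p = next;
            next = (next + 1) % numPartitions;
            return p;
        }
    }

    /** Counts how many records each partition receives under hash partitioning. */
    static int[] countByHash(List<String> records, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String r : records) {
            counts[hashPartition(r, numPartitions)]++;
        }
        return counts;
    }

    /** Counts how many records each partition receives under round-robin partitioning. */
    static int[] countByRoundRobin(List<String> records, int numPartitions) {
        int[] counts = new int[numPartitions];
        RoundRobin rr = new RoundRobin(numPartitions);
        for (int i = 0; i < records.size(); i++) {
            counts[rr.partition()]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // A small input with many duplicate records.
        List<String> records = Arrays.asList("a", "a", "a", "a", "b", "c");
        System.out.println(Arrays.toString(countByHash(records, 3)));       // [1, 4, 1]
        System.out.println(Arrays.toString(countByRoundRobin(records, 3))); // [2, 2, 2]
    }
}
```

      With duplicate-heavy input, the hash-based counts are skewed toward one partition while the round-robin counts stay even, which is the property the union output needs.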

      Attachments

        1. PIG-3835-Initial-1.patch
          121 kB
          Rohini Palaniswamy
        2. PIG-3835-2.patch
          151 kB
          Rohini Palaniswamy
        3. PIG-3835-3.patch
          152 kB
          Rohini Palaniswamy
        4. PIG-3835-addendum-1.patch
          9 kB
          Rohini Palaniswamy

          People

            Assignee: Rohini Palaniswamy (rohini)
            Reporter: Cheolsoo Park (cheolsoo)
            Votes: 0
            Watchers: 3
