Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: tez-branch
    • Component/s: tez
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      PIG-3743 implements union using VertexGroup. But there are a couple of optimizations that we can apply to it.

      • Union followed by store
        Union is a blocking operator, meaning that a new vertex is added for the operators that follow it. But if there is only one store in the succeeding vertex, MROutput could be attached directly to the VertexGroup instead of adding a new vertex for it. Each union source vertex would then write directly to the destination, which avoids the extra vertex and should be faster.
      • Replace POLocalRearrangeTez with POValueOutputTez
        Union uses POLocalRearrange with the whole record set as the key. But since union only needs to distribute records evenly across tasks, it might make more sense to use POValueOutputTez with a round-robin (RR) partitioner instead.
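
      The second optimization can be illustrated with a minimal, self-contained sketch. It is not Pig's or Tez's actual code: the class UnionPartitionSketch and its methods are hypothetical names, and the hash scheme shown is only an assumption about how partitioning on a whole-record key typically behaves. It compares how evenly the two strategies spread a skewed input across three tasks.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; these are not Pig or Tez classes.
final class UnionPartitionSketch {

    /** Round-robin: ignores the record contents and cycles through the partitions. */
    static final class RoundRobin {
        private final int numPartitions;
        private int next = 0;

        RoundRobin(int numPartitions) {
            if (numPartitions <= 0) {
                throw new IllegalArgumentException("numPartitions must be positive");
            }
            this.numPartitions = numPartitions;
        }

        int partitionFor(Object record) {
            int partition = next;
            next = (next + 1) % numPartitions;
            return partition;
        }
    }

    /** Hash partitioning with the whole record used as the key (assumed scheme). */
    static int hashPartition(Object record, int numPartitions) {
        return (record.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Counts how many records each partition receives under round-robin. */
    static int[] countRoundRobin(List<String> records, int numPartitions) {
        int[] counts = new int[numPartitions];
        RoundRobin rr = new RoundRobin(numPartitions);
        for (String record : records) {
            counts[rr.partitionFor(record)]++;
        }
        return counts;
    }

    /** Counts how many records each partition receives under hash partitioning. */
    static int[] countHash(List<String> records, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String record : records) {
            counts[hashPartition(record, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Skewed input: the record "a" appears four times.
        List<String> records = Arrays.asList("a", "a", "a", "a", "b", "c");
        System.out.println(Arrays.toString(countRoundRobin(records, 3))); // [2, 2, 2]
        System.out.println(Arrays.toString(countHash(records, 3)));       // [1, 4, 1]
    }
}
```

      In this sketch, identical records always hash to the same partition, so duplicates pile up on one task, while round-robin stays balanced regardless of the record contents. Round-robin also never needs to hash or compare the record, which is presumably part of the appeal of a value-only output.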
      1. PIG-3835-Initial-1.patch
        121 kB
        Rohini Palaniswamy
      2. PIG-3835-addendum-1.patch
        9 kB
        Rohini Palaniswamy
      3. PIG-3835-3.patch
        152 kB
        Rohini Palaniswamy
      4. PIG-3835-2.patch
        151 kB
        Rohini Palaniswamy

        Issue Links

          Activity

          Cheolsoo Park created issue -
          Cheolsoo Park made changes -
          Field Original Value New Value
          Link This issue is related to PIG-3742 [ PIG-3742 ]
          Cheolsoo Park made changes -
          Description
          Original Value: PIG-3742 implements union using VertexGroup. Currently, union is a blocking operator meaning that a new vertex is added for its succeeding operators.

          But if there is only one store in the succeeding vertex, MROutput could be directly attached to VertexGroup instead of adding a new vertex for it. Then, each union source vertex will write directly to the destination, and therefore, it will be faster.
          New Value: PIG-3743 implements union using VertexGroup. Currently, union is a blocking operator meaning that a new vertex is added for its succeeding operators.

          But if there is only one store in the succeeding vertex, MROutput could be directly attached to VertexGroup instead of adding a new vertex for it. Then, each union source vertex will write directly to the destination, and therefore, it will be faster.
          Cheolsoo Park made changes -
          Link This issue is related to PIG-3742 [ PIG-3742 ]
          Cheolsoo Park made changes -
          Link This issue is related to PIG-3743 [ PIG-3743 ]
          Cheolsoo Park made changes -
          Summary
          Original Value: Optimize union followed by store
          New Value: Improve performance of union
          Description
          Original Value: PIG-3743 implements union using VertexGroup. Currently, union is a blocking operator meaning that a new vertex is added for its succeeding operators.

          But if there is only one store in the succeeding vertex, MROutput could be directly attached to VertexGroup instead of adding a new vertex for it. Then, each union source vertex will write directly to the destination, and therefore, it will be faster.
          New Value: PIG-3743 implements union using VertexGroup. But there are a couple of optimizations that we can apply to it.

          * Union followed by store
          Union is a blocking operator meaning that a new vertex is added for its succeeding operators. But if there is only one store in the succeeding vertex, MROutput could be directly attached to VertexGroup instead of adding a new vertex for it. Then, each union source vertex will write directly to the destination, and therefore, it will be faster.

          * Replace POLocalRearrangeTez with POValueOutputTez
          Union uses POLocalRearrange by setting the whole record as key. But since union only needs to partition records evenly across tasks, it might make more sense to use POValueOutputTez with RR partitioner instead.
          Rohini Palaniswamy made changes -
          Parent Issue PIG-3446 [ PIG-3446 ] PIG-3839 [ PIG-3839 ]
          Rohini Palaniswamy made changes -
          Assignee Rohini Palaniswamy [ rohini ]
          Rohini Palaniswamy made changes -
          Attachment PIG-3835-Initial-1.patch [ 12637785 ]
          Rohini Palaniswamy made changes -
          Link This issue requires TEZ-1003 [ TEZ-1003 ]
          Rohini Palaniswamy made changes -
          Link This issue relates to PIG-3855 [ PIG-3855 ]
          Rohini Palaniswamy made changes -
          Attachment PIG-3835-2.patch [ 12638004 ]
          Rohini Palaniswamy made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Rohini Palaniswamy made changes -
          Attachment PIG-3835-3.patch [ 12638117 ]
          Rohini Palaniswamy made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Resolution Fixed [ 1 ]
          Rohini Palaniswamy made changes -
          Attachment PIG-3835-addendum-1.patch [ 12638150 ]
          Rohini Palaniswamy made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Rohini Palaniswamy made changes -
          Resolution Fixed [ 1 ]
          Status Reopened [ 4 ] Resolved [ 5 ]
          Daniel Dai made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Rohini Palaniswamy
              Reporter:
              Cheolsoo Park
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:
