Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: tez
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Tez has auto-parallelism built in. We can start by reusing it and then look at customizing it for better performance.

      1. PIG-3846-9.patch
        146 kB
        Daniel Dai
      2. PIG-3846-7.patch
        146 kB
        Daniel Dai
      3. PIG-3846-6.patch
        151 kB
        Daniel Dai
      4. PIG-3846-5.patch
        103 kB
        Daniel Dai
      5. PIG-3846-3.patch
        85 kB
        Daniel Dai
      6. PIG-3846-1.patch
        56 kB
        Daniel Dai

          Activity

          daijy Daniel Dai added a comment -

          Patch committed to trunk. Thanks Rohini for the review. Review comments are on RB.

          daijy Daniel Dai added a comment -

          Attach the final patch.

          daijy Daniel Dai added a comment -

          Another update. Pending on all the linked Tez patches.

          daijy Daniel Dai added a comment -

          Fix skewed join auto-parallelism.

          rohini Rohini Palaniswamy added a comment -

          +1

          daijy Daniel Dai added a comment -

          Summary of changes:
          1. TezOperDependencyParallelismEstimator: estimates the parallelism of a vertex based on the parallelism of its predecessors and the operators within the predecessors' physical plans.
          2. PigOrderByVertexManager: a VertexManagerPlugin for the sort vertex of an order by. It receives an event from the partition node and automatically decreases the parallelism of the sort vertex (TEZ-1107 prevents increasing the parallelism of the sort job).
          3. Changes to POReservoirSample, FindQuantilesTez, WeightedRangePartitionerTez and PigProcessor to assist PigOrderByVertexManager: FindQuantilesTez estimates numQuantiles based on the samples sent from POReservoirSample (which include stats of the previous job), WeightedRangePartitionerTez partitions the incoming data into the estimated numQuantiles partitions, and PigProcessor sends numQuantiles to PigOrderByVertexManager.
          4. Set the auto-parallelism flag of ShuffleVertexManager to true for applicable vertices.
          5. Add estimatedParallelism to TezOperator. If requestedParallelism is not set, TezOperDependencyParallelismEstimator estimates the parallelism and instructs the VertexManager to figure out the parallelism dynamically.
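
          The fallback in items 1, 2 and 5 can be sketched roughly as follows. This is only an illustration, not the code in the patch: the class ParallelismSketch, the Pred type, its outputFactor field and the methods estimate, resolve and shrinkOnly are all hypothetical names, and taking the maximum over predecessors is an assumption made for the sketch.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only. Every name here is hypothetical; none of it is Pig's or Tez's API.
final class ParallelismSketch {

    /** A predecessor vertex: its parallelism and an assumed output-to-input ratio for its plan. */
    static final class Pred {
        final int parallelism;
        final double outputFactor;

        Pred(int parallelism, double outputFactor) {
            this.parallelism = parallelism;
            this.outputFactor = outputFactor;
        }
    }

    /** Estimate from predecessors. Taking the maximum is an assumption of this sketch. */
    static int estimate(List<Pred> preds) {
        double best = 1.0; // never estimate fewer than one task
        for (Pred p : preds) {
            best = Math.max(best, p.parallelism * p.outputFactor);
        }
        return (int) Math.ceil(best);
    }

    /** An explicit request (greater than 0) wins; otherwise fall back to the estimate. */
    static int resolve(int requested, List<Pred> preds) {
        return requested > 0 ? requested : estimate(preds);
    }

    /** The sort vertex may only shrink: the result never exceeds the current parallelism. */
    static int shrinkOnly(int current, int numQuantiles) {
        return Math.min(current, Math.max(1, numQuantiles));
    }

    public static void main(String[] args) {
        List<Pred> preds = Arrays.asList(new Pred(10, 0.5), new Pred(4, 1.0));
        System.out.println(resolve(-1, preds));  // no request: uses the estimate
        System.out.println(resolve(3, preds));   // explicit request is kept
        System.out.println(shrinkOnly(20, 8));   // fewer quantiles: shrink
        System.out.println(shrinkOnly(20, 50));  // more quantiles: stay at current
    }
}
```

          In the sketch, resolve stands in for the requestedParallelism check in item 5, and shrinkOnly for the decrease-only behaviour of the sort vertex in item 2.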

          daijy Daniel Dai added a comment -

          RB link: https://reviews.apache.org/r/21302/

          daijy Daniel Dai added a comment -

          Attach initial patch. Still need to add test cases and run through unit tests/e2e tests.


            People

            • Assignee:
              daijy Daniel Dai
              Reporter:
              rohini Rohini Palaniswamy
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: