  Pig / PIG-3839 Umbrella jira for Pig on Tez Performance Improvements / PIG-3775

Use unsorted shuffle in Orderby, Skewed Join to improve performance in Tez


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • tez

    Description

      When implementing Pig union, we need to gather data from two or more upstream vertices without sorting it. Each upstream vertex may itself consist of several tasks. The same unsorted gather can be used for the partitioner vertex in order-by and skewed join, instead of a 1-1 edge, for some cases of parallelism.
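
      As a rough illustration of the idea, here is a toy model in plain Python. It is not Pig or Tez code, and every name in it (sorted_gather, unsorted_gather, vertex_a, vertex_b) is hypothetical. It contrasts a gather that sorts and merges the output of every upstream task with one that simply concatenates it. The consumer receives the same set of records either way, so a consumer that does not depend on order can skip the sort.

```python
import heapq
from itertools import chain


def sorted_gather(upstream_outputs):
    """Toy model of a sorted shuffle: sort each task's output, then k-way merge."""
    return list(heapq.merge(*(sorted(part) for part in upstream_outputs)))


def unsorted_gather(upstream_outputs):
    """Toy model of an unsorted gather: concatenate task outputs as they arrive."""
    return list(chain.from_iterable(upstream_outputs))


# Two hypothetical upstream vertices; the first one runs two tasks.
vertex_a = [[5, 1], [9, 3]]
vertex_b = [[7, 2]]
outputs = vertex_a + vertex_b

merged = sorted_gather(outputs)      # pays for per-task sorts plus a merge
gathered = unsorted_gather(outputs)  # same records, no sorting work
```

      In this model the two results differ only in ordering, which is the property an unsorted input/output pair would rely on.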

      TEZ-661 has been created to add a custom output and input for this in Tez. It is not currently among the Tez team's priorities, but it is important for us because it is expected to give good performance gains. We can write the custom input/output ourselves, contribute it to Tez, and make the corresponding changes in Pig.

      This is a candidate project for Google Summer of Code 2014. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2014

            People

              rohini Rohini Palaniswamy