Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-3839 Umbrella jira for Pig on Tez Performance Improvements
  3. PIG-3850

Optimize join followed by order by using same key

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • tez
    • None

    Description

      Possible optimizations:
      1) If it is a skewed join, then we can combine ordering into it instead of doing a additional orderby as we skewed join already involves sampling.
      2) If it is a normal join, then we can do the order by and then join. i.e
      Current plan:
      Vertex 1 (load massive), Vertex 2 (load big) -> Vertex 3 (join) -> Vertex 4 (sampler), Vertex 5 (Partitioner using vertex 4 sample) -> Vertex 6 (order by)
      New plan:
      Vertex 1 (load massive) > Vertex 2 (sampler), Vertex 3 (Partitioner using vertex 2 sample) -> Vertex 4 (order by and join) < Vertex 5 (load big and construct WeightedRangePartitioner from Vertex 2 sample)
      3) If it is a replicated join, similar plan in 2) should work with Vertex 5 changing to broadcast input to Vertex 4 instead of using WeightedRangePartitioner.

      Attachments

        Activity

          People

            Unassigned Unassigned
            rohini Rohini Palaniswamy
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: