Pig / PIG-3839 Umbrella jira for Pig on Tez Performance Improvements / PIG-4495

Better multi-query planning in case of multiple edges


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.15.0
    • Component/s: tez
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Details in https://issues.apache.org/jira/browse/TEZ-1190?focusedCommentId=14393033&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14393033

      Users often split the data, apply foreach transformations or filters to each branch, union the branches, and then perform an operation such as group by or join with other data. In those cases the plan has multiple edges from the same Split, so we do not merge them.
      Instead, we write the data out to an additional dummy vertex to avoid multiple edges, which adds overhead and hurts performance. Vertex groups, however, accept multiple edges from the same vertex. If the multiple edges end up in a vertex group, rather than in a plain vertex as in a self join, the dummy vertex can be avoided.
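The planning rule described above can be sketched in Python. This is a minimal, hypothetical model and not Pig's actual Tez planner code. The names `Kind`, `Edge`, and `plan_split_edges` are invented for illustration. A plan is reduced to a list of edges plus a map that records, for each vertex name, whether it is an ordinary vertex or a vertex group.

Each successor of the split is handled as follows:
- A successor reached by a single edge is attached directly.
- A successor reached by several edges that land in a vertex group is also attached directly.
- A successor reached by several edges that land in an ordinary vertex (the self-join case) is still routed through a dummy vertex.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    # Hypothetical classification of a Tez plan node.
    VERTEX = "vertex"        # ordinary vertex, e.g. the reducer of a self join
    VERTEX_GROUP = "group"   # vertex group, e.g. the inputs of a union


@dataclass(frozen=True)
class Edge:
    source: str
    target: str


def plan_split_edges(split: str, edges: list[Edge], kinds: dict[str, Kind]) -> dict[str, str]:
    """Decide, per successor of `split`, how its edges are wired.

    Returns a mapping successor -> "direct" or "dummy". Multiple edges
    from the split are allowed directly only when the successor is a
    vertex group; multiple edges into a plain vertex keep the dummy vertex.
    """
    # Count how many edges leave the split toward each successor.
    counts = Counter(e.target for e in edges if e.source == split)
    plan: dict[str, str] = {}
    for target, n in counts.items():
        if n == 1 or kinds[target] is Kind.VERTEX_GROUP:
            plan[target] = "direct"
        else:
            plan[target] = "dummy"
    return plan


kinds = {
    "union_grp": Kind.VERTEX_GROUP,
    "self_join": Kind.VERTEX,
    "store": Kind.VERTEX,
}
edges = [
    Edge("split", "union_grp"), Edge("split", "union_grp"),  # split -> foreach/filter -> union
    Edge("split", "self_join"), Edge("split", "self_join"),  # self join on the split output
    Edge("split", "store"),
    Edge("other", "self_join"),  # edge from a different source: ignored
]
print(plan_split_edges("split", edges, kinds))
```

With these example inputs, the union's vertex group and the single-edge store vertex are attached directly. The self-join vertex is still routed through a dummy vertex.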

      Attachments

        1. PIG-4495-1.patch
          160 kB
          Rohini Palaniswamy
        2. PIG-4495-2.patch
          187 kB
          Rohini Palaniswamy


            People

              Assignee: Rohini Palaniswamy
              Reporter: Rohini Palaniswamy
              Votes: 0
              Watchers: 2
