PIG-4495 (project: Pig; sub-task of PIG-3839, Umbrella jira for Pig on Tez Performance Improvements)

Better multi-query planning in case of multiple edges

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 0.15.0
    • Component/s: tez
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Details in https://issues.apache.org/jira/browse/TEZ-1190?focusedCommentId=14393033&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14393033

      A common pattern is to split the data, apply some foreach transformations or filters to each branch, union the branches, and then perform an operation such as a group by or a join with other data. This creates multiple edges from the same Split vertex, so we do not merge the branches; instead we write the data out to another dummy vertex to avoid the multiple edges, which adds overhead and hurts performance. Vertex groups, however, accept multiple edges from the same vertex. So if the multiple edges end up in a vertex group (and not in a vertex, which is the case in a self join), we can avoid the dummy vertex.
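      The rule described above can be sketched as a toy model. This is only an illustration of the decision, not the planner code from the patch: the `Edge` type, the `edges_needing_dummy_vertex` function, and the vertex names are all hypothetical and do not correspond to Pig or Tez APIs.

```python
from collections import Counter
from typing import NamedTuple


class Edge(NamedTuple):
    """Hypothetical plan edge; not a Pig or Tez class."""
    source: str
    target: str
    target_is_vertex_group: bool


def edges_needing_dummy_vertex(edges):
    """Return (source, target) pairs that still need a dummy vertex.

    A pair qualifies when the same source has more than one edge to the
    same target and that target is a plain vertex rather than a vertex group.
    """
    counts = Counter(edges)
    return sorted({
        (e.source, e.target)
        for e, n in counts.items()
        if n > 1 and not e.target_is_vertex_group
    })


# Split -> two transformed branches -> union, modeled as a vertex group.
union_plan = [
    Edge("split", "union_group", True),
    Edge("split", "union_group", True),
]

# Self join: both inputs of the join come from the same split vertex.
self_join_plan = [
    Edge("split", "join", False),
    Edge("split", "join", False),
]

print(edges_needing_dummy_vertex(union_plan))
print(edges_needing_dummy_vertex(self_join_plan))
```

      In this model the union plan reports no pairs, so the dummy vertex can be skipped, while the self-join plan still reports the split-to-join pair.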

        Attachments

        1. PIG-4495-1.patch
          160 kB
          Rohini Palaniswamy
        2. PIG-4495-2.patch
          187 kB
          Rohini Palaniswamy

              People

              • Assignee: Rohini Palaniswamy
              • Reporter: Rohini Palaniswamy
              • Votes: 0
              • Watchers: 2
