Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-5932

Improve the performance of transitive closure computation in value transfer graph

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 2.11.0
    • Fix Version/s: Impala 2.11.0
    • Component/s: Frontend
    • Labels:
      None
    • Epic Color:
      ghx-label-8

      Description

      Currently org.apache.impala.analysis.Analyzer.ValueTransferGraph computes transitive closure using a V^4 algorithm, where V is the number of vertexes. (https://github.com/apache/incubator-impala/blob/39e8cf313f41acc1a70be4c12b67173bd156f029/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L2691). It could be improved straightforwardly to V^3 with a standard Floyed-Warshall algorithm implementation. Also, given that our graph is a DAG, Topological sort and bitmap tricks might be practically faster.

        Activity

        Hide
        alex.behm Alexander Behm added a comment -

        Let's keep it simple and use Floyd Warshall for now. We should only consider further optimization of that part if we can show it's a significant bottleneck.

        I think we can do a simple optimization to the basic FW because our graph is typically sparse. We can reduce the number of times the inner-most loop is invoked (we can discuss details on the CR).

        Show
        alex.behm Alexander Behm added a comment - Let's keep it simple and use Floyd Warshall for now. We should only consider further optimization of that part if we can show it's a significant bottleneck. I think we can do a simple optimization to the basic FW because our graph is typically sparse. We can reduce the number of times the inner-most loop is invoked (we can discuss details on the CR).
        Hide
        tianyiwang Tianyi Wang added a comment -

        Performance investigation of https://gerrit.cloudera.org/#/c/8098/:
        A query is constructed with 800 nodes and about 3000 edges. The query is a series of left/right joins on random columns.
        Baseline commit is f0e79314fe9d9d3e920ad65c3ca6a4ef279e68fc.
        The "Equivalence classes computed" event in runtime profile is 21X faster:
        Baseline

            Planner Timeline: 6s430ms
               - Analysis finished: 354.476ms (354.476ms)
               - Equivalence classes computed: 4s745ms (4s391ms)
               - Single node plan created: 5s973ms (1s228ms)
               - Runtime filters computed: 6s002ms (29.059ms)
               - Distributed plan created: 6s079ms (76.807ms)
               - Planning finished: 6s430ms (351.400ms)
        

        https://gerrit.cloudera.org/#/c/8098/:

         Planner Timeline: 6s490ms
               - Metadata load started: 228.373ms (228.373ms)
               - Metadata load finished: 3s830ms (3s602ms)
               - Analysis finished: 4s285ms (454.771ms)
               - Equivalence classes computed: 4s490ms (205.083ms)
               - Single node plan created: 5s883ms (1s392ms)
               - Runtime filters computed: 5s947ms (63.763ms)
               - Distributed plan created: 6s031ms (84.481ms)
               - Planning finished: 6s490ms (458.692ms)
        

        By placing timers before after the exact code block, it can be seen that the algorithm itself is 43X faster:
        Baseline transitive closure time: 4355ms
        Change 8098 transitive closure time: 99ms

        The code for query generation:

        Unable to find source-code formatter for language: python. Available languages are: actionscript, html, java, javascript, none, sql, xhtml, xml
        import random
        
        f = open("test.sql", "w")
        
        f.write("use impala5932;\n")
        f.write("create table if not exist tb0 (c INT);")
        f.write("select * from (select * from tb0) v0 ")
        
        
        def on_clause(i):
          str = ""
          num_conj = random.randint(1, 10)
          for _ in range(num_conj):
            str += "v{}.c=v{}.c ".format(i, random.randint(0, i - 1))
            if _ != num_conj - 1:
              str += "AND "
          return str
        
        
        for i in range(1, 400):
          f.write(("left" if random.randint(0, 1) == 0 else "right")
                  + " join (select * from tb0 limit 3) v{} on {}".format(i, on_clause(i)))
        
        f.write(";\nprofile;")
        
        Show
        tianyiwang Tianyi Wang added a comment - Performance investigation of https://gerrit.cloudera.org/#/c/8098/: A query is constructed with 800 nodes and about 3000 edges. The query is a series of left/right joins on random columns. Baseline commit is f0e79314fe9d9d3e920ad65c3ca6a4ef279e68fc. The "Equivalence classes computed" event in runtime profile is 21X faster: Baseline Planner Timeline: 6s430ms - Analysis finished: 354.476ms (354.476ms) - Equivalence classes computed: 4s745ms (4s391ms) - Single node plan created: 5s973ms (1s228ms) - Runtime filters computed: 6s002ms (29.059ms) - Distributed plan created: 6s079ms (76.807ms) - Planning finished: 6s430ms (351.400ms) https://gerrit.cloudera.org/#/c/8098/: Planner Timeline: 6s490ms - Metadata load started: 228.373ms (228.373ms) - Metadata load finished: 3s830ms (3s602ms) - Analysis finished: 4s285ms (454.771ms) - Equivalence classes computed: 4s490ms (205.083ms) - Single node plan created: 5s883ms (1s392ms) - Runtime filters computed: 5s947ms (63.763ms) - Distributed plan created: 6s031ms (84.481ms) - Planning finished: 6s490ms (458.692ms) By placing timers before after the exact code block, it can be seen that the algorithm itself is 43X faster: Baseline transitive closure time: 4355ms Change 8098 transitive closure time: 99ms The code for query generation: Unable to find source-code formatter for language: python. Available languages are: actionscript, html, java, javascript, none, sql, xhtml, xml import random f = open( "test.sql" , "w" ) f.write( "use impala5932;\n" ) f.write( "create table if not exist tb0 (c INT);" ) f.write( "select * from (select * from tb0) v0 " ) def on_clause(i): str = "" num_conj = random.randint(1, 10) for _ in range(num_conj): str += "v{}.c=v{}.c " .format(i, random.randint(0, i - 1)) if _ != num_conj - 1: str += "AND " return str for i in range(1, 400): f.write(( "left" if random.randint(0, 1) == 0 else "right" ) + " join (select * from tb0 limit 3) v{} on {}" .format(i, on_clause(i))) f.write( ";\nprofile;" )
        Hide
        tianyiwang Tianyi Wang added a comment -

        IMPALA-5932: Improve transitive closure computation performance in FE

        This patch implements the Floyd-Warshall algorithm for the transitive
        closure computation for the value transfer graph, replacing the existing
        N^4 brute force algorithm.
        The performance improvement depends on the size and structure of the
        value transfer graph. On a random graph with 800 slots and 2800 edges it
        is 43X faster itself. And the "Equivalence classes computed" event in
        the runtime profile becomes 21X faster.
        This computation is covered by the existing tests, which verifies the
        equivalency of the new and the old value transfer graphs.

        Change-Id: Idb00e3c1f904e60ae25567a52b4bf0809a84c6b3
        Reviewed-on: http://gerrit.cloudera.org:8080/8098
        Reviewed-by: Alex Behm <alex.behm@cloudera.com>
        Tested-by: Impala Public Jenkins

        Show
        tianyiwang Tianyi Wang added a comment - IMPALA-5932 : Improve transitive closure computation performance in FE This patch implements the Floyd-Warshall algorithm for the transitive closure computation for the value transfer graph, replacing the existing N^4 brute force algorithm. The performance improvement depends on the size and structure of the value transfer graph. On a random graph with 800 slots and 2800 edges it is 43X faster itself. And the "Equivalence classes computed" event in the runtime profile becomes 21X faster. This computation is covered by the existing tests, which verifies the equivalency of the new and the old value transfer graphs. Change-Id: Idb00e3c1f904e60ae25567a52b4bf0809a84c6b3 Reviewed-on: http://gerrit.cloudera.org:8080/8098 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins

          People

          • Assignee:
            tianyiwang Tianyi Wang
            Reporter:
            tianyiwang Tianyi Wang
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development