• Sub-task
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
    • Impala 2.11.0
    • Frontend
    • None
    • ghx-label-5


      The simplest solution to speed up the equivalence-class computation is to remove the equivalence class altogether.

      A few points for justification:

      • The purpose of the equivalence classes is to provide a condensed version of the value-transfer graph to speed up certain equivalence checks/transformations. All necessary information is already captured in the value-transfer graph, so the high cost of computing the equivalence classes is not justified.
      • Incomplete/wrong definition of equivalence classes. In today's code, the equivalence class of a slot contains a list of slots where each pair of slots has a value transfer. This definition does not make sense, in particular, for unions which produce value transfers of the form s1 -> s2, s1 -> s3 where according to our definition s1 can only be in the same class as s2 or s3, but not both (because s2 and s3 do not have a value transfer). We could redefine the equivalence class of only containing slot pairs with mutual value transfers, but that information is already efficiently accessible from the value-transfer graph.

      The challenge in removing the equivalence classes is to rework all the existing code that relies on them to use the value-transfer graph directly.

      One of the tricky functions that needs to change is Analyzer.equivExprs(). An alternative implementation might look like this:

      // Returns true if 'other' is equivalent to this. Equivalence means that the two expr trees are equal except for SlotRefs which match if there
      // is another SlotRef in the other expr tree at the same position, and the underlying slots have a value transfer. If 'mutualOnly' is true,
      // SlotRefs are only considered equivalent if they there is a mutual value transfer.
      boolean Expr.isEquiv(Expr other, boolean mutualOnly);

      The existing equals() function can probably be refactored into a new function that can be called from both equals() and isEquiv() or something along those lines.


        1. queryB.sql
          155 kB
          Tianyi Wang
        2. queryA.sql
          155 kB
          Tianyi Wang

        Issue Links



              tianyiwang Tianyi Wang
              alex.behm Alexander Behm
              0 Vote for this issue
              2 Start watching this issue