Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4242

Slow planning due to equivalence class computation

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Resolved
    • Affects Version/s: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
    • Fix Version/s: Impala 2.11.0
    • Component/s: Frontend

      Description

      Queries may exhibit very slow planning times if they have the following characteristics:

      • references many tables and/or views with many columns
      • many nested views, inline views and/or WITH-clause views
      • many unions and/or unions with many operands

      The slowness is due to the expensive equivalence class computation which grows exponentially in the number of "slots" in the query. Slots are roughly equivalent to column references (physical as well as virtual).

      The planning timeline will show a large time spent in "Equivalence classes computed":

          Planner Timeline: 13m12s
             - Metadata load started: 2s062ms (2s062ms)
             - Metadata load finished: 9s573ms (7s510ms)
             - Analysis finished: 12s043ms (2s470ms)
             - Equivalence classes computed: 13m1s (12m49s) <--- Very slow!
             - Single node plan created: 13m10s (9s136ms)
             - Runtime filters computed: 13m10s (20.928ms)
             - Distributed plan created: 13m11s (278.404ms)
             - Planning finished: 13m12s (1s016ms)
      

      For example, while experimenting with inserting a single row per partition as part of ETL performance testing I noticed that queries with a large number of UNION operators spend a lot of time in computeEquivClasses().

      The majority of the time is spent in HashSet.contains.

      Stack Trace Sample Count Percentage(%)
      java.util.HashSet.contains(Object) 35,273 80.402
      com.cloudera.impala.analysis.Analyzer$ValueTransferGraph.hasValueTransfer(SlotId, SlotId) 35,273 80.402
      com.cloudera.impala.analysis.Analyzer.hasValueTransfer(SlotId, SlotId) 35,273 80.402
      com.cloudera.impala.analysis.Analyzer.computeEquivClasses() 35,104 80.016
      com.cloudera.impala.planner.SingleNodePlanner.createSingleNodePlan() 35,104 80.016
      com.cloudera.impala.planner.Planner.createPlan() 35,104 80.016
      com.cloudera.impala.service.Frontend.createExecRequest(TQueryCtx, StringBuilder) 35,104 80.016
      com.cloudera.impala.service.JniFrontend.createExecRequest(byte[]) 35,104 80.016

      Not clear what the complexity of loops in computeEquivClasses() but it is obviously inefficient.
      https://github.com/apache/incubator-impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L1954

      For the attached query with 1800 UNION operators computing equivalence classes takes several minutes.

        Attachments

        1. largerUnionquery600.jfr
          2.44 MB
          Mostafa Mokhtar
        2. large union query profile.txt.zip
          992 kB
          Mostafa Mokhtar

          Issue Links

            Activity

              People

              • Assignee:
                tianyiwang Tianyi Wang
                Reporter:
                mmokhtar Mostafa Mokhtar
              • Votes:
                0 Vote for this issue
                Watchers:
                10 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: