Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4242

Slow planning due to equivalence class computation

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Critical
    • Resolution: Resolved
    • Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
    • Impala 2.11.0
    • Frontend

    Description

      Queries may exhibit very slow planning times if they have the following characteristics:

      • references many tables and/or views with many columns
      • many nested views, inline views and/or WITH-clause views
      • many unions and/or unions with many operands

      The slowness is due to the expensive equivalence class computation which grows exponentially in the number of "slots" in the query. Slots are roughly equivalent to column references (physical as well as virtual).

      The planning timeline will show a large time spent in "Equivalence classes computed":

          Planner Timeline: 13m12s
             - Metadata load started: 2s062ms (2s062ms)
             - Metadata load finished: 9s573ms (7s510ms)
             - Analysis finished: 12s043ms (2s470ms)
             - Equivalence classes computed: 13m1s (12m49s) <--- Very slow!
             - Single node plan created: 13m10s (9s136ms)
             - Runtime filters computed: 13m10s (20.928ms)
             - Distributed plan created: 13m11s (278.404ms)
             - Planning finished: 13m12s (1s016ms)
      

      For example, while experimenting with inserting a single row per partition as part of ETL performance testing I noticed that queries with a large number of UNION operators spend a lot of time in computeEquivClasses().

      The majority of the time is spent in HashSet.contains.

      Stack Trace Sample Count Percentage(%)
      java.util.HashSet.contains(Object) 35,273 80.402
      com.cloudera.impala.analysis.Analyzer$ValueTransferGraph.hasValueTransfer(SlotId, SlotId) 35,273 80.402
      com.cloudera.impala.analysis.Analyzer.hasValueTransfer(SlotId, SlotId) 35,273 80.402
      com.cloudera.impala.analysis.Analyzer.computeEquivClasses() 35,104 80.016
      com.cloudera.impala.planner.SingleNodePlanner.createSingleNodePlan() 35,104 80.016
      com.cloudera.impala.planner.Planner.createPlan() 35,104 80.016
      com.cloudera.impala.service.Frontend.createExecRequest(TQueryCtx, StringBuilder) 35,104 80.016
      com.cloudera.impala.service.JniFrontend.createExecRequest(byte[]) 35,104 80.016

      Not clear what the complexity of loops in computeEquivClasses() but it is obviously inefficient.
      https://github.com/apache/incubator-impala/blob/master/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#L1954

      For the attached query with 1800 UNION operators computing equivalence classes takes several minutes.

      Attachments

        1. largerUnionquery600.jfr
          2.44 MB
          Mostafa Mokhtar
        2. large union query profile.txt.zip
          992 kB
          Mostafa Mokhtar

        Issue Links

          Activity

            People

              tianyiwang Tianyi Wang
              mmokhtar Mostafa Mokhtar
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: