IMPALA-8912

Avoid calling computeStats twice on HBaseScanNode

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0, Impala 3.1.0, Impala 3.2.0, Impala 3.3.0
    • Fix Version/s: Impala 3.4.0
    • Component/s: Frontend
    • Labels:
      None

      Description

      For simple queries on HBase tables that have an HBaseScanNode as the root of the single-node plan, HBaseScanNode#computeStats is called twice: once from HBaseScanNode#init and again from SingleNodePlanner#createQueryPlan.

      Stacktrace for the first call:

              at org.apache.impala.planner.HBaseScanNode.computeStats(HBaseScanNode.java:286)
              at org.apache.impala.planner.HBaseScanNode.init(HBaseScanNode.java:160)
              at org.apache.impala.planner.SingleNodePlanner.createScanNode(SingleNodePlanner.java:1405)
              at org.apache.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1582)
              at org.apache.impala.planner.SingleNodePlanner.createTableRefsPlan(SingleNodePlanner.java:826)
              at org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:662)
              at org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:261)
              at org.apache.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:151)
              at org.apache.impala.planner.Planner.createPlan(Planner.java:117)
              at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1169)
              at org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:1495)
              at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1359)
              at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1250)
              at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1220)
      

      Stacktrace for the second call:

              at org.apache.impala.planner.HBaseScanNode.computeStats(HBaseScanNode.java:286)
              at org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:307)
              at org.apache.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:151)
              at org.apache.impala.planner.Planner.createPlan(Planner.java:117)
              at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1169)
              at org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:1495)
              at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1359)
              at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1250)
              at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1220)
              at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:154)
      

      Code for the second call:

        private PlanNode createQueryPlan(QueryStmt stmt, Analyzer analyzer, boolean disableTopN)
            throws ImpalaException {
          ......
          if (stmt.evaluateOrderBy() && sortHasMaterializedSlots) {
            root = createSortNode(analyzer, root, stmt.getSortInfo(), stmt.getLimit(),
                stmt.getOffset(), stmt.hasLimit(), disableTopN);
          } else {
            root.setLimit(stmt.getLimit());
            root.computeStats(analyzer);   // <--- May call HBaseScanNode#computeStats here
          }
      
          return root;
        }
      

      Logs for a simple query on an old version of Impala:

      I0830 11:52:05.991547 41189 Analyzer.java:1578] new pred: stg.xxx_hbase.key >= 'key1' BinaryPredicate{op=>=, SlotRef{path=key, type=STRING, id=0} StringLiteral{value=key1}}
      I0830 11:52:05.991595 41189 Analyzer.java:1578] new pred: stg.xxx_hbase.key <= 'key2' BinaryPredicate{op=<=, SlotRef{path=key, type=STRING, id=0} StringLiteral{value=key2}}
      # <--------- 2 seconds here
      I0830 11:52:08.114225 41189 HBaseScanNode.java:217] computeStats HbaseScan: cardinality=1706076
      I0830 11:52:08.114341 41189 HBaseScanNode.java:223] computeStats HbaseScan: #nodes=100
      I0830 11:52:08.114452 41189 SingleNodePlanner.java:357] createCheapestJoinPlan
      # <--------- 2 seconds here
      I0830 11:52:10.260190 41189 HBaseScanNode.java:217] computeStats HbaseScan: cardinality=1706076
      I0830 11:52:10.260303 41189 HBaseScanNode.java:223] computeStats HbaseScan: #nodes=100
      I0830 11:52:10.260387 41189 SingleNodePlanner.java:357] createCheapestJoinPlan
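
      The two gaps marked in the log can be measured directly from the timestamps. The following is a minimal sketch, assuming the usual glog prefix layout (severity letter and MMDD, then HH:mm:ss with microseconds) and that both lines fall on the same day. The class and method names are hypothetical, and the first log message is truncated for brevity.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not part of Impala.
final class GlogGap {
  // Second whitespace-separated field of a glog line: HH:mm:ss.SSSSSS
  private static final DateTimeFormatter TIME =
      DateTimeFormatter.ofPattern("HH:mm:ss.SSSSSS");

  // Extracts the time-of-day from a glog-style line.
  static LocalTime timeOf(String logLine) {
    String[] fields = logLine.trim().split("\\s+");
    return LocalTime.parse(fields[1], TIME);
  }

  // Elapsed microseconds between two lines logged on the same day.
  static long gapMicros(String earlier, String later) {
    return Duration.between(timeOf(earlier), timeOf(later)).toNanos() / 1_000;
  }

  public static void main(String[] args) {
    String lastPred = "I0830 11:52:05.991595 41189 Analyzer.java:1578] new pred: ...";
    String firstStats =
        "I0830 11:52:08.114225 41189 HBaseScanNode.java:217] computeStats HbaseScan: cardinality=1706076";
    String firstJoinPlan =
        "I0830 11:52:08.114452 41189 SingleNodePlanner.java:357] createCheapestJoinPlan";
    String secondStats =
        "I0830 11:52:10.260190 41189 HBaseScanNode.java:217] computeStats HbaseScan: cardinality=1706076";

    System.out.println(gapMicros(lastPred, firstStats));       // 2122630
    System.out.println(gapMicros(firstJoinPlan, secondStats)); // 2145738
  }
}
```

      Each gap is a little over 2.1 seconds, so the duplicated call roughly doubles the planning time the issue attributes to computeStats for this query.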
      

      Queries of this kind are usually point queries and are expected to return quickly. HBaseScanNode#computeStats is expensive because it issues RPCs to HBase (each call is preceded by a gap of about 2 seconds in the logs above), so we should avoid calling it twice.
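
      One way to avoid the repeated cost is to cache the expensive part of the computation and let later calls redo only the cheap part. The sketch below illustrates that idea and is not the patch that resolved this issue. SketchHBaseScan, fetchRowEstimate and rpcCalls are hypothetical names, and the RPC is simulated with the cardinality from the logs above. Because createQueryPlan calls setLimit before the second computeStats, the sketch still applies the limit on every call.

```java
// Hypothetical stand-in for a scan node; not Impala's HBaseScanNode API.
final class SketchHBaseScan {
  private long limit = -1;             // -1 means no limit
  private long cachedRowEstimate = -1; // -1 means not fetched yet
  int rpcCalls = 0;                    // counts simulated RPC round trips

  void setLimit(long limit) {
    this.limit = limit;
  }

  // Stands in for the RPC-backed row estimate (simulated, no network I/O).
  private long fetchRowEstimate() {
    rpcCalls++;
    return 1706076L; // cardinality taken from the log excerpt
  }

  // Fetches the estimate at most once; the limit is re-applied on every call.
  long computeStats() {
    if (cachedRowEstimate < 0) {
      cachedRowEstimate = fetchRowEstimate();
    }
    return limit >= 0 ? Math.min(cachedRowEstimate, limit) : cachedRowEstimate;
  }

  public static void main(String[] args) {
    SketchHBaseScan scan = new SketchHBaseScan();
    long fromInit = scan.computeStats();      // first call, as from init()
    scan.setLimit(10);
    long fromQueryPlan = scan.computeStats(); // second call, as from createQueryPlan()
    System.out.println(fromInit + " " + fromQueryPlan + " " + scan.rpcCalls);
  }
}
```

      A real fix would also need to invalidate the cached value whenever an input to the estimate changes, for example when new predicates are assigned to the scan.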

              People

              • Assignee: Quanlong Huang (stigahuang)
              • Reporter: Quanlong Huang (stigahuang)