Description
When calculating parallelism, we end up using HiveDefaultCostModel.getSplitCount, which returns null, instead of HiveOnTezCostModel.getSplitCount. This results in wrong parallelism.
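In the call stack below, HiveJoin.getSplitCount() hands the query to a join-algorithm object (HiveDefaultCostModel$DefaultJoinAlgorithm.getSplitCount in this case). The following is a minimal Java sketch of that delegation pattern, assuming heavily simplified stand-ins: SplitCountSketch, JoinAlgorithm, Join, the inputSplits field and the Tez placeholder rule are all hypothetical and are not Hive's actual classes. It only illustrates why a join that resolves to the default algorithm reports no split count (null) while a Tez-style algorithm can return a value.

```java
/**
 * Hypothetical, simplified model of the delegation visible in the call stack
 * (HiveJoin.getSplitCount() -> <join algorithm>.getSplitCount(HiveJoin)).
 * These are NOT Hive's real classes; names and fields are illustrative only.
 */
public class SplitCountSketch {

  /** Stand-in for a cost model's per-join algorithm (hypothetical interface). */
  interface JoinAlgorithm {
    Integer getSplitCount(Join join);
  }

  /** Mirrors the reported behavior of the default cost model: no split count at all. */
  static final class DefaultJoinAlgorithm implements JoinAlgorithm {
    @Override
    public Integer getSplitCount(Join join) {
      return null;
    }
  }

  /** Tez-style algorithm; returning the input split count is a made-up placeholder rule. */
  static final class TezMapJoinAlgorithm implements JoinAlgorithm {
    @Override
    public Integer getSplitCount(Join join) {
      return join.inputSplits;
    }
  }

  /** Stand-in for HiveJoin: it answers split-count queries through its current algorithm. */
  static final class Join {
    final JoinAlgorithm algorithm;
    final int inputSplits; // hypothetical field, used only by the placeholder rule above

    Join(JoinAlgorithm algorithm, int inputSplits) {
      this.algorithm = algorithm;
      this.inputSplits = inputSplits;
    }

    Integer getSplitCount() {
      return algorithm.getSplitCount(this);
    }
  }

  public static void main(String[] args) {
    Join defaultJoin = new Join(new DefaultJoinAlgorithm(), 8);
    Join tezJoin = new Join(new TezMapJoinAlgorithm(), 8);
    System.out.println("default algorithm -> " + defaultJoin.getSplitCount()); // prints "default algorithm -> null"
    System.out.println("Tez map join      -> " + tezJoin.getSplitCount());     // prints "Tez map join      -> 8"
  }
}
```

Because the answer depends entirely on which algorithm object the join holds when the query is made, a caller that expects a Tez-computed value but reaches the default algorithm gets null instead of a number.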
This happens for the following join:
org.apache.calcite.plan.RelOptUtil.toString(join) (java.lang.String):

    HiveJoin(condition=[=($1, $3)], joinType=[inner], algorithm=[none], cost=[not available])
      HiveProject(cs_sold_date_sk=[$0], cs_bill_customer_sk=[$3], cs_sales_price=[$21])
        HiveTableScan(table=[[tpcds_bin_orc_200.catalog_sales]])
      HiveJoin(condition=[=($1, $2)], joinType=[inner], algorithm=[MapJoin], cost=[{2400000.0 rows, 6.400008E11 cpu, 1294.6098 io}])
        HiveProject(c_customer_sk=[$0], c_current_addr_sk=[$4])
          HiveTableScan(table=[[tpcds_bin_orc_200.customer]])
        HiveProject(ca_address_sk=[$0], ca_state=[$8], ca_zip=[$9])
          HiveTableScan(table=[[tpcds_bin_orc_200.customer_address]])
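The issue appears to start very early, in LoptOptimizeJoinRule.addFactorToTree, when calling:

    if (pushDownTree != null) {
      costPushDown = RelMetadataQuery.getCumulativeCost(pushDownTree.getJoinTree());
    }

At this point pushDownTree.getJoinTree().joinAlgorithm is HiveOnTezCostModel$TezMapJoinAlgorithm, so the cost is computed by TezMapJoinAlgorithm.getCost, which asks for the split count and ends up in HiveDefaultCostModel$DefaultJoinAlgorithm.getSplitCount (see the call stack below).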
Call stack:

    HiveDefaultCostModel$DefaultJoinAlgorithm.getSplitCount(HiveJoin) line: 114
    HiveJoin.getSplitCount() line: 136
    HiveRelMdParallelism.splitCount(HiveJoin) line: 63
    NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
    NativeMethodAccessorImpl.invoke(Object, Object[]) line: 57
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    ReflectiveRelMetadataProvider$1$1.invoke(Object, Method, Object[]) line: 182
    $Proxy46.splitCount() line: not available
    GeneratedMethodAccessor26.invoke(Object, Object[]) line: not available
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(Object, Method, Object[]) line: 109
    $Proxy46.splitCount() line: not available
    GeneratedMethodAccessor26.invoke(Object, Object[]) line: not available
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(Object, Method, Object[]) line: 109
    $Proxy46.splitCount() line: not available
    GeneratedMethodAccessor26.invoke(Object, Object[]) line: not available
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    CachingRelMetadataProvider$CachingInvocationHandler.invoke(Object, Method, Object[]) line: 132
    $Proxy46.splitCount() line: not available
    RelMetadataQuery.splitCount(RelNode) line: 401
    HiveOnTezCostModel$TezMapJoinAlgorithm.getCost(HiveJoin) line: 255
    HiveOnTezCostModel(HiveCostModel).getJoinCost(HiveJoin) line: 64
    HiveRelMdCost.getNonCumulativeCost(HiveJoin) line: 56
    NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
    NativeMethodAccessorImpl.invoke(Object, Object[]) line: 57
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    ReflectiveRelMetadataProvider$1$1.invoke(Object, Method, Object[]) line: 182
    $Proxy41.getNonCumulativeCost() line: not available
    GeneratedMethodAccessor22.invoke(Object, Object[]) line: not available
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(Object, Method, Object[]) line: 109
    $Proxy41.getNonCumulativeCost() line: not available
    GeneratedMethodAccessor22.invoke(Object, Object[]) line: not available
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(Object, Method, Object[]) line: 109
    $Proxy41.getNonCumulativeCost() line: not available
    GeneratedMethodAccessor22.invoke(Object, Object[]) line: not available
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(Object, Method, Object[]) line: 109
    $Proxy41.getNonCumulativeCost() line: not available
    GeneratedMethodAccessor22.invoke(Object, Object[]) line: not available
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    CachingRelMetadataProvider$CachingInvocationHandler.invoke(Object, Method, Object[]) line: 132
    $Proxy41.getNonCumulativeCost() line: not available
    RelMetadataQuery.getNonCumulativeCost(RelNode) line: 115
    HiveRelMdDistinctRowCount.getCumulativeCost(HiveJoin) line: 114
    NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
    NativeMethodAccessorImpl.invoke(Object, Object[]) line: 57
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    ReflectiveRelMetadataProvider$1$1.invoke(Object, Method, Object[]) line: 182
    $Proxy40.getCumulativeCost() line: not available
    GeneratedMethodAccessor21.invoke(Object, Object[]) line: not available
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(Object, Method, Object[]) line: 109
    $Proxy40.getCumulativeCost() line: not available
    GeneratedMethodAccessor21.invoke(Object, Object[]) line: not available
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    ChainedRelMetadataProvider$ChainedInvocationHandler.invoke(Object, Method, Object[]) line: 109
    $Proxy40.getCumulativeCost() line: not available
    GeneratedMethodAccessor21.invoke(Object, Object[]) line: not available
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
    Method.invoke(Object, Object...) line: 606
    CachingRelMetadataProvider$CachingInvocationHandler.invoke(Object, Method, Object[]) line: 132
    $Proxy40.getCumulativeCost() line: not available
    RelMetadataQuery.getCumulativeCost(RelNode) line: 101
    LoptOptimizeJoinRule.addFactorToTree(LoptMultiJoin, LoptSemiJoinOptimizer, LoptJoinTree, int, BitSet, List<RexNode>, boolean) line: 940
    LoptOptimizeJoinRule.createOrdering(LoptMultiJoin, LoptSemiJoinOptimizer, int) line: 726
    LoptOptimizeJoinRule.findBestOrderings(LoptMultiJoin, LoptSemiJoinOptimizer, RelOptRuleCall) line: 458
    LoptOptimizeJoinRule.onMatch(RelOptRuleCall) line: 128
    HepPlanner(AbstractRelOptPlanner).fireRule(RelOptRuleCall) line: 326
    HepPlanner.applyRule(RelOptRule, HepRelVertex, boolean) line: 515
    HepPlanner.applyRules(Collection<RelOptRule>, boolean) line: 392
    HepPlanner.executeInstruction(HepInstruction$RuleInstance) line: 255
    HepInstruction$RuleInstance.execute(HepPlanner) line: 125
    HepPlanner.executeProgram(HepProgram) line: 207
    HepPlanner.findBestExp() line: 194
    CalcitePlanner$CalcitePlannerAction.apply(RelOptCluster, RelOptSchema, SchemaPlus) line: 849
    CalcitePlanner$CalcitePlannerAction.apply(RelOptCluster, RelOptSchema, SchemaPlus) line: 761
    Frameworks$1.apply(RelOptCluster, RelOptSchema, SchemaPlus, CalciteServerStatement) line: 109
    CalcitePrepareImpl.perform(CalciteServerStatement, PrepareAction<R>) line: 730
    Frameworks.withPrepare(PrepareAction<R>) line: 145
    Frameworks.withPlanner(PlannerAction<R>, FrameworkConfig) line: 105
    CalcitePlanner.getOptimizedAST() line: 602
    CalcitePlanner.genOPTree(ASTNode, SemanticAnalyzer$PlannerContext) line: 240
    CalcitePlanner(SemanticAnalyzer).analyzeInternal(ASTNode, SemanticAnalyzer$PlannerContext) line: 10003
    CalcitePlanner.analyzeInternal(ASTNode) line: 203
    CalcitePlanner(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 224
    ExplainSemanticAnalyzer.analyzeInternal(ASTNode) line: 74
    ExplainSemanticAnalyzer(BaseSemanticAnalyzer).analyze(ASTNode, Context) line: 224
    Driver.compile(String, boolean) line: 424
    Driver.compile(String) line: 308
    Driver.compileInternal(String) line: 1122
    Driver.runInternal(String, boolean) line: 1170
    Driver.run(String, boolean) line: 1059
    Driver.run(String) line: 1049
    CliDriver.processLocalCmd(String, CommandProcessor, CliSessionState) line: 213
    CliDriver.processCmd(String) line: 165
    CliDriver.processLine(String, boolean) line: 376
    CliDriver.executeDriver(CliSessionState, HiveConf, OptionsProcessor) line: 736
    CliDriver.run(String[]) line: 681
    CliDriver.main(String[]) line: 621
    NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
    NativeMethodAccessorImpl.invoke(Object, Object[]) line: 57
    DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43
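Stripped of the reflection and metadata-proxy frames, the trace above reduces to a short chain: cumulative cost, then the Tez map-join cost, then a split-count lookup, then the default algorithm returning null. The Java sketch below replays that chain with hypothetical stand-ins (CostChainSketch, cumulativeCost, splitCount and the FRAMES recorder are illustrative names, not Hive or Calcite APIs). The report does not show which HiveJoin reaches splitCount; modeling it as a child join that still carries the default algorithm is an assumption made only so the sketch runs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical replay of the reported chain:
 *   cumulative cost -> Tez map-join cost -> split-count lookup -> default algorithm -> null.
 * Names echo the stack frames, but none of this is Hive or Calcite code.
 */
public class CostChainSketch {

  /** Each stand-in method records its name here when entered (outermost first). */
  static final List<String> FRAMES = new ArrayList<>();

  interface JoinAlgorithm {
    double getCost(Join join);
    Integer getSplitCount(Join join);
  }

  static final class DefaultJoinAlgorithm implements JoinAlgorithm {
    @Override
    public double getCost(Join join) {
      FRAMES.add("DefaultJoinAlgorithm.getCost");
      return 0.0; // placeholder cost
    }

    @Override
    public Integer getSplitCount(Join join) {
      FRAMES.add("DefaultJoinAlgorithm.getSplitCount");
      return null; // behavior reported in the issue
    }
  }

  static final class TezMapJoinAlgorithm implements JoinAlgorithm {
    @Override
    public double getCost(Join join) {
      FRAMES.add("TezMapJoinAlgorithm.getCost");
      // ASSUMPTION: the split count requested here belongs to a join that still
      // carries the default algorithm (modeled as this join's input).
      Integer splits = splitCount(join.input);
      // NaN stands for "no usable split count"; a placeholder, not Hive's behavior.
      return splits == null ? Double.NaN : splits;
    }

    @Override
    public Integer getSplitCount(Join join) {
      FRAMES.add("TezMapJoinAlgorithm.getSplitCount");
      return 1; // placeholder value
    }
  }

  /** Stand-in for HiveJoin with a single input, to keep the sketch small. */
  static final class Join {
    final JoinAlgorithm algorithm;
    final Join input;

    Join(JoinAlgorithm algorithm, Join input) {
      this.algorithm = algorithm;
      this.input = input;
    }
  }

  /** Stand-in for the metadata split-count query: dispatches on the join's own algorithm. */
  static Integer splitCount(Join join) {
    FRAMES.add("splitCount");
    return join.algorithm.getSplitCount(join);
  }

  /** Stand-in for getCumulativeCost(pushDownTree.getJoinTree()); child costs are omitted. */
  static double cumulativeCost(Join join) {
    FRAMES.add("cumulativeCost");
    return join.algorithm.getCost(join);
  }

  /** Recorded frames innermost first, the order a debugger shows them. */
  static List<String> innermostFirst() {
    List<String> copy = new ArrayList<>(FRAMES);
    Collections.reverse(copy);
    return copy;
  }

  public static void main(String[] args) {
    Join child = new Join(new DefaultJoinAlgorithm(), null);
    Join pushDownJoin = new Join(new TezMapJoinAlgorithm(), child);
    double cost = cumulativeCost(pushDownJoin);
    innermostFirst().forEach(System.out::println);
    System.out.println("cost = " + cost);
  }
}
```

Printing the recorded frames innermost first gives DefaultJoinAlgorithm.getSplitCount, splitCount, TezMapJoinAlgorithm.getCost, cumulativeCost, which matches the order of the corresponding frames in the trace above. The NaN cost only marks that no split count was available; it is not what Hive computes.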