SPARK-23799: [CBO] FilterEstimation.evaluateInSet produces division by zero in the case of an empty table with analyzed statistics


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.1, 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: Optimizer, SQL
    • Labels: None

    Description

      Spark 2.2.1 and 2.3.0 can throw the NumberFormatException shown below while planning queries that use previously analyzed Hive tables.
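
      A hedged repro sketch follows; it is inferred from the stack trace below (which runs through CostBasedJoinReorder) rather than taken from the original workload, and all table and column names are illustrative:

      // spark-shell sketch, assuming a Hive-enabled SparkSession named `spark`.
      spark.sql("SET spark.sql.cbo.enabled=true")
      spark.sql("SET spark.sql.cbo.joinReorder.enabled=true")
      // Empty tables with analyzed statistics: rowCount = 0 and
      // distinctCount(id) = 0 are recorded in the catalog.
      Seq("t1", "t2", "t3").foreach { t =>
        spark.sql(s"CREATE TABLE $t (id INT) STORED AS PARQUET")
        spark.sql(s"ANALYZE TABLE $t COMPUTE STATISTICS FOR COLUMNS id")
      }
      // Join reordering requests statistics for the IN filter; estimating it
      // against distinctCount = 0 divides by zero and yields NaN.
      spark.sql("""
        SELECT * FROM t1
        JOIN t2 ON t1.id = t2.id
        JOIN t3 ON t2.id = t3.id
        WHERE t1.id IN (1, 2, 3)""").show()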

      The NumberFormatException occurs because the method calculateFilterSelectivity, invoked from FilterEstimation.scala at lines 50 and 52, returns NaN as the result of a division by zero. Converting that NaN from Double to BigDecimal then throws the NumberFormatException.

      The NaN itself is produced by a division by zero in the evaluateInSet method.
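
      The failing arithmetic can be sketched as follows; this is a simplified illustration of what evaluateInSet computes, not a verbatim excerpt, and the IN-list values are made up:

      // Distinct count from the column statistics of an empty, analyzed table.
      val ndv = BigInt(0)
      // Values of the IN predicate (illustrative).
      val hSet = Set(1, 2, 3)
      val newNdv = ndv.min(BigInt(hSet.size))       // 0
      val percent = newNdv.toDouble / ndv.toDouble  // 0.0 / 0.0 == Double.NaN
      // The selectivity is later converted to BigDecimal via the implicit
      // double2bigDecimal; scala.math.BigDecimal rejects NaN with
      // java.lang.NumberFormatException, exactly as in the trace below.
      val selectivity: BigDecimal = percent         // throws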

      Exception:

      java.lang.NumberFormatException
      at java.math.BigDecimal.<init>(BigDecimal.java:494)
      at java.math.BigDecimal.<init>(BigDecimal.java:824)
      at scala.math.BigDecimal$.decimal(BigDecimal.scala:52)
      at scala.math.BigDecimal$.decimal(BigDecimal.scala:55)
      at scala.math.BigDecimal$.double2bigDecimal(BigDecimal.scala:343)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.FilterEstimation.estimate(FilterEstimation.scala:52)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitFilter(BasicStatsPlanVisitor.scala:43)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitFilter(BasicStatsPlanVisitor.scala:25)
      at org.apache.spark.sql.catalyst.plans.logical.LogicalPlanVisitor$class.visit(LogicalPlanVisitor.scala:30)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visit(BasicStatsPlanVisitor.scala:25)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:35)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:33)
      at scala.Option.getOrElse(Option.scala:121)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$class.stats(LogicalPlanStats.scala:33)
      at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.stats(LogicalPlan.scala:30)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils$$anonfun$rowCountsExist$1.apply(EstimationUtils.scala:32)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils$$anonfun$rowCountsExist$1.apply(EstimationUtils.scala:32)
      at scala.collection.IndexedSeqOptimized$class.prefixLengthImpl(IndexedSeqOptimized.scala:38)
      at scala.collection.IndexedSeqOptimized$class.forall(IndexedSeqOptimized.scala:43)
      at scala.collection.mutable.WrappedArray.forall(WrappedArray.scala:35)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils$.rowCountsExist(EstimationUtils.scala:32)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.ProjectEstimation$.estimate(ProjectEstimation.scala:27)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitProject(BasicStatsPlanVisitor.scala:63)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visitProject(BasicStatsPlanVisitor.scala:25)
      at org.apache.spark.sql.catalyst.plans.logical.LogicalPlanVisitor$class.visit(LogicalPlanVisitor.scala:37)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.BasicStatsPlanVisitor$.visit(BasicStatsPlanVisitor.scala:25)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:35)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$$anonfun$stats$1.apply(LogicalPlanStats.scala:33)
      at scala.Option.getOrElse(Option.scala:121)
      at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats$class.stats(LogicalPlanStats.scala:33)
      at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.stats(LogicalPlan.scala:30)
      at org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$$anonfun$2.apply(CostBasedJoinReorder.scala:64)
      at org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$$anonfun$2.apply(CostBasedJoinReorder.scala:64)
      at scala.collection.LinearSeqOptimized$class.forall(LinearSeqOptimized.scala:83)
      at scala.collection.immutable.List.forall(List.scala:84)
      at org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$.org$apache$spark$sql$catalyst$optimizer$CostBasedJoinReorder$$reorder(CostBasedJoinReorder.scala:64)
      at org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$$anonfun$1.applyOrElse(CostBasedJoinReorder.scala:46)
      at org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$$anonfun$1.applyOrElse(CostBasedJoinReorder.scala:43)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
      at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
      at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
      at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
      at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
      at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
      at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
      at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
      at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4$$anonfun$apply$11.apply(TreeNode.scala:335)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.immutable.List.foreach(List.scala:392)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
      at scala.collection.immutable.List.map(List.scala:296)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
      at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
      at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4$$anonfun$apply$11.apply(TreeNode.scala:335)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
      at scala.collection.immutable.List.foreach(List.scala:392)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
      at scala.collection.immutable.List.map(List.scala:296)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
      at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
      at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
      at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
      at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
      at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
      at org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$.apply(CostBasedJoinReorder.scala:43)
      at org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder$.apply(CostBasedJoinReorder.scala:35)
      at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
      at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
      at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
      at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
      at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
      at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
      at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
      at scala.collection.immutable.List.foreach(List.scala:392)
      at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)


    People

        Assignee: mshtelma (Michael Shtelma)
        Reporter: mshtelma (Michael Shtelma)
        Votes: 0
        Watchers: 3
