Spark / SPARK-1914

Simplify CountFunction so that it does not traverse and evaluate all child expressions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.1, 1.1.0
    • Component/s: SQL
    • Labels: None

    Description

      CountFunction should count up only if the child's evaluated value is not null.

      However, it currently traverses the expression tree and evaluates every child expression, so it counts a row whenever any one of those children evaluates to a non-null value, even when the child expression itself evaluates to null.
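      The difference can be illustrated with a toy model. The sketch below is a hypothetical, minimal stand-in for an expression tree; the names `Expr`, `Attr`, `Lit`, `Add`, `allNodes`, `countBuggy`, and `countFixed` are illustrative only and are not Spark's Catalyst classes. `countBuggy` counts a row if any node in the tree evaluates to non-null (the behavior described above), while `countFixed` looks only at the value of the expression itself.

```scala
// Hypothetical toy expression tree; NOT Spark's Catalyst API.
sealed trait Expr {
  def children: Seq[Expr]
  def eval(row: Map[String, Any]): Any
  // Pre-order traversal: this node followed by all of its descendants.
  def allNodes: Seq[Expr] = this +: children.flatMap(_.allNodes)
}

final case class Attr(name: String) extends Expr {
  def children: Seq[Expr] = Nil
  def eval(row: Map[String, Any]): Any = row.getOrElse(name, null)
}

final case class Lit(value: Any) extends Expr {
  def children: Seq[Expr] = Nil
  def eval(row: Map[String, Any]): Any = value
}

final case class Add(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
  // SQL-style null propagation: if either side is null, the sum is null.
  def eval(row: Map[String, Any]): Any =
    (left.eval(row), right.eval(row)) match {
      case (l: Int, r: Int) => l + r
      case _                => null
    }
}

// Buggy model: counts a row if ANY node in the tree is non-null.
def countBuggy(expr: Expr, rows: Seq[Map[String, Any]]): Int =
  rows.count(row => expr.allNodes.map(_.eval(row)).exists(_ != null))

// Fixed model: counts a row only if the expression itself is non-null.
def countFixed(expr: Expr, rows: Seq[Map[String, Any]]): Int =
  rows.count(row => expr.eval(row) != null)

// COUNT(key + 1) over one row with a non-null key and one with a null key.
val keyPlusOne = Add(Attr("key"), Lit(1))
val rows: Seq[Map[String, Any]] =
  Seq(Map[String, Any]("key" -> 5), Map[String, Any]("key" -> null))

println(s"buggy=${countBuggy(keyPlusOne, rows)} fixed=${countFixed(keyPlusOne, rows)}")
// prints "buggy=2 fixed=1"
```

      In the null-key row, `key + 1` evaluates to null, but the literal `1` in the tree does not, so the traversing version still counts that row.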

      To reproduce this bug in sbt hive/console:

      scala> hql("SELECT COUNT(*) FROM src1").collect()
      res1: Array[org.apache.spark.sql.Row] = Array([25])
      
      scala> hql("SELECT COUNT(*) FROM src1 WHERE key IS NULL").collect()
      res2: Array[org.apache.spark.sql.Row] = Array([10])
      
      scala> hql("SELECT COUNT(key + 1) FROM src1").collect()
      res3: Array[org.apache.spark.sql.Row] = Array([25])
      

      res3 should be 15: key + 1 evaluates to null for each of the 10 rows with a null key, so only 25 - 10 = 15 rows should be counted.
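      The expected counts follow from SQL null propagation. A minimal Scala sketch, assuming a hypothetical in-memory stand-in for `src1` in which `Option[Int]` models a nullable `key` column; only the row counts (25 total, 10 null keys) come from the transcript, and the key values themselves are made up:

```scala
// Hypothetical stand-in for src1: 15 non-null keys and 10 null keys.
val keys: Seq[Option[Int]] =
  (1 to 15).map(i => Option(i)) ++ Seq.fill(10)(Option.empty[Int])

val countStar = keys.size                  // COUNT(*)
val countNullKeys = keys.count(_.isEmpty)  // COUNT(*) ... WHERE key IS NULL
// key + 1 is None (null) whenever key is None, so those rows are not counted.
val countKeyPlusOne = keys.map(_.map(_ + 1)).count(_.isDefined)

println(s"$countStar $countNullKeys $countKeyPlusOne")
// prints "25 10 15"
```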


          People

            Assignee: Takuya Ueshin
            Reporter: Takuya Ueshin
            Votes: 0
            Watchers: 2
