Spark / SPARK-1914

Simplify CountFunction not to traverse to evaluate all child expressions.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.1, 1.1.0
    • Component/s: SQL
    • Labels:
      None

      Description

      CountFunction should count up only if the child's evaluated value is not null.

      Because it currently traverses and evaluates every expression in the child's tree, it counts a row whenever any one of those sub-expressions is not null, even if the child itself evaluates to null.

      To reproduce this bug in sbt hive/console:

      scala> hql("SELECT COUNT(*) FROM src1").collect()
      res1: Array[org.apache.spark.sql.Row] = Array([25])
      
      scala> hql("SELECT COUNT(*) FROM src1 WHERE key IS NULL").collect()
      res2: Array[org.apache.spark.sql.Row] = Array([10])
      
      scala> hql("SELECT COUNT(key + 1) FROM src1").collect()
      res3: Array[org.apache.spark.sql.Row] = Array([25])
      

      res3 should be 15: 10 of the 25 rows have a null key, so key + 1 evaluates to null for those rows and they should not be counted.
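
      The difference between the two counting rules can be sketched with a toy expression tree. This is a minimal illustration, not Spark's code: Expr, Attr, Lit, Add, buggyCount and fixedCount are all hypothetical names, and the 15/10 split of rows is taken from the reproduction above.

```scala
object CountSketch {
  // Hypothetical, simplified expression model (not Catalyst's classes).
  sealed trait Expr {
    def children: Seq[Expr]
    def eval(row: Map[String, Any]): Any
  }
  case class Attr(name: String) extends Expr {
    val children: Seq[Expr] = Nil
    def eval(row: Map[String, Any]): Any = row.getOrElse(name, null)
  }
  case class Lit(value: Any) extends Expr {
    val children: Seq[Expr] = Nil
    def eval(row: Map[String, Any]): Any = value
  }
  case class Add(left: Expr, right: Expr) extends Expr {
    val children: Seq[Expr] = Seq(left, right)
    // SQL-style null propagation: a null operand makes the sum null.
    def eval(row: Map[String, Any]): Any =
      (left.eval(row), right.eval(row)) match {
        case (l: Int, r: Int) => l + r
        case _                => null
      }
  }

  // The expression itself followed by all of its descendants.
  def flatten(e: Expr): Seq[Expr] = e +: e.children.flatMap(flatten)

  // Behaviour described in this issue: a row is counted if ANY
  // sub-expression in the tree evaluates to a non-null value.
  def buggyCount(child: Expr, rows: Seq[Map[String, Any]]): Long =
    rows.count(row => flatten(child).exists(_.eval(row) != null)).toLong

  // Intended behaviour: count only if the child's own value is non-null.
  def fixedCount(child: Expr, rows: Seq[Map[String, Any]]): Long =
    rows.count(row => child.eval(row) != null).toLong

  // 15 rows with a key and 10 rows with a null key, as in src1 above.
  val rows: Seq[Map[String, Any]] =
    (1 to 15).map(i => Map[String, Any]("key" -> i)) ++
      Seq.fill(10)(Map[String, Any]("key" -> null))

  val expr: Expr = Add(Attr("key"), Lit(1)) // models COUNT(key + 1)

  def main(args: Array[String]): Unit = {
    println(buggyCount(expr, rows)) // 25: the literal 1 is never null
    println(fixedCount(expr, rows)) // 15
  }
}
```

      In this sketch the literal 1 is always non-null, which is why the tree-wide check counts every row; checking only the top-level value of key + 1 gives the expected 15.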

            People

            • Assignee: Takuya Ueshin (ueshin)
            • Reporter: Takuya Ueshin (ueshin)
