Spark / SPARK-46743

COUNT bug in correlated scalar subquery when using a TEMPORARY VIEW instead of a table

    Description

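      With temporary views, a correlated scalar subquery containing COUNT hits the COUNT bug: it returns NULL instead of 0 for outer rows that have no matching inner rows. The same query against tables returns 0 as expected.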

      With a table (correct result, 0 for every row):

      scala> spark.sql("""CREATE TABLE outer_table USING parquet AS SELECT * FROM VALUES
           |     (1, 1),
           |     (2, 1),
           |     (3, 3),
           |     (6, 6),
           |     (7, 7),
           |     (9, 9) AS inner_table(a, b)""")
      
      val res6: org.apache.spark.sql.DataFrame = []
      
      scala> spark.sql("CREATE TABLE null_table USING parquet AS SELECT CAST(null AS int) AS a, CAST(null as int) AS b ;")
      
      val res7: org.apache.spark.sql.DataFrame = []
      
      scala> spark.sql("""SELECT ( SELECT COUNT(null_table.a) AS aggAlias FROM null_table WHERE null_table.a = outer_table.a) FROM outer_table""").collect()
      
      val res8: Array[org.apache.spark.sql.Row] = Array([0], [0], [0], [0], [0], [0]) 

      With a temporary view (incorrect result, NULL for every row):

       

      spark.sql("CREATE TEMPORARY VIEW outer_view(a, b) AS VALUES (1, 1), (2, 1),(3, 3), (6, 6), (7, 7), (9, 9);")
      
      spark.sql("CREATE TEMPORARY VIEW null_view(a, b) AS SELECT CAST(null AS int), CAST(null as int);")
      
      spark.sql("""SELECT ( SELECT COUNT(null_view.a) AS aggAlias FROM null_view WHERE null_view.a = outer_view.a) FROM outer_view""").collect()
      
      val res2: Array[org.apache.spark.sql.Row] = Array([null], [null], [null], [null], [null], [null])

       

       

          People

            andyylam Andy Lam