SPARK-37569

View Analysis incorrectly marks nested fields as nullable


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      Consider the following view, in which every field is non-nullable (required):

      spark.sql("""
          CREATE OR REPLACE VIEW v2 AS
          SELECT id, named_struct('a', id) AS nested
          FROM RANGE(10)
      """)
      

      The catalog entry shows that the view schema was correctly stored with every field non-nullable:

      scala> System.out.println(spark.sessionState.catalog.externalCatalog.getTable("default", "v2"))
      CatalogTable(
      Database: default
      Table: v2
      Owner: smahadik
      Created Time: Tue Dec 07 09:00:42 PST 2021
      Last Access: UNKNOWN
      Created By: Spark 3.3.0-SNAPSHOT
      Type: VIEW
      View Text: SELECT id, named_struct('a', id) AS nested
          FROM RANGE(10)
      View Original Text: SELECT id, named_struct('a', id) AS nested
          FROM RANGE(10)
      View Catalog and Namespace: spark_catalog.default
      View Query Output Columns: [id, nested]
      Table Properties: [transient_lastDdlTime=1638896442]
      Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
      OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
      Storage Properties: [serialization.format=1]
      Schema: root
       |-- id: long (nullable = false)
       |-- nested: struct (nullable = false)
       |    |-- a: long (nullable = false)
      )
      

      However, reading the view incorrectly marks the nested field a as nullable:

      scala> spark.table("v2").printSchema
      root
       |-- id: long (nullable = false)
       |-- nested: struct (nullable = false)
       |    |-- a: long (nullable = true)
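
The mismatch between the two schemas above can also be located programmatically. The following is a minimal sketch, not part of Spark: nullabilityDiff is a hypothetical helper that walks two schemas and reports every field that is non-nullable in the first but nullable in the second. It only descends into structs, and the two schemas are transcribed by hand from the output shown above.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (not a Spark API): returns the dotted paths of fields
// that are non-nullable in `expected` but nullable in `actual`.
// Only struct nesting is handled; arrays and maps are ignored in this sketch.
def nullabilityDiff(expected: DataType, actual: DataType, path: String = ""): Seq[String] =
  (expected, actual) match {
    case (e: StructType, a: StructType) =>
      e.fields.toSeq.flatMap { ef =>
        // Match fields by name; fields missing from `actual` are skipped.
        a.fields.find(_.name == ef.name).toSeq.flatMap { af =>
          val p = if (path.isEmpty) ef.name else s"$path.${ef.name}"
          val here = if (!ef.nullable && af.nullable) Seq(p) else Seq.empty[String]
          here ++ nullabilityDiff(ef.dataType, af.dataType, p)
        }
      }
    case _ => Seq.empty[String]
  }

// Schema as stored in the catalog (transcribed from the CatalogTable output).
val stored = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("nested",
    StructType(Seq(StructField("a", LongType, nullable = false))),
    nullable = false)))

// Schema as reported by spark.table("v2").printSchema.
val resolved = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("nested",
    StructType(Seq(StructField("a", LongType, nullable = true))),
    nullable = false)))

val diff = nullabilityDiff(stored, resolved)
diff.foreach(println)
```

In a live session, the same helper could be given the catalog table's schema and spark.table("v2").schema instead of the hand-built values.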
      

      This is caused by this line in Analyzer.scala. Going through the history of changes for this block of code, it seems that asNullable is a remnant of a time before checks were added to ensure that the from and to types of the cast are compatible. Since nullability is already checked, it should be safe to add the cast without converting the target data type to nullable.
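
To illustrate why a nullable cast target produces the schema seen above, here is a rough sketch of what relaxing nullability does to a nested type. relaxNullability is a hypothetical re-implementation written for this example; Spark's own asNullable appears to be internal and may differ in detail. It does not reproduce the analyzer code itself.

```scala
import org.apache.spark.sql.types._

// Hypothetical stand-in for asNullable, for illustration only:
// recursively marks every nested field, element, and map value as nullable.
def relaxNullability(dt: DataType): DataType = dt match {
  case StructType(fields) =>
    StructType(fields.map { f =>
      f.copy(dataType = relaxNullability(f.dataType), nullable = true)
    })
  case ArrayType(elementType, _) =>
    ArrayType(relaxNullability(elementType), containsNull = true)
  case MapType(keyType, valueType, _) =>
    MapType(relaxNullability(keyType), relaxNullability(valueType), valueContainsNull = true)
  case other => other
}

// The type of the `nested` column as stored in the view schema.
val nested = StructType(Seq(StructField("a", LongType, nullable = false)))

// The cast target after relaxing: field `a` is now nullable.
val relaxed = relaxNullability(nested).asInstanceOf[StructType]

println(nested("a").nullable)
println(relaxed("a").nullable)
```

Because the relaxed type becomes the data type of the cast, the inner field a ends up nullable in the resolved plan even though the stored view schema declares it as required.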

          People

            Assignee: Shardul Mahadik (shardulm)
            Reporter: Shardul Mahadik (shardulm)