Spark / SPARK-37569

View Analysis incorrectly marks nested fields as nullable


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      Consider the following view, in which all fields are non-nullable (required):

      spark.sql("""
          CREATE OR REPLACE VIEW v2 AS
          SELECT id, named_struct('a', id) AS nested
          FROM RANGE(10)
      """)
      

      We can see that the view schema is correctly stored as non-nullable:

      scala> System.out.println(spark.sessionState.catalog.externalCatalog.getTable("default", "v2"))
      CatalogTable(
      Database: default
      Table: v2
      Owner: smahadik
      Created Time: Tue Dec 07 09:00:42 PST 2021
      Last Access: UNKNOWN
      Created By: Spark 3.3.0-SNAPSHOT
      Type: VIEW
      View Text: SELECT id, named_struct('a', id) AS nested
          FROM RANGE(10)
      View Original Text: SELECT id, named_struct('a', id) AS nested
          FROM RANGE(10)
      View Catalog and Namespace: spark_catalog.default
      View Query Output Columns: [id, nested]
      Table Properties: [transient_lastDdlTime=1638896442]
      Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
      OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
      Storage Properties: [serialization.format=1]
      Schema: root
       |-- id: long (nullable = false)
       |-- nested: struct (nullable = false)
       |    |-- a: long (nullable = false)
      )
      

      However, reading this view incorrectly marks the nested column a as nullable:

      scala> spark.table("v2").printSchema
      root
       |-- id: long (nullable = false)
       |-- nested: struct (nullable = false)
       |    |-- a: long (nullable = true)
      

      This is caused by this line in Analyzer.scala. Judging from the history of changes to this block of code, asNullable appears to be a remnant of a time before checks were added to ensure that the source and target types of the cast are compatible. Since nullability is already checked, it should be safe to add the cast without converting the target data type to nullable.
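
      The effect of asNullable on a nested type can be sketched with a small toy model. Everything below (DType, Field, StructT, and this asNullable function) is a hypothetical stand-in and not Spark's own classes. It assumes, as the schemas above suggest, that asNullable recursively marks every nested field as nullable, so a cast targeting that type drops the non-null guarantee on a.

```scala
object NullabilitySketch {
  // Toy schema model; these are NOT Spark's DataType classes.
  sealed trait DType
  case object LongT extends DType
  final case class Field(name: String, dtype: DType, nullable: Boolean)
  final case class StructT(fields: List[Field]) extends DType

  // Assumed behavior: recursively mark every nested field as nullable.
  def asNullable(dt: DType): DType = dt match {
    case StructT(fields) =>
      StructT(fields.map(f => f.copy(dtype = asNullable(f.dtype), nullable = true)))
    case other => other
  }

  // Stored type of the `nested` column: struct with a non-nullable long field `a`.
  val stored: DType = StructT(List(Field("a", LongT, nullable = false)))

  // Cast target as described in the report: nested field `a` becomes nullable.
  val castTargetBefore: DType = asNullable(stored)

  // Cast target proposed in the report: the stored type, unchanged.
  val castTargetAfter: DType = stored

  def main(args: Array[String]): Unit = {
    println(s"with asNullable:    $castTargetBefore")
    println(s"without asNullable: $castTargetAfter")
  }
}
```

      In this model only the fields inside the struct are affected, which matches the observation above that nested stays non-nullable while a does not.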

          People

            shardulm Shardul Mahadik