Spark / SPARK-35912

[SQL] JSON read behavior is different depending on the cache setting when nullable is false.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1.1
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      The following code reproduces the issue.

       

      import org.apache.spark.sql.Encoders
       
      case class TestSchema(x: Int, y: Int)
      case class BaseSchema(value: TestSchema)
       
      val schema = Encoders.product[BaseSchema].schema
      val testDS = Seq("""{"value":{"x":1}}""", """{"value":{"x":2}}""").toDS
      val jsonDS = spark.read.schema(schema).json(testDS)
      
      // Without caching, the missing field y is shown as null.
      jsonDS.show
      +---------+
      |    value|
      +---------+
      |{1, null}|
      |{2, null}|
      +---------+
      
      // After caching, the same missing field y is shown as 0.
      jsonDS.cache.show
      +------+
      | value|
      +------+
      |{1, 0}|
      |{2, 0}|
      +------+
      
      

       

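      This discrepancy occurs when the schema contains a nested StructType and a StructField inside it has nullable set to false. Here the field y is declared as Int, so the schema derived by Encoders.product marks it as non-nullable, yet the JSON input omits y. The uncached read shows null for y, while the cached read shows 0.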
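
      The nullable flags involved can be inspected directly. The following is a minimal sketch, assuming a spark-shell session like the one used in the reproduction. The names TestSchemaOpt, BaseSchemaOpt, inner, and innerOpt are hypothetical and introduced only for illustration. Declaring y as Option[Int] is one possible way to obtain a nullable field; it is not necessarily the fix applied in Spark itself.

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType

// Same case classes as in the reproduction above.
case class TestSchema(x: Int, y: Int)
case class BaseSchema(value: TestSchema)

// Hypothetical variant: y is optional, so the derived field is nullable.
case class TestSchemaOpt(x: Int, y: Option[Int])
case class BaseSchemaOpt(value: TestSchemaOpt)

val schema = Encoders.product[BaseSchema].schema
val schemaOpt = Encoders.product[BaseSchemaOpt].schema

// Pull out the nested struct that describes the "value" column.
val inner = schema("value").dataType.asInstanceOf[StructType]
val innerOpt = schemaOpt("value").dataType.asInstanceOf[StructType]

// Primitive Int fields are derived as non-nullable; Option[Int] as nullable.
println(s"y nullable (Int):         ${inner("y").nullable}")
println(s"y nullable (Option[Int]): ${innerOpt("y").nullable}")

// Print the full schema trees for comparison.
schema.printTreeString()
schemaOpt.printTreeString()
```

      Reading the same JSON input with schemaOpt in place of schema avoids declaring y as non-nullable in the first place, so the input no longer contradicts the schema.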

       

          People

            Assignee: Fu Chen (fchen)
            Reporter: Heedo Lee (Heedo)