Spark / SPARK-35929

Schema inference of nested structs defaults to map

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.3.0
    • Component/s: PySpark
    • Labels: None

    Description

      Inferring the schema for a column that holds nested dicts (structs) produces an incorrect schema, as shown below.

      data = [{"inside_struct": {"payment": 100.5, "name": "Lee"}}]
      df = spark.createDataFrame(data)
      df.show()
      +--------------------+
      |       inside_struct|
      +--------------------+
      |{name -> null, pa...|
      +--------------------+
      

      The "inside_struct" column is inferred as a map instead of a struct, and the "name" value inside it becomes null.
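A rough way to see why "name" is lost is to model the two possible inference rules in plain Python. The sketch below is a simplified, hypothetical model and not PySpark's actual code; the helper names `infer_scalar`, `infer_dict_as_map`, and `infer_dict_as_struct` are made up for illustration. It assumes the map rule takes a single value type from the first non-null entry of the dict.

```python
# Simplified, hypothetical model of dict handling during schema inference.
# This is NOT PySpark's source code; all names here are illustrative only.

def infer_scalar(value):
    """Map a Python scalar to a Spark-like type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def infer_dict_as_map(d):
    """Pick ONE key type and ONE value type from the first non-null entry."""
    for key, value in d.items():
        if key is not None and value is not None:
            return f"map<{infer_scalar(key)},{infer_scalar(value)}>"
    raise ValueError("no non-null entry to infer from")


def infer_dict_as_struct(d):
    """Give every key its own typed field, so mixed value types survive."""
    fields = ",".join(f"{k}:{infer_scalar(v)}" for k, v in d.items())
    return f"struct<{fields}>"


inside = {"payment": 100.5, "name": "Lee"}
as_map = infer_dict_as_map(inside)
as_struct = infer_dict_as_struct(inside)
print(as_map)     # map<string,double>
print(as_struct)  # struct<payment:double,name:string>
```

Under the map rule every value must fit the single value type (double here), so the string "Lee" has no valid representation, which is consistent with the null seen in the output above.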

      Schema inference may choose a map type whose value type is taken from the first field of the struct, so fields of any other type are lost. We should fix this.

          People

            Assignee: itholic Haejoon Lee
            Reporter: itholic Haejoon Lee
            Votes: 0
            Watchers: 3
