[SPARK-35929] Schema inference of nested structs defaults to map

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.3.0
    • Component/s: PySpark
    • Labels: None

    Description

      Inferring the schema of a nested struct column (a Python dict nested inside a row) produces the wrong schema, as shown below.

      data = [{"inside_struct": {"payment": 100.5, "name": "Lee"}}]
      df = spark.createDataFrame(data)
      df.show()
      +--------------------+
      |       inside_struct|
      +--------------------+
      |{name -> null, pa...|
      +--------------------+
      

      The "inside_struct" column is inferred as a map rather than a struct, and the "name" field inside it becomes null.

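      Schema inference can settle on a map type whose value type is taken from the first field of the nested dict. In this example that field is "payment", a double, so the string value of "name" does not fit the map's value type and comes out as null. Nested dicts like this should be inferred as structs instead; we should fix it.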
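
The behaviour described above can be illustrated with a small pure-Python toy model. This is only a sketch, not PySpark's actual inference code: the names `infer_type`, `as_map`, and the `nested_dict_as_struct` flag are hypothetical and exist only for illustration. It assumes that map-style inference looks at the first non-null entry of the dict, which is consistent with the output shown in this issue.

```python
# Toy model of the two inference strategies. NOT PySpark's implementation;
# all names here are hypothetical and used only for illustration.

def infer_type(value, nested_dict_as_struct=False):
    """Return a simplified type description for a Python value."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        if nested_dict_as_struct:
            # Struct-style inference: every field keeps its own type.
            fields = ", ".join(
                f"{k}: {infer_type(v, True)}" for k, v in value.items()
            )
            return f"struct<{fields}>"
        # Map-style inference (assumed): only the first non-null entry
        # decides the key type and the value type for the whole dict.
        for k, v in value.items():
            if k is not None and v is not None:
                return f"map<{infer_type(k)},{infer_type(v)}>"
        return "map<null,null>"
    raise TypeError(f"unsupported value: {value!r}")


# Python types accepted for each simplified value type in this toy model.
PY_TYPES = {"double": float, "string": str, "bigint": int}


def as_map(d, value_type):
    """Keep values that match the map's value type; others become None."""
    expected = PY_TYPES[value_type]
    return {k: (v if isinstance(v, expected) else None) for k, v in d.items()}


inside_struct = {"payment": 100.5, "name": "Lee"}

map_type = infer_type(inside_struct)
struct_type = infer_type(inside_struct, nested_dict_as_struct=True)
as_map_row = as_map(inside_struct, "double")

print(map_type)
print(struct_type)
print(as_map_row)
```

In this model the map-style path yields a single value type (double) for the whole dict, which is why "name" is lost, while the struct-style path keeps a separate type per field. Until a fix is available, passing an explicit schema to `spark.createDataFrame` avoids relying on inference at all.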

            People

              Assignee: Haejoon Lee (itholic)
              Reporter: Haejoon Lee (itholic)