Spark / SPARK-19716

Dataset should allow by-name resolution for struct type elements in array

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: SQL
    • Labels: None

    Description

      If we have a DataFrame with schema a: int, b: int, c: int and convert it to a Dataset with case class Data(a: Int, c: Int), it works: the `a` and `c` columns are extracted by name to build each Data.
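
A minimal sketch of the flat case described above, assuming a local SparkSession. The object name `FlatByName` and the sample row are made up for illustration; `Data` is the case class from the description.

```scala
import org.apache.spark.sql.SparkSession

// Target type from the description: only `a` and `c` are requested, `b` is omitted.
case class Data(a: Int, c: Int)

// Hypothetical wrapper object, not part of Spark.
object FlatByName {
  def run(): Seq[Data] = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("flat-by-name")
      .getOrCreate()
    import spark.implicits._

    try {
      // DataFrame with schema a: int, b: int, c: int (illustrative values).
      val df = Seq((1, 2, 3)).toDF("a", "b", "c")
      // Fields of Data are matched to columns by name; column `b` is not used.
      df.as[Data].collect().toSeq
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```

Note that the case classes are declared at the top level rather than inside a method, since Spark's implicit encoders generally need that to derive the schema.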

      However, if the struct is inside an array, e.g. the schema is arr: array<struct<a: int, b: int, c: int>> and we want to convert it to a Dataset with case class ComplexData(arr: Seq[Data]), the conversion fails. The reason is that, to allow compatible types (e.g. converting a: int to case class A(a: Long)), we add a cast for each field except struct type fields, because struct types are flexible and the number of columns can mismatch. We should probably also skip the cast for array and map types.
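
The nested case can be sketched the same way. This is an illustration under assumptions, not the test from the actual fix: `Full` and `NestedByName` are hypothetical names used only to build an input with schema arr: array<struct<a: int, b: int, c: int>>. On Spark versions that include this change (the ticket lists 2.2.0 as the fix version) the conversion is expected to succeed; on earlier versions it would be expected to fail during analysis.

```scala
import org.apache.spark.sql.SparkSession

// Case classes from the description.
case class Data(a: Int, c: Int)
case class ComplexData(arr: Seq[Data])

// Hypothetical helper: the full struct, including the extra field `b`.
case class Full(a: Int, b: Int, c: Int)

// Hypothetical wrapper object, not part of Spark.
object NestedByName {
  def run(): Seq[ComplexData] = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-by-name")
      .getOrCreate()
    import spark.implicits._

    try {
      // One row whose single column `arr` is an array of three-field structs.
      val df = Seq(Tuple1(Seq(Full(1, 2, 3), Full(4, 5, 6)))).toDF("arr")
      // Each array element should be resolved by name to Data(a, c).
      df.as[ComplexData].collect().toSeq
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```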

          People

            Assignee: Wenchen Fan (cloud_fan)
            Reporter: Wenchen Fan (cloud_fan)
            Votes: 2
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved: