  1. Spark
  2. SPARK-45891 Support Variant data type
  3. SPARK-49443

Implement to_variant_object expression and make schema_of_variant expressions print OBJECT for Variant Objects

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: SQL

    Description

      Casting structs to variant objects should not be legal: variant objects are unordered bags of key-value pairs, while structs are ordered sets of elements of fixed types. As a result, casts between structs and variant objects do not behave like casts between structs. Example (produced by Serge Rielau):

      scala> spark.sql("SELECT cast(named_struct('c', 1, 'b', '2') as struct<b int, c int>)").show()
      +------------------------+
      |named_struct(c, 1, b, 2)|
      +------------------------+
      |                  {1, 2}|
      +------------------------+

      Passing the struct through VARIANT loses the field positions:

      scala> spark.sql("SELECT cast(named_struct('c', 1, 'b', '2')::variant as struct<b int, c int>)").show()
      +-----------------------------------------+
      |CAST(named_struct(c, 1, b, 2) AS VARIANT)|
      +-----------------------------------------+
      |                                   {2, 1}|
      +-----------------------------------------+
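
The difference between the two outputs above can be reproduced with a toy model in plain Scala. This is only a sketch, not Spark's implementation: the object and function names are hypothetical, values are simplified to Int, and it assumes that a variant-to-struct cast resolves fields by key, which is consistent with the {2, 1} result shown above.

```scala
// Toy model in plain Scala (no Spark dependency). All names are hypothetical.
object StructVsVariantObject {
  // A struct value: an ordered sequence of (field name, value) pairs.
  type Struct = Seq[(String, Int)]

  // Struct-to-struct cast: fields are matched by position, only names change.
  def castStructToStruct(value: Struct, targetNames: Seq[String]): Struct = {
    require(value.length == targetNames.length, "field count mismatch")
    targetNames.zip(value.map(_._2))
  }

  // Struct-to-variant-object: field order is dropped, only key -> value remains.
  def toVariantObject(value: Struct): Map[String, Int] = value.toMap

  // Variant-object-to-struct cast (assumed): fields are looked up by key.
  def castObjectToStruct(obj: Map[String, Int], targetNames: Seq[String]): Struct =
    targetNames.map(name => name -> obj(name))

  def main(args: Array[String]): Unit = {
    val input: Struct = Seq("c" -> 1, "b" -> 2)
    val target = Seq("b", "c")
    // Positional match: b takes the first value, c the second (values 1, 2).
    println(castStructToStruct(input, target))
    // Key lookup after the round trip: b takes 2, c takes 1 (values 2, 1).
    println(castObjectToStruct(toVariantObject(input), target))
  }
}
```

The same input yields different field values depending on whether it passes through the unordered representation, which is why treating the two conversions as the same kind of cast is misleading.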
      

      Casts from maps to variant objects should also not be legal, since the two are completely orthogonal data types. A map can represent a variable number of key-value pairs with just one key type and one value type in its schema, whereas the schema of an object (as produced by schema_of_variant expressions) has a type for each value in the object. Objects can hold values of different types while maps cannot, and objects can only have string keys while maps can also have complex keys.
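
The schema contrast described above can be illustrated with a small toy model in plain Scala. The classes below are hypothetical and are not Spark's types, and the rendered strings are only illustrative of the idea (one key type and one value type for a map, one type per key for an object).

```scala
// Toy model in plain Scala; these are not Spark's classes and the
// rendered schema strings are only illustrative.
object MapVsObjectSchema {
  sealed trait Value
  final case class IntValue(v: Long) extends Value
  final case class StrValue(v: String) extends Value

  private def typeName(v: Value): String = v match {
    case IntValue(_) => "BIGINT"
    case StrValue(_) => "STRING"
  }

  // A map schema is fixed by one key type and one value type,
  // no matter how many entries the map holds.
  def mapSchema(entries: Map[String, IntValue]): String = "MAP<STRING, BIGINT>"

  // An object schema lists one type per key, so it changes with the contents.
  def objectSchema(fields: Map[String, Value]): String =
    fields.toSeq.sortBy(_._1)
      .map { case (key, value) => s"$key: ${typeName(value)}" }
      .mkString("OBJECT<", ", ", ">")
}
```

In this model, adding an entry to a map leaves its schema unchanged, while adding a key to an object adds a field to its schema, and an object may mix value types that a single map value type cannot express.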

      We should therefore prohibit the existing behavior of allowing explicit casts from structs and maps to variants, since the variant spec currently supports only an object type, which is only remotely compatible with structs and maps. Instead, we should introduce a new expression, `to_variant_object`, that converts data whose schema contains structs and maps to variants.

      Also, schema_of_variant and schema_of_variant_agg expressions currently print STRUCT when Variant Objects are observed. We should also correct that to OBJECT.
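
A minimal usage sketch of the two changes proposed here, assuming a Spark build that includes this fix (4.0.0 per the Fix Version) and spark-sql on the classpath. The object name, app name, and column aliases are hypothetical. Output is deliberately not shown, since the exact rendering depends on the Spark version.

```scala
// Sketch only: requires Spark 4.0.0 or later on the classpath.
import org.apache.spark.sql.SparkSession

object ToVariantObjectSketch {
  def main(args: Array[String]): Unit = {
    // In spark-shell a session named `spark` already exists; this builder
    // is only needed when running the sketch as a standalone program.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("to-variant-object-sketch") // hypothetical app name
      .getOrCreate()

    // Convert a struct to a variant object with the new expression
    // instead of an explicit cast to VARIANT.
    spark.sql(
      "SELECT to_variant_object(named_struct('c', 1, 'b', '2')) AS v"
    ).show(false)

    // Maps go through the same expression.
    spark.sql("SELECT to_variant_object(map('k', 1)) AS v").show(false)

    // With the fix, the reported schema of a variant object should be
    // spelled OBJECT<...> rather than STRUCT<...>.
    spark.sql(
      "SELECT schema_of_variant(to_variant_object(named_struct('c', 1, 'b', '2'))) AS s"
    ).show(false)

    spark.stop()
  }
}
```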

          People

            Assignee: harshmotw-db Harsh Motwani
            Reporter: harshmotw-db Harsh Motwani