Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Version: 4.0.0
Description
Casts from structs to variant objects should not be legal: variant objects are unordered bags of key-value pairs, while structs are ordered sequences of elements with fixed types. As a result, a cast between a struct and a variant object does not behave like a cast between two structs, which matches fields by position. Example (produced by Serge Rielau):
scala> spark.sql("SELECT cast(named_struct('c', 1, 'b', '2') as struct<b int, c int>)").show()
+------------------------+
|named_struct(c, 1, b, 2)|
+------------------------+
|                  {1, 2}|
+------------------------+

Passing a struct into VARIANT loses the position:

scala> spark.sql("SELECT cast(named_struct('c', 1, 'b', '2')::variant as struct<b int, c int>)").show()
+-----------------------------------------+
|CAST(named_struct(c, 1, b, 2) AS VARIANT)|
+-----------------------------------------+
|                                   {2, 1}|
+-----------------------------------------+
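The difference between the two results can be reproduced with a small model in plain Scala. This is only a sketch and not Spark's implementation: `Struct`, `VariantObject`, and both cast helpers are hypothetical names, and the values are plain integers rather than the mixed int/string values of the example above.

```scala
// Hypothetical model (not Spark code) of why the two casts above disagree.
object StructVsVariantCast {
  // A struct is an ordered sequence of named fields.
  type Struct = Vector[(String, Any)]
  // A variant object is an unordered bag of key-value pairs.
  type VariantObject = Map[String, Any]

  // Struct-to-struct cast: fields are matched by position and the
  // target field names simply replace the source names.
  def castStructToStruct(src: Struct, targetNames: Vector[String]): Struct = {
    require(src.length == targetNames.length, "field counts must match")
    targetNames.zip(src.map(_._2))
  }

  // Object-to-struct cast: there is no position to rely on, so fields
  // are looked up by name. Missing keys are out of scope for this sketch
  // (the lookup throws NoSuchElementException).
  def castObjectToStruct(obj: VariantObject, targetNames: Vector[String]): Struct =
    targetNames.map(name => name -> obj(name))

  def main(args: Array[String]): Unit = {
    val src: Struct = Vector("c" -> 1, "b" -> 2)
    val target = Vector("b", "c")

    println(castStructToStruct(src, target))       // Vector((b,1), (c,2))
    println(castObjectToStruct(src.toMap, target)) // Vector((b,2), (c,1))
  }
}
```

The positional cast yields the values in source order (1, 2), while the round trip through the unordered object yields them by name (2, 1), mirroring the two outputs in the transcript.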
Casts from maps to variant objects should also not be legal, since the two are unrelated data types. A map can represent a variable number of key-value pairs using just one key type and one value type in the schema, whereas the schema of an object (as produced by schema_of_variant expressions) has a type for each value in the object. Objects can hold values of different types while maps cannot, and objects can only have string keys while maps can also have complex keys.
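The contrast between maps and objects can be illustrated with a plain Scala model. This is a sketch under stated assumptions: `VariantValue` and its subclasses are hypothetical stand-ins for variant values, and `describe` only imitates the spirit of schema_of_variant; its output format is illustrative, not Spark's.

```scala
// Hypothetical model (not Spark code) contrasting maps with variant objects.
object MapVsVariantObject {
  sealed trait VariantValue
  final case class VInt(v: Long) extends VariantValue
  final case class VString(v: String) extends VariantValue
  // Object keys are always strings; each value carries its own type.
  final case class VObject(fields: Map[String, VariantValue]) extends VariantValue

  // Per-field type description, loosely in the spirit of schema_of_variant.
  def describe(v: VariantValue): String = v match {
    case VInt(_)    => "BIGINT"
    case VString(_) => "STRING"
    case VObject(fs) =>
      fs.toSeq.sortBy(_._1)
        .map { case (k, x) => s"$k: ${describe(x)}" }
        .mkString("OBJECT<", ", ", ">")
  }

  def main(args: Array[String]): Unit = {
    // A map: one key type and one value type describe any number of
    // entries, and the key may be a complex type (here a pair).
    val m: Map[(Int, Int), Long] = Map((1, 2) -> 10L, (3, 4) -> 20L)
    println(m.size) // 2

    // An object: string keys only, values of different types, and a
    // description that lists a type for every field.
    val o = VObject(Map("id" -> VInt(1L), "name" -> VString("a")))
    println(describe(o)) // OBJECT<id: BIGINT, name: STRING>
  }
}
```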
We should therefore prohibit the existing behavior of allowing explicit casts from structs and maps to variants, as the variant spec currently supports only an object type, which is only remotely compatible with structs and maps. Instead, we should introduce a new expression, `to_variant_object`, that converts values whose schemas contain structs and maps to variants.
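A possible usage of the proposed expression is sketched below. It assumes a Spark build that already includes `to_variant_object` (the ticket lists 4.0.0) and a local session; the object name, app name, and column alias are illustrative, and no output is shown because the rendering is not specified in this ticket.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes a Spark version that ships to_variant_object.
object ToVariantObjectSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("to_variant_object sketch") // illustrative name
      .getOrCreate()

    // Explicit conversion of a struct to a variant object, in place of
    // the cast that this ticket proposes to prohibit.
    spark.sql("SELECT to_variant_object(named_struct('c', 1, 'b', 2)) AS v").show(false)

    // Maps with string keys go through the same expression.
    spark.sql("SELECT to_variant_object(map('k1', 1, 'k2', 2)) AS v").show(false)

    spark.stop()
  }
}
```

Making the conversion a named expression rather than a cast signals to the caller that field order and the single map value type are not preserved.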
Also, the schema_of_variant and schema_of_variant_agg expressions currently print STRUCT when variant objects are observed. We should correct that to OBJECT.
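One way to check the reported type name is to run schema_of_variant over a parsed JSON object. The sketch assumes a Spark build with variant support (`parse_json` and `schema_of_variant`) and a local session; the object name and the sample JSON are illustrative. With the change described here, the printed schema is expected to use OBJECT rather than STRUCT for the top-level value.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes a Spark version with variant support.
object SchemaOfVariantSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("schema_of_variant sketch") // illustrative name
      .getOrCreate()

    // Inspect the schema string reported for a variant object.
    spark.sql(
      """SELECT schema_of_variant(parse_json('{"a": 1, "b": "x"}')) AS schema"""
    ).show(false)

    spark.stop()
  }
}
```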