Details
- Improvement
- Status: Resolved
- Minor
- Resolution: Incomplete
- 2.1.0
- None
Description
spark's widening of compatible types in a union is limited. it works for basic types in top-level columns, but it fails when the differing type sits inside a nested record (struct), map, or array.
i would like to improve this.
for example, i get errors like this:
org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. StructType(StructField(_1,StringType,true), StructField(_2,IntegerType,false)) <> StructType(StructField(_1,StringType,true), StructField(_2,LongType,false)) at the first column of the second table
some examples that do work:
scala> Seq(1, 2, 3).toDF union Seq(1L, 2L, 3L).toDF
res2: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: bigint]

scala> Seq((1,"x"), (2,"x"), (3,"x")).toDF union Seq((1L,"x"), (2L,"x"), (3L,"x")).toDF
res3: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: bigint, _2: string]
what i would also expect to work but currently doesn't:
scala> Seq((Seq(1),"x"), (Seq(2),"x"), (Seq(3),"x")).toDF union Seq((Seq(1L),"x"), (Seq(2L),"x"), (Seq(3L),"x")).toDF

scala> Seq((1,("x",1)), (2,("x",2)), (3,("x",3))).toDF union Seq((1L,("x",1L)), (2L,("x",2L)), (3L,("x", 3L))).toDF
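to make the request concrete, here is a rough sketch of the recursive widening i have in mind. it is not spark's implementation: DType, IntT, LongT, StringT, ArrayT, MapT, StructT and Widen.widen are hypothetical stand-ins for the catalyst types, it only knows the int-to-long promotion, it ignores nullability, and it assumes structs are compatible only when their field names match in order.

```scala
// Toy model of a few SQL types. These are illustrative stand-ins, NOT Spark's classes.
sealed trait DType
case object IntT extends DType
case object LongT extends DType
case object StringT extends DType
final case class ArrayT(element: DType) extends DType
final case class MapT(key: DType, value: DType) extends DType
final case class StructT(fields: List[(String, DType)]) extends DType

object Widen {
  // Returns a common type for a and b, or None when they are incompatible.
  def widen(a: DType, b: DType): Option[DType] = (a, b) match {
    // Identical types need no widening.
    case _ if a == b => Some(a)
    // The only primitive promotion modelled here: int widens to long.
    case (IntT, LongT) | (LongT, IntT) => Some(LongT)
    // Arrays are compatible when their element types are.
    case (ArrayT(x), ArrayT(y)) => widen(x, y).map(t => ArrayT(t))
    // Maps are compatible when both key and value types are.
    case (MapT(k1, v1), MapT(k2, v2)) =>
      for (k <- widen(k1, k2); v <- widen(v1, v2)) yield MapT(k, v)
    // Structs (assumption): same field names in the same order, widened field by field.
    case (StructT(fs1), StructT(fs2)) if fs1.map(_._1) == fs2.map(_._1) =>
      val merged = fs1.zip(fs2).map { case ((name, t1), (_, t2)) =>
        widen(t1, t2).map(t => (name, t))
      }
      if (merged.forall(_.isDefined)) Some(StructT(merged.flatten)) else None
    case _ => None
  }
}
```

with this model, the nested struct from the error above, StructT(List("_1" -> StringT, "_2" -> IntT)) against StructT(List("_1" -> StringT, "_2" -> LongT)), widens to the LongT variant instead of being rejected, and ArrayT(IntT) against ArrayT(LongT) widens to ArrayT(LongT).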
Issue Links
- relates to
  - SPARK-19435 Type coercion between ArrayTypes (Resolved)
  - SPARK-24732 Type coercion between MapTypes (Resolved)
  - SPARK-24737 Type coercion between StructTypes (Resolved)