[SPARK-19536] Improve capability to merge SQL data types - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.1.0
Fix Version/s: None
Component/s: SQL
Labels:
- bulk-closed

Description

Spark's union/merging of compatible types is limited. It widens basic types in top-level columns, but it fails when the differing types are nested inside records (structs), maps, arrays, etc.

I would like to improve this.

For example, I get errors like this:

org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. StructType(StructField(_1,StringType,true), StructField(_2,IntegerType,false)) <> StructType(StructField(_1,StringType,true), StructField(_2,LongType,false)) at the first column of the second table
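The requested behavior amounts to applying the same widening rule recursively through container types. The following is a minimal sketch of that idea on a toy type model. It is not Spark's implementation: `DType`, `TypeMerge`, and the numeric ranking are hypothetical names and simplifications introduced only for illustration, and nullability is ignored.

```scala
// Toy model of SQL data types (hypothetical, not Spark's DataType hierarchy).
sealed trait DType
case object IntT extends DType
case object LongT extends DType
case object DoubleT extends DType
case object StringT extends DType
final case class ArrayT(element: DType) extends DType
final case class MapT(key: DType, value: DType) extends DType
final case class StructT(fields: List[(String, DType)]) extends DType

object TypeMerge {
  // Assumed widening order for numeric types: Int < Long < Double.
  private val numericRank: Map[DType, Int] = Map(IntT -> 0, LongT -> 1, DoubleT -> 2)

  // Returns the merged type, or None when the two types are incompatible.
  def merge(a: DType, b: DType): Option[DType] = (a, b) match {
    case _ if a == b => Some(a)
    case _ if numericRank.contains(a) && numericRank.contains(b) =>
      // Pick the wider of the two numeric types.
      Some(if (numericRank(a) >= numericRank(b)) a else b)
    case (ArrayT(x), ArrayT(y)) =>
      // Recurse into the element type.
      merge(x, y).map(ArrayT(_))
    case (MapT(k1, v1), MapT(k2, v2)) =>
      // Recurse into both key and value types.
      for (k <- merge(k1, k2); v <- merge(v1, v2)) yield MapT(k, v)
    case (StructT(fs1), StructT(fs2)) if fs1.map(_._1) == fs2.map(_._1) =>
      // Same field names in the same order: merge field by field.
      val merged = fs1.zip(fs2).map { case ((name, t1), (_, t2)) => merge(t1, t2).map(name -> _) }
      if (merged.forall(_.isDefined)) Some(StructT(merged.flatten)) else None
    case _ => None
  }
}
```

Under this model, the two struct types from the error message above would merge into a struct whose `_2` field is the long type, rather than being rejected.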

Some examples that do work:

scala> Seq(1, 2, 3).toDF union Seq(1L, 2L, 3L).toDF
res2: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: bigint]

scala> Seq((1,"x"), (2,"x"), (3,"x")).toDF union Seq((1L,"x"), (2L,"x"), (3L,"x")).toDF
res3: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: bigint, _2: string]

What I would also expect to work, but currently doesn't:

scala> Seq((Seq(1),"x"), (Seq(2),"x"), (Seq(3),"x")).toDF union Seq((Seq(1L),"x"), (Seq(2L),"x"), (Seq(3L),"x")).toDF

scala> Seq((1,("x",1)), (2,("x",2)), (3,("x",3))).toDF union Seq((1L,("x",1L)), (2L,("x",2L)), (3L,("x", 3L))).toDF
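Until nested types are merged automatically, one possible workaround is to cast the nested column on both sides to the same type before the union. The sketch below assumes a local `SparkSession` and uses `Column.cast` with a DDL type string; `NestedUnionWorkaround` and `unionWithCast` are hypothetical names introduced for illustration, not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object NestedUnionWorkaround {
  // Hypothetical helper: cast `column` to `ddlType` on both sides, then union.
  // withColumn replaces the existing column in place, so column order is kept.
  def unionWithCast(left: DataFrame, right: DataFrame, column: String, ddlType: String): DataFrame = {
    def widen(df: DataFrame): DataFrame = df.withColumn(column, col(column).cast(ddlType))
    widen(left).union(widen(right))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("nested-union").getOrCreate()
    import spark.implicits._

    // Array case: array<int> on one side, array<bigint> on the other.
    val intArrays = Seq((Seq(1), "x"), (Seq(2), "x")).toDF
    val longArrays = Seq((Seq(1L), "x"), (Seq(2L), "x")).toDF
    unionWithCast(intArrays, longArrays, "_1", "array<bigint>").printSchema()

    // Struct case: the nested _2 field differs (int vs bigint).
    val intStructs = Seq((1, ("x", 1)), (2, ("x", 2))).toDF
    val longStructs = Seq((1L, ("x", 1L)), (2L, ("x", 2L))).toDF
    unionWithCast(intStructs, longStructs, "_2", "struct<_1:string,_2:bigint>").printSchema()

    spark.stop()
  }
}
```

The target type has to be spelled out by hand for each nested column, which is exactly the manual step this issue asks Spark to perform itself.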

Issue Links

relates to

- SPARK-19435 Type coercion between ArrayTypes (Resolved)
- SPARK-24732 Type coercion between MapTypes (Resolved)
- SPARK-24737 Type coercion between StructTypes (Resolved)

People

Assignee: Unassigned
Reporter: koert kuipers
Votes: 1
Watchers: 2

Dates

Created: 09/Feb/17 22:07
Updated: 12/Dec/22 18:10
Resolved: 08/Oct/19 05:42