[SPARK-45657] Caching SQL UNION of different column data types does not work inside Dataset.union

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.3.2, 3.4.0, 3.4.1
Fix Version/s: 3.5.0
Component/s: SQL
Labels: None

Description

Cache a SQL UNION whose two sides have different column data types:

scala> spark.sql("select 1 id union select 's2' id").cache()

Dataset.union applied to the same query does not use the cache; the optimized plan contains no `InMemoryRelation`:

scala> spark.sql("select 1 id union select 's2' id").union(spark.sql("select 's3'")).queryExecution.optimizedPlan
res15: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Union false, false
:- Aggregate [id#109], [id#109]
:  +- Union false, false
:     :- Project [1 AS id#109]
:     :  +- OneRowRelation
:     +- Project [s2 AS id#108]
:        +- OneRowRelation
+- Project [s3 AS s3#111]
   +- OneRowRelation

A SQL UNION over the same cached SQL UNION does use the cache. Note the `InMemoryRelation` in the optimized plan:

scala> spark.sql("(select 1 id union select 's2' id) union select 's3'").queryExecution.optimizedPlan
res16: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Aggregate [id#117], [id#117]
+- Union false, false
   :- InMemoryRelation [id#117], StorageLevel(disk, memory, deserialized, 1 replicas)
   :     +- *(4) HashAggregate(keys=[id#100], functions=[], output=[id#100])
   :        +- Exchange hashpartitioning(id#100, 500), ENSURE_REQUIREMENTS, [plan_id=241]
   :           +- *(3) HashAggregate(keys=[id#100], functions=[], output=[id#100])
   :              +- Union
   :                 :- *(1) Project [1 AS id#100]
   :                 :  +- *(1) Scan OneRowRelation[]
   :                 +- *(2) Project [s2 AS id#99]
   :                    +- *(2) Scan OneRowRelation[]
   +- Project [s3 AS s3#116]
      +- OneRowRelation
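
The two plans above can also be compared programmatically rather than by eye. The following is a minimal sketch, assuming a local Spark session; the object name `UnionCacheCheck` and the helper `usesCache` are hypothetical and not part of Spark. It only checks whether an `InMemoryRelation` node appears in the optimized plan, and the printed values will depend on the Spark version in use.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.columnar.InMemoryRelation

object UnionCacheCheck {
  // Local session for illustration only; in spark-shell, `spark` already exists.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("union-cache-check")
    .getOrCreate()

  // Hypothetical helper: true if the optimized logical plan contains
  // at least one InMemoryRelation node, i.e. some cached data is reused.
  def usesCache(df: DataFrame): Boolean =
    df.queryExecution.optimizedPlan.collect { case r: InMemoryRelation => r }.nonEmpty

  def main(args: Array[String]): Unit = {
    // Same cached query as in the description.
    spark.sql("select 1 id union select 's2' id").cache()

    // Case 1: Dataset.union on top of the cached query.
    val viaDataset = spark.sql("select 1 id union select 's2' id")
      .union(spark.sql("select 's3'"))

    // Case 2: the equivalent statement written entirely in SQL.
    val viaSql = spark.sql("(select 1 id union select 's2' id) union select 's3'")

    println(s"Dataset.union uses cache: ${usesCache(viaDataset)}")
    println(s"SQL UNION uses cache:     ${usesCache(viaSql)}")

    spark.stop()
  }
}
```

On the affected versions listed above, the plans in this report suggest the first check would come back false and the second true.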

People

Assignee: Unassigned

Reporter: John Zhuge

Votes: 0

Watchers: 1

Dates

Created: 25/Oct/23 00:20

Updated: 25/Oct/23 05:34

Resolved: 25/Oct/23 05:34