[SPARK-6231] Join on two tables (generated from same one) is broken - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Sub-task
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.3.0
Fix Version/s: 1.4.0
Component/s: SQL
Labels:
- DataFrame

Target Version/s:

1.4.0
Sprint:
Spark 1.5 doc/QA sprint

Description

If the two column used in joinExpr come from the same table, they have the same id, then the joniExpr is explained in wrong way.

val df = sqlContext.load(path, "parquet")

val txns = df.groupBy("cust_id").agg($"cust_id", countDistinct($"day_num").as("txns"))

val spend = df.groupBy("cust_id").agg($"cust_id", sum($"extended_price").as("spend"))

val rmJoin = txns.join(spend, txns("cust_id") === spend("cust_id"), "inner")

scala> rmJoin.explain
== Physical Plan ==
CartesianProduct
 Filter (cust_id#0 = cust_id#0)
  Aggregate false, [cust_id#0], [cust_id#0,CombineAndCount(partialSets#25) AS txns#7L]
   Exchange (HashPartitioning [cust_id#0], 200)
    Aggregate true, [cust_id#0], [cust_id#0,AddToHashSet(day_num#2L) AS partialSets#25]
     PhysicalRDD [cust_id#0,day_num#2L], MapPartitionsRDD[1] at map at newParquet.scala:542
 Aggregate false, [cust_id#17], [cust_id#17,SUM(PartialSum#38) AS spend#8]
  Exchange (HashPartitioning [cust_id#17], 200)
   Aggregate true, [cust_id#17], [cust_id#17,SUM(extended_price#20) AS PartialSum#38]
    PhysicalRDD [cust_id#17,extended_price#20], MapPartitionsRDD[3] at map at newParquet.scala:542

Attachments

Issue Links

is related to

SPARK-6247 Certain self joins cannot be analyzed

Resolved

relates to

SPARK-7059 Create a DataFrame join API to facilitate equijoin and self join

Resolved

SPARK-20073 Unexpected Cartesian product when using eqNullSafe in join with a derived table

Resolved

links to

[Github] Pull Request #5634 (yhuai)

[Github] Pull Request #5919 (rxin)

Activity

People

Assignee:: Reynold Xin

Reporter:: Davies Liu

Votes:: 1 Vote for this issue

Watchers:: 12 Start watching this issue

Dates

Created:: 09/Mar/15 18:32

Updated:: 24/Mar/17 06:02

Resolved:: 06/May/15 02:01