Description
I found strange behaviour when using a full outer join in combination with an inner join. It seems that the inner join can't match values correctly after a full outer join. Here is a reproducible example in Spark 2.0.
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val a = Seq((1,2),(2,3)).toDF("a","b")
a: org.apache.spark.sql.DataFrame = [a: int, b: int]

scala> val b = Seq((2,5),(3,4)).toDF("a","c")
b: org.apache.spark.sql.DataFrame = [a: int, c: int]

scala> val c = Seq((3,1)).toDF("a","d")
c: org.apache.spark.sql.DataFrame = [a: int, d: int]

scala> val ab = a.join(b, Seq("a"), "fullouter")
ab: org.apache.spark.sql.DataFrame = [a: int, b: int ... 1 more field]

scala> ab.show
+---+----+----+
|  a|   b|   c|
+---+----+----+
|  1|   2|null|
|  3|null|   4|
|  2|   3|   5|
+---+----+----+

scala> ab.join(c, "a").show
+---+---+---+---+
|  a|  b|  c|  d|
+---+---+---+---+
+---+---+---+---+
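For reference, given the rows shown above, the expected (correct) result of that last join would be the single row matching a = 3, carrying the null produced by the full outer join:

+---+----+---+---+
|  a|   b|  c|  d|
+---+----+---+---+
|  3|null|  4|  1|
+---+----+---+---+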
Meanwhile, without the full outer join, the inner join works fine:
scala> b.join(c, "a").show
+---+---+---+
|  a|  c|  d|
+---+---+---+
|  3|  4|  1|
+---+---+---+
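A possible workaround is to rebuild the intermediate DataFrame from its RDD and schema before the inner join, so that the inner join is planned against a fresh logical plan rather than the full outer join's plan. This is only a sketch; whether it actually avoids the bug in 2.0.0 is an assumption and has not been verified here:

scala> // Recreate ab from its rows and schema to break the logical-plan lineage
scala> val ab2 = spark.createDataFrame(ab.rdd, ab.schema)
ab2: org.apache.spark.sql.DataFrame = [a: int, b: int ... 1 more field]

scala> // Then perform the inner join on the rebuilt DataFrame
scala> ab2.join(c, "a").show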
Issue Links
is duplicated by
- SPARK-17099 Incorrect result when HAVING clause is added to group by query (Resolved)
- SPARK-17120 Analyzer incorrectly optimizes plan to empty LocalRelation (Resolved)
- SPARK-17060 Call inner join after outer join will miss rows with null values (Resolved)