SPARK-16991: Full outer join followed by inner join produces wrong results

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: SQL

    Description

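      I found strange behaviour when a full outer join is followed by an inner join: the inner join fails to match rows that are present in the output of the full outer join. In the example below, ab contains a row with a = 3, yet joining ab with c on a returns no rows. Here is a reproducible example in Spark 2.0.0.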

            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
            /_/
               
      Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
      Type in expressions to have them evaluated.
      Type :help for more information.
      
      scala> val a = Seq((1,2),(2,3)).toDF("a","b")
      a: org.apache.spark.sql.DataFrame = [a: int, b: int]
      
      scala> val b = Seq((2,5),(3,4)).toDF("a","c")
      b: org.apache.spark.sql.DataFrame = [a: int, c: int]
      
      scala> val c = Seq((3,1)).toDF("a","d")
      c: org.apache.spark.sql.DataFrame = [a: int, d: int]
      
      scala> val ab = a.join(b, Seq("a"), "fullouter")
      ab: org.apache.spark.sql.DataFrame = [a: int, b: int ... 1 more field]
      
      scala> ab.show
      +---+----+----+
      |  a|   b|   c|
      +---+----+----+
      |  1|   2|null|
      |  3|null|   4|
      |  2|   3|   5|
      +---+----+----+
      
      scala> ab.join(c, "a").show
      +---+---+---+---+
      |  a|  b|  c|  d|
      +---+---+---+---+
      +---+---+---+---+
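
For comparison, the result one would expect from these two joins can be worked out without Spark. The following is only a sketch of the intended join semantics using plain Scala collections; it assumes each DataFrame can be modelled as a Map from the key column a to its single value column, and the names ExpectedJoinResult, fullOuter and inner are hypothetical helpers, not Spark APIs.

```scala
// Plain Scala collections, no Spark dependency: this models the expected
// join semantics only, not how Spark plans or executes the query.
object ExpectedJoinResult {
  // Same rows as the DataFrames a, b and c in the report, keyed by column "a".
  val a: Map[Int, Int] = Map(1 -> 2, 2 -> 3) // a -> b
  val b: Map[Int, Int] = Map(2 -> 5, 3 -> 4) // a -> c
  val c: Map[Int, Int] = Map(3 -> 1)         // a -> d

  // Full outer join on "a": every key from either side, None where a side has no match.
  def fullOuter(l: Map[Int, Int], r: Map[Int, Int]): Map[Int, (Option[Int], Option[Int])] =
    (l.keySet ++ r.keySet).map(k => k -> ((l.get(k), r.get(k)))).toMap

  // Inner join on "a": only keys present on both sides, sorted by key.
  def inner(l: Map[Int, (Option[Int], Option[Int])],
            r: Map[Int, Int]): Seq[(Int, Option[Int], Option[Int], Int)] =
    l.keySet.intersect(r.keySet).toSeq.sorted.map { k =>
      val (bValue, cValue) = l(k)
      (k, bValue, cValue, r(k))
    }

  val ab = fullOuter(a, b)   // three rows, matching ab.show above
  val result = inner(ab, c)  // one row for a = 3

  def main(args: Array[String]): Unit =
    result.foreach(println)  // (3,None,Some(4),1)
}
```

Under this model, ab.join(c, "a") should return the single row a = 3, b = null, c = 4, d = 1 rather than an empty result.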
      

      By contrast, the same inner join applied directly to b, without the preceding full outer join, returns the expected row.

      scala> b.join(c, "a").show
      +---+---+---+
      |  a|  c|  d|
      +---+---+---+
      |  3|  4|  1|
      +---+---+---+
      

            People

              Assignee: Xiao Li (smilegator)
              Reporter: Jonas Jarutis (jjarutis)