Spark / SPARK-17060

Calling an inner join after an outer join misses rows with null values

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Environment: Spark 2.0.0, Mac, Local

    Description

      test.scala
      scala> val df1 = sc.parallelize(Seq((1, 2, 3), (3, 3, 3))).toDF("a", "b", "c")
      df1: org.apache.spark.sql.DataFrame = [a: int, b: int ... 1 more field]
      
      scala> val df2 = sc.parallelize(Seq((1, 2, 4), (4, 4, 4))).toDF("a", "b", "d")
      df2: org.apache.spark.sql.DataFrame = [a: int, b: int ... 1 more field]
      
      scala> val df3 = df1.join(df2, Seq("a", "b"), "outer")
      df3: org.apache.spark.sql.DataFrame = [a: int, b: int ... 2 more fields]
      
      scala> df3.show()
      +---+---+----+----+
      |  a|  b|   c|   d|
      +---+---+----+----+
      |  1|  2|   3|   4|
      |  3|  3|   3|null|
      |  4|  4|null|   4|
      +---+---+----+----+
      
      scala> val df4 = sc.parallelize(Seq((1, 2, 5), (3, 3, 5), (4, 4, 5))).toDF("a", "b", "e")
      df4: org.apache.spark.sql.DataFrame = [a: int, b: int ... 1 more field]
      
      scala> df4.show()
      +---+---+---+
      |  a|  b|  e|
      +---+---+---+
      |  1|  2|  5|
      |  3|  3|  5|
      |  4|  4|  5|
      +---+---+---+
      
      scala> df3.join(df4, Seq("a", "b"), "inner").show()
      +---+---+---+---+---+
      |  a|  b|  c|  d|  e|
      +---+---+---+---+---+
      |  1|  2|  3|  4|  5|
      +---+---+---+---+---+
      

      If persist is called on df3 before the inner join, the output is correct:

      test2.scala
      scala> df3.persist
      res32: df3.type = [a: int, b: int ... 2 more fields]
      
      scala> df3.join(df4, Seq("a", "b"), "inner").show()
      +---+---+----+----+---+
      |  a|  b|   c|   d|  e|
      +---+---+----+----+---+
      |  1|  2|   3|   4|  5|
      |  3|  3|   3|null|  5|
      |  4|  4|null|   4|  5|
      +---+---+----+----+---+
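
      As a cross-check of what the inner join should return, the two joins can be modelled with plain Scala collections, independent of Spark. This is only a sketch under stated assumptions: the object name ExpectedJoinResult and the helpers fullOuterJoin and innerJoin are hypothetical names introduced for illustration, each DataFrame is reduced to a Map keyed by (a, b), and None stands in for SQL null.

```scala
// Plain-Scala model of the two joins from the report (no Spark involved).
// All names here are hypothetical; None plays the role of SQL null.
object ExpectedJoinResult {
  type Key = (Int, Int) // the join columns (a, b)

  // Full outer join on the key: keep every key seen on either side.
  def fullOuterJoin[L, R](left: Map[Key, L],
                          right: Map[Key, R]): Map[Key, (Option[L], Option[R])] =
    (left.keySet ++ right.keySet)
      .map(k => (k, (left.get(k), right.get(k))))
      .toMap

  // Inner join on the key: keep only keys present on both sides.
  def innerJoin[L, R](left: Map[Key, L],
                      right: Map[Key, R]): Map[Key, (L, R)] =
    left.collect { case (k, l) if right.contains(k) => (k, (l, right(k))) }

  def main(args: Array[String]): Unit = {
    val df1 = Map((1, 2) -> 3, (3, 3) -> 3)              // (a, b) -> c
    val df2 = Map((1, 2) -> 4, (4, 4) -> 4)              // (a, b) -> d
    val df4 = Map((1, 2) -> 5, (3, 3) -> 5, (4, 4) -> 5) // (a, b) -> e

    val df3 = fullOuterJoin(df1, df2)
    val result = innerJoin(df3, df4)

    // Print rows as "a b c d e", writing "null" for missing values.
    result.toSeq.sortBy(_._1).foreach { case ((a, b), ((c, d), e)) =>
      println(s"$a $b ${c.getOrElse("null")} ${d.getOrElse("null")} $e")
    }
  }
}
```

      Because the null values sit only in the non-key columns c and d, all three keys survive the inner join in this model, which agrees with the three-row output obtained after df3.persist.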
      

            People

              Assignee: Unassigned
              Reporter: Linbo (linbojin)