Spark / SPARK-14854

Left outer join produces incorrect output when the join condition does not have left table key


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.5.1
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

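      import org.apache.spark.sql._
      import org.apache.spark.sql.functions.lit
      import org.apache.spark.sql.types._
      import sqlContext.implicits._  // enables the $"col" syntax

      // Single-column schemas: "num" for the detail table, "num1" for the master table.
      val s = StructType(StructField("num", StringType, true) :: Nil)
      val s1 = StructType(StructField("num1", StringType, true) :: Nil)

      // Read the first comma-separated field of each line as a one-column Row.
      val m = sc.textFile("file:/tmp/master.txt").map(_.split(",")).map(p => Row(p(0)))
      val d = sc.textFile("file:/tmp/detail.txt").map(_.split(",")).map(p => Row(p(0)))
      val m1 = sqlContext.createDataFrame(m, s1)
      val d1 = sqlContext.createDataFrame(d, s)

      // Left outer join whose condition references only the right table's column.
      val j1 = d1.join(m1, $"num1" === lit(null), "left_outer")
      j1.take(1)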

      This returns an empty data set, even though the left table (d1) has data.
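
For comparison, here is a minimal sketch of what standard left outer join semantics would suggest for a condition that is never true. It is plain Scala rather than Spark code: sqlEq, leftOuterJoin, demo, and the sample values are hypothetical names and data introduced only for illustration, and the sketch does not reproduce or explain the behaviour reported above.

```scala
// Plain-Scala model of left outer join semantics (hypothetical helpers, not Spark API).
object LeftOuterJoinSketch {
  // SQL-style equality: if either operand is null (None), the result is unknown (None).
  def sqlEq(a: Option[String], b: Option[String]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // A pair matches only when the condition is Some(true). A left row with no
  // match is still emitted once, padded with None on the right side.
  def leftOuterJoin[L, R](left: Seq[L], right: Seq[R])(
      cond: (L, R) => Option[Boolean]): Seq[(L, Option[R])] =
    left.flatMap { l =>
      val matches: Seq[Option[R]] =
        right.filter(r => cond(l, r).contains(true)).map(Some(_))
      val padded = if (matches.isEmpty) Seq(Option.empty[R]) else matches
      padded.map(r => (l, r))
    }

  def demo(): Seq[(String, Option[String])] = {
    val detail = Seq("1", "2") // assumed sample data standing in for d1 (column num)
    val master = Seq("1", "3") // assumed sample data standing in for m1 (column num1)
    // The condition mirrors num1 === lit(null): it never evaluates to true.
    leftOuterJoin(detail, master)((_, num1) => sqlEq(Some(num1), None))
  }

  def main(args: Array[String]): Unit = demo().foreach(println)
}
```

In this model every detail row appears exactly once with an empty right side, which is the sense in which the reporter expects the left table's data to survive the join.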

            People

              Assignee: Unassigned
              Reporter: kanika dhuria (kdhuria)