Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-16506

Subsequent dataframe join dont work

    XMLWordPrintableJSON

Details

    Description

      Here is the example code:

      import sql.implicits._

      val objs = sc.parallelize(Seq(("1", "um"), ("2", "dois"), ("3", "tres"))).toDF.selectExpr("_1 as id", "_2 as name")

      val rawj = sc.parallelize(Seq(("1", "2"), ("1", "3"), ("2", "3"), ("2", "1"))).toDF.selectExpr("_1 as id1", "_2 as id2")

      val join1 = rawj.join(objs, objs("id") === rawj("id1"))
      .withColumnRenamed("id", "anything")

      println("works...")
      val join2a = join1.join(objs, 'id2 === 'id )
      join2a.show()

      println("works...")
      val join2b = objs.join(join1, objs("id") === join1("id2"))
      join2b.show()

      println("do not works...")
      val join2c = join1.join(objs, join1("id2") === objs("id") )
      join2c.show()

      Fisrt two joins work. But the last one gave me this error:

      Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved attribute(s) id#2 missing from anything#8,name#14,name#3,id1#6,id2#7,id#13 in operator !Join Inner, Some((id2#7 = id#2));
      at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
      at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
      at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:183)
      at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
      at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:105)

      Without the first column rename, the error happens in silence since the join get empty:

      import sql.implicits._

      val objs = sc.parallelize(Seq(("1", "um"), ("2", "dois"), ("3", "tres"))).toDF.selectExpr("_1 as id", "_2 as name")

      val rawj = sc.parallelize(Seq(("1", "2"), ("1", "3"), ("2", "3"), ("2", "1"))).toDF.selectExpr("_1 as id1", "_2 as id2")

      val join1 = rawj.join(objs, objs("id") === rawj("id1"))

      println("do not works...")
      val join2c = join1.join(objs, join1("id2") === objs("id") )
      join2c.show()

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              timotta Tiago Albineli Motta
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: