Spark / SPARK-6430

Cannot resolve column correctly when using left semi join


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Environment: Spark 1.3.0 on YARN

    Description

      My code (run in spark-shell, where sc and sqlContext are predefined):

      case class TestData(key: Int, value: String)
      case class TestData2(a: Int, b: Int)

      import sqlContext.implicits._

      // 100 rows: (1, "1"), (2, "2"), ..., (100, "100")
      val testData = sc.parallelize(
        (1 to 100).map(i => TestData(i, i.toString))).toDF()
      testData.registerTempTable("testData")

      // 6 rows in 2 partitions; each value of a (1, 2, 3) appears twice
      val testData2 = sc.parallelize(
        TestData2(1, 1) ::
        TestData2(1, 2) ::
        TestData2(2, 1) ::
        TestData2(2, 2) ::
        TestData2(3, 1) ::
        TestData2(3, 2) :: Nil, 2).toDF()
      testData2.registerTempTable("testData2")

      // This query works:
      // val tmp = sqlContext.sql("SELECT * FROM testData LEFT SEMI JOIN testData2 ON key = a")
      // This query fails with the AnalysisException below:
      val tmp = sqlContext.sql("SELECT testData2.b, count(testData2.b) FROM testData LEFT SEMI JOIN testData2 ON key = testData2.a group by testData2.b")
      tmp.explain()

      Error log:

      org.apache.spark.sql.AnalysisException: cannot resolve 'testData2.b' given input columns key, value; line 1 pos 108
      at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
      at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3$$anonfun$apply$1.applyOrElse(CheckAnalysis.scala:48)
      at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3$$anonfun$apply$1.applyOrElse(CheckAnalysis.scala:45)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
      at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
      at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:50)
      at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:249)
      at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1(QueryPlan.scala:103)
      at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2$$anonfun$apply$2.apply(QueryPlan.scala:117)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
      at scala.collection.immutable.List.foreach(List.scala:318)
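      The error message lists only key and value as input columns. A minimal sketch of why, assuming standard LEFT SEMI JOIN semantics as in HiveQL: the join acts as a filter on the left table, keeping each left row at most once when at least one right row matches, and its output contains only the left table's columns. The plain-Scala model below is a hypothetical illustration (SemiJoinModel and leftSemiJoin are invented names), not Spark's implementation.

```scala
// Hypothetical plain-Scala model of LEFT SEMI JOIN (not Spark's implementation).
object SemiJoinModel {
  case class TestData(key: Int, value: String)
  case class TestData2(a: Int, b: Int)

  val testData: Seq[TestData] = (1 to 100).map(i => TestData(i, i.toString))
  val testData2: Seq[TestData2] = Seq(
    TestData2(1, 1), TestData2(1, 2),
    TestData2(2, 1), TestData2(2, 2),
    TestData2(3, 1), TestData2(3, 2))

  // Models "testData LEFT SEMI JOIN testData2 ON key = a":
  // a left row is kept (once) if any right row has a == key.
  // The result type is still TestData, so only key and value exist
  // after the join; there is no column b to reference.
  def leftSemiJoin(left: Seq[TestData], right: Seq[TestData2]): Seq[TestData] = {
    val rightKeys = right.map(_.a).toSet
    left.filter(row => rightKeys.contains(row.key))
  }

  def main(args: Array[String]): Unit =
    // prints TestData(1,1), then TestData(2,2), then TestData(3,3)
    leftSemiJoin(testData, testData2).foreach(println)
}
```

      Only three rows come back even though all six rows of testData2 match, and each row is still a TestData, which is why testData2.b cannot be resolved after the join.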

      The query

      SELECT * FROM testData LEFT SEMI JOIN testData2 ON key = a

      works, while each of the following fails:

      SELECT a FROM testData LEFT SEMI JOIN testData2 ON key = a
      SELECT max(value) FROM testData LEFT SEMI JOIN testData2 ON key = a group by b
      SELECT max(value) FROM testData LEFT SEMI JOIN testData2 ON key = testData2.a group by testData2.b
      SELECT testData2.b, count(testData2.b) FROM testData LEFT SEMI JOIN testData2 ON key = testData2.a group by testData2.b
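      If columns from the right-hand table are actually needed, one option (a suggestion, not taken from this issue) is a plain inner join, for example SELECT testData2.b, count(testData2.b) FROM testData JOIN testData2 ON key = testData2.a GROUP BY testData2.b. Unlike a semi join, an inner join emits one row per matching (left, right) pair and keeps the columns of both sides, so a left row may appear more than once. The plain-Scala model below sketches that query on the same data; InnerJoinModel, innerJoin and countByB are hypothetical names, and this is not how Spark executes the query.

```scala
// Hypothetical plain-Scala model of:
//   SELECT testData2.b, count(testData2.b)
//   FROM testData JOIN testData2 ON key = testData2.a
//   GROUP BY testData2.b
object InnerJoinModel {
  case class TestData(key: Int, value: String)
  case class TestData2(a: Int, b: Int)

  val testData: Seq[TestData] = (1 to 100).map(i => TestData(i, i.toString))
  val testData2: Seq[TestData2] = Seq(
    TestData2(1, 1), TestData2(1, 2),
    TestData2(2, 1), TestData2(2, 2),
    TestData2(3, 1), TestData2(3, 2))

  // Inner join: one output row per matching pair, carrying both sides,
  // so b is still available after the join.
  def innerJoin(left: Seq[TestData], right: Seq[TestData2]): Seq[(TestData, TestData2)] =
    for {
      l <- left
      r <- right
      if l.key == r.a
    } yield (l, r)

  // GROUP BY b with count(b); no b is null here, so count(b) is the group size.
  // Sorted by b only to make the output order stable.
  def countByB(joined: Seq[(TestData, TestData2)]): Seq[(Int, Int)] =
    joined.groupBy(_._2.b).map { case (b, rows) => (b, rows.size) }.toSeq.sortBy(_._1)

  def main(args: Array[String]): Unit =
    // prints (1,3), then (2,3)
    countByB(innerJoin(testData, testData2)).foreach(println)
}
```

      With this data, keys 1, 2 and 3 each match two rows of testData2, so the join produces six rows and each value of b is counted three times.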

    People

    • Michael Armbrust (marmbrus)
    • Zhichao Zhang (zzcclp)
    • Votes: 0
    • Watchers: 3
