  1. Spark
  2. SPARK-21216

Streaming DataFrames fail to join with Hive tables

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.1
    • Fix Version/s: 2.3.0
    • Component/s: Structured Streaming
    • Labels:
      None

      Description

      The following code, which joins a streaming DataFrame with a Hive table, fails with a cryptic exception:

      // Assumes a Hive-enabled Spark test suite that provides `spark`, `sql` and `testImplicits`.
      import org.apache.spark.sql.SQLContext
      import org.apache.spark.sql.execution.streaming.MemoryStream
      import testImplicits._

      // MemoryStream needs an implicit SQLContext in scope.
      implicit val _sqlContext: SQLContext = spark.sqlContext

      Seq((1, "one"), (2, "two"), (4, "four")).toDF("number", "word").createOrReplaceTempView("t1")
      // Make a table and ensure it will be broadcast.
      sql("""CREATE TABLE smallTable(word string, number int)
            |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
            |STORED AS TEXTFILE
          """.stripMargin)

      sql(
        """INSERT INTO smallTable
          |SELECT word, number FROM t1
        """.stripMargin)

      // Join the streaming input with the static Hive table.
      val inputData = MemoryStream[Int]
      val joined = inputData.toDS().toDF()
        .join(spark.table("smallTable"), $"value" === $"number")

      val sq = joined.writeStream
        .format("memory")
        .queryName("t2")
        .start()
      try {
        inputData.addData(1, 2)

        // Planning of the first micro-batch happens here, which is where the failure surfaces.
        sq.processAllAvailable()
      } finally {
        sq.stop()
      }
      

      When the session is created with Hive support, the planner in `IncrementalExecution` does not take the Hive scan strategies into account, so the scan of the Hive table on the static side of the join cannot be planned.
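
The failure mode described above can be illustrated with a small toy model. This is not Spark source code: `PlannerSketch`, `Strategy`, `Planner`, `HiveScanStrategy` and every other name below are hypothetical stand-ins, and the "fixed" planner shows only one possible way to repair the problem, namely reusing the strategies of the session's planner rather than a hard-coded core list.

```scala
// Toy model of a strategy-based planner. All names are hypothetical; this is not Spark code.
object PlannerSketch {
  sealed trait LogicalPlan
  final case class StreamSource(name: String) extends LogicalPlan
  final case class HiveTable(name: String) extends LogicalPlan
  final case class Join(left: LogicalPlan, right: LogicalPlan) extends LogicalPlan

  // A strategy returns a physical plan description for the nodes it understands.
  trait Strategy {
    def apply(plan: LogicalPlan, planner: Planner): Option[String]
  }

  final class Planner(val strategies: Seq[Strategy]) {
    // Try each strategy in order and fail if none of them applies.
    def plan(p: LogicalPlan): String =
      strategies.iterator
        .map(_.apply(p, this))
        .collectFirst { case Some(physical) => physical }
        .getOrElse(throw new IllegalStateException(s"No plan for $p"))
  }

  object StreamSourceStrategy extends Strategy {
    def apply(plan: LogicalPlan, planner: Planner): Option[String] = plan match {
      case StreamSource(n) => Some(s"StreamScan($n)")
      case _               => None
    }
  }

  object JoinStrategy extends Strategy {
    // Planning a join requires planning both children with the same planner.
    def apply(plan: LogicalPlan, planner: Planner): Option[String] = plan match {
      case Join(l, r) => Some(s"Join(${planner.plan(l)}, ${planner.plan(r)})")
      case _          => None
    }
  }

  object HiveScanStrategy extends Strategy {
    def apply(plan: LogicalPlan, planner: Planner): Option[String] = plan match {
      case HiveTable(n) => Some(s"HiveTableScan($n)")
      case _            => None
    }
  }

  val coreStrategies: Seq[Strategy] = Seq(StreamSourceStrategy, JoinStrategy)

  // The Hive-enabled session planner knows how to scan Hive tables.
  val sessionPlanner = new Planner(coreStrategies :+ HiveScanStrategy)

  // Buggy variant: the streaming planner is rebuilt from the core strategies only.
  val buggyIncrementalPlanner = new Planner(coreStrategies)

  // One possible repair: reuse whatever strategies the session planner has.
  val fixedIncrementalPlanner = new Planner(sessionPlanner.strategies)

  val query: LogicalPlan = Join(StreamSource("inputData"), HiveTable("smallTable"))

  def main(args: Array[String]): Unit = {
    // The buggy planner fails on the Hive table; the fixed one plans the whole join.
    println(scala.util.Try(buggyIncrementalPlanner.plan(query)))
    println(fixedIncrementalPlanner.plan(query))
  }
}
```

In this sketch the buggy planner handles the streaming source and the join itself, and fails only when it reaches the `HiveTable` node, which mirrors why the query in the reproduction fails even though each side of the join is valid on its own.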

              People

              • Assignee: Burak Yavuz (brkyvz)
              • Reporter: Burak Yavuz (brkyvz)
