Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.1.0
Fix Version/s: None
Description
After upgrading to Spark 2.1.0 we noticed that duplicate jobs are executed. Going back to Spark 2.0.1, the duplicates are gone again.
import org.apache.spark.sql._

// Minimal reproduction: reading ORC sources and calling show() triggers the
// duplicate jobs on 2.1.0 but not on 2.0.1.
object DoubleJobs {
  def main(args: Array[String]) {
    System.setProperty("hadoop.home.dir", "/tmp")

    val sparkSession: SparkSession = SparkSession.builder
      .master("local[4]")
      .appName("spark session example")
      .config("spark.driver.maxResultSize", "6G")
      .config("spark.sql.orc.filterPushdown", true)
      .config("spark.sql.hive.metastorePartitionPruning", true)
      .getOrCreate()

    sparkSession.sqlContext.setConf("spark.sql.orc.filterPushdown", "true")

    val paths = Seq(
      "" // some orc source
    )

    def dataFrame(path: String): DataFrame = {
      sparkSession.read.orc(path)
    }

    paths.foreach(path => {
      dataFrame(path).show(20)
    })
  }
}
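One way to confirm the behaviour described above, rather than eyeballing the Spark UI, is to register a SparkListener that counts job starts: on 2.1.0 the count per show() call should be higher than on 2.0.1. The snippet below is a hypothetical sketch added for illustration only, not part of the original report; it assumes it is attached right after getOrCreate() in the reproduction above.

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

// Hypothetical helper (not in the original report): counts submitted jobs so
// the duplication can be observed directly from driver logs.
val jobCounter = new SparkListener {
  @volatile var jobsStarted = 0
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    jobsStarted += 1
    println(s"Job ${jobStart.jobId} started (total so far: $jobsStarted)")
  }
}
sparkSession.sparkContext.addSparkListener(jobCounter)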