SPARK-20590

Map default input data source formats to inlined classes

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: SQL
    • Labels: None

    Description

      One of the common usability problems when reading data in Spark (particularly CSV) is a conflict between different readers on the classpath.

      For example, if someone launches a Spark 2.x shell with the spark-csv package on the classpath, Spark currently fails in an extremely unfriendly way:

      ./bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0
      scala> val df = spark.read.csv("/foo/bar.csv")
      java.lang.RuntimeException: Multiple sources found for csv (org.apache.spark.sql.execution.datasources.csv.CSVFileFormat, com.databricks.spark.csv.DefaultSource15), please specify the fully qualified class name.
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:574)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:85)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:85)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:295)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
        at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:533)
        at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:412)
        ... 48 elided
      

      This JIRA proposes a simple fix for this error: always map the default input data source formats to the inlined classes (those that exist in Spark itself). With that mapping in place, the same session succeeds:

      ./bin/spark-shell --packages com.databricks:spark-csv_2.11:1.5.0
      scala> val df = spark.read.csv("/foo/bar.csv")
      df: org.apache.spark.sql.DataFrame = [_c0: string]
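
The proposed behaviour can be illustrated with a minimal, self-contained sketch. This is not Spark's actual implementation: the object `SourceResolution`, the method `resolve`, and the rule "a class is built in if its name starts with `org.apache.spark`" are all assumptions made for illustration. The idea is that when several provider classes are registered for the same short name, the single built-in one wins, and the lookup only fails if the ambiguity remains.

```scala
// Hypothetical sketch of the proposed resolution rule; not Spark's real code.
object SourceResolution {
  // Assumption: built-in (inlined) sources are recognised by package prefix.
  private val BuiltInPrefix = "org.apache.spark"

  /**
   * Picks one provider class name among the candidates registered for a
   * short name such as "csv". Returns Left(message) when none can be chosen.
   */
  def resolve(shortName: String, candidates: Seq[String]): Either[String, String] =
    candidates match {
      case Seq() =>
        Left(s"Failed to find data source: $shortName")
      case Seq(only) =>
        // No conflict: use the single registered provider.
        Right(only)
      case many =>
        // Conflict: prefer the built-in class if exactly one exists.
        many.filter(_.startsWith(BuiltInPrefix)) match {
          case Seq(builtIn) => Right(builtIn)
          case _ =>
            Left(s"Multiple sources found for $shortName (${many.mkString(", ")}), " +
              "please specify the fully qualified class name.")
        }
    }
}
```

Calling `SourceResolution.resolve("csv", ...)` with the two class names from the error message above would select `org.apache.spark.sql.execution.datasources.csv.CSVFileFormat`, while two third-party candidates would still produce the "Multiple sources found" message.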
      

          People

            Assignee: Hyukjin Kwon (gurwls223)
            Reporter: Sameer Agarwal (sameerag)
