SPARK-19919

Defer input path validation into DataSource in CSV datasource

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, when other datasources fail to infer the schema, they return None, and DataSource then validates this, as below:

      scala> spark.read.json("emptydir")
      org.apache.spark.sql.AnalysisException: Unable to infer schema for JSON. It must be specified manually.;
      
      scala> spark.read.orc("emptydir")
      org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC. It must be specified manually.;
      
      scala> spark.read.parquet("emptydir")
      org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
      

      However, the CSV datasource performs this check within its own implementation and throws a different exception with a different message, as below:

      scala> spark.read.csv("emptydir")
      java.lang.IllegalArgumentException: requirement failed: Cannot infer schema from an empty set of files
      

      We could remove this duplicated check from the CSV datasource and validate in one place, DataSource, so that CSV fails in the same way and with the same message as the other datasources.
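
      The proposal can be illustrated with a minimal, self-contained sketch. It does not reproduce Spark's actual code: Schema, FileFormatSketch, CsvFormatSketch, DataSourceSketch and SchemaInferenceException are hypothetical stand-ins (Spark itself raises AnalysisException), and the inferred column name is a placeholder. The idea is that the format only reports None when nothing can be inferred, and a single caller turns that into the shared error message.

```scala
// Hypothetical stand-ins for illustration only; these are not Spark classes.
final case class Schema(fields: Seq[String])

// Local substitute for the AnalysisException seen in the transcripts above.
final class SchemaInferenceException(message: String) extends Exception(message)

trait FileFormatSketch {
  def shortName: String
  // Returns None when no schema can be inferred, instead of throwing.
  def inferSchema(files: Seq[String]): Option[Schema]
}

object CsvFormatSketch extends FileFormatSketch {
  val shortName: String = "CSV"

  def inferSchema(files: Seq[String]): Option[Schema] =
    if (files.isEmpty) None          // no format-specific require(...) here
    else Some(Schema(Seq("_c0")))    // placeholder for real inference
}

object DataSourceSketch {
  // The single place where a missing schema becomes an error,
  // so every format reports the same message.
  def resolveSchema(format: FileFormatSketch, files: Seq[String]): Schema =
    format.inferSchema(files).getOrElse {
      throw new SchemaInferenceException(
        s"Unable to infer schema for ${format.shortName}. It must be specified manually.")
    }
}
```

      Under these assumptions, calling DataSourceSketch.resolveSchema(CsvFormatSketch, Seq.empty) fails with the same wording as the JSON, ORC and Parquet cases, and the CSV-specific requirement is no longer needed.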

          People

            Assignee: Hyukjin Kwon (hyukjin.kwon)
            Reporter: Hyukjin Kwon (hyukjin.kwon)