Description
Currently, if a datasource fails to infer the schema, it returns None, and this is then validated in DataSource as below:
scala> spark.read.json("emptydir")
org.apache.spark.sql.AnalysisException: Unable to infer schema for JSON. It must be specified manually.;

scala> spark.read.orc("emptydir")
org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC. It must be specified manually.;

scala> spark.read.parquet("emptydir")
org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
However, CSV performs this check inside its own datasource implementation and throws a different exception message, as below:
scala> spark.read.csv("emptydir")
java.lang.IllegalArgumentException: requirement failed: Cannot infer schema from an empty set of files
We could remove this duplicated check and validate it in one place, so that CSV fails in the same way, with the same message, as the other datasources.
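The consolidated behavior could look roughly like the sketch below. This is a minimal, self-contained illustration, not Spark's actual code: `StructType` and `AnalysisException` here are simplified stand-ins, and the function names are hypothetical. The point is that every format's inference returns an `Option`, and `None` is turned into the same error in one place.

```scala
// Simplified stand-ins for Spark types, for illustration only.
case class StructType(fields: Seq[String])
class AnalysisException(msg: String) extends Exception(msg)

// Hypothetical per-format inference: returns None when the schema
// cannot be inferred (e.g. an empty directory). Under this proposal,
// CSV would also return None here instead of throwing its own
// IllegalArgumentException.
def inferSchema(format: String, files: Seq[String]): Option[StructType] =
  if (files.isEmpty) None
  else Some(StructType(Seq("value")))

// Single validation point, mirroring the existing DataSource check:
// every format that fails inference produces the same error message.
def resolveSchema(format: String, files: Seq[String]): StructType =
  inferSchema(format, files).getOrElse {
    throw new AnalysisException(
      s"Unable to infer schema for $format. It must be specified manually.;")
  }
```

With this shape, `resolveSchema("CSV", Seq.empty)` would raise the same "Unable to infer schema for CSV. It must be specified manually." error that JSON, ORC, and Parquet already produce.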