Description
In a kerberized cluster, when Spark reads a file path (e.g. `people.json`), it logs a misleading warning while looking up `people.json/_spark_metadata`. The root cause is a difference between `LocalFileSystem` and `DistributedFileSystem`: `LocalFileSystem.exists()` returns `false`, but `DistributedFileSystem.exists()` raises an exception.
    scala> spark.version
    res0: String = 2.4.0-SNAPSHOT

    scala> spark.read.json("file:///usr/hdp/current/spark-client/examples/src/main/resources/people.json").show
    +----+-------+
    | age|   name|
    +----+-------+
    |null|Michael|
    |  30|   Andy|
    |  19| Justin|
    +----+-------+

    scala> spark.read.json("hdfs:///tmp/people.json")
    18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
    18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
    res6: org.apache.spark.sql.DataFrame = [age: bigint, name: string]

    scala> spark.version
    res0: String = 2.2.1

    scala> spark.read.json("hdfs:///tmp/people.json").show
    18/02/15 05:28:02 WARN FileStreamSink: Error while looking for metadata directory.
    18/02/15 05:28:02 WARN FileStreamSink: Error while looking for metadata directory.

    scala> spark.version
    res0: String = 2.1.2

    scala> spark.read.json("hdfs:///tmp/people.json").show
    18/02/15 05:29:53 WARN DataSource: Error while looking for metadata directory.
    +----+-------+
    | age|   name|
    +----+-------+
    |null|Michael|
    |  30|   Andy|
    |  19| Justin|
    +----+-------+

    scala> spark.version
    res0: String = 2.0.2

    scala> spark.read.json("hdfs:///tmp/people.json").show
    18/02/15 05:25:24 WARN DataSource: Error while looking for metadata directory.
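
The behaviour difference described above can be illustrated with a small, self-contained sketch. It does not use Hadoop or Spark classes: `SimpleFs`, `LocalLikeFs`, `DistributedLikeFs`, and `MetadataLookup` are hypothetical stand-ins invented for illustration, and the `IOException` is only an assumed placeholder for whatever exception the real `DistributedFileSystem.exists` raises. It shows how the same lookup of `people.json/_spark_metadata` can stay silent on one file system and produce the warning on the other, if the caller turns any non-fatal exception into a warning.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for a file system's exists() check; not Hadoop's API.
trait SimpleFs {
  def exists(path: String): Boolean
}

// Models the local behaviour from the description: a failed lookup yields false.
object LocalLikeFs extends SimpleFs {
  def exists(path: String): Boolean = false
}

// Models the distributed behaviour from the description: the lookup throws.
// IOException is an assumption; the real exception type is not stated here.
object DistributedLikeFs extends SimpleFs {
  def exists(path: String): Boolean =
    throw new java.io.IOException(s"cannot look up $path")
}

object MetadataLookup {
  // Warning text copied from the log output in the report.
  val Warning = "Error while looking for metadata directory."

  // Returns (metadata found?, optional warning message).
  // Any non-fatal exception is swallowed and reported as a warning.
  def hasMetadata(fs: SimpleFs, file: String): (Boolean, Option[String]) =
    try {
      (fs.exists(s"$file/_spark_metadata"), None)
    } catch {
      case NonFatal(_) => (false, Some(Warning))
    }
}
```

Under these assumptions, `MetadataLookup.hasMetadata(LocalLikeFs, "people.json")` reports no metadata and no warning, while the same call with `DistributedLikeFs` reports no metadata but carries the warning, which matches the pattern in the transcripts: the read still succeeds, yet a warning is logged.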