Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Incomplete
- Affects Version/s: 1.5.1
- Fix Version/s: None
Description
If I specify a non-existent input path for the JSON data source, the following exception is thrown, and it is not readable.
15/10/14 16:14:39 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 19.9 KB, free 251.4 KB)
15/10/14 16:14:39 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.3.3:54725 (size: 19.9 KB, free: 2.2 GB)
15/10/14 16:14:39 INFO SparkContext: Created broadcast 0 from json at <console>:19
java.io.IOException: No input paths specified in job
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:201)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1.apply(RDD.scala:1087)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
	at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1085)
	at org.apache.spark.sql.execution.datasources.json.InferSchema$.apply(InferSchema.scala:58)
	at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:105)
	at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:100)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema$lzycompute(JSONRelation.scala:100)
	at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema(JSONRelation.scala:99)
	at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:561)
	at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:560)
	at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:106)
	at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:221)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
	at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
	at $iwC$$iwC$$iwC.<init>(<console>:32)
	at $iwC$$iwC.<init>(<console>:34)
	at $iwC.<init>(<console>:36)
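For comparison, the readable failure mode this issue asks for can be illustrated by validating the input path up front and raising an error that names the path. This is a minimal sketch of the idea in plain Python, not Spark's actual implementation; the `load_json` function and its message are hypothetical.

```python
import json
import os

def load_json(path):
    # Hypothetical loader: fail fast with a message that names the
    # missing path, instead of surfacing a deep
    # "java.io.IOException: No input paths specified in job".
    if not os.path.exists(path):
        raise FileNotFoundError(f"Input path does not exist: {path}")
    # Read one JSON object per line, mirroring the JSON data source's
    # line-delimited input format.
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

With this check, a bad path produces a single-line error pointing at the offending path, which is the behavior the reporter expected.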
Issue Links
- duplicates
  - SPARK-10709 When loading a json dataset as a data frame, if the input path is wrong, the error message is very confusing (Resolved)
- is duplicated by
  - SPARK-10709 When loading a json dataset as a data frame, if the input path is wrong, the error message is very confusing (Resolved)
- links to