Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Incomplete
- Affects Version/s: 1.5.1
- Fix Version/s: None
Description
If I specify a non-existent input path for the JSON data source, the following exception is thrown, and it is not readable.
15/10/14 16:14:39 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 19.9 KB, free 251.4 KB)
15/10/14 16:14:39 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.3.3:54725 (size: 19.9 KB, free: 2.2 GB)
15/10/14 16:14:39 INFO SparkContext: Created broadcast 0 from json at <console>:19
java.io.IOException: No input paths specified in job
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:201)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1.apply(RDD.scala:1087)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
	at org.apache.spark.rdd.RDD.treeAggregate(RDD.scala:1085)
	at org.apache.spark.sql.execution.datasources.json.InferSchema$.apply(InferSchema.scala:58)
	at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:105)
	at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$6.apply(JSONRelation.scala:100)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema$lzycompute(JSONRelation.scala:100)
	at org.apache.spark.sql.execution.datasources.json.JSONRelation.dataSchema(JSONRelation.scala:99)
	at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:561)
	at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:560)
	at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:37)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:106)
	at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:221)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
	at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
	at $iwC$$iwC$$iwC.<init>(<console>:32)
	at $iwC$$iwC.<init>(<console>:34)
	at $iwC.<init>(<console>:36)
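For comparison, the readable failure mode this issue asks for can be illustrated by validating the input path up front and raising an error that names the path. This is a minimal sketch of the idea in plain Python, not Spark's actual implementation; the `load_json` function and its message are hypothetical.

```python
import json
import os

def load_json(path):
    # Hypothetical loader: fail fast with a message that names the
    # missing path, instead of surfacing a deep
    # "java.io.IOException: No input paths specified in job".
    if not os.path.exists(path):
        raise FileNotFoundError(f"Input path does not exist: {path}")
    # Read one JSON object per line, mirroring the JSON data source's
    # line-delimited input format.
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]
```

With this check, a bad path produces a single-line error pointing at the offending path, which is the behavior the reporter expected.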
Issue Links
- duplicates
  - SPARK-10709 When loading a json dataset as a data frame, if the input path is wrong, the error message is very confusing (Resolved)
- is duplicated by
  - SPARK-10709 When loading a json dataset as a data frame, if the input path is wrong, the error message is very confusing (Resolved)
- links to