Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
0.6.2
None
None
Description
Errors thrown from the Spark SQL interpreter are truncated in the Zeppelin notebook. For example, the Spark program below tries to read people1.txt from /tmp on HDFS, where the file does not exist. The program fails and an error message is shown, but the top of the error message is truncated.
%spark
val people = sc.textFile("hdfs:///tmp/people1.txt")
val schemaString = "name age"
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.{StructType,StructField,StringType};
val schema = StructType(schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
val rowRDD = people.map(_.split(",")).map(p => Row(p(0), p(1).trim))
val peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema)
peopleDataFrame.registerTempTable("people")
val results = sqlContext.sql("SELECT name FROM people")
results.map(t => "Name: " + t(0)).collect().foreach(println)
The truncated error message is attached. The useful top part of the error message, which is cut off in the notebook, is shown below.
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs:/tmp/people1.txt
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
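
Until the truncation is fixed, one possible way to see the top of the error is to catch the exception in the paragraph and print only the first few lines of its stack trace. The following is a minimal sketch, not a confirmed workaround: the object ShowErrorHead and its helpers stackTraceOf and head are hypothetical names, the FileNotFoundException is only a stand-in for the failing Spark action, and it is an assumption that output printed this way is not truncated by the notebook in the same way.

```scala
import java.io.{PrintWriter, StringWriter}
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of Zeppelin or Spark.
object ShowErrorHead {
  // Render the full stack trace of a throwable as a single string.
  def stackTraceOf(t: Throwable): String = {
    val sw = new StringWriter()
    t.printStackTrace(new PrintWriter(sw, true))
    sw.toString
  }

  // Keep only the first n lines, which hold the exception class and message.
  def head(t: Throwable, n: Int): String =
    stackTraceOf(t).split("\\r?\\n").take(n).mkString("\n")

  def main(args: Array[String]): Unit = {
    // Stand-in for the failing action; in the notebook this would be
    // the collect() call from the reproduction above.
    val result = Try[Unit] {
      throw new java.io.FileNotFoundException(
        "Input path does not exist: hdfs:/tmp/people1.txt")
    }
    result match {
      case Failure(e) => println(head(e, 5))
      case Success(_) => println("ok")
    }
  }
}
```

In a notebook paragraph, the same pattern would wrap the final action in Try and print head(e, n) in the Failure case, so the exception class and message appear first regardless of how long the trace is.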