Description
Calling read().json() on a SQLContext (i.e. DataFrameReader.json()) raises a scala.MatchError while reading JSON data. The stack trace is as follows:
15/07/14 11:25:26 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/07/14 11:25:26 INFO DAGScheduler: Job 0 finished: json at Example.java:23, took 6.981330 s
Exception in thread "main" scala.MatchError: StringType (of class org.apache.spark.sql.types.StringType$)
    at org.apache.spark.sql.json.InferSchema$.apply(InferSchema.scala:58)
    at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:139)
    at org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:138)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.json.JSONRelation.schema$lzycompute(JSONRelation.scala:137)
    at org.apache.spark.sql.json.JSONRelation.schema(JSONRelation.scala:137)
    at org.apache.spark.sql.sources.LogicalRelation.<init>(LogicalRelation.scala:30)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
    at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:213)
    at com.hp.sparkdemo.Example.main(Example.java:23)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/07/14 11:25:26 INFO SparkContext: Invoking stop() from shutdown hook
15/07/14 11:25:26 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
15/07/14 11:25:26 INFO DAGScheduler: Stopping DAGScheduler
15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Shutting down all executors
15/07/14 11:25:26 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
15/07/14 11:25:26 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
Offending code snippet (around line 23):
JavaSparkContext sctx = new JavaSparkContext(sparkConf);
SQLContext ctx = new SQLContext(sctx);
DataFrame frame = ctx.read().json(facebookJSON);
frame.printSchema();
The exception is reproducible using the following JSON:
{
  "data": [
    {
      "id": "X999_Y999",
      "from": {
        "name": "Tom Brady",
        "id": "X12"
      },
      "message": "Looking forward to 2010!",
      "actions": [
        {
          "name": "Comment",
          "link": "http://www.facebook.com/X999/posts/Y999"
        },
        {
          "name": "Like",
          "link": "http://www.facebook.com/X999/posts/Y999"
        }
      ],
      "type": "status",
      "created_time": "2010-08-02T21:27:44+0000",
      "updated_time": "2010-08-02T21:27:44+0000"
    },
    {
      "id": "X998_Y998",
      "from": {
        "name": "Peyton Manning",
        "id": "X18"
      },
      "message": "Where's my contract?",
      "actions": [
        {
          "name": "Comment",
          "link": "http://www.facebook.com/X998/posts/Y998"
        },
        {
          "name": "Like",
          "link": "http://www.facebook.com/X998/posts/Y998"
        }
      ],
      "type": "status",
      "created_time": "2010-08-02T21:27:44+0000",
      "updated_time": "2010-08-02T21:27:44+0000"
    }
  ]
}
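One thing that may be worth checking: the Spark SQL programming guide notes that the JSON data source expects each line of the input to be a separate, self-contained JSON object. If the file behind facebookJSON is pretty-printed across several lines (an assumption; the report does not say how the file on disk is laid out), collapsing the document onto a single line before calling read().json() would show whether the layout is involved. The sketch below uses only the JDK; JsonLineCompactor and compact are hypothetical names chosen for illustration and are not part of Spark.

```java
// Hypothetical helper, not part of Spark: collapses a JSON document onto one line.
class JsonLineCompactor {

    /**
     * Drops every whitespace character that sits outside a JSON string literal.
     * Characters inside string literals, including escaped quotes, are kept as-is.
     */
    static String compact(String json) {
        StringBuilder out = new StringBuilder(json.length());
        boolean inString = false; // true while between an opening and closing quote
        boolean escaped = false;  // true right after a backslash inside a string
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);
                if (escaped) {
                    escaped = false;      // this character was escaped, do not interpret it
                } else if (c == '\\') {
                    escaped = true;       // the next character is escaped
                } else if (c == '"') {
                    inString = false;     // unescaped quote closes the string
                }
            } else if (c == '"') {
                inString = true;
                out.append(c);
            } else if (!Character.isWhitespace(c)) {
                out.append(c);            // structural character or part of a number/literal
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pretty = "{\n  \"type\": \"status\",\n  \"message\": \"Looking forward to 2010!\"\n}";
        // prints {"type":"status","message":"Looking forward to 2010!"}
        System.out.println(compact(pretty));
    }
}
```

Writing the compacted string to a file and passing that path to ctx.read().json(...) gives a second data point: if the MatchError disappears, the multi-line layout is the likely trigger; if it persists, the cause lies elsewhere.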