SPARK-3390

sqlContext.jsonRDD fails on a complex structure of JSON array and JSON object nesting

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 1.2.0
    • Component/s: SQL
    • Labels: None

    Description

      I found a valid JSON string that Spark SQL fails to parse correctly.

      Try running these lines in a spark-shell to reproduce:

      val sqlContext = new org.apache.spark.sql.SQLContext(sc)
      val badJson = "{\"foo\": [[{\"bar\": 0}]]}"
      val rdd = sc.parallelize(badJson :: Nil)
      sqlContext.jsonRDD(rdd).count()
      

      I've tried running these lines on the 1.0.2 release as well as the latest Spark 1.1 release candidate, and I get this stack trace in both cases:

      org.apache.spark.SparkException: Job aborted due to stage failure: Task 2.0:3 failed 1 times, most recent failure: Exception failure in TID 7 on host localhost: scala.MatchError: StructType(List()) (of class org.apache.spark.sql.catalyst.types.StructType)
      org.apache.spark.sql.json.JsonRDD$.enforceCorrectType(JsonRDD.scala:333)
      org.apache.spark.sql.json.JsonRDD$$anonfun$enforceCorrectType$1.apply(JsonRDD.scala:335)
      scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
      scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
      scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
      scala.collection.AbstractTraversable.map(Traversable.scala:105)
      org.apache.spark.sql.json.JsonRDD$.enforceCorrectType(JsonRDD.scala:335)
      org.apache.spark.sql.json.JsonRDD$$anonfun$enforceCorrectType$1.apply(JsonRDD.scala:335)
      scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
      scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
      scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
      scala.collection.AbstractTraversable.map(Traversable.scala:105)
      org.apache.spark.sql.json.JsonRDD$.enforceCorrectType(JsonRDD.scala:335)
      org.apache.spark.sql.json.JsonRDD$$anonfun$org$apache$spark$sql$json$JsonRDD$$asRow$1$$anonfun$apply$12.apply(JsonRDD.scala:365)
      scala.Option.map(Option.scala:145)
      org.apache.spark.sql.json.JsonRDD$$anonfun$org$apache$spark$sql$json$JsonRDD$$asRow$1.apply(JsonRDD.scala:364)
      org.apache.spark.sql.json.JsonRDD$$anonfun$org$apache$spark$sql$json$JsonRDD$$asRow$1.apply(JsonRDD.scala:349)
      scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      org.apache.spark.sql.json.JsonRDD$.org$apache$spark$sql$json$JsonRDD$$asRow(JsonRDD.scala:349)
      org.apache.spark.sql.json.JsonRDD$$anonfun$createLogicalPlan$1.apply(JsonRDD.scala:51)
      org.apache.spark.sql.json.JsonRDD$$anonfun$createLogicalPlan$1.apply(JsonRDD.scala:51)
      scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
      scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
      ....
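
The trace shows enforceCorrectType recursing through two levels of map (one per array nesting level) and then throwing scala.MatchError on a StructType. The following is a minimal, self-contained sketch of that failure mode, not Spark's actual JsonRDD code: the DataType hierarchy, MatchErrorSketch, enforceIncomplete, and enforceComplete are all hypothetical stand-ins. It assumes the converter handles arrays recursively but has no case for a struct reached through a nested array, which is consistent with the trace but not confirmed by it.

```scala
// Hypothetical, simplified stand-ins for Catalyst's data types (not Spark's real classes).
sealed trait DataType
case object IntegerType extends DataType
case class ArrayType(elementType: DataType) extends DataType
case class StructType(fields: List[(String, DataType)]) extends DataType

object MatchErrorSketch {
  // Recurses into arrays but has no StructType case, so a struct reached
  // through nested arrays falls off the end of the match at runtime.
  // @unchecked only silences the compiler's exhaustiveness warning.
  def enforceIncomplete(value: Any, desiredType: DataType): Any =
    (desiredType: @unchecked) match {
      case ArrayType(elementType) =>
        value.asInstanceOf[Seq[Any]].map(enforceIncomplete(_, elementType))
      case IntegerType =>
        value.asInstanceOf[Int]
    }

  // Same converter with a StructType case added: each declared field is
  // looked up in the parsed object and converted recursively.
  def enforceComplete(value: Any, desiredType: DataType): Any =
    desiredType match {
      case ArrayType(elementType) =>
        value.asInstanceOf[Seq[Any]].map(enforceComplete(_, elementType))
      case StructType(fields) =>
        val obj = value.asInstanceOf[Map[String, Any]]
        fields.map { case (name, fieldType) =>
          obj.get(name).map(enforceComplete(_, fieldType)).getOrElse(null)
        }
      case IntegerType =>
        value.asInstanceOf[Int]
    }
}
```

With a value shaped like the inner part of the reproduction, Seq(Seq(Map("bar" -> 0))), and a matching hypothetical schema ArrayType(ArrayType(StructType(List("bar" -> IntegerType)))), enforceIncomplete throws scala.MatchError while enforceComplete converts the value. Note that the real trace reports StructType(List()), an empty struct, so schema inference for the nested array may also be involved; the sketch does not model that part.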

          People

            Assignee: Yin Huai (yhuai)
            Reporter: Vida Ha (vidaha)