  1. Spark
  2. SPARK-8093

The new JSON schema inference in the Spark 1.4 branch changed how inner empty JSON objects are handled.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.1, 1.5.0
    • Component/s: SQL
    • Labels: None

    Description

      This is similar to SPARK-3365. A sample JSON file is attached. Code to reproduce:

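      // Assumes a Spark 1.4 spark-shell, where `sqlContext` is predefined.
      val jsonDF = sqlContext.read.json("/tmp/t1.json")
      // Fails with the exception shown in the stack trace below.
      jsonDF.write.parquet("/tmp/t1.parquet")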
      

      The 'integration' object is empty in the JSON.
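
      The attachment's contents are not reproduced here, so the following is only a minimal sketch of an input with the same shape. It assumes a Spark 1.4 spark-shell (where `sc` and `sqlContext` are predefined); the record and its `id` field are hypothetical, and only the empty `integration` object mirrors the report.

```scala
// Hypothetical one-record input; only the empty `integration` object mirrors the report.
val records = sc.parallelize(Seq("""{"id": 1, "integration": {}}"""))

// Infer the schema from the JSON strings (DataFrameReader.json accepts an RDD[String]).
val df = sqlContext.read.json(records)

// Inspect what was inferred for `integration` before attempting a Parquet write.
df.printSchema()
```

      Printing the schema first makes it possible to see how the empty object was inferred on a given Spark version without touching the file system.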
      Stack trace:

      ....
      Caused by: java.io.IOException: Could not read footer: java.lang.IllegalStateException: Cannot build an empty group
      	at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:238)
      	at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:369)
      	at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache$lzycompute(newParquet.scala:154)
      	at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache(newParquet.scala:152)
      	at org.apache.spark.sql.parquet.ParquetRelation2.refresh(newParquet.scala:197)
      	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.insert(commands.scala:134)
      	... 69 more
      Caused by: java.lang.IllegalStateException: Cannot build an empty group
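
      The trace shows Parquet refusing to build a group with no fields, which presumably corresponds to a struct column with no children in the DataFrame schema. As a rough sketch, such columns can be located before writing. `emptyStructPaths` is a hypothetical helper written for illustration; it is not part of Spark.

```scala
import org.apache.spark.sql.types.{ArrayType, DataType, StructType}

// Hypothetical helper: returns the dotted paths of all struct-typed
// fields (including structs nested in arrays) that have no children.
def emptyStructPaths(dt: DataType, prefix: String = ""): Seq[String] = dt match {
  // A nested struct with no fields is what Parquet cannot represent.
  case s: StructType if s.fields.isEmpty && prefix.nonEmpty => Seq(prefix)
  // Otherwise recurse into each child, extending the path.
  case s: StructType =>
    s.fields.toSeq.flatMap { f =>
      val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
      emptyStructPaths(f.dataType, path)
    }
  // Arrays may contain structs, so check the element type too.
  case ArrayType(elementType, _) => emptyStructPaths(elementType, prefix)
  case _ => Seq.empty
}
```

      Calling `emptyStructPaths(jsonDF.schema)` on the DataFrame from the reproduction lists the offending columns, which could then be dropped or filled in before calling `write.parquet`.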
      

      Attachments

        1. t1.json
          1 kB
          Harish Butani


          People

            NathanHowell Nathan Howell
            rhbutani Harish Butani
            Yin Huai Yin Huai
            Votes: 0
            Watchers: 3
