Spark / SPARK-8093

Spark 1.4 branch's new JSON schema inference has changed the behavior of handling inner empty JSON objects.


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.1, 1.5.0
    • Component/s: SQL
    • Labels:
      None

      Description

      This is similar to SPARK-3365. A sample JSON file is attached. Code to reproduce (in the spark-shell, where sqlContext is predefined):

      val jsonDF = sqlContext.read.json("/tmp/t1.json")
      jsonDF.write.parquet("/tmp/t1.parquet")
      

      The 'integration' object is empty in the JSON.
      Stack trace:
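
      For illustration, a record shaped like the following (hypothetical; the actual attached t1.json may differ) triggers the failure, because the empty 'integration' object is inferred as an empty struct, which maps to an empty Parquet group:

      ```json
      {"name": "a", "integration": {}}
      ```

      Until the fix, a possible workaround should be to drop the empty column before writing, e.g. jsonDF.drop("integration").write.parquet("/tmp/t1.parquet").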

      ....
      Caused by: java.io.IOException: Could not read footer: java.lang.IllegalStateException: Cannot build an empty group
      	at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:238)
      	at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:369)
      	at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache$lzycompute(newParquet.scala:154)
      	at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache(newParquet.scala:152)
      	at org.apache.spark.sql.parquet.ParquetRelation2.refresh(newParquet.scala:197)
      	at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.insert(commands.scala:134)
      	... 69 more
      Caused by: java.lang.IllegalStateException: Cannot build an empty group
      

        Attachments

        1. t1.json
          1 kB
          Harish Butani

            People

            • Assignee: Nathan Howell
            • Reporter: Harish Butani
            • Shepherd: Yin Huai
            • Votes: 0
            • Watchers: 3
