SPARK-23372: Writing empty struct in parquet fails during execution. It should fail earlier during analysis.


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels: None

    Description

      Running

      spark.emptyDataFrame.write.format("parquet").mode("overwrite").save(path)

      results in:

       org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: message spark_schema {
       }
      
      at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:27)
       at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:37)
       at org.apache.parquet.schema.MessageType.accept(MessageType.java:58)
       at org.apache.parquet.schema.TypeUtil.checkValidWriteSchema(TypeUtil.java:23)
       at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:225)
       at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:342)
       at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:302)
       at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
       at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:376)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:387)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:278)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:276)
       at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:281)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:206)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:205)
       at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
       at org.apache.spark.scheduler.Task.run(Task.scala:109)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
       at java.lang.Thread.run(Thread.
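
      The same failure reproduces for any write whose schema contains no fields, not just spark.emptyDataFrame. As an illustrative variant (assuming a local SparkSession bound to spark and a writable location bound to path, as in the snippet above):

      import org.apache.spark.sql.Row
      import org.apache.spark.sql.types.StructType

      // An explicitly empty schema: zero fields, same shape as spark.emptyDataFrame.
      val df = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], StructType(Nil))
      // Fails at task execution time with the same InvalidSchemaException.
      df.write.format("parquet").mode("overwrite").save(path)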
       

      We should detect the empty schema earlier, during analysis, and raise the error there instead of failing inside a running task.
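
      A minimal sketch of such a check, assuming it runs on the write path before any job is submitted. The helper name and error message below are hypothetical, not Spark's actual fix; inside Spark's sql package this would raise an AnalysisException, while this standalone sketch throws IllegalArgumentException:

      import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType}

      // Hypothetical analysis-time validation, for illustration only.
      object EmptySchemaCheck {
        // True if the type is an empty struct or contains one at any nesting depth,
        // including inside array elements and map keys/values.
        private def hasEmptyStruct(dt: DataType): Boolean = dt match {
          case s: StructType => s.fields.isEmpty || s.fields.exists(f => hasEmptyStruct(f.dataType))
          case a: ArrayType  => hasEmptyStruct(a.elementType)
          case m: MapType    => hasEmptyStruct(m.keyType) || hasEmptyStruct(m.valueType)
          case _             => false
        }

        // Called while analyzing the write command, so the failure surfaces
        // up front instead of inside ParquetFileWriter on an executor.
        def validate(schema: StructType): Unit = {
          if (hasEmptyStruct(schema)) {
            throw new IllegalArgumentException(
              "Parquet data source does not support writing a schema with an empty struct")
          }
        }
      }

      Hooked into analysis of the save command, a check along these lines turns the runtime InvalidSchemaException into an immediate error with a clear message.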

          People

            Assignee: dkbiswal (Dilip Biswal)
            Reporter: dkbiswal (Dilip Biswal)
