Spark / SPARK-20593

Writing Parquet: Cannot build an empty group


Details

    • Type: Question
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: Spark Core, Spark Shell
    • Labels: None
    • Environment: Apache Spark 2.1.1 (2.1.0 behaved the same; I switched today). Tested only on Mac.

    Description

      Hi,

      This is my first ticket, so I apologize if I'm doing anything improperly.

      I have a dataset:

      root
      |-- muons: array (nullable = true)
      |    |-- element: struct (containsNull = true)
      |    |    |-- reco::Candidate: struct (nullable = true)
      |    |    |-- qx3_: integer (nullable = true)
      |    |    |-- pt_: float (nullable = true)
      |    |    |-- eta_: float (nullable = true)
      |    |    |-- phi_: float (nullable = true)
      |    |    |-- mass_: float (nullable = true)
      |    |    |-- vertex_: struct (nullable = true)
      |    |    |    |-- fCoordinates: struct (nullable = true)
      |    |    |    |    |-- fX: float (nullable = true)
      |    |    |    |    |-- fY: float (nullable = true)
      |    |    |    |    |-- fZ: float (nullable = true)
      |    |    |-- pdgId_: integer (nullable = true)
      |    |    |-- status_: integer (nullable = true)
      |    |    |-- cachePolarFixed_: struct (nullable = true)
      |    |    |-- cacheCartesianFixed_: struct (nullable = true)
      

      As you can see, there are 3 empty structs in this schema. I can read and manipulate this dataset without any problems. However, when I try writing it to disk as Parquet, I get the following exception:

      ds.write.format("parquet").save(outputPathName):

      java.lang.IllegalStateException: Cannot build an empty group
      at org.apache.parquet.Preconditions.checkState(Preconditions.java:91)
      at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:622)
      at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:497)
      at org.apache.parquet.schema.Types$Builder.named(Types.java:286)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:535)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:534)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:533)
      So, basically, I would like to understand whether this is a bug or intended behavior. I assume it is related to the empty structs. Any help would be really appreciated!
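
      For completeness, here is a minimal sketch of what I believe triggers the error (hypothetical column names and output path, assuming a spark-shell session where `spark` is available). Parquet groups must contain at least one field, so a column whose type is an empty struct cannot be converted to a Parquet group; dropping such columns before writing is one possible workaround:

      ```scala
      import org.apache.spark.sql.Row
      import org.apache.spark.sql.types._

      // Schema with one normal field and one empty struct field,
      // mirroring cachePolarFixed_ etc. in the dataset above.
      val schema = StructType(Seq(
        StructField("pt_", FloatType, nullable = true),
        StructField("cachePolarFixed_", StructType(Nil), nullable = true) // empty struct
      ))

      val rows = spark.sparkContext.parallelize(Seq(Row(1.0f, null)))
      val ds = spark.createDataFrame(rows, schema)

      // Reading/manipulating works fine:
      ds.show()

      // Writing should fail with "Cannot build an empty group":
      // ds.write.format("parquet").save("/tmp/repro")

      // Possible workaround: drop the empty struct column(s) first.
      ds.drop("cachePolarFixed_").write.format("parquet").save("/tmp/repro-ok")
      ```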

      I've quickly created a stripped version, and that one writes without any issues!
      For reference, here is a link to the original question on Stack Overflow [1].

      VK

      [1] http://stackoverflow.com/questions/43767358/apache-spark-parquet-cannot-build-an-empty-group

      Attachments

      Issue Links

      Activity

      People

      Assignee: Unassigned
      Reporter: Viktor Khristenko (vkhristenko)
      Votes: 0
      Watchers: 2
