SPARK-23734 (project: Spark)

InvalidSchemaException While Saving ALSModel

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.1
    • Component/s: ML
    • Environment:

      macOS 10.13.2

      Scala 2.11.8

      Spark 2.3.0 (v2.3.0-rc5, Feb 22 2018)

    Description

      After fitting an ALSModel, saving the model fails with the following error:

      Caused by: org.apache.parquet.schema.InvalidSchemaException: A group type can not be empty. Parquet does not support empty group without leaves. Empty group: spark_schema

      The same code runs without errors on Spark 2.2.1.

      The same issue also occurs with our other ALSModels.

      To reproduce

      Take ALSExample (https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/ALSExample.scala) and add the following line, which saves the model, immediately before "spark.stop()":

         model.write.overwrite().save("SparkExampleALSModel") 
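      For readers without the examples module at hand, here is a minimal standalone sketch of the same reproduction; it is not code from the report. It assumes spark-mllib is on the classpath. The object name ALSSaveRepro, the tiny made-up in-memory ratings set (standing in for the MovieLens sample file that ALSExample reads) and the local[*] master are illustrative choices; the ALS settings roughly follow ALSExample.

```scala
import org.apache.spark.ml.recommendation.{ALS, ALSModel}
import org.apache.spark.sql.SparkSession

// Hypothetical standalone reproduction of the ALSExample + save scenario.
object ALSSaveRepro {
  // Same shape as the Rating case class in ALSExample.
  final case class Rating(userId: Int, movieId: Int, rating: Float)

  def run(spark: SparkSession, path: String): ALSModel = {
    import spark.implicits._

    // Made-up ratings for illustration only (3 users x 3 items).
    val ratings = Seq(
      Rating(0, 0, 4.0f), Rating(0, 1, 2.0f),
      Rating(1, 0, 3.0f), Rating(1, 2, 5.0f),
      Rating(2, 1, 1.0f), Rating(2, 2, 4.0f)
    ).toDF()

    // Explicit-feedback ALS, configured like the model in ALSExample.
    val als = new ALS()
      .setMaxIter(5)
      .setRegParam(0.01)
      .setUserCol("userId")
      .setItemCol("movieId")
      .setRatingCol("rating")
    val model = als.fit(ratings)

    // The line added in the report; in the reported 2.3.0 environment this
    // is where the InvalidSchemaException surfaces.
    model.write.overwrite().save(path)

    // Loading the model back confirms that the save completed.
    ALSModel.load(path)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ALSSaveRepro")
      .master("local[*]")
      .getOrCreate()
    try {
      run(spark, "SparkExampleALSModel")
    } finally {
      spark.stop()
    }
  }
}
```

      On an unaffected installation ALSModel.load returns the saved model; in the environment described above, the save call is where the error quoted in the description would be expected.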

      Stack Trace

      Exception in thread "main" java.lang.ExceptionInInitializerError
      at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$setSchema$2.apply(ParquetWriteSupport.scala:444)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$setSchema$2.apply(ParquetWriteSupport.scala:444)
      at scala.collection.immutable.List.foreach(List.scala:392)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$.setSchema(ParquetWriteSupport.scala:444)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.prepareWrite(ParquetFileFormat.scala:112)
      at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:140)
      at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:154)
      at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
      at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
      at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
      at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
      at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
      at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
      at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
      at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
      at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
      at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
      at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
      at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
      at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
      at org.apache.spark.ml.recommendation.ALSModel$ALSModelWriter.saveImpl(ALS.scala:510)
      at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:103)
      at com.vitalmove.model.ALSExample$.main(ALSExample.scala:83)
      at com.vitalmove.model.ALSExample.main(ALSExample.scala)
      Caused by: org.apache.parquet.schema.InvalidSchemaException: A group type can not be empty. Parquet does not support empty group without leaves. Empty group: spark_schema
      at org.apache.parquet.schema.GroupType.<init>(GroupType.java:92)
      at org.apache.parquet.schema.GroupType.<init>(GroupType.java:48)
      at org.apache.parquet.schema.MessageType.<init>(MessageType.java:50)
      at org.apache.parquet.schema.Types$MessageTypeBuilder.named(Types.java:1256)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.<init>(ParquetSchemaConverter.scala:567)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.<clinit>(ParquetSchemaConverter.scala)
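      The innermost frames show where the failure starts: while initializing ParquetSchemaConverter, Spark builds a Parquet message type named spark_schema with no fields (Types$MessageTypeBuilder.named), and the GroupType constructor of the Parquet library actually loaded at run time rejects it. The sketch below is our own assumption of a useful check, not a diagnosis from the report: it makes the same builder call in isolation and prints which jar supplied the Parquet schema classes, which may help spot an unexpected parquet-column version on the classpath. The object name ParquetEmptySchemaCheck is hypothetical, and only parquet-column is needed on the classpath.

```scala
import org.apache.parquet.schema.{InvalidSchemaException, MessageType, Types}

// Hypothetical diagnostic, not part of Spark or Parquet.
object ParquetEmptySchemaCheck {
  // Location of the jar that provided the Parquet schema classes, or
  // "<unknown>" if the JVM does not report a code source.
  def parquetJarLocation: String =
    Option(classOf[MessageType].getProtectionDomain.getCodeSource)
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)
      .getOrElse("<unknown>")

  // Mirrors the failing frames: build a message type named "spark_schema"
  // with no fields. Returns false if this Parquet build rejects it with the
  // InvalidSchemaException seen in the trace.
  def emptyMessageAllowed: Boolean =
    try {
      Types.buildMessage().named("spark_schema")
      true
    } catch {
      case _: InvalidSchemaException => false
    }

  def main(args: Array[String]): Unit = {
    println(s"parquet-column loaded from: $parquetJarLocation")
    println(s"empty message type allowed: $emptyMessageAllowed")
  }
}
```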
       

    People

    • Assignee: Unassigned
    • Reporter: Stanley Poon (spoon)
    • Votes: 0
    • Watchers: 3

              Dates

              • Created:
                Updated:
                Resolved: