Spark / SPARK-8079

NPE when HadoopFsRelation.prepareJobForWrite throws exception

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.1, 1.5.0
    • Component/s: SQL
    • Labels:
      None

      Description

      Taking ParquetRelation2 as an example, the following Spark shell code triggers an unexpected NPE:

      import sqlContext._
      import sqlContext.implicits._
      
      range(1, 3).select($"id" as "a b").write.format("parquet").save("file:///tmp/foo")
      

      Exceptions thrown:

      java.lang.RuntimeException: Attribute name "a b" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.
              at scala.sys.package$.error(package.scala:27)
              at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$checkSpecialCharacters$2.apply(ParquetTypes.scala:414)
              at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$checkSpecialCharacters$2.apply(ParquetTypes.scala:412)
              at scala.collection.immutable.List.foreach(List.scala:318)
              at org.apache.spark.sql.parquet.ParquetTypesConverter$.checkSpecialCharacters(ParquetTypes.scala:412)
              at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertToString(ParquetTypes.scala:423)
              at org.apache.spark.sql.parquet.RowWriteSupport$.setSchema(ParquetTableSupport.scala:383)
              at org.apache.spark.sql.parquet.ParquetRelation2.prepareJobForWrite(newParquet.scala:230)
              ...
      java.lang.NullPointerException
              at org.apache.spark.sql.sources.BaseWriterContainer.abortJob(commands.scala:372)
              at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.insert(commands.scala:137)
              at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.run(commands.scala:114)
              at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
              at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
              ...
      

      Note that the first RuntimeException is expected, while the following NPE is not.
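
      As the error message suggests, the expected RuntimeException goes away if the
      offending column is renamed before writing. A minimal sketch in the same shell
      session (the replacement name a_b is arbitrary):

      range(1, 3).select($"id" as "a_b").write.format("parquet").save("file:///tmp/foo")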

      The NPE occurs because BaseWriterContainer.driverSideSetup() both calls relation.prepareJobForWrite() AND initializes the OutputCommitter used for the subsequent write job. If the former throws an exception, the latter is never initialized, so aborting the job dereferences a null OutputCommitter and throws the NPE.
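
      To make the failure mode concrete, below is a minimal, self-contained Scala
      sketch of the pattern. The names (Committer, WriterContainer, Repro) are
      simplified stand-ins rather than the actual Spark internals, and the null
      guard shown is one way to keep the abort path from masking the original
      exception:

      // Hypothetical, simplified committer interface standing in for
      // Hadoop's OutputCommitter.
      trait Committer {
        def abortJob(): Unit
      }

      class WriterContainer(prepareJobForWrite: () => Committer) {
        // Mirrors BaseWriterContainer: the committer stays null until setup succeeds.
        private var committer: Committer = _

        def driverSideSetup(): Unit = {
          // If prepareJobForWrite() throws, committer is never assigned.
          committer = prepareJobForWrite()
        }

        // Buggy abort path: NPEs when setup failed before the committer was set.
        def abortJobUnguarded(): Unit = committer.abortJob()

        // Defensive abort path: skip the committer if it was never initialized,
        // so only the original exception surfaces.
        def abortJobGuarded(): Unit = if (committer != null) committer.abortJob()
      }

      object Repro extends App {
        val container = new WriterContainer(() => sys.error("invalid attribute name"))
        try container.driverSideSetup()
        catch {
          case e: RuntimeException =>
            container.abortJobGuarded()      // safe: committer is still null
            // container.abortJobUnguarded() // would throw the NullPointerException
            println(s"job aborted after: ${e.getMessage}")
        }
      }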

            People

            • Assignee: Cheng Lian
            • Reporter: Cheng Lian
            • Votes: 0
            • Watchers: 3
