Spark / SPARK-12420 Have a built-in CSV data source implementation / SPARK-13137

NullPointerException in schema inference for CSV when the first line is empty


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      When the first line of the file is empty and schema inference is attempted, the exception below is thrown:

      java.lang.NullPointerException
      	at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
      	at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:114)
      	at scala.collection.IndexedSeqOptimized$class.zipWithIndex(IndexedSeqOptimized.scala:93)
      	at scala.collection.mutable.ArrayOps$ofRef.zipWithIndex(ArrayOps.scala:108)
      	at org.apache.spark.sql.execution.datasources.csv.CSVRelation.inferSchema(CSVRelation.scala:137)
      	at org.apache.spark.sql.execution.datasources.csv.CSVRelation.dataSchema$lzycompute(CSVRelation.scala:50)
      	at org.apache.spark.sql.execution.datasources.csv.CSVRelation.dataSchema(CSVRelation.scala:48)
      	at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:666)
      	at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:665)
      	at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:39)
      	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:115)
      	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
      

      This appears to be because findFirstLine() in CSVRelation fails to skip empty lines.
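A minimal sketch of the skipping logic described above. This is not the actual CSVRelation code; the object name `FirstLineSketch` and the helper `findFirstNonEmptyLine` are hypothetical, and stand in for findFirstLine(). The idea is simply to search for the first non-blank line instead of taking the head of the line collection, which is what leads to the NullPointerException when the file starts with an empty line:

```scala
// Hedged sketch: illustrates skipping blank leading lines before schema
// inference. The real findFirstLine() in CSVRelation differs in detail.
object FirstLineSketch {
  // Return the first non-empty (after trimming) line, if one exists,
  // rather than blindly using the head of the input.
  def findFirstNonEmptyLine(lines: Iterator[String]): Option[String] =
    lines.find(_.trim.nonEmpty)

  def main(args: Array[String]): Unit = {
    // A file whose first lines are empty or whitespace-only.
    val input = Iterator("", "   ", "a,b,c", "1,2,3")
    println(findFirstNonEmptyLine(input)) // prints Some(a,b,c)
  }
}
```

With this approach, an all-blank input yields `None`, which the caller can turn into a clean error instead of a NullPointerException.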



          People

            Assignee: Hyukjin Kwon (gurwls223)
            Reporter: Hyukjin Kwon (gurwls223)
            Votes: 0
            Watchers: 1
