[SPARK-13137] NullPointerException in schema inference for CSV when the first line is empty - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0
Fix Version/s: 2.0.0
Component/s: SQL
Labels: None

Description

When the first line of a CSV file is empty and Spark tries to infer the schema, it throws the exception below:

java.lang.NullPointerException was thrown.
java.lang.NullPointerException
	at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
	at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:114)
	at scala.collection.IndexedSeqOptimized$class.zipWithIndex(IndexedSeqOptimized.scala:93)
	at scala.collection.mutable.ArrayOps$ofRef.zipWithIndex(ArrayOps.scala:108)
	at org.apache.spark.sql.execution.datasources.csv.CSVRelation.inferSchema(CSVRelation.scala:137)
	at org.apache.spark.sql.execution.datasources.csv.CSVRelation.dataSchema$lzycompute(CSVRelation.scala:50)
	at org.apache.spark.sql.execution.datasources.csv.CSVRelation.dataSchema(CSVRelation.scala:48)
	at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:666)
	at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:665)
	at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:39)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:115)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)

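This appears to happen because findFirstLine() in CSVRelation fails to skip empty lines. As the trace shows, inferSchema() (CSVRelation.scala:137) then calls zipWithIndex on a null array, presumably the tokens parsed from that empty first line.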
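
The idea of skipping empty lines before picking the header can be sketched in plain Scala. This is only an illustration under stated assumptions, not the patch from the linked pull request: the object FirstLineSketch, the helpers firstNonEmptyLine and headerTokens, and the naive comma split are all hypothetical stand-ins for Spark's real CSV code.

```scala
object FirstLineSketch {
  // Hypothetical helper: return the first line that contains
  // non-whitespace characters, or None if every line is blank.
  def firstNonEmptyLine(lines: Iterator[String]): Option[String] =
    lines.find(line => line != null && line.trim.nonEmpty)

  // Naive comma split standing in for a real CSV parser (assumption).
  // Falls back to an empty array instead of null, so callers can
  // safely use zipWithIndex on the result.
  def headerTokens(lines: Iterator[String]): Array[String] =
    firstNonEmptyLine(lines)
      .map(_.split(",", -1))
      .getOrElse(Array.empty[String])

  def main(args: Array[String]): Unit = {
    // The first two lines are empty or blank, as in the reported case.
    val input = Seq("", "   ", "year,make,model", "2012,Tesla,S")
    val tokens = headerTokens(input.iterator)
    tokens.zipWithIndex.foreach { case (name, i) => println(s"$i: $name") }
  }
}
```

Returning an empty array rather than null means an input with only blank lines yields a schema with no columns instead of a NullPointerException; whether that is the right behavior for Spark is a separate design question.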

Issue Links

links to

[Github] Pull Request #11023 (HyukjinKwon)

People

Assignee: Hyukjin Kwon

Reporter: Hyukjin Kwon

Votes: 0

Watchers: 1

Dates

Created: 02/Feb/16 07:58

Updated: 12/Dec/22 18:10

Resolved: 21/Feb/16 21:21