Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Cannot Reproduce
- Affects Version: 2.3.2
- Fix Version: None
- Component: None
Description
Reading a Ctrl-A delimited CSV file silently drops rows in which every column is null, while the same data written comma delimited round-trips correctly.
Reproduction in spark-shell:
import org.apache.spark.sql._
import org.apache.spark.sql.types._
val l = List(List(1, 2), List(null,null), List(2,3))
val datasetSchema = StructType(List(StructField("colA", IntegerType, true), StructField("colB", IntegerType, true)))
val rdd = sc.parallelize(l).map(item => Row.fromSeq(item.toSeq))
val df = spark.createDataFrame(rdd, datasetSchema)
df.show()
+----+----+
|colA|colB|
+----+----+
|   1|   2|
|null|null|
|   2|   3|
+----+----+
df.write.option("delimiter", "\u0001").option("header", "true").csv("/ctrl-a-separated.csv")
df.write.option("delimiter", ",").option("header", "true").csv("/comma-separated.csv")
val commaDf = spark.read.option("header", "true").option("delimiter", ",").csv("/comma-separated.csv")
commaDf.show
+----+----+
|colA|colB|
+----+----+
|   1|   2|
|   2|   3|
|null|null|
+----+----+
val ctrlaDf = spark.read.option("header", "true").option("delimiter", "\u0001").csv("/ctrl-a-separated.csv")
ctrlaDf.show
+----+----+
|colA|colB|
+----+----+
|   1|   2|
|   2|   3|
+----+----+
As seen above, for the Ctrl-A delimited CSV the row containing only null values is dropped on read, whereas the comma delimited CSV preserves it.
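A possible workaround (a sketch, not verified against this exact Spark version) is to use the documented `nullValue` CSV option on both write and read, so that an all-null row never serializes to a line consisting only of the delimiter; the `"\\N"` token and the output path are arbitrary choices for illustration:

```scala
// Sketch: write nulls as an explicit token so an all-null row is not
// serialized as a bare delimiter character, then map the token back to
// null on read.
df.write
  .option("delimiter", "\u0001")
  .option("header", "true")
  .option("nullValue", "\\N")   // null -> "\N" on disk (token is arbitrary)
  .csv("/ctrl-a-separated-workaround.csv")

val roundTripped = spark.read
  .option("header", "true")
  .option("delimiter", "\u0001")
  .option("nullValue", "\\N")   // "\N" on disk -> null
  .csv("/ctrl-a-separated-workaround.csv")
roundTripped.show()
```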