[SPARK-25199] InferSchema "all Strings" if one of many CSVs is empty - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Cannot Reproduce
Affects Version/s: 2.2.1
Fix Version/s: None
Component/s: Input/Output
Labels:
- newbie
Environment:

Discovered on AWS Glue, which uses Spark 2.2.1.

Description

Spark can load multiple CSV files in one read:

df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/*.csv")

However, if any one of these files is empty apart from its header row, Spark sets every column type to "String" for the whole DataFrame.

Spark should skip a file during schema inference if it contains no non-header rows.
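Since the ticket was resolved as Cannot Reproduce, the following is only a possible workaround sketch for setups where the behavior does appear. It assumes the CSV files sit on a local filesystem that plain Python can read (not the case for S3 paths on Glue, which would need a different listing step). The helper name `csv_paths_with_data` is hypothetical and is not part of Spark.

```python
import csv
from pathlib import Path


def csv_paths_with_data(directory, pattern="*.csv"):
    """Return sorted paths of CSV files that have at least one non-header row.

    Hypothetical helper: it assumes every file starts with a header row and
    lives on a filesystem readable from the driver.
    """
    selected = []
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row, if present
            # Keep the file only if some later row has a non-blank field.
            if any(any(field.strip() for field in row) for row in reader):
                selected.append(str(path))
    return selected
```

The resulting list can then be passed to the reader in place of the glob, for example `spark.read.format("csv").option("header", "true").option("inferSchema", "true").load(paths)`, since `load` accepts a list of paths. Header-only files contribute no rows, so leaving them out does not change the loaded data. Supplying an explicit schema instead of `inferSchema` avoids the question entirely.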

Issue Links

links to

[Github] Pull Request #22177 (yunjzhang)

People

Assignee: Unassigned

Reporter: Neil McGuigan

Votes: 0

Watchers: 4

Dates

Created: 22/Aug/18 18:07

Updated: 25/Aug/18 21:02

Resolved: 25/Aug/18 21:02