Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Invalid
- Affects Version/s: 2.4.3
- Fix Version/s: None
- Component/s: None
- Labels: None
- Environment: Cloudera 6.2, Spark latest parcel (reporter believes 2.4.3)
Description
Scenario:
I have CSV files in an S3 bucket named like this:
s3a://bucket/prefix/myfile_2019:04:05.csv
s3a://bucket/prefix/myfile_2019:04:06.csv
When I try to load the files with a glob:
df = spark.read.load("s3a://bucket/prefix/*", format="csv", sep=":", inferSchema="true", header="true")
it fails with an error about the URI (sorry, I don't have the exact exception at hand). However, when I list all the files from S3 myself and pass the paths as an array:
df = spark.read.load(path=["s3a://bucket/prefix/myfile_2019:04:05.csv", "s3a://bucket/prefix/myfile_2019:04:06.csv"], format="csv", sep=":", inferSchema="true", header="true")
it works. Based on my observations, the cause is the COLON character in the file names.
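The workaround above can be sketched as follows: list the object keys yourself and hand Spark the explicit path list, so the glob/URI parsing of colon-containing names is never triggered. The bucket and prefix names, the boto3 listing, and the helper function are illustrative assumptions, not part of the original report.

```python
# Sketch of the workaround: enumerate objects explicitly and pass the full
# list of paths to spark.read.load(), instead of a glob that fails on
# colon-containing file names.

def to_s3a_paths(bucket, keys):
    """Turn raw S3 object keys into s3a:// URIs Spark can read directly."""
    return ["s3a://{}/{}".format(bucket, key) for key in keys]

# With boto3 (assumption: credentials and a SparkSession are configured):
# import boto3
# s3 = boto3.client("s3")
# resp = s3.list_objects_v2(Bucket="bucket", Prefix="prefix/")
# keys = [obj["Key"] for obj in resp.get("Contents", [])]
# df = spark.read.load(path=to_s3a_paths("bucket", keys), format="csv",
#                      sep=":", inferSchema="true", header="true")

# Offline illustration with the file names from the report:
paths = to_s3a_paths("bucket", ["prefix/myfile_2019:04:05.csv",
                                "prefix/myfile_2019:04:06.csv"])
```

Passing the list avoids Hadoop's glob expansion entirely; each URI is taken as a literal path, which is why the array form in the report succeeds.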
Attachments
Issue Links
- relates to
  - HADOOP-14829 Path should support colon (Resolved)
  - HADOOP-14217 Object Storage: support colon in object path (Open)
  - HADOOP-14235 S3A Path does not understand colon (:) when globbing (Resolved)