Details
- Bug
- Status: Resolved
- Minor
- Resolution: Won't Fix
- 2.8.1
- None
- None
- EC2, AWS
Description
Colons (":") are valid characters in S3 paths. However, the Java URI class, which the Path class uses, does not allow them.
This becomes a problem particularly when globbing S3 paths: the globber treats paths containing colons as invalid and throws a URISyntaxException.
The reason is that Globber.java is shared across all file systems, and some of the rules for regular file systems do not apply to S3; the colon is one example.
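The failure mode described above can be illustrated with java.net.URI alone. The following is a minimal sketch, not Hadoop's actual code: the class `ColonPathDemo` and its helpers `toUri` and `isRejected` are hypothetical names, and the scheme-splitting rule is an assumption about how a scheme-aware path parser might treat the text before the first colon. The sample file names are also made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ColonPathDemo {

    /**
     * Hypothetical helper (an assumption, not Hadoop code): treats the text
     * before the first colon as a URI scheme, unless a slash comes first,
     * and then builds a URI from the resulting parts.
     */
    public static URI toUri(String pathString) throws URISyntaxException {
        String scheme = null;
        String path = pathString;
        int colon = pathString.indexOf(':');
        int slash = pathString.indexOf('/');
        if (colon != -1 && (slash == -1 || colon < slash)) {
            scheme = pathString.substring(0, colon);
            path = pathString.substring(colon + 1);
        }
        // java.net.URI rejects a non-null scheme combined with a path that
        // does not start with '/' ("Relative path in absolute URI").
        return new URI(scheme, null, path, null, null);
    }

    /** Returns true if building the URI throws URISyntaxException. */
    public static boolean isRejected(String pathString) {
        try {
            toUri(pathString);
            return false;
        } catch (URISyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A bare object name with a colon is misread as "scheme:relative-path".
        System.out.println(isRejected("2017-01-01T00:00:00.csv"));
        // A name without a colon goes through.
        System.out.println(isRejected("plain-name.csv"));
        // A colon that appears after a slash is not taken as a scheme separator.
        System.out.println(isRejected("dir/with:colon.csv"));
    }
}
```

Under these assumptions, the rejection comes from the colon being interpreted as a scheme separator rather than from the character itself, which is why a bare child name returned during globbing is more likely to trip it than a full path.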
The same issue is reported at https://issues.apache.org/jira/browse/SPARK-20061
The good news is that I have a one-line fix, which I am about to send as a pull request.
However, a proper fix would separate the S3 globber from Globber.java, as proposed at https://issues.apache.org/jira/browse/HADOOP-13371
Issue Links
- is duplicated by
  - SPARK-20061 Reading a file with colon (:) from S3 fails with URISyntaxException (Resolved)
- is part of
  - HADOOP-3257 Path should handle all characters (Open)
- is related to
  - HADOOP-14217 Object Storage: support colon in object path (Open)
  - SPARK-28092 Spark cannot load files with COLON(:) char if not specified full path (Resolved)
- relates to
  - HADOOP-13371 S3A globber to use bulk listObject call over recursive directory scan (Resolved)
- links to