Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Won't Fix
- Affects Version/s: 2.2.0
- Fix Version/s: None
- Component/s: None
- Environment:
  java version "1.7.0_71"
  Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
  Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)
Description
While writing to S3 using Spark 1.2.0's ReceiverInputDStream#saveAsTextFiles with an S3 URL ("s3://fake-test/1234"), I noticed that files are written with double forward slashes (e.g. "s3://fake-test//1234/-1419334280000/").
After debugging, it seems this is caused by Jets3tFileSystemStore#pathToKey(path), which returns "/fake-test/1234/..." for the input "s3://fake-test/1234/..." when it should strip the leading forward slash.
When I used an s3n URL, and hence Jets3tNativeFileSystemStore, the double slashes went away. Here is a comparison of their pathToKey implementations:
Jets3tNativeFileSystemStore's implementation of pathToKey is:
private static String pathToKey(Path path) {
  if (path.toUri().getScheme() != null && path.toUri().getPath().isEmpty()) {
    // allow uris without trailing slash after bucket to refer to root,
    // like s3n://mybucket
    return "";
  }
  if (!path.isAbsolute()) {
    throw new IllegalArgumentException("Path must be absolute: " + path);
  }
  String ret = path.toUri().getPath().substring(1); // remove initial slash
  if (ret.endsWith("/") && (ret.indexOf("/") != ret.length() - 1)) {
    ret = ret.substring(0, ret.length() - 1);
  }
  return ret;
}
whereas Jets3tFileSystemStore uses:
private String pathToKey(Path path) {
  if (!path.isAbsolute()) {
    throw new IllegalArgumentException("Path must be absolute: " + path);
  }
  return path.toUri().getPath();
}
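To make the effect of that leading slash concrete, here is a minimal, self-contained sketch (plain java.net.URI, no Hadoop dependency) of how the two key styles behave once the key is appended back onto the bucket. The bucket name comes from the report above; the joinBucketAndKey helper is hypothetical and only stands in for wherever the store concatenates bucket and key, so this illustrates the slash handling rather than the actual Hadoop code path.

import java.net.URI;

public class PathToKeyDemo {
  // Mirrors the block store's pathToKey: returns the raw URI path,
  // keeping the leading slash.
  static String blockStoreKey(URI uri) {
    return uri.getPath();
  }

  // Mirrors the native store's pathToKey: strips the leading slash.
  static String nativeStoreKey(URI uri) {
    return uri.getPath().substring(1);
  }

  // Hypothetical helper standing in for the point where a key is
  // appended after the bucket to form a full URL.
  static String joinBucketAndKey(String scheme, String bucket, String key) {
    return scheme + "://" + bucket + "/" + key;
  }

  public static void main(String[] args) {
    URI uri = URI.create("s3://fake-test/1234");
    String bucket = uri.getHost(); // "fake-test"

    // Leading slash retained -> doubled separator after the bucket.
    System.out.println(joinBucketAndKey("s3", bucket, blockStoreKey(uri)));
    // prints: s3://fake-test//1234

    // Leading slash stripped -> single separator.
    System.out.println(joinBucketAndKey("s3n", bucket, nativeStoreKey(uri)));
    // prints: s3n://fake-test/1234
  }
}

The first output line matches the doubled-slash paths observed with the s3:// URL, while the second matches the clean paths seen when switching to s3n://.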
Issue Links
- is related to: HADOOP-10373 create tools/hadoop-amazon for aws/EMR support (Closed)