Description
MemPipeline.getInstance().read(From.avroFile(new Path("s3:///something")));
Fails with:
java.lang.IllegalArgumentException: Wrong FS: s3:/something, expected: file:///
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:643)
    at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:519)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:397)
    at org.apache.hadoop.fs.FileSystem.isFile(FileSystem.java:1424)
    at org.apache.crunch.io.From.getSchemaFromPath(From.java:351)
    at org.apache.crunch.io.From.avroFile(From.java:306)
    at org.apache.crunch.io.From.avroFile(From.java:280)
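For context, here is the failing call as a self-contained snippet; the class name and main method are just scaffolding added here, and the path is the same placeholder as in the call above:

import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mem.MemPipeline;
import org.apache.crunch.io.From;
import org.apache.hadoop.fs.Path;

public class AvroFromS3Repro {
    public static void main(String[] args) {
        Pipeline pipeline = MemPipeline.getInstance();
        // From.avroFile(Path) tries to read the Avro schema from the path up front,
        // which is where the "Wrong FS" exception above is thrown.
        pipeline.read(From.avroFile(new Path("s3:///something")));
    }
}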
I noticed this in the From class, method getSchemaFromPath:
FileSystem fs = FileSystem.get(conf);
Shouldn't that be changed to this?
FileSystem fs = path.getFileSystem(conf);
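To make the idea concrete, here is a standalone sketch of what the schema lookup could look like with that change. This is not the actual Crunch method body (the real getSchemaFromPath has more logic, e.g. directory handling), just the FileSystem resolution plus a plain Avro schema read:

import java.io.IOException;
import java.io.InputStream;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper illustrating the suggested change.
final class SchemaFromPath {
    static Schema read(Path path, Configuration conf) throws IOException {
        // Resolve the FileSystem from the path's own scheme (s3, hdfs, file, ...)
        // instead of FileSystem.get(conf), which always returns the default FS.
        FileSystem fs = path.getFileSystem(conf);
        try (InputStream in = fs.open(path);
             DataFileStream<Object> stream =
                 new DataFileStream<Object>(in, new GenericDatumReader<Object>())) {
            return stream.getSchema();
        }
    }
}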
We ran into this in a use case where the file was at a valid S3 path but the Configuration pointed to HDFS, which I believe should just work.
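To show the mismatch concretely (the defaultFS value and host below are hypothetical): FileSystem.get(conf) only consults the configured default filesystem, while Path.getFileSystem(conf) resolves from the path's own scheme.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsResolution {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // hypothetical cluster default

        Path avro = new Path("s3:///something");

        FileSystem byConf = FileSystem.get(conf);     // always the default FS (HDFS here)
        FileSystem byPath = avro.getFileSystem(conf); // the FS that owns the path's scheme
                                                      // (requires an S3 FileSystem impl on the classpath)

        System.out.println(byConf.getUri()); // hdfs://namenode:8020
        System.out.println(byPath.getUri()); // an s3 URI; byConf.checkPath(avro) would throw "Wrong FS"
    }
}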
After some googling I also found CRUNCH-47, which seems related, but the patch there couldn't have fixed the From/At/To helpers, as they were introduced later...