Description
All of the uses of org.apache.hadoop.fs.FileSystem.get use the single-argument version that takes a job configuration. This always returns the default file system type (usually HDFS) rather than the file system type named in the URIs of the input or output paths. This is particularly a problem on Amazon's Elastic MapReduce, where the input and output data typically reside in an org.apache.hadoop.fs.s3native.NativeS3FileSystem.
Stephen Green added a comment - 16/Apr/09 08:51 PM
This is a patch for mahout-trunk that fixes all of the occurrences (except one) of FileSystem.get to use the two-argument version with an appropriate Path as a first argument.
The only remaining instance is in org.apache.mahout.ga.watchmaker.MahoutEvaluator at line 64, where there's no obvious Path available to provide the first argument. Getting this one working will probably require refactoring that's beyond the scope of my Mahout understanding.
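For anyone following along, here is a rough sketch of why the choice of overload matters. It is not Hadoop code and not the patch itself: it uses only java.net.URI, and the helper names (schemeFromDefaultOnly, schemeFromPath), the namenode address, and the bucket name are all made up for illustration. The real lookup in Hadoop has more cases than this. In Hadoop the two-argument overload is FileSystem.get(URI, Configuration), so a Path would normally be passed as path.toUri(); path.getFileSystem(conf) gets to the same place.

```java
import java.net.URI;

// Simplified model of file system selection; NOT Hadoop's implementation.
class FileSystemSchemeSketch {

    // Hypothetical stand-in for the single-argument lookup: the path is never
    // consulted, so the configured default file system always wins.
    static String schemeFromDefaultOnly(URI defaultFs) {
        return defaultFs.getScheme();
    }

    // Hypothetical stand-in for the two-argument lookup: use the scheme of the
    // path URI when it has one, otherwise fall back to the default.
    static String schemeFromPath(URI path, URI defaultFs) {
        String scheme = path.getScheme();
        return scheme != null ? scheme : defaultFs.getScheme();
    }

    public static void main(String[] args) {
        // Assumed values, chosen only for the example.
        URI defaultFs = URI.create("hdfs://namenode:8020/");
        URI s3Input = URI.create("s3n://example-bucket/input");
        URI unqualified = URI.create("/user/mahout/output");

        System.out.println(schemeFromDefaultOnly(defaultFs));        // default only
        System.out.println(schemeFromPath(s3Input, defaultFs));      // scheme taken from the path
        System.out.println(schemeFromPath(unqualified, defaultFs));  // no scheme, falls back
    }
}
```

With a default of HDFS, the first helper can never select the S3 file system no matter what the input path says, which is the behaviour reported in the description. The second helper only falls back to the default when the path carries no scheme of its own.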
MAHOUT-118 fixes all of the occurrences (except one) of FileSystem.get to use the two argument version with an appropriate Path as a first argument.
The only remaining instance is in org.apache.mahout.ga.watchmaker.MahoutEvaluator at line 64, where there's no obvious Path available to provide the first argument. Getting this one working will probably require refactoring.
The only remaining instance is in org.apache.mahout.ga.watchmaker.MahoutEvaluator at line 64, where there's no obvious Path available to provide the first argument. Getting this one working will probably require refactoring that's beyond the scope of my Mahout understanding.
In this particular case, MahoutEvaluator creates the input and output paths automatically, so I think the default file system should do.