
Key: MAHOUT-118
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Jeff Eastman
Reporter: Stephen Green
Votes: 0
Watchers: 1
Mahout

Mahout needs to respect the file system type when getting a FileSystem for an input or output path

Created: 16/Apr/09 08:22 PM   Updated: 18/Nov/09 02:05 PM
Component/s: Classification, Clustering, Collaborative Filtering, Genetic Algorithms, Matrix
Affects Version/s: 0.1, 0.2
Fix Version/s: 0.1, 0.2

Time Tracking:
Original Estimate: 24h
Remaining Estimate: 24h
Time Spent: Not Specified

File Attachments:
getfs.patch (Text File, 33 kB, licensed for inclusion in ASF works), attached 2009-04-16 08:51 PM by Stephen Green
Environment: Mac OS X 10.5 and Amazon's Elastic MapReduce

Resolution Date: 19/Apr/09 11:18 PM


Description
All of the uses of org.apache.hadoop.fs.FileSystem.get use the single-argument version that takes a job configuration. This always returns the default file system type (usually HDFS) rather than the file system type named in the URIs of the input or output paths. This is particularly a problem on Amazon's Elastic MapReduce, where the input and output data typically reside in an org.apache.hadoop.fs.s3native.NativeS3FileSystem.
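
To illustrate the difference, here is a minimal sketch that models the two lookups using only java.net.URI. It is not Hadoop code: the class name, the method names, the "hdfs" default and the example bucket are all hypothetical. In Hadoop itself, the corresponding calls would be FileSystem.get(Configuration), which consults only the configuration, and FileSystem.get(URI, Configuration) or Path.getFileSystem(Configuration), which take the path's URI into account.

```java
import java.net.URI;

public class FileSystemSchemeSketch {
    // Hypothetical stand-in for the default file system named in the job configuration.
    static final String DEFAULT_SCHEME = "hdfs";

    // Models the single-argument lookup: the path is never consulted,
    // so the configured default always wins.
    static String schemeFromConfigOnly() {
        return DEFAULT_SCHEME;
    }

    // Models the path-aware lookup: the URI scheme selects the file system,
    // and the default is used only when the path carries no scheme.
    static String schemeFromPath(URI path) {
        String scheme = path.getScheme();
        return scheme != null ? scheme : DEFAULT_SCHEME;
    }

    public static void main(String[] args) {
        URI s3Input = URI.create("s3n://example-bucket/input"); // hypothetical bucket
        URI relativeOutput = URI.create("output/part-00000");   // no scheme

        System.out.println(schemeFromConfigOnly());         // hdfs, even for S3 data
        System.out.println(schemeFromPath(s3Input));        // s3n
        System.out.println(schemeFromPath(relativeOutput)); // hdfs
    }
}
```

The last case also shows why a path with no scheme is unaffected: it resolves to the default file system under either lookup.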

Stephen Green added a comment - 16/Apr/09 08:51 PM
This is a patch for mahout-trunk that changes all of the occurrences (except one) of FileSystem.get to use the two-argument version, with an appropriate Path as the first argument.

The only remaining instance is in org.apache.mahout.ga.watchmaker.MahoutEvaluator at line 64, where there's no obvious Path available to provide the first argument. Getting this one working will probably require refactoring that's beyond the scope of my Mahout understanding.


Jeff Eastman added a comment - 16/Apr/09 09:38 PM
committed in r765769

Deneche A. Hakim added a comment - 18/Apr/09 04:46 PM

The only remaining instance is in org.apache.mahout.ga.watchmaker.MahoutEvaluator at line 64, where there's no obvious Path available to provide the first argument. Getting this one working will probably require refactoring that's beyond the scope of my Mahout understanding.

In this particular case, MahoutEvaluator creates the input and output paths automatically, so I think the default file system should do.


Jeff Eastman added a comment - 19/Apr/09 11:18 PM
Given Deneche's comment above I'm going to mark this issue fixed. Thanks Steve.