Details
Type: Task
Status: Closed
Priority: Major
Resolution: Fixed
0.5
None
None
Description
For Taming Text, I've commissioned some benchmarking work on Mahout's clustering algorithms. I've asked the two people doing the project to do all of the work in the open here. The goal is to use a publicly reusable dataset (for now, the ASF mail archives, assuming they are big enough), run on EC2, and make all resources available so that others can reproduce and improve on the results.
I'd like to add the setup code to utils (although it could possibly be done as a Vectorizer). The results will be published on the Wiki as well as in the book. This issue is to track the patches, etc.
Attachments
Issue Links
- is blocked by
  - MAHOUT-598 Downstream steps in the seq2sparse job flow looking in wrong location for output from previous steps when running in Elastic MapReduce (EMR) cluster (Closed)
- is related to
  - MAHOUT-500 Make it easy to run Mahout on Amazon's Elastic Map Reduce (Closed)
  - MAHOUT-670 Provide a performance measurement framework for Mahout (Closed)