Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 0.9
Environment: AWS EMR with AMI 3.2.3
Description
When running the 'mahout cvb' command on AWS EMR with the --input option set to a value such as s3://mybucket/input/ or s3://mybucket/input/* (7 input files in my case), the content of the doc-topic output is nonsensical. It seems that the docIds in the doc-topic output are shuffled. The topic model output (p(term|topic) for each topic) still looks fine, however.
The workaround is to first copy the input files from S3 to the cluster's HDFS with the command:
hadoop fs -cp s3://mybucket/input /input
and then run mahout cvb with the option --input /input.
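The two workaround steps can be sketched as a small wrapper script. This is only an illustration, not part of the original report: the names run, stage_and_run_cvb and DRY_RUN are hypothetical, the paths are the ones quoted above, and any further cvb options are assumed to be supplied by the caller and are passed through unchanged. By default the script only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Sketch of the workaround: stage the S3 input on HDFS, then point cvb at the copy.
# run, stage_and_run_cvb and DRY_RUN are hypothetical names used for illustration.

S3_INPUT="${S3_INPUT:-s3://mybucket/input}"   # S3 location from the report
HDFS_INPUT="${HDFS_INPUT:-/input}"            # HDFS target from the report
DRY_RUN="${DRY_RUN:-1}"                       # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy the input to HDFS; start cvb only if the copy succeeded.
# Extra arguments are forwarded to 'mahout cvb' as given.
stage_and_run_cvb() {
  run hadoop fs -cp "$S3_INPUT" "$HDFS_INPUT" &&
    run mahout cvb --input "$HDFS_INPUT" "$@"
}

stage_and_run_cvb "$@"
```

Running the script with DRY_RUN=0 on the cluster would execute both commands; the remaining cvb options used in the original job should be appended on the command line.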