When running the 'mahout cvb' command on AWS EMR with the --input option set to an S3 path such as s3://mybucket/input/ or s3://mybucket/input/* (7 input files in my case), the content of the doc-topic output is nonsensical. It looks as if the docIds in the doc-topic output are shuffled. The topic model output (p(term|topic) for each topic) still looks fine.
The workaround is to first copy the input files from S3 to the cluster's HDFS with the command:
and then run mahout cvb with the option --input /input .
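
The exact copy command is not shown above. As a rough sketch of the copy step only, assuming the standard `hadoop fs` shell is available on the EMR master node and that the bucket and paths are the ones from this report, something like the following could be used. The `run` helper and the `DRY_RUN` variable are hypothetical names added for illustration; by default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch only: paths are the examples from this report; adjust them to your setup.
SRC='s3://mybucket/input/*'   # quoted so the glob is expanded by hadoop, not the local shell
DEST='/input'                 # target directory on the cluster's HDFS

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Create the HDFS target directory, copy the input files from S3, then list them.
run hadoop fs -mkdir -p "$DEST"
run hadoop fs -cp "$SRC" "$DEST/"
run hadoop fs -ls "$DEST"
```

Running the script with DRY_RUN=0 performs the copy. After that, mahout cvb can be pointed at the HDFS copy with --input /input as described above.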