When running LDA via build-reuters.sh on a 3-node Hadoop cluster, I've noticed that there is only a single vector file produced by the utility preprocessing steps. This means LDA (and other clustering too) can only use a single mapper no matter how large the cluster is. Investigating, it seems that the program argument (-nr) for setting the number of reducers - and hence the number of output files - is not propagated to the final stages where the output vectors are created.
|Status||Open [ 1 ]||Patch Available [ 10002 ]|
|Status||Patch Available [ 10002 ]||Resolved [ 5 ]|
|Resolution||Fixed [ 1 ]|
|Status||Resolved [ 5 ]||Closed [ 6 ]|