[MAHOUT-397] SparseVectorsFromSequenceFiles only outputs a single vector file

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.3
Fix Version/s: 0.4
Component/s: classic
Labels: None

Description

When running LDA via build-reuters.sh on a 3-node Hadoop cluster, I noticed that the utility preprocessing steps produce only a single vector file. This means LDA (and the other clustering jobs too) can only use a single mapper, no matter how large the cluster is. On investigation, it seems that the program argument for setting the number of reducers (-nr), and hence the number of output files, is not propagated to the final stages where the output vectors are created.
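The link between reducer count and output file count can be illustrated with a small standalone sketch. It does not use the Hadoop or Mahout APIs; it only reproduces the hash-partitioning formula used by Hadoop's default HashPartitioner and treats each bucket as a stand-in for one reducer output file. The class name, method names, and document ids are hypothetical and chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not Mahout code and not the attached patch.
public class ReducerPartitionSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // (hashCode & Integer.MAX_VALUE) % numReduceTasks.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Groups keys into one bucket per reducer. Each bucket stands in for
    // one reducer output file that a later job could read as its own input.
    static List<List<String>> partition(List<String> keys, int numReducers) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            parts.add(new ArrayList<>());
        }
        for (String key : keys) {
            parts.get(partitionFor(key, numReducers)).add(key);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Made-up document ids standing in for the vector keys.
        List<String> docIds = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docIds.add("doc-" + i);
        }
        // If the reducer count stays at 1, everything lands in one bucket.
        System.out.println("reducers=1 buckets=" + partition(docIds, 1).size());
        // If the requested count reaches the final stage, there is one bucket per reducer.
        System.out.println("reducers=3 buckets=" + partition(docIds, 3).size());
    }
}
```

With one reducer, all 1000 keys end up in a single bucket; with three reducers, the same keys are spread over three buckets. This is the behavior that a -nr value would be expected to produce once it reaches the stage that writes the vectors, assuming that stage writes one file per reducer.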

Attachments

MAHOUT-397.patch, 20/May/10 01:41, 13 kB, Jeff Eastman

People

Assignee: Jeff Eastman

Reporter: Jeff Eastman

Votes: 0

Watchers: 0

Dates

Created: 20/May/10 01:29

Updated: 31/Jan/24 22:17

Resolved: 22/Sep/10 07:44