Mahout / MAHOUT-397

SparseVectorsFromSequenceFiles only outputs a single vector file


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3
    • Fix Version/s: 0.4
    • Component/s: classic
    • Labels: None

    Description

      When running LDA via build-reuters.sh on a 3-node Hadoop cluster, I noticed that the utility preprocessing steps produce only a single vector file. This means LDA (and other clustering jobs) can use only a single mapper, no matter how large the cluster is. On investigation, it appears that the program argument for setting the number of reducers (-nr), and hence the number of output files, is not propagated to the final stages where the output vectors are created.
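      Why a single output file limits the downstream job to one mapper can be seen in a small in-memory model. This is a hypothetical sketch, not Mahout code: it assumes Hadoop's default HashPartitioner formula, (hash & Integer.MAX_VALUE) % numReduceTasks, and uses a Java-style string hash as a stand-in for the key's hashCode(). The function names, document IDs, and part-NNNNN file names are illustrative only.

```python
def java_string_hash(s: str) -> int:
    """Java-style String.hashCode() as a signed 32-bit int (exact for BMP characters).

    Hypothetical stand-in: Hadoop's Text keys hash their bytes differently,
    so real reducer assignments would not match this model key-for-key.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret as signed 32-bit to mirror Java int overflow.
    return h - (1 << 32) if h >= (1 << 31) else h


def partition(key: str, num_reducers: int) -> int:
    # Same arithmetic as HashPartitioner: (hash & Integer.MAX_VALUE) % numReduceTasks
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def write_vector_files(doc_ids, num_reducers):
    """Simulate one output file per reducer, each holding the IDs routed to it."""
    files = {f"part-{i:05d}": [] for i in range(num_reducers)}
    for doc_id in doc_ids:
        files[f"part-{partition(doc_id, num_reducers):05d}"].append(doc_id)
    return files


# Hypothetical document IDs, loosely modelled on the Reuters input files.
docs = [f"reut2-{i:03d}.sgm-{j}" for i in range(3) for j in range(100)]

single = write_vector_files(docs, 1)  # final stage runs with one reducer (-nr not passed)
spread = write_vector_files(docs, 3)  # final stage honors -nr 3
print(len(single), len(spread))  # 1 3
```

      With one reducer every vector lands in one file, so a job reading that directory typically gets one input split, and therefore one mapper, unless the file is larger than an HDFS block. Passing the -nr value through to the final stage spreads the vectors over that many files, letting the clustering job run that many mappers in parallel.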

      Attachments

        1. MAHOUT-397.patch (13 kB, Jeff Eastman)


          People

            Assignee: Jeff Eastman
            Reporter: Jeff Eastman
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved: