  1. Mahout
  2. MAHOUT-398

Seq2sparse outputs final vectors to different directories depending upon the TF/TFIDF weight switch. This is confusing to users.

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3
    • Fix Version/s: 0.4
    • Component/s: classic
    • Labels: None

    Description

      In TF mode, seq2sparse puts the output vectors into <output>/vectors. In TFIDF mode, however, it puts the output vectors into <output>/tfidf/vectors. This happens because the IDF calculation, if it is selected, runs after the TF calculation and uses the TF vectors as its input.

      It seems both modes ought to write to a consistent directory structure, so that changing the switch does not change the final output location. Perhaps this is as simple as changing TF to output to <output>/tf/vectors, so that the contents of both directories, when present, are obvious from their names.
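
      The difference between the reported layout and the suggested one can be sketched as a small model. This is only an illustration of the directory names mentioned above: the class and method names (Seq2SparseLayoutSketch, currentDirs, proposedDirs, finalDir) are hypothetical and are not part of Mahout's API, and the sketch does not run any vectorization.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model of the output layout only; NOT Mahout code. */
class Seq2SparseLayoutSketch {

    /** Directories written under the reported behaviour, in pass order. */
    public static List<String> currentDirs(String output, boolean tfidf) {
        List<String> dirs = new ArrayList<>();
        dirs.add(output + "/vectors");           // the TF pass always runs first
        if (tfidf) {
            dirs.add(output + "/tfidf/vectors"); // the IDF pass consumes the TF vectors
        }
        return dirs;
    }

    /** Directories written under the layout suggested in this issue. */
    public static List<String> proposedDirs(String output, boolean tfidf) {
        List<String> dirs = new ArrayList<>();
        dirs.add(output + "/tf/vectors");        // TF output named after its weighting
        if (tfidf) {
            dirs.add(output + "/tfidf/vectors");
        }
        return dirs;
    }

    /** The final vectors are whatever the last pass wrote. */
    public static String finalDir(List<String> dirs) {
        return dirs.get(dirs.size() - 1);
    }

    public static void main(String[] args) {
        for (boolean tfidf : new boolean[] {false, true}) {
            String mode = tfidf ? "TFIDF" : "TF";
            System.out.println(mode + " current:  " + finalDir(currentDirs("out", tfidf)));
            System.out.println(mode + " proposed: " + finalDir(proposedDirs("out", tfidf)));
        }
    }
}
```

      In this model the reported layout leaves TF vectors in an unlabeled <output>/vectors directory, while the suggested layout names every vectors directory after the weighting it holds.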

      Attachments

        1. MAHOUT-398.patch
          2 kB
          Drew Farris

          People

            Assignee: Drew Farris (drew.farris)
            Reporter: Jeff Eastman (jeastman)