Mahout / MAHOUT-688

High Document Frequency pruning for seq2sparse


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6
    • Component/s: None

    Description

      This improvement makes it possible to prune words with high document frequencies from the tf and tf-idf vectors produced by seq2sparse. The pruning threshold is based on the standard deviation of the words' document frequencies: the user specifies, as a multiple of this standard deviation, which words are to be pruned. A good choice is 3 times the standard deviation.
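      The description leaves the exact threshold formula implicit. The Java sketch below shows one in-memory reading of it, assuming that a word is pruned when its document frequency (DF) exceeds k times the population standard deviation of all words' document frequencies, with k = 3.0 as suggested above. The class and method names (HighDfPruningSketch, documentFrequencies, stdDev, wordsToPrune, prune) are hypothetical and are not Mahout's API; seq2sparse itself runs as Hadoop jobs over sequence files and vectors, and its threshold computation may differ in detail.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, in-memory sketch of high-document-frequency pruning.
// Assumption: a word is pruned when DF(word) > sigmaMultiplier * stdDev(all DFs).
final class HighDfPruningSketch {

    /** Counts how many documents each word occurs in (each word counted once per document). */
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String word : new HashSet<>(doc)) {
                df.merge(word, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Population standard deviation of the document frequencies, taken around their mean. */
    static double stdDev(Map<String, Integer> df) {
        if (df.isEmpty()) {
            return 0.0;
        }
        double mean = 0.0;
        for (int v : df.values()) {
            mean += v;
        }
        mean /= df.size();
        double sumSq = 0.0;
        for (int v : df.values()) {
            sumSq += (v - mean) * (v - mean);
        }
        return Math.sqrt(sumSq / df.size());
    }

    /** Returns the words whose document frequency exceeds sigmaMultiplier standard deviations. */
    static Set<String> wordsToPrune(Map<String, Integer> df, double sigmaMultiplier) {
        double threshold = sigmaMultiplier * stdDev(df);
        Set<String> pruned = new HashSet<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            if (e.getValue() > threshold) {
                pruned.add(e.getKey());
            }
        }
        return pruned;
    }

    /** Returns a copy of a tf (or tf-idf) vector with the pruned words removed. */
    static Map<String, Double> prune(Map<String, Double> vector, Set<String> pruned) {
        Map<String, Double> result = new HashMap<>(vector);
        result.keySet().removeAll(pruned);
        return result;
    }

    public static void main(String[] args) {
        // Ten documents: "the" occurs in every one, each "wN" in exactly one.
        List<List<String>> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add(List.of("the", "w" + i));
        }
        Map<String, Integer> df = documentFrequencies(docs);
        Set<String> pruned = wordsToPrune(df, 3.0);
        System.out.println(pruned); // prints [the]
    }
}
```

      In the example in main, the document frequencies are 10 for "the" and 1 for each of the ten rare words, so the standard deviation is about 2.59 and the 3-sigma threshold is about 7.76: "the" is pruned while the rare words are kept. Because the pruned set is computed once from the document frequencies, the same set can be removed from both the tf and the tf-idf vectors.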

      Attachments

        1. MAHOUT-688.patch (48 kB, Grant Ingersoll)
        2. MAHOUT-688.patch (56 kB, Vasil Vasilev)
        3. MAHOUT-688.patch (54 kB, Grant Ingersoll)
        4. MAHOUT-688.patch (39 kB, Vasil Vasilev)


          People

            Grant Ingersoll (gsingers)
            Vasil Vasilev (vavasilev)
            Votes: 0
            Watchers: 1
