  1. Mahout
  2. MAHOUT-521

Add option to DictionaryVectorizer to create (tf and tfidf) vectors on-the-fly using a given dictionary

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4
    • Component/s: None
    • Labels: None

    Description

      The current dictionary vectorizer takes a set of text files, creates the dictionary, and converts the files to text vectors. In a classification scenario, the vectorizer needs to take an already existing dictionary and use its ids to convert text to vectors, and optionally do the following:

      1. Choose between tf and tfidf weights (tfidf needs the document frequencies as an additional input)
      2. Add new words to the dictionary and provide options to write it to the disk and read it back
      3. Add option to normalize/lognormalize
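
      The three options above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code in the attached patches and not Mahout's DictionaryVectorizer API: the class OnTheFlyVectorizerSketch, its methods, and the Weight enum are hypothetical names. It also assumes dense term ids (0..size-1), a plain tf * ln(numDocs / df) weighting, a document frequency of 1 for terms that have none, and L2 normalization. Log normalization and writing the dictionary to disk are left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names are illustrative and are not part of Mahout.
final class OnTheFlyVectorizerSketch {

    enum Weight { TF, TFIDF }

    private final Map<String, Integer> dictionary;          // term -> term id
    private final Map<Integer, Integer> documentFrequency;  // term id -> documents containing the term
    private final int numDocs;                               // corpus size behind the document frequencies
    private final boolean addNewWords;                       // option 2: grow the dictionary on the fly

    OnTheFlyVectorizerSketch(Map<String, Integer> dictionary,
                             Map<Integer, Integer> documentFrequency,
                             int numDocs,
                             boolean addNewWords) {
        this.dictionary = new HashMap<>(dictionary);
        this.documentFrequency = new HashMap<>(documentFrequency);
        this.numDocs = numDocs;
        this.addNewWords = addNewWords;
    }

    // Converts one tokenized document into a sparse vector (term id -> weight).
    Map<Integer, Double> vectorize(List<String> tokens, Weight weight, boolean normalize) {
        Map<Integer, Double> vector = new TreeMap<>();
        for (String token : tokens) {
            Integer id = dictionary.get(token);
            if (id == null) {
                if (!addNewWords) {
                    continue; // unknown word and the dictionary is frozen: skip it
                }
                id = dictionary.size(); // assumption: ids are dense, so the next id is the size
                dictionary.put(token, id);
            }
            vector.merge(id, 1.0, Double::sum); // raw term frequency
        }
        if (weight == Weight.TFIDF) {
            for (Map.Entry<Integer, Double> e : vector.entrySet()) {
                // assumption: terms without a recorded document frequency get df = 1
                int df = documentFrequency.getOrDefault(e.getKey(), 1);
                e.setValue(e.getValue() * Math.log((double) numDocs / df));
            }
        }
        if (normalize) {
            double sumOfSquares = 0.0;
            for (double v : vector.values()) {
                sumOfSquares += v * v;
            }
            double norm = Math.sqrt(sumOfSquares);
            if (norm > 0.0) {
                for (Map.Entry<Integer, Double> e : vector.entrySet()) {
                    e.setValue(e.getValue() / norm); // L2 normalization
                }
            }
        }
        return vector;
    }

    int dictionarySize() {
        return dictionary.size();
    }
}
```

      With addNewWords set to false, words missing from the dictionary are dropped, which matches the classification case where the dictionary ids must stay fixed. With it set to true, each unseen word receives the next free id, and the grown dictionary would then have to be written back to disk for later runs.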

      Attachments

        1. MAHOUT-vectorizer-move.patch (8 kB, Robin Anil)
        2. MAHOUT-vectorizer-move.patch (216 kB, Robin Anil)
        3. MAHOUT-move-encoder.patch (85 kB, Robin Anil)
        4. MAHOUT-521.patch (12 kB, Robin Anil)

          People

            robinanil Robin Anil