[MAHOUT-521] Add option to DictionaryVectorizer to create (tf and tfidf) vectors on-the-fly using a given dictionary - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.4
Component/s: None
Labels:
None

Description

The current dictionary vectorizer takes a set of text files, creates the dictionary, and converts the files to vectors. In a classification scenario, the vectorizer needs to take an already existing dictionary and use its ids to convert text to vectors, and optionally do the following:

1. Choose between tf and tfidf weights (tfidf needs the document frequencies as an input)
2. Add new words to the dictionary, with options to write it to disk and read it back
3. Add an option to normalize/lognormalize
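
The requested behaviour can be illustrated with a minimal sketch in plain Java collections. This is not the Mahout API or the code in the attached patches: the class name `DictionaryVectorizerSketch`, its methods, and the parameters are hypothetical. It also assumes dictionary ids are dense (0..n-1), so a new word simply gets the next id, and it uses `tf * (ln(numDocs / df) + 1)` as one common tf-idf variant, which may differ from the weighting Mahout applies. Only plain L2 normalization is shown; log-normalization is left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Mahout.
class DictionaryVectorizerSketch {

    // Look up a term id in an existing dictionary. When addNewWords is set,
    // an unseen term gets the next id (assumes ids are dense: 0..n-1).
    static int idFor(Map<String, Integer> dictionary, String term, boolean addNewWords) {
        Integer id = dictionary.get(term);
        if (id == null && addNewWords) {
            id = dictionary.size();
            dictionary.put(term, id);
        }
        return id == null ? -1 : id;
    }

    // Build a sparse vector (term id -> weight) for one tokenized document.
    // Tokens missing from the dictionary are skipped unless addNewWords is set.
    static Map<Integer, Double> vectorize(List<String> tokens,
                                          Map<String, Integer> dictionary,
                                          Map<Integer, Long> documentFrequency,
                                          long numDocs,
                                          boolean useTfIdf,
                                          boolean addNewWords) {
        Map<Integer, Double> vector = new TreeMap<>();
        for (String token : tokens) {
            int id = idFor(dictionary, token, addNewWords);
            if (id >= 0) {
                vector.merge(id, 1.0, Double::sum); // raw term frequency
            }
        }
        if (useTfIdf) {
            for (Map.Entry<Integer, Double> e : vector.entrySet()) {
                // Terms without a recorded document frequency default to df = 1.
                long df = documentFrequency.getOrDefault(e.getKey(), 1L);
                double idf = Math.log((double) numDocs / df) + 1.0; // assumed variant
                e.setValue(e.getValue() * idf);
            }
        }
        return vector;
    }

    // Scale the vector in place to unit L2 length; an all-zero vector is left as-is.
    static void normalizeL2(Map<Integer, Double> vector) {
        double sumOfSquares = 0.0;
        for (double weight : vector.values()) {
            sumOfSquares += weight * weight;
        }
        double norm = Math.sqrt(sumOfSquares);
        if (norm == 0.0) {
            return;
        }
        vector.replaceAll((id, weight) -> weight / norm);
    }

    public static void main(String[] args) {
        Map<String, Integer> dictionary = new HashMap<>();
        dictionary.put("mahout", 0);
        dictionary.put("vector", 1);

        Map<Integer, Long> documentFrequency = new HashMap<>();
        documentFrequency.put(0, 1L);
        documentFrequency.put(1, 2L);

        List<String> tokens = Arrays.asList("mahout", "vector", "vector", "unseen");
        Map<Integer, Double> tfidf =
            vectorize(tokens, dictionary, documentFrequency, 2L, true, false);
        normalizeL2(tfidf);
        System.out.println(tfidf);
    }
}
```

Keeping the dictionary and document-frequency maps as plain arguments mirrors the request: both come from an earlier vectorization run, and the dictionary is only mutated when the caller explicitly allows new words.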

Attachments

- MAHOUT-vectorizer-move.patch, 02/Oct/10 22:07, 8 kB, Robin Anil
- MAHOUT-vectorizer-move.patch, 02/Oct/10 22:20, 216 kB, Robin Anil
- MAHOUT-move-encoder.patch, 04/Oct/10 21:28, 85 kB, Robin Anil
- MAHOUT-521.patch, 05/Oct/10 17:02, 12 kB, Robin Anil

People

Assignee: Robin Anil

Reporter: Robin Anil

Votes: 0

Watchers: 0

Dates

Created: 02/Oct/10 20:18

Updated: 31/Oct/10 15:50

Resolved: 09/Oct/10 10:58