Details
-
New Feature
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
Current dictionary vectorizer takes a set of text-files, creates the dictionary and convert them to text vectors. In a classification scenario, the vectorizer needs to take a Already existing dictionary and use the ids to convert text to vectors and optionally do the following
1. Choose between tf|tfidf weights (need to take the document frequency as an input for this)
2. Add new words to the dictionary and provide options to write it to the disk and read it back
3. Add option to normalize/lognormalize