Type: New Feature
Affects Version/s: 0.3
Fix Version/s: 0.3
Identifies interesting collocations in text using n-grams scored via the LogLikelihoodRatio calculation.
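For reviewers unfamiliar with the scoring step, here is a minimal sketch of a log-likelihood ratio score for a bigram "A B", using the entropy formulation of Dunning's G² statistic. It is not the code in the attached tar. The class name LlrSketch, the helper scoreBigram, and the way the 2x2 contingency table is built from raw counts are assumptions made for illustration only.

```java
/** Hypothetical sketch of LLR scoring for bigrams; not the code attached to this issue. */
final class LlrSketch {

  private LlrSketch() {}

  /** x * ln(x), with the usual convention that 0 * ln(0) = 0. */
  static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  /** Unnormalized entropy of a set of counts: N ln N - sum(k ln k), where N = sum(k). */
  static double entropy(long... counts) {
    long sum = 0;
    double sumXLogX = 0.0;
    for (long c : counts) {
      sumXLogX += xLogX(c);
      sum += c;
    }
    return xLogX(sum) - sumXLogX;
  }

  /**
   * LLR for a 2x2 contingency table:
   * k11 = A followed by B, k12 = A followed by something else,
   * k21 = something else followed by B, k22 = neither.
   */
  static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    // Clamp at zero to absorb tiny negative values caused by rounding.
    return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
  }

  /**
   * Builds the table from raw counts (assumed inputs):
   * countAB = occurrences of the bigram "A B",
   * countA = bigrams starting with A, countB = bigrams ending with B,
   * totalBigrams = all bigrams in the corpus.
   */
  static double scoreBigram(long countAB, long countA, long countB, long totalBigrams) {
    long k11 = countAB;
    long k12 = countA - countAB;
    long k21 = countB - countAB;
    long k22 = totalBigrams - countA - countB + countAB;
    return logLikelihoodRatio(k11, k12, k21, k22);
  }

  public static void main(String[] args) {
    // A and B always occur together in a corpus of 20 bigrams.
    System.out.println(scoreBigram(10, 10, 10, 20));
  }
}
```

A score near zero means A and B co-occur about as often as chance predicts; larger scores indicate a stronger association, so candidate collocations can be ranked by this value.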
As discussed in:
The current form is a tar of a Maven project that depends on Mahout. Build as usual with 'mvn clean install'; it can then be executed using:
Output will be placed in target/output and can be viewed in readable form using:
Includes rudimentary unit tests. Please review and comment. This needs more work to reach patch state and to integrate with Robin's document vectorizer work in
Some basic TODOs/FIXMEs include:
- use Mahout math's ObjectInt map implementation when available
- make the analyzer configurable
- better input validation and negative unit tests
- more flexible ways to generate the units of analysis, (n-1)grams
|Assignee||Jake Mannix [ jake.mannix ]|
|Fix Version/s||0.3 [ 12314281 ]|
|Assignee||Jake Mannix [ jake.mannix ]||Drew Farris [ drew.farris ]|
|Status||Open [ 1 ]||Resolved [ 5 ]|
|Resolution||Fixed [ 1 ]|
|Status||Resolved [ 5 ]||Closed [ 6 ]|