Details
New Feature
Status: Closed
Minor
Resolution: Abandoned
1.6.0
None
Description
Text classification naturally produces a large number of features. Many of them are independent of the category and provide no real information gain for classification.
A Chi-Squared feature selection method would allow features whose dependency on the category does not pass a threshold to be removed from the feature list, keeping the list at a reasonable size without significantly affecting classification accuracy.
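To illustrate the idea, here is a minimal, self-contained sketch in plain Python. It is not the proposed implementation and does not use this project's API. The function names `chi_squared_scores` and `select_features`, the use of term presence/absence per document as the feature, and the default threshold of 3.841 (the 5% critical value for one degree of freedom, which applies to the two-category case) are all assumptions made for the example.

```python
from collections import Counter


def chi_squared_scores(docs, labels):
    """Return {term: chi-squared statistic} for term presence vs. category.

    docs   -- list of token lists, one per document (hypothetical input format)
    labels -- list of category labels, parallel to docs
    """
    n = len(docs)
    class_counts = Counter(labels)

    # term_class[term][label] = number of documents in `label` containing `term`
    term_class = {}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            term_class.setdefault(term, Counter())[label] += 1

    scores = {}
    for term, present in term_class.items():
        n_present = sum(present.values())
        n_absent = n - n_present
        score = 0.0
        for label, n_label in class_counts.items():
            # Two cells per category: documents with the term, and without it.
            cells = (
                (present[label], n_present),
                (n_label - present[label], n_absent),
            )
            for observed, row_total in cells:
                expected = row_total * n_label / n
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        scores[term] = score
    return scores


def select_features(docs, labels, threshold=3.841):
    """Keep only the terms whose chi-squared statistic reaches the threshold."""
    scores = chi_squared_scores(docs, labels)
    return {term for term, score in scores.items() if score >= threshold}


if __name__ == "__main__":
    docs = [
        ["goal", "the"], ["goal", "match", "the"], ["goal", "the"], ["goal", "the"],
        ["vote", "the"], ["vote", "the"], ["vote", "poll", "the"], ["vote", "the"],
    ]
    labels = ["sport"] * 4 + ["politics"] * 4
    print(sorted(select_features(docs, labels)))
```

In this toy data set, "the" occurs in every document and so is independent of the category, while "goal" and "vote" each occur in only one category. A real implementation would also need to decide how to handle low-count terms, for which the chi-squared approximation is unreliable.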