Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 2.3.0
- Fix Version/s: None
Description
It would be handy to have a Param in FPGrowth for filtering out very common items. This is from a use case where the dataset had items appearing in 99.9%+ of the rows. These common items were useless, but they caused the algorithm to generate many unnecessary itemsets. Filtering useless common items beforehand can make the algorithm much faster.
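As an illustration of the pre-filtering described above, here is a minimal sketch in plain Python rather than Spark. The function name `drop_common_items` and the `max_support` threshold are hypothetical and are not existing FPGrowth Params. The sketch assumes each transaction is a list of items and that `max_support` is a fraction of rows.

```python
from collections import Counter


def drop_common_items(transactions, max_support):
    """Remove items that occur in more than `max_support` (a fraction) of the rows.

    Hypothetical helper for illustration only; not part of Spark's FPGrowth API.
    """
    n = len(transactions)
    if n == 0:
        return []
    # Count each item at most once per transaction (row frequency).
    row_freq = Counter(item for t in transactions for item in set(t))
    too_common = {item for item, count in row_freq.items() if count / n > max_support}
    # Keep the original item order within each transaction.
    return [[item for item in t if item not in too_common] for t in transactions]


transactions = [
    ["bag", "milk", "bread"],
    ["bag", "milk"],
    ["bag", "bread", "eggs"],
    ["bag", "eggs"],
]
# "bag" appears in every row, so it is dropped with a 0.9 threshold.
filtered = drop_common_items(transactions, max_support=0.9)
```

The filtered transactions could then be passed to a frequent-itemset miner. Because the near-universal items are gone, they no longer combine with every other frequent itemset during mining.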
Issue Links
- relates to SPARK-7211 Improvements for FPGrowth (Resolved)