Spark / SPARK-23740

Add FPGrowth Param for filtering out very common items

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: ML

    Description

      It would be handy to have a Param in FPGrowth for filtering out very common items. This comes from a use case where the dataset had items appearing in more than 99.9% of the rows. These common items carried no useful information, but they caused the algorithm to generate many unnecessary itemsets. Filtering them out beforehand can make the algorithm much faster.
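
As a rough illustration of the preprocessing this request describes, the sketch below drops any item that appears in more than a given fraction of rows before the data would be handed to FPGrowth. It is plain Python rather than Spark code, and the function name `drop_common_items` and the `max_support` threshold are hypothetical: no such Param exists in FPGrowth, which is the point of this issue.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence


def drop_common_items(
    transactions: Sequence[Iterable[Hashable]],
    max_support: float,
) -> List[List[Hashable]]:
    """Remove items that occur in more than `max_support` of the rows.

    `max_support` is a hypothetical threshold (a fraction between 0.0 and 1.0);
    it is an assumption for illustration, not an existing FPGrowth Param.
    """
    n_rows = len(transactions)
    if n_rows == 0:
        return []
    # Count each item at most once per row, so the count is a row frequency.
    row_counts = Counter(item for row in transactions for item in set(row))
    too_common = {
        item for item, count in row_counts.items() if count / n_rows > max_support
    }
    # Preserve the original item order; a row may end up empty.
    return [[item for item in row if item not in too_common] for row in transactions]


# Toy data: "header" appears in every row and carries no information.
rows = [
    ["header", "a", "b"],
    ["header", "a"],
    ["header", "b", "c"],
    ["header", "c"],
]
filtered = drop_common_items(rows, max_support=0.9)
print(filtered)
```

In a Spark job the same idea would presumably be applied to the items column before fitting the model, so that the very common items never enter the itemset search.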

            People

              Assignee: Unassigned
              Reporter: Joseph K. Bradley (josephkb)
              Votes: 0
              Watchers: 4
