Spark / SPARK-15938

Adding "support" property to MLlib Association Rule

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: MLlib
    • Labels:
      None

      Description

      Support indicates how frequently an itemset appears in the database. Alongside confidence, support is another critical property of an association rule.
      References:
      https://en.wikipedia.org/wiki/Association_rule_learning
      http://www.philippe-fournier-viger.com/spmf/index.php?link=documentation.php#allassociationrules
      https://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf

      Support can be expressed either as the count of appearances or as a fraction of the dataset. I chose to use the count because:
      1. API compatibility: currently neither FPGrowthModel nor AssociationRule has any information about the size of the dataset, and I'd like to avoid breaking a list of public APIs.
      2. It follows the API of SPMF: http://www.philippe-fournier-viger.com/spmf/index.php?link=documentation.php#allassociationrules
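
      To make the count-versus-fraction distinction concrete, here is a minimal sketch in plain Python (not the MLlib API). The function names and the toy transactions are made up for illustration. It shows that the count can be computed from the matching transactions alone, while the fraction also needs the dataset size.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support_count(itemset: Iterable[str], transactions: List[Transaction]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support_fraction(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Support as a fraction; unlike the count, this needs the dataset size."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)


# Toy data, invented for this example.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# Rule {bread} => {milk}
rule_support = support_count({"bread", "milk"}, transactions)      # 2 transactions
rule_fraction = support_fraction({"bread", "milk"}, transactions)  # 2 / 4
rule_confidence = confidence({"bread"}, {"milk"}, transactions)    # 2 / 3
```

      Here the rule is backed by 2 of the 4 transactions, so the count is 2 and the fraction is 0.5; the confidence is 2/3 because 3 transactions contain the antecedent.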

      As next steps, we could add a constraint such as minSupport, as in other libraries. FPGrowthModel should also contain the size of the dataset.
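
      As a rough sketch of what such a constraint might look like once the dataset size is available, the plain-Python snippet below filters rules by a fractional minimum support. The `Rule` class, its fields, and `filter_by_min_support` are hypothetical names for illustration only and are not part of MLlib.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record carrying support as a count."""
    antecedent: Tuple[str, ...]
    consequent: Tuple[str, ...]
    confidence: float
    support: int  # number of transactions containing antecedent and consequent


def filter_by_min_support(rules: List[Rule], min_support: float,
                          num_transactions: int) -> List[Rule]:
    """Keep rules whose support count reaches min_support * num_transactions.

    min_support is a fraction in [0, 1]; converting it to a count is only
    possible because the dataset size is passed in explicitly.
    """
    min_count = min_support * num_transactions
    return [r for r in rules if r.support >= min_count]


# Invented example rules over a dataset assumed to hold 4 transactions.
rules = [
    Rule(("bread",), ("milk",), confidence=2 / 3, support=2),
    Rule(("butter",), ("milk",), confidence=0.5, support=1),
]

kept = filter_by_min_support(rules, min_support=0.5, num_transactions=4)
```

      With a minimum support of 0.5 over 4 transactions, only the first rule (count 2) is kept.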

              People

              • Assignee:
                Unassigned
              • Reporter:
                yuhaoyan yuhao yang