  1. Spark
  2. SPARK-13385

Enable AssociationRules to generate consequents with user-defined lengths


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ML
    • Labels: None

    Description

      AssociationRules should generate all association rules up to a user-defined consequent length, not just rules that have a single item as the consequent.

      For example:
      39 804 ==> 413 743 819 #SUP: 1023 #CONF: 0.70117
      39 743 ==> 413 804 819 #SUP: 1023 #CONF: 0.93939
      39 413 ==> 743 804 819 #SUP: 1023 #CONF: 0.6007
      819 ==> 39 413 743 804 #SUP: 1023 #CONF: 0.15418
      804 ==> 39 413 743 819 #SUP: 1023 #CONF: 0.12997
      743 ==> 39 413 804 819 #SUP: 1023 #CONF: 0.7276
      39 ==> 413 743 804 819 #SUP: 1023 #CONF: 0.12874
      ...
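
      As a small sketch of how the #CONF values above relate to the support counts: the confidence of a rule X ==> Y is the support of the whole itemset (X plus Y) divided by the support of the antecedent X alone. The object and method names below are hypothetical and are not part of Spark or spark-rules. The antecedent count of 1089 in the usage note is an assumption, chosen because it reproduces the confidence reported for the second rule; the issue does not state it.

```scala
// Hypothetical helper for illustration only; not part of Spark or spark-rules.
object RuleConfidence {
  /**
   * conf(X ==> Y) = supp(X union Y) / supp(X), with supports given as absolute counts.
   *
   * @param antecedentSupport number of transactions containing the antecedent X
   * @param ruleSupport       number of transactions containing both X and Y (the #SUP value)
   */
  def confidence(antecedentSupport: Long, ruleSupport: Long): Double = {
    require(antecedentSupport > 0, "antecedent support must be positive")
    require(ruleSupport <= antecedentSupport, "rule support cannot exceed antecedent support")
    ruleSupport.toDouble / antecedentSupport
  }
}
```

      For instance, RuleConfidence.confidence(1089L, 1023L) evaluates 1023 / 1089, which matches the 0.93939 shown for "39 743 ==> 413 804 819" under the assumed antecedent count.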

      I have implemented it based on Apriori's Rule-Generation Algorithm:
      https://github.com/zhengruifeng/spark-rules
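
      To illustrate the general idea, here is a minimal local sketch of level-wise (Apriori-style) rule generation on plain Scala collections. It is not the code from the repository above and does not use RDDs; AprioriRuleSketch, Rule, and generate are hypothetical names. It assumes that the support of every subset of a frequent itemset is available in the input map, which holds for a complete frequent-itemset result.

```scala
// Illustrative sketch only; names are hypothetical and this is not the spark-rules implementation.
object AprioriRuleSketch {
  type Itemset = Set[String]

  final case class Rule(antecedent: Itemset, consequent: Itemset, support: Long, confidence: Double)

  /**
   * @param freq          frequent itemsets mapped to their absolute support counts
   * @param minConfidence minimum confidence a rule must reach to be kept
   * @param maxConsequent maximum number of items allowed in a consequent
   */
  def generate(freq: Map[Itemset, Long], minConfidence: Double, maxConsequent: Int): Seq[Rule] =
    freq.toSeq.flatMap { case (itemset, support) =>
      val rules = Seq.newBuilder[Rule]
      // Level 1: every single item of the itemset is a candidate consequent.
      var candidates: Set[Itemset] = itemset.map(item => Set(item))
      var size = 1
      // Stop before the consequent swallows the whole itemset (the antecedent must be non-empty).
      while (candidates.nonEmpty && size <= maxConsequent && size < itemset.size) {
        // Score each candidate: confidence = supp(itemset) / supp(antecedent).
        val scored = candidates.toSeq.flatMap { consequent =>
          val antecedent = itemset -- consequent
          freq.get(antecedent).map { antecedentSupport =>
            Rule(antecedent, consequent, support, support.toDouble / antecedentSupport)
          }
        }.filter(_.confidence >= minConfidence)
        rules ++= scored
        val kept: Set[Itemset] = scored.map(_.consequent).toSet
        // Join surviving consequents into candidates one item larger, and prune any
        // candidate that has a sub-consequent which failed the confidence threshold:
        // enlarging a consequent shrinks the antecedent and can only lower confidence.
        candidates = (for {
          a <- kept
          b <- kept
          merged = a ++ b
          if merged.size == size + 1
        } yield merged).filter(c => c.subsets(size).forall(kept.contains))
        size += 1
      }
      rules.result()
    }
}
```

      With maxConsequent set to 1 this sketch only produces single-item consequents; larger values let it continue level by level until no candidates survive or the limit is reached.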

      It's compatible with fpm's APIs.

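      import org.apache.spark.mllib.fpm._

      // Each line of the input file is one transaction of space-separated items.
      val data = sc.textFile("hdfs://ns1/whale/T40I10D100K.dat")
      val transactions = data.map(s => s.trim.split(' ')).persist()

      // Mine frequent itemsets with FP-growth (minimum support 1%).
      val fpg = new FPGrowth().setMinSupport(0.01)
      val model = fpg.run(transactions)

      // AprioriRules comes from the spark-rules repository linked above, not from Spark itself.
      // Generate rules with confidence >= 0.1 and at most 15 items in the consequent.
      val ar = new AprioriRules().setMinConfidence(0.1).setMaxConsequent(15)
      val results = ar.run(model.freqItemsets)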

      It logs rule-generation information like this:
      15/11/04 11:28:46 INFO AprioriRules: Candidates for 1-consequent rules : 312917
      15/11/04 11:28:58 INFO AprioriRules: Generated 1-consequent rules : 306703
      15/11/04 11:29:10 INFO AprioriRules: Candidates for 2-consequent rules : 707747
      15/11/04 11:29:35 INFO AprioriRules: Generated 2-consequent rules : 704000
      15/11/04 11:29:55 INFO AprioriRules: Candidates for 3-consequent rules : 1020253
      15/11/04 11:30:38 INFO AprioriRules: Generated 3-consequent rules : 1014002
      15/11/04 11:31:14 INFO AprioriRules: Candidates for 4-consequent rules : 972225
      15/11/04 11:32:00 INFO AprioriRules: Generated 4-consequent rules : 956483
      15/11/04 11:32:44 INFO AprioriRules: Candidates for 5-consequent rules : 653749
      15/11/04 11:33:32 INFO AprioriRules: Generated 5-consequent rules : 626993
      15/11/04 11:34:07 INFO AprioriRules: Candidates for 6-consequent rules : 331038
      15/11/04 11:34:50 INFO AprioriRules: Generated 6-consequent rules : 314455
      15/11/04 11:35:10 INFO AprioriRules: Candidates for 7-consequent rules : 138490
      15/11/04 11:35:43 INFO AprioriRules: Generated 7-consequent rules : 136260
      15/11/04 11:35:57 INFO AprioriRules: Candidates for 8-consequent rules : 48567
      15/11/04 11:36:14 INFO AprioriRules: Generated 8-consequent rules : 47331
      15/11/04 11:36:24 INFO AprioriRules: Candidates for 9-consequent rules : 12430
      15/11/04 11:36:33 INFO AprioriRules: Generated 9-consequent rules : 11925
      15/11/04 11:36:37 INFO AprioriRules: Candidates for 10-consequent rules : 2211
      15/11/04 11:36:47 INFO AprioriRules: Generated 10-consequent rules : 2064
      15/11/04 11:36:55 INFO AprioriRules: Candidates for 11-consequent rules : 246
      15/11/04 11:36:58 INFO AprioriRules: Generated 11-consequent rules : 219
      15/11/04 11:37:00 INFO AprioriRules: Candidates for 12-consequent rules : 13
      15/11/04 11:37:03 INFO AprioriRules: Generated 12-consequent rules : 11
      15/11/04 11:37:03 INFO AprioriRules: Candidates for 13-consequent rules : 0

      Attachments

        1. rule-generation.pdf
          183 kB
          Ruifeng Zheng

            People

              Assignee: podongfeng Ruifeng Zheng
              Reporter: podongfeng Ruifeng Zheng