Mahout / MAHOUT-108

Implementation of Association Rules learning by Apriori algorithm

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 0.2
    • Component/s: None
    • Labels:
      None
    • Environment:

      Linux, Hadoop-0.17.1

      Description

      Target: Association rule learning is a popular method for discovering interesting relations between variables in large databases. Here, we would implement the Apriori algorithm in parallel using Hadoop MapReduce.

      Applications: Typically, association rule learning is used to discover regularities between products in large-scale supermarket transaction data. For example, the rule "{onions, potatoes} -> beef" found in the sales data would indicate that a customer who buys onions and potatoes together is also likely to buy beef. Such information can be used as the basis for decisions about marketing activities. In addition to market basket analysis, association rules are employed today in many application areas, including Web usage mining, intrusion detection, and bioinformatics.
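
To make the onions/potatoes/beef example concrete, here is a minimal sketch of the two measures usually used to score such a rule: support (the fraction of transactions containing an itemset) and confidence (how often the consequent appears when the antecedent does). The function names and the toy transactions are illustrative assumptions, not data or code from this issue.

```python
from typing import FrozenSet, Iterable, List

# Made-up toy sales data (an assumption for illustration only).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"onions", "potatoes", "beef"}),
    frozenset({"onions", "potatoes", "beef", "milk"}),
    frozenset({"onions", "potatoes"}),
    frozenset({"milk", "bread"}),
    frozenset({"beef", "bread"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    lhs = frozenset(antecedent)
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(lhs | frozenset(consequent), transactions) / lhs_support


if __name__ == "__main__":
    print(support({"onions", "potatoes"}, TRANSACTIONS))
    print(confidence({"onions", "potatoes"}, {"beef"}, TRANSACTIONS))
```

In this toy data, three of the five baskets contain both onions and potatoes, and two of those three also contain beef, so the rule holds for two thirds of the baskets where it applies.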

      Apriori algorithm: Apriori is the best-known algorithm for mining association rules. It uses a breadth-first search strategy to count the support of itemsets, and a candidate generation function that exploits the downward closure property of support (every subset of a frequent itemset is itself frequent).
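
The level-wise search described above can be sketched as follows. This is a small, sequential, in-memory illustration of the general Apriori technique, not the MapReduce implementation discussed in this issue (that code was never attached). The names `apriori`, `generate_candidates`, and the toy `BASKETS` data are assumptions made for the example, and `min_support` is taken here as an absolute transaction count.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def generate_candidates(prev_frequent: Set[Itemset], k: int) -> Set[Itemset]:
    """Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
    that has an infrequent (k-1)-subset (downward closure)."""
    candidates: Set[Itemset] = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            if len(union) != k:
                continue
            if all(frozenset(s) in prev_frequent
                   for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


def apriori(transactions: Iterable[Iterable[str]],
            min_support: int) -> Dict[Itemset, int]:
    """Return {itemset: count} for every itemset found in at least
    `min_support` transactions."""
    baskets: List[Itemset] = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts: Dict[Itemset, int] = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    # Level k: breadth-first, one pass over the data per itemset size.
    k = 2
    while current:
        candidates = generate_candidates(set(current), k)
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for candidate in candidates:
                if candidate <= basket:
                    counts[candidate] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Made-up toy data (an assumption for illustration only).
BASKETS = [
    {"onions", "potatoes", "beef"},
    {"onions", "potatoes", "beef", "milk"},
    {"onions", "potatoes"},
    {"milk", "bread"},
    {"beef", "bread"},
]

if __name__ == "__main__":
    for itemset, count in sorted(apriori(BASKETS, 2).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

The pruning step is where downward closure pays off: a candidate such as {milk, bread, beef} is never counted, because its subset {milk, bread} was already found to be infrequent at the previous level. In a parallel setting, the per-level counting pass is the part that would naturally be distributed, though how this issue's implementation did so is not documented here.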

        Activity

        chao deng added a comment -

        We are adapting our existing code to Mahout.

        chao deng added a comment -

        Hi, everyone. The source code for the Apriori algorithm is finished, but how
        can I submit it via svn? Can anyone help me?

        2009/3/5 chao deng (JIRA) <jira@apache.org>


        -------------
        Name: chao deng
        Career: Ph.D candidate on Computer Science
        School: Harbin Institute of Technology (HIT)
        Department: Computer Science & Technology School
        Office: Machine Learning Group in Nature Computation Lab
        Mobile+86)13836134116
        Phone+86)045186402407
        Post zip: 150001
        Post Address: 319# Harbin Institute of Technology(HIT), P.R. China

        chao deng added a comment -

        Hi, everyone. We have finished the source code for the Apriori algorithm. But
        how can I submit the source to Mahout via svn? Thanks!

        2009/3/5 chao deng (JIRA) <jira@apache.org>


        Ted Dunning added a comment -

        Probably the best thing to do is to attach a patch to this JIRA so that it can be reviewed.

        One question I have right away is whether this implementation is sequential or parallelized. Also, is this new code, or is it based on other code?

        chao deng added a comment -

        Thanks, Dunning. This implementation is parallelized with MapReduce, and it
        is new code.
        Should I attach it as a zip package?

        thanks

        2009/6/8 Ted Dunning (JIRA) <jira@apache.org>


        Ted Dunning added a comment -

        A patch file would be much better.

        SVN diff should be able to produce one.

        Grant Ingersoll added a comment -

        http://cwiki.apache.org/MAHOUT/howtocontribute.html

        Isabel Drost-Fromm added a comment -

        Hello Chao Deng,

        What is the status of your Apriori patch?

        Isabel

        Mishkin Faustini added a comment -

        I'm also interested in knowing the status of the patch! =]

        Isabel Drost-Fromm added a comment -

        I contacted (or at least tried to contact) Chao Deng to ask about the status and whether I could help him submit the patch. Should we close this issue as Won't Fix, or defer it to a later version if he does not respond? Or is anyone else up for implementing a patch for this task in time for 0.2?

        Isabel Drost-Fromm added a comment -

        Superseded by FPGrowth patch (MAHOUT-157).

        Pooja Sharma added a comment -

        Hello Chao Deng,

        Can you please share or upload the code so that we can use it as a reference to see how it does itemset generation in a parallel fashion?

        Thanks
        Pooja

        Ted Dunning added a comment -

        Pooja,

        This JIRA issue is long since closed as Won't Fix. The reason is that this
        effort has been superseded by the other frequent itemset software that Robin
        implemented.

        smita upadhyay added a comment -

        How can we get the source, so that we can also go through it and get a feel for how it has been implemented?


          People

          • Assignee: Unassigned
          • Reporter: chao deng
          • Votes: 0
          • Watchers: 3


              Time Tracking

              • Original Estimate: 504h
              • Remaining Estimate: 504h
              • Time Spent: Not Specified
