[SPARK-6381] add Apriori algorithm to MLLib - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 1.3.0
Fix Version/s: None
Component/s: MLlib
Labels:
None

Description

mengxr
There are many algorithms for association rule mining, for example FPGrowth and Apriori. These are classic machine learning algorithms and are very useful in big data mining. The FPGrowth implementation in Spark 1.3 can handle very large data sets, but it has to build an FP-tree before mining frequent itemsets. So when the transaction data is smaller and sparse and minSupport is larger, we can choose the Apriori algorithm instead.
How can the Apriori algorithm be parallelized?
1. Generate frequent items by filtering the input data using the minimum support level.
private def genFreqItems[Item: ClassTag](data: RDD[Array[Item]], minCount: Long, partitioner: Partitioner): Array[Item]
2. Generate frequent itemsets by running Apriori level by level; the extraction is done on each partition.
2.1 Create candidateSet from kFreqItems and k.
private def createCandidateSet[Item: ClassTag](kFreqItems: Array[(Array[Item], Long)], k: Int)
2.2 Generate the next kFreqItems by scanning the data set against candidateSet.
private def scanDataSet[Item: ClassTag](dataSet: RDD[Array[Item]], candidateSet: Array[Array[Item]], minCount: Double): RDD[(Array[Item], Long)]
2.3 Filter dataSet by candidateSet.
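
For illustration only, the steps above can be sketched on plain Scala collections, without Spark. This is not the proposed MLlib code: the object name AprioriSketch, the Itemset alias, the run driver, the use of Set[String] in place of Array[Item], and the reading of step 2.3 as "drop transactions that contain no candidate" are all assumptions made for the sketch. The three helper names are borrowed from the signatures in the description, with simplified parameter lists.

```scala
// Local, single-machine sketch of the steps in the description.
// All types and the `run` driver are assumptions; no Spark is involved.
object AprioriSketch {
  type Itemset = Set[String]

  // Step 1: count single items and keep those that reach minCount.
  def genFreqItems(data: Seq[Itemset], minCount: Long): Map[Itemset, Long] =
    data.flatten
      .groupBy(identity)
      .map { case (item, occurrences) => Set(item) -> occurrences.size.toLong }
      .filter { case (_, count) => count >= minCount }

  // Step 2.1: join frequent (k-1)-itemsets into k-item candidates and prune
  // any candidate that has an infrequent (k-1)-subset.
  def createCandidateSet(prevFreq: Set[Itemset], k: Int): Set[Itemset] = {
    val joined = for {
      a <- prevFreq
      b <- prevFreq
      candidate = a union b
      if candidate.size == k
    } yield candidate
    joined.filter(c => c.subsets(k - 1).forall(prevFreq.contains))
  }

  // Step 2.2: count the transactions containing each candidate and keep
  // the candidates that reach minCount.
  def scanDataSet(data: Seq[Itemset], candidates: Set[Itemset],
                  minCount: Long): Map[Itemset, Long] =
    candidates.iterator
      .map(c => c -> data.count(t => c.subsetOf(t)).toLong)
      .filter { case (_, count) => count >= minCount }
      .toMap

  // Hypothetical driver that repeats steps 2.1 to 2.3 until no itemset survives.
  def run(data: Seq[Itemset], minCount: Long): Map[Itemset, Long] = {
    var remaining = data
    var kFreq = genFreqItems(remaining, minCount)
    var all = kFreq
    var k = 2
    while (kFreq.nonEmpty) {
      val candidates = createCandidateSet(kFreq.keySet, k)
      // Step 2.3 (assumed meaning): a transaction that contains no candidate
      // cannot support any larger itemset, so it is dropped before scanning.
      remaining = remaining.filter(t => candidates.exists(_.subsetOf(t)))
      kFreq = scanDataSet(remaining, candidates, minCount)
      all = all ++ kFreq
      k += 1
    }
    all
  }
}
```

With the four transactions {a, b, c}, {a, b}, {a, c}, {b, d} and minCount = 2, calling AprioriSketch.run yields the itemsets {a}, {b}, {c}, {a, b} and {a, c}. In a distributed version, the counting in scanDataSet is the part that would run per partition, with the candidate set shared across workers.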

Issue Links

duplicates

SPARK-4001 Add FP-growth algorithm to Spark MLlib

Resolved

is duplicated by

SPARK-6386 add association rule mining algorithm to MLLib

Resolved

People

Assignee: Unassigned

Reporter: zhangyouhua

Votes: 0

Watchers: 2

Dates

Created: 17/Mar/15 09:16

Updated: 17/Mar/15 11:17

Resolved: 17/Mar/15 09:41