Description
mengxr zhangyouhua
Sequential pattern mining is an important branch of pattern mining. In our past work, we used sequence mining (mainly the PrefixSpan algorithm) to find telecommunication signaling sequence patterns, and achieved good results. However, once the data grows too large, the running time becomes too long and can no longer meet the service requirements. We plan to implement the PrefixSpan algorithm in Spark and apply it to our subsequent work.
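For readers unfamiliar with PrefixSpan, the pattern-growth idea can be sketched on a single machine as below. This is only an illustrative sketch, not the Spark implementation proposed in this issue: it assumes each sequence is a plain list of single, sortable items (the original algorithm also handles itemsets), and the names `prefixspan`, `min_support`, and `mine` are made up for the example.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return (pattern, support) pairs for all frequent sequential patterns.

    Simplification (assumption): every sequence is a list of single items.
    """
    results = []

    def mine(prefix, projected):
        # `projected` holds (sequence index, start offset) pairs, i.e. the
        # suffix of each sequence after the first match of the current prefix.
        support = defaultdict(set)
        for seq_id, start in projected:
            for item in sequences[seq_id][start:]:
                support[item].add(seq_id)

        for item in sorted(support):
            count = len(support[item])
            if count < min_support:
                continue
            pattern = prefix + [item]
            results.append((pattern, count))

            # Project the database onto the extended prefix: keep only the
            # part of each sequence after the first occurrence of `item`.
            next_projected = []
            for seq_id, start in projected:
                seq = sequences[seq_id]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        next_projected.append((seq_id, pos + 1))
                        break
            mine(pattern, next_projected)

    mine([], [(i, 0) for i in range(len(sequences))])
    return results


if __name__ == "__main__":
    toy_db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
    for pattern, count in prefixspan(toy_db, min_support=2):
        print(pattern, count)
```

Each recursive call only scans the projected suffixes, which is what keeps the search space small. A distributed version would presumably need to split this work by prefix across workers, which is the part this issue is about.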
Related papers:
PrefixSpan:
Pei, Jian, et al. "Mining sequential patterns by pattern-growth: The PrefixSpan approach." Knowledge and Data Engineering, IEEE Transactions on 16.11 (2004): 1424-1440.
Parallel Algorithm:
Cong, Shengnan, Jiawei Han, and David Padua. "Parallel mining of closed sequential patterns." Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, 2005.
Distributed Algorithm:
Wei, Yong-qing, Dong Liu, and Lin-shan Duan. "Distributed PrefixSpan algorithm based on MapReduce." Information Technology in Medicine and Education (ITME), 2012 International Symposium on. Vol. 2. IEEE, 2012.
Pattern mining and sequential mining background:
Han, Jiawei, et al. "Frequent pattern mining: current status and future directions." Data Mining and Knowledge Discovery 15.1 (2007): 55-86.
Issue Links
- is depended upon by
  - SPARK-8999 Support non-temporal sequence in PrefixSpan (Resolved)
  - SPARK-8997 Improve LocalPrefixSpan performance (Resolved)
  - SPARK-8998 Collect enough frequent prefixes before projection in PrefixSpan (Resolved)
- is related to
  - SPARK-9540 Optimize PrefixSpan implementation (Resolved)
- relates to
  - SPARK-9898 User guide for PrefixSpan (Resolved)
- supercedes
  - SPARK-7212 Frequent pattern mining for sequential item sets (Resolved)
- links to