Spark / SPARK-6487

Add sequential pattern mining algorithm PrefixSpan to Spark MLlib

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: MLlib
    • Labels: None

    Description

      mengxr zhangyouhua
      Sequential pattern mining is an important branch of pattern mining. In our past work, we used sequence mining (mainly the PrefixSpan algorithm) to find patterns in telecommunication signaling sequences, and achieved good results. However, once the data grows too large, the running time becomes too long to meet the service requirements. We plan to implement the PrefixSpan algorithm in Spark and apply it to our subsequent work.

      Related papers:

      PrefixSpan:
      Pei, Jian, et al. "Mining sequential patterns by pattern-growth: The prefixspan approach." Knowledge and Data Engineering, IEEE Transactions on 16.11 (2004): 1424-1440.

      Parallel Algorithm:
      Cong, Shengnan, Jiawei Han, and David Padua. "Parallel mining of closed sequential patterns." Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, 2005.

      Distributed Algorithm:
      Wei, Yong-qing, Dong Liu, and Lin-shan Duan. "Distributed PrefixSpan algorithm based on MapReduce." Information Technology in Medicine and Education (ITME), 2012 International Symposium on. Vol. 2. IEEE, 2012.

      Background on pattern mining and sequential pattern mining:
      Han, Jiawei, et al. "Frequent pattern mining: current status and future directions." Data Mining and Knowledge Discovery 15.1 (2007): 55-86.
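
For readers unfamiliar with the prefix-projection idea behind PrefixSpan, here is a minimal single-machine sketch. It is not the implementation proposed for MLlib. It assumes each sequence element is a single item (the paper also handles itemsets), uses an absolute support count, and the names `prefixspan`, `min_support`, and `db` are illustrative only.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for every sequential pattern
    that occurs in at least `min_support` sequences.

    Simplifying assumption: each sequence is a list of single items.
    """
    results = {}

    def grow(prefix, projected):
        # `projected` is a pseudo-projection of the database: a list of
        # (sequence index, start offset) pairs marking the suffix that
        # follows the first occurrence of `prefix` in each sequence.
        counts = defaultdict(int)
        for sid, start in projected:
            # Count each item at most once per sequence.
            for item in set(sequences[sid][start:]):
                counts[item] += 1

        for item, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support

            # Project on the extended prefix: keep only the suffix after
            # the first occurrence of `item` in each projected sequence.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                try:
                    pos = seq.index(item, start)
                except ValueError:
                    continue  # item does not occur in this suffix
                new_projected.append((sid, pos + 1))
            grow(pattern, new_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


# Toy database of four sequences (hypothetical data).
db = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]
patterns = prefixspan(db, min_support=2)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

Each recursive call only scans the suffixes that follow the current prefix, which is what keeps the search space small compared with candidate-generation approaches. A distributed version would need to decide how to partition these projected databases across workers, which this sketch does not attempt.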

            People

              Assignee: Zhang JiaJin
              Reporter: Zhang JiaJin
              Shepherd: Xiangrui Meng
              Votes: 1
              Watchers: 7
