Spark / SPARK-8997

Improve LocalPrefixSpan performance

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.5.0
    • Component/s: MLlib
    • Labels: None

    Description

      We can improve the performance of LocalPrefixSpan as follows:

      1. run should return an Iterator instead of an Array.
      2. The local count shouldn't use groupBy, which creates too many arrays. We can use PrimitiveKeyOpenHashMap instead.
      3. We can use a List for prefixes to avoid materializing frequent sequences.
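
      The three points above can be illustrated with a minimal sketch. This is not the code that was merged for this issue; the names LocalPrefixSpanSketch, countItems, and project are hypothetical, and sequences are assumed to be flat arrays of single integer items. PrimitiveKeyOpenHashMap is internal to Spark, so a standard scala.collection.mutable.HashMap stands in for it here.

```scala
import scala.collection.mutable

// Illustrative sketch only; all names here are hypothetical.
object LocalPrefixSpanSketch {

  // Point 2: count items with one hash map instead of groupBy.
  // Each item is counted at most once per sequence.
  def countItems(database: Seq[Array[Int]]): mutable.HashMap[Int, Long] = {
    val counts = mutable.HashMap.empty[Int, Long]
    val seen = mutable.HashSet.empty[Int] // reused across sequences
    database.foreach { seq =>
      seen.clear()
      var i = 0
      while (i < seq.length) {
        val item = seq(i)
        if (seen.add(item)) counts(item) = counts.getOrElse(item, 0L) + 1L
        i += 1
      }
    }
    counts
  }

  // Suffix of `seq` after the first occurrence of `item` (empty if absent).
  def project(seq: Array[Int], item: Int): Array[Int] = {
    val i = seq.indexOf(item)
    if (i < 0) Array.empty[Int] else seq.drop(i + 1)
  }

  // Point 1: return an Iterator so patterns are produced lazily.
  // Point 3: keep the prefix as a List; prepending is O(1) and shares the tail,
  // so the prefix is stored in reverse order and reversed by the caller.
  def run(
      minCount: Long,
      maxLength: Int,
      prefix: List[Int],
      database: Seq[Array[Int]]): Iterator[(List[Int], Long)] = {
    if (prefix.length >= maxLength || database.isEmpty) {
      Iterator.empty
    } else {
      val frequent = countItems(database).iterator.filter(_._2 >= minCount)
      frequent.flatMap { case (item, count) =>
        val newPrefix = item :: prefix
        val projected = database.map(project(_, item)).filter(_.nonEmpty)
        // `++` takes its argument by name, so the recursion runs only on demand.
        Iterator.single((newPrefix, count)) ++
          run(minCount, maxLength, newPrefix, projected)
      }
    }
  }
}
```

      A caller would reverse each returned prefix to recover the pattern in its natural order, for example run(2L, 2, Nil, db).map { case (p, c) => (p.reverse, c) }. The order in which patterns are produced follows the hash map's iteration order, so it should not be relied on.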

            People

              Assignee: Feynman Liang (fliang)
              Reporter: Xiangrui Meng (mengxr)
              Shepherd: Xiangrui Meng
              Votes: 0
              Watchers: 4

                Time Tracking

                  Original Estimate: 24h
                  Remaining Estimate: 24h
                  Time Spent: Not Specified