[SPARK-8997] Improve LocalPrefixSpan performance - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.5.0
Fix Version/s: 1.5.0
Component/s: MLlib
Labels: None
Target Version/s: 1.5.0

Description

We can improve the performance of LocalPrefixSpan as follows:

1. run should output an Iterator instead of an Array.
2. Local counting shouldn't use groupBy, which creates too many arrays. We can use PrimitiveKeyOpenHashMap instead.
3. We can use a list to avoid materializing frequent sequences.
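
The three points above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the Spark implementation (which is in Scala). Assumptions: sequences are plain lists of ints, a built-in dict stands in for PrimitiveKeyOpenHashMap, and the names count_items, project, to_list, and local_prefix_span are hypothetical. Item 3 is read here as "keep each prefix as a shared linked list and only build a concrete sequence when the caller asks for one".

```python
from typing import Dict, Iterator, List, Optional, Tuple

# A prefix is a reversed linked list: (last_item, rest_of_prefix) or None.
# Longer prefixes share their tail with shorter ones, so nothing is copied.
Prefix = Optional[tuple]


def count_items(sequences: List[List[int]]) -> Dict[int, int]:
    """Count, per item, how many sequences contain it (no groupBy-style arrays)."""
    counts: Dict[int, int] = {}
    for seq in sequences:
        for item in set(seq):  # each sequence contributes at most once per item
            counts[item] = counts.get(item, 0) + 1
    return counts


def project(sequences: List[List[int]], item: int) -> List[List[int]]:
    """Keep the non-empty suffix after the first occurrence of item."""
    projected = []
    for seq in sequences:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:
                projected.append(suffix)
    return projected


def to_list(prefix: Prefix) -> List[int]:
    """Materialize a linked-list prefix into an ordinary list, on demand."""
    items = []
    while prefix is not None:
        head, prefix = prefix
        items.append(head)
    items.reverse()
    return items


def local_prefix_span(
    sequences: List[List[int]],
    min_count: int,
    max_len: int,
    prefix: Prefix = None,
    length: int = 0,
) -> Iterator[Tuple[Prefix, int]]:
    """Lazily yield (prefix, count) pairs instead of building an array of results."""
    if length >= max_len or not sequences:
        return
    for item, count in count_items(sequences).items():
        if count >= min_count:
            new_prefix = (item, prefix)  # O(1) extension, tail is shared
            yield new_prefix, count
            yield from local_prefix_span(
                project(sequences, item), min_count, max_len, new_prefix, length + 1
            )
```

A caller can consume the patterns one at a time, for example `for p, c in local_prefix_span(seqs, 2, 3): print(to_list(p), c)`, so only the patterns actually inspected are ever turned into full lists.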

Issue Links

blocks: SPARK-8999 Support non-temporal sequence in PrefixSpan (Resolved)
depends upon: SPARK-6487 Add sequential pattern mining algorithm PrefixSpan to Spark MLlib (Resolved)
links to: [GitHub] Pull Request #7360 (feynmanliang)

People

Assignee: Feynman Liang
Reporter: Xiangrui Meng
Shepherd: Xiangrui Meng
Votes: 0
Watchers: 4

Dates

Created: 11/Jul/15 04:13
Updated: 15/Jul/15 06:51
Resolved: 15/Jul/15 06:51

Time Tracking

Estimated: 24h
Remaining: 24h
Logged: Not Specified