Description
We can improve the performance of PrefixSpan by:
1. run should return an Iterator instead of an Array.
2. Local counting shouldn't use groupBy, which creates too many intermediate arrays; we can use PrimitiveKeyOpenHashMap instead.
3. We can use a List for the prefix to avoid materializing every frequent sequence.
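The three changes above can be sketched together in a small local miner. This is an illustrative sketch under stated assumptions, not the Spark implementation. Sequences are simplified to arrays of single Int items (no itemsets). Spark's internal PrimitiveKeyOpenHashMap is not used; scala.collection.mutable.HashMap stands in for it. The object LocalPrefixSpanSketch and its methods (countItems, project, run, mine) are hypothetical names chosen for this example.

```scala
import scala.collection.mutable

// Hypothetical sketch of a local PrefixSpan miner; not Spark's actual code.
object LocalPrefixSpanSketch {

  /** Counts how many postfixes contain each item (each item at most once per postfix).
    * A single mutable map is updated in one pass instead of flatMap + groupBy, which
    * would build an intermediate collection per distinct item.
    * Assumption: mutable.HashMap stands in for Spark's PrimitiveKeyOpenHashMap. */
  def countItems(postfixes: Array[Array[Int]]): mutable.HashMap[Int, Long] = {
    val counts = mutable.HashMap.empty[Int, Long]
    postfixes.foreach { postfix =>
      postfix.distinct.foreach { item =>
        counts(item) = counts.getOrElse(item, 0L) + 1L
      }
    }
    counts
  }

  /** Returns the part of `postfix` after the first occurrence of `item`,
    * or an empty array if `item` does not occur. */
  def project(postfix: Array[Int], item: Int): Array[Int] = {
    val i = postfix.indexOf(item)
    if (i < 0) Array.empty[Int] else postfix.drop(i + 1)
  }

  /** Recursively mines frequent patterns that extend the current prefix.
    * The prefix is kept as a reversed List, so extending it is an O(1) cons that
    * shares structure with the parent prefix; it becomes an Array only when emitted.
    * Results are returned as a lazy Iterator rather than collected into an Array. */
  def run(
      minCount: Long,
      maxPatternLength: Int,
      reversedPrefix: List[Int],
      prefixLength: Int,
      postfixes: Array[Array[Int]]): Iterator[(Array[Int], Long)] = {
    if (prefixLength >= maxPatternLength) {
      Iterator.empty
    } else {
      // Sort frequent items so the output order is deterministic.
      val frequentItems =
        countItems(postfixes).iterator.filter(_._2 >= minCount).toArray.sortBy(_._1)
      frequentItems.iterator.flatMap { case (item, count) =>
        val newPrefix = item :: reversedPrefix
        val projected = postfixes.map(project(_, item)).filter(_.nonEmpty)
        // `++` takes its argument by name, so the recursive call runs lazily.
        Iterator.single((newPrefix.reverse.toArray, count)) ++
          run(minCount, maxPatternLength, newPrefix, prefixLength + 1, projected)
      }
    }
  }

  /** Entry point: mines all patterns of length <= maxPatternLength with count >= minCount. */
  def mine(
      sequences: Array[Array[Int]],
      minCount: Long,
      maxPatternLength: Int): Iterator[(Array[Int], Long)] =
    run(minCount, maxPatternLength, Nil, 0, sequences)
}
```

For example, mining Array(Array(1, 2, 3), Array(1, 3), Array(2, 3)) with a minimum count of 2 yields the patterns [1], [1, 3], [2], [2, 3] and [3], produced one at a time as the caller pulls from the Iterator.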
Issue Links
- blocks
  - SPARK-8999 Support non-temporal sequence in PrefixSpan (Resolved)
- depends upon
  - SPARK-6487 Add sequential pattern mining algorithm PrefixSpan to Spark MLlib (Resolved)
- links to