Description
We made several changes to ALS in 2.0, so we need to run performance tests to catch any regressions. The tests should cover synthetic datasets ranging from 1 million to 1 billion ratings.
cc mlnick holdenk Do you have time to run some large-scale performance tests?
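A rough sketch of how such a run could be set up with PySpark, for discussion only. The helper names (rating_scales, dims_for, time_als), the ratings-per-user and ratings-per-item targets, and the ALS hyperparameters are all assumptions made for illustration and are not taken from this ticket. Ratings are drawn uniformly at random, which is only one possible choice of synthetic data.

```python
import time


def rating_scales(low_exp=6, high_exp=9):
    """Dataset sizes in powers of ten: 1 million up to 1 billion ratings."""
    return [10 ** e for e in range(low_exp, high_exp + 1)]


def dims_for(num_ratings, ratings_per_user=100, ratings_per_item=1000):
    """Hypothetical sizing rule: derive user/item counts from target densities."""
    num_users = max(1, num_ratings // ratings_per_user)
    num_items = max(1, num_ratings // ratings_per_item)
    return num_users, num_items


def time_als(spark, num_ratings, rank=10, max_iter=10, seed=42):
    """Generate uniform random ratings in Spark and time a single ALS fit."""
    # Imported lazily so the sizing helpers above work without PySpark.
    from pyspark.sql import functions as F
    from pyspark.ml.recommendation import ALS

    num_users, num_items = dims_for(num_ratings)
    ratings = spark.range(num_ratings).select(
        (F.rand(seed) * num_users).cast("int").alias("user"),
        (F.rand(seed + 1) * num_items).cast("int").alias("item"),
        (F.rand(seed + 2) * 4 + 1).cast("float").alias("rating"),
    ).cache()
    ratings.count()  # materialize first so data generation is not timed

    # Hyperparameters are placeholders, not values prescribed by this ticket.
    als = ALS(rank=rank, maxIter=max_iter, regParam=0.1,
              userCol="user", itemCol="item", ratingCol="rating", seed=seed)
    start = time.perf_counter()
    model = als.fit(ratings)
    model.userFactors.count()  # touch the factors before stopping the clock
    elapsed = time.perf_counter() - start
    ratings.unpersist()
    return elapsed
```

With an active SparkSession named spark, calling time_als(spark, n) for each n in rating_scales() on both the 1.6 and 2.0 builds would give one timing per scale to compare. The larger scales need a cluster sized for the cached input.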
Links:
Results spreadsheet
Raw results for SPARK-14891
Raw results for SPARK-6717
Issue Links
- relates to
  - SPARK-14891 ALS in ML never validates input schema (Resolved)
  - SPARK-6717 Clear shuffle files after checkpointing in ALS (Resolved)