Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Won't Fix
Description
Skewed data is not rare. For example, a book recommendation site may have several books that most of its users like. Running ALS on such skewed data raises an OutOfMemoryError when a single book has more users than fit into memory. To solve this, we propose a skewed join implementation.
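The issue does not spell out the proposed implementation, so the following is only a minimal sketch of one common way to handle a skewed join, key salting, written in plain Python rather than against Spark's API. The function name `salted_join`, the `num_salts` parameter, and the `(item, user)` / `(item, factor)` record shapes are all assumptions made for illustration. The idea is to split each hot key into several sub-keys so that no single bucket has to hold every user of a popular book, at the cost of replicating the other side of the join once per salt.

```python
from collections import defaultdict


def salted_join(ratings, item_factors, num_salts=4):
    """Inner-join (item, user) ratings with (item, factor) pairs via key salting.

    Hypothetical helper for illustration only; not the implementation
    proposed in this issue and not a Spark API.
    Returns the joined (item, user, factor) triples and the largest bucket size.
    """
    # Skewed side: spread the records of each item round-robin over
    # `num_salts` sub-keys, so a hot item is split into smaller buckets.
    next_salt = defaultdict(int)
    buckets = defaultdict(list)
    for item, user in ratings:
        salt = next_salt[item] % num_salts
        next_salt[item] += 1
        buckets[(item, salt)].append(user)

    # Other side: replicate each item's factor once per salt so that
    # every (item, salt) bucket can find its match.
    replicated = {
        (item, salt): factor
        for item, factor in item_factors
        for salt in range(num_salts)
    }

    # Join bucket by bucket; each bucket could be processed independently.
    joined = []
    for (item, salt), users in buckets.items():
        if (item, salt) in replicated:
            factor = replicated[(item, salt)]
            joined.extend((item, user, factor) for user in users)

    largest_bucket = max((len(users) for users in buckets.values()), default=0)
    return joined, largest_bucket
```

With one book rated by 1,000 users and `num_salts=4`, the largest bucket in this sketch holds 250 users instead of 1,000, while the join result is the same as an unsalted inner join. The trade-off is that the factor side is copied `num_salts` times, so salting is usually worth applying only to keys known to be hot.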
Attachments
Issue Links
- is related to: SPARK-11387 minimize shuffles during joins by using existing partitions and bundling messages (Resolved)
- links to