Description
For large reduce tasks, the user will get an out-of-memory error when the data cannot all fit in memory.
PySpark should spill part of the data to disk and then merge the spilled pieces together, just as the Scala side already does.
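The fix would follow the standard external aggregation pattern: buffer pairs in memory, spill a sorted run to a temporary file once the buffer exceeds a threshold, and stream-merge the runs at the end. Below is a minimal, hypothetical Python sketch of that pattern; the function name, the threshold, and the pickle-based run format are illustrative assumptions, not PySpark's actual shuffle code:

```python
import heapq
import itertools
import os
import pickle
import tempfile

def external_group_by_key(pairs, max_in_memory=100000):
    """Group (key, value) pairs without holding them all in memory.

    Spills sorted runs to temp files when the in-memory buffer exceeds
    max_in_memory entries, then merges the runs with a heap.
    Illustrative sketch only, not PySpark's implementation.
    """
    spill_files = []
    buffer = []

    def spill(buf):
        # Write one sorted run of (key, value) pairs to a temp file.
        buf.sort(key=lambda kv: kv[0])
        f = tempfile.NamedTemporaryFile(delete=False)
        for kv in buf:
            pickle.dump(kv, f)
        f.close()
        spill_files.append(f.name)

    for kv in pairs:
        buffer.append(kv)
        if len(buffer) >= max_in_memory:
            spill(buffer)
            buffer = []

    def read_run(path):
        # Stream one spilled run back from disk, then delete the file.
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    break
        os.unlink(path)

    # Merge the in-memory remainder with all on-disk runs; heapq.merge
    # only keeps one record per run in memory at a time.
    buffer.sort(key=lambda kv: kv[0])
    runs = [iter(buffer)] + [read_run(p) for p in spill_files]
    merged = heapq.merge(*runs, key=lambda kv: kv[0])
    for key, group in itertools.groupby(merged, key=lambda kv: kv[0]):
        yield key, [v for _, v in group]
```

Because each run is sorted before spilling, the final merge is fully streaming, so peak memory is bounded by the spill threshold regardless of input size. An alternative design is to hash-partition spills into buckets and aggregate one bucket at a time; either way the memory-bounding idea is the same.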
Issue Links
- is duplicated by: SPARK-2021 External hashing in PySpark (Resolved)
- is related to: SPARK-2876 RDD.partitionBy loads entire partition into memory (Resolved)
- links to