Description
With large partitions, PySpark may exceed the executor memory limit and trigger an out-of-memory error on Python 2.7.
This happens because map() is used. Unlike Python 3.x, where map() returns a lazy iterator, Python 2.7's map() eagerly builds a list and therefore reads all the data into memory at once.
The proposed fix uses itertools.imap on Python 2.7, which yields elements lazily; this has been verified.
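A minimal sketch of the idea behind the fix (not the actual patch): select a lazy mapping function based on the interpreter version, so large inputs are streamed instead of materialized. The alias name lazy_map and the square_stream helper are illustrative, not from the patch.

```python
import sys
import itertools

# On Python 2, the builtin map() materializes a full list, which is
# what blows up executor memory on large partitions; itertools.imap
# is the lazy equivalent. On Python 3, map() is already lazy.
if sys.version_info[0] == 2:
    lazy_map = itertools.imap  # illustrative alias, not in the patch
else:
    lazy_map = map

def square_stream(numbers):
    # Returns an iterator; no intermediate list is built,
    # so memory use stays constant regardless of input size.
    return lazy_map(lambda x: x * x, numbers)
```

Because square_stream returns an iterator, elements are computed only as they are consumed, which is the behavior the fix relies on to stay within the executor memory limit.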