Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.0.2
- None
Description
Filed on dev@ on 22 August by pnepywoda:
On line 777
https://github.com/apache/spark/commit/42571d30d0d518e69eecf468075e4c5a823a2ae8#diff-1d55e54678eff2076263f2fe36150c17R771
the logic for take() reads ALL partitions if the first one (or first k) are
empty. This has actually led to OOMs when we had many partitions
(thousands) and unfortunately the first one was empty.
Wouldn't a better implementation strategy be
numPartsToTry = partsScanned * 2
instead of
numPartsToTry = totalParts - 1
(this doubling is similar to most memory allocation strategies)
Thanks!
- Paul
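
To make the difference between the two policies concrete, here is a minimal standalone sketch in Python. It is not Spark's code: partitions are modelled as plain lists, and the names take, read_all_remaining and doubling are hypothetical helpers introduced only for this illustration. It assumes, as the report describes, that every partition in a batch is read once the batch is launched.

```python
from typing import Callable, List, Sequence, Tuple


def take(
    partitions: Sequence[List[int]],
    n: int,
    next_batch: Callable[[int, int], int],
) -> Tuple[List[int], int]:
    """Collect up to n items, scanning partitions in growing batches.

    Returns (items, number of partitions scanned). Hypothetical model,
    not Spark's RDD.take().
    """
    total_parts = len(partitions)
    items: List[int] = []
    parts_scanned = 0
    while len(items) < n and parts_scanned < total_parts:
        if parts_scanned == 0:
            # The first attempt always reads a single partition.
            num_parts_to_try = 1
        else:
            num_parts_to_try = max(1, next_batch(parts_scanned, total_parts))
        end = min(parts_scanned + num_parts_to_try, total_parts)
        # Every partition in the batch counts as scanned, even if an
        # earlier one in the same batch already supplied enough items.
        for part in partitions[parts_scanned:end]:
            items.extend(part[: n - len(items)])
        parts_scanned = end
    return items, parts_scanned


def read_all_remaining(parts_scanned: int, total_parts: int) -> int:
    # Policy described in the report: numPartsToTry = totalParts - 1
    return total_parts - 1


def doubling(parts_scanned: int, total_parts: int) -> int:
    # Policy proposed in the report: numPartsToTry = partsScanned * 2
    return parts_scanned * 2


# 1,000 partitions; only the first one is empty.
partitions = [[]] + [[i] for i in range(999)]
print(take(partitions, 1, read_all_remaining))  # ([0], 1000)
print(take(partitions, 1, doubling))            # ([0], 3)
```

In this toy setup, asking for a single item scans all 1,000 partitions under the first policy and only 3 under doubling, which is the memory-pressure difference the report points at.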