Spark / SPARK-3211

.take() is OOM-prone when there are empty partitions


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 1.1.1, 1.2.0
    • Component/s: Spark Core
    • Labels:
      None

      Description

      Filed on dev@ on 22 August by Paul Nepywoda:

      On line 777
      https://github.com/apache/spark/commit/42571d30d0d518e69eecf468075e4c5a823a2ae8#diff-1d55e54678eff2076263f2fe36150c17R771
      the logic for take() reads ALL remaining partitions if the first one (or
      first k) are empty. This has actually led to OOMs for us when we had many
      partitions (thousands) and unfortunately the first one was empty.

      Wouldn't a better implementation strategy be

      numPartsToTry = partsScanned * 2

      instead of

      numPartsToTry = totalParts - 1

      (this doubling is similar to most dynamic memory allocation strategies)

      Thanks!

      • Paul
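
      The proposed strategy can be sketched outside of Spark. The following is
      a minimal Python simulation (not Spark's actual Scala implementation)
      of a take() driver loop that scans partitions in doubling batches, so a
      run of empty leading partitions costs O(log n) passes instead of forcing
      a scan of all remaining partitions at once. The function name and list-of-
      lists representation of partitions are illustrative assumptions:

      ```python
      def take_with_doubling(partitions, num):
          """Collect up to `num` elements, scanning partitions in growing
          batches. `partitions` is a list of lists standing in for RDD
          partitions (an assumption for this sketch)."""
          buf = []
          parts_scanned = 0
          total_parts = len(partitions)
          while len(buf) < num and parts_scanned < total_parts:
              # First pass: try a single partition. On later passes, double
              # the number of partitions already scanned (the proposed
              # numPartsToTry = partsScanned * 2), instead of jumping
              # straight to totalParts - 1.
              num_parts_to_try = max(1, parts_scanned * 2)
              batch = partitions[parts_scanned:parts_scanned + num_parts_to_try]
              for part in batch:
                  buf.extend(part[:num - len(buf)])
                  if len(buf) >= num:
                      break
              parts_scanned += num_parts_to_try
          return buf
      ```

      With thousands of partitions whose first few are empty, the batch sizes
      grow 1, 2, 4, 8, ..., so only a small prefix of partitions is ever
      materialized on the driver before enough elements are found.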

            People

            • Assignee:
              Andrew Ash (aash)
            • Reporter:
              Andrew Ash (aash)
            • Votes:
              0
            • Watchers:
              4

              Dates

              • Created:
                Updated:
                Resolved: