
Replicating disk-stored blocks should avoid memory mapping


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.1
    • Fix Version/s: 3.0.0
    • Component/s: Spark Core
    • Labels:

      Description

      This is a follow-up to SPARK-24296.

      When replicating a disk-cached block, even if we fetch-to-disk, we still memory-map the file, just to copy it to another location.

      Ideally we'd just move the tmp file to the right location. But even without that, we could read the file as an input stream instead of memory-mapping the whole file. Memory-mapping is particularly a problem when running under YARN: the OS may believe there is plenty of memory available, while YARN decides to kill the process for exceeding its memory limits.
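
      For illustration only, a minimal sketch (plain Java I/O, hypothetical helper names, not the actual Spark change) contrasting the memory-mapped copy with the streaming copy and the tmp-file move suggested above:

      {code:scala}
      import java.io.{File, FileInputStream, FileOutputStream}
      import java.nio.channels.FileChannel
      import java.nio.file.{Files, StandardCopyOption}

      object BlockCopySketch {

        // Current behavior (simplified): map the whole block file into the
        // address space just to copy it. The mapped pages inflate the
        // container's resident memory as seen by YARN.
        def copyMapped(src: File, dst: File): Unit = {
          val in = new FileInputStream(src).getChannel
          val out = new FileOutputStream(dst).getChannel
          try {
            val buf = in.map(FileChannel.MapMode.READ_ONLY, 0, in.size())
            while (buf.hasRemaining) out.write(buf)
          } finally { in.close(); out.close() }
        }

        // Suggested alternative: read the file as a stream, so at most one
        // small buffer of the block is resident at any time.
        def copyStreamed(src: File, dst: File): Unit = {
          val in = new FileInputStream(src)
          val out = new FileOutputStream(dst)
          val buf = new Array[Byte](64 * 1024)
          try {
            var n = in.read(buf)
            while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
          } finally { in.close(); out.close() }
        }

        // Ideal case from the description: don't copy at all, just move the
        // fetched tmp file into its final location.
        def moveTmpFile(tmp: File, dst: File): Unit =
          Files.move(tmp.toPath, dst.toPath, StandardCopyOption.ATOMIC_MOVE)
      }
      {code}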


              People

              • Assignee:
                Attila Zsolt Piros (attilapiros)
              • Reporter:
                Imran Rashid (irashid)
              • Votes: 2
              • Watchers: 9
