SPARK-21475

Change to use NIO's Files API for external shuffle service

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.3.0
    • Component/s: Shuffle, Spark Core
    • Labels:
      None

      Description

      Java's FileInputStream and FileOutputStream override finalize(). Even when such a stream is closed correctly and promptly, it still leaves a memory footprint that is only cleaned up during a Full GC. This has two side effects:

      1. Many Finalizer-related objects are kept in memory, which increases memory overhead. In our use case, a busy external shuffle service holds a large number of these objects, which can potentially lead to OOM.
      2. The finalizers are only run during Full GC, which increases the overhead of Full GC and leads to long GC pauses.

      To fix this potential issue, this change proposes using NIO's Files#newInputStream and Files#newOutputStream instead in some critical paths such as shuffle.

      https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful
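
      A minimal sketch of the kind of replacement proposed above, assuming plain byte-oriented file access. The class NioStreamSketch and its methods writeBytes and readBytes are hypothetical names used for illustration only; they are not taken from the Spark code base or from the actual patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class for illustration; not part of Spark.
final class NioStreamSketch {

    private NioStreamSketch() {}

    // Before: new FileOutputStream(path.toFile())
    // After:  Files.newOutputStream(path), which by default creates the file
    //         if it is missing and truncates it if it already exists.
    static void writeBytes(Path path, byte[] data) throws IOException {
        try (OutputStream out = Files.newOutputStream(path)) {
            out.write(data);
        }
    }

    // Before: new FileInputStream(path.toFile())
    // After:  Files.newInputStream(path)
    static byte[] readBytes(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path);
             ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            // Copy the file contents chunk by chunk until end of stream.
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        }
    }
}
```

      The streams returned by the Files methods are not FileInputStream or FileOutputStream instances, so callers that rely on getChannel() or getFD() would need a different approach. They are also unbuffered and are not required to support mark/reset, so wrapping them in BufferedInputStream or BufferedOutputStream may still be appropriate.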

            People

            • Assignee: Saisai Shao (jerryshao)
            • Reporter: Saisai Shao (jerryshao)