Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-21475

Change to use NIO's Files API for external shuffle service

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 2.3.0
    • 2.3.0
    • Shuffle, Spark Core
    • None

    Description

      Java's FileInputStream and FileOutputStream overrides finalize(), even this file input/output stream is closed correctly and promptly, it will still leave some memory footprints which will get cleaned in Full GC. This will introduce two side effects:

      1. Lots of memory footprints regarding to Finalizer will be kept in memory and this will increase the memory overhead. In our use case of external shuffle service, a busy shuffle service will have bunch of this object and potentially lead to OOM.
      2. The Finalizer will only be called in Full GC, and this will increase the overhead of Full GC and lead to long GC pause.

      So to fix this potential issue, here propose to use NIO's Files#newInput/OutputStream instead in some critical paths like shuffle.

      https://www.cloudbees.com/blog/fileinputstream-fileoutputstream-considered-harmful

      Attachments

        Activity

          People

            jerryshao Saisai Shao
            jerryshao Saisai Shao
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: