
[SPARK-9499] Possible file handle leak in spilling/sort code


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: SQL
    • Labels: None
    • Sprint: Spark 1.5 doc/QA sprint

    Description

      As reported by hvanhovell. See SPARK-8850.

      Hi,

      I am getting a "Too many open files" error since unsafe mode was turned on. The same thing popped up when I played with unsafe mode before. The error is below:

      15/07/30 23:37:29 WARN TaskSetManager: Lost task 2.0 in stage 33.0 (TID 2423, localhost): java.io.FileNotFoundException: /tmp/blockmgr-b3d3e14a-f313-4075-8082-7d97f012e35a/14/temp_shuffle_1cab42fa-dcb1-4114-ae53-1674446f9dac (Too many open files)
      	at java.io.FileOutputStream.open0(Native Method)
      	at java.io.FileOutputStream.open(FileOutputStream.java:270)
      	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
      	at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
      	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:111)
      	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
      	at org.apache.spark.scheduler.Task.run(Task.scala:86)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      

      I am currently working in local mode (which is probably the cause of the problem), using the following command line:

      $SPARK_HOME/bin/spark-shell --master local[*] --driver-memory 14G --driver-library-path $HADOOP_NATIVE_LIB
      

      The maximum number of files I can open is 1024 (ulimit -n). I have tried running the same code with an increased limit, but that didn't help either.
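
      For a rough sense of why the 1024 limit can be hit even without a leak: BypassMergeSortShuffleWriter (visible in the stack trace above) opens one partition writer per reduce partition for each running map task. The figures below are assumptions for illustration only (an 8-core machine running local[*] and the default spark.sql.shuffle.partitions), not values taken from this report:

      // Back-of-the-envelope estimate (pasteable into spark-shell); inputs are assumptions, not measured values.
      val concurrentTasks   = 8     // local[*] on an assumed 8-core machine
      val shufflePartitions = 200   // default value of spark.sql.shuffle.partitions
      val writersOpenAtOnce = concurrentTasks * shufflePartitions  // 1600, already above ulimit -n = 1024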

      Attached is a dump of all open files after a "Too many open files" error.
      The command used to make the dump:

      lsof -c java > open
      

      The job starts crashing as soon as I start sorting 10,000,000 rows for the 9th time (I am benchmarking). I guess files are left open after every benchmark run? Is there a way to trigger the closing of these files?
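
      For illustration only, the kind of leak suspected here looks roughly like the sketch below. It is a hypothetical example (the SpillWriter class and its methods are made up, not Spark's actual spilling code); the point is that per-partition writers opened eagerly leak whenever an exception skips the close call, unless closing happens in a finally block:

      import java.io.{BufferedOutputStream, File, FileOutputStream, OutputStream}

      // Hypothetical sketch of the suspected leak pattern; not Spark's actual code.
      class SpillWriter(numPartitions: Int, tmpDir: File) {
        // One file handle per reduce partition is opened up front, as in the stack trace above.
        private val writers: Array[OutputStream] =
          Array.tabulate(numPartitions) { p =>
            new BufferedOutputStream(new FileOutputStream(new File(tmpDir, s"temp_shuffle_$p")))
          }

        // Leaky version: if a write throws, the handles are never released.
        def writeAllLeaky(records: Iterator[(Int, Array[Byte])]): Unit = {
          records.foreach { case (partition, bytes) => writers(partition).write(bytes) }
          writers.foreach(_.close())   // only reached on the success path
        }

        // Safer version: close every handle even when writing fails.
        def writeAllSafe(records: Iterator[(Int, Array[Byte])]): Unit = {
          try {
            records.foreach { case (partition, bytes) => writers(partition).write(bytes) }
          } finally {
            writers.foreach { w => try w.close() catch { case _: Exception => () } }
          }
        }
      }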

      Attachments

        1. open.files.II.txt (380 kB, Herman van Hövell)
        2. perf_test4.scala (5 kB, Herman van Hövell)

            People

              Assignee: Davies Liu (davies)
              Reporter: Reynold Xin (rxin)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: