Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-34788

Spark throws FileNotFoundException instead of IOException when disk is full

Attach filesAttach ScreenshotAdd voteVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: Shuffle, Spark Core
    • Labels:
      None

      Description

      When the disk is full, Spark throws FileNotFoundException instead of IOException with the hint. It's quite a confusing error to users:

      9/03/26 09:03:45 ERROR ShuffleBlockFetcherIterator: Failed to create input stream from local block
      java.io.IOException: Error in reading FileSegmentManagedBuffer{file=/local_disk0/spark-c2f26f02-2572-4764-815a-cbba65ddb315/executor-b4b76a4c-788c-4cb6-b904-664a883be1aa/blockmgr-36804371-24fe-4131-a3dc-00b7f98f3a3e/11/shuffle_113_1029_0.data, offset=110254956, length=1875458}
      	at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:111)
      	at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:442)
      	at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:64)
      	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
      	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
      	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
      	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
      	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
      	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
      	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.sort_addToSorter_0$(Unknown Source)
      	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
      	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
      	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:622)
      	at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:98)
      	at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:95)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:839)
      	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:839)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:340)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:304)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:340)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:304)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:340)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:304)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
      	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
      	at org.apache.spark.scheduler.Task.run(Task.scala:112)
      	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
      	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1432)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: java.io.FileNotFoundException: /local_disk0/spark-c2f26f02-2572-4764-815a-cbba65ddb315/executor-b4b76a4c-788c-4cb6-b904-664a883be1aa/blockmgr-36804371-24fe-4131-a3dc-00b7f98f3a3e/11/shuffle_113_1029_0.data (No such file or directory)
      	at java.io.FileInputStream.open0(Native Method)
      	at java.io.FileInputStream.open(FileInputStream.java:195)
      	at java.io.FileInputStream.<init>(FileInputStream.java:138)
      	at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:100)
      	... 35 more

      (The cause only says the file is not found, but we believe it's highly possible due to the disk full issue after investigation.)

      And there's probably a way to detect the disk full: when we get `FileNotFoundException`, we try http://weblog.janek.org/Archive/2004/12/20/ExceptionWhenWritingToAFu.html to see if SyncFailedException throws. If SyncFailedException throws, then we throw IOException with the disk full hint.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              Ngone51 wuyi

              Dates

              • Created:
                Updated:

                Issue deployment