Spark / SPARK-17300

ClosedChannelException caused by missing block manager when speculative tasks are killed


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete

    Description

      We recently backported SPARK-10530 to our Spark build. That change kills the remaining duplicate attempts of a task once one attempt completes, whether the finished attempt is the original or a speculative copy. In large jobs with 500+ executors, this caused some executors to die and produced the same error that SPARK-15262 fixed: a ClosedChannelException when trying to connect to the block manager on the affected hosts.

      java.nio.channels.ClosedChannelException
      	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
      	at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
      	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
      	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
      	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
      	at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
      	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
      	at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
      	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
      	at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
      	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
      	at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
      	at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
      	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
      	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
      	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
      	at java.lang.Thread.run(Thread.java:745)
      Caused by: java.nio.channels.ClosedChannelException
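
      To make the behaviour described above concrete, here is a minimal toy model of "kill the other attempts once one finishes". It is not Spark's scheduler code and does not use any Spark API; the class, field, and method names (SpeculativeKillSketch, Attempt, onAttemptSucceeded) and the executor IDs are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one task with several running attempts (original plus speculative copies). */
public class SpeculativeKillSketch {

    /** One running attempt of a task on some executor. All names here are hypothetical. */
    static final class Attempt {
        final int attemptId;
        final String executorId;
        boolean killed = false;

        Attempt(int attemptId, String executorId) {
            this.attemptId = attemptId;
            this.executorId = executorId;
        }
    }

    /**
     * Called when one attempt finishes successfully. Marks every other
     * still-running attempt as killed and returns the executors that
     * would receive a kill request.
     */
    static List<String> onAttemptSucceeded(List<Attempt> attempts, int winnerId) {
        List<String> executorsSignalled = new ArrayList<>();
        for (Attempt a : attempts) {
            if (a.attemptId != winnerId && !a.killed) {
                a.killed = true;
                executorsSignalled.add(a.executorId);
            }
        }
        return executorsSignalled;
    }

    public static void main(String[] args) {
        List<Attempt> attempts = new ArrayList<>();
        attempts.add(new Attempt(0, "exec-1")); // original attempt
        attempts.add(new Attempt(1, "exec-2")); // speculative copy

        // The speculative copy finishes first, so the original attempt is killed.
        List<String> signalled = onAttemptSucceeded(attempts, 1);
        System.out.println("kill requests sent to: " + signalled);
    }
}
```

      The sketch only shows which executors would receive kill requests. It does not model the executor deaths or the ClosedChannelException reported here; with 500+ executors, many such requests can be in flight at once, which is the situation in which the failure was observed.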
      


            People

              Assignee: Unassigned
              Reporter: Ryan Blue (rdblue)
              Votes: 2
              Watchers: 4
