Spark / SPARK-44126

Shuffle migration to a decommissioned executor should not count as a block failure

Details

    Description

      When shuffle blocks are migrated to an executor that has itself been decommissioned, the following exception is thrown:

      org.apache.spark.SparkException: Exception thrown in awaitResult:
          at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
          at org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:122)
          at org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$5(BlockManagerDecommissioner.scala:127)
          at org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$5$adapted(BlockManagerDecommissioner.scala:118)
          at scala.collection.immutable.List.foreach(List.scala:431)
          at org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.run(BlockManagerDecommissioner.scala:118)
          at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
          at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
          at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
          at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
          at java.base/java.lang.Thread.run(Thread.java:829)
      Caused by: java.lang.RuntimeException: org.apache.spark.storage.BlockSavedOnDecommissionedBlockManagerException: Block shuffle_2_6429_0.data cannot be saved on decommissioned executor
          at org.apache.spark.errors.SparkCoreErrors$.cannotSaveBlockOnDecommissionedExecutorError(SparkCoreErrors.scala:238)
          at org.apache.spark.storage.BlockManager.checkShouldStore(BlockManager.scala:277)
          at org.apache.spark.storage.BlockManager.putBlockDataAsStream(BlockManager.scala:741)
          at org.apache.spark.network.netty.NettyBlockRpcServer.receiveStream(NettyBlockRpcServer.scala:174)
          at org.apache.spark.network.server.AbstractAuthRpcHandler.receiveStream(AbstractAuthRpcHandler.java:78)
          at org.apache.spark.network.server.TransportRequestHandler.processStreamUpload(TransportRequestHandler.java:202)
          at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:115)
          at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
          at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
          at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
          at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
          at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
          at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
          at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
          at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
          at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
          at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
          at org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:190)
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
          at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
          at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
          at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
          at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
          at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
          at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
          at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
          at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
          at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
          at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
          at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
          at java.base/java.lang.Thread.run(Thread.java:829)
      

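      This is then counted as a block migration failure, which it should not be: the target executor rejected the block only because it is decommissioning, not because the block or the transfer is faulty.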
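
      One way to tell this case apart from a genuine migration failure is to inspect the cause chain of the thrown exception. The following is a minimal, standalone sketch, not Spark's actual code or the fix for this issue: the class `MigrationFailureClassifier` and its method are hypothetical names introduced for illustration. It assumes, as the trace above suggests, that the remote error arrives as text inside a `RuntimeException` message, so it matches on the exception's simple name rather than on its type.

```java
// Hypothetical helper for illustration; not part of Spark.
class MigrationFailureClassifier {
    // Simple name of the exception shown in the trace. It is matched as text
    // because the remote error is embedded in a RuntimeException message.
    static final String DECOMMISSIONED_MARKER =
        "BlockSavedOnDecommissionedBlockManagerException";

    // Upper bound on how far the cause chain is followed (guards against cycles).
    static final int MAX_DEPTH = 32;

    /** Returns true if any throwable in the cause chain mentions the marker. */
    public static boolean isDecommissionedTarget(Throwable t) {
        int depth = 0;
        for (Throwable cur = t; cur != null && depth < MAX_DEPTH; cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (msg != null && msg.contains(DECOMMISSIONED_MARKER)) {
                return true;
            }
            if (cur.getClass().getSimpleName().equals(DECOMMISSIONED_MARKER)) {
                return true;
            }
            depth++;
        }
        return false;
    }

    public static void main(String[] args) {
        // Mimics the shape of the exception in the trace above.
        Throwable remote = new RuntimeException(
            "org.apache.spark.storage.BlockSavedOnDecommissionedBlockManagerException: "
            + "Block shuffle_2_6429_0.data cannot be saved on decommissioned executor");
        Throwable wrapped = new Exception("Exception thrown in awaitResult: ", remote);

        System.out.println(isDecommissionedTarget(wrapped));                           // true
        System.out.println(isDecommissionedTarget(new Exception("Connection reset"))); // false
    }
}
```

      With a check like this, a migration loop could skip the failure counter when the target is decommissioned and retry the block on another peer. How Spark itself should handle the case is left to the actual patch.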

       

            People

              warrenzhu25 Zhongwei Zhu