Spark / SPARK-23994

Add Host To Blacklist If Shuffle Cannot Complete

    Details

      Description

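      If a node cannot be reached while fetching shuffle data, add the node to the blacklist and retry the current stage. The log below shows such a failure: the connection to the node's shuffle port is refused and the fetch is abandoned after 3 retries.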

      2018-04-10 20:25:55,065 ERROR [Block Fetch Retry-3] shuffle.RetryingBlockFetcher (RetryingBlockFetcher.java:fetchAllOutstanding(142)) - Exception while beginning fetch of 711 outstanding blocks (after 3 retries)
      java.io.IOException: Failed to connect to host.local/10.11.12.13:7337
      	at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
      	at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
      	at org.apache.spark.network.shuffle.ExternalShuffleClient$1.createAndStart(ExternalShuffleClient.java:105)
      	at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
      	at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
      	at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: java.net.ConnectException: Connection refused: host.local/10.11.12.13:7337
      	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
      	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
      	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
      	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
      	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
      	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
      	... 1 more
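
To make the request concrete, here is a minimal sketch of the proposed behavior in plain Python. It is not Spark code: the class `ShuffleHostBlacklist`, the `connect` callable, and the host `other.local` are hypothetical names used only for illustration. The sketch assumes one initial attempt plus three retries (matching "after 3 retries" in the log), after which the host is blacklisted and the caller is told to retry the stage.

```python
class ShuffleHostBlacklist:
    """Hypothetical tracker: blacklist a host once fetches to it exhaust all retries."""

    def __init__(self, max_retries=3):
        self.max_retries = max_retries
        self.hosts = set()

    def fetch(self, host, connect):
        """Return True if connect(host) succeeds within the retry budget."""
        # One initial attempt plus max_retries retries.
        for _ in range(1 + self.max_retries):
            try:
                connect(host)
                return True
            except ConnectionError:
                pass
        # Every attempt failed: stop using this host for shuffle data.
        self.hosts.add(host)
        return False

    def usable(self, candidates):
        """Filter out blacklisted hosts, preserving the input order."""
        return [h for h in candidates if h not in self.hosts]


def refusing_connect(host):
    # Stand-in for the real transport: host.local always refuses connections.
    if host == "host.local":
        raise ConnectionRefusedError(f"Connection refused: {host}:7337")


blacklist = ShuffleHostBlacklist(max_retries=3)
ok = blacklist.fetch("host.local", refusing_connect)
retry_stage = not ok  # a failed fetch means the current stage should be retried
remaining = blacklist.usable(["host.local", "other.local"])
```

In this sketch the retried stage would only be offered the hosts in `remaining`, so the unreachable node is not selected again.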
      

              People

              • Assignee: Unassigned
              • Reporter: David Mollitor (belugabehr)
              • Votes: 0
              • Watchers: 1
