Spark / SPARK-4195

Retry fetching block results when the fetch failure is caused by a connection timeout


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      When an application has many executors (for example, 1000), connection timeouts occur frequently. The exception is:
      WARN nio.SendingConnection: Error finishing connection
      java.net.ConnectException: Connection timed out
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
      at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:342)
      at org.apache.spark.network.nio.ConnectionManager$$anon$11.run(ConnectionManager.scala:273)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:744)
      That makes the driver treat these executors as lost, even though they are in fact still alive. So a retry mechanism should be added to reduce the probability of this problem occurring.
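      A minimal sketch of the retry idea described above, assuming a hypothetical helper (this is not Spark's actual fetch code): only a connection timeout triggers a retry with backoff, instead of immediately reporting the executor as lost.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical illustration of retrying a block fetch when the failure
// is a connection timeout. Names and parameters are assumptions, not
// Spark internals.
public class RetryingFetcher {
    public static <T> T fetchWithRetry(Callable<T> fetch, int maxRetries, long backoffMs)
            throws Exception {
        ConnectException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return fetch.call();
            } catch (ConnectException e) {
                // Only connection timeouts are retried; other errors propagate.
                last = e;
                Thread.sleep(backoffMs * (attempt + 1)); // linear backoff between attempts
            }
        }
        throw last; // all attempts timed out: report the failure as before
    }
}
```

      A real implementation would also need to cap total wait time and distinguish a repeatedly timing-out executor from a genuinely dead one.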

    Attachments

    Issue Links

    Activity

    People

    Assignee: Aaron Davidson (adav)
    Reporter: Lianhui Wang (lianhuiwang)
    Votes: 0
    Watchers: 2

    Dates

    Created:
    Updated:
    Resolved: