Spark / SPARK-4195

Retry fetching a block's result when the FetchFailed reason is a connection timeout


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

      Description

      When an application has many executors (for example, 1000), connection timeouts often occur. The exception is:

      WARN nio.SendingConnection: Error finishing connection
      java.net.ConnectException: Connection timed out
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
      at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:342)
      at org.apache.spark.network.nio.ConnectionManager$$anon$11.run(ConnectionManager.scala:273)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:744)

      This causes the driver to treat these executors as lost, even though they are in fact alive. Adding a retry mechanism would reduce the probability of this problem occurring.
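      The proposed improvement can be sketched as a small retry wrapper around the fetch call. This is an illustrative sketch only, not the actual Spark patch: the name `retryOnConnectTimeout`, the `maxRetries` parameter, and the linear backoff are all assumptions.

      ```scala
      import java.net.ConnectException

      // Hypothetical sketch: re-attempt a block fetch when it fails with a
      // connection timeout, instead of immediately reporting the executor lost.
      // `maxRetries` bounds the number of re-attempts; backoff grows linearly.
      def retryOnConnectTimeout[T](maxRetries: Int)(fetchBlock: => T): T = {
        var attempt = 0
        while (true) {
          try {
            return fetchBlock
          } catch {
            case _: ConnectException if attempt < maxRetries =>
              attempt += 1
              // back off briefly before retrying the fetch
              Thread.sleep(100L * attempt)
          }
        }
        throw new IllegalStateException("unreachable")
      }
      ```

      A non-retryable failure (or exhausting `maxRetries`) still propagates the exception, so genuinely dead executors are reported as before.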


              People

              • Assignee: Aaron Davidson (adav)
              • Reporter: Lianhui Wang (lianhuiwang)
              • Votes: 0
              • Watchers: 2
