Spark / SPARK-4195

Retry fetching a block's result when the fetch fails due to a connection timeout


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels:
      None

      Description

      When there are many executors in an application (e.g. 1000), connection timeouts occur frequently. The exception is:
      WARN nio.SendingConnection: Error finishing connection
      java.net.ConnectException: Connection timed out
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
      at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:342)
      at org.apache.spark.network.nio.ConnectionManager$$anon$11.run(ConnectionManager.scala:273)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:744)
      This makes the driver treat those executors as lost, even though they are in fact alive. So add a retry mechanism to reduce the probability of this problem occurring.
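      The retry mechanism proposed above could be sketched as follows. This is a minimal illustration, not the actual Spark patch: the helper name `retryOnTimeout`, the retry count, and the linear backoff are all illustrative assumptions.

      ```java
      import java.net.ConnectException;
      import java.util.concurrent.Callable;

      public class RetryFetch {
          // Hypothetical helper: retry an action that fails with a connection
          // timeout, instead of immediately treating the remote executor as lost.
          static <T> T retryOnTimeout(Callable<T> action, int maxRetries, long backoffMs)
                  throws Exception {
              ConnectException last = null;
              for (int attempt = 0; attempt <= maxRetries; attempt++) {
                  try {
                      return action.call();
                  } catch (ConnectException e) { // e.g. "Connection timed out"
                      last = e;
                      Thread.sleep(backoffMs * (attempt + 1)); // linear backoff
                  }
              }
              throw last; // only report failure after all retries are exhausted
          }

          public static void main(String[] args) throws Exception {
              // Simulated block fetch that times out twice before succeeding.
              int[] calls = {0};
              String result = retryOnTimeout(() -> {
                  if (calls[0]++ < 2) throw new ConnectException("Connection timed out");
                  return "block-data";
              }, 3, 1L);
              System.out.println(result + " after " + calls[0] + " attempts");
          }
      }
      ```

      With retries, a transient timeout under heavy connection load is absorbed rather than escalated into an executor-lost decision by the driver.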


            People

            • Assignee: adav Aaron Davidson
            • Reporter: lianhuiwang Lianhui Wang
