Hadoop YARN / YARN-5539

TimelineClient failed to retry on "java.net.SocketTimeoutException: Read timed out"

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: yarn
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      AM fails with the following exception

      FATAL distributedshell.ApplicationMaster: Error running ApplicationMaster
      com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
      	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
      	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter$1.run(TimelineClientImpl.java:236)
      	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:185)
      	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:247)
      	at com.sun.jersey.api.client.Client.handle(Client.java:648)
      	at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
      	at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
      	at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
      	at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:154)
      	at org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115)
      	at org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:422)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
      	at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112)
      	at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92)
      	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:345)
      	at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1166)
      	at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:567)
      	at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:298)
      Caused by: java.net.SocketTimeoutException: Read timed out
      	at java.net.SocketInputStream.socketRead0(Native Method)
      	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
      	at java.net.SocketInputStream.read(SocketInputStream.java:170)
      	at java.net.SocketInputStream.read(SocketInputStream.java:141)
      	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
      	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
      	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
      	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
      	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
      	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
      	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
      	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
      	at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:253)
      	at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77)
      	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:132)
      	at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:216)
      	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.openConnection(DelegationTokenAuthenticatedURL.java:322)
      	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineURLConnectionFactory.getHttpURLConnection(TimelineClientImpl.java:472)
      	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:159)
      	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
      	... 19 more
      
      1. YARN-5539.patch
        1.0 kB
        Junping Du

        Activity

        ssathish@hortonworks.com Sumana Sathish added a comment -

        Not able to reproduce the issue.

        djp Junping Du added a comment -

        I think this exception indicates that our TimelineClient retry logic lets the exception escape in the SocketTimeoutException case, because it retries only on ConnectException.

                public boolean shouldRetryOn(Exception e) {
                  // Only retry on connection exceptions
                  return (e instanceof ClientHandlerException)
                      && (e.getCause() instanceof ConnectException);
                }
        

        This is a valid issue, but it shows up only occasionally.
        Reopening this issue to address the corner case. Will put up a patch soon!
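
        For illustration, here is a minimal sketch of how the predicate quoted above could be widened so that a read timeout is also treated as retriable. It is an assumption about the general shape of such a fix, not the contents of the attached patch. `ClientHandlerException` below is a local stand-in for Jersey's class so the example compiles on its own, and `RetryPredicateSketch`, `shouldRetryOnOriginal`, and `shouldRetryOnWidened` are hypothetical names.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;

public class RetryPredicateSketch {

  // Hypothetical stand-in for com.sun.jersey.api.client.ClientHandlerException,
  // so the sketch compiles without Jersey on the classpath.
  public static class ClientHandlerException extends RuntimeException {
    public ClientHandlerException(Throwable cause) {
      super(cause);
    }
  }

  // Predicate as quoted in the comment above: retries only when the
  // wrapped cause is a ConnectException.
  public static boolean shouldRetryOnOriginal(Exception e) {
    return (e instanceof ClientHandlerException)
        && (e.getCause() instanceof ConnectException);
  }

  // Assumed widened predicate: also retries when the wrapped cause is a
  // SocketTimeoutException (for example "Read timed out").
  public static boolean shouldRetryOnWidened(Exception e) {
    if (!(e instanceof ClientHandlerException)) {
      return false;
    }
    Throwable cause = e.getCause();
    return cause instanceof ConnectException
        || cause instanceof SocketTimeoutException;
  }

  public static void main(String[] args) {
    Exception timeout =
        new ClientHandlerException(new SocketTimeoutException("Read timed out"));
    Exception refused =
        new ClientHandlerException(new ConnectException("Connection refused"));
    Exception other =
        new ClientHandlerException(new IOException("some other I/O failure"));

    // The read timeout is not retried by the original predicate,
    // but is retried by the widened one.
    System.out.println("timeout/original: " + shouldRetryOnOriginal(timeout));
    System.out.println("timeout/widened:  " + shouldRetryOnWidened(timeout));

    // Connection failures are retried by both.
    System.out.println("refused/original: " + shouldRetryOnOriginal(refused));
    System.out.println("refused/widened:  " + shouldRetryOnWidened(refused));

    // Unrelated I/O failures are retried by neither.
    System.out.println("other/original:   " + shouldRetryOnOriginal(other));
    System.out.println("other/widened:    " + shouldRetryOnWidened(other));
  }
}
```

        With the original predicate, the wrapped "Read timed out" case is not retried, which matches the stack trace in the description where the exception propagates out of `retryOn` and up to the ApplicationMaster.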

        djp Junping Du added a comment -

        Attached a quick patch to fix the corner case here. The fix itself is very straightforward; adding a unit test, however, is not, because the exception handling is embedded. It should be fine to simply fix this here without a UT.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 13s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 mvninstall 7m 39s trunk passed
        +1 compile 0m 38s trunk passed
        +1 checkstyle 0m 24s trunk passed
        +1 mvnsite 0m 36s trunk passed
        +1 mvneclipse 0m 18s trunk passed
        +1 findbugs 1m 30s trunk passed
        +1 javadoc 0m 31s trunk passed
        +1 mvninstall 0m 28s the patch passed
        +1 compile 0m 25s the patch passed
        +1 javac 0m 25s the patch passed
        +1 checkstyle 0m 18s the patch passed
        +1 mvnsite 0m 31s the patch passed
        +1 mvneclipse 0m 11s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 14s the patch passed
        +1 javadoc 0m 27s the patch passed
        +1 unit 2m 26s hadoop-yarn-common in the patch passed.
        +1 asflicense 0m 16s The patch does not generate ASF License warnings.
        18m 50s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12829828/YARN-5539.patch
        JIRA Issue YARN-5539
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 0fe56146ad40 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 537095d
        Default Java 1.8.0_101
        findbugs v3.0.0
        Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13187/testReport/
        modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
        Console output https://builds.apache.org/job/PreCommit-YARN-Build/13187/console
        Powered by Apache Yetus 0.3.0 http://yetus.apache.org

        This message was automatically generated.

        varun_saxena Varun Saxena added a comment -

        +1
        Will commit it shortly.

        varun_saxena Varun Saxena added a comment -

        Committed to trunk, branch-2 and branch-2.8
        Thanks Junping Du for your contribution.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10478 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10478/)
        YARN-5539. TimelineClient failed to retry on (varunsaxena: rev b8a2d7b8fc96302ba1ef99d24392f463734f1b82)

        • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java

          People

          • Assignee:
            djp Junping Du
            Reporter:
            ssathish@hortonworks.com Sumana Sathish
          • Votes:
            0
            Watchers:
            8
