  1. Hadoop Common
  2. HADOOP-12464

Interrupted client may try to fail-over and retry

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.2, 3.0.0-alpha1
    • Component/s: ipc
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      When an IPC client is interrupted, it sometimes tries to fail over to a different namenode and retry. We've seen this cause hangs during shutdown.


          Activity

          kihwal Kihwal Lee added a comment -

          If an interrupt is noticed while sending the request, Client#call() catches the InterruptedException, wraps it in an IOException, and throws it. Since RetryPolicies#FailoverOnNetworkExceptionRetry regards all IOExceptions as local exceptions, it prescribes RetryAction.FAILOVER_AND_RETRY. We can make the client bail out early in RetryInvocationHandler before it even consults RetryPolicies.
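
The early bail-out could look roughly like the sketch below. This is not the Hadoop code: InterruptAwareRetrier, Action, decide() and invoke() are hypothetical stand-ins, and the sketch assumes the thread's interrupt flag is still set (or has been restored) by the time the wrapped IOException reaches the retry loop.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical stand-in for a retrying invoker; not Hadoop's RetryInvocationHandler. */
final class InterruptAwareRetrier {
    /** Hypothetical policy outcomes, loosely named after the action mentioned above. */
    enum Action { FAIL, FAILOVER_AND_RETRY }

    /** Hypothetical policy that, like the one described, treats every IOException as retriable. */
    static Action decide(IOException e) {
        return Action.FAILOVER_AND_RETRY;
    }

    /** Runs the call, retrying on IOException unless the calling thread has been interrupted. */
    static <T> T invoke(Callable<T> call, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                // Check the interrupt flag first: an interrupted caller wants out,
                // so the retry policy is never consulted in that case.
                if (Thread.currentThread().isInterrupted()) {
                    throw e;
                }
                if (attempt >= maxAttempts || decide(e) == Action.FAIL) {
                    throw e;
                }
                // Otherwise fall through and try again (failover itself is omitted here).
            }
        }
    }
}
```

With the flag set, invoke() rethrows after the first failed attempt; without it, the loop keeps retrying up to maxAttempts.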

          Another potential place for a hang is after making the call. In Client#call(), while waiting for a response, it catches InterruptedException, sets a local variable, and continues to wait. The connection thread has to either get a response or get shut down and notify the client for it to exit the loop. This allows graceful termination, but may cause extra delay or a hang during shutdown. I am not sure what harm there is in throwing right away when an InterruptedException is received.
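
A minimal sketch of the "throw right away" alternative is shown below. PendingCall, complete() and awaitResponse() are hypothetical names for illustration, not Hadoop's Client internals; the sketch only shows a wait loop that restores the interrupt flag and exits immediately instead of recording the interrupt and continuing to wait.

```java
import java.io.InterruptedIOException;

/** Hypothetical pending RPC call; not Hadoop's Client.Call. */
final class PendingCall {
    private boolean done;
    private String response;

    /** Called by the (hypothetical) connection thread when the response arrives. */
    synchronized void complete(String value) {
        response = value;
        done = true;
        notifyAll();
    }

    /** Waits for the response, but gives up as soon as the waiting thread is interrupted. */
    synchronized String awaitResponse() throws InterruptedIOException {
        while (!done) {
            try {
                wait();
            } catch (InterruptedException ie) {
                // Restore the flag so code further up the stack can still see the interrupt.
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("Call interrupted");
            }
        }
        return response;
    }
}
```

The trade-off is the one raised above: the caller returns promptly on interrupt, at the cost of abandoning a response that may still arrive.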

          hadoopqa Hadoop QA added a comment -



          -1 overall

          Vote | Subsystem | Runtime | Comment
          0 | pre-patch | 17m 49s | Pre-patch trunk compilation is healthy.
          +1 | @author | 0m 0s | The patch does not contain any @author tags.
          -1 | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 | javac | 8m 1s | There were no new javac warning messages.
          +1 | javadoc | 10m 25s | There were no new javadoc warning messages.
          -1 | release audit | 0m 21s | The applied patch generated 1 release audit warnings.
          +1 | checkstyle | 1m 7s | There were no new checkstyle issues.
          +1 | whitespace | 0m 0s | The patch has no lines that end in whitespace.
          +1 | install | 1m 29s | mvn install still works.
          +1 | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse.
          +1 | findbugs | 1m 55s | The patch does not introduce any new Findbugs (version 3.0.0) warnings.
          -1 | common tests | 7m 37s | Tests failed in hadoop-common.
          | | 49m 22s |



          Reason | Tests
          Failed unit tests | hadoop.metrics2.impl.TestGangliaMetrics

          Subsystem | Report/Notes
          Patch URL | http://issues.apache.org/jira/secure/attachment/12765394/HADOOP-12464.patch
          Optional Tests | javadoc javac unit findbugs checkstyle
          git revision | trunk / 61b3547
          Release Audit | https://builds.apache.org/job/PreCommit-HADOOP-Build/7776/artifact/patchprocess/patchReleaseAuditProblems.txt
          hadoop-common test log | https://builds.apache.org/job/PreCommit-HADOOP-Build/7776/artifact/patchprocess/testrun_hadoop-common.txt
          Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/7776/testReport/
          Java | 1.7.0_55
          uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/7776/console

          This message was automatically generated.

          stevel@apache.org Steve Loughran added a comment -

          +1

          I've seen the situation where a blocked/hanging HDFS client won't shut down...this may be related.

          stevel@apache.org Steve Loughran added a comment -

          Linking to the issues I've seen with hanging on shutdown. Do these stack traces match what you've seen?

          We could have a test for this, BTW: find a free port, create an HDFS client for hdfs://localhost:$port, try to use it, and then try to shut down the FS.
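
The first step of such a test, picking a free port and building the URI, might be sketched as follows. FreePortUri, findFreePort() and hdfsUriFor() are hypothetical helper names; creating and shutting down the actual HDFS client is left out because it needs the Hadoop libraries on the classpath.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.URI;

/** Hypothetical test helper; only covers picking a port and building the URI. */
final class FreePortUri {
    /** Asks the OS for an unused port by binding to port 0, then releases it. */
    static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    /** Builds an hdfs:// URI pointing at localhost on the given port. */
    static URI hdfsUriFor(int port) {
        return URI.create("hdfs://localhost:" + port);
    }
}
```

Note that another process could grab the port between the close and the client's first connection attempt, so a test built on this should tolerate that race.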

          stevel@apache.org Steve Loughran added a comment -

          HADOOP-12418 may be a race condition where this intermittently surfaces (especially on Java8)

          kihwal Kihwal Lee added a comment -

          The client hang while establishing a connection (HADOOP-10219) is a different problem: there, the thread that is making the connection is never interrupted. I posted a patch there.

          stevel@apache.org Steve Loughran added a comment -

          +1. I'll let you do the commit.

          kihwal Kihwal Lee added a comment -

          Thanks for the review, Steve Loughran. I've committed this to trunk, branch-2 and branch-2.7.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8664 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8664/)
          HADOOP-12464. Interrupted client may try to fail-over and retry. (kihwal: rev 6144e0137bb51bd04b46ea5ce42c59c2d4f7657d)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #551 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/551/)
          HADOOP-12464. Interrupted client may try to fail-over and retry. (kihwal: rev 6144e0137bb51bd04b46ea5ce42c59c2d4f7657d)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #567 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/567/)
          HADOOP-12464. Interrupted client may try to fail-over and retry. (kihwal: rev 6144e0137bb51bd04b46ea5ce42c59c2d4f7657d)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #1288 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1288/)
          HADOOP-12464. Interrupted client may try to fail-over and retry. (kihwal: rev 6144e0137bb51bd04b46ea5ce42c59c2d4f7657d)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2500 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2500/)
          HADOOP-12464. Interrupted client may try to fail-over and retry. (kihwal: rev 6144e0137bb51bd04b46ea5ce42c59c2d4f7657d)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #514 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/514/)
          HADOOP-12464. Interrupted client may try to fail-over and retry. (kihwal: rev 6144e0137bb51bd04b46ea5ce42c59c2d4f7657d)

          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java
          • hadoop-common-project/hadoop-common/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2451 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2451/)
          HADOOP-12464. Interrupted client may try to fail-over and retry. (kihwal: rev 6144e0137bb51bd04b46ea5ce42c59c2d4f7657d)

          • hadoop-common-project/hadoop-common/CHANGES.txt
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
          • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryInvocationHandler.java

            People

            • Assignee:
              kihwal Kihwal Lee
              Reporter:
              kihwal Kihwal Lee
            • Votes: 0
            • Watchers: 12
