Hadoop HDFS / HDFS-3376

DFSClient fails to make connection to DN if there are many unusable cached sockets

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.0-alpha
    • Component/s: hdfs-client
    • Labels:
      None

      Description

      After fixing the datanode side of keepalive to properly disconnect stale clients (HDFS-3357), the client side has the following issue: when it connects to a DN, it first tries to reuse cached sockets, trying up to a configurable number of sockets from the cache. If the cache holds more sockets than the configured number of retries, and the datanode side has closed all of them, then the client throws an exception and marks the replica node as dead.
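
      The failure mode described above can be modeled with a small sketch. This is not the DFSInputStream code: the class and method names are hypothetical, a cached socket is reduced to a Boolean (false meaning the DN already closed it), and a fresh connection is assumed to always succeed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative model only; these names are hypothetical, not HDFS APIs.
final class StaleCacheFailure {

    // Builds a cache of n entries; false marks a socket the DN already closed.
    static Deque<Boolean> staleCache(int n) {
        Deque<Boolean> cache = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            cache.add(Boolean.FALSE);
        }
        return cache;
    }

    // Pre-fix behavior: every attempt takes a cached socket when one exists.
    // Returns true if some attempt yields a working connection.
    static boolean connectPreferringCache(Deque<Boolean> cache, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            Boolean usable = cache.poll();   // null when the cache is empty
            if (usable == null || usable) {
                // Either a fresh connection (assumed to succeed) or a live cached socket.
                return true;
            }
            // Stale cached socket: it is discarded and this attempt is spent.
        }
        return false;                        // retries exhausted; the caller would give up on this DN
    }

    public static void main(String[] args) {
        System.out.println(connectPreferringCache(staleCache(5), 3)); // false
        System.out.println(connectPreferringCache(staleCache(2), 3)); // true
    }
}
```

      With five stale sockets and three retries, every attempt is spent on a dead socket, even though a fresh connection to the same DN would have worked.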

      1. hdfs-3376.txt
        4 kB
        Todd Lipcon

        Activity

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12525751/hdfs-3376.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2382//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2382//console

        This message is automatically generated.

        Eli Collins added a comment -

        +1 looks good

        Todd Lipcon added a comment -

        Committed to 2.0 and trunk. Thanks for the review.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2272 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2272/)
        HDFS-3376. DFSClient fails to make connection to DN if there are many unusable cached sockets. Contributed by Todd Lipcon. (Revision 1335115)

        Result = SUCCESS
        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1335115
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferKeepalive.java
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2197 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2197/)
        HDFS-3376. DFSClient fails to make connection to DN if there are many unusable cached sockets. Contributed by Todd Lipcon. (Revision 1335115)

        Result = SUCCESS
        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1335115
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferKeepalive.java
        Tsz Wo Nicholas Sze added a comment -
        +      // Don't use the cache on the last attempt - it's possible that there
        +      // are arbitrarily many unusable sockets in the cache, but we don't
        +      // want to fail the read.
        

        Just a question: Will the unusable sockets be closed and removed from the cache?
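
        The quoted patch comment suggests a retry loop in which the final attempt skips the cache. A minimal sketch of that idea, under the same simplifications as the issue description (hypothetical names, a Boolean standing in for a cached socket, fresh connections assumed to succeed), might look like this:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative model only; these names are hypothetical, not HDFS APIs.
final class LastAttemptBypassesCache {

    // Builds a cache of n entries; false marks a socket the DN already closed.
    static Deque<Boolean> staleCache(int n) {
        Deque<Boolean> cache = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            cache.add(Boolean.FALSE);
        }
        return cache;
    }

    // Earlier attempts may use the cache; the last attempt always opens a fresh connection.
    static boolean connect(Deque<Boolean> cache, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            boolean lastAttempt = (attempt == maxRetries - 1);
            Boolean usable = lastAttempt ? null : cache.poll();
            if (usable == null || usable) {
                // Fresh connection (assumed to succeed) or a live cached socket.
                return true;
            }
            // Stale cached socket: discard it and try again.
        }
        return false; // only reachable when maxRetries <= 0
    }

    public static void main(String[] args) {
        Deque<Boolean> cache = staleCache(5);
        System.out.println(connect(cache, 3)); // true
        System.out.println(cache.size());      // 3
    }
}
```

        In this model the two earlier attempts each evict one stale socket, and the remaining three stay in the cache until later reads pull them.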

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #2214 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2214/)
        HDFS-3376. DFSClient fails to make connection to DN if there are many unusable cached sockets. Contributed by Todd Lipcon. (Revision 1335115)

        Result = ABORTED
        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1335115
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferKeepalive.java
        Todd Lipcon added a comment -

        Just a question: Will the unusable sockets be closed and removed from the cache?

        Yes, if it pulls a socket which is unusable, then attempts to use it, it will get an EOF exception, swallow it, and then not re-insert it into the cache.
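
        The eviction behavior described in this reply can be sketched as follows. The Conn interface, the HEALTHY and STALE stand-ins, and readViaCache are hypothetical names for illustration; the real client works with sockets and block readers rather than this simplified type.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Illustrative model only; these names are hypothetical, not HDFS APIs.
final class DropOnEof {

    // Hypothetical stand-in for a cached connection to a DN.
    interface Conn {
        String read() throws IOException;
    }

    static final Conn HEALTHY = () -> "data";
    static final Conn STALE = () -> {
        throw new EOFException("peer closed the connection");
    };

    static Deque<Conn> cacheOf(Conn... conns) {
        return new ArrayDeque<>(Arrays.asList(conns));
    }

    // Returns the data read, or null if no cached connection was usable.
    static String readViaCache(Deque<Conn> cache) {
        Conn c;
        while ((c = cache.poll()) != null) {
            try {
                String data = c.read();
                cache.addFirst(c);   // still usable: put it back for reuse
                return data;
            } catch (IOException e) {
                // EOFException lands here: swallow it and do NOT re-insert,
                // so the stale connection leaves the cache for good.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Deque<Conn> cache = cacheOf(STALE, STALE, HEALTHY);
        System.out.println(readViaCache(cache)); // data
        System.out.println(cache.size());        // 1
    }
}
```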

        Robert Joseph Evans added a comment -

        Hey Todd,

        I have been trying to follow some of the fixes you have been putting into the HDFS socket caching. I was wondering if you would be willing to pull HDFS-3357 and this one, HDFS-3376, into branch-0.23. They both seem to apply cleanly, but I am not an HDFS committer to do this myself.

        Todd Lipcon added a comment -

        Hi Bobby. I think we need to do the following series: HADOOP-8280, HADOOP-8350, HDFS-3357, then this one. Does that look good to you? The reason for HADOOP-8280 is that the test for HADOOP-8350 depends on GenericTestUtils being in common.

        Aaron T. Myers added a comment -

        Hi Bobby,

        They both seem to apply cleanly, but I am not an HDFS committer to do this myself.

        I'm under the impression that it's acceptable for release managers to do back-ports to the branches they're managing regardless of what sub-project they're a committer for.

        Robert Joseph Evans added a comment -

        Todd,

        You are much more of an expert on this than I am. I think HADOOP-8280 and HADOOP-8350 look fine to pull in too. Thanks for the help with this.

        Aaron,

        I spoke with Suresh offline about it when I took over as release manager for branch-0.23, as I was curious about it. He thought that I could not. I don't really see it being much of a problem yet, because not many HDFS issues have been applicable to branch-0.23, although I am going through the full HDFS list to see whether I have missed anything.

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1038 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1038/)
        HDFS-3376. DFSClient fails to make connection to DN if there are many unusable cached sockets. Contributed by Todd Lipcon. (Revision 1335115)

        Result = FAILURE
        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1335115
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferKeepalive.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1073 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1073/)
        HDFS-3376. DFSClient fails to make connection to DN if there are many unusable cached sockets. Contributed by Todd Lipcon. (Revision 1335115)

        Result = SUCCESS
        todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1335115
        Files :

        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
        • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferKeepalive.java
        stack added a comment -

        +1 on pulling the HADOOP-8280, etc., series into branch-0.23.

        Daryn Sharp added a comment -

        Perhaps a naive question, but why can't socket.isClosed() be used to determine if the socket is unusable? The closed sockets could be skipped and removed from the cache.

        Todd Lipcon added a comment -

        Perhaps a naive question, but why can't socket.isClosed() be used to determine if the socket is unusable? The closed sockets could be skipped and removed from the cache.

        Unfortunately the .isClosed() method just checks a local flag which is set by close(). Here's the JDK source:

            public boolean isClosed() {
                synchronized(closeLock) {
                    return closed;
                }
            }
        

        It may be possible to determine closed-ness by setting up a selector and selecting only for errors, but that seems somewhat complicated and for not much gain.
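
        The point about isClosed() can be checked with a small loopback experiment. This is a sketch, not part of the patch: the class name and the probe() helper are made up for illustration, and it assumes a loopback interface is available. The local flag stays false after the remote end hangs up, and the closure only surfaces when a read reaches end-of-stream.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative experiment only; the class and method names are hypothetical.
final class IsClosedDemo {

    // Returns { client.isClosed() after the peer hung up, whether read() then hit EOF }.
    static boolean[] probe() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            client.setSoTimeout(5000);   // do not block forever if something goes wrong
            server.accept().close();     // remote side hangs up, as a DN does on keepalive expiry

            // isClosed() only reflects whether close() was called on this Socket object.
            boolean closedFlag = client.isClosed();
            // The remote close is only observed when we actually try to read.
            boolean sawEof = client.getInputStream().read() == -1;
            return new boolean[] { closedFlag, sawEof };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = probe();
        System.out.println("isClosed=" + result[0] + " eof=" + result[1]);
    }
}
```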

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-0.23-Build #309 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/309/)
        HDFS-3376. DFSClient fails to make connection to DN if there are many unusable cached sockets. Contributed by Todd Lipcon. (Revision 1359221)

        Result = SUCCESS
        daryn : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359221
        Files :

        • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
        • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferKeepalive.java

          People

          • Assignee:
            Todd Lipcon
            Reporter:
            Todd Lipcon
          • Votes:
            0
            Watchers:
            9
