Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.5.0, 3.0.0-alpha1
    • Fix Version/s: 2.6.0
    • Component/s: hdfs-client
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Input streams lost their timeout. The problem appears to be that DFSClient#newConnectedPeer does not set the read timeout. During a temporary network interruption, the server closes the socket without the client host noticing, and the client then blocks on a read forever.

      The results are dire. Services such as the RM, JHS, NMs, Oozie servers, etc., all need to be restarted to recover, unless you want to wait many hours for the TCP stack's keepalive to detect the broken socket.
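To illustrate the failure mode, here is a minimal plain-JDK sketch. It is not Hadoop code, and the class and method names are hypothetical. A local server accepts a connection and never sends anything, standing in for a DataNode whose end of the connection has silently gone away. java.net.Socket's SO_TIMEOUT defaults to 0, which means a read blocks indefinitely. Once setSoTimeout is applied, the same read fails with SocketTimeoutException instead of hanging.

```java
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    // Connects to a local server that accepts but never writes, then reads once.
    // Returns true if the read ended with SocketTimeoutException.
    static boolean readTimesOut(int readTimeoutMs) throws Exception {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, lo);
             Socket client = new Socket(lo, server.getLocalPort());
             Socket silentPeer = server.accept()) {
            // 0 means "wait forever"; any positive value bounds each blocking read.
            client.setSoTimeout(readTimeoutMs);
            InputStream in = client.getInputStream();
            try {
                in.read(); // returns only on data, EOF, or timeout
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        try (Socket unconfigured = new Socket()) {
            System.out.println("default SO_TIMEOUT: " + unconfigured.getSoTimeout() + " ms");
        }
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

Passing 0 to readTimesOut would leave the read blocked indefinitely, which is the hang described above.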

          Activity

          ndimiduk Nick Dimiduk added a comment -

          Hi Karthik Kambatla. Thanks again for the offer. The HBase community has decided to relax its position [0] on version changes of dependency projects for the time being and will release 1.1 against Hadoop 2.6.0. In general, we would appreciate a patch-release cadence that includes, in addition to security fixes, bug fixes that do not affect user-facing APIs or wire compatibility. However, this is probably a discussion for a wider audience.

          [0]: http://markmail.org/message/4sv4hapzxkeqbk3d

          kasha Karthik Kambatla added a comment -

          Thanks for the ping, Chris Nauroth.

          Nick Dimiduk - there are no active plans for 2.5.3. If HDFS committers think this issue is serious enough to warrant a point release, I don't mind creating the RC and putting it through a vote.

          cnauroth Chris Nauroth added a comment -

          Hi Nick Dimiduk. I'm not aware of any plans for a 2.5.3 patch release. To do so, we'd need someone to volunteer as release manager and conduct a vote on a release candidate. Karthik Kambatla, I'm notifying you just FYI, since you had been release manager previously on the 2.5.x release line.

          ndimiduk Nick Dimiduk added a comment -

          Any chance of bringing this to a 2.5.x patch release? Over on HBASE-13339 we're trying to work out how best to support users with minimal impact on dependencies for our next minor release (1.1). Bumping Hadoop minor versions (I think) will break our semantic versioning compatibility guidelines.

          FYI Elliott Clark, Sean Busbey, Chris Nauroth

          cmccabe Colin P. McCabe added a comment -

          zhangshilong, it appears that the DataNode is setting both a write and a read timeout on its sockets, but the DFSClient is only setting a read timeout. If you want to file another JIRA to add a write timeout to DFSClient sockets, that might be a good idea.
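For context on the write-timeout question: with blocking java.net.Socket streams, SO_TIMEOUT applies to reads (and accept) but not to writes, so a write timeout typically requires non-blocking I/O with a selector. The sketch below is a generic NIO illustration under that assumption. It is not the DataNode's or DFSClient's actual code, and writeFully and writeTimesOut are hypothetical names. The local peer never reads, so once the kernel socket buffers fill, the writer stalls and the deadline fires.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.concurrent.TimeUnit;

public class WriteTimeoutDemo {
    // Writes the remaining bytes of buf, throwing SocketTimeoutException if the
    // deadline passes before every byte is written. Leaves ch in non-blocking mode.
    static void writeFully(SocketChannel ch, ByteBuffer buf, long timeoutMs) throws IOException {
        ch.configureBlocking(false);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try (Selector selector = Selector.open()) {
            ch.register(selector, SelectionKey.OP_WRITE);
            while (buf.hasRemaining()) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    throw new SocketTimeoutException("write timed out after " + timeoutMs + " ms");
                }
                if (ch.write(buf) == 0) {
                    // Send buffer is full: wait for space or until the deadline.
                    selector.select(remainingMs);
                    selector.selectedKeys().clear();
                }
            }
        }
    }

    // Demo: connect to a local server whose accepted socket never reads.
    static boolean writeTimesOut(long timeoutMs) throws IOException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, lo);
             SocketChannel ch = SocketChannel.open(new InetSocketAddress(lo, server.getLocalPort()));
             Socket accepted = server.accept()) {
            // Far larger than typical loopback socket buffers, so the writer must stall.
            ByteBuffer payload = ByteBuffer.allocate(64 * 1024 * 1024);
            try {
                writeFully(ch, payload, timeoutMs);
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("write timed out: " + writeTimesOut(300));
    }
}
```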

          zsl2007 zhangshilong added a comment -

          There is no write timeout. Why not add a writeTimeout?

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1866 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1866/)
          HDFS-7005. DFS input streams do not timeout. Contributed by Daryn Sharp. (kihwal: rev 6a84f88c1190a8fecadd81deb6e7b8a69675fa91)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1891 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1891/)
          HDFS-7005. DFS input streams do not timeout. Contributed by Daryn Sharp. (kihwal: rev 6a84f88c1190a8fecadd81deb6e7b8a69675fa91)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #675 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/675/)
          HDFS-7005. DFS input streams do not timeout. Contributed by Daryn Sharp. (kihwal: rev 6a84f88c1190a8fecadd81deb6e7b8a69675fa91)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          kihwal Kihwal Lee added a comment -

          Committed to trunk and cherry-picked to branch-2.

          cmccabe Colin P. McCabe added a comment -

          +1. Thanks, Daryn

          daryn Daryn Sharp added a comment -

          The test failure is unrelated; it has been failing in many other builds.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12667172/HDFS-7005.patch
          against trunk revision 0974f43.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7948//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7948//console

          This message is automatically generated.

          stack stack added a comment -

          +1

          daryn Daryn Sharp added a comment -

          Simply set the read timeout on the peer.
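A sketch of the shape of the fix using plain JDK sockets: connect, then apply the read timeout before the peer is handed to callers. SimplePeer and newConnectedPeer are hypothetical stand-ins that only echo the names in the description. They are not Hadoop's Peer interface or DFSClient's real method. Using one socketTimeoutMs value for both the connect and read timeouts is also an assumption of this sketch.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PeerFactorySketch {
    // Hypothetical, pared-down stand-in for a connected peer (not Hadoop's Peer).
    static final class SimplePeer implements Closeable {
        private final Socket socket;
        SimplePeer(Socket socket) { this.socket = socket; }
        void setReadTimeout(int timeoutMs) throws IOException { socket.setSoTimeout(timeoutMs); }
        int getReadTimeout() throws IOException { return socket.getSoTimeout(); }
        InputStream getInputStream() throws IOException { return socket.getInputStream(); }
        @Override public void close() throws IOException { socket.close(); }
    }

    // Hypothetical analogue of newConnectedPeer: connect with a bounded connect
    // timeout, then set the read timeout before returning the peer.
    static SimplePeer newConnectedPeer(InetSocketAddress addr, int socketTimeoutMs) throws IOException {
        Socket sock = new Socket();
        try {
            sock.connect(addr, socketTimeoutMs);
            SimplePeer peer = new SimplePeer(sock);
            peer.setReadTimeout(socketTimeoutMs); // without this, reads may block forever
            return peer;
        } catch (IOException e) {
            // Don't leak the socket if connecting or configuring fails.
            try { sock.close(); } catch (IOException suppressed) { e.addSuppressed(suppressed); }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, lo);
             SimplePeer peer = newConnectedPeer(new InetSocketAddress(lo, server.getLocalPort()), 60_000);
             Socket accepted = server.accept()) {
            System.out.println("read timeout: " + peer.getReadTimeout() + " ms");
        }
    }
}
```

Setting the timeout inside the factory means every caller gets a bounded read, rather than relying on each call site to remember it.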


            People

            • Assignee:
              daryn Daryn Sharp
              Reporter:
              daryn Daryn Sharp
            • Votes:
              0
              Watchers:
              21
