HDFS-7608 (Hadoop HDFS)

hdfs dfsclient newConnectedPeer has no write timeout

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0, 2.6.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: fuse-dfs, hdfs-client
    • Labels:
      None
    • Environment:

      HDFS 2.3.0, HBase 0.98.6

    • Target Version/s:

      Description

      Problem: HBase's CompactSplitThread may lock forever while reading DataNode blocks.

      Debugging found that the epoll_wait timeout was set to 0, so epoll_wait never times out.

      Cause: in HDFS 2.3.0, HBase uses DFSClient to read and write blocks. DFSClient creates a socket via newConnectedPeer(addr), but sets no read or write timeout on it.

      In 2.6.0, newConnectedPeer added a readTimeout to deal with the problem, but did not add a writeTimeout. Why was the write timeout not added?

      I think NioInetPeer needs a default socket timeout, so that applications do not need to force timeouts themselves. A sketch of the shape of the eventual fix is shown below.
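      For reference, a minimal sketch of the shape of the committed change: set a write timeout alongside the existing read timeout when the peer is created. This is simplified and assumes 2.7-era signatures; the SASL negotiation and error handling in the real DFSClient#newConnectedPeer are omitted.

          @Override // RemotePeerFactory
          public Peer newConnectedPeer(InetSocketAddress addr,
              Token<BlockTokenIdentifier> blockToken, DatanodeID datanodeId)
              throws IOException {
            Socket sock = socketFactory.createSocket();
            NetUtils.connect(sock, addr, getRandomLocalInterfaceAddr(),
                dfsClientConf.socketTimeout);
            Peer peer = TcpPeerServer.peerFromSocketAndKey(saslClient, sock,
                this, blockToken, datanodeId);
            peer.setReadTimeout(dfsClientConf.socketTimeout);
            // The fix: also bound writes (e.g. the Sender#readBlock request),
            // so a hung DataNode cannot block the client forever.
            peer.setWriteTimeout(dfsClientConf.socketTimeout);
            return peer;
          }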

    Attachments

      1. HDFS-7608.0.patch
        4 kB
        Xiaoyu Yao
      2. HDFS-7608.1.patch
        6 kB
        Xiaoyu Yao
      3. HDFS-7608.2.patch
        4 kB
        Esteban Gutierrez


          Activity

          busbey Sean Busbey added a comment -

          Can we get this backported to 2.6 and 2.7, please?

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2203 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2203/)
          HDFS-7608: hdfs dfsclient newConnectedPeer has no write timeout (Xiaoyu Yao via Colin P. McCabe) (cmccabe: rev 1d74ccececaefffaa90c0c18b40a3645dbc819d9)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
            HDFS-7608: add CHANGES.txt (cmccabe: rev b7fb6ec4513de7d342c541eb3d9e14642286e2cf)
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #255 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/255/)
          HDFS-7608: hdfs dfsclient newConnectedPeer has no write timeout (Xiaoyu Yao via Colin P. McCabe) (cmccabe: rev 1d74ccececaefffaa90c0c18b40a3645dbc819d9)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
            HDFS-7608: add CHANGES.txt (cmccabe: rev b7fb6ec4513de7d342c541eb3d9e14642286e2cf)
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #245 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/245/)
          HDFS-7608: hdfs dfsclient newConnectedPeer has no write timeout (Xiaoyu Yao via Colin P. McCabe) (cmccabe: rev 1d74ccececaefffaa90c0c18b40a3645dbc819d9)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
            HDFS-7608: add CHANGES.txt (cmccabe: rev b7fb6ec4513de7d342c541eb3d9e14642286e2cf)
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2184 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2184/)
          HDFS-7608: hdfs dfsclient newConnectedPeer has no write timeout (Xiaoyu Yao via Colin P. McCabe) (cmccabe: rev 1d74ccececaefffaa90c0c18b40a3645dbc819d9)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
            HDFS-7608: add CHANGES.txt (cmccabe: rev b7fb6ec4513de7d342c541eb3d9e14642286e2cf)
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #987 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/987/)
          HDFS-7608: hdfs dfsclient newConnectedPeer has no write timeout (Xiaoyu Yao via Colin P. McCabe) (cmccabe: rev 1d74ccececaefffaa90c0c18b40a3645dbc819d9)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
            HDFS-7608: add CHANGES.txt (cmccabe: rev b7fb6ec4513de7d342c541eb3d9e14642286e2cf)
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #257 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/257/)
          HDFS-7608: hdfs dfsclient newConnectedPeer has no write timeout (Xiaoyu Yao via Colin P. McCabe) (cmccabe: rev 1d74ccececaefffaa90c0c18b40a3645dbc819d9)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
            HDFS-7608: add CHANGES.txt (cmccabe: rev b7fb6ec4513de7d342c541eb3d9e14642286e2cf)
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8162 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8162/)
          HDFS-7608: hdfs dfsclient newConnectedPeer has no write timeout (Xiaoyu Yao via Colin P. McCabe) (cmccabe: rev 1d74ccececaefffaa90c0c18b40a3645dbc819d9)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
            HDFS-7608: add CHANGES.txt (cmccabe: rev b7fb6ec4513de7d342c541eb3d9e14642286e2cf)
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          cmccabe Colin P. McCabe added a comment -

          Committed to 2.8, thanks all.

          cnauroth Chris Nauroth added a comment -

          Ah, of course! This is not "block write timeout". This is "socket write timeout during block read operations". Thanks for the detailed follow-up.

          +1 for the patch. Thank you, Xiaoyu, Esteban and Colin.

          cmccabe Colin P. McCabe added a comment -

          Esteban Gutierrez, Chris Nauroth: I took another look at the patch and I think that it's correct as-is.

          The confusing part about all of this is that DFSInputStream uses the Peer classes to encapsulate sockets, whereas DFSOutputStream simply uses "raw" socket objects. The word "Peer" does not occur in DFSOutputStream.java or DataStreamer.java; it is used only in the input stream code. Therefore, since this patch only changes newConnectedPeer, it only affects DFSInputStream.

          In contrast, the issue Chris brought up is actually only a DFSOutputStream issue. Chris commented:

          Right now, write timeout is enforced not as a socket option, but instead enforced per operation by passing the timeout to SocketOutputStream, which uses it in the underlying NIO selector calls. The exact write timeout value is not purely based on configuration. It's also a function of the number of nodes in the write pipeline. The details are implemented in DFSClient#getDatanodeWriteTimeout

          But DFSClient#getDatanodeWriteTimeout is less generic than it may seem. It actually is only used by the DFSOutputStream (write path) code to calculate how long to set the timeout based on the number of nodes in the pipeline. It is never called by the input stream code at all. So I believe that this concern is not valid.

          Chris asked:

          I see the original problem reported in HDFS-7005 was related to lack of read timeout. I'm wondering if there is actually no further change required for write timeout, given the above explanation. Is anyone seeing an actual problem related to lack of write timeout?

          Good question. Here is the scenario where we can run into problems in the DFSInputStream code:

          DFSInputStream#getBlockReader uses BlockReaderFactory#build to create a new RemoteBlockReader2. Ultimately BlockReaderFactory#nextTcpPeer calls into DFSClient#newConnectedPeer. As you can see from this patch, this function sets a read timeout, but no write timeout.

          I should also make an additional comment here: the Peer class we will be creating is NioInetPeer. You can see that NioInetPeer.java contains a org.apache.hadoop.net.SocketOutputStream inside it. This is the same class which DFSOutputStream is using to "simulate" socket write timeouts by using non-blocking I/O and selectors. But whereas DFSOutputStream uses org.apache.hadoop.net.SocketOutputStream directly, DFSInputStream uses it "under the covers" by using the Peer class. So the timeout mechanism is actually the same (Chris, I think this answers another question you raised.)
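          A minimal sketch of that shared mechanism as the write path uses it, assuming the NetUtils helper described above (the method and parameter names here are illustrative; writeTimeoutMs would come from DFSClient#getDatanodeWriteTimeout(numNodes)):

              import java.io.IOException;
              import java.io.OutputStream;
              import java.net.Socket;
              import org.apache.hadoop.net.NetUtils;

              class WriteTimeoutSketch {
                // Sketch: how DataStreamer-style code obtains a write-timed stream.
                static OutputStream timedOutputStream(Socket sock, long writeTimeoutMs)
                    throws IOException {
                  // When the socket has an associated channel, this returns an
                  // org.apache.hadoop.net.SocketOutputStream whose write() calls
                  // are bounded by a selector-based timeout rather than by a
                  // socket option.
                  return NetUtils.getOutputStream(sock, writeTimeoutMs);
                }
              }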

          I always found it very confusing that org.apache.hadoop.net.SocketOutputStream is named the same as java.net.SocketOutputStream. Just for anyone new to Hadoop reading this-- they are very different classes!

          So RemoteBlockReader2#newBlockReader has this code:

           
              // in and out will be closed when sock is closed (by the caller)
              final DataOutputStream out = new DataOutputStream(
                  new BufferedOutputStream(peer.getOutputStream()));
              // Without a write timeout on the peer, this request write can
              // block forever if the DataNode stops reading from the socket.
              new Sender(out).readBlock(block, blockToken, clientName,
                  startOffset, len, verifyChecksum, cachingStrategy);

              //
              // Get bytes in block
              //
              DataInputStream in = new DataInputStream(peer.getInputStream());

              BlockOpResponseProto status = BlockOpResponseProto.parseFrom(
                  PBHelper.vintPrefixed(in));
              checkSuccess(status, peer, block, file);
              ReadOpChecksumInfoProto checksumInfo =
                  status.getReadOpChecksumInfo();
          

          You can see that there is a potential for things to get stuck here since we have not set a write timeout. That's the case this patch fixes. We're seeing this in production now so it's definitely more than a theoretical problem.

          I am +1 on the patch. Chris Nauroth, can you take another look and verify that this answers your questions? We will hold off on committing until that point.

          esteban Esteban Gutierrez added a comment -

          Chris Nauroth, I tried the approach using DFSClient.getDatanodeWriteTimeout(), and there doesn't seem to be a clear way for users to reason about the resulting timeout: e.g., if you pass a timeout of 1000 ms, the final timeout becomes 11000 ms (the 1000 ms base plus the 5000 ms per-node extension for two pipeline nodes), which is not straightforward. I think passing the socketTimeout is much more straightforward. More suggestions to address this are welcome, Chris Nauroth.

          esteban Esteban Gutierrez added a comment -

          Thanks, Chris Nauroth. I think just changing newConnectedPeer() to use the timeout provided by DFSClient.getDatanodeWriteTimeout() is good enough. In the DataStreamer we already use it, with the number of nodes set to 2. That should be fine, and it at least gives the flexibility to tune the timeout down if required. Also, I see that without a write timeout we can run into the issue mentioned in this JIRA, and after adding the timeout in the peer I no longer experience it. I've noticed other issues, such as in Client(), where we set up the connection and then the timeout, but that can be addressed in another JIRA.

          xyao Xiaoyu Yao added a comment -

          I agree with Chris on this one. We need to address the comments from Colin P. McCabe and Chris Nauroth above before committing it. I have also removed the Bugbash label to avoid confusion.

          cnauroth Chris Nauroth added a comment -

          Esteban Gutierrez, thank you for rebasing, but this patch cannot be committed. As I described here, it would potentially change the existing behavior much more than intended.

          https://issues.apache.org/jira/browse/HDFS-7608?focusedCommentId=14314868&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14314868

          I'm clicking Cancel Patch to make it clearer that this patch isn't ready.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          0 pre-patch 14m 33s Pre-patch trunk compilation is healthy.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 tests included 0m 0s The patch appears to include 1 new or modified test files.
          +1 javac 7m 30s There were no new javac warning messages.
          +1 javadoc 9m 38s There were no new javadoc warning messages.
          +1 release audit 0m 22s The applied patch does not increase the total number of release audit warnings.
          -1 checkstyle 2m 16s The applied patch generated 1 new checkstyle issues (total was 118, now 118).
          +1 whitespace 0m 0s The patch has no lines that end in whitespace.
          +1 install 1m 38s mvn install still works.
          +1 eclipse:eclipse 0m 33s The patch built with eclipse:eclipse.
          +1 findbugs 3m 4s The patch does not introduce any new Findbugs (version 2.0.3) warnings.
          +1 native 3m 14s Pre-build of native portion
          -1 hdfs tests 165m 5s Tests failed in hadoop-hdfs.
              207m 56s  



          Reason Tests
          Failed unit tests hadoop.hdfs.tools.TestHdfsConfigFields
            hadoop.hdfs.TestDistributedFileSystem
            hadoop.tracing.TestTraceAdmin
            hadoop.hdfs.server.namenode.TestFileTruncate



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12732086/HDFS-7608.2.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / d6f6741
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/10923/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/10923/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/10923/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10923/console

          This message was automatically generated.

          esteban Esteban Gutierrez added a comment -

          Attached a new patch rebased to trunk. cc: Xiaoyu Yao

          esteban Esteban Gutierrez added a comment -

          +1 to revamping the timeout naming in DFS-land; in the meantime, I think this can be committed to trunk first, with another JIRA to deprecate the old timeout properties.

          hadoopqa Hadoop QA added a comment -



          -1 overall



          Vote Subsystem Runtime Comment
          -1 patch 0m 0s The patch command could not apply the patch during dryrun.



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12697602/HDFS-7608.1.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 08f0ae4
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/10888/console

          This message was automatically generated.

          cnauroth Chris Nauroth added a comment -

          Colin P. McCabe, thanks for double-checking me on this. I'm also now starting to wonder if HDFS-7005 had unintended side effects. By setting the read timeout as a socket option in DFSClient#newConnectedPeer, the setting would also have applied to DFSOutputStream, and thus circumvented the extension time that DFSClient#getDatanodeReadTimeout wants to apply.

          Colin wrote: "For example, in RemoteBlockReader2#newBlockReader, we are writing stuff to the socket, all before ever calling DFSClient#getDataNodeWriteTimeout."

          Yes, you're right. I suppose a complete implementation, with retention of the "timeout extension" behavior, is going to require pushing the NetUtils socket-wrapping calls up to those earlier layers, or setting the socket option at a point where we know the number of nodes in the pipeline and can therefore calculate the extension. I expect either of those would be a much more invasive change than the posted patch.

          Colin also wrote: "I'm not even sure most HDFS developers could answer which one(s) this key does, if quizzed."

          I sure couldn't without reading the code fresh again today.

          +1 for your proposal for new, clearer configuration properties.

          cmccabe Colin P. McCabe added a comment -

          Chris, I think you're absolutely right. I vaguely remembered that there was an alternate method of setting write timeouts we used in places, but I was unable to find it in a few minutes of digging. The fact that it's passed as a parameter to NetUtils#getOutputStream explains why looking for setWriteTimeout and similar didn't turn up anything.

          However, I still think this is broken, because we do some writes to the socket prior to calling DFSClient#getDataNodeWriteTimeout. For example, in RemoteBlockReader2#newBlockReader, we write to the socket before ever calling DFSClient#getDataNodeWriteTimeout.

          On a semi-related note, I think that the current configuration situation is highly confusing and unsatisfactory. We have a configuration key called simply dfs.client.socket-timeout, which doesn't specify whether it applies to reads or writes. I'm not even sure most HDFS developers could answer which one(s) this key does, if quizzed. Meanwhile, the units are unspecified (is it seconds? ms?) and the default value doesn't appear in DFSConfigKeys.java, unlike almost every other configuration key.

          How about having dfs.client.datanode.socket.read.timeout.ms as an alias for dfs.client.socket-timeout,
          dfs.client.datanode.socket.write.timeout.ms for a base write timeout, and dfs.client.datanode.socket.write.timeout.extra.per.pipeline.node.ms to be an extra amount that we add for each DN in the pipeline?
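          To make the proposal concrete, a hypothetical example of setting these keys. The key names come from the suggestion above and are not real configuration properties in any release:

              import org.apache.hadoop.conf.Configuration;

              public class ProposedTimeoutKeysExample {
                public static void main(String[] args) {
                  Configuration conf = new Configuration();
                  // Hypothetical key names taken from the proposal above.
                  conf.setInt("dfs.client.datanode.socket.read.timeout.ms", 60000);
                  conf.setInt("dfs.client.datanode.socket.write.timeout.ms", 60000);
                  conf.setInt(
                      "dfs.client.datanode.socket.write.timeout.extra.per.pipeline.node.ms",
                      5000);
                  // Effective write timeout for a 3-node pipeline would then be
                  // 60000 + 3 * 5000 = 75000 ms.
                }
              }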

          cnauroth Chris Nauroth added a comment -

          Please let me know if I'm missing something, but it appears this patch would significantly alter the pre-existing write timeout behavior of the HDFS client.

          Right now, write timeout is enforced not as a socket option, but instead enforced per operation by passing the timeout to SocketOutputStream, which uses it in the underlying NIO selector calls. The exact write timeout value is not purely based on configuration. It's also a function of the number of nodes in the write pipeline. The details are implemented in DFSClient#getDatanodeWriteTimeout. Under default configuration, this method would extend the configured timeout of 60 seconds to 75 seconds (additional 5 seconds per replica in the pipeline). Extending the timeout proportional to the pipeline size is meant to make the client robust against the cumulative latency effects of every write in the pipeline.
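          A sketch of that calculation as described (a simplified rendering of DFSClient#getDatanodeWriteTimeout; "writeTimeoutBase" stands in for the configured base value, and WRITE_TIMEOUT_EXTENSION is the 5-second per-node increment):

              // Extension logic described above: base timeout plus a fixed
              // increment per node in the write pipeline.
              int getDatanodeWriteTimeout(int numNodes) {
                return (writeTimeoutBase > 0)
                    ? writeTimeoutBase
                        + HdfsServerConstants.WRITE_TIMEOUT_EXTENSION * numNodes
                    : 0;  // 0 disables the timeout
              }
              // Default config, 3-replica pipeline: 60000 + 5000 * 3 = 75000 ms.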

          This patch would set a 60 second write timeout (under default configuration) directly as a socket option. I believe that effectively negates the extension time of up to 75 seconds that DFSClient#getDatanodeWriteTimeout was trying to allow.

          I see the original problem reported in HDFS-7005 was related to lack of read timeout. I'm wondering if there is actually no further change required for write timeout, given the above explanation. Is anyone seeing an actual problem related to lack of write timeout?

          xyao Xiaoyu Yao added a comment -

          Thanks, Colin P. McCabe, for the review! I've updated the patch based on the feedback.

          cmccabe Colin P. McCabe added a comment -

              socketTimeout = conf.getInt(DFS_CLIENT_SOCKET_TIMEOUT_KEY,
                  HdfsServerConstants.READ_TIMEOUT);

          Can we change this to use DFSConfigKeys#DFS_CLIENT_SOCKET_TIMEOUT_DEFAULT like the other configuration keys do, rather than HdfsServerConstants#READ_TIMEOUT?

          Looks good other than that.

          xyao Xiaoyu Yao added a comment -

          Posted a patch adding a write timeout to DFSClient newConnectedPeer.

          cnauroth Chris Nauroth added a comment -

          I updated the title to make it clear that write timeout is still missing. HDFS-7005 already added read timeout.

          zsl2007 zhangshilong added a comment -

          There is still no write timeout. Why not add a writeTimeout?

          kihwal Kihwal Lee added a comment -

          Is it a dupe of HDFS-7005?


            People

            • Assignee:
              xyao Xiaoyu Yao
              Reporter:
              zsl2007 zhangshilong
            • Votes:
              0 Vote for this issue
              Watchers:
              16 Start watching this issue

              Dates

              • Due:
                Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 24h
                24h
                Remaining:
                Remaining Estimate - 24h
                24h
                Logged:
                Time Spent - Not Specified
                Not Specified
