  Hadoop HDFS / HDFS-9178

Slow datanode I/O can cause a wrong node to be marked bad

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.2, 2.6.4, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When a non-leaf datanode in a pipeline is slow on, or stuck at, disk I/O, the downstream node can time out while reading packets, because even heartbeat packets are not relayed downstream.

      The packet read timeout is set in DataXceiver#run():

        peer.setReadTimeout(dnConf.socketTimeout);
      

      When the downstream node times out and closes its connection to the upstream node, the upstream node's PacketResponder gets an EOFException and sends an ack upstream with the downstream node's status set to ERROR. This causes the client to exclude the downstream node, even though the upstream node was the one that got stuck.
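The sequence above can be reproduced in miniature with plain java.net sockets. This is a sketch, not HDFS code: the class StalledUpstreamDemo is hypothetical, a single Socket.setSoTimeout call stands in for peer.setReadTimeout(dnConf.socketTimeout), and an upstream that simply sends nothing stands in for a node stuck on disk I/O. The downstream read times out, the downstream closes the connection, and the upstream's next read of the ack stream fails with EOFException, which is the point at which the upstream PacketResponder marks the downstream node as ERROR.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo with plain java.net sockets; not HDFS code. */
public class StalledUpstreamDemo {

  /** Returns {what the downstream observed, what the upstream observed}. */
  public static String[] run(int downstreamReadTimeoutMs) {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    try (ServerSocket server = new ServerSocket(0, 1, loopback);
         Socket upstream = new Socket(loopback, server.getLocalPort());
         Socket downstream = server.accept()) {

      // Stands in for peer.setReadTimeout(dnConf.socketTimeout) on the downstream node.
      downstream.setSoTimeout(downstreamReadTimeoutMs);

      String downstreamSaw;
      try {
        // The "stuck" upstream sends neither data nor heartbeat packets.
        int b = downstream.getInputStream().read();
        downstreamSaw = (b < 0) ? "eof" : "packet";
      } catch (SocketTimeoutException e) {
        downstreamSaw = "read timeout";
      }
      // The downstream gives up and closes its connection to the upstream.
      downstream.close();

      String upstreamSaw;
      try {
        // The upstream's responder now tries to read an ack from the downstream.
        new DataInputStream(upstream.getInputStream()).readLong();
        upstreamSaw = "ack";
      } catch (EOFException e) {
        upstreamSaw = "EOFException";
      }
      return new String[] {downstreamSaw, upstreamSaw};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String[] seen = run(200);
    System.out.println("downstream: " + seen[0] + ", upstream: " + seen[1]);
  }
}
```

On a normal loopback interface, run(200) should report a read timeout on the downstream side and an EOFException on the upstream side. Nothing in that exchange tells the upstream that it was the stalled party, which is why it ends up blaming its downstream node.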

      The connection to the downstream node has a longer timeout, so the downstream node always times out first. The timeout on that connection is set in writeBlock():

                int timeoutValue = dnConf.socketTimeout +
                    (HdfsConstants.READ_TIMEOUT_EXTENSION * targets.length);
                int writeTimeout = dnConf.socketWriteTimeout +
                    (HdfsConstants.WRITE_TIMEOUT_EXTENSION * targets.length);
                NetUtils.connect(mirrorSock, mirrorTarget, timeoutValue);
                OutputStream unbufMirrorOut = NetUtils.getOutputStream(mirrorSock,
                    writeTimeout);
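To make the ordering concrete, the arithmetic can be worked through for a three-node pipeline. This sketch assumes a socketTimeout of 60,000 ms and a READ_TIMEOUT_EXTENSION of 5,000 ms (commonly seen defaults, treated here as assumptions rather than read from any configuration), and the class PipelineTimeouts is hypothetical. Assuming that at the first datanode targets holds the two nodes downstream of it, its connection to the second datanode gets 60,000 + 5,000 * 2 = 70,000 ms, while the second datanode reads packets from the first with only 60,000 ms.

```java
/**
 * Hypothetical sketch of the pipeline timeout arithmetic. The constants are
 * assumed values, not read from a live HDFS configuration.
 */
public class PipelineTimeouts {
  /** Assumed dnConf.socketTimeout, in milliseconds. */
  public static final int SOCKET_TIMEOUT_MS = 60_000;
  /** Assumed HdfsConstants.READ_TIMEOUT_EXTENSION, in milliseconds. */
  public static final int READ_TIMEOUT_EXTENSION_MS = 5_000;

  /** Read timeout a datanode uses for packets from its upstream (DataXceiver#run). */
  public static int upstreamReadTimeoutMs(int socketTimeoutMs) {
    return socketTimeoutMs;
  }

  /** timeoutValue used for the mirror connection in writeBlock(). */
  public static int mirrorTimeoutMs(int socketTimeoutMs, int targetsLength) {
    return socketTimeoutMs + READ_TIMEOUT_EXTENSION_MS * targetsLength;
  }

  public static void main(String[] args) {
    // Pipeline DN1 -> DN2 -> DN3. Assumed: at DN1, targets = {DN2, DN3}.
    int dn1MirrorTimeout = mirrorTimeoutMs(SOCKET_TIMEOUT_MS, 2);
    // DN2 reads packets from DN1 with the plain socket timeout.
    int dn2ReadTimeout = upstreamReadTimeoutMs(SOCKET_TIMEOUT_MS);
    System.out.println("DN1 -> DN2 connection timeout: " + dn1MirrorTimeout + " ms");
    System.out.println("DN2 read timeout from DN1:     " + dn2ReadTimeout + " ms");
    // If DN1 stalls, DN2's shorter read timeout expires first.
    System.out.println("DN2 times out first: " + (dn2ReadTimeout < dn1MirrorTimeout));
  }
}
```

Whatever the actual values, the extension term is positive whenever a node has at least one downstream target, so the node reading from a stalled upstream always hits its timeout before the stalled node's own connection does.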
      
      Attachments
      1. 002-HDFS-9178.branch-2.6.patch (9 kB, Junping Du)
      2. HDFS-9178.branch-2.6.patch (9 kB, Kihwal Lee)
      3. HDFS-9178.patch (9 kB, Kihwal Lee)

          Activity

          Kihwal Lee added a comment:

          A simple solution is to have the datanode check when it last sent a packet whenever the downstream node closes the connection. If it has not sent a packet for a long time (e.g. 0.9 * timeout; it is supposed to send a packet at least every 0.5 * timeout), it or its upstream node might be at fault. In this case, it simply closes its connection to the upstream node, so that the same check is triggered upstream. If an upstream node finds that it has sent packets in time, it reports the downstream node as bad. If the check propagates all the way to the client, the client removes the first node and rebuilds the pipeline. Since DataStreamer does not get stuck on disk I/O (except on the rare occasion when it logs while the disk is having an issue), the cause would be either a slow first node or a communication problem between the client and the first node, so removing the first node seems reasonable.
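The proposed rule can be sketched as follows. All names here (PipelineBlame, Action, onDownstreamClosed, blamedNode) are hypothetical; this illustrates the decision described in the comment and is not the BlockReceiver code that was committed. When its downstream closes the connection, each node compares the time since it last sent a packet against 0.9 * timeout. If it has been idle that long, it closes its own upstream connection so the check repeats one hop up; otherwise it reports its downstream node as bad. If every node up to the first one is idle, the close reaches the client, which drops the first datanode.

```java
/**
 * Hypothetical sketch of the proposed blame rule; not the committed
 * BlockReceiver code. Index 0 is the first datanode after the client.
 */
public class PipelineBlame {

  public enum Action { CLOSE_UPSTREAM, REPORT_DOWNSTREAM_BAD }

  /** A node that has been idle for this fraction of the timeout suspects itself. */
  static final double SUSPECT_FRACTION = 0.9;

  /**
   * Decision a node makes when its downstream closes the connection.
   * idleMs is the time since this node last sent a packet downstream.
   */
  public static Action onDownstreamClosed(long idleMs, long timeoutMs) {
    // Packets (at least heartbeats) are due every 0.5 * timeout, so a gap
    // near the full timeout means this node or one above it stalled.
    return idleMs > SUSPECT_FRACTION * timeoutMs
        ? Action.CLOSE_UPSTREAM
        : Action.REPORT_DOWNSTREAM_BAD;
  }

  /**
   * Walks the check up the pipeline, starting at node 'start' (the node whose
   * downstream closed first). idleMs[i] is node i's idle time at that moment.
   * Returns the index of the node reported bad; 0 means the close reached the
   * client, which then removes the first datanode.
   */
  public static int blamedNode(long[] idleMs, int start, long timeoutMs) {
    for (int i = start; i >= 0; i--) {
      if (onDownstreamClosed(idleMs[i], timeoutMs) == Action.REPORT_DOWNSTREAM_BAD) {
        return i + 1; // node i sent in time, so it reports its downstream node
      }
      // Otherwise node i closes its upstream connection and the check moves up.
    }
    return 0;
  }

  public static void main(String[] args) {
    long timeoutMs = 60_000;
    // DN2 (index 1) is stuck on disk I/O; DN3 timed out and closed on DN2.
    long[] idleMs = {5_000, 58_000, 0};
    int bad = blamedNode(idleMs, 1, timeoutMs);
    System.out.println("Reported bad: DN" + (bad + 1));
  }
}
```

For example, with a 60-second timeout and the second datanode stuck (idle for 58 s while the first datanode last sent 5 s ago), blamedNode returns index 1, so the stuck node, rather than the third datanode that timed out, is the one reported.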

          Kihwal Lee added a comment:

          The patch implements the proposed fix.

          Hadoop QA added a comment:



          -1 overall



          Vote  Subsystem         Runtime   Comment
            0   pre-patch         17m 37s   Pre-patch trunk compilation is healthy.
           +1   @author            0m 1s    The patch does not contain any @author tags.
           +1   tests included     0m 0s    The patch appears to include 1 new or modified test files.
           +1   javac              7m 59s   There were no new javac warning messages.
           +1   javadoc           10m 10s   There were no new javadoc warning messages.
           -1   release audit      0m 15s   The applied patch generated 1 release audit warnings.
           -1   checkstyle         1m 24s   The applied patch generated 1 new checkstyle issues (total was 61, now 61).
           +1   whitespace         0m 0s    The patch has no lines that end in whitespace.
           +1   install            1m 33s   mvn install still works.
           +1   eclipse:eclipse    0m 34s   The patch built with eclipse:eclipse.
           +1   findbugs           2m 31s   The patch does not introduce any new Findbugs (version 3.0.0) warnings.
           +1   native             3m 11s   Pre-build of native portion
           -1   hdfs tests        64m 32s   Tests failed in hadoop-hdfs.
                Total            109m 51s



          Reason              Tests
          Failed unit tests   hadoop.hdfs.TestWriteReadStripedFile
          Timed out tests     org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyWriter
                              org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure010



          Subsystem Report/Notes
          Patch URL http://issues.apache.org/jira/secure/attachment/12764472/HDFS-9178.patch
          Optional Tests javadoc javac unit findbugs checkstyle
          git revision trunk / 6c17d31
          Release Audit https://builds.apache.org/job/PreCommit-HDFS-Build/12753/artifact/patchprocess/patchReleaseAuditProblems.txt
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/12753/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
          hadoop-hdfs test log https://builds.apache.org/job/PreCommit-HDFS-Build/12753/artifact/patchprocess/testrun_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/12753/testReport/
          Java 1.7.0_55
          uname Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/12753/console

          This message was automatically generated.

          Kihwal Lee added a comment:
          • release audit: caused by the EC (erasure coding) branch merge.
          • checkstyle: file length; the file was already over the "limit".
          • test failures: mostly new EC-related tests, which seem to pass when run locally (including TestLazyWriter).
          Daryn Sharp added a comment:

          +1 Seems to have helped fix our problems.

          Hudson added a comment:

          FAILURE: Integrated in Hadoop-trunk-Commit #8587 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8587/)
          HDFS-9178. Slow datanode I/O can cause a wrong node to be marked bad. (kihwal: rev 99e5204ff5326430558b6f6fd9da7c44654c15d7)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          Kihwal Lee added a comment:

          Thanks for the review, Daryn. I've committed this to trunk, branch-2 and branch-2.7.

          Kihwal Lee added a comment:

          Attaching a patch for branch-2.6, in case someone wants it.

          Hudson added a comment:

          FAILURE: Integrated in Hadoop-Yarn-trunk #1231 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/1231/)
          HDFS-9178. Slow datanode I/O can cause a wrong node to be marked bad. (kihwal: rev 99e5204ff5326430558b6f6fd9da7c44654c15d7)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java
          Hudson added a comment:

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #502 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/502/)
          HDFS-9178. Slow datanode I/O can cause a wrong node to be marked bad. (kihwal: rev 99e5204ff5326430558b6f6fd9da7c44654c15d7)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java
          Hudson added a comment:

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2437 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2437/)
          HDFS-9178. Slow datanode I/O can cause a wrong node to be marked bad. (kihwal: rev 99e5204ff5326430558b6f6fd9da7c44654c15d7)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hudson added a comment:

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #494 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/494/)
          HDFS-9178. Slow datanode I/O can cause a wrong node to be marked bad. (kihwal: rev 99e5204ff5326430558b6f6fd9da7c44654c15d7)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java
          Hudson added a comment:

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #467 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/467/)
          HDFS-9178. Slow datanode I/O can cause a wrong node to be marked bad. (kihwal: rev 99e5204ff5326430558b6f6fd9da7c44654c15d7)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hudson added a comment:

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2405 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2405/)
          HDFS-9178. Slow datanode I/O can cause a wrong node to be marked bad. (kihwal: rev 99e5204ff5326430558b6f6fd9da7c44654c15d7)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          Sangjin Lee added a comment:

          Does this issue exist in 2.6.x? Should this be backported to branch-2.6?

          Junping Du added a comment:

          Hi Kihwal Lee, I see you have already attached a patch for branch-2.6. Shall we commit it to branch-2.6? Thanks!

          Junping Du added a comment:

          The patch for branch-2.6 is stale. I have uploaded the 002 patch, which is synced with the latest branch-2.6. Kihwal Lee, could you help review it? Thanks!

          Kihwal Lee added a comment:

          +1 It looks like the only difference is in DataNodeFaultInjector.

          Junping Du added a comment:

          Thanks, Kihwal Lee, for reviewing the patch! I have committed it to branch-2.6.


            People

            • Assignee: Kihwal Lee
            • Reporter: Kihwal Lee
            • Votes: 0
            • Watchers: 18
