Hadoop HDFS / HDFS-5806

Balancer should set soTimeout to avoid indefinite hangs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 2.2.0
    • Fix Version/s: 0.23.11, 2.3.0
    • Component/s: balancer & mover
    • Labels:
      None

      Description

      Simple patch to keep the balancer from hanging indefinitely when a datanode stops responding to requests.

      1. HDFS-5806.patch
        0.8 kB
        Nathan Roberts
      2. HDFS-5806-0.23.patch
        0.8 kB
        Nathan Roberts

          Activity

          Nathan Roberts added a comment -

          use setSoTimeout() to avoid read hangs.
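
          To illustrate the idea (this is not the attached patch): the sketch below uses only java.net. The class name SoTimeoutSketch and the method readTimesOut are hypothetical. A local listener completes the connection but never writes, standing in for an unresponsive datanode, and the client sets SO_TIMEOUT before reading.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; it is not part of Hadoop or of the HDFS-5806 patch.
class SoTimeoutSketch {

    /**
     * Connects to a local listener that never sends data, then attempts a
     * blocking read with the given SO_TIMEOUT in milliseconds.
     * Returns true if the read gave up with SocketTimeoutException.
     */
    static boolean readTimesOut(int soTimeoutMillis) {
        // The listener is never accept()ed; the OS backlog still completes the
        // TCP handshake, so the client sees a connected but silent peer.
        try (ServerSocket silentServer =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(
                silentServer.getInetAddress(), silentServer.getLocalPort()), 1000);

            // Without this call the timeout is 0, meaning read() blocks forever.
            sock.setSoTimeout(soTimeoutMillis);

            try {
                sock.getInputStream().read();
                return false; // data or EOF arrived; not expected here
            } catch (SocketTimeoutException e) {
                return true;  // the read was abandoned after soTimeoutMillis
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(500));
    }
}
```

          The timeout value is an assumption chosen to keep the demo fast; a real balancer would presumably need a much longer one, since a legitimate block move can take a while.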

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12624203/HDFS-5806.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5928//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5928//console

          This message is automatically generated.

          Andrew Wang added a comment -

          LGTM, +1. Nathan, I assume you tested this manually?

          Nathan Roberts added a comment -

          Andrew, thanks for taking a look. Sorry about not mentioning the testing.

          I didn't have great ideas on how to test this. Basically, I did the following:

          • Changed the balancer so that soTimeout was 1 second
          • Changed the balancer so that the sleep time between iterations was 2 seconds
          • Changed dispatch() within the balancer to randomly skip sending the request, which causes the response read to time out because of soTimeout
          • Made sure TestBalancer still worked
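
          The fault-injection step above can be approximated outside Hadoop. The following is only a sketch of the approach, not the modified Balancer code: DroppedRequestSketch, dispatch, startEchoServer, and runScenario are hypothetical names, the "datanode" is a one-byte echo server, and a boolean flag replaces the random choice so the outcome is repeatable.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical stand-in for the manual test; none of these names come from Balancer.java.
class DroppedRequestSketch {

    /** Sends one request byte unless dropRequest is set, then waits for a one-byte reply. */
    static String dispatch(int port, boolean dropRequest, int soTimeoutMillis)
            throws IOException {
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 1000);
            sock.setSoTimeout(soTimeoutMillis);
            if (!dropRequest) {
                sock.getOutputStream().write(1);
            }
            try {
                int reply = sock.getInputStream().read();
                return reply == -1 ? "closed" : "ok";
            } catch (SocketTimeoutException e) {
                // No request was sent, so no reply ever comes back.
                return "timeout";
            }
        }
    }

    /** Starts a daemon thread that answers each received request byte with one reply byte. */
    static ServerSocket startEchoServer() throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread worker = new Thread(() -> {
            while (!server.isClosed()) {
                try (Socket peer = server.accept()) {
                    int request = peer.getInputStream().read(); // -1 if the client hung up
                    if (request != -1) {
                        peer.getOutputStream().write(request);
                    }
                } catch (IOException e) {
                    // Listener closed or peer went away; the loop condition handles exit.
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
        return server;
    }

    /** Runs one dispatch against a fresh echo server and reports the outcome. */
    static String runScenario(boolean dropRequest, int soTimeoutMillis) {
        try (ServerSocket server = startEchoServer()) {
            return dispatch(server.getLocalPort(), dropRequest, soTimeoutMillis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("request sent:    " + runScenario(false, 1000));
        System.out.println("request dropped: " + runScenario(true, 1000));
    }
}
```

          With the request sent, the reply arrives well inside the timeout; with the request dropped, the read should give up after roughly the configured timeout rather than hanging.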
          Andrew Wang added a comment -

          Sounds good to me, thanks Nathan. I'll commit shortly.

          Andrew Wang added a comment -

          Committed to trunk and branch-2, thanks again Nathan.

          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #5035 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5035/)
          HDFS-5806. Balancer should set soTimeout to avoid indefinite hangs. Contributed by Nathan Roberts. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1560548)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #461 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/461/)
          HDFS-5806. Balancer should set soTimeout to avoid indefinite hangs. Contributed by Nathan Roberts. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1560548)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1678 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1678/)
          HDFS-5806. Balancer should set soTimeout to avoid indefinite hangs. Contributed by Nathan Roberts. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1560548)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #1653 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1653/)
          HDFS-5806. Balancer should set soTimeout to avoid indefinite hangs. Contributed by Nathan Roberts. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1560548)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          Nathan Roberts added a comment -

          0.23 version of patch

          Jason Lowe added a comment -

          +1 for branch-0.23 patch, committing this.

          Jason Lowe added a comment -

          Thanks, Nathan! I committed this to branch-0.23.


            People

            • Assignee:
              Nathan Roberts
            • Reporter:
              Nathan Roberts
            • Votes:
              0
            • Watchers:
              6
