  1. Hadoop HDFS
  2. HDFS-11252

TestFileTruncate#testTruncateWithDataNodesRestartImmediately can fail with BindException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      testTruncateWithDataNodesRestartImmediately can fail with a BindException. The setup for TestFileTruncate has been fixed in the past to solve a bind exception, but this is occurring after the minicluster comes up and the datanodes are being restarted. Maybe there's a race condition there?

      1. HDFS-11252.001.patch
        2 kB
        Yiqun Lin
      2. HDFS-11252.002.patch
        2 kB
        Yiqun Lin

        Activity

        jlowe Jason Lowe added a comment -

        Stacktrace:

        java.net.BindException: Problem binding to [localhost:33571] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
        	at sun.nio.ch.Net.bind0(Native Method)
        	at sun.nio.ch.Net.bind(Net.java:433)
        	at sun.nio.ch.Net.bind(Net.java:425)
        	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        	at org.apache.hadoop.ipc.Server.bind(Server.java:543)
        	at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1033)
        	at org.apache.hadoop.ipc.Server.<init>(Server.java:2785)
        	at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:960)
        	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:420)
        	at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:341)
        	at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:802)
        	at org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:953)
        	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1364)
        	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:492)
        	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2661)
        	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2564)
        	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2611)
        	at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2305)
        	at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2355)
        	at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesRestartImmediately(TestFileTruncate.java:804)
        
        linyiqun Yiqun Lin added a comment -

        Thanks for reporting this, Jason Lowe. Some comments from me.

        Maybe there's a race condition there?

        We had better not hard-code or reuse the same port number in unit tests, since a Jenkins slave can run multiple jobs at the same time, and this causes bind conflicts. So if we restart the datanode while keeping its port, there is a chance of a BindException. See HDFS-10730 for a similar issue.
        BTW, I don't think we need to keep the port here. I will attach a simple patch soon to fix this. Thanks.
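
The bind conflict described above can be reproduced with plain JDK sockets. The following is a minimal sketch, not Hadoop code: the class and method names are made up for illustration, and it assumes a platform where a second listener on an occupied loopback port is rejected. It contrasts rebinding a port that is still held with asking the OS for a free port by binding to port 0.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical demo class; it does not use any Hadoop API.
final class PortRebindDemo {
  /** Binds a listening socket to the given loopback port; 0 asks the OS for a free one. */
  static ServerSocket listen(int port) throws IOException {
    ServerSocket socket = new ServerSocket();
    try {
      socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
      return socket;
    } catch (IOException e) {
      socket.close();
      throw e;
    }
  }

  /** True if a second bind to a port that is still held fails with BindException. */
  static boolean rebindOfTakenPortFails() {
    try (ServerSocket holder = listen(0)) {
      try (ServerSocket second = listen(holder.getLocalPort())) {
        return false;
      } catch (BindException expected) {
        return true;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** True if binding to port 0 succeeds and yields a concrete port number. */
  static boolean ephemeralBindSucceeds() {
    try (ServerSocket socket = listen(0)) {
      return socket.getLocalPort() > 0;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("rebind of taken port fails: " + rebindOfTakenPortFails());
    System.out.println("bind to port 0 succeeds: " + ephemeralBindSucceeds());
  }
}
```

In the failing test the port holder would be some other process on the build host rather than a socket in the same JVM, but the failure mode is the same: whoever asks for a specific port can lose, while a bind to port 0 cannot collide.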

        linyiqun Yiqun Lin added a comment -

        Attached a simple patch with a fix. I have tested the patch locally: the restart doesn't need to keep the port, but should still keep expireOnNN as true.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime   Comment
        0     reexec      0m 18s    Docker mode activated.
        +1    @author     0m 0s     The patch does not contain any @author tags.
        +1    test4tests  0m 0s     The patch appears to include 1 new or modified test files.
        +1    mvninstall  14m 29s   trunk passed
        +1    compile     0m 49s    trunk passed
        +1    checkstyle  0m 28s    trunk passed
        +1    mvnsite     0m 59s    trunk passed
        +1    mvneclipse  0m 14s    trunk passed
        +1    findbugs    1m 54s    trunk passed
        +1    javadoc     0m 41s    trunk passed
        +1    mvninstall  0m 52s    the patch passed
        +1    compile     0m 51s    the patch passed
        +1    javac       0m 51s    the patch passed
        +1    checkstyle  0m 26s    the patch passed
        +1    mvnsite     0m 58s    the patch passed
        +1    mvneclipse  0m 11s    the patch passed
        +1    whitespace  0m 0s     The patch has no whitespace issues.
        +1    findbugs    2m 0s     the patch passed
        +1    javadoc     0m 39s    the patch passed
        -1    unit        115m 54s  hadoop-hdfs in the patch failed.
        +1    asflicense  0m 27s    The patch does not generate ASF License warnings.
                          143m 31s



        Reason Tests
        Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue HDFS-11252
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12843523/HDFS-11252.001.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 3f938b5bb00d 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / cee0c46
        Default Java 1.8.0_111
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/17873/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17873/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17873/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        linyiqun Yiqun Lin added a comment -

        Hi Brahma Reddy Battula, would you have a quick look at this? TestFileTruncate still fails sometimes with a BindException, even after HDFS-9224, where you fixed a BindException in this test before. Thanks.

        brahmareddy Brahma Reddy Battula added a comment -

        Yiqun Lin, thanks for working on this. As we discussed in HDFS-11134, we don't need to keep the same port while restarting the DN.

        Patch LGTM apart from the following, if you agree: can you extract the duplicated code into one method, like this?

        +  private void truncateAndRestartDN(Path p, int dn, int newLength)
        +      throws IOException {
        +    try {
        +      boolean isReady = fs.truncate(p, newLength);
        +      assertFalse(isReady);
        +    } finally {
        +      cluster.restartDataNode(dn, false, true);
        +      cluster.waitActive();
        +    }
        +  }
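
The point of the try/finally in the suggested helper is that the DataNode is restarted (on a new port) even when the assertion fails. The following is a minimal sketch of that control flow with a stand-in interface instead of the real MiniDFSCluster. The `Cluster` interface, the class name and the recording logic are hypothetical; the parameter names `keepPort` and `expireOnNN` follow the wording used in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; Cluster is a stand-in, not the Hadoop test cluster.
final class RestartInFinallyDemo {
  /** Stand-in for the two cluster calls used by the suggested helper. */
  interface Cluster {
    void restartDataNode(int dn, boolean keepPort, boolean expireOnNN);
    void waitActive();
  }

  /** Runs the check, then always restarts the DataNode without keeping its port. */
  static void checkAndRestart(Cluster cluster, int dn, Runnable check) {
    try {
      check.run();
    } finally {
      cluster.restartDataNode(dn, false, true);
      cluster.waitActive();
    }
  }

  /** Records the calls made when the check itself throws. */
  static List<String> callsWhenCheckFails() {
    List<String> calls = new ArrayList<>();
    Cluster recording = new Cluster() {
      @Override
      public void restartDataNode(int dn, boolean keepPort, boolean expireOnNN) {
        calls.add("restart dn=" + dn + " keepPort=" + keepPort + " expireOnNN=" + expireOnNN);
      }

      @Override
      public void waitActive() {
        calls.add("waitActive");
      }
    };
    try {
      checkAndRestart(recording, 0, () -> {
        throw new AssertionError("truncate reported ready");
      });
    } catch (AssertionError expected) {
      calls.add("assertion propagated");
    }
    return calls;
  }

  public static void main(String[] args) {
    callsWhenCheckFails().forEach(System.out::println);
  }
}
```

The recorded order shows the restart and the wait happening before the assertion failure reaches the caller, so a failed check does not leave the cluster with a stopped DataNode for the rest of the test.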
        
        linyiqun Yiqun Lin added a comment -

        Thanks, Brahma Reddy Battula, for the review and comments. The comment makes sense to me. New patch attached.

        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote  Subsystem   Runtime   Comment
        0     reexec      0m 12s    Docker mode activated.
        +1    @author     0m 0s     The patch does not contain any @author tags.
        +1    test4tests  0m 0s     The patch appears to include 1 new or modified test files.
        +1    mvninstall  13m 12s   trunk passed
        +1    compile     0m 46s    trunk passed
        +1    checkstyle  0m 26s    trunk passed
        +1    mvnsite     0m 52s    trunk passed
        +1    mvneclipse  0m 13s    trunk passed
        +1    findbugs    1m 44s    trunk passed
        +1    javadoc     0m 40s    trunk passed
        +1    mvninstall  0m 47s    the patch passed
        +1    compile     0m 43s    the patch passed
        +1    javac       0m 43s    the patch passed
        +1    checkstyle  0m 24s    the patch passed
        +1    mvnsite     0m 48s    the patch passed
        +1    mvneclipse  0m 10s    the patch passed
        +1    whitespace  0m 0s     The patch has no whitespace issues.
        +1    findbugs    1m 49s    the patch passed
        +1    javadoc     0m 37s    the patch passed
        +1    unit        76m 8s    hadoop-hdfs in the patch passed.
        +1    asflicense  0m 20s    The patch does not generate ASF License warnings.
                          101m 5s



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:a9ad5d6
        JIRA Issue HDFS-11252
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12844544/HDFS-11252.002.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux 516164ae2c79 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 4e90296
        Default Java 1.8.0_111
        findbugs v3.0.0
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17947/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17947/console
        Powered by Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        brahmareddy Brahma Reddy Battula added a comment -

        Yiqun Lin, thanks for updating the patch. LGTM. Will commit this weekend unless there are further comments.

        brahmareddy Brahma Reddy Battula added a comment -

        Committed to trunk, branch-2 and branch-2.8. Yiqun Lin, thanks for your contribution.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11046 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11046/)
        HDFS-11252. TestFileTruncate#testTruncateWithDataNodesRestartImmediately (brahma: rev 0ddb8defad6a7fd5eb69847d1789ba51952c0cf0)

        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileTruncate.java

          People

          • Assignee:
            linyiqun Yiqun Lin
            Reporter:
            jlowe Jason Lowe