Hadoop HDFS / HDFS-6054

MiniQJMHACluster should not use static port to avoid binding failure in unit test

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: test
    • Labels:
    • Target Version/s:

      Description

      One example of the test failures: TestFailureToReadEdits

      Error Message
      
      Port in use: localhost:10003
      
      Stacktrace
      
      java.net.BindException: Port in use: localhost:10003
      	at sun.nio.ch.Net.bind(Native Method)
      	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
      	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
      	at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
      	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:845)
      	at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:786)
      	at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:132)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:593)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:492)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:650)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:635)
      	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1283)
      	at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:966)
      	at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:851)
      	at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:697)
      	at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:374)
      	at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:355)
      	at org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits.setUpCluster(TestFailureToReadEdits.java:108)
      
      
      1. HDFS-6054.001.patch
        3 kB
        Yongjun Zhang
      2. HDFS-6054.002.patch
        4 kB
        Yongjun Zhang
      3. HDFS-6054.002.patch
        4 kB
        Yongjun Zhang

        Activity

        Yongjun Zhang added a comment -

        Hi Brandon Li,

        Thanks for reporting the issue, and thanks Kihwal Lee for pointing me to this jira after we saw this failure in the HDFS-7707 test run. Hope you don't mind that I assigned it to myself.

        I took a look, found the following:

        • There is a retry mechanism in MiniQJMHACluster#MiniQJMHACluster(Builder builder) to find available ports.
        • That mechanism has a bug in how retryCount is incremented: when a BindException is thrown, retryCount is not incremented.
        • In TestFailureToReadEdits#setUpCluster, there are two branches: one creates the cluster for SHARED_DIR_HA mode, and the other creates the cluster for QJM_HA mode. The QJM_HA branch uses the existing retry mechanism; the SHARED_DIR_HA branch is where the failure reported in this jira happens, because it doesn't retry.

        I'm attaching patch rev 001 to fix the retryCount bug and to add a retry mechanism to the SHARED_DIR_HA branch.
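To illustrate the retryCount point, here is a minimal sketch of a bounded retry loop that counts every failed attempt on the exception path. It is not the code from the patch: the class name BoundedRetry, the method retryOnBind, and the maxRetries parameter are hypothetical names chosen for this example.

```java
import java.net.BindException;
import java.util.concurrent.Callable;

public class BoundedRetry {
    /**
     * Runs the attempt until it succeeds or has failed with BindException
     * more than maxRetries times. The counter is incremented inside the
     * catch block, so a failed bind always counts toward the limit.
     */
    static <T> T retryOnBind(Callable<T> attempt, int maxRetries) throws Exception {
        int retryCount = 0;
        while (true) {
            try {
                return attempt.call();
            } catch (BindException e) {
                retryCount++; // counted on the failure path, not only on success
                if (retryCount > maxRetries) {
                    throw e; // give up instead of looping forever
                }
            }
        }
    }
}
```

If the increment sat only on the non-exception path, a persistent BindException would never move the counter and the loop would have no upper bound.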

        Hi Kihwal Lee and Brandon Li, I wonder if you could help with a review?

        Thanks a lot.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12696380/HDFS-6054.001.patch
        against trunk revision 26dee14.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9426//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9426//console

        This message is automatically generated.

        Kihwal Lee added a comment -

        Before looking at the patch too hard, I tried running the test case before and after the patch, with nc -l 10001 running in another terminal window to make the port unavailable. The test didn't pass after the patch. Then I realized that the patch simply retries and waits for the port to be freed.

        2015-02-04 15:42:56,100 INFO  ha.TestFailureToReadEdits (TestFailureToReadEdits.java:setUpCluster(124))
         - SHARED_DIR_HA: MiniQJMHACluster port conflicts, retried 716 times
        

        Is there any harm in incrementing the port number in each retry? That seems to make the test pass.
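The idea of moving to the next port on each retry can be sketched as follows. This is only an illustration using plain java.net.ServerSocket, not the MiniQJMHACluster code; the class name IncrementingPortSketch and the method bindFrom are hypothetical.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

public class IncrementingPortSketch {
    /**
     * Tries startPort, startPort+1, ... and returns the first socket that
     * binds. Each retry uses a different port, so a port held by another
     * process is skipped rather than waited on.
     */
    static ServerSocket bindFrom(int startPort, int maxRetries) throws IOException {
        BindException last = null;
        for (int retryCount = 0; retryCount <= maxRetries; retryCount++) {
            try {
                return new ServerSocket(startPort + retryCount);
            } catch (BindException e) {
                last = e; // port in use: fall through and try the next one
            }
        }
        throw (last != null) ? last : new BindException("no attempts were made");
    }
}
```

With port 10001 held by nc -l 10001, a call such as bindFrom(10001, 20) would skip that port instead of retrying it until it is freed.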

        Yongjun Zhang added a comment -

        Hi Kihwal Lee,

        Thanks a lot for looking into this! I thought each attempt used a random port number, but as you pointed out, it doesn't! My bad for not examining it well enough.

        I'm uploading a revised patch shortly.

        Yongjun Zhang added a comment -

        Hi Kihwal Lee,

        I just uploaded rev 002 to make the port random. I also made both MiniQJMHACluster and SHARED_DIR_HA report the basePort. In addition, I think it's better to clean up the cluster before retrying in MiniQJMHACluster, so I added code to do so.

        This time I ran a test similar to yours: I started with 10000 as the base port while making port 10001 unavailable with nc -l 10001, and later changed to a random base port even on the first try, as in the patch.
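A rough sketch of the approach described for rev 002: pick a random base port, bind a block of four consecutive ports, and release anything already opened before retrying with a new base. The class and method names, the port range (10000 to 19993), and the use of bare ServerSocket in place of a real cluster are all assumptions made for this example, not details taken from the patch.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomBasePortSketch {
    static final int PORTS_PER_CLUSTER = 4; // basePort .. basePort+3

    /** Binds the whole block, or closes what was opened and rethrows. */
    static List<ServerSocket> bindBlock(int basePort) throws IOException {
        List<ServerSocket> opened = new ArrayList<>();
        try {
            for (int i = 0; i < PORTS_PER_CLUSTER; i++) {
                opened.add(new ServerSocket(basePort + i));
            }
            return opened;
        } catch (IOException e) {
            // Clean up before the caller retries, so no port stays held.
            for (ServerSocket s : opened) {
                try { s.close(); } catch (IOException ignored) { }
            }
            throw e;
        }
    }

    /** Retries with a fresh random base port after each BindException. */
    static List<ServerSocket> bindRandomBlock(Random rnd, int maxRetries) throws IOException {
        BindException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            int basePort = 10000 + rnd.nextInt(1000) * 10; // assumed range
            try {
                return bindBlock(basePort);
            } catch (BindException e) {
                last = e; // report or log basePort here, then pick another
            }
        }
        throw (last != null) ? last : new BindException("no attempts were made");
    }
}
```

Closing the partially opened sockets matters because a half-started block would otherwise keep its ports bound and could cause the next attempt, or another test, to fail.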

        Thanks.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12696646/HDFS-6054.002.patch
        against trunk revision 9112f09.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The test build failed in hadoop-hdfs-project/hadoop-hdfs

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9435//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9435//console

        This message is automatically generated.

        Kihwal Lee added a comment -

        The precommit successfully ran 3223 unit tests, but the stdout/stderr redirect file was deleted by another process before this build job had a chance to cat it.

        Yongjun Zhang added a comment -

        Thanks Kihwal! I uploaded the same patch again to trigger another run.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12696920/HDFS-6054.002.patch
        against trunk revision 4641196.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9449//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9449//console

        This message is automatically generated.

        Yongjun Zhang added a comment -

        Hi Kihwal Lee,

        Thanks for your help earlier. The test run is clean. Would you please help with a review? Very much appreciated!

        Yongjun Zhang added a comment -

        Hi Kihwal Lee,

        I wonder if you could help take a look at the latest patch? Thanks a lot.

        Haohui Mai added a comment -

        Why don't you just bind to port 0?

        Yongjun Zhang added a comment -

        Hi Haohui Mai, thanks for the suggestion. However, in this case we are trying to use a random base port and configure several ports (basePort, basePort+1, basePort+2, basePort+3) together for the cluster, so using port 0 doesn't seem practical here. Agree? Thanks.
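For context on the port 0 suggestion, the sketch below shows what binding to port 0 does with plain java.net.ServerSocket: the operating system picks a free port for each socket, and getLocalPort() reports which one. The class name EphemeralPortSketch and the method bindEphemeral are hypothetical. Each port is assigned independently, so nothing guarantees they form a contiguous basePort..basePort+3 block; every assigned port would have to be read back and configured individually.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

public class EphemeralPortSketch {
    /** Opens count sockets on OS-assigned ports (port 0 means "any free port"). */
    static List<ServerSocket> bindEphemeral(int count) throws IOException {
        List<ServerSocket> sockets = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                sockets.add(new ServerSocket(0));
            }
            return sockets;
        } catch (IOException e) {
            // Release anything already opened before reporting the failure.
            for (ServerSocket s : sockets) {
                try { s.close(); } catch (IOException ignored) { }
            }
            throw e;
        }
    }
}
```

A caller would read each port with getLocalPort() while the socket is still open; closing the socket first and reusing the number later reintroduces the race with other processes.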

        Yongjun Zhang added a comment -

        Hi Kihwal Lee, would you please help take a look at the latest patch? Many thanks.


          People

          • Assignee: Yongjun Zhang
          • Reporter: Brandon Li
          • Votes: 0
          • Watchers: 5
