HDFS-10755: TestDecommissioningStatus BindException Failure

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Tests in TestDecommissioningStatus call MiniDFSCluster.restartDataNode(). A restarted datanode is required to come back up on the same (initially ephemeral) port that it was using before it was shut down. This creates an inherent race condition: another process can bind to the port while the datanode is down, and when that happens the restart fails with a BindException. All of the tests in TestDecommissioningStatus depend on the cluster being up and running, so once one test breaks the cluster, the subsequent tests fail as well. Below I show the BindException failure as well as the subsequent test failure that occurred.

      java.net.BindException: Problem binding to [localhost:35370] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
      	at sun.nio.ch.Net.bind0(Native Method)
      	at sun.nio.ch.Net.bind(Net.java:436)
      	at sun.nio.ch.Net.bind(Net.java:428)
      	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
      	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
      	at org.apache.hadoop.ipc.Server.bind(Server.java:430)
      	at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:768)
      	at org.apache.hadoop.ipc.Server.<init>(Server.java:2391)
      	at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:951)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:523)
      	at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:498)
      	at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:796)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:802)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:429)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2387)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2274)
      	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2321)
      	at org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2037)
      	at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionDeadDN(TestDecommissioningStatus.java:426)
      
      java.lang.AssertionError: Number of Datanodes  expected:<2> but was:<1>
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.failNotEquals(Assert.java:743)
      	at org.junit.Assert.assertEquals(Assert.java:118)
      	at org.junit.Assert.assertEquals(Assert.java:555)
      	at org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:275)
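
The first trace is the race itself. It can be reproduced in isolation with plain sockets, as in the rough sketch below. This is not Hadoop code: the class name PortReuseRace and its method are hypothetical, and the "intruder" socket merely stands in for whatever unrelated process happens to grab the port while the datanode is down.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortReuseRace {
    /** Returns true if rebinding to the previously held port fails with a BindException. */
    static boolean rebindFailsWhenPortIsTaken() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try {
            int port;
            // Stand-in for the first datanode start: port 0 asks the OS for an ephemeral port.
            try (ServerSocket first = new ServerSocket(0, 50, loopback)) {
                port = first.getLocalPort();
            }
            // The "datanode" is now down. Stand-in for another process taking the freed port.
            try (ServerSocket intruder = new ServerSocket()) {
                intruder.bind(new InetSocketAddress(loopback, port));
                // Stand-in for the restart, which insists on the same port as before.
                try (ServerSocket restarted = new ServerSocket()) {
                    restarted.bind(new InetSocketAddress(loopback, port));
                    return false; // rebind unexpectedly succeeded
                } catch (BindException e) {
                    return true;  // "Address already in use", as in the trace above
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("rebind failed: " + rebindFailsWhenPortIsTaken());
    }
}
```

In a real test run the window between shutdown and restart is short, which is why the failure is intermittent rather than consistent.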
      

      I don't think there's any way to avoid the inherent race condition with getting the same ephemeral port, but we can definitely fix the tests so that a bind failure in one test doesn't cause the subsequent tests to fail.
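
One way to get that isolation is to give every test its own cluster instead of sharing one across the class (in JUnit 4 terms, @Before/@After rather than @BeforeClass/@AfterClass). Whether the attached patches take this exact approach is not stated here, so treat the following as an illustrative sketch only. FakeCluster and runIsolated are hypothetical stand-ins, not MiniDFSCluster or JUnit APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

public class PerTestClusterSketch {
    /** Hypothetical stand-in for MiniDFSCluster: it only tracks live datanodes. */
    static final class FakeCluster {
        int liveDataNodes = 2;

        void restartDataNode(boolean portStolen) {
            liveDataNodes--; // the node goes down
            if (portStolen) {
                throw new IllegalStateException("Address already in use");
            }
            liveDataNodes++; // the node comes back on the same port
        }

        void shutdown() {
            liveDataNodes = 0;
        }
    }

    /** Runs each test against a fresh cluster; returns test name -> passed. */
    static Map<String, Boolean> runIsolated(Map<String, Consumer<FakeCluster>> tests) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (Map.Entry<String, Consumer<FakeCluster>> test : tests.entrySet()) {
            FakeCluster cluster = new FakeCluster(); // per-test setup
            try {
                test.getValue().accept(cluster);
                results.put(test.getKey(), true);
            } catch (RuntimeException | AssertionError e) {
                results.put(test.getKey(), false);
            } finally {
                cluster.shutdown(); // per-test teardown, even after a failure
            }
        }
        return results;
    }

    static Map<String, Boolean> demo() {
        Map<String, Consumer<FakeCluster>> tests = new LinkedHashMap<>();
        // The first test hits the simulated bind failure and leaves its cluster one node short.
        tests.put("testDecommissionDeadDN", c -> c.restartDataNode(true));
        // The second test still sees two datanodes because it gets its own cluster.
        tests.put("testDecommissionStatus", c -> {
            if (c.liveDataNodes != 2) {
                throw new AssertionError("Number of Datanodes");
            }
        });
        return runIsolated(tests);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The trade-off is slower test runs, since a cluster is started and stopped for every test, in exchange for the bind failure staying confined to the one test that hit it.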

        Attachments

        1. HDFS-10755.001.patch
          3 kB
          Eric Badger
        2. HDFS-10755.002.patch
          3 kB
          Eric Badger


            People

            • Assignee: Eric Badger (ebadger)
            • Reporter: Eric Badger (ebadger)
            • Votes: 0
            • Watchers: 5
