Hadoop HDFS / HDFS-12913

TestDNFencingWithReplication.testFencingStress: fix "mini cluster not yet active" issue

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.1.0, 3.0.1
    • None

    Description

      Once in every 5000 test runs, the following failure occurs:

      2017-12-11 10:33:09 [INFO] 
      2017-12-11 10:33:09 [INFO] -------------------------------------------------------
      2017-12-11 10:33:09 [INFO]  T E S T S
      2017-12-11 10:33:09 [INFO] -------------------------------------------------------
      2017-12-11 10:33:09 [INFO] Running org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
      2017-12-11 10:37:32 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 262.641 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
      2017-12-11 10:37:32 [ERROR] testFencingStress(org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication)  Time elapsed: 262.477 s  <<< ERROR!
      2017-12-11 10:37:32 java.lang.RuntimeException: Deferred
      2017-12-11 10:37:32 	at org.apache.hadoop.test.MultithreadedTestUtil$TestContext.checkException(MultithreadedTestUtil.java:130)
      2017-12-11 10:37:32 	at org.apache.hadoop.test.MultithreadedTestUtil$TestContext.stop(MultithreadedTestUtil.java:166)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress(TestDNFencingWithReplication.java:137)
      2017-12-11 10:37:32 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      2017-12-11 10:37:32 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      2017-12-11 10:37:32 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      2017-12-11 10:37:32 	at java.lang.reflect.Method.invoke(Method.java:498)
      2017-12-11 10:37:32 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
      2017-12-11 10:37:32 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      2017-12-11 10:37:32 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
      2017-12-11 10:37:32 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      2017-12-11 10:37:32 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
      2017-12-11 10:37:32 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
      2017-12-11 10:37:32 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
      2017-12-11 10:37:32 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
      2017-12-11 10:37:32 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
      2017-12-11 10:37:32 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
      2017-12-11 10:37:32 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
      2017-12-11 10:37:32 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
      2017-12-11 10:37:32 	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
      2017-12-11 10:37:32 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
      2017-12-11 10:37:32 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
      2017-12-11 10:37:32 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
      2017-12-11 10:37:32 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
      2017-12-11 10:37:32 	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
      2017-12-11 10:37:32 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
      2017-12-11 10:37:32 	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
      2017-12-11 10:37:32 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)
      2017-12-11 10:37:32 Caused by: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1962)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1421)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1862)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:728)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:417)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      2017-12-11 10:37:32 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
      2017-12-11 10:37:32 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
      2017-12-11 10:37:32 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
      2017-12-11 10:37:32 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
      2017-12-11 10:37:32 	at java.security.AccessController.doPrivileged(Native Method)
      2017-12-11 10:37:32 	at javax.security.auth.Subject.doAs(Subject.java:422)
      2017-12-11 10:37:32 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
      2017-12-11 10:37:32 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
      2017-12-11 10:37:32 
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication$ReplicationToggler$1.get(TestDNFencingWithReplication.java:88)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication$ReplicationToggler$1.get(TestDNFencingWithReplication.java:80)
      2017-12-11 10:37:32 	at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:380)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication$ReplicationToggler.waitForReplicas(TestDNFencingWithReplication.java:80)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication$ReplicationToggler.doAnAction(TestDNFencingWithReplication.java:75)
      2017-12-11 10:37:32 	at org.apache.hadoop.test.MultithreadedTestUtil$RepeatingTestThread.doWork(MultithreadedTestUtil.java:222)
      2017-12-11 10:37:32 	at org.apache.hadoop.test.MultithreadedTestUtil$TestingThread.run(MultithreadedTestUtil.java:189)
      2017-12-11 10:37:32 Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1962)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1421)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1862)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:728)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:417)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      2017-12-11 10:37:32 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
      2017-12-11 10:37:32 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
      2017-12-11 10:37:32 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
      2017-12-11 10:37:32 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
      2017-12-11 10:37:32 	at java.security.AccessController.doPrivileged(Native Method)
      2017-12-11 10:37:32 	at javax.security.auth.Subject.doAs(Subject.java:422)
      2017-12-11 10:37:32 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
      2017-12-11 10:37:32 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
      2017-12-11 10:37:32 
      2017-12-11 10:37:32 	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
      2017-12-11 10:37:32 	at org.apache.hadoop.ipc.Client.call(Client.java:1437)
      2017-12-11 10:37:32 	at org.apache.hadoop.ipc.Client.call(Client.java:1347)
      2017-12-11 10:37:32 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
      2017-12-11 10:37:32 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
      2017-12-11 10:37:32 	at com.sun.proxy.$Proxy23.getBlockLocations(Unknown Source)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:306)
      2017-12-11 10:37:32 	at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
      2017-12-11 10:37:32 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      2017-12-11 10:37:32 	at java.lang.reflect.Method.invoke(Method.java:498)
      2017-12-11 10:37:32 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
      2017-12-11 10:37:32 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
      2017-12-11 10:37:32 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
      2017-12-11 10:37:32 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
      2017-12-11 10:37:32 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
      2017-12-11 10:37:32 	at com.sun.proxy.$Proxy27.getBlockLocations(Unknown Source)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:852)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:841)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:898)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:271)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:268)
      2017-12-11 10:37:32 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:278)
      2017-12-11 10:37:32 	at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication$ReplicationToggler$1.get(TestDNFencingWithReplication.java:84)
      2017-12-11 10:37:32 	... 6 more
      2017-12-11 10:37:32 
      2017-12-11 10:37:32 [INFO] 
      2017-12-11 10:37:32 [INFO] Results:
      2017-12-11 10:37:32 [INFO] 
      2017-12-11 10:37:32 [ERROR] Errors: 
      2017-12-11 10:37:32 [ERROR]   TestDNFencingWithReplication.testFencingStress:137 ? Runtime Deferred
      2017-12-11 10:37:32 [INFO] 
      2017-12-11 10:37:32 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
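
      The trace shows the polling check inside ReplicationToggler.waitForReplicas reaching a NameNode that was still in standby, so the StandbyException escaped GenericTestUtils.waitFor and failed the test. The sketch below illustrates one general way to make such a polling loop tolerant of a "not yet active" window: treat that specific exception as "not ready yet" and keep polling until a timeout. It is not the attached patch and uses no Hadoop APIs. The class WaitForActiveSketch, the exception NotActiveYetException, and the waitFor signature are all hypothetical names introduced for illustration only.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class WaitForActiveSketch {

    /** Hypothetical stand-in for the StandbyException seen in the trace. */
    static class NotActiveYetException extends RuntimeException {
        NotActiveYetException(String message) {
            super(message);
        }
    }

    /**
     * Polls {@code check} until it returns true or the timeout expires.
     * A NotActiveYetException is treated as "not ready yet" and does not
     * abort the wait; any other exception still propagates to the caller.
     */
    static void waitFor(Supplier<Boolean> check, long intervalMillis, long timeoutMillis)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try {
                if (check.get()) {
                    return;
                }
            } catch (NotActiveYetException e) {
                // Transient condition: the service has not become active yet.
            }
            if (System.currentTimeMillis() >= deadline) {
                throw new TimeoutException("Condition not met within " + timeoutMillis + " ms");
            }
            Thread.sleep(intervalMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated check: the first two calls hit a node that is still in standby.
        waitFor(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new NotActiveYetException("Operation category READ is not supported in state standby");
            }
            return true;
        }, 10, 1000);
        System.out.println("calls = " + calls.get()); // prints "calls = 3"
    }
}
```

      An alternative design is to wait explicitly for the cluster to report an active node before starting the polling threads. Which approach the actual fix takes should be read from the attached patches.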
      

      Attachments

        1. HDFS-12913.02.patch
          2 kB
          Zsolt Venczel
        2. HDFS-12913.01.patch
          0.9 kB
          Zsolt Venczel

          People

            Assignee: Zsolt Venczel (zvenczel)
            Reporter: Zsolt Venczel (zvenczel)
            Votes: 0
            Watchers: 4
