Apache Ozone / HDDS-3379

Clients unable to fail over after the OzoneManager leader is restarted in MiniOzoneChaosCluster


Details

    Description

      Clients are unable to fail over after the OzoneManager leader is restarted in MiniOzoneChaosCluster.

      This happens after the following restart events, in which omNode-3 and then omNode-1 are restarted one minute apart.

      ➜  chaos-2020-04-11-21-51-52-IST egrep "iniOzoneHAClusterImp|Failures" complete.log
      2020-04-11 21:52:08,296 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10804
      2020-04-11 21:52:08,387 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10810
      2020-04-11 21:52:08,485 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10816
      2020-04-11 21:52:22,845 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  failure.Failures (FailureManager.java:start(66)) - starting failure manager 60 60 SECONDS
      2020-04-11 21:53:22,850 [pool-59-thread-1] INFO  failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
      2020-04-11 21:53:22,853 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-3
      2020-04-11 21:53:22,988 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-3
      	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
      	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
      	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
      	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
      2020-04-11 21:54:22,849 [pool-59-thread-1] INFO  failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
      2020-04-11 21:54:22,850 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-1
      2020-04-11 21:54:22,895 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-1
      	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
      	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
      	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
      	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
      

      Shortly after the second restart, the load generator exits with the following exception.

      2020-04-11 21:54:24,201 [pool-360-thread-4] ERROR loadgenerators.LoadExecutors (LoadExecutors.java:load(67)) - FilesystemLoadGenerator LOADGEN: Exiting due to exception
      java.io.IOException: java.io.IOException: Could not determine or connect to OM Leader.
              at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:229)
              at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:199)
              at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
              at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
              at java.io.DataOutputStream.write(DataOutputStream.java:107)
              at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
              at org.apache.hadoop.ozone.utils.LoadBucket$WriteOp.doPostOp(LoadBucket.java:176)
              at org.apache.hadoop.ozone.utils.LoadBucket$Op.execute(LoadBucket.java:132)
              at org.apache.hadoop.ozone.utils.LoadBucket$WriteOp.execute(LoadBucket.java:153)
              at org.apache.hadoop.ozone.utils.LoadBucket.writeKey(LoadBucket.java:76)
              at org.apache.hadoop.ozone.loadgenerators.FilesystemLoadGenerator.generateLoad(FilesystemLoadGenerator.java:47)
              at org.apache.hadoop.ozone.loadgenerators.LoadExecutors.load(LoadExecutors.java:65)
              at org.apache.hadoop.ozone.loadgenerators.LoadExecutors.lambda$startLoad$0(LoadExecutors.java:89)
              at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: java.io.IOException: Could not determine or connect to OM Leader.
              at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:429)
              at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:843)
              at sun.reflect.GeneratedMethodAccessor80.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:71)
              at com.sun.proxy.$Proxy65.allocateBlock(Unknown Source)
              at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateNewBlock(BlockOutputStreamEntryPool.java:281)
              at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateBlockIfNeeded(BlockOutputStreamEntryPool.java:327)
              at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:208)
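
      For reference, the behaviour expected from the client can be sketched as below. This is only an illustration, not the actual Ozone client code: the class FailoverSketch, the method findLeader, the isLeader predicate and the node name omNode-2 are all hypothetical. The idea is that the client keeps cycling through the configured OM nodes for a bounded number of rounds and only gives up (the "Could not determine or connect to OM Leader" case) when none of them accepts the request.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch of client-side failover; not the Ozone implementation.
public class FailoverSketch {

    /**
     * Probes each node in order, for at most maxRounds passes over the list.
     * Returns the first node for which isLeader is true, or Optional.empty()
     * if no node accepted the request in any round.
     */
    static Optional<String> findLeader(List<String> nodes,
                                       Predicate<String> isLeader,
                                       int maxRounds) {
        for (int round = 0; round < maxRounds; round++) {
            for (String node : nodes) {
                if (isLeader.test(node)) {
                    return Optional.of(node);
                }
            }
        }
        // Corresponds to "Could not determine or connect to OM Leader."
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("omNode-1", "omNode-2", "omNode-3");
        // Assumption for the example: omNode-1 was just restarted and
        // omNode-2 has taken over leadership.
        Optional<String> leader = findLeader(nodes, "omNode-2"::equals, 3);
        System.out.println(leader.orElse("no leader found"));
    }
}
```

      A real client would also need a back-off between rounds and would have to refresh its view of the leader after each restart; both are left out of this sketch.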
      

          People

            Assignee: Unassigned
            Reporter: Mukul Kumar Singh (msingh)