Apache Ozone / HDDS-3379

Clients unable to fail over after the OzoneManager leader is restarted in MiniOzoneChaosCluster

      Description

      Clients are unable to fail over after the OzoneManager leader is restarted in MiniOzoneChaosCluster.

      This happens after the following restart events: omNode-3 is restarted at 21:53:22, then omNode-1 at 21:54:22.

      ➜  chaos-2020-04-11-21-51-52-IST egrep "iniOzoneHAClusterImp|Failures" complete.log
      2020-04-11 21:52:08,296 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10804
      2020-04-11 21:52:08,387 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10810
      2020-04-11 21:52:08,485 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:createOMService(373)) - Started OzoneManager RPC server at localhost/127.0.0.1:10816
      2020-04-11 21:52:22,845 [org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.main()] INFO  failure.Failures (FailureManager.java:start(66)) - starting failure manager 60 60 SECONDS
      2020-04-11 21:53:22,850 [pool-59-thread-1] INFO  failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
      2020-04-11 21:53:22,853 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-3
      2020-04-11 21:53:22,988 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-3
      	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
      	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
      	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
      	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
      2020-04-11 21:54:22,849 [pool-59-thread-1] INFO  failure.Failures (FailureManager.java:fail(56)) - time failure with OzoneManagerRestartFailure
      2020-04-11 21:54:22,850 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:shutdownOzoneManager(211)) - Shutting down OzoneManager omNode-1
      2020-04-11 21:54:22,895 [pool-59-thread-1] INFO  ozone.MiniOzoneHAClusterImpl (MiniOzoneHAClusterImpl.java:restartOzoneManager(228)) - Restarting OzoneManager omNode-1
      	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:229)
      	at org.apache.hadoop.ozone.MiniOzoneHAClusterImpl.restartOzoneManager(MiniOzoneHAClusterImpl.java:223)
      	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.lambda$fail$0(Failures.java:101)
      	at org.apache.hadoop.ozone.failure.Failures$OzoneManagerRestartFailure.fail(Failures.java:98)
      ➜  chaos-2020-04-11-21-51-52-IST
      

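      About a second after the omNode-1 restart, at 21:54:24, the load generator exits with the following exception, raised while the client was calling allocateBlock on the OM.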

      2020-04-11 21:54:24,201 [pool-360-thread-4] ERROR loadgenerators.LoadExecutors (LoadExecutors.java:load(67)) - FilesystemLoadGenerator LOADGEN: Exiting due to exception
      java.io.IOException: java.io.IOException: Could not determine or connect to OM Leader.
              at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:229)
              at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:199)
              at org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
              at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
              at java.io.DataOutputStream.write(DataOutputStream.java:107)
              at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
              at org.apache.hadoop.ozone.utils.LoadBucket$WriteOp.doPostOp(LoadBucket.java:176)
              at org.apache.hadoop.ozone.utils.LoadBucket$Op.execute(LoadBucket.java:132)
              at org.apache.hadoop.ozone.utils.LoadBucket$WriteOp.execute(LoadBucket.java:153)
              at org.apache.hadoop.ozone.utils.LoadBucket.writeKey(LoadBucket.java:76)
              at org.apache.hadoop.ozone.loadgenerators.FilesystemLoadGenerator.generateLoad(FilesystemLoadGenerator.java:47)
              at org.apache.hadoop.ozone.loadgenerators.LoadExecutors.load(LoadExecutors.java:65)
              at org.apache.hadoop.ozone.loadgenerators.LoadExecutors.lambda$startLoad$0(LoadExecutors.java:89)
              at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: java.io.IOException: Could not determine or connect to OM Leader.
              at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:429)
              at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:843)
              at sun.reflect.GeneratedMethodAccessor80.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:71)
              at com.sun.proxy.$Proxy65.allocateBlock(Unknown Source)
              at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateNewBlock(BlockOutputStreamEntryPool.java:281)
              at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateBlockIfNeeded(BlockOutputStreamEntryPool.java:327)
              at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:208)
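
      For reference, the behaviour expected from the client can be sketched as a loop that rotates across the OM nodes and retries instead of surfacing the first failure. This is only a minimal illustration, not the Ozone client's actual failover code: the class FailoverSketch, the Call interface, and the parameters maxAttempts and sleepMillis are all hypothetical names introduced here.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch of client-side failover; not part of Ozone. */
public class FailoverSketch {

    /** One RPC attempt against a single node (hypothetical interface). */
    public interface Call<T> {
        T attempt(String nodeId) throws IOException;
    }

    /**
     * Tries the call against each node in turn, wrapping around the list,
     * until one attempt succeeds or maxAttempts is exhausted.
     */
    public static <T> T callWithFailover(List<String> nodeIds, int maxAttempts,
                                         long sleepMillis, Call<T> call)
            throws IOException {
        if (nodeIds.isEmpty() || maxAttempts < 1) {
            throw new IllegalArgumentException("need at least one node and one attempt");
        }
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Rotate to the next node on every attempt.
            String node = nodeIds.get(attempt % nodeIds.size());
            try {
                return call.attempt(node);
            } catch (IOException e) {
                last = e; // remember the failure and fall through to the next node
                try {
                    Thread.sleep(sleepMillis); // give a restarted node time to rejoin
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IOException("Interrupted during failover", ie);
                }
            }
        }
        throw new IOException("No node reachable after " + maxAttempts + " attempts", last);
    }
}
```

      With three nodes and maxAttempts set to a multiple of three, each node is tried the same number of times before the call gives up. Whether that budget is long enough to cover two restarts one minute apart, as in the log above, depends on the retry count and sleep interval chosen.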
      

            People

            • Assignee:
              Unassigned
              Reporter:
              msingh Mukul Kumar Singh