  1. Apache Ozone
  2. HDDS-8539

Container DB open, but not found in DatanodeStoreCache


Details

    Description

      The Surefire fork intermittently times out in TestDecommissionAndMaintenance.

      Container DB added to cache:

      2023-05-03 08:18:26,909 [EndpointStateMachine task thread for /0.0.0.0:43723 - 0 ] INFO  utils.DatanodeStoreCache (DatanodeStoreCache.java:addDB(58)) - Added db /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db to cache
      

      but later it is not found in the cache, and the datanode tries to open it again:

      2023-05-03 08:18:57,086 [Command processor thread] ERROR utils.DatanodeStoreCache (DatanodeStoreCache.java:getDB(74)) - Failed to get DB store /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db
      java.io.IOException: Failed init RocksDB, db path : /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db, exception :org.rocksdb.RocksDBException lock hold by current process, acquire time 1683101936 acquiring thread 139985634854656: /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db/LOCK: No locks available
      	at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:182)
      	at org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:212)
      	at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.start(AbstractDatanodeStore.java:147)
      	at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.<init>(AbstractDatanodeStore.java:99)
      	at org.apache.hadoop.ozone.container.metadata.DatanodeStoreSchemaThreeImpl.<init>(DatanodeStoreSchemaThreeImpl.java:66)
      	at org.apache.hadoop.ozone.container.common.utils.DatanodeStoreCache.getDB(DatanodeStoreCache.java:69)
      	at org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getDB(BlockUtils.java:132)
      	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.flushAndSyncDB(KeyValueContainer.java:444)
      	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.closeAndFlushIfNeeded(KeyValueContainer.java:385)
      	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.quasiClose(KeyValueContainer.java:355)
      	at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.quasiCloseContainer(KeyValueHandler.java:1121)
      	at org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.quasiCloseContainer(ContainerController.java:142)
      	at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.notifyGroupRemove(ContainerStateMachine.java:1052)
      	at org.apache.ratis.server.impl.RaftServerImpl.groupRemove(RaftServerImpl.java:423)
      	at org.apache.ratis.server.impl.RaftServerProxy.lambda$groupRemoveAsync$12(RaftServerProxy.java:530)
      	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
      	at java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:628)
      	at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1996)
      	at org.apache.ratis.server.impl.RaftServerProxy.groupRemoveAsync(RaftServerProxy.java:529)
      	at org.apache.ratis.server.impl.RaftServerProxy.groupManagementAsync(RaftServerProxy.java:479)
      	at org.apache.ratis.server.impl.RaftServerProxy.groupManagement(RaftServerProxy.java:459)
      	at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.removeGroup(XceiverServerRatis.java:822)
      	at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.ClosePipelineCommandHandler.handle(ClosePipelineCommandHandler.java:77)
      	at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.CommandDispatcher.handle(CommandDispatcher.java:99)
      	at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$initCommandHandlerThread$3(DatanodeStateMachine.java:644)
      	at java.lang.Thread.run(Thread.java:750)
      

      The same error repeats until the fork is killed:

      2023-05-03 08:33:24,505 [Command processor thread] ERROR utils.DatanodeStoreCache (DatanodeStoreCache.java:getDB(74)) - Failed to get DB store /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db
      java.io.IOException: Failed init RocksDB, db path : /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db, exception :org.rocksdb.RocksDBException lock hold by current process, acquire time 1683101936 acquiring thread 139985634854656: /home/runner/work/ozone/ozone/hadoop-ozone/integration-test/target/test-dir/MiniOzoneClusterImpl-ff176d5b-bea5-4cbe-a997-8236a6853a89/datanode-0/data-0/containers/hdds/ff176d5b-bea5-4cbe-a997-8236a6853a89/DS-4328e108-8c1a-4a6f-8bff-6f686dd50b24/container.db/LOCK: No locks available
      	at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:182)
      	at org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:212)
      	at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.start(AbstractDatanodeStore.java:147)
      	at org.apache.hadoop.ozone.container.metadata.AbstractDatanodeStore.<init>(AbstractDatanodeStore.java:99)
      	at org.apache.hadoop.ozone.container.metadata.DatanodeStoreSchemaThreeImpl.<init>(DatanodeStoreSchemaThreeImpl.java:66)
      	at org.apache.hadoop.ozone.container.common.utils.DatanodeStoreCache.getDB(DatanodeStoreCache.java:69)
      	at org.apache.hadoop.ozone.container.keyvalue.helpers.BlockUtils.getDB(BlockUtils.java:132)
      	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.flushAndSyncDB(KeyValueContainer.java:444)
      	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.closeAndFlushIfNeeded(KeyValueContainer.java:385)
      	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.quasiClose(KeyValueContainer.java:355)
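The logs above show a second open attempt on a path whose lock is already held by the same process. The following is a minimal, self-contained sketch of that failure mode and of a cache lookup that opens each path at most once. It is not the Ozone code and not a confirmed diagnosis of this issue: `StoreCacheSketch`, `FakeStore`, and the example path are hypothetical names, and `FakeStore` only imitates the exclusive per-path lock with an in-memory set.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class StoreCacheSketch {

  /** Hypothetical stand-in for a DB handle that takes an exclusive per-path lock. */
  static final class FakeStore {
    // Paths currently locked by this process (imitates the LOCK file).
    private static final Set<String> LOCKED = ConcurrentHashMap.newKeySet();
    private final String path;

    FakeStore(String path) throws IOException {
      if (!LOCKED.add(path)) {
        // A second open of the same path in the same process is rejected.
        throw new IOException("lock held by current process: " + path);
      }
      this.path = path;
    }

    void close() {
      LOCKED.remove(path);
    }
  }

  private final ConcurrentHashMap<String, FakeStore> cache = new ConcurrentHashMap<>();

  /** Returns the cached handle, opening the store at most once per path. */
  FakeStore getDB(String path) {
    // computeIfAbsent makes "look up, else open and insert" atomic per key,
    // so concurrent callers cannot both try to open the same path.
    return cache.computeIfAbsent(path, p -> {
      try {
        return new FakeStore(p);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    });
  }

  /** Removes the handle from the cache and releases its lock. */
  void removeDB(String path) {
    FakeStore store = cache.remove(path);
    if (store != null) {
      store.close();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    StoreCacheSketch stores = new StoreCacheSketch();
    String path = "/tmp/example/container.db"; // hypothetical path

    // Two threads request the same path, loosely mirroring the two threads in the logs.
    FakeStore[] seen = new FakeStore[2];
    Thread t0 = new Thread(() -> seen[0] = stores.getDB(path));
    Thread t1 = new Thread(() -> seen[1] = stores.getDB(path));
    t0.start();
    t1.start();
    t0.join();
    t1.join();
    System.out.println("same handle: " + (seen[0] == seen[1]));

    // Opening the path while bypassing the cache hits the held lock.
    try {
      new FakeStore(path);
      System.out.println("unexpected: second open succeeded");
    } catch (IOException e) {
      System.out.println("second open rejected: " + e.getMessage());
    }
  }
}
```

In this sketch, any code path that opens the store without consulting the cache, or that looks it up under a different key than the one used at insertion, fails in the same way as the stack trace above. Whether either applies to `DatanodeStoreCache` is not established by the logs.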
      

            People

              adoroszlai Attila Doroszlai