Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate

    Description

      When accessing a snapshot, we use SnapshotCache to load and retrieve the snapshot's RocksDB instance. When multiple background processes call SnapshotCache's get, each call also cleans up the pending eviction list.

      There is a scenario where Thread 1 (KeyDeletingService) is executing get->cleanup while Thread 2 (SstFilteringService) is executing get. Even though Thread 2's get has already incremented the snapshot's reference count, we still close the RocksDB instance, because the cleanup method assumes that everything in the pending eviction list has a reference count of 0. That is not the case, so we need to recheck the reference count when closing the RocksDB instance. Otherwise we end up in this scenario:

       

      2023-12-18 19:19:28,739 INFO [SstFilteringService#0]-org.apache.hadoop.ozone.om.snapshot.SnapshotCache: Loading snapshot. Table key: /vol-t2gj8/buck-07uux/snap-5griw
      2023-12-18 19:19:28,741 ERROR [SstFilteringService#0]-org.apache.hadoop.ozone.om.SstFilteringService: Exception encountered while filtering a snapshot
      java.io.IOException: Rocks Database is closed
      2023-12-18 19:20:28,739 INFO [SstFilteringService#0]-org.apache.hadoop.ozone.om.snapshot.SnapshotCache: Loading snapshot. Table key: /vol-t2gj8/buck-07uux/snap-5griw
      2023-12-18 19:20:28,768 WARN [KeyDeletingService#0]-org.apache.hadoop.hdds.utils.BackgroundService: Background task execution failed
      java.lang.IllegalStateException: Cache map entry removal failure. The cache is in an inconsistent state. Expected OmSnapshot instance: org.apache.hadoop.ozone.om.snapshot.ReferenceCounted@4f63f85e, actual: org.apache.hadoop.ozone.om.snapshot.ReferenceCounted@7656056
      2023-12-18 19:20:28,768 WARN [SstFilteringService#0]-org.apache.hadoop.hdds.utils.BackgroundService: Background task execution failed
      java.lang.IllegalStateException: Cache map entry removal failure. The cache is in an inconsistent state. Expected OmSnapshot instance: org.apache.hadoop.ozone.om.snapshot.ReferenceCounted@4f63f85e, actual: null
      2023-12-18 19:21:06,486 WARN [Finalizer]-org.apache.hadoop.ozone.om.OmSnapshot: org.apache.hadoop.hdds.utils.db.RDBStore@4e5ac786 is not closed properly. snapshotName: snap-5griw
      2023-12-18 19:21:28,742 ERROR [SstFilteringService#0]-org.apache.hadoop.ozone.om.OmSnapshotManager: Failed to retrieve snapshot: /vol-t2gj8/buck-07uux/snap-5griw
      java.io.IOException: Failed init RocksDB, db path : /var/lib/hadoop-ozone/om/data913140/db.snapshots/checkpointState/om.db-4e72e3fd-58e4-4814-b8fa-869fb3e8741b, exception :org.rocksdb.RocksDBException lock hold by current process, acquire time 1702927228 acquiring thread 139777345017600: /var/lib/hadoop-ozone/om/data913140/db.snapshots/checkpointState/om.db-4e72e3fd-58e4-4814-b8fa-869fb3e8741b/LOCK: No locks available
              at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:180)
              at org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:220)
              at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.loadDB(OmMetadataManagerImpl.java:598)
              at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.<init>(OmMetadataManagerImpl.java:406)
              at org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:357)
              at org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:1)
              at org.apache.hadoop.ozone.om.snapshot.SnapshotCache.lambda$0(SnapshotCache.java:171)
              at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1908)
              at org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:167)
              at org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:153)
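
The recheck described above could look roughly like the following. This is only a minimal sketch, not the actual SnapshotCache code: `SketchSnapshotCache`, `RefCounted`, `FakeDb`, `release` and the field names are all hypothetical stand-ins, and the real RocksDB handle is replaced by a flag. The idea it illustrates is that cleanup re-reads the reference count inside the same per-key atomic section that get uses, so an entry that was re-acquired after landing in the pending eviction list is kept open.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a snapshot's RocksDB handle. */
final class FakeDb implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean();
    @Override public void close() { closed.set(true); }
    boolean isClosed() { return closed.get(); }
}

/** Hypothetical reference-counted wrapper around the handle. */
final class RefCounted {
    final FakeDb db = new FakeDb();
    final AtomicLong refCount = new AtomicLong();
}

/** Sketch of a cache whose cleanup rechecks the reference count. */
final class SketchSnapshotCache {
    private final ConcurrentHashMap<String, RefCounted> dbMap =
        new ConcurrentHashMap<>();
    private final Set<String> pendingEviction = ConcurrentHashMap.newKeySet();

    /** Loads (or reuses) the entry, bumps its count, then runs cleanup. */
    RefCounted get(String key) {
        RefCounted entry = dbMap.compute(key, (k, v) -> {
            RefCounted e = (v != null) ? v : new RefCounted();
            e.refCount.incrementAndGet();
            return e;
        });
        cleanup();
        return entry;
    }

    /** Drops one reference; at zero the key becomes an eviction candidate. */
    void release(String key) {
        dbMap.computeIfPresent(key, (k, v) -> {
            if (v.refCount.decrementAndGet() == 0) {
                pendingEviction.add(k);
            }
            return v;
        });
    }

    /** Closes only entries whose count is still zero at close time. */
    void cleanup() {
        for (String key : pendingEviction) {
            // compute() is atomic per key, so this cannot interleave with
            // the increment in get() or the decrement in release().
            dbMap.compute(key, (k, v) -> {
                pendingEviction.remove(k);
                if (v == null) {
                    return null;      // already evicted by another thread
                }
                if (v.refCount.get() > 0) {
                    return v;         // re-acquired since release: keep open
                }
                v.db.close();
                return null;          // remove the mapping
            });
        }
    }
}
```

In this sketch, a get that races with cleanup either runs first (cleanup then sees a non-zero count and keeps the entry) or runs second (it finds no mapping and loads a fresh instance after the old one has been closed), so a caller should never be handed an already closed instance.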
      

       

            People

              Assignee: aswinshakil Aswin Shakil
              Reporter: aswinshakil Aswin Shakil