Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Duplicate
Description
When accessing a snapshot, we use SnapshotCache to load and retrieve the snapshot's RocksDB instance. When multiple background processes call SnapshotCache's get, each call also cleans up the pending eviction list.
There is a scenario where Thread 1 (KeyDeletingService) is executing get->cleanup while Thread 2 (SstFilteringService) is executing get. Although get has already incremented the snapshot's reference count, we still close the RocksDB instance, because the cleanup method assumes that everything in the pending eviction list has a reference count of 0. That is not the case, so we need to recheck the reference count when closing the RocksDB instance. Otherwise we end up in this scenario:
2023-12-18 19:19:28,739 INFO [SstFilteringService#0]-org.apache.hadoop.ozone.om.snapshot.SnapshotCache: Loading snapshot. Table key: /vol-t2gj8/buck-07uux/snap-5griw
2023-12-18 19:19:28,741 ERROR [SstFilteringService#0]-org.apache.hadoop.ozone.om.SstFilteringService: Exception encountered while filtering a snapshot
java.io.IOException: Rocks Database is closed
2023-12-18 19:20:28,739 INFO [SstFilteringService#0]-org.apache.hadoop.ozone.om.snapshot.SnapshotCache: Loading snapshot. Table key: /vol-t2gj8/buck-07uux/snap-5griw
2023-12-18 19:20:28,768 WARN [KeyDeletingService#0]-org.apache.hadoop.hdds.utils.BackgroundService: Background task execution failed
java.lang.IllegalStateException: Cache map entry removal failure. The cache is in an inconsistent state. Expected OmSnapshot instance: org.apache.hadoop.ozone.om.snapshot.ReferenceCounted@4f63f85e, actual: org.apache.hadoop.ozone.om.snapshot.ReferenceCounted@7656056
2023-12-18 19:20:28,768 WARN [SstFilteringService#0]-org.apache.hadoop.hdds.utils.BackgroundService: Background task execution failed
java.lang.IllegalStateException: Cache map entry removal failure. The cache is in an inconsistent state. Expected OmSnapshot instance: org.apache.hadoop.ozone.om.snapshot.ReferenceCounted@4f63f85e, actual: null
2023-12-18 19:21:06,486 WARN [Finalizer]-org.apache.hadoop.ozone.om.OmSnapshot: org.apache.hadoop.hdds.utils.db.RDBStore@4e5ac786 is not closed properly. snapshotName: snap-5griw
2023-12-18 19:21:28,742 ERROR [SstFilteringService#0]-org.apache.hadoop.ozone.om.OmSnapshotManager: Failed to retrieve snapshot: /vol-t2gj8/buck-07uux/snap-5griw
java.io.IOException: Failed init RocksDB, db path : /var/lib/hadoop-ozone/om/data913140/db.snapshots/checkpointState/om.db-4e72e3fd-58e4-4814-b8fa-869fb3e8741b, exception :org.rocksdb.RocksDBException lock hold by current process, acquire time 1702927228 acquiring thread 139777345017600: /var/lib/hadoop-ozone/om/data913140/db.snapshots/checkpointState/om.db-4e72e3fd-58e4-4814-b8fa-869fb3e8741b/LOCK: No locks available
    at org.apache.hadoop.hdds.utils.db.RDBStore.<init>(RDBStore.java:180)
    at org.apache.hadoop.hdds.utils.db.DBStoreBuilder.build(DBStoreBuilder.java:220)
    at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.loadDB(OmMetadataManagerImpl.java:598)
    at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.<init>(OmMetadataManagerImpl.java:406)
    at org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:357)
    at org.apache.hadoop.ozone.om.OmSnapshotManager$1.load(OmSnapshotManager.java:1)
    at org.apache.hadoop.ozone.om.snapshot.SnapshotCache.lambda$0(SnapshotCache.java:171)
    at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1908)
    at org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:167)
    at org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:153)
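The recheck proposed in the description could look roughly like the sketch below. It is not the Ozone implementation: SnapshotCacheSketch, Entry, release, and the field names are hypothetical stand-ins, and a plain boolean flag stands in for closing the RocksDB instance. The point it illustrates is that cleanup re-reads the reference count inside the same per-key atomic section that get uses to increment it, so an entry that was re-acquired after being queued for eviction is skipped rather than closed.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, simplified stand-in for a snapshot cache; not the Ozone code. */
class SnapshotCacheSketch {

  /** Stand-in for a reference-counted handle around a snapshot's DB instance. */
  static final class Entry {
    final String key;
    final AtomicLong refCount = new AtomicLong();
    final AtomicBoolean closed = new AtomicBoolean(); // models "RocksDB closed"

    Entry(String key) {
      this.key = key;
    }
  }

  private final ConcurrentHashMap<String, Entry> dbMap = new ConcurrentHashMap<>();
  private final Set<Entry> pendingEviction = ConcurrentHashMap.newKeySet();

  /** Loads the entry if absent and pins it; the increment happens inside compute(). */
  Entry get(String key) {
    return dbMap.compute(key, (k, v) -> {
      Entry e = (v == null) ? new Entry(k) : v;
      e.refCount.incrementAndGet();
      pendingEviction.remove(e); // in use again, so no longer an eviction candidate
      return e;
    });
  }

  /** Unpins the entry; it becomes an eviction candidate when the count reaches 0. */
  void release(Entry e) {
    dbMap.computeIfPresent(e.key, (k, v) -> {
      if (v == e && e.refCount.decrementAndGet() == 0) {
        pendingEviction.add(e);
      }
      return v;
    });
  }

  /** Closes eviction candidates, rechecking the reference count before each close. */
  void cleanup() {
    for (Entry candidate : pendingEviction) {
      dbMap.computeIfPresent(candidate.key, (k, v) -> {
        pendingEviction.remove(candidate);
        if (v != candidate || v.refCount.get() > 0) {
          // Replaced or re-acquired since it was queued: leave the mapping alone.
          return v;
        }
        v.closed.set(true); // safe: get() for this key cannot run concurrently here
        return null;        // drop the mapping together with the close
      });
    }
  }
}
```

Because both get and cleanup go through ConcurrentHashMap's per-key atomic compute methods in this sketch, the close and the removal of the map entry happen as one step relative to any concurrent get on the same key.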
Issue Links
- duplicates: HDDS-10103 Snapshot read calls are failing due to SnapshotCache's inconsistency (Resolved)
- is duplicated by: HDDS-9965 [Snapshot] SnapshotCache is in inconsistent state (Resolved)
- links to