Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Description
Test scenario:
The test test_unordered_deletion deletes snapshots in random order. While it does so, the OM hits the exception below more often than not.
Once the error occurs, the OM goes into an unhealthy state and none of the subsequent tests can run.
The snapshot is deleted:
2023-08-06 06:33:27,113 INFO [OM StateMachine ApplyTransaction Thread - 0]-org.apache.hadoop.ozone.om.request.snapshot.OMSnapshotDeleteRequest: Deleted snapshot 'snap-ae5or' under path 'vol-w19gk/buck-f9sqw'
And soon after, during a copy:
2023-08-06 06:39:06,314|INFO|MainThread|machine.py:188 - run()||GUID=5210f279-e5c7-4ee9-b652-b49a6b0eb07a|RUNNING: /opt/cloudera/parcels/CDH/bin/ozone fs -cp ofs://ozone1/vol-w19gk/buck-f9sqw/.snapshot/snap-5qmtv/key_1691303390 ofs://ozone1/vol-w19gk/buck-f9sqw/
OM log stack trace:
2023-08-06 06:33:38,126 INFO [SstFilteringService#0]-org.apache.hadoop.hdds.utils.db.RocksDatabase: Deleting sst file /000396.sst corresponding to column family keyTable from db: /var/lib/hadoop-ozone/om/data293349/db.snapshots/checkpointState/om.db-0ccb08e9-c5ab-45bb-a71e-8444a2142511
2023-08-06 06:33:38,127 INFO [SstFilteringService#0]-org.apache.hadoop.hdds.utils.db.managed.ManagedRocksObjectUtils: Waited for 1 milliseconds for file /var/lib/hadoop-ozone/om/data293349/db.snapshots/checkpointState/om.db-0ccb08e9-c5ab-45bb-a71e-8444a2142511/000396.sst deletion.
2023-08-06 06:34:37,938 INFO [SstFilteringService#0]-org.apache.hadoop.ozone.om.snapshot.SnapshotCache: Loading snapshot. Table key: /vol-w19gk/buck-f9sqw/snap-ae5or
2023-08-06 06:34:37,938 INFO [SstFilteringService#0]-org.apache.hadoop.ozone.om.helpers.OmKeyInfo: OmKeyInfo.getCodec ignorePipeline = true
2023-08-06 06:34:37,989 ERROR [SstFilteringService#0]-org.apache.hadoop.ozone.om.SstFilteringService: Error during Snapshot sst filtering
FILE_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Unable to load snapshot. Snapshot with table key '/vol-w19gk/buck-f9sqw/snap-ae5or' is no longer active
    at org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:205)
    at org.apache.hadoop.ozone.om.snapshot.SnapshotCache.get(SnapshotCache.java:151)
    at org.apache.hadoop.ozone.om.SstFilteringService$SstFilteringTask.call(SstFilteringService.java:178)
    at org.apache.hadoop.hdds.utils.BackgroundService$PeriodicalTask.lambda$run$0(BackgroundService.java:121)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2023-08-06 06:35:30,232 INFO [pool-8-thread-1]-org.apache.ozone.rocksdiff.RocksDBCheckpointDiffer: Removing SST files: [000410, 000453, 000496, 000253, 000374, 000535, 000611, 000456, 000417, 000658, 000338, 000459, 000380, 000185, 000124, 000245, 000443, 000200, 000563, 000364, 000562, 000128, 000447, 000248, 000688, 000324, 000522, 000367, 000209, 000407, 000129, 000602, 000290, 000296, 000692, 000130, 000372, 000690, 000172, 000293, 000157, 000355, 000399, 000674, 000233, 000277, 000310, 000398, 000552, 000596, 000474, 000352, 000550, 000315, 000359, 000634, 000236, 000599, 000554, 000638, 000637, 000559, 000514, 000518, 000160, 000681, 000163, 000284, 000162, 000344, 000663, 000264, 000462, 000425, 000667, 000225, 000302, 000467, 000588, 000301, 000506, 000307, 000504, 000668, 000628, 000193, 000391, 000197] as part of SST file pruning.
2023-08-06 06:35:37,937 INFO [SstFilteringService#0]-org.apache.hadoop.ozone.om.snapshot.SnapshotCache: Loading snapshot. Table key: /vol-w19gk/buck-f9sqw/snap-ae5or
2023-08-06 06:35:37,937 ERROR [SstFilteringService#0]-org.apache.hadoop.ozone.om.SstFilteringService: Error during Snapshot sst filtering
FILE_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Unable to load snapshot. Snapshot with table key '/vol-w19gk/buck-f9sqw/snap-ae5or' is no longer active
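To make the ordering in the logs explicit, here is a small sketch that computes how long after the deletion of snap-ae5or each SstFilteringService load failure occurred. The timestamps are copied from the excerpts in this report. The names DELETED_AT, LOAD_ERRORS and seconds_after_deletion are made up for illustration and are not part of Ozone or of the test suite.

```python
from datetime import datetime

# Timestamps copied from the OM log excerpts in this report.
# OMSnapshotDeleteRequest: Deleted snapshot 'snap-ae5or'
DELETED_AT = "2023-08-06 06:33:27,113"
# SstFilteringService: "Unable to load snapshot ... is no longer active"
LOAD_ERRORS = [
    "2023-08-06 06:34:37,989",
    "2023-08-06 06:35:37,937",
]

# Timestamp layout used in the OM log lines above (milliseconds after the comma).
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_after_deletion(deleted_at: str, event_at: str) -> float:
    """Return the number of seconds between the deletion and a later log event."""
    deleted = datetime.strptime(deleted_at, LOG_TS_FORMAT)
    event = datetime.strptime(event_at, LOG_TS_FORMAT)
    return (event - deleted).total_seconds()


delays = [seconds_after_deletion(DELETED_AT, ts) for ts in LOAD_ERRORS]
for ts, delay in zip(LOAD_ERRORS, delays):
    print(f"{ts}: load of snap-ae5or failed {delay:.3f} s after it was deleted")
```

Both failures come after the deletion, about one minute apart, which suggests that the filtering service keeps trying to load the already deleted snapshot on each run. This only restates what the logs show and makes no claim about the root cause or the eventual fix.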