Description
The test TableSnapshotReadsMapReduceIT.testMapReduceSnapshotsMultiRegion intermittently fails to delete its snapshot because the snapshot is reported as corrupted while it is being taken:
ERROR [MASTER_SNAPSHOT_OPERATIONS-master/e09b7a102100:0-0] org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier(187): Regions moved during the snapshot '{ ss=FOO table=N000006 type=FLUSH }'. expected=2 snapshotted=3.
2020-12-18 01:18:45,532 ERROR [MASTER_SNAPSHOT_OPERATIONS-master/e09b7a102100:0-0] org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler(222): Failed taking snapshot { ss=FOO table=N000006 type=FLUSH } due to exception:Regions moved during the snapshot '{ ss=FOO table=N000006 type=FLUSH }'. expected=2 snapshotted=3.
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Regions moved during the snapshot '{ ss=FOO table=N000006 type=FLUSH }'. expected=2 snapshotted=3.
at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifyRegions(MasterSnapshotVerifier.java:205)
at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshot(MasterSnapshotVerifier.java:119)
at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.process(TakeSnapshotHandler.java:209)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
As a result, the subsequent deletion of the snapshot fails:
org.apache.hadoop.hbase.snapshot.SnapshotDoesNotExistException: Snapshot 'FOO' doesn't exist on the filesystem
at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:312)
at org.apache.hadoop.hbase.master.MasterRpcServices.deleteSnapshot(MasterRpcServices.java:694)
at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Moreover, the test exercises a multi-region table whose regions are created by splitting, and the test logs suggest that the table is not being split correctly:
org.apache.hadoop.hbase.regionserver.StoreUtils(123): cannot split hdfs://localhost:37149/user/jenkins/test-data/b2a954c8-abb9-5528-bca3-27fcaaab735c/data/default/N000006/903675aa1a12e58bb247916dd7d492dd/0/b4bdabddce254ea5a9fa3ca432e2bca3 because midkey is the same as first or last row
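One way to read the "midkey is the same as first or last row" message is that the store file does not contain enough distinct rows around its middle to yield a usable split point. The sketch below is a simplified, hypothetical model of that condition and is not HBase's actual StoreUtils code: the class MidkeySplitSketch, its canSplit methods, and the choice of the middle element of a sorted row list as a stand-in for the midkey are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of the "midkey" split check.
// This is NOT HBase's implementation; names and logic are illustrative only.
class MidkeySplitSketch {

    // A split point is usable only if the mid row differs from both
    // the first and the last row of the file.
    static boolean canSplit(byte[] firstRow, byte[] midRow, byte[] lastRow) {
        return !Arrays.equals(midRow, firstRow) && !Arrays.equals(midRow, lastRow);
    }

    // Assumption: the middle element of the sorted row list stands in for the midkey.
    static boolean canSplit(List<String> sortedRows) {
        if (sortedRows.isEmpty()) {
            return false; // nothing to split
        }
        byte[] first = bytes(sortedRows.get(0));
        byte[] mid = bytes(sortedRows.get(sortedRows.size() / 2));
        byte[] last = bytes(sortedRows.get(sortedRows.size() - 1));
        return canSplit(first, mid, last);
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three distinct rows: the middle row is a valid split point.
        System.out.println(canSplit(Arrays.asList("aaa", "bbb", "ccc")));
        // All rows identical: the mid row equals the first and last rows.
        System.out.println(canSplit(Arrays.asList("aaa", "aaa", "aaa")));
        // Two rows: the stand-in mid row coincides with the last row.
        System.out.println(canSplit(Arrays.asList("aaa", "bbb")));
    }
}
```

Under this model, a test that relies on a split producing multiple regions would need the table to hold enough distinct row keys before the split is requested; whether that is the actual cause here is not confirmed by the logs above.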