Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
0.8.0
None
Description
I am running a workload that deletes a table and creates a new one with hundreds of tablets every few hours. If the cluster has not been restarted in a while, the next tablet server startup takes tens of minutes to process all the deleted tablets. The logs for a single tablet look like:
I0511 19:03:52.380903 79512 ts_tablet_manager.cc:609] Loading metadata for tablet 28f7ae54ac1d413b8a0e694e1dcef0fc
I0511 19:03:52.391885 79512 ts_tablet_manager.cc:937] T 28f7ae54ac1d413b8a0e694e1dcef0fc P d87c4ff7b7124cf8839940b71ed1704d: Tablet Manager startup: Rolling forward tablet deletion of type TABLET_DATA_DELETED
I0511 19:03:52.391894 79512 ts_tablet_manager.cc:964] T 28f7ae54ac1d413b8a0e694e1dcef0fc P d87c4ff7b7124cf8839940b71ed1704d: Deleting tablet data with delete state TABLET_DATA_DELETED
I0511 19:03:52.497952 79512 ts_tablet_manager.cc:974] T 28f7ae54ac1d413b8a0e694e1dcef0fc P d87c4ff7b7124cf8839940b71ed1704d: Tablet deleted. Last logged OpId: 406.65248
I0511 19:03:52.498010 79512 ts_tablet_manager.cc:946] T 28f7ae54ac1d413b8a0e694e1dcef0fc P d87c4ff7b7124cf8839940b71ed1704d: Deleting tablet superblock
In my latest instance of this problem, running on 43c9c87604f3b6f3dd286c63344bf18a2db08c21, it took almost 20 minutes to process 18k deleted tablets. Only after that can the TS start bootstrapping.
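To quantify the per-tablet cost from a tablet server log, something like the following could be used. It is a minimal sketch, not part of Kudu: the function name `deletion_durations` and both regular expressions are assumptions based only on the log lines quoted above, and it measures the time from "Loading metadata for tablet" to "Deleting tablet superblock" for each tablet ID.

```python
import re
from datetime import datetime

# Assumed glog layout, inferred from the sample lines above:
# <severity><MMDD> <HH:MM:SS.uuuuuu> <thread id> <file>:<line>] <message>
GLOG_RE = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ [^\]]+\] (.*)$")

# The tablet ID follows "tablet " in the first message and "T " in the others.
TABLET_RE = re.compile(r"(?:^T |tablet )([0-9a-f]{32})")


def deletion_durations(lines):
    """Map tablet ID to seconds spent between metadata load and superblock deletion."""
    started = {}
    durations = {}
    for line in lines:
        m = GLOG_RE.match(line.strip())
        if m is None:
            continue  # not a glog line
        mmdd, hms, msg = m.groups()
        t = TABLET_RE.search(msg)
        if t is None:
            continue  # message does not mention a tablet ID
        tablet = t.group(1)
        # glog omits the year; a placeholder year is enough for same-day deltas.
        when = datetime.strptime("2000" + mmdd + " " + hms, "%Y%m%d %H:%M:%S.%f")
        if msg.startswith("Loading metadata for tablet"):
            started[tablet] = when
        elif msg.endswith("Deleting tablet superblock") and tablet in started:
            durations[tablet] = (when - started.pop(tablet)).total_seconds()
    return durations
```

Applied to the five lines above, this reports roughly 0.117 seconds for tablet 28f7ae54ac1d413b8a0e694e1dcef0fc. For comparison, 18k tablets in just under 20 minutes averages out to a bit less than 67 ms per tablet, so a single sample is not necessarily representative.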