Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- hbase-filesystem-1.0.0-alpha1
- None
- Reviewed
Description
We ran into a fun situation where the partition hosting ZK data repeatedly filled up while heavy ExportSnapshot + clone_snapshot operations were running (tens of TB). The cluster had previously been working just fine.
Upon investigating the ZK tree, we found a large number of znodes beneath /hboss, many of them under the ZK HBOSS path that corresponds to $hbase.rootdir/.tmp.
Tracing back through the code, we saw that CloneSnapshotProcedure (like CreateTableProcedure) creates the table filesystem layout in $hbase.rootdir/.tmp and then renames it into $hbase.rootdir/data/<namespace>. However, it appears that, upon rename, HBOSS was not cleaning up the znode for the src path. This is a bug because it allows the ZK tree to grow without bound, which also explains why the problem arose gradually rather than suddenly.
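To illustrate the growth pattern described above, here is a minimal in-memory sketch. It is not HBOSS code and does not use the ZooKeeper API: the class LockTreeSketch, its methods, the table names, and the /hbase/... paths are all hypothetical stand-ins, and the model assumes one znode per locked path. It only shows that a rename which locks both src and dst but never removes the src znodes leaves extra entries behind for every clone, while a rename that cleans up src does not.

```java
import java.util.TreeSet;

// Toy in-memory model of a lock tree; not HBOSS code and not the ZooKeeper API.
final class LockTreeSketch {
    // Each string stands in for one lock znode, keyed by filesystem path.
    private final TreeSet<String> znodes = new TreeSet<>();

    // Locking a path creates its znode if it does not exist yet.
    void lock(String path) {
        znodes.add(path);
    }

    // Rename locks both src and dst. When cleanupSrc is false,
    // the znodes at and below src are left behind (the leak).
    void rename(String src, String dst, boolean cleanupSrc) {
        lock(src);
        lock(dst);
        if (cleanupSrc) {
            removeSubtree(src);
        }
    }

    // Removes the znode for root and every znode beneath it.
    void removeSubtree(String root) {
        znodes.removeIf(p -> p.equals(root) || p.startsWith(root + "/"));
    }

    int size() {
        return znodes.size();
    }

    // Runs n clone-like operations: stage under .tmp, then rename into data.
    static int znodesAfterClones(int n, boolean cleanupSrc) {
        LockTreeSketch tree = new LockTreeSketch();
        for (int i = 0; i < n; i++) {
            String tmp = "/hbase/.tmp/data/default/t" + i; // hypothetical staging path
            String dst = "/hbase/data/default/t" + i;      // hypothetical final path
            tree.lock(tmp + "/region0");                   // layout created under .tmp
            tree.rename(tmp, dst, cleanupSrc);
        }
        return tree.size();
    }

    public static void main(String[] args) {
        System.out.println("without src cleanup: " + znodesAfterClones(1000, false));
        System.out.println("with src cleanup:    " + znodesAfterClones(1000, true));
    }
}
```

In this model the count without cleanup grows by three entries per clone (two of them under .tmp), whereas with cleanup only the destination entry remains.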
As a workaround, HBase can be stopped and the ZK path corresponding to $hbase.rootdir/.tmp can be cleaned up. This reclaims about half of the space taken up by znodes for imported HBase tables (the znodes for $hbase.rootdir/data/... would still remain).
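The "about half" estimate can be sanity-checked with a small model. The sketch below is only an illustration under stated assumptions: WorkaroundSketch, its methods, and the /hbase/... paths are hypothetical, and it assumes each imported table left the same number of znodes under the .tmp prefix as under the data prefix. It operates on an in-memory set, not on a live ZooKeeper ensemble.

```java
import java.util.TreeSet;

// Toy model of the workaround's effect; paths and counts are illustrative only.
final class WorkaroundSketch {
    // Builds znode paths for `tables` imported tables with `regions` regions each,
    // mirrored under the staging (.tmp) and final (data) prefixes (an assumption).
    static TreeSet<String> leakedTree(int tables, int regions) {
        TreeSet<String> znodes = new TreeSet<>();
        for (int t = 0; t < tables; t++) {
            for (int r = 0; r < regions; r++) {
                znodes.add("/hbase/.tmp/data/default/t" + t + "/r" + r);
                znodes.add("/hbase/data/default/t" + t + "/r" + r);
            }
        }
        return znodes;
    }

    // Deletes every znode at or below `root` and returns how many were removed.
    static int deleteSubtree(TreeSet<String> znodes, String root) {
        int before = znodes.size();
        znodes.removeIf(p -> p.equals(root) || p.startsWith(root + "/"));
        return before - znodes.size();
    }

    public static void main(String[] args) {
        TreeSet<String> znodes = leakedTree(4, 5);
        int total = znodes.size();
        int removed = deleteSubtree(znodes, "/hbase/.tmp");
        System.out.println("removed " + removed + " of " + total + " znodes");
    }
}
```

Against a real ensemble the equivalent step would be a recursive delete of the .tmp subtree while HBase is stopped; the exact znode path depends on how HBOSS maps $hbase.rootdir into ZK and is not reproduced here.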
Issue Links
- relates to: HBASE-26461 [hboss] Delete self lock without orphaning znode (Open)
- Testing discovered: HBASE-26453 [hboss] removeInMemoryLocks can remove still in-use locks (Resolved)
- links to