Details
-
Improvement
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
2.4.0, 2.5.0, 2.6.0, 3.0.0-alpha-4, 4.0.0-alpha-1
-
None
-
Added a new property "hbase.normalizer.merge.merge_request_max_number_of_regions" to limit the maximum number of regions processed by a merge request in a single merge normalization. Defaults to 100.
Description
In our production environment, while investigating an issue, we observed that the Normalizer had scheduled a single merge procedure on an RS covering 27K+ empty regions of one table (the empty regions were left behind by a failed copy table job).
This caused the procedure to become stuck, and the procedure framework eventually bailed out after ~40 minutes. The same thing happened on every normalizer run until we deleted the table manually.
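To illustrate why a cap on regions per merge request helps here, the following is a minimal sketch of one way such a limit could work: an ordered run of adjacent mergeable regions is split into requests of at most a configured size. This is not the HBase normalizer code; the class `MergeBatchSketch` and the method `batch` are hypothetical names, and only the property name and the default of 100 come from the release note.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: NOT the actual HBase normalizer implementation.
class MergeBatchSketch {
  // Property name and default value as stated in the release note.
  static final String MAX_REGIONS_KEY =
      "hbase.normalizer.merge.merge_request_max_number_of_regions";
  static final int DEFAULT_MAX_REGIONS = 100;

  /**
   * Splits an ordered run of adjacent mergeable regions into merge requests
   * of at most maxRegions regions each. A trailing group holding a single
   * region is dropped, because a merge needs at least two regions.
   */
  static List<List<String>> batch(List<String> regions, int maxRegions) {
    if (maxRegions < 2) {
      throw new IllegalArgumentException("maxRegions must be at least 2");
    }
    List<List<String>> requests = new ArrayList<>();
    for (int start = 0; start < regions.size(); start += maxRegions) {
      int end = Math.min(start + maxRegions, regions.size());
      if (end - start >= 2) {
        requests.add(new ArrayList<>(regions.subList(start, end)));
      }
    }
    return requests;
  }

  public static void main(String[] args) {
    // Hypothetical stand-in for the 27K+ empty regions from the report.
    List<String> emptyRegions = new ArrayList<>();
    for (int i = 0; i < 27_000; i++) {
      emptyRegions.add("region-" + i);
    }
    List<List<String>> requests = batch(emptyRegions, DEFAULT_MAX_REGIONS);
    System.out.println(requests.size() + " merge requests, first has "
        + requests.get(0).size() + " regions");
  }
}
```

With 27,000 regions and the default limit of 100, this sketch produces 270 requests of 100 regions each, rather than one request naming every region.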
Logs
Normalizer triggers a merge procedure
normalizer.RegionNormalizerWorker - NormalizationTarget[regionInfo={ENCODED => 6e8606335a62f6bafceb017dc7edfdf5, NAME => 'TEST.TEST_TABLE,XXXX.', STARTKEY => 'XXXX', ENDKEY => 'YYYY'},regionSizeMb=0], NormalizationTarget[regionInfo={ENCODED => 79607df308d7618e632abe8a12c1bf6b, NAME => 'TEST.TEST_TABLE,XXXX', STARTKEY => 'XXYY', ENDKEY => 'YYZZ'},regionSizeMb=0]]] resulting in pid 21968356
Procedure immediately gets stuck
procedure2.ProcedureExecutor - Worker stuck PEWorker-56(pid=21968356), run time 12.4850 sec
Finally fails after ~40 mins
procedure2.ProcedureExecutor - Worker stuck PEWorker-56(pid=21968356), run time 40 mins, 58.055 sec
Bails out with RuntimeException
procedure2.ProcedureExecutor - force=false
java.lang.UnsupportedOperationException: pid=21968356, state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META, locked=true, exception=java.lang.RuntimeException via CODE-BUG: Uncaught runtime exception: pid=21968356, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META, locked=true; MergeTableRegionsProcedure table=TEST.TEST_TABLEXXXX, regions=[269a1b168af497cce9ba6d3d581568f2
.
.
.
.
27K+ regions printed here]
Issue Links
- causes
-
HBASE-28126 TestSimpleRegionNormalizer fails 100% of times on flaky dashboard
- Resolved
- links to