  1. HBase
  2. HBASE-28068

Add hbase.normalizer.merge.merge_request_max_number_of_regions property to limit max number of regions in a merge request for merge normalization

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 2.5.0, 2.6.0, 3.0.0-alpha-4, 4.0.0-alpha-1
    • Fix Version/s: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1
    • Component/s: Normalizer
    • Labels: None
    • Release Note: Added a new property "hbase.normalizer.merge.merge_request_max_number_of_regions" that limits the maximum number of regions included in a single merge request during merge normalization. Defaults to 100.

    Description

      While investigating an issue in our production environment, we observed that the Normalizer had scheduled a single merge procedure on an RS covering 27K+ empty regions of one table. The empty regions had been left behind by a failed copy table job.

      The procedure got stuck, and the procedure framework eventually bailed out after ~40 minutes. This recurred on every normalizer run until we deleted the table manually.
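
      To illustrate the kind of cap the new property introduces, here is a minimal Java sketch that splits a run of merge candidates into requests of bounded size. It is not the HBase normalizer code: the class `MergeRequestCapSketch` and the method `capMergeRequests` are hypothetical names, and dropping a trailing group of one region is an assumption of this sketch. Only the property name and the default of 100 come from this issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not the actual HBase normalizer implementation. */
final class MergeRequestCapSketch {
    // Property name and default value as stated in this issue's release note.
    static final String MAX_REGIONS_KEY =
        "hbase.normalizer.merge.merge_request_max_number_of_regions";
    static final int DEFAULT_MAX_REGIONS = 100;

    /**
     * Splits an ordered run of adjacent merge candidates into merge requests
     * holding at most maxRegions regions each. A trailing group with a single
     * region is dropped, because a merge needs at least two regions
     * (an assumption of this sketch).
     */
    static <T> List<List<T>> capMergeRequests(List<T> candidates, int maxRegions) {
        if (maxRegions < 2) {
            throw new IllegalArgumentException("a merge request needs at least 2 regions");
        }
        List<List<T>> requests = new ArrayList<>();
        for (int start = 0; start < candidates.size(); start += maxRegions) {
            int end = Math.min(start + maxRegions, candidates.size());
            if (end - start >= 2) {
                // Copy the slice so each request is independent of the input list.
                requests.add(new ArrayList<>(candidates.subList(start, end)));
            }
        }
        return requests;
    }
}
```

      Under these assumptions, 27,000 empty regions with the default limit would yield 270 merge requests of 100 regions each, rather than one request listing all of them.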

      Logs

      Normalizer triggers a merge procedure

      normalizer.RegionNormalizerWorker - NormalizationTarget[regionInfo={ENCODED => 6e8606335a62f6bafceb017dc7edfdf5, NAME => 'TEST.TEST_TABLE,XXXX.', STARTKEY => 'XXXX', ENDKEY => 'YYYY'},regionSizeMb=0], NormalizationTarget[regionInfo={ENCODED => 79607df308d7618e632abe8a12c1bf6b, NAME => 'TEST.TEST_TABLE,XXXX', STARTKEY => 'XXYY', ENDKEY => 'YYZZ'},regionSizeMb=0]]] resulting in pid 21968356

      Procedure immediately gets stuck

      procedure2.ProcedureExecutor - Worker stuck PEWorker-56(pid=21968356), run time 12.4850 sec

      Finally fails after ~40 mins

      procedure2.ProcedureExecutor - Worker stuck PEWorker-56(pid=21968356), run time 40 mins, 58.055 sec

      Bails out with RuntimeException

      procedure2.ProcedureExecutor - force=false
      java.lang.UnsupportedOperationException: pid=21968356, state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META, locked=true, exception=java.lang.RuntimeException via CODE-BUG: Uncaught runtime exception: pid=21968356, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META, locked=true; MergeTableRegionsProcedure table=TEST.TEST_TABLEXXXX, regions=[269a1b168af497cce9ba6d3d581568f2
      .
      .
      .
      .
      27K+ regions printed here]

          People

            Assignee: rkrahul324 Rahul Kumar
            Reporter: rvaleti Ravi Kishore Valeti