HBASE-24527 (project: HBase)

Improve region housekeeping status observability

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Admin, Compaction, Operability, shell, UI
    • Labels: None

    Description

      We provide a coarse-grained admin API and an associated shell command for determining the compaction status of a table:

      hbase(main):001:0> help "compaction_state"
      Here is some help for this command:
           Gets compaction status (MAJOR, MAJOR_AND_MINOR, MINOR, NONE) for a table:
           hbase> compaction_state 'ns1:t1'
           hbase> compaction_state 't1'
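To show why this view is coarse grained, here is a minimal Python sketch, assuming the table-level value is a simple fold over per-region compaction states. The enum and function names are hypothetical and are not HBase's Admin API; the point is that which regions are compacting, and why, is lost in the single result.

```python
from enum import Enum
from typing import Iterable


class CompactionState(Enum):
    # Mirrors the four values printed by the compaction_state shell command.
    NONE = "NONE"
    MINOR = "MINOR"
    MAJOR = "MAJOR"
    MAJOR_AND_MINOR = "MAJOR_AND_MINOR"


def table_compaction_state(region_states: Iterable[CompactionState]) -> CompactionState:
    """Fold per-region states into one coarse table-level state (hypothetical)."""
    saw_major = saw_minor = False
    for state in region_states:
        if state in (CompactionState.MAJOR, CompactionState.MAJOR_AND_MINOR):
            saw_major = True
        if state in (CompactionState.MINOR, CompactionState.MAJOR_AND_MINOR):
            saw_minor = True
    # Only the presence of each kind survives; region identities are discarded.
    if saw_major and saw_minor:
        return CompactionState.MAJOR_AND_MINOR
    if saw_major:
        return CompactionState.MAJOR
    if saw_minor:
        return CompactionState.MINOR
    return CompactionState.NONE
```

For example, a table with one region in a major compaction and another in a minor compaction reports MAJOR_AND_MINOR, with nothing about engine, policy, or queue position.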
      

      We also log compaction activity, including a compaction journal at completion, via log4j to whatever log aggregation solution is available in production.

      This is not sufficient for online, interactive observation, debugging, or performance analysis of current compaction activity, where an operator is trying to observe and analyze compaction activity in real time. Log aggregation and presentation solutions have typical end-to-end latencies (log lines become visible on the order of minutes), which makes this impossible today.

      We don't offer any API or tools for directly interrogating split and merge activity in real time. Some indirect knowledge of split or merge activity can be inferred from regions-in-transition (RIT) information via ClusterStatus. It can also be scraped, with some difficulty, from the debug servlet.
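A minimal sketch of the indirect RIT-based inference, assuming an RIT snapshot can be reduced to (region, table, state) entries. The state names are modeled on HBase region states related to splits and merges, but the set used here, the data class, and the function are illustrative assumptions, not the ClusterStatus API.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed state names for illustration; modeled on split/merge region states.
SPLIT_STATES = {"SPLITTING", "SPLITTING_NEW"}
MERGE_STATES = {"MERGING", "MERGING_NEW"}


@dataclass(frozen=True)
class RegionInTransition:
    region: str  # encoded region name (hypothetical values in the example)
    table: str
    state: str


def infer_split_merge(rit: List[RegionInTransition], table: str) -> Dict[str, List[str]]:
    """Indirectly infer split/merge activity for one table from an RIT snapshot."""
    result: Dict[str, List[str]] = {"split": [], "merge": []}
    for entry in rit:
        if entry.table != table:
            continue
        if entry.state in SPLIT_STATES:
            result["split"].append(entry.region)
        elif entry.state in MERGE_STATES:
            result["merge"].append(entry.region)
    return result
```

This only sees regions that are already in transition; it says nothing about which split or merge policy is in effect or why a region was or was not chosen.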

      We should have new APIs and shell commands, and perhaps also new admin UI views, for

      at regionserver scope:

      • listing the current state of a regionserver's compaction, split, and merge tasks and threads
      • counting (simple view) and listing (detailed view) a regionserver's compaction queues
      • listing a region's currently compacting, splitting, or merging status
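One possible shape for the regionserver-scope views, as a minimal sketch: every class, field, and queue name here is hypothetical and only illustrates the three operations listed above (task listing, queue counts versus queue details, and per-region status).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional


@dataclass
class HousekeepingTask:
    kind: str        # "compaction", "split", or "merge"
    region: str
    thread: str      # name of the worker thread running the task
    started_ms: int


@dataclass
class RegionServerHousekeeping:
    """Hypothetical per-regionserver view of running and queued housekeeping work."""
    running: List[HousekeepingTask] = field(default_factory=list)
    # Queue names are illustrative only, e.g. separate small and large compaction queues.
    queues: Dict[str, Deque[str]] = field(default_factory=dict)

    def list_tasks(self) -> List[HousekeepingTask]:
        return list(self.running)

    def queue_counts(self) -> Dict[str, int]:
        # Simple view: only the depth of each queue.
        return {name: len(q) for name, q in self.queues.items()}

    def queue_details(self) -> Dict[str, List[str]]:
        # Detailed view: the regions waiting in each queue, in order.
        return {name: list(q) for name, q in self.queues.items()}

    def region_status(self, region: str) -> Optional[str]:
        # Kind of task currently running for the region, or None if idle.
        for task in self.running:
            if task.region == region:
                return task.kind
        return None
```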

      at master scope, aggregations of the above detailed information into:

      • listing the active compaction tasks and threads for a given table, an extension of compaction_state with a new detailed view
      • listing the active split or merge tasks and threads for a given table's regions
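The master-scope aggregation could be as simple as grouping regionserver reports by table and task kind. A minimal sketch, assuming each regionserver reports flat (server, table, region, kind, thread) records; the record type and function are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TaskReport:
    server: str
    table: str
    region: str
    kind: str    # "compaction", "split", or "merge"
    thread: str


def aggregate_by_table(reports: List[TaskReport], table: str) -> Dict[str, List[TaskReport]]:
    """Group one table's active tasks by kind, across all regionservers."""
    grouped: Dict[str, List[TaskReport]] = defaultdict(list)
    for report in reports:
        if report.table == table:
            grouped[report.kind].append(report)
    # Kinds with no active tasks are simply absent from the result.
    return dict(grouped)
```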

      Compaction detail should include the names of the effective engine and policy classes, and the results and timestamp of the last compaction selection evaluation. Split and merge detail should include the names of the effective policy classes and the result of the last split or merge criteria evaluation.
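A sketch of what the detail records might carry, assuming plain immutable records. The field names are hypothetical; DefaultStoreEngine, ExploringCompactionPolicy, and SteppingSplitPolicy are existing HBase class names, used here only as sample values.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CompactionDetail:
    region: str
    engine_class: str                      # effective compaction engine class name
    policy_class: str                      # effective compaction policy class name
    last_selection_files: Tuple[str, ...]  # files chosen by the last selection; empty if none
    last_selection_ts_ms: Optional[int]    # when selection last ran; None if never

    def summary(self) -> str:
        when = "never" if self.last_selection_ts_ms is None else str(self.last_selection_ts_ms)
        return (f"{self.region}: engine={self.engine_class} policy={self.policy_class} "
                f"selected={len(self.last_selection_files)} files at {when}")


@dataclass(frozen=True)
class SplitMergeDetail:
    region: str
    policy_class: str             # effective split or merge policy class name
    last_evaluation_result: bool  # did the last criteria evaluation call for action?
    last_evaluation_reason: str
```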


        Activity


          apurtell Andrew Kyle Purtell added a comment -

          busbey, FYI: companion to HBASE-24528, but for another aspect of operations needing online tools (IMHO)
          busbey Sean Busbey added a comment -

          subscribed; thanks for the heads-up

          vjasani Viraj Jasani added a comment -

          at regionserver scope:

          • listing the current state of a regionserver's compaction, split, and merge tasks and threads
          • counting (simple view) and listing (detailed view) a regionserver's compaction queues
          • listing a region's currently compacting, splitting, or merging status

          at master scope, aggregations of the above detailed information into:

          • listing the active compaction tasks and threads for a given table, the extension of compaction_state with a new detailed view
          • listing the active split or merge tasks and threads for a given table's regions

          Among the scopes listed here, from an operator's viewpoint, the master scope seems more relevant, because usually we want to know what is going on with the regions of the table we are interested in.

          For the regionserver scope, if we store all region task and thread info at the regionserver, perhaps we should not allow a client to query all RSes and aggregate the results, because each RS might hold information about many region tasks; only one RS should be queried for a detailed view of a region at a time.

          The master scope can provide a table -> regions mapping (with each region's RS and current state), and the operator can then query a specific RS for a detailed view of a region. On the other hand, querying all RSes with a table/region filter might require too many RPC calls from the client (which the operator is likely to keep repeating until all regions reach their intended states). Hence, both of the above scopes, used together, might provide better results (likely with optimal performance).

          Thoughts?
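A minimal sketch of the combined approach described in this comment, assuming the master can return (region, server, state) tuples for a table and each regionserver can answer a detail request for a list of regions. All names and shapes are hypothetical. The number of detail calls is bounded by the servers that host regions of interest, not by cluster size.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (region, server, state) as the master might report it; the shape is an assumption.
RegionLocation = Tuple[str, str, str]


def plan_detail_queries(locations: List[RegionLocation],
                        regions_of_interest: List[str]) -> Dict[str, List[str]]:
    """Phase 1: use the master's table -> regions mapping to pick servers to ask."""
    wanted = set(regions_of_interest)
    per_server: Dict[str, List[str]] = defaultdict(list)
    for region, server, _state in locations:
        if region in wanted:
            per_server[server].append(region)
    return dict(per_server)


def fetch_details(plan: Dict[str, List[str]],
                  query_server: Callable[[str, List[str]], Dict[str, str]]) -> Dict[str, str]:
    """Phase 2: one call per server that actually hosts a region of interest."""
    details: Dict[str, str] = {}
    for server, regions in plan.items():
        details.update(query_server(server, regions))
    return details
```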



          apurtell Andrew Kyle Purtell added a comment -

          I ended up addressing this simply with the new shell support for status "tasks" and its backing support in ClusterStatus, which exposes a recent snapshot of per-server MonitoredTasks to the admin user. It's not a polished summary presentation, but that does not seem to be required. Reopen or file a follow-up if desired.
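An operator could post-filter such a task snapshot for housekeeping work. A minimal sketch, assuming the snapshot is reduced to a mapping of server name to (description, status) pairs; the shape, the keyword filter, and the sample descriptions are all assumptions, and real MonitoredTask descriptions may be worded differently.

```python
from typing import Dict, List, Tuple

# Assumed snapshot shape: server name -> list of (description, status) pairs.
TaskSnapshot = Dict[str, List[Tuple[str, str]]]

# Crude keyword match on task descriptions; an assumption, not an HBase convention.
HOUSEKEEPING_KEYWORDS = ("compact", "split", "merge")


def housekeeping_tasks(snapshot: TaskSnapshot) -> List[Tuple[str, str, str]]:
    """Return (server, description, status) for tasks that look like housekeeping."""
    matches = []
    for server in sorted(snapshot):
        for description, status in snapshot[server]:
            lowered = description.lower()
            if any(word in lowered for word in HOUSEKEEPING_KEYWORDS):
                matches.append((server, description, status))
    return matches
```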

          People

            Assignee: Unassigned
            Reporter: Andrew Kyle Purtell (apurtell)
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved: