Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: master, metrics, regionserver
    • Labels:
      None

      Description

      For debugging and analysis purposes it would be useful to understand a region's lifecycle: how it is created (from which parent region, for example), how it is split, assigned, etc. Some of this information is in the logs, the HBase .META. table, ZooKeeper, and metrics. Certain history data is lost; for example, the state is removed from ZooKeeper's /hbase/unassigned once the region is assigned, and the .META. table has a max version of 10, so it only tracks the last 10 RS assignments of a given region. It would be nice to put this history in a central place. It can show:

      1. How applications use HBase. For example, an application might create a large number of regions in a short period of time and drop the table later.
      2. How HBase internally manages regions, such as how regions are split, assigned, taken offline, etc.

      Things to track
      1. How a region is created, including its parent region in the case of a split.
      2. The region transition process, such as region state changes and region server changes.

      One idea is to put such transition history data in ZooKeeper. One issue is that it could blow up ZooKeeper memory if we have a large number of regions and the cluster runs for a long time. I would like to get your feedback on different approaches to address the issue. One assumption is that region assignment doesn't happen with high frequency, and thus the overhead introduced won't have much impact on system performance.

      Approach 1:

      ZooKeeper knows the history of how /hbase/unassigned is modified. If we can somehow get ZooKeeper's logs (BookKeeper?), we know the history of region transitions.

      Approach 2:

      1. HBase logs extra region transition data to zookeeper. It could be one zookeeper node per transaction.
      2. Have a separate thread on the Master move the data out of ZooKeeper and append it to HDFS. That will keep the ZooKeeper size in check.
      3. Have some tool or web UI to show the history of a given region by looking at zookeeper and HDFS.

        Activity

        Ming Ma created issue -
        stack added a comment -

        A long time ago we had a history column in .META. It tried to note each transition a region went through. The history of a region was kept in its row up in .META. It was a kinda nice feature. It was also a super pain at the same time. We had lots of issues around regionservers trying to update history in .META. though .META. was gone and we couldn't do full history; e.g. the close of a region on a cluster shutdown. There may have been deadlocks too around updating history while trying to do edits in .META. but my memory may not be serving me right here. In the end we stripped the feature out because it was more trouble than it was worth.

        That said, I think this would be good to have. The natural place to do this stuff would be in a table inside hbase I'd think. But then what to do if this table is not online or if we are shutting down the cluster and you want to log region close?

        Andrew Purtell added a comment -

        There may have been deadlocks too around updating history while trying to do edits in .META. but my memory may not be serving me right here

        Yes.

        The natural place to do this stuff would be in a table inside hbase I'd think.

        The mistake we made last time IMHO was making region historian updating synchronous with the transitions. If we instead log the transitions to a table in a background thread (executor?) with best effort, the result could be viable.
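The asynchronous, best-effort logging suggested above could look roughly like the following. This is only a minimal sketch under stated assumptions: `AsyncRegionHistorian`, its methods, and the event string format are hypothetical names made up for illustration, not HBase classes, and the `Consumer` stands in for whatever eventually writes to the history table.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical best-effort recorder; not an HBase class. */
final class AsyncRegionHistorian implements AutoCloseable {
    private final BlockingQueue<String> queue;
    private final Consumer<String> sink;          // assumed stand-in for the table writer
    private final AtomicLong dropped = new AtomicLong();
    private final Thread worker;
    private volatile boolean running = true;

    AsyncRegionHistorian(int capacity, Consumer<String> sink) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.sink = sink;
        this.worker = new Thread(this::drain, "region-historian");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Never blocks the transition path: if the queue is full, the event is dropped. */
    boolean record(String regionName, String fromState, String toState) {
        String event = System.currentTimeMillis() + " " + regionName + " "
                + fromState + "->" + toState;
        boolean accepted = queue.offer(event);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    long droppedCount() {
        return dropped.get();
    }

    /** Background loop: keeps draining until closed and the queue is empty. */
    private void drain() {
        while (running || !queue.isEmpty()) {
            try {
                String event = queue.poll(100, TimeUnit.MILLISECONDS);
                if (event != null) {
                    try {
                        sink.accept(event);
                    } catch (RuntimeException e) {
                        dropped.incrementAndGet(); // best effort: count the loss and move on
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /** Stops accepting work and waits up to five seconds for the queue to drain. */
    @Override
    public void close() {
        running = false;
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The point of the design is that `record` only does a non-blocking `offer`, so a slow or missing history table can cost history entries but should not stall or deadlock a region transition.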

        Ming Ma added a comment -

        Thanks, Stack, Andy. Writing the data to a "RegionHistory" table in HBase sounds like a good idea. The key point is to make it async, as Andy said, and to handle the situations where "RegionHistory" isn't available:

        1. Track the regions of "RegionHistory". When the regions of "RegionHistory" are moved around, the write to "RegionHistory" won't work.
        2. Track the regions of "ROOT" and ".META.". Ideally we would like to track all regions including those for "ROOT", ".META.". In the case of cluster startup, "RegionHistory" will be available after "ROOT", ".META.".

        So to make it work:

        1. Make the logging async.
        2. If we want to keep every entry even in the case of an error such as master failover, make the logging reliable. For example, persist the data to ZooKeeper or HDFS as a buffer when "RegionHistory" isn't available.
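The buffering idea in the two steps above can be sketched as follows. This is an illustration under assumptions, not a proposed implementation: `HistorySink` and `BufferingHistoryWriter` are hypothetical names, and the in-memory deque only stands in for the ZooKeeper or HDFS buffer. An in-memory buffer would not itself survive a master failover, so a real version would need a durable store behind the same logic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical destination for history entries, e.g. a "RegionHistory" table. */
interface HistorySink {
    void write(String entry) throws Exception;
}

/** Hypothetical writer that buffers entries while the primary sink is unavailable. */
final class BufferingHistoryWriter {
    private final HistorySink primary;
    // Assumption: stands in for a durable buffer such as ZooKeeper or HDFS.
    private final Deque<String> pending = new ArrayDeque<>();

    BufferingHistoryWriter(HistorySink primary) {
        this.primary = primary;
    }

    /** Queues the entry, then tries to push everything queued so far. */
    synchronized void write(String entry) {
        pending.addLast(entry);
        flush();
    }

    /** Replays buffered entries in order; stops at the first failure. */
    synchronized int flush() {
        int written = 0;
        while (!pending.isEmpty()) {
            try {
                primary.write(pending.peekFirst());
            } catch (Exception e) {
                break; // primary still unavailable; keep the entry for later
            }
            pending.removeFirst();
            written++;
        }
        return written;
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

Because an entry is removed from the buffer only after the primary write succeeds, entries are replayed in their original order once the table comes back, which matters for reading a region's history as a sequence of transitions.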

        We could also log it to another HBase cluster. But that would create operational overhead, unless it can be combined with other metrics or logging scenarios (like OpenTSDB).

        Todd Lipcon added a comment -

        This sounds useful, but seems best to put it behind an interface. I can imagine some people might want to just log to text files on HDFS for later analysis, or even to use an existing log4j-based infrastructure. Maybe the initial implementation could just use a separate log4j category?
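As a rough illustration of putting the history behind an interface with a logging-based first implementation, consider the sketch below. All names here (`RegionTransitionLog`, `LoggerRegionTransitionLog`, the category string, the line format) are hypothetical. To stay self-contained it uses `java.util.logging` as a stand-in for a dedicated log4j category; it is not the log4j API.

```java
import java.util.logging.Logger;

/** Hypothetical interface; other implementations could write to HDFS files or a table. */
interface RegionTransitionLog {
    void transition(String regionName, String fromState, String toState, String server);
}

/** Hypothetical implementation that logs each transition under its own logger name. */
final class LoggerRegionTransitionLog implements RegionTransitionLog {
    // Assumption: a made-up category name, analogous to a separate log4j category.
    static final String CATEGORY = "hbase.region.transitions";

    private final Logger log = Logger.getLogger(CATEGORY);

    /** One key=value line per transition so the output is easy to grep or parse later. */
    static String format(String regionName, String fromState, String toState, String server) {
        return "region=" + regionName + " from=" + fromState
                + " to=" + toState + " server=" + server;
    }

    @Override
    public void transition(String regionName, String fromState, String toState, String server) {
        log.info(format(regionName, fromState, toState, server));
    }
}
```

With a separate logger name, operators could route these lines to their own file or appender through ordinary logging configuration, without touching the callers of the interface.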

        Ming Ma added a comment -

        Thanks, Todd. Yes, an interface is a good way to abstract the various implementations.

        I was about to open a separate JIRA, "dynamic metrics logging", for a more general structured data logging infrastructure, something useful for collecting HBase/MapReduce/HDFS dynamic metrics which aren't predefined and could change over time. It seems like "region transition history" could be an application of that system.


          People

          • Assignee:
            Ming Ma
            Reporter:
            Ming Ma
          • Votes:
            0
            Watchers:
            3
