Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      A summarized block-cache report for a RegionServer would be helpful. For example ...

      table1
      cf1 100 blocks, totalBytes=yyyyy, averageTimeInCache=XXXX hours
      cf2 200 blocks, totalBytes=zzzzz, averageTimeInCache=XXXX hours

      table2
      cf1 75 blocks, totalBytes=yyyyy, averageTimeInCache=XXXX hours
      cf2 150 blocks, totalBytes=zzzzz, averageTimeInCache=XXXX hours

      ... Etc.

      The current metrics list blockCacheSize and blockCacheFree, but there is no way to know what's in there. Any single block isn't really important, but the patterns of what CF/Table they came from, how big are they, and how long (on average) they've been in the cache, are important.

      No such interface exists in HRegionInterface. But I think it would be helpful from an operational perspective.

      Updated (7-29): Removing suggestion for UI. I would be happy just to get this report on a configured interval dumped to a log file.
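The rollup described above can be sketched in plain Java. This is an illustration only; the `Entry` record below is a hypothetical stand-in for the cache's real value class, not HBase code.

```java
import java.util.*;

// Sketch of the per-table/CF rollup described above. "Entry" is a
// hypothetical stand-in for the cache's real value class.
public class BlockCacheReportSketch {
    // Which table/CF the block came from, its size, and its age in cache (ms).
    record Entry(String table, String cf, long bytes, long ageMs) {}

    record Stat(int blocks, long totalBytes, long totalAgeMs) {
        Stat add(Entry e) {
            return new Stat(blocks + 1, totalBytes + e.bytes(), totalAgeMs + e.ageMs());
        }
        long avgAgeMs() { return blocks == 0 ? 0 : totalAgeMs / blocks; }
    }

    // Roll entries up to table -> cf -> Stat, sorted for stable report output.
    static Map<String, Map<String, Stat>> summarize(List<Entry> entries) {
        Map<String, Map<String, Stat>> out = new TreeMap<>();
        for (Entry e : entries) {
            out.computeIfAbsent(e.table(), t -> new TreeMap<>())
               .merge(e.cf(), new Stat(0, 0, 0).add(e), (a, b) -> a.add(e));
        }
        return out;
    }
}
```

Printing the nested map per table, then per CF, yields exactly the report shape shown above.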

        1. blockCache summary - web UI (Sub-task, Reopened, Unassigned)
       

        Activity

        Doug Meil added a comment -

        Unless somebody already did the UI showing the details of the block cache report (by table, etc.) then it's not done.

        Lars Hofhansl added a comment -

        Maybe I closed the sub task in error. Looking at the Web UI, I can see block cache statistics. But this is different. Should I reopen?

        Doug Meil added a comment -

        Realistically, I don't think I can do the front-end work.

        Doug Meil added a comment -

        The sub-task HBASE-4200 was closed as a dup, what was the ticket that implemented that UI?

        Lars Hofhansl added a comment -

        Doug, are you planning on finishing this?

        Doug Meil added a comment -

        Sure thing. I didn't want it in the loop obviously, but it didn't seem like it was important enough to be permanently in memory in the block cache (although it's pretty small).

        Jean-Daniel Cryans added a comment -

        Move this out of the method:

        final String pattern = "\\" + HFile.CACHE_KEY_SEPARATOR;
        
        Doug Meil added a comment -

        Thanks! I'll make the changes.

        re: "default constructor"
        Yep, you said that already. Doh! Although in my doh-fense I hadn't gotten to the RS-API yet.

        re: "Shouldn't the pattern in getBlockCacheSummary be a member of the class instead? (and still final)"
        I'm not sure I know what you mean...

        Jean-Daniel Cryans added a comment -

        Some comments on the patch:

        • Writables need a default constructor! See my 01/Aug/11 18:03 comment
        • About the style, I'd prefer you don't put extra white spaces like in this snippet:
          arg0.writeUTF( table );
          
        • Also pay attention to those lines > 80 chars.
        • Shouldn't the pattern in getBlockCacheSummary be a member of the class instead? (and still final)
        • Try to set the size of containers when you already know it, like at the end of getBlockCacheSummary
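The style point about the split pattern can be shown with a small self-contained sketch: hoist the escaped separator into a static final field so it is built once rather than on every call. The names here are hypothetical; HBase's real separator lives in `HFile.CACHE_KEY_SEPARATOR`.

```java
// Sketch of the review suggestion above: the escaped separator pattern is a
// static final member instead of a local in getBlockCacheSummary(), so it is
// constructed once. Names are hypothetical stand-ins for the HBase ones.
public class CacheKeySketch {
    static final String CACHE_KEY_SEPARATOR = "_";
    // Built once; a backslash before a non-alphabetic char is a literal match
    // in Java regexes, so this splits on the separator character itself.
    static final String SPLIT_PATTERN = "\\" + CACHE_KEY_SEPARATOR;

    // Split "hfileName_offset" back into its two parts.
    static String[] splitCacheKey(String key) {
        return key.split(SPLIT_PATTERN);
    }
}
```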
        Doug Meil added a comment -

        Yep, I logged HBASE-4147 and did that writeup too. Summarizing and periodically outputting data is a theme. I think that writing these summaries back into an hbase table is a good idea. I personally don't think this kind of data fits well into what are currently called 'metrics', which are much higher level.

        Ming Ma added a comment -

        useful doc, Doug.

        It seems like this one, https://issues.apache.org/jira/browse/HBASE-4147, and https://issues.apache.org/jira/browse/HBASE-4145 need some common infrastructure to log and analyze structured data.

        1. RS Web UI is useful. But that only provides the most recent value.

        2. As you mentioned in the doc, we can create a static metric for each combination of table and CF. That could end up with lots of metrics. Might not be ideal.

        3. How we plan to analyze the data is an important factor for the design.
        a. Is there a latency requirement? In a production system, it is better to get these reports sooner rather than later.
        b. Is it easy to query and analyze the data (e.g., aggregate, max, etc.)?

        4. Some ideas along the line of custom output
        a. Can the log data be asynchronously uploaded to a special table in hbase? It might be a bit strange to upload data back to hbase. However, for performance, we can partition the special table into regions so that each region is colocated on the same RS where the log is generated; no automatic compaction, split, load balancing on the table.
        b. Upload the log to HDFS periodically. Run map reduce jobs to mine the data with a customized inputformat. This might be ok if there is no strong latency requirement.

        Doug Meil added a comment -

        Hey JD/Stack, can you guys sniff this patch? This is only a checkpoint - not the final product - but it actually works (I have a hacked-up unit test that isn't in this patch). What I hope is the hard part (i.e., the summarization) is done; now it needs to be added to the RS API.

        Doug Meil added a comment -

        Ahhh.. ok, I get it. This... "8351478435190657655_0" ... makes sense now.

        Since the first part is a StoreFile, I can use directory paths to figure out which CF, which region, and which table this StoreFile belongs to because the directory structure is /table/region/cf/storefile. I'll basically construct a Map where the key is the hfileName (aka StoreFile) and the value is an object that contains table/cf (since that is the level that the report needs to roll up to).

        I'm currently not aware of any utility that has this kind of lookup, but I think I now understand how to build it.
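The lookup described above can be sketched with plain path handling. This is a rough illustration under the stated /table/region/cf/storefile layout; class and method names are hypothetical.

```java
import java.util.*;

// Sketch of the reverse lookup described above: given store-file paths laid
// out as /table/region/cf/storefile, build hfileName -> "table/cf" so cached
// block names can be rolled up to the table/CF level. Names are hypothetical.
public class StoreFileLookupSketch {
    static Map<String, String> buildLookup(List<String> storeFilePaths) {
        Map<String, String> byHfileName = new HashMap<>();
        for (String p : storeFilePaths) {
            String[] parts = p.split("/");  // ["", table, region, cf, storefile]
            String table = parts[parts.length - 4];
            String cf = parts[parts.length - 2];
            String hfileName = parts[parts.length - 1];
            byHfileName.put(hfileName, table + "/" + cf);
        }
        return byHfileName;
    }
}
```

A cached block name like "8351478435190657655_0" would first be split on the cache-key separator to recover the hfileName before consulting this map.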

        stack added a comment -

        I'd say the name just has to be guaranteed unique... short would be nice too.

        Currently name is made in here for both v1 and v2 hfiles:

          public static String getBlockCacheKey(String hfileName, long offset) {
            return hfileName + CACHE_KEY_SEPARATOR + offset;
          }
        

        The hfilename seems to depend on the fact storefile names are unique across hbase. They are made here using the 'name' part of the full storefile Path (from the AbstractHFileReader constructor):

        this.name = path.getName();

        Doug Meil added a comment -

        Let me back up a bit on this... what is the contract on the 'name' attribute in CachedBlock (the value class of the internal block cache map)? What is that supposed to be?

        stack added a comment -

        Tell us more about this reverse lookup... what's that going to look like? Are you sure what you are seeing is not a filename plus an offset?

        Doug Meil added a comment -

        Hmmm... I started implementation, and from running other unit tests and gathering more information on blockNames in the cache, here's what they look like now: "8351478435190657655_0". That looks a lot more like a "block" than what I had documented in the writeup, but unfortunately I have nothing else to go on. Based on what I see, I need to do a reverse lookup in the catalog for the containing StoreFile (to hopefully get the full path, where I can get table/CF).

        Anybody know any easy way to do that?

        Jean-Daniel Cryans added a comment -

        +1

        Doug Meil added a comment -

        Ok, how about this for the next course of action:

        Subtask #1: implement basic summary in BlockCache/LruBlockCache, RS-API change, web-UI.

        Subtask #2: this is the "output to X" part of the ticket.

        I'll start with #1, hold off on #2 for now.

        Does that sound reasonable?

        Andrew Purtell added a comment -

        Does that mean that the BlockCacheSummary returned from BlockCache should implement Writable, or is there another class representing the BlockCacheSummary that implements Writable with the same information?

        Objects sent over RPC implement Writable directly, so by that convention BlockCacheSummary should implement Writable.
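The convention can be sketched without the Hadoop dependency by using java.io's DataInput/DataOutput directly. The field set below is hypothetical; the real class would implement org.apache.hadoop.io.Writable.

```java
import java.io.*;

// Sketch of the Writable convention discussed above, using java.io only so it
// stands alone. The real class would implement org.apache.hadoop.io.Writable;
// the fields here are hypothetical.
public class BlockCacheSummaryEntry {
    private String table;
    private String cf;
    private int blocks;

    // Hadoop RPC needs a no-arg constructor: it instantiates the object
    // first, then populates it via readFields().
    public BlockCacheSummaryEntry() {}

    public BlockCacheSummaryEntry(String table, String cf, int blocks) {
        this.table = table; this.cf = cf; this.blocks = blocks;
    }

    // Serialize fields in a fixed order...
    public void write(DataOutput out) throws IOException {
        out.writeUTF(table);
        out.writeUTF(cf);
        out.writeInt(blocks);
    }

    // ...and read them back in exactly the same order.
    public void readFields(DataInput in) throws IOException {
        table = in.readUTF();
        cf = in.readUTF();
        blocks = in.readInt();
    }

    public String getTable() { return table; }
    public String getCf() { return cf; }
    public int getBlocks() { return blocks; }
}
```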

        Doug Meil added a comment -

        Thanks JD. I included JMX/Ganglia/et al. because they came up as suggestions on the dist-list, but I really didn't see how they would fit with this type of usage reporting. I'm glad you came to the same conclusion!

        Jean-Daniel Cryans added a comment -

        Nice document Doug, it puts everyone else to shame

        I don't think we can expose those metrics through JMX/Ganglia/OpenTSDB as they will be changing a lot. It would be "doable" only if the regions and families never changed IMO. I'd prefer we concentrate on presenting this information from inside HBase.

        In the nice to haves I'd like to see:

        • Number of accesses/misses per block or family (could see what's hot, well cached, etc)
        • Total size of the family on disk (then you can tell what portion of the dataset you cached)

        Regarding the Writable question, you have to do that because it's required by Hadoop RPC. Since you are adding new info, you'll have to implement it. Don't forget the default constructor!

        For the web UI, what about making the region name clickable?

        Doug Meil made changes -
        Attachment hbase_4089_blockcachereport.pdf [ 12488408 ]
        Doug Meil added a comment -

        Adding writeup of use-cases and 1st-pass general design.

        Doug Meil added a comment -

        If this approach is acceptable, probably should add this to the BlockCache interface. This is how the block cache is accessed.

        Doug Meil added a comment -

        Regarding dumping the summary report to the log, I think exposing a public 'printSummary' (logSummary?) method on LruBlockCache would do it. Another thread can take care of scheduling how often the block cache summary should be run.
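The scheduling half of that idea can be sketched with a ScheduledExecutorService; the interval and the hook passed in are hypothetical stand-ins for the real logSummary method.

```java
import java.util.concurrent.*;

// Sketch of the scheduling thread described above: a daemon thread invokes a
// summary hook on a fixed interval. The Runnable stands in for a call to the
// hypothetical LruBlockCache.logSummary().
public class SummaryLoggerSketch {
    public static ScheduledExecutorService start(Runnable logSummary, long intervalSeconds) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "block-cache-summary");
            t.setDaemon(true);  // don't keep the RegionServer alive for this
            return t;
        });
        // First run immediately, then every intervalSeconds thereafter.
        exec.scheduleAtFixedRate(logSummary, 0, intervalSeconds, TimeUnit.SECONDS);
        return exec;
    }
}
```

The interval itself would presumably come from configuration, matching the "configured interval" note in the description.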


        Doug Meil made changes -
        Summary: "UI for blockCache contents report" → "blockCache contents report"
        Doug Meil created issue -

          People

          • Assignee: Doug Meil
          • Reporter: Doug Meil
          • Votes: 0
          • Watchers: 5

          Dates

          • Created:
          • Updated:

          Development