HBase / HBASE-4818

HBase Shell - Add support for formatting row keys before output

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: shell
    • Labels: None

      Description

      Many HBase users use binary row keys rather than strings to optimize memory consumption, so displaying an escaped string in the HBase shell isn't useful (and takes a lot of screen space).
      Allowing users to provide a row key formatter as part of the scan/get commands would let developers display the row key in a way that makes sense for them.

      Example:
      scan 'stats', { ROWFORMATTER => MyRowFormatter.new }

      The row formatter simply gets the key as a byte array and formats it to a string.
      It's an easy change to make with simple monkey-patching of the shell commands, but I would be happy to see it as part of the shell itself.
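
      A minimal sketch of what such a formatter could look like. It assumes, hypothetically, that the shell would call a `format` method with the raw key bytes; the class name and method name are illustrative and not an existing shell API.

```ruby
# Hypothetical row key formatter: the class name and the `format` method
# are assumptions for illustration, not part of the HBase shell.
class MyRowFormatter
  # bytes: the raw row key as an array of integers, or anything that
  # responds to #to_a and yields integers.
  def format(bytes)
    # Mask to 0..255 so signed Java bytes (-128..127) also print correctly.
    bytes.to_a.map { |b| '%02x' % (b & 0xff) }.join
  end
end

formatter = MyRowFormatter.new
puts formatter.format([0x00, 0x1f, 0xab, -1])  # prints "001fabff"
```

      A hex dump is only one choice; the same method could decode a timestamp or a composite key instead.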

      1. hbase-4818.patch
        3 kB
        Ben West
      2. format3.patch
        14 kB
        Ben West

        Activity

        Lars George added a comment -

        I would also like to see this persisted then, i.e. in a simple text property file, or in the .irbrc, where you can define this per table so that these classes are loaded implicitly.

        Eran Kampf added a comment -

        That's a good idea!
        A simple global hash that maps a table name to a row key formatter, and then all operations on the table use that formatter unless explicitly given one.
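
        The lookup described here could be sketched as follows. All names (`ROW_FORMATTERS`, `DEFAULT_FORMATTER`, `formatter_for`) are hypothetical, and formatters are modelled as plain lambdas to keep the example self-contained.

```ruby
# Hypothetical per-table formatter registry; every name here is illustrative.
DEFAULT_FORMATTER = ->(key) { key.inspect }

# Maps a table name to a formatter (anything responding to #call).
ROW_FORMATTERS = {
  'stats' => ->(key) { key.unpack('H*').first } # hex dump of the key bytes
}

# An explicitly supplied formatter wins; otherwise use the table's entry,
# and fall back to the default when the table has none.
def formatter_for(table, explicit = nil)
  explicit || ROW_FORMATTERS.fetch(table, DEFAULT_FORMATTER)
end

key = "\x00\x01\xff".b
puts formatter_for('stats').call(key)                            # table-level formatter
puts formatter_for('other').call(key)                            # default formatter
puts formatter_for('stats', ->(k) { k.bytesize.to_s }).call(key) # explicit override
```

        Such a hash could live in .irbrc, as suggested above, so it is loaded whenever the shell starts.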

        Ben West added a comment -

        Attaching a patch which includes:

        1. The ability to specify a custom formatter on the command line
        2. A sample custom formatter which reverses the keys before printing them in a scan

        You can use the new formatter by doing

        hbase shell --format=Shell::Formatter::ReverseID.new
        

        We have an existing shell variable (JRUBY_OPTS) which you can set in your config script to persist your options, as Lars suggested. I'm not sure how to implement Eran's suggestion of per-table formatters using the command line; maybe we should deprecate the command line option since it doesn't do anything anyway and store this in .irbrc.

        Also, the reverse ID formatter works by a kind of hack.

        I'd like to hear from people more familiar with the shell on how to make this better.

        Todd Lipcon added a comment -

        I think this should be a table property, and refer to a Java class name, rather than doing it in Ruby. Doing it in Ruby only helps with the shell, but doing it in Java means we can also use it in the UIs, etc. ACCUMULO-303 is helpful reference material.

        Ben West added a comment -

        Todd: I think since we're using JRuby the formatter can be a java class, right? You'd just have --format=org.apache....

        But I guess we could store it as a table property.

        (Btw: if the formatters are to be useful outside of the shell, we'll need a revamp of how they work. Right now, a formatter just formats text without much knowledge of what the text is. We'd probably want to have FormatKey(), FormatColumn(), etc. methods, which is a good idea anyway.)
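
        One way to picture the per-kind methods mentioned here, sketched in Ruby to match the shell. The class and method names (`BaseFormatter`, `format_key`, `format_column`, `format_value`) are hypothetical and are not taken from either patch.

```ruby
# Hypothetical formatter interface with one method per kind of data.
class BaseFormatter
  def format_key(key)
    escape(key)
  end

  def format_column(family, qualifier)
    "#{escape(family)}:#{escape(qualifier)}"
  end

  def format_value(value)
    escape(value)
  end

  private

  # Printable ASCII passes through; every other byte becomes \xNN.
  def escape(str)
    str.bytes.map { |b| (32..126).cover?(b) ? b.chr : format('\\x%02X', b) }.join
  end
end

# A custom formatter overrides only the piece it cares about.
class HexKeyFormatter < BaseFormatter
  def format_key(key)
    key.unpack('H*').first
  end
end

f = HexKeyFormatter.new
puts f.format_key("\x00A".b)          # key shown as hex
puts f.format_column('cf', "q\x01".b) # column still uses the default escaping
```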

        Ben West added a comment -

        New patch is a lot cleaner. It moves some formatting from table.rb to HTableFormatter.java like Todd suggested, so it can be used elsewhere.

        There is also scope creep: it parses input as well as formats output (so if you do a get it will translate the rowkey into an internal format first). This is just because it made my head hurt to have the output of scans be one format but the input another.
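
        The input/output symmetry described here can be illustrated with a two-way formatter. This is a sketch under assumptions: `ReverseKeyFormatter`, `format`, and `parse` are hypothetical names, not the API of HTableFormatter.java.

```ruby
# Hypothetical two-way formatter; names are illustrative, not the patch's API.
class ReverseKeyFormatter
  # Stored key -> what the user sees in scan output.
  def format(stored_key)
    stored_key.reverse
  end

  # What the user types in a get -> the stored key to look up.
  def parse(displayed_key)
    displayed_key.reverse
  end
end

f = ReverseKeyFormatter.new
stored = f.parse('user123') # the key a get would actually look up
puts stored                 # prints "321resu"
puts f.format(stored)       # prints "user123"
```

        Keeping `parse` the inverse of `format` means a key copied from scan output can be pasted straight into a get.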

        Right now there is only one formatter which is set via a shell param, but could be set at a table level - just wasn't sure if putting it in .irbrc was best or if there was a way we could do it in Java so non-shell would work too. Todd said to make it a "table property", but I don't know what this means.

        stack added a comment -

        Patch looks nice, Ben. Do you have illustrations of it in action? It looks like it keeps default behavior. The default htformatter just does the bytes thing we currently have and does the formatting of HRegionInfo cells, such as happens up in .META.

        What would it take to have this htformatter work in the UI too, as per Todd's suggestion?

        Ben West added a comment -

        The ReverseIDFormatter in that patch overrides the default formatter to display row keys in reverse order.

        Something which we will have to think about is how we can maintain usability with these new formatters. Scans, for example, might not go in the order the user predicts because the stored format is different from the displayed one. Similarly with where regions split and so forth. Maybe we should require sort order to be constant across formatted and unformatted row keys (which would make the ReverseIDFormatter and probably most formatters impossible).
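
        A small illustration of the ordering concern, using made-up keys and a reversing formatter like the one discussed in this thread:

```ruby
# Rows come back in byte order of the stored keys (sample keys are made up).
stored = %w[01a 02c 03b].sort

# What a reversing formatter would display for each row, in scan order.
displayed = stored.map(&:reverse)

puts displayed.inspect # prints ["a10", "c20", "b30"]
if displayed == displayed.sort
  puts 'display order matches scan order'
else
  puts 'display order differs from scan order'
end
```

        The scan is still correct, but the displayed keys no longer look sorted, which is the usability problem raised above.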

        I'm not super familiar with the web UI, but it looks like the only spots where we display row keys are where we show the start and end rows of each region, and where we issue splits/compactions. So that shouldn't be too bad to change.

        Ben West added a comment -

        I can work on adding this to the web UI if someone can suggest a place to store the formatter preference.

        Should it just be in hbase-site.xml?

        Andrew Purtell added a comment -

        Stale issue. Reopen if still relevant.


          People

          • Assignee: Unassigned
          • Reporter: Eran Kampf
          • Votes: 0
          • Watchers: 4

          Time Tracking

          • Original Estimate: 24h
          • Remaining Estimate: 24h
          • Time Spent: Not Specified
