HBASE-487

Replace hql w/ a hbase-friendly jirb or jython shell

    Details

    • Type: Wish
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: None
    • Labels:
      None

      Description

      The hbase shell is a useful admin and debugging tool but it has a couple of downsides. To extend, a fragile parser definition needs tinkering-with and new java classes must be added. The current test suite for hql is lacking coverage and the current code could do with a rewrite having evolved piecemeal. Another downside is that the presence of an HQL interpreter gives the mis-impression that hbase is like a SQL database.

      This 'wish' issue suggests that we jettison HQL and instead offer users a jirb or jython command line. We'd ship with some scripts and jruby/jython classes that we'd source on startup to do things like import base client classes – so folks wouldn't have to remember which packages stuff sits in – and add a pretty-printer for scanners and getters outputting text, xhtml or binary. They would also make it easy to do HQL-like things in jruby/python script.

      Advantages: An already-written parser that needs no extension to probe deeper into hbase, i.e. better for debugging than HQL could ever be. Easy extension by adding scripts/modules rather than java code. Less likely that hbase could be confused for a SQL db.

      Downsides: Probably more verbose. Requires ruby or python knowledge ("Everyone knows some sql"). Big? (jruby lib is 24M).

      I was going to write security as downside but HQL suffers this at the moment too – though it has been possible to sort the updates from the selects in the UI to prevent modification of the db from the UI, something that would be hard to do in a jruby/jython parser.

      What do others think?

      1. 487-added-formatter.patch (4 kB, stack)
      2. better-hirb.patch (3 kB, Bryan Duxbury)
      3. module_in_bin.patch (1 kB, stack)
      4. rb.patch (2 kB, stack)
      5. jruby.patch (1 kB, stack)
      6. groovy-2.patch (34 kB, stack)
      7. groovy.patch (3 kB, stack)

          Activity

          Edward J. Yoon added a comment -

          -1. Can't the above downsides be fixed?

          Bryan Duxbury added a comment -

          @Edward - I don't think that a custom-written parser will ever give us the flexibility to do crazy debugging and tinkering like a JRuby shell would.

          As far as the point that it requires some ruby/python knowledge to use, I don't think that's true. At least in ruby, we could make the syntax very simple, while still allowing access to the deeper pure-ruby stuff for those who know it's there.

          For instance, we could make a get method like so:

          > get "row name", "fam:col", "fam2:col2"
          

          It doesn't look like SQL, but it has the same functionality, and you don't really have to know ruby to use it.
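
A rough sketch of how such a paren-free `get` could be defined as an ordinary Ruby method. The in-memory `TABLE` hash is a hypothetical stand-in for a real HBase table, and no HBase client calls are shown; the only point is that the command-like syntax is plain Ruby.

```ruby
# Hypothetical in-memory stand-in for one HBase table: row => { column => value }.
TABLE = {
  "row name" => { "fam:col" => "a", "fam2:col2" => "b", "fam3:col3" => "c" }
}

# Return the requested columns of a row; with no columns given, return the whole row.
def get(row, *columns)
  cells = TABLE.fetch(row, {})
  columns.empty? ? cells : cells.select { |name, _| columns.include?(name) }
end

# Ruby allows the call without parentheses, so this reads like a shell command.
result = get "row name", "fam:col", "fam2:col2"
p result
```

Anyone who knows Ruby can still assign the result to a variable or post-process it, which is the "deeper" access mentioned above.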

          Edward J. Yoon added a comment -

          Hmm. I see, and I agree about flexibility.
          Then I'd like to move the current HQL features to the HRdfStore project (http://wiki.apache.org/incubator/HRdfStoreProposal).
          It makes more sense for us, and the HQL features are a good fit for the HRdfStore project.

          Would you sponsor me and the HRdfStore project?

          stack added a comment -

          jython is easy enough to integrate. What about the fact that you have to tab in the shell when you want to write a little loop, or that long lines can't be continued on the next line with a '\'? Does this make it unsuitable?

          Looking at integrating jirb now. It's more of a PITA to integrate, but no need of parens and being more free-form would be more 'shell-like'.

          What else should we be looking at? Beanshell would be a sort of middle ground; everyone would have to learn its syntax, you can do anything you could in jython/jirb, and it's easy to integrate.

          Was thinking that ./bin/hbase shell would put up the new shell, not HQL. There would be a banner which said run './bin/hbase hql' to get the old, deprecated shell.

          stack added a comment -

          Patch to get groovy into hbase. Currently type "./bin/hbase groovy" to make it work.

          Jars to include are about 2.7M

          Can use the groovy config. to preload the shell w/ hbase helper scripts and methods

          Should do our own shell Main so we use the later commons-cli, the one we include. That'd cut down on having to import one more jar, and we wouldn't show 'inspect' as a help option (I'm running groovy headless, but maybe we want AWT and to be able to inspect objects to see their API, etc.?)

          Here is a sample:

          durruti:~/Documents/checkouts/hbase/trunk stack$ ./bin/hbase groovy
          Groovy Shell (1.5.4, JVM: 1.5.0_13-121)
          Type 'help' or '\h' for help.
          --------------------------------------------------------------------------------
          groovy:000> help
          
          For information about Groovy, visit:
              http://groovy.codehaus.org 
          
          Available commands:
            help     (\h) Display this help message
            ?        (\?) Alias to: help
            exit     (\x) Exit the shell
            quit     (\q) Alias to: exit
            import   (\i) Import a class into the namespace
            display  (\d) Display the current buffer
            clear    (\c) Clear the buffer
            show     (\S) Show variables, classes or imports
            inspect  (\n) Inspect a variable or the last result with the GUI object browser
            purge    (\p) Purge variables, classes, imports or preferences
            edit     (\e) Edit the current buffer
            load     (\l) Load a file or URL into the buffer
            .        (\.) Alias to: load
            save     (\s) Save the current buffer to a file
            record   (\r) Record the current session to a file
            history  (\H) Display, manage and recall edit-line history
            alias    (\a) Create an alias
            set      (\=) Set (or list) preferences
          
          For help on a specific command type:
              help command 
          
          groovy:000> import org.apache.hadoop.hbase.client.HTable
          groovy:000> import org.apache.hadoop.hbase.HBaseConfiguration
          groovy:000> c = new HBaseConfiguration() 
          ...
          
          stack added a comment -

          jruby and beanshell have forbidding licenses; they're out of the running. The jython license may be ok so it's not ruled out yet. Groovy is apache licensed.

          Bryan Duxbury added a comment -

          The license compatibility issue only prevents us from bundling, right? Certainly it'd be nice to bundle the needed libraries in, but it would add at most one or two more steps to the install process. Also, if we decide to make a web admin interface in JRuby on Rails, JRuby would already be around.

          I think we should avoid making a decision solely for license purposes.

          stack added a comment -

          Yeah – license prevents bundling.

          If shell were optional, requiring user take optional installation step would be fine but IMO shell is core.

          Other jython/jruby downsides:

          • Size on disk. JRuby is 25M just counting ruby classes and jar. Jython is smaller. Would need to trim either (Beanshell is small).
          • JRuby is more awkward to integrate than jython/groovy/beanshell
          • Jython is a python that is 6 years old – 2.2.1 (from michael B). In the past, it's been awkward having to remember the old syntax when you've been used to modern cpythons.

          (I wonder what groovy's grails is like as a RonR clone)

          Tom White added a comment -

          How about JavaScript? It comes with the JDK from version 6, and is easy enough to add for older versions. Just a thought.

          stack added a comment -

          > How about JavaScript?

          When I've suggested that we can have js effectively for free, reaction seems to be generally uninspired. I've heard "javascript does not work well as a shell" and "yuck". I'll take a look...

          Chris K Wensel added a comment -

          Quick thoughts...

          Groovy seems polished as a shell and as a base scripting language. Re scripting, the 'builder' functionality strikes me as a big plus.

          Re javascript. I've used it (Rhino/E4X) extensively for scripting. Managing script 'includes' from shared scripts was very clumsy. E4X was a major plus, but it is analogous to Groovy 'builder' functionality (but only for XML). Further, javascript has a rocky roadmap (E4X not avail in jdk, and might be dropped in JS2, not that it's needed here).

          stack added a comment -

          Adds a src/groovy dir. In it are my beginnings playing w/ altering shell so it includes hbase context. Also included are mods to build.xml to add a groovyc target which will compile groovy to class files.

          stack added a comment -

          Couple of notes on my groovy playing:

          + Groovysh is hard to modify. Not subclassable – has a main – and so I have to copy at least two groovy shell files local – Main and Groovysh – and change a few lines like package and class names.
          + Groovysh is hard to modify in that the list of commands displayed in shell are listed in a bundled commands.xml file. The commands.xml is loaded as a resource from the current .groovy files classloader. Means I have to bring the commands.xml local too.
          + Looks like I cannot do without parens invoking groovy methods. Works if you do this 'get tableName:"SOME_TABLE", rowName: "SOME_ROW"' but not if I do this 'rowresult = get tableName:"SOME_TABLE", rowName: "SOME_ROW"': i.e. the parse gets messed up by the 'rowresult =' prefix.

          Idea for how hbase groovy would work is roughly:

          + On startup you'd have a shell that had an hbase object in it.
          + If you did a help, you'd see list of groovy options with addition of hbase object.
          + You'd do 'help hbase' and it would list something like this:

          hbase.admin
          hbase.conf
          hbase.get
          hbase.iterator
          hbase.put
          

          + Doing 'help hbase.admin' would show you something like:

          hbase.admin.createTable ...
          hbase.admin.deleteTable ...
          hbase.admin.tables
          ...
          

          + Invoking something like hbase.admin.createTable, you could pass a populated HTableDescriptor or you could do a short-circuit that was something like: 'hbase.admin.createTable("table_name", "colfamily1_name", ....)'

          Replacing shell with groovy will take a while – few days to get something primitive into place and then ongoing work improving.

          stack added a comment -

          It looks like jruby is in the running again. If it's included under the CPL license, that should work. Hadoop currently bundles junit which is CPL. Nigel Daley also pointed me to this useful resource: http://people.apache.org/~rubys/3party.html

          stack added a comment -

          Having nice little classloader issues making groovy work. Putting aside for the moment to play with rhino/js.

          + Need to figure out a readline for it. It doesn't have it natively.
          + I like the way it reports syntax errors in-line rather than throwing pages of exceptions (groovy)
          + Does not autoimport java.lang because class names clash with basic types.
          + Having trouble creating instances of hbase classes though I've done importPackage – says 'TypeError: [JavaPackage org.apache.hadoop.hbase.HBaseConfiguration] is not a function, it is object.' Looks like I need to use full package names though the doc claims otherwise (not friendly). It's odd because I can do java classes fine but not hbase ones (must be doing something wrong):

          js> var sb = new java.lang.StringBuffer();
          js> var sb = new org.apache.hadoop.hbase.HBaseConfiguration();
          js: "<stdin>", line 21: uncaught JavaScript runtime exception: TypeError: [JavaPackage org.apache.hadoop.hbase.HBaseConfiguration] is not a function, it is object.
                  at <stdin>:21
          

          + Looks like I'd make an hbase object that subclassed the rhino ScriptableObject. In here I could hide a lot of the hbase'isms.

          stack added a comment -

          Here's a patch to make jirb run. In our lib dir I have the following:

          $ ls lib/ruby 
          1.8             jirb            jruby.jar       site_ruby
          

          jirb is from jruby/bin. The 1.8 is a cut-down version of what's in jruby (I removed cgi support, etc.). I have not included gems. Was 15M.

          To make it run you do:

          $ ./bin/hbase shell
          irb(main):001:0> import org.apache.hadoop.hbase.HBaseConfiguration
          
          stack added a comment -

          Marking as 0.2.0

          Chris K Wensel added a comment -

          I've added a patch to groovysh to allow for scripts to register new commands.
          http://jira.codehaus.org/browse/GROOVY-2778

          But the argument parser leaves something to be desired: it tokenizes all the values after the command by whitespace and does not honor quotes.

          For example: cmd "foo bar"
          would result in arg[0] == '"foo' and arg[1] == 'bar"'

          Further, I do not know if you can access shell variables from the command. Likely so, but it isn't straightforward. The argument parser for sure doesn't resolve them, thus escaping shell variables into command arguments might be troublesome.
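
The difference between the two tokenizing behaviours can be illustrated in Ruby. This is not groovysh's code, only a small sketch contrasting a plain whitespace split with a quote-aware split from Ruby's standard `shellwords` library.

```ruby
require "shellwords"

line = 'cmd "foo bar"'

# Naive whitespace tokenizing splits the quoted argument in two.
naive = line.split

# Shellwords follows POSIX-shell quoting rules and keeps the argument together.
quoted = Shellwords.split(line)

p naive
p quoted
```

The naive split yields the `"foo` and `bar"` fragments described above, while the quote-aware split yields a single `foo bar` argument.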

          stack added a comment -

          An interesting shell framework (via Chris) http://cwiki.apache.org/GSHELL/

          stack added a comment -

          Some requirements and discussion: http://wiki.apache.org/hadoop/Hbase/Shell/Replacement

          stack added a comment -

          I'm thinking we should just check in the jruby jar and make the shell jirb now in advance of any DSL design. Having it in place will allow us to experiment. What do others think?

          Jean-Daniel Cryans added a comment -

          +1 on that!

          Jim Kellerman added a comment -

          > stack - 22/May/08 05:32 PM
          > I'm thinking we should just check in the jruby jar and make the shell jirb now in advance of any DSL
          > design. Having it in place will allow us to experiment. What do others think?

          +1

          Bryan Duxbury added a comment -

          +1 on that.

          Tim Dysinger added a comment -

          +1 from me.

          Add in some more small ruby code and you'll have yourself a DSL.

          Just take the code from this project's lib dir and stick it into a jar next to jruby.

          Check out the examples for mail.rb and others.

          http://dslkit.rubyforge.org/

          stack added a comment -

          I'll add in the jar then.

          Tim, that dslkit is GPL so I can't check it in (Looks good though... )

          stack added a comment -

          I added the jar and made it so that when you do './bin/hbase shell', you now get raw jirb. While I was at it, I purged hql. First order of business I'd say is changing prompt from 'jirb' to 'hbase'.
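
One way the prompt change might be done, assuming the shell is stock irb/jirb and reads a startup script: irb keeps its prompt definitions in `IRB.conf[:PROMPT]` and selects one through `IRB.conf[:PROMPT_MODE]`. The `HBASE_PROMPT` constant and the `:HBASE` mode name below are hypothetical names chosen for this sketch.

```ruby
# Hypothetical prompt definition; the keys are the ones irb expects.
HBASE_PROMPT = {
  PROMPT_I: "hbase> ",   # normal prompt
  PROMPT_S: "hbase%l ",  # prompt while inside an unterminated string
  PROMPT_C: "hbase* ",   # prompt for a continued statement
  RETURN:   "=> %s\n"    # format used to echo the result
}.freeze

# Only register the prompt when running inside irb/jirb.
if defined?(IRB) && IRB.respond_to?(:conf)
  IRB.conf[:PROMPT] ||= {}
  IRB.conf[:PROMPT][:HBASE] = HBASE_PROMPT
  IRB.conf[:PROMPT_MODE] = :HBASE
end
```

Run outside irb, the block only defines the constant; loaded from an irb startup file, it should switch the prompt to `hbase> `.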

          Bryan Duxbury added a comment -

          Ok, I finally found 15 minutes in which to write the dsl spec I've been promising. Here it is!

          --General--
          help
            Get a list of all possible commands
          
          help [cmd]
            Get deeper help on a specific command
          
          --Admin--
          
          list
            List all the currently existing tables (and their status?)
            
          create name, colfam_spec [, colfam_spec ...] [, table_opts]
            Create a table.
            
          drop name
            Delete a table
            
          alter name [, colfam_spec ...] [, table_opts]
            Change an existing table's options or column families
            
          enable name
            Enable a table
            
          disable name
            Disable a table
            
          truncate name
            Drop and recreate a table, effectively emptying it.
            
          --DML--
          get table_name, row_key [, "column_name" ...]
            Return data from a row
            
          put table_name, row_key [, timestamp], "column_name" => "new value", ...
            Put some data to a row at the specific cells, optionally with a timestamp
            
          scan table_name, start_key, end_key [, timestamp] [, "column_name" ...]
            Perform a scan over the table from start_key to end_key optionally with a timestamp or specified columns.
            
          delete table_name, row_key [, timestamp], "column_name", ...
            Delete the data at the cells provided for a given row
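
A minimal sketch of how the DML part of this spec could map onto plain Ruby methods. It assumes a hypothetical in-memory `STORE` hash instead of real HBase calls, keeps only the latest value per cell, leaves the timestamp and column filters out of `scan`, and treats `end_key` as exclusive; all of those are simplifications for illustration.

```ruby
# Hypothetical in-memory store: table => { row => { column => [timestamp, value] } }.
STORE = Hash.new { |tables, name| tables[name] = {} }

# put table_name, row_key [, timestamp], "column_name" => "new value", ...
def put(table_name, row_key, *rest)
  cells = rest.pop # trailing hash of column => value
  raise ArgumentError, "expected column => value pairs" unless cells.is_a?(Hash)
  timestamp = rest.first || Time.now.to_i
  row = (STORE[table_name][row_key] ||= {})
  cells.each { |column, value| row[column] = [timestamp, value] }
  nil
end

# get table_name, row_key [, "column_name" ...]
def get(table_name, row_key, *columns)
  row = STORE[table_name].fetch(row_key, {})
  columns.empty? ? row : row.select { |column, _| columns.include?(column) }
end

# scan table_name, start_key, end_key (end_key exclusive in this sketch)
def scan(table_name, start_key, end_key)
  rows = STORE[table_name].select { |key, _| key >= start_key && key < end_key }
  rows.sort_by { |key, _| key }.to_h
end

# delete table_name, row_key, "column_name", ...
def delete(table_name, row_key, *columns)
  row = STORE[table_name][row_key]
  columns.each { |column| row.delete(column) } if row
  nil
end

put "t1", "row1", "info:a" => "1", "info:b" => "2"
put "t1", "row2", 1234, "info:a" => "3"
delete "t1", "row1", "info:b"

p get("t1", "row1")
p get("t1", "row2", "info:a")
p scan("t1", "row1", "row2").keys
```

Because the optional timestamp sits between the row key and the cell hash, `put` has to inspect its trailing arguments to tell the two forms apart, which is one cost of placing an optional argument in the middle of the list.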
          
          
          Tim Dysinger added a comment -

          I didn't notice the license on DSLKit. Most Ruby code is BSD that I see. Bummer.

          stack added a comment -

          > I didn't notice the license on DSLKit.

          np Tim.

          Bryan, how do you think colfam_spec should be done? As a dictionary so '{name: "info", max_versions: 3...}'? Anything not explicitly listed gets the default. Same for table opts?

          Alter needs to take add, delete, edit qualifiers. What happens if you just want to edit a colum family only or table opts? How's the parser figure your intent?

          Drop truncate I'd say. Maybe instead add describe table. Let describe table return a result. Make it so create table can take the result of a describe table. How about also adding 'describe table columnfamily' to get a column family listing?

          Regarding 'get', can users specify a regex for column name (not critical)? It's missing the optional timestamp.

          In the put, is '=>' the separator between value and cell address? Should the timestamp be after the column, since it is optional?

          What does the output look like? How's it going to work? How is it specified for a session?

          Otherwise, looks good. Intent is to use verbs from current hbase API rather than sql-ese?
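
To make the timestamp-placement question concrete, here is a minimal sketch (not from any patch on this issue) of a `put` whose optional timestamp comes last. The method body is a stand-in that just returns the mutation it would apply; the table, row, and column names are made up. It assumes the cells are passed as a Ruby hash.

```ruby
# Stand-in for a real put: returns the mutation it would apply
# instead of talking to HBase. Signature is an assumption.
def put(table_name, row_key, cells, timestamp = nil)
  unless cells.is_a?(Hash) && !cells.empty?
    raise ArgumentError, "cells must be a non-empty hash"
  end
  { table: table_name, row: row_key, cells: cells, timestamp: timestamp }
end

# Without a timestamp, the braces around the hash can be dropped.
p put("my_table", "row1", "colfam:name" => "bryan")

# With a trailing timestamp, the hash needs explicit braces, because
# Ruby only allows a brace-less hash as the last argument.
p put("my_table", "row1", { "colfam:name" => "bryan" }, 1_210_000_000_000)
```

The trade-off is that a trailing optional timestamp forces braces on the hash whenever a timestamp is given, whereas putting the timestamp before the hash (as in the proposed spec) keeps the brace-less form available in both cases.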

          Bryan Duxbury added a comment -

          For sure the simplest way to do the colfam spec would be to use JRuby hashes, which have the form {"key" => "value"}. That's what the => operator is used for. If that doesn't put people off, then I think it's what we should use for all *_spec objects in this dsl.

          Truncate can be a nice to have. I agree there should be a describe in general, so let's add that as the first step. "describe table columnfamily" seems redundant to me - let's just go with "describe table", and you get the full table spec including columns.

          I think regex in get is a nice to have for the next version.

          Some sample output might look like (excuse the hacky ascii boxing):

          irb>get "my_table", "colfam:name"
          +-------------+-------+
          | column      | value |
          +-------------+-------+
          | colfam:name | bryan |
          +-------------+-------+
          Completed in 0.01 sec.
          irb>
          

          Mind you, the cool looking table would be the toString of whatever object was actually returned, which would probably be some wrapper around RowResult. What do you mean, how is it specified for a session?

          > Otherwise, looks good. Intent is to use verbs from current hbase API rather than sql-ese?
          Yes, and this is a very, very important distinction. The goal is to map as much of this shell as possible to the API, rather than trying to make it look like SQL. I want people to have to stop and think at least a little about the differences, rather than just assuming off the bat there's some magical SELECT verb that takes care of everything.
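
As an illustration of the hash-based spec, here is a small sketch of how a partial column family spec could be filled in so that anything not explicitly listed gets a default. Only `max_versions` appears in this thread; the other option names and all default values are hypothetical, as is the `colfam_spec` helper itself.

```ruby
# Hypothetical defaults; the real option names and values are not
# specified anywhere in this thread.
COLFAM_DEFAULTS = {
  "max_versions" => 3,
  "compression"  => "none",
  "in_memory"    => false
}.freeze

# Build a full column family spec from a partial one.
# Options the caller supplies win over the defaults.
def colfam_spec(spec)
  raise ArgumentError, 'colfam spec needs a "name"' unless spec.key?("name")
  unknown = spec.keys - COLFAM_DEFAULTS.keys - ["name"]
  raise ArgumentError, "unknown options: #{unknown.join(', ')}" unless unknown.empty?
  COLFAM_DEFAULTS.merge(spec)
end

p colfam_spec("name" => "info", "max_versions" => 5)
```

Rejecting unknown keys is one possible choice; it catches typos in option names early rather than silently ignoring them.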

          stack added a comment -

          I'm fine w/ ruby hashes. Fine also w/ no describe table_name col_family in first version. Same for regex.

          Your output is missing the row spec. In hql, if the row was specified in the query, I would output just column and cell (no row spec), but when, say, scanning, you needed all three. The timestamp is also missing (now it's available, let's add it to the output).

          In hql, you could specify a table formatter. There were two types: ascii table and xhtml (the former was the default; the latter was used for outputting content in the hql ui page and was often useful when you needed to parse a big result). Going forward we should be able to add other formatter types (no formatting with tab delimiters, no column headers, json, etc.). At least ascii and probably xhtml are needed in the first version, I'd say.

          Would suggest you clean up your proposal and put it up on http://wiki.apache.org/hadoop/Hbase/Shell/Replacement or into a new page. Add your distinction between hql and this dsl somewhere as a bolded design dictate.
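
A rough sketch of what pluggable formatters could look like, with row and timestamp included in the output as suggested above. This is not the formatter module later added under bin; the `Cell` struct, class names, `render` method, and `formatter_for` lookup are all hypothetical.

```ruby
# One result cell; a stand-in for whatever wrapper the shell ends up returning.
Cell = Struct.new(:row, :column, :timestamp, :value)

# Renders cells as a boxed ascii table with a header row.
class AsciiFormatter
  HEADERS = %w[row column timestamp value].freeze

  def render(cells)
    rows = cells.map { |c| c.to_a.map(&:to_s) }
    # Each column is as wide as its longest entry, header included.
    widths = HEADERS.each_index.map do |i|
      ([HEADERS[i]] + rows.map { |r| r[i] }).map(&:length).max
    end
    rule = "+" + widths.map { |w| "-" * (w + 2) }.join("+") + "+"
    line = lambda do |vals|
      "| " + vals.each_with_index.map { |v, i| v.ljust(widths[i]) }.join(" | ") + " |"
    end
    ([rule, line.call(HEADERS), rule] + rows.map { |r| line.call(r) } + [rule]).join("\n")
  end
end

# Renders cells as an xhtml table, escaping markup characters in values.
class XhtmlFormatter
  ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

  def render(cells)
    body = cells.map do |c|
      "<tr>" + c.to_a.map { |v| "<td>#{escape(v.to_s)}</td>" }.join + "</tr>"
    end
    "<table>\n" + body.join("\n") + "\n</table>"
  end

  private

  def escape(text)
    text.gsub(/[&<>"]/, ESCAPES)
  end
end

FORMATTERS = { "ascii" => AsciiFormatter, "xhtml" => XhtmlFormatter }.freeze

# Look up a formatter by name; ascii is the default, as it was in hql.
def formatter_for(name = "ascii")
  FORMATTERS.fetch(name).new
end

cells = [Cell.new("row1", "colfam:name", 1_210_000_000_000, "bryan")]
puts formatter_for.render(cells)
puts formatter_for("xhtml").render(cells)
```

Adding another output type (tab-delimited, json, and so on) would then only mean adding a class with a `render` method and registering it in `FORMATTERS`.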

          stack added a comment -

          Patch that adds src/ruby and a first attempt at an HBase.rb module with basic imports and 'import Java' statement. Changes in build.xml add the src/ruby content to the root of the hbase-X.X.X.jar. I tried adding -r module_name to the irb invocation, but it looks like there are classloader issues: our modules can't be found on irb invocation (I'd guess it only finds stuff in the jruby jar).

          Seems too that we need to figure how to get an irbrc into the mix. In here we can define the prompt among other things. Do it in a manner that can be overridden; e.g. in an irb.rc that can be found on irb load so users' .irbrc override ours.

          stack added a comment -

          This patch adds a module named HBase.rb to $HBASE_HOME/bin. It then adds '-r $HBASE_HOME/bin/HBase.rb' to the irb invocation. The module runs the magic 'include Java' invocation, imports basic HBase types, sets the prompt, and then has defines for 'help' and 'version'. Here's how it looks currently:

          durruti:~/Documents/checkouts/trunk stack$ ./bin/hbase shell
          irb: warn: can't alias help from irb_help.
          hbase(main):001:0> help
          HBase Shell Commands:
           version   Output the hbase version
           ...
          => nil
          hbase(main):002:0> version
          TODO: 0.2.x
          => nil
          hbase(main):003:0>
          

          There's some namespace issue that needs to be fixed – the 'irb: warn: can't alias help from irb_help'. Need to fix the '=> nil' return too.

          This HBase.rb looks like it could be the place to put the hbase DSL.

          We could use this and the src/ruby dir. The latter could hold fat ruby scripts or debugging code (if it's findable on the CLASSPATH from jirb).
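
For readers unfamiliar with how a module loaded into irb can supply commands like 'help' and 'version', here is a plain-Ruby sketch of the idea. It is not the HBase.rb from the patch: the module name, command table, and version string are placeholders, and it leaves out the Java includes and prompt setup.

```ruby
# Hypothetical command registry; names and help text are illustrative only.
module ShellSketch
  VERSION = "0.2.x" # placeholder, not read from the hbase build

  COMMANDS = {
    "help"    => "Output this help message",
    "version" => "Output the hbase version"
  }.freeze

  # Build the help listing from the registry so new commands show up automatically.
  def help_text
    width = COMMANDS.keys.map(&:length).max
    lines = COMMANDS.map { |name, desc| " #{name.ljust(width)}   #{desc}" }
    (["HBase Shell Commands:"] + lines).join("\n")
  end

  def help
    puts help_text
  end

  def version
    puts VERSION
  end
end

# Including the module at the top level makes the commands callable
# as bare words, the way they would be typed at the shell prompt.
include ShellSketch
help
version
```

Because `puts` returns nil, irb echoes `=> nil` after each of these commands, which is the behaviour noted above.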

          stack added a comment -

          Here's a post on the 'alias help from irb_help' warning: http://www.koders.com/noncode/fid545E597E7FE131466FB0DFC081EB6FFB30D89DD7.aspx It says to just ignore it!

          stack added a comment -

          Committed last patch. It just emits a help with nothing in it but the version command. The version command does the right thing. There is the annoying "can't alias help from irb_help" warning, but hopefully we can figure out how to fix that.

          Bryan Duxbury added a comment -

          Here's an improved version of bin/hbase and hirb.rb that gives us more control. I still haven't gotten rid of the stupid warning about re-aliasing help, but I'm confident it is gettable.

          stack added a comment -

          Committed Bryan's improvements. The "can't alias help" message is back, but it's after the banner message. I'd guess most folks won't even notice it. It's ok for now. Bryan's work will give us first hack at the command-line args before passing to irb.

          stack added a comment -

          Added formatter ruby module under bin. Added handling of command-line args.

          stack added a comment -

          Basic DDL and general help is in place. CRUD is TODO.

          stack added a comment -

          New commit on this issue. Comments below. All methods are in place as of this commit. Need to do a little more testing before closing.

          M  src/test/org/apache/hadoop/hbase/TestBloomFilters.java
            Constant name changed from DEFAULT_MAX_VALUE_LENGTH to DEFAULT_LENGTH
          M  src/java/org/apache/hadoop/hbase/HColumnDescriptor.java
            Changed MAX_LENGTH to LENGTH and dropped MAX from DEFAULT_MAX_LENGTH
          M  src/java/org/apache/hadoop/hbase/HConstants.java
            Added NAME and VERSIONS
          M  src/java/org/apache/hadoop/hbase/regionserver/HStore.java
            Make the delete of the data file recursive; when it's not, it just fails
            because the data file exists
          M  src/java/org/apache/hadoop/hbase/regionserver/NoSuchColumnFamilyException.java
            Don't retry these exceptions.
          M  src/java/org/apache/hadoop/hbase/HRegionInfo.java
            NAME define moved to HConstants
          M  src/java/org/apache/hadoop/hbase/master/TableOperation.java
            Pass String to TNFE rather than byte array.
          M  src/java/org/apache/hadoop/hbase/io/Cell.java
            Add constructor that takes a String
          M  src/java/org/apache/hadoop/hbase/io/BatchOperation.java
            Add Constructor that takes String
          M  src/java/org/apache/hadoop/hbase/client/HTable.java
            Add timestamp argument to the deleteFamily methods.
          M  src/java/org/apache/hadoop/hbase/client/HConnectionManager.java
            toString byte arrays in exception construction
          M  bin/HBase.rb
            Moved java includes here out of hirb.
            Added bunch of defines.
            Added implementations for scan, delete*, put, get, etc.  
            Added little test suite.
          M  bin/Formatter.rb
            Fixup.
          M  bin/hirb.rb
            Filled out help and filled out missing methods.
          
          stack added a comment -

          Closing. Thing basically works and is as good/bad as what it's replacing. There are holes in the shell, I'm sure, but we can create new issues to fix them. Added some doc up on the wiki page.


            People

            • Assignee:
              stack
              Reporter:
              stack
            • Votes:
              0
              Watchers:
              2
