Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: REST
    • Labels:
      None

      Description

      I've begun work on creating a REST-based interface for HBase that can use both JSON and XML and is extensible enough to add new formats down the road. I'm at a point where I would like to submit it for review and get feedback as I continue to work towards new features.

      Attached to this issue you will find the patch for the changes to this point, along with a necessary jar file for the JSON serialization. Below you will also find my notes on how to use what is finished with the interface so far.

      This patch is based on JIRA issues HBASE-814 and HBASE-815.

      I am interested in getting feedback on:
      -what you guys think works
      -what doesn't work for the project
      -anything that may need to be added
      -code style
      -anything else...

      Finished components:
      -framework around parsing json/xml input
      -framework around serializing xml/json output
      -changes to exception handling
      -changes to the response object to better handle the serializing of output data
      -table CRUD calls
      -Full table fetching
      -creating/fetching scanners

      TODO:
      -fix up the filtering with scanners
      -row insert/delete operations
      -individual row fetching
      -cell fetching interface
      -scanner use interface

      Here are the wiki(ish) notes for what is done to this point:
      REST Service for HBase Notes:

      GET /
      -retrieves a list of all the tables with their meta data in HBase
      curl -v -H "Accept: text/xml" -X GET -T - http://localhost:60050/

      curl -v -H "Accept: application/json" -X GET -T - http://localhost:60050/

      POST /
      -Create a table
      curl -H "Content-Type: text/xml" -H "Accept: text/xml" -v -X POST -T - http://localhost:60050/newTable
      <table>
      <name>test14</name>
      <columnfamilies>
      <columnfamily>
      <name>subscription</name>
      <max-versions>2</max-versions>
      <compression>NONE</compression>
      <in-memory>false</in-memory>
      <block-cache>true</block-cache>
      </columnfamily>
      </columnfamilies>
      </table>

      Response:
      <status><code>200</code><message>success</message></status>
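      The <table> document above can be generated rather than hand-typed. Below is a minimal Python sketch using the standard library's ElementTree; the element names mirror the example payload in this issue and come from these notes, not a finalized schema:

```python
import xml.etree.ElementTree as ET

# Build the <table> schema document shown in the curl example above.
# Element names follow this issue's notes, not a published spec.
table = ET.Element("table")
ET.SubElement(table, "name").text = "test14"
families = ET.SubElement(table, "columnfamilies")
family = ET.SubElement(families, "columnfamily")
for tag, text in [("name", "subscription"), ("max-versions", "2"),
                  ("compression", "NONE"), ("in-memory", "false"),
                  ("block-cache", "true")]:
    ET.SubElement(family, tag).text = text

# Serialize to a string suitable as the POST body.
xml_body = ET.tostring(table, encoding="unicode")
```

The resulting string can then be fed to the curl command above as the request body.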

      JSON:
      curl -H "Content-Type: application/json" -H "Accept: application/json" -v -X POST -T - http://localhost:60050/newTable
      {"name":"test5", "column_families":[

      { "name":"columnfam1", "bloomfilter":true, "time_to_live":10, "in_memory":false, "max_versions":2, "compression":"", "max_value_length":50, "block_cache_enabled":true }

      ]}

      NOTE: the compression value is an enum defined in the class HColumnDescriptor.CompressionType.
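      The JSON body can likewise be assembled programmatically. A small Python sketch; the field names are taken from the example payload above (this issue's notes), not from a finalized specification:

```python
import json

# Field names follow the create-table example payload in this issue's notes.
def create_table_payload(name, families):
    """Serialize a table name plus a list of column-family dicts to JSON."""
    return json.dumps({"name": name, "column_families": families})

payload = create_table_payload("test5", [{
    "name": "columnfam1",
    "bloomfilter": True,
    "time_to_live": 10,
    "in_memory": False,
    "max_versions": 2,
    "compression": "",        # an HColumnDescriptor.CompressionType value
    "max_value_length": 50,
    "block_cache_enabled": True,
}])
```

The string in `payload` is the body the JSON curl example above pipes in on stdin.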

      GET /[table_name]
      -returns all records for the table
      curl -v -H "Accept: text/xml" -X GET -T - http://localhost:60050/tablename
      curl -v -H "Accept: application/json" -X GET -T - http://localhost:60050/tablename

      GET /[table_name]

      -Parameter Action
      metadata - returns the metadata for this table.
      regions - returns the regions for this table

      curl -v -H "Accept: text/xml" -X GET -T - http://localhost:60050/pricing1?action=metadata

      Update Table
      PUT /[table_name]
      -updates a table
      curl -v -H "Content-Type: text/xml" -H "Accept: text/xml" -X PUT -T - http://localhost:60050/pricing1
      <columnfamilies>
      <columnfamily>
      <name>subscription</name>
      <max-versions>3</max-versions>
      <compression>NONE</compression>
      <in-memory>false</in-memory>
      <block-cache>true</block-cache>
      </columnfamily>
      <columnfamily>
      <name>subscription1</name>
      <max-versions>3</max-versions>
      <compression>NONE</compression>
      <in-memory>false</in-memory>
      <block-cache>true</block-cache>
      </columnfamily>
      </columnfamilies>

      curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X PUT -T - http://localhost:60050/pricing1
      {"column_families":[

      { "name":"columnfam1", "bloomfilter":true, "time_to_live":10, "in_memory":false, "max_versions":2, "compression":"", "max_value_length":50, "block_cache_enabled":true }

      ,

      { "name":"columnfam2", "bloomfilter":true, "time_to_live":10, "in_memory":false, "max_versions":2, "compression":"", "max_value_length":50, "block_cache_enabled":true }

      ]}

      Delete Table
      curl -v -H "Content-Type: text/xml" -H "Accept: text/xml" -X DELETE -T - http://localhost:60050/TEST16

      Creating a scanner
      curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X POST -T - http://localhost:60050/TEST16?action=newscanner

      //TODO fix up the scanner filters.

      response:
      xml:
      <scanner>
      <id>
      2
      </id>
      </scanner>

      json:

      {"id":1}

      Using a scanner
      curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X POST -T - "http://localhost:60050/TEST16?action=scan&scannerId=<scannerID>&numrows=<num rows to return>"
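      The scanner lifecycle above (one POST to create a scanner, then POSTs with action=scan to fetch rows) reduces to URL construction. A Python sketch; the host, port, and parameter names are assumptions taken from the curl examples in these notes and may change:

```python
from urllib.parse import urlencode

# Host/port and parameter names come from this issue's curl examples.
BASE = "http://localhost:60050"

def new_scanner_url(table):
    """URL that asks the REST server to create a scanner for a table."""
    return f"{BASE}/{table}?" + urlencode({"action": "newscanner"})

def scan_url(table, scanner_id, numrows):
    """URL that fetches the next `numrows` rows from an existing scanner."""
    query = urlencode({"action": "scan",
                       "scannerId": scanner_id,
                       "numrows": numrows})
    return f"{BASE}/{table}?{query}"

url = scan_url("TEST16", 2, 10)
```

A client would POST to `new_scanner_url(...)`, read the scanner id out of the XML or JSON response, and then POST repeatedly to `scan_url(...)` until no rows come back.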

      This would be my first submission to an open source project of this size, so please, give it to me rough. =)

      Thanks.

      1. hbase-1064-patch-v4.patch
        270 kB
        Brian Beggs
      2. AgileJSON.jar
        66 kB
        Brian Beggs
      3. hbase-1064-patch-v3.patch
        265 kB
        Brian Beggs
      4. hbase-1064-patch-v2.patch
        268 kB
        Brian Beggs
      5. REST-Upgrade-Notes.txt
        5 kB
        Brian Beggs
      6. json2.jar
        101 kB
        Brian Beggs
      7. RESTPatch-pass1.patch
        222 kB
        Brian Beggs

        Activity

        stack added a comment -

        Added note to the head of the REST wiki page that REST has been refactored, preserving the API.

        stack added a comment -

        Looks like REST still uses xmlenc for XML output (that's no problem; I just thought it no longer used it).

        stack added a comment -

        Committed. Thanks for the fat patch Brian and thanks for persisting through multiple revisions. Thanks too to Michael Gottesman for original work and agile json.

        sishen added a comment -

        The patch does good work.

        I'm happy to see the big advance and will update my library to match the new REST interface.

        stack added a comment -

        The HColumnDescriptor patch doesn't apply to TRUNK, but that's a minor fixup on my part. I would suggest that the REST server print out the port it's running on (but that can be another issue). It's a pity this stuff doesn't work from the browser; I get a 406 if I browse there, but no biggie. I took a quick look at the patch. Looks good to me. I tried it and was able to do the first few curl examples from the wiki page. Results looked like those posted on the wiki page.

        Sishen, what do you think? If it's OK with you, and it works with your library, I'd like to apply this and then open new issues as we find broken stuff.

        Brian Beggs added a comment -

        There are many fixes included in this latest version of the patch, including the following items:

        I retested and fixed any requests that were not working correctly.
        Status messages should be more consistent.
        Status codes (200/404/etc.) should now be in line with the REST interface.
        Exception handling was fixed in a few places.
        The path segments parser was rewritten to allow /api/* to be in the path (this was breaking a lot of stuff before).
        Base64 encoding on row names, cell names, and cell values.
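        With names and values base64-encoded on the wire, clients need an encode step on the way in and a decode step on the way out. A hedged Python sketch of the round-trip; the JSON shape follows the row snippets posted in this issue and is illustrative, not the committed wire format:

```python
import base64
import json

# Encode a binary cell value for a JSON request body.
raw = b"\x00\x01binary-value"          # raw bytes, not valid UTF-8 text
encoded = base64.b64encode(raw).decode("ascii")
body = json.dumps({"columns": [{"name": "other:", "value": encoded}]})

# Decode the value back out of a (hypothetical) row response.
cell = json.loads(body)["columns"][0]
decoded = base64.b64decode(cell["value"])
assert decoded == raw                   # round-trip is lossless
```

The same pattern would apply to row names and column qualifiers, which can also be raw binary.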

        Brian Beggs added a comment -

        This AgileJson.jar replaces the old json.jar that is attached to this issue. This new jar should be used going forward.

        stack added a comment -

        Brian: I was hoping it would make it into 0.19.0 (smile).

        sishen added a comment -

        Brian:

        I see. I think my issue is how to convert the byte array to objects in clients such as Ruby, Python, etc., not in HBase itself. The byte array really works well.

        Brian Beggs added a comment -

        I didn't expect it to make 0.19.0 anyway. Whatever works.

        stack added a comment -

        Brian: I'd like to punt on this making 0.19.0. The patch is big, and it's likely we still have stuff to work through, test, and review. I see it being 2 or 3 more days at least before we're done with review and tests. The RC is already well late. OK if we put your work in 0.19.1 HBase? Less than a month, I'd say.

        Brian Beggs added a comment -

        Sishen:
        I don't believe it matters from my perspective as I'm always dealing with byte arrays....

        Stack:
        I'll encode the row, cell name and cell value.

        I have Michael's updated json jar and have incorporated the base64 changes into the json serialization.

        I'm working on a few more issues and hope to have a patch submitted soon.

        sishen added a comment -

        In most cases, the base64-encoded object is a String. But sometimes we do need more, such as an int or a binary stream. Can it also cover those?

        stack added a comment -

        Brian: Thanks for checking how couchdb works. As described, it makes sense for a document-oriented store – i.e. couchdb – but strikes me as unsuited to the hbase case especially with possibility of binary keys and column qualifiers.

        stack added a comment -

        Brian:

        What jgray says regarding base64'ing JSON sounds right.

        You'll need to base64 row names and column family qualifiers too (row names can be raw binary as can the qualifier on the column family).

        Brian Beggs added a comment -

        I just wanted to post a snippet of what the returned JSON looks like, to see what everyone thinks:

        a row:
        {"row":"rowName","cells":[

        {"name":"firstName:","value":"firstValueIwillbeBase64ed","timestamp":1229121008893}

        ,

        {"name":"other:","value":"otherValue","timestamp":1229121008893}

        ]}

        Table:
        {"master_running":true,"tables":[
        {"name":"test13","columns":[

        {"time_to_live":-1,"in_memory":false,"name":"subscription:","max_versions":2,"max_value_length":2147483647,"block_cache_enabled":true,"bloomfilter":false}

        ]},{"name":"test14","columns":[

        {"time_to_live":-1,"in_memory":false,"name":"subscription:","max_versions":2,"max_value_length":2147483647,"block_cache_enabled":true,"bloomfilter":false}

        ]}
        ]}

        update/insert a row:
        {"columns":[

        { "name":"other:", "value":"test1" }

        ,

        { "name":"trans:", "value":"yes" }

        ]}

        I need to make 1 last tweak to the JSON to insert a table and I will post that later tonight when I'm done.

        Michael Gottesman added a comment -

        Brian, you beat me to it =). Give me 30 minutes or so and check my github repo.

        Brian Beggs added a comment -

        Perhaps there is something we can do with the JSON annotations jar that would allow a value to be base64 encoded using the annotations?

        Jonathan Gray added a comment -

        We are using binary data with JSON by base64'ing it. This seems like a sane approach and has worked well for us.

        Brian Beggs added a comment -

        I inspected the couchdb rest interface and here is what I found.

        Binary files are specifically marked as attachments and retrieved separately from the rest of the data. Binary attachments are uploaded as base64 values, but attachments are retrieved via their own urls and are not encoded.

        Since there is no distinction in hbase between binary and regular data some kind of encoding will need to be used I'm assuming.

        stack added a comment -

        Dang. You'll have to base64 it then? Is that so? What's couchdb do? REST is their client.

        Brian Beggs added a comment -

        JSON does not allow the transfer of binary data:

        From http://www.json.org/xml.html :
        JSON does not have a <[CDATA[]]> feature, so it is not well suited to act as a carrier of sounds or images or other large binary payloads.

        stack added a comment -

        Sounds like you've found a bug in the XML. Seems bad you can't do multiple columns at once.

        Maybe do it as-is for now, and add fixing multiple-column updates as a new issue that can be done after?

        The base64'ing is because you can't have binary in XML. In JSON, you can carry binary, right? If so, no need to base64 JSON payloads.

        Michael is working on cleaning up the json jar – fixing licenses, etc. Should have something to add soon.

        Brian Beggs added a comment -

        I'm working on a patch for this issue along with a few others I have found.

        A few questions..

        When updating a row, with the current xml structure:
        <column>
        <name>other:</name>
        <value>test5</value>
        </column>

        you can only update one column at a time. If that's the way it should be kept, fine, but I can change it to allow multiple columns in a row to all be updated at once by adding a new root element to the XML.

        Should the input values be base64 encoded as well?

        Also, for the JSON implementation, do we want the values base64 encoded as well? And the same question for the JSON input: should that also be base64 encoded? It may be preferable not to base64 encode the JSON, as base64 encode/decode is not available natively in the language.

        stack added a comment -

        Brian: That compiles. All tests pass. To try to prove it adheres to the old interface, I tried running the sishen prescriptions at the end of this page under the title 'Examples using curl'. On the first one, 'durruti:cleantrunk stack$ curl -v -X POST -T - http://localhost:60050/api/', I entered the single line: '<?xml version="1.0" encoding="UTF-8"?> <table> <name>tables</name> <columnfamilies> <columnfamily> <name>subscription</name> <max-versions>2</max-versions> <compression>NONE</compression> <in-memory>false</in-memory> <block-cache>true</block-cache> </columnfamily> </columnfamilies> </table>' and I got back the below in the REST log:

        2009-01-12 22:40:58,008 WARN jsonrest: /api/:
        java.lang.ArrayIndexOutOfBoundsException: 1
            at org.apache.hadoop.hbase.rest.Dispatcher.doPost(Dispatcher.java:219)
            at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
            at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
            at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
            at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
            at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
            at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
            at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
            at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
            at org.mortbay.http.HttpServer.service(HttpServer.java:954)
            at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
            at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
            at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
            at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
            at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
            at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
        

        Does it work for you?

        Michael Gottesman added a comment -

        @Brian

        I am going to submit it to json.org soon (I should have done it six months ago, but things got in the way). It should really be called Agile-Json2.0.jar, because the whole idea of it is to make serialization to JSON of lots and lots of classes obscenely easy, to be replaced with more hardcore items later if more performance is needed.

        Imagine writing JSON.org serialization code for 5 objects. Now imagine writing it for 100 objects. Wouldn't you rather just mark them with @ToJSON? That is the premise, anyway.

        @Stack

        the source is here:
        http://github.com/gottesmm/agile-json-2.0/tree/master

        Brian Beggs added a comment -

        Also for the json jar naming, how about jsonAnnotation.jar?

        Brian Beggs added a comment -

        Stack, give this a try. I fixed the problems you were encountering. I was playing with an XML serialization library that used annotations like the JSON library, but it was not supposed to be included in the patch. Sorry about that.

        stack added a comment -

        A quick look at the code and it looks fine.

        Add '<pre>' around things like this in your javadoc:

        + * 
        + * {
        + *  "type" : "WhileMatchRowFilter",
        ...
        

        .... if you want to keep your formatting.

        Are you not using xmlenc-0.53.jar to do XML?

        stack added a comment -

        Update your hbase or leave this out of your patch: 'Index: src/webapps/master/WEB-INF/web.xml'. This caused it to fail to apply to trunk. (Not you, really. I need to figure out what's going on here... in this generated code.)

        I figured out that I need to add the json.jar (what would be a better name for this jar?), but how should I fix these, Brian?

        
        compile:
            [javac] Compiling 276 source files to /Users/stack/Documents/checkouts/cleantrunk/build/classes
            [javac] /Users/stack/Documents/checkouts/cleantrunk/src/java/org/apache/hadoop/hbase/HColumnDescriptor.java:37: package org.simpleframework.xml does not exist
            [javac] import org.simpleframework.xml.Element;
            [javac]                               ^
            [javac] /Users/stack/Documents/checkouts/cleantrunk/src/java/org/apache/hadoop/hbase/HColumnDescriptor.java:38: package org.simpleframework.xml does not exist
            [javac] import org.simpleframework.xml.Root;
        
        
        Brian Beggs added a comment -

        Attached you will find the latest patch for the REST implementation.

        The interface now conforms to the current rest interface. The interface now also returns either xml or json. Simply change your
        Accept: text/xml header
        to:
        Accept: application/json

        I could really use a sanity check on this. I'm sure there are some defects and I'm going to go back through and start double checking this stuff now.
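        Since format selection is driven purely by the Accept header, a client only needs to vary that one header to switch renderings. A small Python sketch that builds (but does not send) the two request variants; the URL is assumed from the curl examples earlier in this issue:

```python
from urllib.request import Request

# Same resource, two renderings, selected only by the Accept header.
# Host/port and path are assumptions taken from this issue's curl examples.
xml_req = Request("http://localhost:60050/tablename",
                  headers={"Accept": "text/xml"})
json_req = Request("http://localhost:60050/tablename",
                   headers={"Accept": "application/json"})
```

Sending either request with urllib's urlopen (against a running server) should return the same data serialized in the requested format.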

        stack added a comment -

        I agree that the smoothest path would be a refactor that keeps the current API. Good man, Brian.

        Brian Beggs added a comment -

        I've been on vacation for the last few days and am just getting back to work on this today. Given the feedback I think that the best course of action is to make this interface conform to the current rest interface. I am starting work on this today. I will hopefully have a patch early next week.

        sishen added a comment - edited

        Hi, Brian.

        I don't see the benefit of the URLs

        http://localhost:60050/testtable1/thesecondrow/rowWithData:otherData
        http://localhost:60050/testtable1/thesecondrow/rowWithData:otherData/1229121022233

        Instead, it limits you to fetching a single column at a time; you cannot request multiple columns at once.

        The current REST implementation does support this, but via the query string:

        /testtables/row/thesecondrow?column=rowWithData:otherData
        /testtables/row/thesecondrow/1229121022233?column=rowWithData:otherData
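With the query-string form, selecting several columns in one call would presumably just repeat the column parameter. A sketch, with the second column name being a hypothetical placeholder:

```shell
# Old-style row fetch selecting two columns via repeated "column=" params.
# Table/row names are from the examples above; "rowWithData:moreData" is a
# made-up second column for illustration.
BASE="http://localhost:60050"
ROW_URL="$BASE/testtables/row/thesecondrow"
MULTI="$ROW_URL?column=rowWithData:otherData&column=rowWithData:moreData"

# Quote the URL so the shell does not treat "&" as a background operator.
echo "curl -H \"Accept: text/xml\" -X GET \"$MULTI\""
```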

        Brian Beggs added a comment -

        I have attached my unfinished notes on all of the operations available to the interface and examples of how to test them using curl.

        Perhaps this may give a better window into how the interface currently works.

        file is named: REST-Upgrade-Notes.txt

        Brian Beggs added a comment -

        Brian: I don't exactly follow the below:

        bq. Also the reason for the change in moving to the query string for some of these items is that in order to retrieve the row/column/timestamp using the path you are unable to have any directives in the path. Unless we wanted to get into the thought of reserved words, which IMHO is a bad idea and complicates the interface.

        So with this new implementation of the REST interface it's possible to query a table, row, column, or timestamp directly using the path that follows the url.

        For example:
        http://localhost:60050/testtable1/thesecondrow
        Would retrieve the second row from testtable1.

        http://localhost:60050/testtable1/thesecondrow/rowWithData:otherData
        Would retrieve the column rowWithData:otherData from thesecondrow in testtable1.

        The same thing works for timestamps:
        http://localhost:60050/testtable1/thesecondrow/rowWithData:otherData/1229121022233
        Would retrieve the cell at timestamp 1229121022233, from column rowWithData:otherData, in row thesecondrow, from table testtable1.
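The three path-addressed URLs above differ only in how many segments are appended. A sketch that assembles them from their parts (names taken from the examples; no live server assumed):

```shell
# Assemble the /table/row/column/timestamp URLs from the examples above.
BASE="http://localhost:60050"
TABLE="testtable1"
ROW="thesecondrow"
COLUMN="rowWithData:otherData"
TIMESTAMP="1229121022233"

ROW_URL="$BASE/$TABLE/$ROW"         # the whole row
COLUMN_URL="$ROW_URL/$COLUMN"       # one column of that row
CELL_URL="$COLUMN_URL/$TIMESTAMP"   # one cell version at a timestamp

echo "$ROW_URL"
echo "$COLUMN_URL"
echo "$CELL_URL"
```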

        bq. Now I think the real question that needs to be answered... is it necessary or desirable to query out the row/column/timestamp data in this RESTful fashion using the path?

        So my question is... is it desirable to have the interface work in such a way that you are able to query out timestamp and individual cell data as in the examples above? If the answer is no, I believe it will be relatively easy to remove those parts of the interface and make this REST implementation match the current one, though the ability to query out cells by identifier and by timestamp will be lost. (I do not believe this functionality is available in the current REST implementation anyway.)

        If the answer is yes, and we want to query in the /table/row/column/timestamp fashion, that is the reason the directives (and when I say directive I mean things such as fetching region data or using a scanner) were moved into the query string. If we wanted to keep this interface and also allow querying with directives in the path, I believe the logic required would make the code much more complex than it already is and harder to maintain. And for what it's worth, I don't feel it's the most straightforward implementation as it currently stands.

        Adding additional complexity to the path would, I feel, make the interface harder to maintain and add to, whereas putting these parameters in a query string simplifies the addition of future code.

        To address Tom's questions:

        What advantage does this provide besides the perception of being more restful?

        Again, I'm not sure I have the full answer for this. I chose this implementation for the selfish reasons outlined below. And I'm not really sure whether the ability to query cells by identifier/timestamp is something that is truly necessary for HBase. This is one of the questions I'm hoping someone who has been working on the project can answer.

        The reason I initially chose to start working on this implementation of the REST interface from the patches in issues 814 and 815 was that I felt it would be easier to separate the parsing/serialization code out of this version. I also felt that more modification would need to be done to the current interface to allow JSON to be sent using it than this implementation would take to send xml from it.

        I did not fully understand exactly how items were being retrieved out of the interface until I was some way into the project and began to notice the differences in the interface.

        If the proposed tablename/[row]/[cols]/[timestamp] interface is adopted, how do you GET/PUT/POST/DELETE scanners?

        From my notes:

        creating a scanner
        curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X POST -T - http://localhost:60050/TEST16?action=newscanner

        //TODO fix up the scanner filters.

        response:
        xml:
        <scanner>
        <id>
        2
        </id>
        </scanner>

        json:

        {"id":1}

        Using a scanner
        curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X POST -T - "http://localhost:60050/TEST16?action=scan&scannerid=<scannerID>&numrows=<num rows to return>"

        //TODO scanner action to return all rows between 2 row ID's

        Closing a scanner
        curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X POST -T - "http://localhost:60050/TEST16?action=closescanner&scannerid=<scannerId>"
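Put together, the scanner lifecycle in this implementation is three POSTs against the same table URL, distinguished only by the action parameter. A sketch (TEST16, the scanner id, and the row count are placeholders; the commands are echoed because the real calls need a running gateway):

```shell
# Scanner lifecycle: create, scan, close - all via ?action= on the table URL.
TABLE_URL="http://localhost:60050/TEST16"
JSON='-H "Content-Type: application/json" -H "Accept: application/json"'
SCANNER_ID=2     # returned in the body of the newscanner response
NUM_ROWS=10      # how many rows to pull per scan call

CREATE_CMD="curl -v $JSON -X POST -T - $TABLE_URL?action=newscanner"
SCAN_CMD="curl -v $JSON -X POST -T - \"$TABLE_URL?action=scan&scannerid=$SCANNER_ID&numrows=$NUM_ROWS\""
CLOSE_CMD="curl -v $JSON -X POST -T - \"$TABLE_URL?action=closescanner&scannerid=$SCANNER_ID\""

echo "$CREATE_CMD"
echo "$SCAN_CMD"
echo "$CLOSE_CMD"
```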

        In short, a scanner is a stateful resource (like a table) - not an action. The proposed model means that a table cannot have any "child resources" - just rows. So you could potentially make a scanner a root-level type, and make an interface like scanner/[id]/[opts]
        So you'd POST scanner/?table=myTable&cols=....
        then GET scanner/[id]

        because the proposed table interface leaves no room for table/scanner/ - scanner would be interpreted as a row ID.

        I, for one, thought the old interface worked well because it allowed one to access different resources on a given table. Granted, 'enable' and 'disable' are actions, not resources.

        I believe these issues are addressed above. I will say that putting a directive as the first item in the path is possible, though it would then always need to be there.

        Think about what other resources might be added to the interface (i.e. maybe MapReduce jobs, Pig jobs, etc) - would those be resources of a specific table, or root-level types? If you adopt the tablename/rowID/cols interface, it leaves no room for child resources other than rows.

        Perhaps stack or someone can comment on this further, but given the paradigm of HBase and how a column-store database works, I have trouble thinking of a case where you were querying the database and it didn't start from /table/row. Though I could see possible changes further down the path from there.

        Also, as far as Pig or MapReduce jobs go... I believe implementing those interfaces will be taken care of by their respective groups. It's probably best to stick with what works for HBase and let the other projects decide what's best for them.

        stack added a comment -

        Brian: I don't exactly follow the below:

        bq. Also the reason for the change in moving to the query string for some of these items is that in order to retrieve the row/column/timestamp using the path you are unable to have any directives in the path. Unless we wanted to get into the thought of reserved words, which IMHO is a bad idea and complicates the interface.

        Can you say more so I can understand better the question below.

        bq. Now I think the real question that needs to be answered... is it necessary or desirable to query out the row/column/timestamp data in this RESTful fashion using the path?

        Can you answer Tom Nichols' questions, Brian?

        As stated already, what the old API has going for it is that if the new implementation keeps it, the new stuff can just be dropped in. Otherwise, we have to do the deprecate/remove dance.

        Good stuff.

        Tom Nichols added a comment -

        I replied on the mailing list, but only people subscribed to the dev list will see my comments. So I'll quickly summarize what I said there.

        Basically –

        1. What advantage does this provide besides the perception of being more restful?
        2. If the proposed tablename/[row]/[cols]/[timestamp] interface is adopted, how do you GET/PUT/POST/DELETE scanners?

        In short, a scanner is a stateful resource (like a table) – not an action. The proposed model means that a table cannot have any "child resources" – just rows. So you could potentially make a scanner a root-level type, and make an interface like scanner/[id]/[opts]

        So you'd POST scanner/?table=myTable&cols=....
        then GET scanner/[id]

        because the proposed table interface leaves no room for table/scanner/ – scanner would be interpreted as a row ID.
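One plausible mapping of that root-level scanner resource onto HTTP verbs, as a sketch. All names, and the choice of DELETE to close a scanner, are hypothetical; none of this is implemented by the patch:

```shell
# Scanner as a root-level resource: POST creates it, GET advances it,
# DELETE closes it. "myTable", "myFamily:" and the id are placeholders.
BASE="http://localhost:60050"
SCANNER_ID=1   # would be returned in the body of the POST response

CREATE_CMD="curl -X POST \"$BASE/scanner/?table=myTable&cols=myFamily:\""
NEXT_CMD="curl -X GET $BASE/scanner/$SCANNER_ID"
CLOSE_CMD="curl -X DELETE $BASE/scanner/$SCANNER_ID"

echo "$CREATE_CMD"
echo "$NEXT_CMD"
echo "$CLOSE_CMD"
```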

        I, for one, thought the old interface worked well because it allowed one to access different resources on a given table. Granted, 'enable' and 'disable' are actions, not resources.

        Think about what other resources might be added to the interface (i.e. maybe MapReduce jobs, Pig jobs, etc) – would those be resources of a specific table, or root-level types? If you adopt the tablename/rowID/cols interface, it leaves no room for child resources other than rows.

        Andrew Purtell added a comment -

        +1, and +1 again for separating directives out into query parameters, leaving the path strictly for cell addressing.

        Brian Beggs added a comment -

        I'm not sure if Michael Gottesman reads the dev list or not but he may want to comment on what I'm about to say.

        First let me point out the differences.

        current REST implementation:
        the path that follows the url is in this format /tablename/<action>/[additional info]

        where the action can be enabling/disabling a table, a scanner operation, or a call to retrieve a row.

        In the implementation I'm working on the path that follows the url is in this format:
        /tablename/[row]/[column]/[timestamp]
        Also a query string can be included depending on what you wanted to do.

        For example to get table regions data with this implementation:
        http://localhost:60050/<tableName>?action=regions
        the current way:
        http://localhost:60050/<tableName>/regions

        or to get a scanner with this implementation:
        http://localhost:60050/<tableName>?action=newscanner
        the current way:
        http://localhost:60050/<table_name>/scanner

        As you can see, the interfaces are a bit different; the implementation I'm working on is perhaps a bit more RESTful in spirit.

        Also the reason for the change in moving to the query string for some of these items is that in order to retrieve the row/column/timestamp using the path you are unable to have any directives in the path. Unless we wanted to get into the thought of reserved words, which IMHO is a bad idea and complicates the interface.

        Now I think the real question that needs to be answered... is it necessary or desirable to query out the row/column/timestamp data in this RESTful fashion using the path?

        If no the interface can be changed to be much closer to the current implementation.

        If yes, then the interface needs to change.

        To be honest, a change like this is not currently on my roadmap, though I feel it could be done with a few days' worth of work. Obviously I do not desire to break anyone's current interface into the system, but at the same time you can't make an omelet without breaking a few eggs. And I also feel that if a big change like this does go in, sooner is probably better than later, as adoption of this project picks up.

        I also am not sure I have an opinion either way on the interface. I tend to like the new model a bit better, but I think the questions that really need to be answered are: what are the needs of the current users of the REST interface? Are they getting everything they need? Could the interface be better? Is there a need for a better interface? Will the current interface meet the demands of future users? Is it extensible enough to allow the HBase project to expand in the future?

        And I really don't have the answer to these questions. I'm still somewhat of an hbase noob.

        stack added a comment -

        IMO, if the two APIs were effectively the same, then no need of our maintaining two REST implementations and we can just slot the new stuff in soon as 0.19.0 goes out (Would be nice if tools that depend on REST weren't broken by the REST upgrade).

        sishen added a comment -

        Excellent work. Big step move.

        For the request API, should we try to keep it the same as the original? I think this would help the merge and also third-party libraries which use the REST API now.

        Brian Beggs added a comment -

        Yeah, I've seen it and have been using it. My hope for all of this is to get it all integrated together, get the wiki page updated with complete information for both JSON and XML for the REST interface, and tie both of these implementations together with a more extensible input/output interface.

        At least that's the plan. I'm hoping to have a lot of this wrapped up in the coming week or two.

        stack added a comment -

        I like the suggestion of a metrics view.

        Brian, in case you hadn't seen it, Michael Gottesman wanted to make sure you'd seen this page he'd been working on: http://wiki.apache.org/hadoop/Hbase/JSONRest.

        Jonathan Gray added a comment -

        This looks great.

        If we can get some metrics out from the Master and RegionServers through this interface, then it's a much easier way to develop a nagios/nrpe plugin. Previously I hacked apart the existing web ui to output information in XML or JSON, this would make much more sense.

        Brian Beggs added a comment -

        OK, first, I misquoted the Jira issues. The issues this patch is based off of are HBASE-814 and HBASE-815.

        + The REST API looks good. Very RESTy. There might be little nitpicks later but for now it looks great. How different is it from the current REST? Should the two be the same? If not, how to deprecate the old?

        I can't take credit for the full architecture of the system, as this patch was initially submitted by Michael Gottesman. Mainly, I reworked the way input/output flows through the system. I also added some functionality from the original REST implementation that was missing in this one.

        I've been trying to stay as close to the original implementation as possible. Some of the calls have changed, though I've tried to keep the xml output/input as close as possible to the original implementation.

        I will start working on a changelog to hold the differences and attach it to this issue.

        It's possible, with some package name changes, that both interfaces could be bundled with HBase for a release or two, with the old one eventually being removed.

        + Where does the jar come from and whats its license
        Michael included this jar with his original issue in HBASE-814. After a bit of research I found it here: http://github.com/gottesmm/agile-json-2.0/tree/master
        It appears that the jar is based on the org.json implementation.

        Here is the license from json.org:
        Copyright (c) 2002 JSON.org

        Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

        The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

        The Software shall be used for Good, not Evil.

        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

        If needed, I could rewrite this code at some future point if this library proves unsatisfactory, though it would need to happen once I get the initial implementation finished.

        + The annotations look interesting. Could their serializations be used in other contexts, the shell say? Or - warning -> crazy-thought coming - somehow producing thrift IDL?

        Something like this is quite possible, I would think, and it does make serialization much easier. The tradeoff, however, is that your domain/model objects end up with quite a few annotations in them, which can be ugly. Though it will save you from writing quite a bit of code.

        I am not familiar with Thrift IDL, but if something using annotations could be used for serialization, it could be written as a separate module and have an impact beyond this project.

        I had initially started using annotations for the XML serialization but pulled them out and opted for a simpler approach, but this is something that could be changed at a later time. I was using this library for the annotation serialization: http://simple.sourceforge.net/

        + In hadoop/hbase, line lengths are < 80 chars (Your new classes are missing apache license, classes are missing comments describing what class is about, etc).

        Yes I will fix these issues with the next patch I supply.

        + Is there anything we could do refactoring HTable say, so your modeling was easier? Looks like lots of code in your Cell and Database controllers. Should our client be taking on this model?

        Possibly. I haven't dug far enough into the interface yet to comment fully on that.

        + Can you add a note on how you've changed how REST works, high-level?
        Yes I will supply this in the coming days.

        stack added a comment -

        Looks excellent. Really sweet.

        + Where does the jar come from and whats its license?
        + The REST API looks good. Very RESTy. There might be little nitpicks later but for it looks great. How different is it from current REST. Should the two be the same? If not, how to deprecate the old?
        + The annotations look interesting. Could their serializations be used in other contexts, the shell say? Or – warning -> crazy-thought coming – somehow producing thrift IDL?
        + In hadoop/hbase, line lengths are < 80 chars (Your new classes are missing apache license, classes are missing comments describing what class is about, etc).
        + Is there anything we could do refactoring HTable say, so your modeling was easier? Looks like lots of code in your Cell and Database controllers. Should our client be taking on this model?
        + Can you add a note on how you've changed how REST works, high-level?

        That's all for now.

        Brian Beggs added a comment -

        first pass at this patch.


          People

          • Assignee: Unassigned
          • Reporter: Brian Beggs