Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      A RESTful interface would be one means of making HBase accessible to clients that are not written in Java. It might look something like the below:

      + An HTTP GET of http://MASTER:PORT/ outputs the master's attributes: online meta regions, list of tables, etc.: i.e. what you see now when you go to http://MASTER:PORT/master.jsp.
      + An HTTP GET of http://MASTER:PORT/TABLENAME: 200 if the table exists, returning its HTableDescriptor (mimetype: text/plain or text/xml), or 401 if no such table. HTTP DELETE would drop the table. HTTP PUT would add one.
      + An HTTP GET of http://MASTER:PORT/TABLENAME/ROW: 200 if row exists and 401 if not.
      + An HTTP GET of http://MASTER:PORT/TABLENAME/ROW/COLUMNFAMILY: HColumnDescriptor (mimetype: text/plain or text/xml) or 401 if no such column family.
      + An HTTP GET of http://MASTER:PORT/TABLENAME/ROW/COLUMNNAME/: 200 and latest version (mimetype: binary/octet-stream) or 401 if no such cell. HTTP DELETE would delete the cell. HTTP PUT would add a new version.
      + An HTTP GET of http://MASTER:PORT/TABLENAME/ROW/COLUMNNAME/TIMESTAMP: 200 (mimetype: binary/octet-stream) or 401 if no such cell. HTTP DELETE would remove. HTTP PUT would put this record.
      + The browser originally goes against the master, but the master then redirects to the hosting region server to serve, update, delete, etc. the addressed cell.
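      The addressing scheme above only works if reserved characters in table names, row keys, and column names are percent-encoded into a single path segment. A client-side sketch of that (the helper names, host, and port here are made up for illustration, not part of any attached patch):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical client-side helper; names are illustrative only. */
public class CellUrl {

  // Percent-encode one path segment so reserved characters ('/', ':', '?')
  // in table names, row keys, or column names survive URL routing.
  static String encodeSegment(String s) {
    // URLEncoder targets form encoding, so map '+' (space) back to "%20" for paths.
    return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
  }

  // Build the GET/PUT/DELETE URL for a single cell.
  static String cellUrl(String base, String table, String row, String column) {
    return base + "/" + encodeSegment(table)
        + "/" + encodeSegment(row)
        + "/" + encodeSegment(column);
  }

  public static void main(String[] args) {
    System.out.println(cellUrl("http://master:60010", "webdata",
        "com.example.www/:http", "source:"));
    // http://master:60010/webdata/com.example.www%2F%3Ahttp/source%3A
  }
}
```

      Note that URLEncoder is a form encoder, hence the '+'-to-%20 fixup for path segments.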

      1. rest.patch
        33 kB
        stack
      2. rest-11-27-07.patch
        37 kB
        Bryan Duxbury
      3. rest-11-27-07-v2.patch
        38 kB
        stack
      4. rest-11-27-07.3.patc
        47 kB
        Bryan Duxbury
      5. rest-11-28-07.patch
        50 kB
        Bryan Duxbury
      6. rest-11-28-07.2.patch
        50 kB
        Bryan Duxbury
      7. rest-11-28-07.3.patch
        51 kB
        stack

        Issue Links

          Activity

          Billy Pearson added a comment -

          It is linked from this issue; I just submitted it.

          HADOOP-2546

          Michael Bieniosek added a comment -

          Hey Billy,

          Could you post your PHP class? I need to use HBase from a PHP client and was wondering if I could start from yours.

          Thanks.

          Billy Pearson added a comment -

          OK, I got a lot of work done on the PHP class that works with this interface.

          Got insert, select, delete, and scanner working by using a socket connection (cut curl out).

          Still got some features to add to it, like formatting options for whether the output is going to be XML or something else.
          I guess when I am done I could open a new feature issue and upload a patch for the PHP class, if you guys let PHP be added to the project.

          Any idea on when/if a way to get more than just the latest version (up to max_versions) will be added?
          Say I have max_versions = 3:
          how would I get the second-oldest version?

          I have no need for it on this project but could see where it would be handy on others.
          maybe a call like this
          GET /[table_name]/row/[row_key]?columns=x:&versions=3

          So far I have found that row keys that have /'s in them do not get inserted; the server returns a 500 error.
          I also found that row keys that have a ? in them get truncated at the ?, so the row gets inserted, but only partially.
          Say the row key is "aaa?bbb":
          the row would get inserted with row key "aaa" only.
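          The ? behavior falls straight out of URL syntax: an unencoded ? begins the query string, so the server never sees it as part of the path. A small sketch (host, port, and table name are made up):

```java
import java.net.URI;

public class QueryTruncation {
  // Path and query as a server would see them, after URI parsing.
  static String pathOf(String url)  { return URI.create(url).getPath(); }
  static String queryOf(String url) { return URI.create(url).getQuery(); }

  public static void main(String[] args) {
    // Unencoded '?' starts the query string: the row key is cut at "aaa".
    System.out.println(pathOf("http://master:60010/api/webdata/row/aaa?bbb"));  // /api/webdata/row/aaa
    System.out.println(queryOf("http://master:60010/api/webdata/row/aaa?bbb")); // bbb

    // Percent-encoded as %3F, the '?' stays inside the (decoded) path.
    System.out.println(pathOf("http://master:60010/api/webdata/row/aaa%3Fbbb")); // /api/webdata/row/aaa?bbb
  }
}
```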

          Billy Pearson added a comment -

          Not sure how this happens, but here is the screen output.
          I think this is the reason I was getting a 200 HTTP code but could not see the results in the shell.
          I ran this after restarting all of Hadoop and HBase, so nothing should be cached in any sense.

           
          [root@PE1750-1 bin]# pwd
          /hadoop/src/contrib/hbase/bin
          [root@PE1750-1 bin]# curl -v http://192.168.1.200:60010/api/webdata/row/10/
          * About to connect() to 192.168.1.200 port 60010
          *   Trying 192.168.1.200... * connected
          * Connected to 192.168.1.200 (192.168.1.200) port 60010
          > GET /api/webdata/row/10/ HTTP/1.1
          User-Agent: curl/7.12.1 (i686-redhat-linux-gnu) libcurl/7.12.1 OpenSSL/0.9.7a zlib/1.2.1.2 libidn/0.5.6
          Host: 192.168.1.200:60010
          Pragma: no-cache
          Accept: */*
          
          < HTTP/1.1 200 OK
          < Date: Mon, 03 Dec 2007 20:51:30 GMT
          < Server: Jetty/5.1.4 (Linux/2.6.9-55.0.12.ELsmp i386 java/1.5.0_12
          < Content-Type: text/xml;charset=UTF-8
          < Transfer-Encoding: chunked
          <?xml version="1.0" encoding="UTF-8"?>
          <row>
           <column>
            <name>
          stime:
            </name>
            <value>
          NDU2
            </value>
           </column>
           <column>
            <name>
          stime:now
            </name>
            <value>
          Nzg5
            </value>
           </column>
          * Connection #0 to host 192.168.1.200 left intact
          * Closing connection #0
          [root@PE1750-1 bin]# ./hbase shell
          Hbase Shell, 0.0.2 version.
          Copyright (c) 2007 by udanax, licensed to Apache Software Foundation.
          Type 'help;' for usage.
          
          hql > select * from webdata;
          +-------------------------+-------------------------+-------------------------+
          | Row                     | Column                  | Cell                    |
          +-------------------------+-------------------------+-------------------------+
          0 row(s) in set (0.58 sec)
          hql > exit;
          [root@PE1750-1 bin]#
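          As an aside, the <value> elements in the XML above are base64-encoded cell bytes; decoded, these two cells are plain strings. A sketch with the JDK decoder (java.util.Base64 is a modern stand-in for whatever codec the patch used):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DecodeCell {
  // Cell values arrive base64-encoded in the XML <value> text.
  static String decode(String b64) {
    return new String(Base64.getDecoder().decode(b64), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(decode("NDU2")); // 456
    System.out.println(decode("Nzg5")); // 789
  }
}
```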
          
          Bryan Duxbury added a comment -

          I am unfamiliar with php's HTTP library. However, I don't think we
          should change the content-type accepted for XML formatted data. An
          XML entity body is definitely NOT x-www-form-urlencoded. Hacking it
          to work around that essentially breaks the HTTP spec.

          I would suggest finding out if you can get lower-level access to the
          HTTP session than what php's library is giving you. This is not
          incredibly complicated functionality.

          As a last resort, I invite you to submit a patch that can read the
          xml out of the postdata when encoded as x-www-form-urlencoded and
          we'll find a way to work it in.

          Billy Pearson added a comment -

          I got the put option working; it is returning 200 now and the data is in the table, so that's good.

          But to use the put option you have to save the data to a file before putting it via PHP curl.
          That's an extra step on inserting data into the tables that I would like to skip by using the post option.

          Billy Pearson added a comment -

          After lots of work trying to get PHP curl to work with the post option, it looks like we will need support for content type "application/x-www-form-urlencoded". I have tried many ways to get curl to encode the data and send it as text/xml, but using the post fields option in PHP curl sends the data with content type application/x-www-form-urlencoded. Since that is not a supported content type, I get back an HTTP error:
          406 Unsupported Accept Header Content: application/x-www-form-urlencoded

          So is there a way we can add support for that?

          If so, we could still use the XML format as the data; just make sure to urldecode it first.
          Also, can we have a set field to post the XML to that the app knows about, like "xmldata" or something like that, for post?

          So we can set the post field:

          Example
          xmldata='<?xml version="1.0" encoding="UTF-8"?> <column> <name>a: </name> <value>YQ== </value> </column>';

          Then the app knows where to look for the xml data.
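          Server-side support for this proposal might be as small as the sketch below: scan the x-www-form-urlencoded body for the agreed field and url-decode its value. The field name xmldata and the helper follow the comment above; nothing here is from the attached patches.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class FormXml {
  // Extract the XML from an application/x-www-form-urlencoded body of the
  // form xmldata=<urlencoded xml>&..., returning null if the field is absent.
  static String extractXml(String body) {
    for (String pair : body.split("&")) {
      int eq = pair.indexOf('=');
      if (eq > 0 && pair.substring(0, eq).equals("xmldata")) {
        return URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
      }
    }
    return null;
  }

  public static void main(String[] args) {
    String body = "xmldata=%3Ccolumn%3E%3Cname%3Ea%3A%3C%2Fname%3E%3C%2Fcolumn%3E";
    System.out.println(extractXml(body)); // <column><name>a:</name></column>
  }
}
```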

          Bryan Duxbury added a comment -

          Now that I think about it, I don't think that put/post should return a 201, because we're not always creating a new resource, in the HTTP sense. We should just change the spec to say 200.

          stack added a comment -

          You are right that the code and spec are out of sync, Billy. Inside putRowXml there is the following code on a successful put, around line #324:

                // respond with a 200
                response.setStatus(200);      
          

          Should be 201.

          It's odd that it's returning success but nothing is added. Looking at the code, that shouldn't be possible.

          Keep on asking questions (and finding bugs). Thanks Billy.
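          The two positions in this thread (always return 200 vs. 201 on create) can be reconciled if the handler knows whether the PUT created or overwrote; this sketch just names that choice (hypothetical helper, not from the patch):

```java
public class PutStatus {
  // 201 Created when the PUT made a new row/cell; 200 OK when it
  // overwrote an existing one. Assumes the handler can tell which happened.
  static int statusForPut(boolean createdNewResource) {
    return createdNewResource ? 201 : 200;
  }

  public static void main(String[] args) {
    System.out.println(statusForPut(true));  // 201
    System.out.println(statusForPut(false)); // 200
  }
}
```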

          Billy Pearson added a comment -

          I am also having problems getting put/post on rows/columns working.

          I keep getting a return code of 200 and the row is not inserted. The wiki page says it should be 201, but the example on the same page shows a return code of 200. Can you verify the return code and that it's working?

          I would like to see some kind of example, in any programming language, of how you are posting, so I could post instead of saving to a file and using put.

          I am trying to write a PHP class around this to use for a project I am working on, so sorry for all the questions.

          Thanks

          Billy Pearson added a comment -

          I am still getting the same error as in my last post above.
          I am downloading the latest patch from here and from HADOOP-2224 and applying them to trunk rev 598780, the latest one I know to build successfully.

          I can pull records whose row keys do not have /'s in them, but not ones with /'s.
          Is there something I am doing wrong?

          Hudson added a comment -

          Integrated in Hadoop-Nightly #319 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/319/ )
          stack added a comment -

          Thanks for the patch Bryan.

          stack added a comment -

          Committed (Failed tests were unrelated to this patch which doesn't add any new tests and is code that doesn't run at unit test time). Resolving.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12370483/rest-11-28-07.3.patch
          against trunk revision r599879.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new compiler warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1220/testReport/
          Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1220/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1220/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1220/console

          This message is automatically generated.

          Bryan Duxbury added a comment -

          Submitting preliminary REST implementation.

          stack added a comment -

          Bryan: Try request.getRequestURI() instead of request.getPathInfo() in the getPathSegments method; the %2F is decoded as a slash when getPathInfo is used, which is messing up the parse.
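          A sketch of the suggested fix: split the still-encoded request URI first and decode each segment afterwards, so an encoded slash (%2F) inside a row key cannot masquerade as a path separator. The /api prefix follows the examples in this thread; the servlet plumbing is omitted.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class PathSegments {
  // request.getRequestURI() returns the raw, still-encoded path;
  // decode each segment only after splitting on '/'.
  static String[] getPathSegments(String rawRequestUri) {
    String path = rawRequestUri.replaceFirst("^/api/?", "");
    String[] parts = path.split("/");
    for (int i = 0; i < parts.length; i++) {
      parts[i] = URLDecoder.decode(parts[i], StandardCharsets.UTF_8);
    }
    return parts;
  }

  public static void main(String[] args) {
    for (String s : getPathSegments("/api/webdata/row/com.example.www%2F%3Ahttp")) {
      System.out.println(s);
    }
    // webdata
    // row
    // com.example.www/:http
  }
}
```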

          Bryan Duxbury added a comment -

          I will check into this.

          Billy Pearson added a comment - - edited

          I am still getting an error with urlencode; it looks like the : is throwing it off.

          calling row key
          com.example.www/:http

           
          [root@PE1750-2 hbase]# curl --verbose http://192.168.1.200:60010/api/webdata/row/com.example.www%2F%3Ahttp
          * About to connect() to 192.168.1.200 port 60010
          *   Trying 192.168.1.200... * connected
          * Connected to 192.168.1.200 (192.168.1.200) port 60010
          > GET /api/webdata/row/com.example.www%2F%3Ahttp HTTP/1.1
          User-Agent: curl/7.12.1 (i686-redhat-linux-gnu) libcurl/7.12.1 OpenSSL/0.9.7a zlib/1.2.1.2 libidn/0.5.6
          Host: 192.168.1.200:60010
          Pragma: no-cache
          Accept: */*
          
          < HTTP/1.1 500 For+input+string%3A+%22%3Ahttp%22
          < Date: Fri, 30 Nov 2007 00:53:31 GMT
          < Server: Jetty/5.1.4 (Linux/2.6.9-55.0.12.ELsmp i386 java/1.5.0_12
          < Content-Type: text/html
          < Content-Length: 1282
          < Connection: close
          <html>
          <head>
          <title>Error 500 For input string: ":http"</title>
          </head>
          <body>
          <h2>HTTP ERROR: 500</h2><pre>For input string: ":http"</pre>
          <p>RequestURI=/api/webdata/row/com.example.www/:http</p>
          <p><i><small><a href="http://jetty.mortbay.org">Powered by Jetty://</a></small></i></p>
          
          </body>
          </html>
          * Closing connection #0
          
          Bryan Duxbury added a comment -

          Keys that have special characters should be URL encoded.

          Billy Pearson added a comment - - edited

          How is this feature going to handle calling rows with /'s in the row keys?

          example above
          GET http://MASTER:PORT/TABLENAME/ROW/COLUMNNAME/:

          my example call
          GET http://192.168.1.200:60010/webdata/com.example.www/:http/source/

          where "com.example.www/:http" is the row key and source is column

          the / in the row key will kill the request

          stack added a comment -

          Removed tabs, unused imports, and made minor formatting changes. Also added an override of NotSupported so it could return messages telling the client why something is not supported (I was having trouble figuring out why my curl upload wasn't working).

          I did some basic testing; I was able to put, get, and scan. It's working well enough for a first version. +1.

          Bryan Duxbury added a comment -

          Latest version of the patch just has some more polish and a little better behavior in the area of scanners.

          Bryan Duxbury added a comment -

          -Added licenses
          -Handler constructors all take a HBaseConfiguration and HBaseAdmin instances now
          -Removed all the unnecessary imports from all the classes
          -Added class comments to all handlers

          Also, I don't seem to be able to detect these tabs you're talking about. I thought it might be TextMate screwing me here, but I cat'd out the files and they look like they have two spaces to me.

          stack added a comment -

          + Each class needs a license. Copy one from adjacent hbase classes.
          + You have tabs in this patch. Need to replaced with two spaces.
          + Most of the imports per class are not pertinent (if you had eclipse working, it'd help you here: smile)
          + You might instantiate HBaseConfiguration once and then pass it to each of the handler classes in their constructors. Same for admin and table instances? Otherwise, you'll have three copies of each?
          + Each of your new handlers needs at least a class comment as javadoc saying what each does.

          Otherwise, I think the way you have broken apart the fat REST class is a big improvement.

          Bryan Duxbury added a comment -

          Refactored REST.java into several classes for easier digestion.

          stack added a comment -

          Here's a version of the patch w/o tabs, some formatting fixes, a license, and fixed-up javadoc comments; in particular, the class comment has been changed because this patch addresses a few of the items mentioned in the TODO list.

          Bryan Duxbury added a comment -

          Latest version of the REST functionality. Supports xml-formatted gets/puts, metadata requests, use of timestamps.

          stack added a comment -

          Patch looks great

          + 80 characters per line (unless it's just silly, e.g. for a bare return) and no tabs (indents should be two spaces rather than the usual four).
          + There is a define for COLUMN at top of the class. Use that and you might avoid the "column" in one place and "columns" elsewhere.
          + I like the way you workaround lack of get(Text [] columns).
          + Cast here is unnecessary: +^I^I^I^I^IText current_column = (Text)columns_retrieved[i];$
          + For the below, perhaps just let the exception out:

          +^I^I} catch (javax.xml.parsers.ParserConfigurationException e) {$
          +^I^I^Iresponse.setStatus(500);$
          +^I^I^Ireturn;$
          +^I^I} catch (org.xml.sax.SAXException e){$
          +^I^I^Iresponse.setStatus(500);$
          +^I^I^Ireturn;$
          +^I^I}$

          I think jetty will do the 'right' thing (500 code plus stack trace).

          + You should probably wrap the put, after you call startUpdate, in a try/finally. If an exception occurs, call abort.
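          That try/finally might look like the sketch below; the Update interface here is a stand-in for the startUpdate/commit/abort API under review, not its real signatures.

```java
public class PutWithAbort {
  // Stand-in for the table update handle; not the real HTable API.
  interface Update {
    void put(String column, byte[] value) throws Exception;
    void commit() throws Exception;
    void abort();
  }

  // Wrap everything after startUpdate in try/finally so a failed put
  // aborts the open update instead of leaking it. Returns true on commit.
  static boolean putSafely(Update u, String column, byte[] value) {
    boolean committed = false;
    try {
      u.put(column, value);
      u.commit();
      committed = true;
    } catch (Exception e) {
      // swallowed here for the sketch; the finally block aborts
    } finally {
      if (!committed) {
        u.abort();
      }
    }
    return committed;
  }

  public static void main(String[] args) {
    Update failing = new Update() {
      public void put(String c, byte[] v) throws Exception { throw new Exception("region offline"); }
      public void commit() {}
      public void abort() { System.out.println("aborted"); }
    };
    System.out.println(putSafely(failing, "x:", new byte[] {1})); // aborted, then false
  }
}
```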

          Bryan Duxbury added a comment -

          Updated patch for REST.java:

          -PUT/POST on a row without a timestamp works
          -DELETE on a row without a timestamp works

          stack added a comment -

          First cut at RESTful interface. Implements metainfo, gets, and scanners. Does not yet support put. Bunch of TODOs:

          + Returning results as multipart/related is crippled by lack of support in the container; Jetty has a MultipartResponse class, but it can't set a properly qualified Content-Type with boundary and start parameters... they get stripped. Need to figure out how to fix this (maybe Jetty 6 does it better).
          + Need to agree on timestamp format to use (ISO8601?)
          + Need to fix HTable so it has table metadata; until then, you need to specify a column getting a scanner.
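If ISO8601 were adopted, millisecond timestamps like the ones in the scanner output below would convert as in this small sketch (plain java.text; the class and method names are mine, not from the patch):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class Iso8601 {
    // Convert an hbase epoch-millis timestamp to an ISO8601 UTC string.
    public static String format(long millis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(millis));
    }

    public static void main(String[] args) {
        // A timestamp of the kind returned in the <timestamp> element below.
        System.out.println(format(1195372581842L));
    }
}
```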

          Here's some samples run against a simple table named 'x' with column family 'x:' with following contents:

          Hbase> select * from x;
          +-------------------------+-------------------------+-------------------------+
          | Row                     | Column                  | Cell                    |
          +-------------------------+-------------------------+-------------------------+
          | x                       | x:                      | xyz                     |
          +-------------------------+-------------------------+-------------------------+
          | xyz                     | x:abc                   | abc                     |
          +-------------------------+-------------------------+-------------------------+
          | xyz                     | x:xyz                   | xyzxyz                  |
          +-------------------------+-------------------------+-------------------------+
          3 row(s) in set (0.19 sec)
          

          In the session below I'm using curl. It doesn't have a DELETE shortcut, and I fake PUT with the -T option, uploading a file:

          $ curl http://localhost:60010/api/
          <?xml version="1.0" encoding="UTF-8"?>
          <tables>
           <table>
          x
           </table>
          
          $ curl --header 'Accept: text/plain' http://localhost:60010/api/
          x
          
          $ curl --header 'Accept: text/plain' http://localhost:60010/api/x
          name: x, families: {x:={name: x, max versions: 3, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}
          
          $ curl --header 'Accept: text/xml' http://localhost:60010/api/x
          <?xml version="1.0" encoding="UTF-8"?>
          <table>
           <name>
          x
           </name>
           <columnfamilies>
            <columnfamily>
             <name>
          x:
             </name>
             <compression>
          NONE
             </compression>
             <bloomfilter>
          NONE
             </bloomfilter>
             <max-versions>
          3
             </max-versions>
             <maximum-cell-size>
          2147483647
             </maximum-cell-size>
            </columnfamily>
           </columnfamilies>
          </table>
          
          $ curl --header 'Accept: text/xml' http://localhost:60010/api/x/regions
          <?xml version="1.0" encoding="UTF-8"?>
          <regions>
           <region/>
          
          # only one region and its start key is null, the default table start key
          
          $ curl --verbose --header 'Accept: text/xml' http://localhost:60010/api/x/scanner
          * About to connect() to localhost port 60010
          *   Trying ::1... * connected
          * Connected to localhost (::1) port 60010
          > GET /api/x/scanner HTTP/1.1
          User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
          Host: localhost:60010
          Pragma: no-cache
          Accept: text/xml
          
          < HTTP/1.1 404 No+handler
          < Date: Mon, 19 Nov 2007 23:22:23 GMT
          < Server: Jetty/5.1.4 (Mac OS X/10.4.10 i386 java/1.5.0_07
          < Content-Type: text/html
          < Content-Length: 1228
          <html>
          <head>
          <title>Error 404 No handler</title>
          </head>
          <body>
          ...
          
          # Fails because currently you must specify a column name (To be fixed)
          
          $ curl --verbose --header 'Accept: text/xml' -T /tmp/diff.txt http://localhost:60010/api/x/scanner?column=x:
          * About to connect() to localhost port 60010
          *   Trying ::1... * connected
          * Connected to localhost (::1) port 60010
          > PUT /api/x/scanner?column=x: HTTP/1.1
          User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
          Host: localhost:60010
          Pragma: no-cache
          Accept: text/xml
          Content-Length: 7096
          Expect: 100-continue
          
          < HTTP/1.1 100 Continue
          < HTTP/1.1 201 Created
          < Date: Mon, 19 Nov 2007 23:23:33 GMT
          < Server: Jetty/5.1.4 (Mac OS X/10.4.10 i386 java/1.5.0_07
          < Location: //api /x/scanner/88316f77
          < Content-Length: 0
          * Connection #0 to host localhost left intact
          * Closing connection #0
          
          $ curl --verbose --header 'Accept: text/xml' -T /tmp/diff.txt http://localhost:60010/api/x/scanner/88316f77 
          * About to connect() to localhost port 60010
          *   Trying ::1... * connected
          * Connected to localhost (::1) port 60010
          > PUT /api/x/scanner/88316f77 HTTP/1.1
          User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
          Host: localhost:60010
          Pragma: no-cache
          Accept: text/xml
          Content-Length: 7096
          Expect: 100-continue
          
          < HTTP/1.1 100 Continue
          < HTTP/1.1 200 OK
          < Date: Mon, 19 Nov 2007 23:23:56 GMT
          < Server: Jetty/5.1.4 (Mac OS X/10.4.10 i386 java/1.5.0_07
          < Content-Type: text/xml;charset=UTF-8
          < Transfer-Encoding: chunked
          <?xml version="1.0" encoding="UTF-8"?>
          <row>
           <name>
          x
           </name>
           <timestamp>
          1195372581842
           </timestamp>
           <column>
            <name>
          x:
            </name>
            <value>
          eHl6
            </value>
           </column>
          * Connection #0 to host localhost left intact
          * Closing connection #0
          </row>
          
          
          $ curl --verbose --header 'Accept: multipart/related' -T /tmp/diff.txt http://localhost:60010/api/x/scanner/88316f77
          * About to connect() to localhost port 60010
          *   Trying ::1... * connected
          * Connected to localhost (::1) port 60010
          > PUT /api/x/scanner/88316f77 HTTP/1.1
          User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
          Host: localhost:60010
          Pragma: no-cache
          Accept: multipart/related
          Content-Length: 7096
          Expect: 100-continue
          
          < HTTP/1.1 100 Continue
          < HTTP/1.1 200 OK
          < Date: Mon, 19 Nov 2007 23:24:26 GMT
          < Server: Jetty/5.1.4 (Mac OS X/10.4.10 i386 java/1.5.0_07
          < Content-Type: multipart/related
          < Content-Length: 814
          --org.mortbay.http.MultiPartResponse.boundary.f97mkcfm
          Content-Type: application/octet-stream
          Content-Description: row
          Content-Transfer-Encoding: binary
          Content-Length: 3
          
          xyz
          --org.mortbay.http.MultiPartResponse.boundary.f97mkcfm
          Content-Type: application/octet-stream
          Content-Description: timestamp
          Content-Transfer-Encoding: binary
          Content-Length: 13
          
          1195372609009
          --org.mortbay.http.MultiPartResponse.boundary.f97mkcfm
          Content-Type: application/octet-stream
          Content-Description: x:abc
          Content-Transfer-Encoding: binary
          Content-Length: 3
          
          abc
          --org.mortbay.http.MultiPartResponse.boundary.f97mkcfm
          Content-Type: application/octet-stream
          Content-Description: x:xyz
          Content-Transfer-Encoding: binary
          Content-Length: 6
          
          xyzxyz
          --org.mortbay.http.MultiPartResponse.boundary.f97mkcfm--
          * Connection #0 to host localhost left intact
          * Closing connection #0
          
          $ curl --verbose --header 'Accept: multipart/related'  http://localhost:60010/api/x/row/x           
          * About to connect() to localhost port 60010
          *   Trying ::1... * connected
          * Connected to localhost (::1) port 60010
          > GET /api/x/row/x HTTP/1.1
          User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
          Host: localhost:60010
          Pragma: no-cache
          Accept: multipart/related
          
          < HTTP/1.1 200 OK
          < Date: Mon, 19 Nov 2007 23:24:55 GMT
          < Server: Jetty/5.1.4 (Mac OS X/10.4.10 i386 java/1.5.0_07
          < Content-Type: multipart/related
          < Content-Length: 240
          --org.mortbay.http.MultiPartResponse.boundary.f97mkyj8
          Content-Type: application/octet-stream
          Content-Description: x:
          Content-Transfer-Encoding: binary
          Content-Length: 3
          
          xyz
          --org.mortbay.http.MultiPartResponse.boundary.f97mkyj8--
          * Connection #0 to host localhost left intact
          * Closing connection #0
          
          $ curl --verbose --header  http://localhost:60010/api/x/row/x
          curl: no URL specified!
          curl: try 'curl --help' or 'curl --manual' for more information
          durruti:~/Documents/checkouts/hadoop-trunk stack$ curl --verbose  http://localhost:60010/api/x/row/x
          * About to connect() to localhost port 60010
          *   Trying ::1... * connected
          * Connected to localhost (::1) port 60010
          > GET /api/x/row/x HTTP/1.1
          User-Agent: curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7l zlib/1.2.3
          Host: localhost:60010
          Pragma: no-cache
          Accept: */*
          
          < HTTP/1.1 200 OK
          < Date: Mon, 19 Nov 2007 23:25:22 GMT
          < Server: Jetty/5.1.4 (Mac OS X/10.4.10 i386 java/1.5.0_07
          < Content-Type: text/xml;charset=UTF-8
          < Transfer-Encoding: chunked
          <?xml version="1.0" encoding="UTF-8"?>
          <row>
           <column>
            <name>
          x:
            </name>
            <value>
          eHl6
            </value>
           </column>
          * Connection #0 to host localhost left intact
          * Closing connection #0
          </row>
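The `<value>` elements in the XML responses above carry Base64-encoded cell bytes; `eHl6` decodes back to the stored value. A minimal sketch (java.util.Base64 is used here for self-containment; the 2007-era code would have used a different decoder):

```java
import java.util.Base64;

public class CellValue {
    // Decode a Base64 <value> element from the XML row representation
    // back into the raw cell bytes, rendered here as a String.
    public static String decode(String b64) {
        return new String(Base64.getDecoder().decode(b64));
    }

    public static void main(String[] args) {
        // "eHl6" is the <value> returned for row x, column x: above.
        System.out.println(decode("eHl6")); // prints "xyz"
    }
}
```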
          
          stack added a comment -

          Bryan Duxbury added http://wiki.apache.org/lucene-hadoop/Hbase/HbaseRest.
          Bryan Duxbury added a comment -

          Well, either Base64 encoding the data and sending it all together takes longer than requesting each column individually, or it's the other way around. I'd be interested in seeing which way it really is.

          I think it is perfectly acceptable to do the column-only approach in the short term and evaluate the row oriented approach later.

          stack added a comment -

          See the master UI at http://MASTER:PORT (default is localhost:60000). On the master homepage, there is a HQL link that does read-only queries against tables making best effort at outputting results in XHTML (Maybe this is good enough to get you going?).

          We could Base64 the data. We could also tie a boat anchor to the server as a means of slowing it down (smile). I suppose the RESTful way to do it is to just let clients say what they can accept in request headers (I don't know if you can stipulate XML with Base64-encoded content via HTTP request headers). If the server has support, do as the client asks.
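Letting clients say what they can accept amounts to simple Accept-header dispatch; a minimal sketch (the class, method, and default choice here are mine, not from the patch):

```java
public class AcceptNegotiation {
    // Pick a response mimetype from the request's Accept header.
    // Defaults to text/xml when the client expresses no preference.
    public static String negotiate(String acceptHeader) {
        if (acceptHeader == null || acceptHeader.contains("*/*")) {
            return "text/xml";
        }
        if (acceptHeader.contains("text/plain")) {
            return "text/plain";
        }
        if (acceptHeader.contains("multipart/related")) {
            return "multipart/related";
        }
        return "text/xml";
    }
}
```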

          stack added a comment -

          HADOOP-2171 strikes me as a little odd. A client sends HQL to a server that parses the HQL to run HTable client operations against hbase. There are no load savings over running the shell on a client machine that I can see.

          I don't see a problem having the master handle REST requests. The master is generally lightly loaded. It will take a lot of traffic to make it break a sweat. The REST load would add the master fielding HTTP redirects – a minor imposition. Should the REST load become burdensome, folks could put up an intermediary server, or take the load off the master by making their clients smarter, doing HTable-like caching of data locations.

          Bryan Duxbury added a comment -

          I haven't seen the output of hql.jsp, but I'll try and track it down. You're right, you can't stick pure binary data in XML, but you can use Base64 encoded binary data. It's a little bigger, but it cuts down on the number of requests you would have to make, which I think will be the crucial bottleneck. Of course, single column requests for pure binary data should still be available.

          I agree with your idea of the REST requests redirecting around between the master and the region servers. That keeps the underlying architecture correct, just with a different face on it.

          Jim Kellerman added a comment -

          HADOOP-2171 talks about creating a shell server which I think would be better than burdening the master with having to handle REST requests.

          stack added a comment -

          Have you seen how rows are dumped out in xhtml in hql.jsp? Would that work for you? You can't wrap binary data in XML, so I don't think row representation as XML would be a general solution. If you explicitly request a cell, then we could pass back binary (with the data length in HTTP headers).

          I'm thinking the main difference between the REST implementation and the current HQL querying in the master UI would be that REST clients go first to the master but are then redirected to the regionserver hosting the row being read or updated (as opposed to everything being done on the master, as with HQL). REST clients would have to be able to retry if data had moved by the time they arrived at what had been the data server. The alternative would be something like having the regionserver redirect back to the master, since it is the arbiter of where everything is, but I imagine that could get complicated quickly, what with there often being some lag before the master learns that data has moved and has had a chance to orchestrate redeploy in the new location.

          Bryan Duxbury added a comment -

          Would the REST api support getting whole rows at a time? Perhaps off the GET /TABLE/ROW. The data could be packed into an XML document.

          I'd love to see this one get done, as it would get me the Ruby HBase client I've always wanted.
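Packing a whole row into an XML document could look something like the sketch below, with each cell value Base64-encoded so binary data survives the trip. The helper class is a hypothetical illustration of the idea, not code from any attached patch:

```java
import java.util.Base64;
import java.util.Map;

public class RowToXml {
    // Render one row as XML, Base64-encoding each cell value so binary
    // data can be carried inside the document.
    public static String render(String row, Map<String, byte[]> cells) {
        StringBuilder sb = new StringBuilder();
        sb.append("<row><name>").append(row).append("</name>");
        for (Map.Entry<String, byte[]> e : cells.entrySet()) {
            sb.append("<column><name>").append(e.getKey()).append("</name>")
              .append("<value>")
              .append(Base64.getEncoder().encodeToString(e.getValue()))
              .append("</value></column>");
        }
        sb.append("</row>");
        return sb.toString();
    }
}
```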

          stack added a comment -

          Thanks for feedback Tom.

          401 rather than 404 was because I was watching TV at the same time as writing the issue (smile). Regarding 3. above, if I understand your question, there is no typing of hbase content – as far as hbase is concerned, it's all bytes – so returned cell data should default to mimetype binary/octet-stream. Later, we might add switching the resource format returned keyed off request headers.

          For cluster, table, and column descriptors, default text/plain UTF-8 or perhaps text/xml if specified in request headers.

          Thanks too for pointer to Leonard's book.

          Tom White added a comment -

          Generally looks good. A few comments:

          1. 401 is "Unauthorized", instead I would use 404 "Not Found" if a resource doesn't exist.
          2. The result of a successful PUT is 201 "Created".
          3. What do you think the representation of the resources will be?

          And in case you haven't seen it already, I found "RESTful Web Services" by Leonard Richardson and Sam Ruby (http://www.oreilly.com/catalog/9780596529260/) really useful.
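The corrections in 1. and 2. above boil down to a small status-code decision table; sketched as a helper (the class and method names are mine, for illustration only):

```java
public class RestStatus {
    public static final int OK = 200;
    public static final int CREATED = 201;
    public static final int NOT_FOUND = 404;

    // GET: 200 when the resource exists, 404 (not 401) when it does not.
    public static int forGet(boolean exists) {
        return exists ? OK : NOT_FOUND;
    }

    // PUT that creates a new resource should answer 201, not a bare 200.
    public static int forPut(boolean created) {
        return created ? CREATED : OK;
    }
}
```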


            People

            • Assignee: Unassigned
            • Reporter: stack
            • Votes: 2
            • Watchers: 3