Hadoop HDFS / HDFS-2316

[umbrella] WebHDFS: a complete FileSystem implementation for accessing HDFS over HTTP

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.1, 1.0.0
    • Component/s: webhdfs
    • Labels:
    • Release Note:
      Provide WebHDFS as a complete FileSystem implementation for accessing HDFS over HTTP. The previous hftp feature was a read-only FileSystem and did not provide "write" access.

      Description

      We currently have hftp for accessing HDFS over HTTP. However, hftp is a read-only FileSystem and does not provide "write" access.

      In HDFS-2284, we proposed WebHDFS as a complete FileSystem implementation for accessing HDFS over HTTP. This is the umbrella JIRA for the tasks.

      Attachments

      1. test-webhdfs-0.20s
        0.5 kB
        Tsz Wo Nicholas Sze
      2. test-webhdfs
        0.4 kB
        Tsz Wo Nicholas Sze
      3. WebHdfsAPI20111111.pdf
        211 kB
        Tsz Wo Nicholas Sze
      4. WebHdfsAPI20111103.pdf
        183 kB
        Tsz Wo Nicholas Sze
      5. WebHdfsAPI20111020.pdf
        141 kB
        Tsz Wo Nicholas Sze

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          Eli Collins added a comment -

          Why are you duplicating HDFS-2178? Hoop already provides a full read/write FileSystem interface to HDFS that goes over HTTP.
          Tsz Wo Nicholas Sze added a comment -

          Hi Eli, webhdfs is replacing hftp, while Hoop is replacing HDFS Proxy. For a more detailed discussion, please see HDFS-2284.
          Sanjay Radia added a comment -

          HDFS-2178 (Hoop) is HDFS Proxy using the HTTP protocol - a replacement for HDFS Proxy v2, but providing read/write access. It runs as separate daemons, typically on an array of servers sitting next to an HDFS cluster (like HDFS Proxy v2).

          HDFS-2316 is HTTP read/write access that replaces hftp but is built into the HDFS system, and it provides bandwidth scaling by redirecting from the NN to the datanode that contains the block. It will use SPNEGO and delegation tokens. It does not require a notion of trust. HDFS-2178 (Hoop) runs as separate daemons that are trusted by HDFS. Hoop can provide additional features like bandwidth management and user authentication mapping.

          There is an overlap, but a need for both.
          Eli Collins added a comment -

          I definitely agree these are distinct use cases; I'm just wondering if we need two separate FileSystem-over-HTTP implementations vs one client that may or may not use a proxy server (there's no reason HTTP or FileSystem clients need to care whether they're being proxied). Sounds like we were duplicating code w/o understanding what could be shared. You and Alejandro have looked at the specifics more than I have, so I trust your judgement.
          Sanjay Radia added a comment -

          One issue: we have to coexist with hftp, which uses prefixes like /data. Two proposals are on the table:

          1. (by Nicholas) use a different fixed prefix like /webhdfs/path, since it does not conflict with /data.
          2. (by Alejandro) always use an operation in the URL, so that one can send /data with an operation to webhdfs and /data without an operation to hftp.

          1) is simpler to implement, since URLs within the context of /webhdfs can be sent to the webhdfs servlet. 2) has a nicer URL, since the path is the pathname being referenced.
          Tsz Wo Nicholas Sze added a comment -

          2) may be confusing, since the same HTTP URL path represents two different file system paths. E.g.

          http://namenode:port/data/foo/bar?... means reading /foo/bar
          http://namenode:port/data/foo/bar?opGet=read&... means reading /data/foo/bar
          Alejandro Abdelnur added a comment -

          IMO, the nice thing about #2 is that the file path of hdfs: and http: URIs will be exactly the same, and in the case of using the NN/DN deployment of Hoop it will even be the same host.

          In addition, it is intuitive without any caveat: a given path will just work by replacing the SCHEME://HOST:PORT part of it.

          Finally, and IMO this is very important from the usability perspective, user applications that are designed to take the URI of the FS as a parameter and operate via hdfs: or http: will otherwise be difficult to code. Hadoop's Path(String parent, String child) uses URI.resolve(...), which uses a well-defined logic to resolve URIs based on other URIs [ http://download.oracle.com/javase/6/docs/api/java/net/URI.html#resolve(java.net.URI) ]. If we use a prefix for HTTP URIs, then it will become difficult and error-prone to compose hdfs: URIs from http: URIs and vice versa. (And I believe the same is true for libraries in other languages.)

          Also, I have not seen HDFS files under /data as a common practice, thus the name collision won't be that common.
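Alejandro's URI-composition concern can be illustrated with a short Python sketch (the hosts, ports, and file paths below are made up for illustration): with the operation-in-query-string proposal, converting between the hdfs: and http: forms is a pure scheme/authority swap, while a /webhdfs prefix forces every client to add or strip the prefix by hand.

```python
from urllib.parse import urlsplit, urlunsplit

def swap_authority(uri, scheme, netloc):
    """Rebuild a URI with a new scheme/authority, keeping path and query."""
    parts = urlsplit(uri)
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))

hdfs_uri = "hdfs://namenode:8020/user/alice/file.txt"

# Proposal 2 (operation in the query string): paths are identical, so
# converting between the schemes is a mechanical authority swap.
http_uri = swap_authority(hdfs_uri, "http", "namenode:50070")
assert http_uri == "http://namenode:50070/user/alice/file.txt"

# Proposal 1 (/webhdfs prefix): the prefix must be spliced in manually,
# which is the error-prone step Alejandro is worried about.
parts = urlsplit(hdfs_uri)
prefixed = urlunsplit(("http", "namenode:50070", "/webhdfs" + parts.path, "", ""))
assert prefixed == "http://namenode:50070/webhdfs/user/alice/file.txt"
```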
          Todd Lipcon added a comment -

          Maybe I'm not following completely: is the idea that webhdfs would run on the same port as the NN web UI?

          It seems crazy to me that http://nn:50030/jmx would be the JMX servlet but http://nn:50030/jmx?opGet=read would be a file at path /jmx... hopefully I'm misunderstanding.
          Tsz Wo Nicholas Sze added a comment -

          @Todd, we are using the same port. If we implement (2), you are right that we will have such a problem.
          Todd Lipcon added a comment -

          Yea, that seems crazy. I'm for option #1 or for opening yet-another-port.
          Sanjay Radia added a comment -

          Operations teams like to minimize the number of ports, so we would use the same port.
          Sanjay Radia added a comment -

          Like hftp, operations are processed at the NN if they involve no data transfer. If the operation involves data transfer (read or write), then the request is redirected to the DN. This allows bandwidth scaling and load distribution.

          Alejandro has pointed out that when you redirect a PUT or POST operation, the initial part of the payload has already been sent to the NN. I believe this is true. Hence for writes we could consider a two-request mode - get the write handle using a GET, and then do a PUT or POST to the datanode.
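The redirect-based write Sanjay describes can be sketched as a two-request client against a stub server (a toy illustration only, not the protocol under discussion; the /webhdfs and /datanode paths here are invented): the first request carries no payload and only learns the datanode location, and the body is transmitted solely on the second request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubHandler(BaseHTTPRequestHandler):
    """Plays both roles: '/webhdfs/...' acts as the NN, '/datanode/...' as the DN."""

    def do_PUT(self):
        if self.path.startswith("/webhdfs/"):
            # "NN" side: consumes no payload; just points at a "DN" location.
            self.send_response(307)
            self.send_header("Location", "/datanode" + self.path[len("/webhdfs"):])
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            # "DN" side: the payload is consumed only on this second request.
            length = int(self.headers.get("Content-Length", 0))
            self.server.received = self.rfile.read(length)
            self.send_response(201)
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), StubHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
# Request 1: a zero-byte PUT to the "NN"; we only learn where to send the data.
conn.request("PUT", "/webhdfs/user/alice/f.txt")
resp = conn.getresponse()
redirect = resp.getheader("Location")
resp.read()
assert resp.status == 307

# Request 2: send the payload to the redirected location only.
conn.request("PUT", redirect, body=b"hello")
resp2 = conn.getresponse()
resp2.read()
server.shutdown()
print(resp2.status, server.received)  # prints: 201 b'hello'
```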
          Alejandro Abdelnur added a comment -

          @Todd, well, the problem here is that we are overloading the use of a port for a functionality that requires the whole domain of the namespace (in HDFS we don't have special dirs like /dev). I'd be OK with a different port then.

          @Sanjay, the indirection for writes would mean that we are moving away from the REST protocol, which is a well-understood and known way of interacting with resources. I think there is big value in having a full REST API like Hoop has today.
          Alejandro Abdelnur added a comment -

          If we use a webhdfs://HOST:PORT from the FS client impl, and internally we replace it with http://HOST:PORT/webhdfs, I could live with that. After all (as Todd pointed out offline) it is not fully REST.

          But the write/append handle thingy - any other option?
          Tsz Wo Nicholas Sze added a comment -

          > But the write/append handle thingy - any other option?

          We probably should use the "Expect: 100-continue" HTTP header. See 8.2.3 Use of the 100 (Continue) Status in http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.2
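The "Expect: 100-continue" handshake Nicholas refers to can be sketched at the socket level (a toy demo; real clients such as curl and Apache HttpClient implement this for you): the client sends the headers first, waits for the interim "100 Continue" status, and only then transmits the body, so a server that wants to redirect or reject can do so before any payload crosses the wire.

```python
import socket
import threading

def tiny_server(srv, result):
    """Accept one request; send '100 Continue' before reading the payload."""
    conn, _ = srv.accept()
    data = b""
    while b"\r\n\r\n" not in data:            # read the request headers only
        data += conn.recv(1024)
    if b"expect: 100-continue" in data.lower():
        conn.sendall(b"HTTP/1.1 100 Continue\r\n\r\n")  # invite the body
    result["body"] = conn.recv(1024)          # the payload arrives only now
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
    conn.close()

result = {}
srv = socket.create_server(("127.0.0.1", 0))
threading.Thread(target=tiny_server, args=(srv, result), daemon=True).start()

cli = socket.create_connection(srv.getsockname())
# Phase 1: headers only - declare the body size but withhold the body.
cli.sendall(b"PUT /f.txt HTTP/1.1\r\nHost: x\r\n"
            b"Content-Length: 5\r\nExpect: 100-continue\r\n\r\n")
interim = cli.recv(1024)
assert interim.startswith(b"HTTP/1.1 100")
# Phase 2: the server agreed, so send the payload (on a redirect, the
# client could instead abandon the request without wasting bandwidth).
cli.sendall(b"hello")
final = cli.recv(1024)
cli.close()
assert final.startswith(b"HTTP/1.1 200")
```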
          Sanjay Radia added a comment -

          > If we use a webhdfs://HOST:PORT from the FS client impl, and internally we replace it with
          > http://HOST:PORT/webhdfs, I could live with that. After all (as Todd pointed out offline) it is not fully REST.

          That is exactly what Nicholas has implemented, i.e. with curl you use the URL http://host:port/webhdfs.

          hdfs://host:port/path and webhdfs://hostx:portx/path are consistent wrt the path.
          While it would be nice to be consistent for http, we cannot be, because of other services using the same port.
          So I think we have an agreement on this.
          Alejandro Abdelnur added a comment -

          Would it be OK if this prefix were optional/configurable, as well as the capability of running the HTTP HDFS access in the DN on a different port (and by default it is the /webhdfs prefix on the same port as the rest of the HTTP services)?
          Sanjay Radia added a comment -

          Once we publish the spec, folks will start building tools around it. These tools will not work on clusters with a different configuration. Further, we already have too many configuration knobs.
          Sanjay Radia added a comment -

          Differences between the patch and Hoop:

          • Hoop uses PUT for append and POST for create and mkdirs. Nicholas uses PUT for all three operations.
          • Parameter names:
            • Hoop uses op for all operations, while Nicholas uses getOp, putOp, etc.
            • Hoop uses "data" for reading a file, while Nicholas uses "open".
            • Hoop uses setowner, while Nicholas uses SET_OWNER.
          • Default values: Nicholas follows the default values used in HDFS, but Hoop does not.
          • Permissions: Nicholas uses octal, but Hoop uses -rwxrwxrwx.
          Alejandro Abdelnur added a comment -

          @Sanjay:

          • As agreed with Nicholas, create/mkdir should be PUT (idempotent), and append should be POST (not idempotent).
          • Parameter names:
            • Using 'getOp', 'putOp' and 'postOp' is redundant, as the HTTP method is GET/PUT/POST already.
            • Using 'op=open' for reading a file makes sense.
            • (generalizing on the previous bullet item) we should use the name of the method (case-insensitive) for the operation names, thus it should be 'setowner'.
          • Default values: Hoop uses the JAR default values; it could be modified to use the defaults of hdfs-site.xml in the Hoop config dir.
          • Permissions: we should support both octal and symbolic.

          Still open is how to handle create/append operations.
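The idempotency rule Alejandro describes can be captured in a small dispatch table (a sketch only; the operation names are the ones being debated in this thread, not a final spec): idempotent operations map to PUT, so a client can safely retry them, while append, which is not idempotent, maps to POST.

```python
# Hypothetical operation -> HTTP method table following the
# PUT-for-idempotent / POST-for-non-idempotent rule discussed above.
OP_METHODS = {
    "create":        "PUT",   # retrying an overwrite-create yields the same file
    "mkdirs":        "PUT",   # making an existing directory is a no-op
    "setowner":      "PUT",
    "append":        "POST",  # retrying an append would duplicate the payload
    "open":          "GET",
    "getfilestatus": "GET",
    "delete":        "DELETE",
}

def method_for(op):
    """Resolve the HTTP method for an operation name, case-insensitively."""
    return OP_METHODS[op.lower()]

assert method_for("MKDIRS") == "PUT"
assert method_for("append") == "POST"
```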
          Tsz Wo Nicholas Sze added a comment -

          > Still is open how to handle create/append operations.

          Have you seen my comment?
          Tsz Wo Nicholas Sze added a comment -

          > Using 'getOp', 'putOp' and 'postOp' is redundant, as the HTTP method is GET/PUT/POST already

          But it will make the operation clear in the URL and show up in the log messages. I think it will help other developers.

          > Using 'op=open' for reading a file makes sense
          > (generalizing on the previous bullet item) we should use the name of the method (case-insensitive) for the operation names,
          > thus it should be 'setowner'

          Sounds good.

          > Permissions: we should support both octal and symbolic.

          Octal is concise. Do we really need symbolic?
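Supporting both permission syntaxes is a small amount of code; a sketch of a converter between the symbolic form Hoop uses and the octal form webhdfs uses (plain rwx triads only - no setuid/sticky handling - and the leading file-type dash of '-rwxrwxrwx' is accepted and ignored):

```python
def symbolic_to_octal(sym):
    """Convert 'rwxr-xr-x' (optionally with a leading type column, as in
    '-rwxrwxrwx') to an octal permission string such as '755'."""
    if len(sym) == 10:      # drop the file-type column, e.g. the '-' in '-rwxrwxrwx'
        sym = sym[1:]
    assert len(sym) == 9, "expected three rwx triads"
    digits = ""
    for triad in (sym[0:3], sym[3:6], sym[6:9]):
        value = 0
        for ch, bit in zip(triad, (4, 2, 1)):   # r=4, w=2, x=1
            if ch != "-":
                value += bit
        digits += str(value)
    return digits

def octal_to_symbolic(octal):
    """Inverse conversion: '644' -> 'rw-r--r--'."""
    out = ""
    for digit in octal:
        value = int(digit, 8)
        out += ("r" if value & 4 else "-") \
             + ("w" if value & 2 else "-") \
             + ("x" if value & 1 else "-")
    return out

assert symbolic_to_octal("-rwxrwxrwx") == "777"
assert symbolic_to_octal("rw-r--r--") == "644"
assert octal_to_symbolic("755") == "rwxr-xr-x"
```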
          Tsz Wo Nicholas Sze added a comment -

          I am revising my patch. Do you think that GETFILESTATUS is hard to parse? Is GET_FILE_STATUS better?
          Alejandro Abdelnur added a comment -

          'op=' parameter:

          Hoop audit logs write the HTTP method, thus you have POST URL.

          IMO it should be GETFILESTATUS (the value of the param is case-insensitive) and it should match the FileSystem method name.
          Alejandro Abdelnur added a comment -

          I've just uploaded to HDFS-2178 a PDF with the proposed HTTP API.
          Sanjay Radia added a comment -

          Alejandro has raised the issue of 100-continue not working with some HTTP client libraries.
          curl supports it, and HttpClient from Apache seems to support it. If there is at least one Java library that supports it, then it seems an unnecessary API complication to split the create into two APIs.
          Sanjay Radia added a comment -

          BTW, Amazon Web Services encourages the use of 100-continue for PUT and POST.
          http://docs.amazonwebservices.com/AmazonS3/latest/API/
          Tsz Wo Nicholas Sze added a comment -

          Here is the WebHdfs API.
          Tsz Wo Nicholas Sze added a comment -

          > Here is the WebHdfs API.

          I mean the attached file WebHdfsAPI20111020.pdf.
          Alejandro Abdelnur added a comment -

          @Nicholas,

          Thanks for the API document. In general it looks OK. A few comments:

          • The GET GETHOMEDIRECTORY operation is missing.
          • The GETFILEBLOCKLOCATIONS, GETDELEGATIONTOKEN, RENEWDELEGATIONTOKEN, CANCELDELEGATIONTOKEN operations seem to be the ones that don't make sense (at the moment) in a proxy scenario. We should make those operations optional.
          • The 'doas' query parameter is missing; this is required to enable proxyuser functionality.
          • The 'user.name' query parameter is optional, as it is used only in the case of pseudo authentication; in the case of other authentication mechanisms the username will be taken from the authentication credentials.
          • The document does not define any of the JSON responses, nor error codes and JSON error messages. I assume you are taking the JSON responses in the doc posted in HDFS-2178. Still, this has to be augmented for checksum and content-summary responses.
          • IMO we need operations to get create and append handles; the reason is in my response to Sanjay in HDFS-2178 ( https://issues.apache.org/jira/browse/HDFS-2178?focusedCommentId=13132691&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13132691 ).
          • The webhdfs prefix should be optional/configurable, and it should be provided by the server on a 'filesystem.get' operation.
          Nathan Roberts added a comment -

          Hi Nicholas, some quick comments from a first read:

          • "<namenode>:<port>" and "http://<host>:<port>" seem to be used interchangeably. We should be consistent where possible.
          • Why doesn't curl -i -L "http://<host>:<port>/webhdfs/<path>" just work? Do we really need to specify op=OPEN for this very simple, common case?
          • I believe "http://<datanode>:<path>" should be "http://<datanode>:<port>" in append.
          • The format of responses needs to be spelled out.
          • It would be nice if we could document the possible error responses as well.
          • Since a single datanode will be performing the write of a potentially large file, does that mean that file will have an entire copy on that node (due to block placement strategies)? That doesn't seem desirable.
          • Is a SHORT sufficient for buffersize?
          • Do we need a renewlease? How will very slow writers be handled?
          • Once I have file block locations, can I go directly to those datanodes to retrieve, rather than using content_range and always following a redirect?
          • Do we need flush/sync?
          Sanjay Radia added a comment -

          Alejandro raised the following 4 issues for discussion:

          1. rename - should the target path contain the /webhdfs prefix, since a client app will want to simply take the target path and use it as part of a read operation?
          2. should getStatus return the paths with the /webhdfs prefix?
          3. Why is the scheme of the webhdfs file system "webhdfs:" and not "http:"?
          4. case sensitivity - make the parameters lower case rather than have the filter convert them, since the pathname and the user name should not be converted.

          My initial thoughts are:

          • for 1 and 2: the rename target path and the paths in the filestatus should NOT contain /webhdfs, since /webhdfs is not really part of the parameters but a part of the REST API's "headers".
          • for scheme - I don't think we should use a preexisting scheme name. The NFS community has used webnfs as the scheme for accessing NFS over HTTP.
          • I am fine with lower-case parameters.
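Sanjay's point that /webhdfs belongs to the REST "envelope" rather than to the data can be illustrated with a small sketch (the JSON shape below is invented for illustration, not the spec under discussion): the server strips its prefix when parsing a request path, and never echoes it back in response payloads such as rename targets or file statuses.

```python
PREFIX = "/webhdfs"

def request_path_to_fs_path(url_path):
    """Map an incoming URL path to the HDFS path it names."""
    assert url_path.startswith(PREFIX + "/"), "not a webhdfs request path"
    return url_path[len(PREFIX):]

def file_status(fs_path):
    """Response payloads carry bare FS paths - no /webhdfs envelope."""
    return {"path": fs_path, "type": "FILE"}

fs_path = request_path_to_fs_path("/webhdfs/user/alice/f.txt")
assert fs_path == "/user/alice/f.txt"
assert file_status(fs_path)["path"] == "/user/alice/f.txt"
```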
          Nathan Roberts added a comment -

          Is there a mechanism for versioning this API? It seems like we should probably have one, e.g. /webhdfs/1/ or /webhdfs/v1/.
          Alejandro Abdelnur added a comment -

          Thanks Sanjay. A couple of follow-up issues in the current API:

          • Permission masks are currently octal in webhdfs and symbolic in Hoop. IMO, it would make sense to support both.
          • File ranges: webhdfs uses the HTTP 'content-ranges' header; Hoop uses 2 query-string params, offset= & len=. In webhdfs, except for this type of request, for all other requests the URL itself fully describes what is being requested. Because webhdfs uses the HTTP 'content-ranges' header, a URL is not sufficient to specify the desired range. With the Hoop approach a URL self-contains the desired range, making it easier to use from libraries and scripts.
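The two range styles carry the same information, as a small conversion sketch shows (the offset/len parameter names follow Hoop as described above; the header form is the standard HTTP byte-range syntax, which is inclusive at both ends):

```python
def range_header_to_params(header):
    """Convert an HTTP byte-range such as 'bytes=100-199' into the
    offset/len query-string pair that Hoop-style URLs embed directly."""
    unit, _, spec = header.partition("=")
    assert unit == "bytes" and "-" in spec, "unsupported range syntax"
    start, end = spec.split("-", 1)
    offset = int(start)
    length = int(end) - offset + 1   # HTTP ranges include both endpoints
    return f"offset={offset}&len={length}"

def params_to_range_header(offset, length):
    """Inverse: offset/len -> an inclusive HTTP byte-range header value."""
    return f"bytes={offset}-{offset + length - 1}"

assert range_header_to_params("bytes=100-199") == "offset=100&len=100"
assert params_to_range_header(100, 100) == "bytes=100-199"
```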
          Hide
          Thejas M Nair added a comment -

          for scheme - i don't think we should use a prexisting scheme name. Nfs community has used webnfs as the scheme for accessing nfs over http.

          The plan is to support calls over HTTP, so I think it is better to keep that clear for the users. Are there any plans of supporting non http operations ? If not, I don't see any benefit of having a 'webhdfs' scheme.

          Milind Bhandarkar added a comment -

          @Thejas, webhdfs:// is the scheme recognized by FileSystem.get in Hadoop. (Same thing as hftp://, which uses http protocol, but hftp is the file system impl.)

          Milind Bhandarkar added a comment -

          Guys, is there a documentation for webhdfs APIs that I can read somewhere ? (A good advice for producing human readable documentation for webservices can be found here: http://answers.oreilly.com/topic/1390-how-to-document-restful-web-services/).

          +1 to Nathan's suggestion for versioning the API.

          +1 to Alejandro's suggestion for embedding byte-ranges in the URL itself.

          Tsz Wo Nicholas Sze added a comment -

          Thanks everyone for looking at the webhdfs API. I will update the doc accordingly. Sorry the json types and error responses are missing. Below are some quick responses:

          > 4. case sensitivity - make the parameters lower case rather than have the filter convert them since pathname and the user name should not be converted.

          Only the parameter names are case insensitive. The parameter values are case sensitive except for op values and boolean values.

          > Permission masks are currently octal in webhdfs and symbolic in hoop. IMO, it should make sense to support both.

          I thought about adding chmod-style symbolic permissions. However, it does not make sense in webhdfs SETPERMISSION since it only sets absolute permissions (e.g. "u=rwx,go=rx", which equals 755). Relative permissions (e.g. "go-w") won't work. Hoop uses ls output style for setting permission (e.g. rwxr-xr-x). This seems uncommon.

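To make the distinction concrete, here is a small illustrative sketch (names and helper are mine, not part of webhdfs) that converts an absolute symbolic permission such as "u=rwx,go=rx" into the octal form webhdfs accepts, and rejects relative clauses like "go-w", which SETPERMISSION cannot express:

```python
# Hypothetical helper: absolute symbolic permission -> octal string.
# Relative clauses (e.g. "go-w") are rejected, since SETPERMISSION
# only sets absolute permissions.

BITS = {"r": 4, "w": 2, "x": 1}
WHO = {"u": 0, "g": 1, "o": 2}

def symbolic_to_octal(spec):
    digits = [0, 0, 0]
    for clause in spec.split(","):
        who, sep, perms = clause.partition("=")
        if sep != "=":
            raise ValueError("only absolute clauses (with '=') are supported: %r" % clause)
        # "a" or an empty who-part means all of user/group/other
        targets = "ugo" if who in ("", "a") else who
        value = sum(BITS[p] for p in perms)
        for w in targets:
            digits[WHO[w]] = value
    return "%d%d%d" % tuple(digits)
```

For example, `symbolic_to_octal("u=rwx,go=rx")` yields `"755"`, matching the equivalence mentioned above.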
          > File ranges, webhdfs uses HTTP 'content-ranges' header, ...

          I had implemented OPEN with offset and length parameters but decided to change it since it was not following the http spec. Won't it cause problems if webhdfs does not follow the http spec?

          Alejandro Abdelnur added a comment -

          Regarding case sensitivity, given that HDFS URIs are case sensitive, being case insensitive in the query string would be confusing to users. Furthermore, being case insensitive in only part of the query string would be even more confusing. I'd propose we be case sensitive, and that all params and values we define be lowercase.

          Regarding permissions, you are correct, Hoop uses ls output style, which is absolute. While uncommon, I find it more intuitive. I'll make sure Hoop handles octal as well.

          Regarding file ranges, it is a matter of convenience for users. I don't think it will cause problems because there are no libraries that handle webhdfs/hoop URL operations (and use HTTP ranges); users will have to code the construction of these URLs and then will use what we provide. Again, I see big value if ALL webhdfs/hoop operations are 100% self-contained in the URL.

          Tsz Wo Nicholas Sze added a comment -

          It seems that the Http spec does not force us to use Range header; see 14.35.2 Range Retrieval Requests

          A server MAY ignore the Range header. However, HTTP/1.1 origin servers and intermediate caches ought to support byte ranges when possible, since Range supports efficient recovery from partially failed transfers, and supports efficient partial retrieval of large entities.

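The two styles under debate are mechanically interchangeable. A small illustrative sketch (function name is mine) translating a simple `Range: bytes=<first>-<last>` header into the offset/length pair a query-string-based API would take:

```python
import re

# Hypothetical translation between the two range styles discussed here:
# an HTTP Range header "bytes=<first>-<last>" versus offset/length
# query parameters. Only the simple single-range form is handled.

def range_to_offset_length(range_header):
    m = re.fullmatch(r"bytes=(\d+)-(\d+)", range_header.strip())
    if m is None:
        raise ValueError("unsupported Range header: %r" % range_header)
    first, last = int(m.group(1)), int(m.group(2))
    if last < first:
        raise ValueError("last byte position precedes first")
    return first, last - first + 1  # (offset, length)
```

For instance, `bytes=0-1023` corresponds to offset 0 and length 1024.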
          Sanjay Radia added a comment -

          > ... embedding byte-ranges in the URL itself.
          This was the implementation a few days ago. It was changed to use the content range header - fairly standard, and it will likely allow other tools to work seamlessly.

          Sanjay Radia added a comment -

          Versioning: We were going with a previous suggestion to add a version parameter when we go to the next version.

          Alejandro Abdelnur added a comment -
          • Regarding ranges: it is more about ease of use and full description of the resource fragment being fetched in the URL.
          • Regarding versioning: IMO it seems cleaner to do that at the prefix level. Given a prefix, a user will know what version of the API the server side supports. Also, from the implementation perspective, using a prefix in JAX-RS instead of a parameter allows us to easily have different driver classes, thus providing a clean separation & coexistence of implementations in the same server.
          Alejandro Abdelnur added a comment -

          Any follow up in the open issues here?

          1. versioning
          2. ranges
          3. params case sensitivity
          4. permissions format
          5. protocol scheme to use
          6. how to get the create/append handle (https://issues.apache.org/jira/browse/HDFS-2178?focusedCommentId=13133189&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13133189)

          Also, an updated version of the docs with returned json payloads definitions and errors codes is still pending.

          Thanks.

          Tsz Wo Nicholas Sze added a comment -

          > 1. versioning

          Added v1 to the prefix, i.e. the url is http://nn:port/webhdfs/v1/path/to/file?op=...

          > 2. ranges

          Since it is optional in the http spec, let's use offset and length query parameters.

          > 3. params case sensitivity

          Won't case insensitive parameter names be more user friendly?

          > 4. permissions format

          Let's use octal. Handling sticky bit with ls output style is tricky.

          > 5. protocol scheme to use

          Do you mean the FileSystem scheme? Webhdfs is using "webhdfs://". I think "http://" is not suitable to be overloaded as a FileSystem scheme.

          > 6. how to get the create/append handle ...

          Commented on HDFS-2178.

          Sorry that I am still updating the doc.

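Putting the decisions above together (the /webhdfs/v1 prefix, the op parameter, and offset/length as query parameters), a client-side URL builder could look like the following sketch. The host, port, path and helper name are placeholders of mine, not part of the spec:

```python
from urllib.parse import urlencode

# Illustrative sketch: build a request URL against the versioned
# /webhdfs/v1 prefix, with op and any extra query parameters.

def webhdfs_url(host, port, path, op, **params):
    # op first, remaining parameters in a deterministic (sorted) order
    query = urlencode([("op", op)] + sorted(params.items()))
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, path, query)

# Hypothetical OPEN request reading 4096 bytes at offset 1024
url = webhdfs_url("nn.example.com", 50070, "/user/alice/data.txt",
                  "OPEN", offset=1024, length=4096)
```

This produces a URL of the agreed shape, e.g. `http://nn.example.com:50070/webhdfs/v1/user/alice/data.txt?op=OPEN&length=4096&offset=1024`.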
          Alejandro Abdelnur added a comment -

          On #3, it will be confusing to users that part of the URL is case sensitive (the path) and part is not (the query string). Given that HDFS is case sensitive, making the path case insensitive is not an option. Thus, I'm suggesting to make it all case sensitive and there will be no confusion there.

          On #5, it is not overloading the scheme; we are doing "http://", which is why I say we should use "http://". When using curl you'll use http://

          Alejandro Abdelnur added a comment -

          Current open issues:

          • 1. case insensitivity or lowercase of the query string
          • 2. rename 'destination' param / status responses: whether or not to include the prefix (webhdfs/v1)
          • 3. scheme to use, webhdfs:// or http://
          • 4. proxy user support via 'doas=' query string parameter
          • 5. API spec, JSON response payloads and response error codes

          For #1 and #2 Sanjay suggested 'lowercase' and 'do not include'. Are we OK with that?

          Alejandro Abdelnur added a comment -

          I've been running tests to validate HTTP REST API compatibility between webhdfs and hoop. Following the issues I've found.

          • FileStatus JSON payload has elements that are not part of the FileStatus interface. The WebhdfsFileSystem client expects those elements and fails if they are not present. These elements are: localName, isSymlink, symlink. These elements are not later used and they are lost when creating a FileStatus in WebhdfsFileSystem. Either those elements should not be in JSON payload (my preference) or they should not be required by the WebhdfsFileSystem.
          • delete, rename, mkdirs, setReplication JSON responses use 'boolean' as element name, they should use the operation name as it is more descriptive.
          • FileChecksum JSON serialization is using the classname in the JSON payload, it should not, it should be something like:
          {
            "algorithm" : "foo",
            "bytes" : "hexabytes",
            "length" : 1000
          }
          
          • Same as FileChecksum JSON for ContentSummary JSON.
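A client consuming the flat shape proposed above needs no knowledge of server-side class names. A minimal sketch, with made-up field values, assuming the payload shape suggested in this comment:

```python
import json

# Parse a checksum payload in the proposed flat shape (no classname
# root element). The algorithm/bytes/length values below are invented
# for illustration only.

payload = '{"algorithm": "MD5-of-0MD5-of-512CRC32", "bytes": "0afc09ab", "length": 28}'
checksum = json.loads(payload)
```

The fields are then directly addressable, e.g. `checksum["algorithm"]` and `checksum["length"]`.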
          Alejandro Abdelnur added a comment -

          Another thing I've noticed is that WebhdfsFileSystem is still using 100-Continue logic

          Alejandro Abdelnur added a comment -
          • FileStatus also seems to be using the classname (HdfsFileStatus) in the JSON payload.
          Tsz Wo Nicholas Sze added a comment -

          @Alejandro,

          > GET GETHOMEDIRECTORY operation is missing.

          Do we really need it? DistributedFileSystem implements it by simply creating a path locally.

          > The GETFILEBLOCKLOCATIONS, GETDELEGATIONTOKEN, RENEWDELEGATIONTOKEN, CANCELDELEGATIONTOKEN operations seem to be the ones that don't make sense (at the moment) in a proxy scenario. We should make those operations as optional.

          I agree that GETFILEBLOCKLOCATIONS should be a private API. Let's rename it to GET_BLOCK_LOCATIONS.

          For the delegation token ops, they seem to make sense in the proxy scenario. E.g. Oozie needs them with a proxy.

          > The 'doas' query parameter is missing, this is required to enable proxyuser functionality.

          In RPC, we use "realUser" for the user submitting the call and "user" for the effective user. E.g. if Oozie performs an operation as "nicholas", then realUser is "oozie" and user is "nicholas". How about we have something similar, say real.user and user.name?

          > The 'user.name' query parameter is optional as this is used only in the case of pseudo authentication, in the case of other authentication mechanism the username will be taken for the authentication credentials.

          Agree.

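The proxy-user proposal above amounts to one extra query parameter. A sketch, using the parameter names under discussion (the final names were still being settled at this point, so treat them as assumptions):

```python
from urllib.parse import urlencode

# Hypothetical proxy-user query string: "oozie" is the authenticated
# caller, acting on behalf of "nicholas". Parameter names mirror the
# discussion above and are not final.

params = {"op": "GETFILESTATUS", "user.name": "oozie", "doas": "nicholas"}
query = urlencode(sorted(params.items()))
```

The resulting query string carries both identities, so the server can enforce its proxy-user policy before acting as the effective user.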
          Tsz Wo Nicholas Sze added a comment -

          @Nathan

          > "<namenode>:<port>" and "http://<host>:<port>" seem to be used interchangeably. We should be consistent where possible.

          You are right. I should use <host>:<port> only.

          > Why doesn't "curl -i -L "http://<host>:<port>/webhdfs/<path>" just work? Do we really need to specify op=OPEN for this very simple, common case?

          The op parameter does not have a default value. I think it may be confusing if we have a default - if we forget to add the op parameter, it becomes a totally different operation.

          > I believe "http://<datanode>:<path>" should be "http://<datanode>:<port>" in append.

          Good catch!

          > Need format of responses spelled out.
          > It would be nice if we could document the possible error responses as well.

          Will post an updated doc with JSON responses and error responses soon.

          > Since a single datanode will be performing the write of a potentially large file, does that mean that file will have an entire copy on that node (due to block placement strategies)? That doesn't seem desirable..

          It is probably the case. We may change the block placement strategies as an improvement later on.

          > Is a SHORT sufficient for buffersize?

          It should be INT.

          > Do we need a renewlease? How will very slow writers be handled?

          A slow writer sends data to one of the datanodes using HTTP. That datanode uses a DFSClient to write the data. The DFSClient is going to renew the lease for the writer.

          > Once I have file block locations, can I go directly to those datanodes to retrieve rather than using content_range and always following a redirect?

          Yes. Clients could get block locations, construct the URLs themselves and then talk to the datanodes directly. We should have an API to support this. E.g. it would be better for GETFILEBLOCKLOCATIONS to return a list of URLs directly.

          GETFILEBLOCKLOCATIONS returns a LocatedBlocks structure which is not easy to use. I am changing GETFILEBLOCKLOCATIONS to GET_BLOCK_LOCATIONS, a private API.

          > Do we need flush/sync?

          Since the client is using HTTP, there is no way for them to call hflush. Let's leave this as a future improvement.

          Alejandro Abdelnur added a comment -

          @Nicholas,

          • gethomedir is required as you don't know where the FS impl creates the home dir.
          • The doAs parameter name was chosen to mimic the Java API; real.user is kind of confusing - do you mean the proxy user or the doAs user? Plus, when doing Kerberos you don't use user.name. IMO doAs is easier and less confusing.
          • The GET_BLOCK_LOCATIONS private API, how are you differentiating private and public APIs?
          • Oozie could use delegation token or it could use doAs.
          Tsz Wo Nicholas Sze added a comment -

          @Alejandro

          > On #3, it will be confusing to users that part of the URL is case sensitive (the path) and part is not (the query string). Given that HDFS is case sensitive, making the path case insensitive is not an option. Thus, I'm suggesting to make it all case sensitive and there will be no confusion there.

          We cannot make it all case sensitive, e.g. scheme and authority are case insensitive. For examples,

          Show
          Tsz Wo Nicholas Sze added a comment - @Alejandro > On #3, it will be confusing to users that part of the URL is case sensitive (the path) and part it is not (the query-string). Given that HDFS is case sensitive, making the path case insensitive is not an option. Thus, I'm suggesting to make it all case sensitive and there will be no confusion there. We cannot make it all case sensitive, e.g. scheme and authority are case insensitive. For examples, hTTps://issues.apache.org/jira/browse/HDFS-2316 - works https://iSSues.apache.org/jira/browse/HDFS-2316 - works https://issues.apache.org/jira/BRowse/HDFS-2316 - does not works
          Show
          Tsz Wo Nicholas Sze added a comment - https://issues.apache.org/jira/browse/hdFS-2316 - also works
          Hide
          Alejandro Abdelnur added a comment -

          scheme & authority are case insensitive by definition. This is well known and expected.

          However, path & query string are not. Regarding your last example, that is JIRA functionality. And it illustrates my point: the fact that 'browse' is case sensitive and 'hdfs' is not will be confusing.

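Both sides agree on the baseline here: scheme and authority are case insensitive by definition, while the path is not. Python's standard URL parser illustrates this division of the URL, normalizing the scheme and hostname but preserving the path as given:

```python
from urllib.parse import urlsplit

# urlsplit lowercases the scheme, and the hostname accessor lowercases
# the host, reflecting that those components are case insensitive by
# definition. The path is left exactly as written.

parts = urlsplit("hTTps://iSSues.apache.org/jira/BRowse/HDFS-2316")
```

Here `parts.scheme` is `"https"` and `parts.hostname` is `"issues.apache.org"`, but `parts.path` remains `"/jira/BRowse/HDFS-2316"`.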
          Alejandro Abdelnur added a comment -

          Any update on the open issues? I'd like to get them taken care of so I can finalize HDFS-2178 accordingly.

          Tsz Wo Nicholas Sze added a comment -

          > gethomedir is required as you don't know where the FS impl creates the home dir.

          Okay, I can add it to webhdfs.

          > The doAs parameter name was chosen to mimic the Java API, real.user is kind of confusing, do you mean the proxy user or the doAs user. Plus when doing Kerberos you don't use user.name. IMO doAs is a easier not to get confused.

          I am fine with using doAs instead of realUser.

          BTW, hadoop-auth uses "user.name". The dot in the middle is different from other naming conventions. How about changing it to username?

          > The GET_BLOCK_LOCATIONS private API, how are you differentiating private and public APIs?

          We will document it and state that it is a private unstable API.

          Tsz Wo Nicholas Sze added a comment -

          > FileStatus JSON payload has elements that are not part of the FileStatus interface. The WebhdfsFileSystem client expects those elements and fails if they are not present. These elements are: localName, isSymlink, symlink. These elements are not later used and they are lost when creating a FileStatus in WebhdfsFileSystem. Either those elements should not be in JSON payload (my preference) or they should not be required by the WebhdfsFileSystem.

          localName is for reducing the response size. It does not include the path prefix. Otherwise, the same path prefix would have to be sent for each status, which becomes a problem if the number of statuses is huge.

          symlink is in 0.23. It is a bug that it is not used.

          > delete, rename, mkdirs, setReplication JSON responses use 'boolean' as element name, they should use the operation name as it is more descriptive.

          Similar to other responses, they need a root element. The key of the root element is the type/class. Then, the client can determine how to parse the JSON object by checking the key.

          > FileChecksum JSON serialization is using the classname in the JSON payload, it should not, ...

          The classname is the root element. It is required by other format such as xml.

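The client-side pattern Nicholas describes, picking a parser by inspecting the root element's key, can be sketched as follows. The payload shapes here are illustrative, not the exact webhdfs wire format:

```python
import json

# Sketch: the key of the single root element names the type, so the
# client dispatches on it to decide how to parse the value. Payloads
# below are simplified examples, not the actual webhdfs responses.

PARSERS = {
    "boolean": lambda v: v,
    "FileStatuses": lambda v: v["FileStatus"],
}

def parse_response(text):
    obj = json.loads(text)
    (key, value), = obj.items()  # exactly one root element expected
    return PARSERS[key](value)

result = parse_response('{"boolean": true}')
```

The same dispatch table extends naturally as more response types are defined, at the cost of the extra wrapping object that Alejandro objects to below.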
          Alejandro Abdelnur added a comment -
          • The user.name parameter would have to be changed in hadoop-auth; it is independent of webhdfs/hoop.

          Let me recap all outstanding issues in a follow up comment.

          Tsz Wo Nicholas Sze added a comment -

          WebHdfsAPI20111103.pdf: added JSON and error responses.

          Note that GETHOMEDIRECTORY and doAs are not included yet.

          Alejandro Abdelnur added a comment -

          Thanks for the updated PDF with the API, looks good.

          Following are the remaining issues:

          1. Regarding FileStatus containing symlink & isSymlink elements. Got it, they should. It would be enough to have symlink as an optional element, thus reducing the size of the response.

          2. Regarding using 'username' parameter instead of 'user.name'. This comes from hadoop-auth (Alfredo), it should be changed there not here.

          3. Regarding querystring parameters/values being case sensitive or not. IMO, as the path is case sensitive, the querystring should be as well, not to create confusion with developers/users.

          4. Regarding filestatus containing localname instead of the full path to make the payload smaller; it makes sense. But shouldn't it just be called 'name'?

          5. Regarding filestatus, delete, rename, mkdirs, setreplication payloads and the root element being a classname. JSON does not require a root element, a JSON response can be a list of key/value pairs (JSON object). I'd prefer to keep it like that, especially for filestatus when doing a liststatus operation, else the payload will increase significantly in size. Another issue with the name of the class is that it should be a public class, not an implementation one (currently it is using 'HdfsFileStatus').

          You mention that the root element class is added because of XML requiring a root element. We are not spec-ing XML here. So I don't see this as a requirement. And if somebody is doing JSON to XML they should account for that in the transcoding.

          6. Regarding the scheme to use, "webhdfs://" and "http://". We are doing HTTP, this is why, IMO, we should use "http://". For example, when using curl you'll use "http://" not "webhdfs://"; it will be less confusing to developers.

          Alejandro Abdelnur added a comment -

          #2, the reason for using 'user.name' is that hadoop pseudo authentication uses the 'user.name' system property.

          Arpit Gupta added a comment -

          5. Regarding filestatus, delete, rename, mkdirs, setreplication payloads and the root element being a classname. JSON does not require a root element, a JSON response can be a list of key/value pairs (JSON object). I'd prefer to keep it like that, especially for filestatus when doing a liststatus operation, else the payload will increase significantly in size. Another issue with the name of the class is that it should be a public class, not an implementation one (currently it is using 'HdfsFileStatus').

          You mention that the root element class is added because of XML requiring a root element. We are not spec-ing XML here. So I don't see this as a requirement. And if somebody is doing JSON to XML they should account for that in the transcoding.

          I don't think the size increases significantly by adding the root element. For the liststatus call, the response looks like the following if the root element is there:

          {"HdfsFileStatuses":{"HdfsFileStatus":[]}}
          

          or, if the root is not there:

          {"HdfsFileStatus":[]}
          

          I do not think this adds too much size to the response.
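          The fixed overhead of the wrapper can be measured directly (a quick sketch; the element names follow the examples above):

```python
import json

with_root = json.dumps({"HdfsFileStatuses": {"HdfsFileStatus": []}},
                       separators=(",", ":"))
without_root = json.dumps({"HdfsFileStatus": []}, separators=(",", ":"))

# The wrapper costs a constant number of bytes per response,
# independent of how many file statuses the list contains.
print(len(with_root) - len(without_root))  # 21
```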

          Tsz Wo Nicholas Sze added a comment -

          1. Agree, symlink could be optional. BTW, the "isDir" and "isSymlink" are replaced with "type", which is an enum

          {FILE, DIRECTORY, SYMLINK}

          2. Sure. Let's change it in a separate JIRA then.

          3. Path is a parameter value, therefore case sensitive.

          I think case sensitive causes more confusion:

          Q: Why this is not working? http://nn:port/webhdfs/v1/path?Op=GETFILECHECKSUM
          A: You must use lower case: "Op" should be "op"

          Q: Why this is not working? http://nn:port/webhdfs/v1/path?op=GETFILECHECKSUM&does=nicholas
          A: It is a typo: "does" should be "doas".
          Q: But "doas" looks more like a typo than "does". I wish I could use "doAs".

          How about op values and boolean values? Do you also think that they should be case sensitive?

          4. "name" sounds like file/directory names. "localName" is an empty string if the full path is given. How about calling it "pathSuffix"?

          5. As Arpit mentioned, the payload for liststatus won't be increased significantly. We only need two more words per request (instead of one more word per status.)

          Is Hoop going to support file system other than HDFS?

          It is common to convert JSON to/from XML. We should make the conversion trivial.

          6. Webhdfs and hoop should not share the same scheme since they require different implementations. Webhdfs should use "webhdfs://". For hoop, I suggest not using "http://" as a file system scheme.

          Alejandro Abdelnur added a comment -

          @Arpit,

          Now I got how you are proposing the json payload for filestatuses. You are correct, the overhead is minimal.

          @Nicholas,

          Regarding #1, 'type' sounds good.

          Regarding #2, ok.

          Regarding #3, having params be case sensitive does not mean they have to be all lowercase. Hoop originally used case-sensitive parameters using camelCase, thus the 'doas' parameter was 'doAs'. How about going back to that for all parameter names and values? And for the 'op' values it means they mimic the FileSystem method names (that was also the initial motivation in Hoop).

          Regarding #4, having both 'name' and 'localname', it is not clear when you'll have one or the other. If you have the full path, the FileStatus is self-contained and you don't need to know the requested URL to know the file's location in the filesystem, but the payloads of filestatuses are bigger. Having 'localname' is the other way around: you need to know the requested URL to know the file's location in the file system, but the payloads of filestatuses will be smaller. IMO we should choose one. I prefer the full path because it makes the filestatus self-contained. Regarding the size of the payload, I wouldn't worry much about it as we are always talking about the contents of a single directory, we are using a verbose syntax anyway, and you could use compression in the server responses.

          Regarding #5, my issue here is having an extra nested level just for a possible conversion to XML. Is this a user requirement? If not I'd prefer to keep it without the class name.

          Hoop can proxy any filesystem implementation. Because of this the HTTP REST API should be restricted to the FileSystem public API; without exposing implementation specifics.

          Regarding #6, I disagree; the whole point of this discussion, having a single HTTP REST API between Hoop and WebHDFS, is to achieve interoperability between implementations and make it transparent to users.

          Tsz Wo Nicholas Sze added a comment -

          > Regarding #3, having params being case sensitive it does not mean they have to be all lowercase. ...

          This is a problem with case-sensitive names: people have different preferences. From the people I have talked to so far, if the names are case sensitive, some prefer lowercase and some prefer camelCase. However, if the names are case insensitive, everyone is fine (they won't ask for case sensitivity.)

          > Regarding #4, Having 'name' and 'localname' is not clear when you'll have one or the other. ...

          I recall that full path had been sent in RPC through FileStatus but there were some issues and then HdfsFileStatus with "localName" was added. I think we should not make the same mistake again. Does "pathSuffix" sound good to you?

          > Regarding #5, My issue here is that having an extra nested level for a possible conversion to XML. ...

          We should support XML in the near future since some users may prefer XML over JSON.

          > Regarding #6, I disagree, ...

          I see your point. Are you going to use WebHdfsFileSystem in HDFS for hoop rather than adding another FileSystem implementation?

          Alejandro Abdelnur added a comment -

          Regarding #3, IMO having parts of the URL be case sensitive (the path & the param values) and parts be case insensitive (the param names) is an issue. We cannot make the path case-insensitive because file names are not. Because of that I'm suggesting all case sensitive. But if we make the param names case insensitive we should make the param values case insensitive as well. We just have to make sure we don't modify the case of parameter values, since in certain cases (a rename or a changeOwner) that may cause undesirable results.

          Regarding #4, pathSuffix is good. Still this means that a FileStatus response requires knowledge of the requested URL to be able to know which file we are talking about.
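          How a client would rebuild full paths from a pathSuffix-style listing can be sketched like this (a minimal Python example; the "FileStatuses"/"FileStatus" keys and the sample entries are assumptions for illustration, not the final spec):

```python
import json
import posixpath

requested_dir = "/user/alice"  # known from the request URL
body = ('{"FileStatuses":{"FileStatus":['
        '{"pathSuffix":"data.txt","type":"FILE"},'
        '{"pathSuffix":"logs","type":"DIRECTORY"}]}}')

statuses = json.loads(body)["FileStatuses"]["FileStatus"]
# Join the requested directory with each entry's pathSuffix
# to recover the full path of every listed file.
full_paths = [posixpath.join(requested_dir, s["pathSuffix"]) for s in statuses]
print(full_paths)  # ['/user/alice/data.txt', '/user/alice/logs']
```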

          Regarding #5, if we are going to support XML, we can easily add the root elements to XML. Adding a couple of nested levels because of XML conversion does not seem right. Furthermore, I would assume that for performance reasons, when generating XML we'll generate XML directly via a Provider; we won't generate JSON and then convert it to XML.

          Regarding #6, regardless of whether Hoop has a FileSystem implementation, they should be interoperable. This would mean that a distcp would work without changes even if the infrastructure setup changes from hoop to webhdfs or vice versa.

          Thanks.

          Alejandro

          Tsz Wo Nicholas Sze added a comment -

          > Regarding #3, ... if we make the param names case insensitive we should make the param values case insensitive as well. ...

          In programming languages which support case insensitivity, identifier names are often case insensitive but string values are case sensitive. SQL is an example.

          > Regarding #4, ...

          Yes, it requires the knowledge of the request. Even if absolute paths are provided, it requires the knowledge of the request to know which NameNode it is referring to. We simply cannot put everything in the response.

          > Regarding #5, If we are going to support XML, we can easily add the root elements to XML. ...

          But then the JSON schema and the XML schema will be different.

          It does make sense to first generate JSON on the server side and then convert it to XML on the client side since the XML payload is heavy.

          > Regarding #6, Regardless if Hoop as a FileSystem implementation, ...

          Even if they share the same FileSystem scheme, users have to change their configuration to the corresponding implementation, since webhdfs and hoop do not share the same FileSystem implementation.

          Alejandro Abdelnur added a comment -

          Regarding #3, I'm not sure SQL is a good example of this; SQL is a mess, it depends on the SQL vendor and how the SQL DB is configured.

          Regarding #4, OK

          Regarding #5, assuming that that is the case (that clients would do the conversion), would you please tell me what kind of libraries would help to do such conversion with the root/array names as proposed? And how is this conversion made simple by having those elements?

          Regarding #6, let me rephrase my previous comment, if we have full interoperability between webhdfs and hoop then I don't see a need for having 2 client implementations of the 'http' filesystem.

          Arpit Gupta added a comment -

          Regarding #5, assuming that that is the case (that clients would do the conversion), would you please tell me what kind of libraries would help to do such conversion with the root/array names as proposed? And how is this conversion made simple by having those elements?

          One can use http://www.json.org/java/index.html to convert JSON into an XML string and then create a Java DOM object. If the root element is not present, one will get an exception when building the DOM object.
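          The idea can also be sketched outside of org.json (a naive, purely illustrative transcoder in Python): because the JSON object has exactly one root key, it maps directly onto a single XML document element.

```python
import json
from xml.etree.ElementTree import Element, tostring

def json_to_xml(value, tag):
    """Naive JSON-to-XML transcoding: each key becomes an element name."""
    elem = Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(json_to_xml(child, key))
    else:
        elem.text = str(value)
    return elem

# Hypothetical rooted payload in the style discussed in this thread.
rooted = json.loads('{"FileChecksum":{"algorithm":"MD5","length":28}}')
(root_tag, payload), = rooted.items()  # the root key names the document element
xml = tostring(json_to_xml(payload, root_tag)).decode()
print(xml)  # <FileChecksum><algorithm>MD5</algorithm><length>28</length></FileChecksum>
```

Without the root element, the transcoder would have to invent a document element name per API call.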

          Alejandro Abdelnur added a comment -

          @Arpit,

          I'm trying to understand, in a general way, how the additional levels indicating a class-name (container-name) simplify the creation of XML. I've tried using json-lib's JSON to XML but it does not achieve the desired results. Furthermore, with json-lib it seems easier not to have the class-name (container-name).

          Again, I mean in a 'general way'. Having a syntax that is convenient for parsing using a specific library doesn't seem the right approach.

          Tsz Wo Nicholas Sze added a comment -

          Hi Alejandro, HTTP field names are case insensitive.

          ... Each header field consists of a name followed by a colon (":") and the field value. Field names are case-insensitive. ...

          For more details, see HTTP/1.1 Section 4.2.

          So the query parameter names should also be case insensitive. Okay?

          Alejandro Abdelnur added a comment -

          Nicholas, my concern is not regarding case sensitive or case insensitive, but regarding consistency. In your suggested approach:

          *1 path is case sensitive
          *2 query parameter names are case insensitive
          *3 some query parameter values are case sensitive (destination, owner, group, user.name, doAs)
          *4 some query parameter values are case insensitive (override)

          Note that in #3 we don't have an option, as the corresponding underlying entities are case sensitive.

          And I don't recall now in your proposal if 'op' is case sensitive or not.

          My take is that if we are consistent, this will be easier for users. As we cannot go all case insensitive, I'm suggesting all case sensitive.

          Tsz Wo Nicholas Sze added a comment -

          In HTTP/1.1, and in programming languages which support case insensitivity, names are case insensitive but string values are case sensitive. Our case is the same: parameter names are case insensitive and string values (including paths) are case sensitive.

          Case-insensitive approaches are designed for inexperienced users. For experienced users, either way is fine as long as the documentation is clear. SQL, HTTP and other case-insensitive examples like BASIC are all designed for inexperienced users in order to cover a wider audience. Some may claim that SQL and HTTP are a mess but no one can deny that they are the most popular standards.
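          The convention described above (case-insensitive parameter names, case-preserved values) could be implemented along these lines (a sketch, not the actual WebHDFS code):

```python
from urllib.parse import parse_qsl

def get_param(query, name):
    """Look up a query parameter by name case-insensitively,
    returning its value exactly as sent (case preserved)."""
    for key, value in parse_qsl(query):
        if key.lower() == name.lower():
            return value
    return None

query = "Op=GETFILECHECKSUM&doAs=Nicholas"
print(get_param(query, "op"))    # GETFILECHECKSUM
print(get_param(query, "DOAS"))  # Nicholas
```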

          Alejandro Abdelnur added a comment -

          Nicholas, I guess we are at a stalemate here and it is a matter of personal preference. It would be good to hear from others here. But at this point, I'd be good with either approach; I want to get this going. It should be 100% clear in the documentation what is case sensitive and what is not, parameter name/value one by one.

          Still open are #5 and #6.

          Arpit Gupta added a comment -

          @Alejandro

          Again, I mean in a 'general way'. Having a syntax that is convenient for parsing using a specific library doesn't seem the right approach.

          I am not sure why the approach I suggested is not a general way. The current response we send allows users to create a DOM object from the JSON response. If the root object is not present, the user would have to write specific code for different API calls and add the root object when needed. Thus I think what we have right now allows for the general way rather than specific solutions for different API calls.

          The benefit for having a response that can be converted to valid xml is that in future if we want to support xml response there is no schema change needed between xml and json.

          Also, clients that are using Java can use the Java XPath libraries to parse the data. I am not sure if JSON has something as strong as XPath that one can use.

          Here you can see an example where a response has both json and xml responses

          A YQL call to get weather info:
          xml -> http://goo.gl/i2Gii
          json -> http://goo.gl/osChW

          So I believe our JSON response should be returning a root object.

          Tsz Wo Nicholas Sze added a comment -

          Regarding #6, suppose webhdfs and hoop share the same FileSystem scheme. I think "http" as a FileSystem scheme is not an option; otherwise, we cannot easily tell whether "http://host:port/path/to/file" is an http URL or a FileSystem URI.

          Milind Bhandarkar added a comment -

          Before I get tired of the case-sensitivity arguments, let me ask you who you are designing the system for ? I suppose that is for folks like me, who have used the URL scheme for more than 18 years now. So, here is my take: anything after that host:port/ is case sensitive. (Because after host:port/, I know that it refers to a file system, or a "resource" that ultimately refers to a file system.) So, please stop arguing, and design it for curmudgeons like me. Even without the reading glasses, I can recognize the difference between capital and small letters. Thank you !

          Tsz Wo Nicholas Sze added a comment -

          Hi Milind, as mentioned earlier, either case sensitive or not is fine for experienced users. No?

          Tsz Wo Nicholas Sze added a comment -

          WebHdfsAPI20111111.pdf

          Revised the API doc. Thanks everyone who has commented on the previous versions!

          Tsz Wo Nicholas Sze added a comment -

          test-webhdfs
          test-webhdfs-0.20s

          Scripts to run all webhdfs related unit tests.

          Tsz Wo Nicholas Sze added a comment -

          Closing this since all tasks for 205.1 are done. There is one remaining issue (HDFS-2545) for 0.23 though. Please feel free to create JIRAs if you find any bugs.

          Alejandro Abdelnur added a comment -

          Nicholas,

          Thanks for updating the spec. Getting there. A few follow up comments/open-issues:

          1. Case insensitivity of the param names/values.

          Current opinions favor case sensitivity in the spec. (IMO the implementation could be case insensitive, but the spec should be case sensitive. The client/server components should produce output as per the spec but may be lax in what they accept; in other words, Postel's Law.)

          2. Inconsistency on the JSON responses names:

          • MAKEDIRS/RENAME/DELETE/SETREPLICATION returns: { "boolean" : <BOOLEAN> }
          • GETHOMEDIRECTORY returns: { "path" : "<PATH>" }
          • GETDELEGATIONTOKEN returns: { "urlString" : "<DT>" }
          • RENEWDELEGATIONTOKEN returns: { "long" : <LONG> }
          • GETFILESTATUS returns: { "fileStatus" : .... }
          • LISTSTATUS returns: { "fileStatuses" : .... }

          Sometimes they are basic types, sometimes structure names, sometimes functional names ('urlString').

          Because structure names read like functional names anyway, I'd suggest we use functional names for everything. Then it would be:

          • MAKEDIRS/RENAME/DELETE/SETREPLICATION returns: { "mkdirs/rename/.." : <BOOLEAN> }
          • GETHOMEDIRECTORY returns: { "homeDir" : "<PATH>" }
          • GETDELEGATIONTOKEN returns: { "delegationToken" : "<DT>" }
          • RENEWDELEGATIONTOKEN returns: { "delegationTokenRenewal" : <LONG> }
          • GETFILESTATUS returns: { "fileStatus" : .... }
          • LISTSTATUS returns: { "fileStatuses" : .... }

          3. FileStatus does not have symlinkPath for symlinks.

          symlinkPath should be an optional value (and the same for pathSuffix when the status is for a symlink)

          4. FileStatus permission is a String but the permission parameter is a short

          Both should be short (octal).

          5. The encoding of the delegation token, both as parameter and as response is not defined.

          The encoding should be the same in both cases (I assume, from the code, you are using HEXA. If so, wouldn't BASE64 be a more common encoding to use?)
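          As an aside on the size trade-off between the two encodings mentioned (illustrative sketch; the byte blob is made up, not an actual Hadoop token): hex spends 2 characters per byte while Base64 spends roughly 4/3, so Base64 output is about a third shorter.

          ```python
          import base64
          import binascii

          # Stand-in for a serialized delegation token blob (60 bytes).
          token = bytes(range(60))

          hex_form = binascii.hexlify(token).decode("ascii")
          b64_form = base64.urlsafe_b64encode(token).decode("ascii")

          # Hex: 2 chars per byte. Base64: 4 chars per 3 bytes.
          assert len(hex_form) == 2 * len(token)   # 120 chars
          assert len(b64_form) == 80               # 4 * (60 / 3) chars
          ```

          Both forms shown here are URL-safe (the `urlsafe_b64encode` variant replaces `+` and `/`), so either could be carried in a query parameter; Base64 just costs fewer characters.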

          6. The final doc that gets checked in should not include authors, same as we don't use the @author tag in the code.

          Tsz Wo Nicholas Sze added a comment -

          Hi Alejandro,

          > 1. Case insensitivity of the param names/values.

          Let's follow HTTP/1.1. They have stated it in the spec.

          > 2. Inconsistency on the JSON responses names:

          They are indeed consistent: If the return type is a JSON primitive type (string, boolean or number), the format is

          {"type": <TYPE>}
          

          If the return type is a class but not an array, then the format is

          {
            "Class":
            { 
              "FieldA": <TypeA>,
              "FieldB": <TypeB>,
              ...
            }
          }
          

          If the return type is an array, then the format is

          {
            "Classes":
            { 
              "Class":
              [
                { 
                  "FieldA": <TypeA>,
                  "FieldB": <TypeB>,
                  ...
                },
                { 
                  "FieldA": <TypeA>,
                  "FieldB": <TypeB>,
                  ...
                },
                ...
              ]
            }
          }
          

          BTW, there is a typo in your comment: There is "Token:" in the GETDELEGATIONTOKEN return type, i.e.

           {"Token": { "urlString" : "<DT>" } }
          

          Okay, I just found out that GETHOMEDIRECTORY does not follow the above rules: "Path" should be "string". But it seems that "Path" makes more sense; what do you think?

          I did use function names in an earlier implementation but changed them to type/class names, since it seems that return types should not be associated with function names. Just like in Java, you don't put the method name in the returned object's class.
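          The three response shapes described above can be exercised with a short sketch (hypothetical responses; field names such as pathSuffix are used for illustration): each shape unwraps with one fixed key path, so a generic client needs no per-call schema.

          ```python
          import json

          # Shape 1: JSON primitive keyed by its type name.
          primitive = json.loads('{"boolean": true}')

          # Shape 2: a single class keyed by its class name.
          single = json.loads('{"FileStatus": {"length": 22, "type": "FILE"}}')

          # Shape 3: an array wrapped as Classes -> Class -> [...].
          listing = json.loads(
              '{"FileStatuses": {"FileStatus": ['
              ' {"pathSuffix": "a", "type": "FILE"},'
              ' {"pathSuffix": "b", "type": "DIRECTORY"}]}}'
          )

          assert primitive["boolean"] is True
          assert single["FileStatus"]["length"] == 22
          names = [s["pathSuffix"] for s in listing["FileStatuses"]["FileStatus"]]
          assert names == ["a", "b"]
          ```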

          > 3. FileStatus does not have symlinkPath for symlinks.

          Yes, symlink is optional. I have not included it in the doc. BTW, how about calling it "symlink" instead of "symlinkPath"?

          > 4. FileStatus permission is a String but the permission parameter is a short

          If we use a short, then octal 777 will be shown as 511, which seems very confusing. So a string is used for representing the octal value. In the URL it is actually a string anyway, as everything there is a string.
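          A quick check of the octal point (plain Python arithmetic): the decimal value of octal 777 is 511, which is why a numeric field would be confusing, while a 3-digit octal string round-trips cleanly.

          ```python
          # Octal 777 serialized as a plain number would appear as 511.
          assert int("777", 8) == 511
          assert oct(511) == "0o777"

          # Keeping the permission as the familiar octal string; parsing it
          # back is a single call.
          perm = "755"
          assert int(perm, 8) == 0o755 == 493
          ```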

          > 5. The encoding of the delegation token, both as parameter and as response is not defined.

          It uses the Hadoop delegation token encoding, i.e. Token.encodeToUrlString() and Token.decodeFromUrlString(..).

          > 6. The final doc that gets checked in should no include authors, ...

          Agree.

          Tsz Wo Nicholas Sze added a comment -

          Created HDFS-2552 for adding forrest doc.

          Alejandro Abdelnur added a comment -

          Nicholas,

          On #1, sorry, I don't agree; you are referring to HTTP/1.1 headers. We are talking about URLs here: except for the scheme and host, everything else should be case sensitive. Milind has also stated his opinion on this in favor of case sensitivity. Again, this is by spec; the implementation could still be lenient.

          On #2, if we go for types, then GETHOMEDIRECTORY uses

          { "Path" : "<PATH>" }

          and, using similar reasoning, GETDELEGATIONTOKEN should return

          { "Token" : "<TOKEN>" }

          (you are skipping the field name of the structure type, as the token seems to be treated as an opaque value).

          On #3, "symlink" is fine

          On #4, then we should state for the 'permission' parameter that it is a String whose valid values are 3-digit octal numbers, i.e. '[0-7][0-7][0-7]'.

          On #5, if the Token is to be decoded/parsed by a client in any way, how to do the decoding should be stated, independent of Hadoop.

          Thanks.

          Tsz Wo Nicholas Sze added a comment -

          Alejandro,

          We are getting very close. All except #2 are documentation discussions. Let's continue documentation-related discussion in HDFS-2552; I will respond to your other points there.

          For #2, the reason for having "urlString" is to allow changing the encoding/token format later on, e.g.

          {"Token": { "newEncoding" : <TOKEN> } }
          

          For the path, it seems using a string is just fine. But if you disagree, I have no problem changing the JSON response for GETHOMEDIRECTORY to

          {"string": <PATH_STRING>}
          

          Which one do you prefer,

          {"string": <PATH>}

          or

          {"Path": <PATH>}

          ?

          BTW, thank you so much for all the comments. They are all very helpful!

          Alejandro Abdelnur added a comment -

          Regarding #2 and the token encoding: it should be something like 'encodedToken', and whatever the encoding is, it should be clearly documented if the token is to be used by clients other than as an opaque value.

          Regarding #2, 'Path' is fine.

          Thanks.

          Tsz Wo Nicholas Sze added a comment -

          Let's call the current encoding "urlString", which is the name used in the Hadoop code. Okay?

          You are right that the documentation must be clear.

          Alejandro Abdelnur added a comment -

          We are using an enum

          {FILE, DIRECTORY, SYMLINK}

          for the FileStatus, but the constructor of FileStatus, for symlinks, takes an isDir boolean.

          This means we are losing the isFile/isDir information in the case of symlinks, but it seems that the HDFS implementation keeps it.

          If this is the case, we should add that info either as new enum values or by using 2 booleans instead of the enum.

          Tsz Wo Nicholas Sze added a comment -

          For symlinks, should it be isDir==false and isFile==false?

          Alejandro Abdelnur added a comment -

          No, no, the options would be:

          1. enum {FILE, DIRECTORY, FILE_SYMLINK, DIRECTORY_SYMLINK}
          2. boolean isDir, boolean isSymlink

          I'd prefer #2, but I'm good either way.
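          The two options can be sketched as follows (names illustrative, not the final spec); the two booleans and the four-value enum carry the same information:

          ```python
          # Option 1: a single enum value covering all four cases.
          def type_enum(is_dir, is_symlink):
              if is_symlink:
                  return "DIRECTORY_SYMLINK" if is_dir else "FILE_SYMLINK"
              return "DIRECTORY" if is_dir else "FILE"

          # Option 2: two booleans in the status object.
          def from_booleans(is_dir, is_symlink):
              return {"isDir": is_dir, "isSymlink": is_symlink}

          assert type_enum(False, False) == "FILE"
          assert type_enum(True, False) == "DIRECTORY"
          assert type_enum(False, True) == "FILE_SYMLINK"
          assert from_booleans(False, True) == {"isDir": False, "isSymlink": True}
          ```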

          Tsz Wo Nicholas Sze added a comment -

          INodeSymlink.isDirectory() always returns false, so how could we have DIRECTORY_SYMLINK? Do you mean for some FileSystem other than HDFS?

          Alejandro Abdelnur added a comment -

          Well, then we are good, since for symlinks isDir is always false.

          Alejandro Abdelnur added a comment -

          symlinks already return false for isDir, not an issue.

          Tsz Wo Nicholas Sze added a comment -

          @Alejandro, thanks for checking it.

          Eli Collins added a comment -

          Nicholas / Tucu - see HADOOP-6585 for the rationale.

          Matt Foley added a comment -

          Re-opening to accommodate the request to also fix this in 0.22.0.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Commit #217 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/217/)
          HDFS-2316. Record completion of umbrella jira. Contributed by Tsz Wo (Nicholas), Sze.

          mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1206989
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hudson added a comment -

          Integrated in Hadoop-Common-0.23-Commit #220 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/220/)
          HDFS-2316. Record completion of umbrella jira. Contributed by Tsz Wo (Nicholas), Sze.

          mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1206989
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #1403 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1403/)
          HDFS-2316. Record completion of umbrella jira. Contributed by Tsz Wo (Nicholas), Sze.

          mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1206990
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Commit #231 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/231/)
          HDFS-2316. Record completion of umbrella jira. Contributed by Tsz Wo (Nicholas), Sze.

          mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1206989
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #1329 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1329/)
          HDFS-2316. Record completion of umbrella jira. Contributed by Tsz Wo (Nicholas), Sze.

          mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1206990
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #1353 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1353/)
          HDFS-2316. Record completion of umbrella jira. Contributed by Tsz Wo (Nicholas), Sze.

          mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1206990
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #91 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/91/)
          HDFS-2316. Record completion of umbrella jira. Contributed by Tsz Wo (Nicholas), Sze.

          mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1206989
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #878 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/878/)
          HDFS-2316. Record completion of umbrella jira. Contributed by Tsz Wo (Nicholas), Sze.

          mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1206990
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Build #107 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/107/)
          HDFS-2316. Record completion of umbrella jira. Contributed by Tsz Wo (Nicholas), Sze.

          mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1206989
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #911 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/911/)
          HDFS-2316. Record completion of umbrella jira. Contributed by Tsz Wo (Nicholas), Sze.

          mattf : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1206990
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Tsz Wo Nicholas Sze added a comment -

          Fixing WebHDFS capitalization. Will also fix CHANGES.txt files.

          Harsh J added a comment -

          Reverting to the previous fix versions. This was fixed in 0.23.1 and 1.0.0, and is in the re-opened state for 0.22 alone.

          Konstantin Shvachko added a comment -

          Harsh, Nicholas: I don't think anybody cares about porting this to 0.22 at this point.
          Should we resolve it then?


            People

            • Assignee: Tsz Wo Nicholas Sze
            • Reporter: Tsz Wo Nicholas Sze
            • Votes: 0
            • Watchers: 21