Jackrabbit Content Repository
  1. Jackrabbit Content Repository
  2. JCR-3005

Make it possible to get multiple nodes in one call via davex

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.6
    • Component/s: jackrabbit-jcr-server
    • Labels:
      None

      Description

      I'm working on this currently

        Activity

        Hide
        Christian Stocker added a comment -

        Attached is a patch, which basically implements this. It's a Proof of Concept and far from perfect. But I wanted to get some feedback before I invest more time

        It is modelled the same way as COPY, CLONE and DIFF in JcrRemotingServlet.java

        The basic idea is taken from http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API but I choose not to use JSON for submitting the data for consistency and code reuseability

        You can call it with

        curl -X POST -d ':get=/node1&:get=/node2' http://localhost:8080/server/workspace/jcr:root/node2.0.json

        and it returns an array of all the found nodes as a JSON array

        Questions:

        • Should we return the amount of found nodes as well. Something like {"total_nodes": 3, "nodes": [node1,node2,node3]}

          ?

        • Now, if the node in the URL matches on of the nodes in the body, it is only returned once, if the same node is mentioned more than once in the body it is returned more than once. What should be the approach here? Should we avoid duplicate nodes or just live with it?
        • What should happen, if there's a node not found. I prefer the "just ignore it" way
        • What should happen, if there's any other error? Report that or just return what can be returend?
        • I added the path of the node as "jcr:path" to the data. Is this the right approach or does anyone have a better idea?
        Show
        Christian Stocker added a comment - Attached is a patch, which basically implements this. It's a Proof of Concept and far from perfect. But I wanted to get some feedback before I invest more time It is modelled the same way as COPY, CLONE and DIFF in JcrRemotingServlet.java The basic idea is taken from http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API but I choose not to use JSON for submitting the data for consistency and code reuseability You can call it with curl -X POST -d ':get=/node1&:get=/node2' http://localhost:8080/server/workspace/jcr:root/node2.0.json and it returns an array of all the found nodes as a JSON array Questions: Should we return the amount of found nodes as well. Something like {"total_nodes": 3, "nodes": [node1,node2,node3]} ? Now, if the node in the URL matches on of the nodes in the body, it is only returned once, if the same node is mentioned more than once in the body it is returned more than once. What should be the approach here? Should we avoid duplicate nodes or just live with it? What should happen, if there's a node not found. I prefer the "just ignore it" way What should happen, if there's any other error? Report that or just return what can be returend? I added the path of the node as "jcr:path" to the data. Is this the right approach or does anyone have a better idea?
        Hide
        Lukas Kahwe Smith added a comment -
        • I think it would be a nice touch and probably fairly cheap to add some metadata to the result, "total_nodes" would be one obvious one. not sure if it makes sense to add other metadata like duplicates, missing or stuff like that, at any rate it would be good to prepare the structure returned for this so that we can add anything else that makes sense later without BC breaks
        • I think it should only be once, actually I think the nodes should be a hash map with path: node pairs
        • Missing nodes should just be ignored (see also http://java.net/jira/browse/JSR_333-38)
        • If any nodes can be found, imho they should be returned, errors should if at all be returned as part of the metadata (see above)
        • Like I said, I would prefer a hash map including the path to identify the node in the nodes list
        Show
        Lukas Kahwe Smith added a comment - I think it would be a nice touch and probably fairly cheap to add some metadata to the result, "total_nodes" would be one obvious one. not sure if it makes sense to add other metadata like duplicates, missing or stuff like that, at any rate it would be good to prepare the structure returned for this so that we can add anything else that makes sense later without BC breaks I think it should only be once, actually I think the nodes should be a hash map with path: node pairs Missing nodes should just be ignored (see also http://java.net/jira/browse/JSR_333-38 ) If any nodes can be found, imho they should be returned, errors should if at all be returned as part of the metadata (see above) Like I said, I would prefer a hash map including the path to identify the node in the nodes list
        Hide
        Christian Stocker added a comment -

        I agree with Lukas, something like
        {"nodes":{"/node1":

        {"value1":"data1"}

        ,"/node2":

        {"value1":"data2"}

        }}

        shout allow for forward compatibility and make it easy for clients to parse, we could leave out the jcr:path value then.

        I'll try to adjust to patch soon

        Show
        Christian Stocker added a comment - I agree with Lukas, something like {"nodes":{"/node1": {"value1":"data1"} ,"/node2": {"value1":"data2"} }} shout allow for forward compatibility and make it easy for clients to parse, we could leave out the jcr:path value then. I'll try to adjust to patch soon
        Hide
        Christian Stocker added a comment -

        Here's the new patch, which returns the data as discussed in the comments
        Something like: {"nodes":{"/article":

        {data}, "/article2":{data}

        }}

        Show
        Christian Stocker added a comment - Here's the new patch, which returns the data as discussed in the comments Something like: {"nodes":{"/article": {data}, "/article2":{data} }}
        Hide
        Jukka Zitting added a comment -

        Looks good, thanks! I committed the patch as-is in revision 1179531 and changed it slightly in revision 1179532 to allow exceptions from the getMultiple method instead of just logging them (BTW, we use SLF4J for logging).

        Some comments on the implementation:

        • Wouldn't it make more sense for this to be a GET request rather than a POST? Something like: curl 'http://localhost:8080/server/workspace/jcr:root/node2.0.json?:get=/node1&:get=/node2'
        • Instead of implementing parts of the JSON serialization in the getSimple() method, it would be cleaner to add a method like JsonWriter.write(Node[] nodes, int depth)
        • The return statement inside the conditional in doPost() increases the complexity of the already tricky flow of that method. Can we refactor the code slightly to make the flow more obvious?
        Show
        Jukka Zitting added a comment - Looks good, thanks! I committed the patch as-is in revision 1179531 and changed it slightly in revision 1179532 to allow exceptions from the getMultiple method instead of just logging them (BTW, we use SLF4J for logging). Some comments on the implementation: Wouldn't it make more sense for this to be a GET request rather than a POST? Something like: curl 'http://localhost:8080/server/workspace/jcr:root/node2.0.json?:get=/node1&:get=/node2' Instead of implementing parts of the JSON serialization in the getSimple() method, it would be cleaner to add a method like JsonWriter.write(Node[] nodes, int depth) The return statement inside the conditional in doPost() increases the complexity of the already tricky flow of that method. Can we refactor the code slightly to make the flow more obvious?
        Hide
        Christian Stocker added a comment -
        • GET instead of POST. I decided against that, since the URL can grow very large if you ask for many nodes

        The other parts I can look into, but I'm not the java expert, so if anyone else feels inclined to do that, feel free. I'm fine with everything.

        Show
        Christian Stocker added a comment - GET instead of POST. I decided against that, since the URL can grow very large if you ask for many nodes The other parts I can look into, but I'm not the java expert, so if anyone else feels inclined to do that, feel free. I'm fine with everything.
        Hide
        angela added a comment -

        to me the patch looks awkward.
        the response should somehow be in relation to the requested resource... to return the json representation of other possible unrelated resources looks odd to me.

        why don't you handle that with a specific REPORT request to the workspace resource?
        that' would make more sense to me.

        angela

        Show
        angela added a comment - to me the patch looks awkward. the response should somehow be in relation to the requested resource... to return the json representation of other possible unrelated resources looks odd to me. why don't you handle that with a specific REPORT request to the workspace resource? that' would make more sense to me. angela
        Hide
        angela added a comment -

        jukka, i am a little bit surprised that you commit a patch before having your comments/questions sorted out.
        i would opt for reverting that until we reach some sort of consensus about this issue.

        Show
        angela added a comment - jukka, i am a little bit surprised that you commit a patch before having your comments/questions sorted out. i would opt for reverting that until we reach some sort of consensus about this issue.
        Hide
        Jukka Zitting added a comment -

        > commit a patch before having your comments/questions sorted out

        It's easier to work on incrementally improving this when the code is already in svn. This way we can better isolate individual improvements to separate patches/commits instead of working just on a one big patch.

        Show
        Jukka Zitting added a comment - > commit a patch before having your comments/questions sorted out It's easier to work on incrementally improving this when the code is already in svn. This way we can better isolate individual improvements to separate patches/commits instead of working just on a one big patch.
        Hide
        Jukka Zitting added a comment -

        > Instead of implementing parts of the JSON serialization in the getSimple() method, it would be cleaner to add a method like JsonWriter.write(Node[] nodes, int depth)

        Done in revision 1179541.

        Show
        Jukka Zitting added a comment - > Instead of implementing parts of the JSON serialization in the getSimple() method, it would be cleaner to add a method like JsonWriter.write(Node[] nodes, int depth) Done in revision 1179541.
        Hide
        angela added a comment -

        > It's easier to work on incrementally improving this when the code is already in svn.

        that's true... but on the other hand i wouldn't want to have strange features to be released
        which we then have to stick with for compatibility reasons.

        can you please make sure that this feature is not released next week unless we found a consensus?
        thanks.

        Show
        angela added a comment - > It's easier to work on incrementally improving this when the code is already in svn. that's true... but on the other hand i wouldn't want to have strange features to be released which we then have to stick with for compatibility reasons. can you please make sure that this feature is not released next week unless we found a consensus? thanks.
        Hide
        Jukka Zitting added a comment - - edited

        As a general rule I wouldn't consider us committed to providing backwards compatibility on any feature that we haven't documented (I'd count a relevant Jira issue marked as resolved in a specific release as the minimal level of documentation about a new feature). Anyway, I see your point and will make sure that this is either resolved in time for 2.3.1 or that the feature is disabled in the release until we have consensus about the exact implementation.

        In revision 1179564 I changed this feature to respond only to GET requests on a workspace resource. If such a request contains one or more :path parameters, then the JSON response containing information about all the identified nodes/subtrees is returned. For example:

        $ curl http://localhost:8080/server/default?:path=/node1&:path=/node2
        {"nodes":{"/node1":

        {...},"/node2":{...}

        }}

        An optional :depth parameter can be used to override the configured default subtree depth to include. Note that the current code doesn't support the configuration of default depth per node type, only the overall default depth is used.

        Notes:

        • I prefer GET over REPORT for this since GET requests are generally easier to handle by many clients and they are better understood by various intermediaries like web proxies or firewalls
        • I see the point of using POST instead of GET for a large request, but I'd rather handle that using a generic POST-to-GET conversion so we can support both GET and POST requests for this information.
        Show
        Jukka Zitting added a comment - - edited As a general rule I wouldn't consider us committed to providing backwards compatibility on any feature that we haven't documented (I'd count a relevant Jira issue marked as resolved in a specific release as the minimal level of documentation about a new feature). Anyway, I see your point and will make sure that this is either resolved in time for 2.3.1 or that the feature is disabled in the release until we have consensus about the exact implementation. In revision 1179564 I changed this feature to respond only to GET requests on a workspace resource. If such a request contains one or more :path parameters, then the JSON response containing information about all the identified nodes/subtrees is returned. For example: $ curl http://localhost:8080/server/default?:path=/node1&:path=/node2 {"nodes":{"/node1": {...},"/node2":{...} }} An optional :depth parameter can be used to override the configured default subtree depth to include. Note that the current code doesn't support the configuration of default depth per node type, only the overall default depth is used. Notes: I prefer GET over REPORT for this since GET requests are generally easier to handle by many clients and they are better understood by various intermediaries like web proxies or firewalls I see the point of using POST instead of GET for a large request, but I'd rather handle that using a generic POST-to-GET conversion so we can support both GET and POST requests for this information.
        Hide
        Jukka Zitting added a comment -

        In revision 1179606 I restored support for POST requests accessing this same functionality. I looked at implementing a more generic POST-to-GET conversion, but that probably needs a bit more thought.

        I also added some javadoc documentation about this multi-read feature.

        I guess the only thing still needed is a test case. Anything else we should still change?

        Show
        Jukka Zitting added a comment - In revision 1179606 I restored support for POST requests accessing this same functionality. I looked at implementing a more generic POST-to-GET conversion, but that probably needs a bit more thought. I also added some javadoc documentation about this multi-read feature. I guess the only thing still needed is a test case. Anything else we should still change?
        Hide
        angela added a comment -

        what i meant was: the request target should be the workspace resource. simply retrieving the workspace name from the locator wasn't my intention.
        it still looks wrong to me that you request ANY dav resource and get back the json-serialization of some other resources.

        Show
        angela added a comment - what i meant was: the request target should be the workspace resource. simply retrieving the workspace name from the locator wasn't my intention. it still looks wrong to me that you request ANY dav resource and get back the json-serialization of some other resources.
        Hide
        Jukka Zitting added a comment -

        > simply retrieving the workspace name from the locator wasn't my intention.

        I'm just using the locator to check whether the request is targeting a workspace resource or not. Is there a better way to do that? The other alternative I considered was using instanceof WorkspaceResource, but that didn't seem any cleaner. Ideally I think we should allow the resource itself to process the HTTP request, but that would require quite a bit of refactoring of the current codebase.

        > request ANY dav resource and get back the json-serialization of some other resources.

        In the HTTP sense the resource is the one identified by the full URI (e.g. .../default?:path=/node1&path=/node2) instead of just the path part (.../default), so I don't see a fundamental disconnect here. From the perspective of a clean URI space though it would of course be nicer if the multi-read case was better rooted to a single root resource.

        One way we could achieve that would be to introduce extra ":include" and ":exclude" patterns for the normal batch-read functionality. Such patterns direct which parts of the subtree rooted at the identified resources should be included in or excluded from the JSON response. For example:

        $ curl http://localhost:8080/server/default/jcr:root/path.json?:include=node1&:include=node2
        {...,"node1":

        {...},"node2":{...}}

        or

        $ curl http://localhost:8080/server/default/jcr:root/.json?:include=path&:exclude=path/node2
        {...,"path":{"node1":{...}

        ,"node3":

        {...}

        }}

        Show
        Jukka Zitting added a comment - > simply retrieving the workspace name from the locator wasn't my intention. I'm just using the locator to check whether the request is targeting a workspace resource or not. Is there a better way to do that? The other alternative I considered was using instanceof WorkspaceResource, but that didn't seem any cleaner. Ideally I think we should allow the resource itself to process the HTTP request, but that would require quite a bit of refactoring of the current codebase. > request ANY dav resource and get back the json-serialization of some other resources. In the HTTP sense the resource is the one identified by the full URI (e.g. .../default?:path=/node1&path=/node2) instead of just the path part (.../default), so I don't see a fundamental disconnect here. From the perspective of a clean URI space though it would of course be nicer if the multi-read case was better rooted to a single root resource. One way we could achieve that would be to introduce extra ":include" and ":exclude" patterns for the normal batch-read functionality. Such patterns direct which parts of the subtree rooted at the identified resources should be included in or excluded from the JSON response. For example: $ curl http://localhost:8080/server/default/jcr:root/path.json?:include=node1&:include=node2 {...,"node1": {...},"node2":{...}} or $ curl http://localhost:8080/server/default/jcr:root/.json?:include=path&:exclude=path/node2 {...,"path":{"node1":{...} ,"node3": {...} }}
        Hide
        angela added a comment -

        > One way we could achieve that would be to introduce extra ":include" and ":exclude" patterns for the normal batch-read functionality.
        > Such patterns direct which parts of the subtree rooted at the identified resources should be included in or excluded from the JSON response.

        that would make much more sense to me.

        Show
        angela added a comment - > One way we could achieve that would be to introduce extra ":include" and ":exclude" patterns for the normal batch-read functionality. > Such patterns direct which parts of the subtree rooted at the identified resources should be included in or excluded from the JSON response. that would make much more sense to me.
        Hide
        Jukka Zitting added a comment -

        OK, I'm working on a solution based on the proposed include/exclude patterns.

        Show
        Jukka Zitting added a comment - OK, I'm working on a solution based on the proposed include/exclude patterns.
        Hide
        Jukka Zitting added a comment -

        In revision 1181746 I implemented initial support for such :include parameters, interpreted relative to the addressed node resource. The JSON output format is still the same as proposed above and the depth of included subtrees is set based on the addressed resource instead of the included nodes themselves. No :exclude support is included. A better alternative could be to treat the :include rules more like extensions to the specified depth and have the results be embedded inside the normal JSON tree, but that still requires extra work.

        Unscheduling from 2.3.1 to give us more time to come up with a solution that works for everyone. To reflect this unfinished state I added a -DJCR-3305=true feature flag that needs to be used to enable the current multi-read code. I also commented out the Javadoc documentation for now.

        Show
        Jukka Zitting added a comment - In revision 1181746 I implemented initial support for such :include parameters, interpreted relative to the addressed node resource. The JSON output format is still the same as proposed above and the depth of included subtrees is set based on the addressed resource instead of the included nodes themselves. No :exclude support is included. A better alternative could be to treat the :include rules more like extensions to the specified depth and have the results be embedded inside the normal JSON tree, but that still requires extra work. Unscheduling from 2.3.1 to give us more time to come up with a solution that works for everyone. To reflect this unfinished state I added a -DJCR-3305=true feature flag that needs to be used to enable the current multi-read code. I also commented out the Javadoc documentation for now.
        Hide
        Lukas Kahwe Smith added a comment -

        how are things looking with this ticket? I tried briefly with a 2.3.2 dev build using "-DJCR-3305=true" and couldn't get it to work. Should I try again?
        This patch is the last one required to get PHPCR working with Jackrabbit.

        Show
        Lukas Kahwe Smith added a comment - how are things looking with this ticket? I tried briefly with a 2.3.2 dev build using "-DJCR-3305=true" and couldn't get it to work. Should I try again? This patch is the last one required to get PHPCR working with Jackrabbit.
        Hide
        Christian Stocker added a comment -

        so, some testing:

        curl http://localhost:8080/server/default/jcr:root/foo.0.json?:include=bar
        

        works (but doesn't return the /foo node),

        curl http://localhost:8080/server/default/jcr:root/.0.json?:include=/foo/bar
        

        as well

        But I can't get it to work with POST

        curl -d :include=bar http://localhost:8081/server/default/jcr:root/foo.0.json
        

        returns a

        <?xml version="1.0" encoding="UTF-8" standalone="no"?>
        <D:error xmlns:D="DAV:">
        <dcr:exception xmlns:dcr="http://www.day.com/jcr/webdav/1.0">
        <dcr:class>org.apache.jackrabbit.spi.commons.conversion.IllegalNameException</dcr:class>
        <dcr:message>Prefix must not be empty</dcr:message>
        </dcr:exception>
        </D:error>
        

        How is a POST request supposed to work?

        Show
        Christian Stocker added a comment - so, some testing: curl http: //localhost:8080/server/ default /jcr:root/foo.0.json?:include=bar works (but doesn't return the /foo node), curl http: //localhost:8080/server/ default /jcr:root/.0.json?:include=/foo/bar as well But I can't get it to work with POST curl -d :include=bar http: //localhost:8081/server/ default /jcr:root/foo.0.json returns a <?xml version= "1.0" encoding= "UTF-8" standalone= "no" ?> <D:error xmlns:D= "DAV:" > <dcr:exception xmlns:dcr= "http: //www.day.com/jcr/webdav/1.0" > <dcr:class>org.apache.jackrabbit.spi.commons.conversion.IllegalNameException</dcr:class> <dcr:message>Prefix must not be empty</dcr:message> </dcr:exception> </D:error> How is a POST request supposed to work?
        Hide
        Jukka Zitting added a comment -

        Sorry for the delay. I'll try to look more into this over the next few days. Marking for 2.3.6 so we keep this on radar for the upcoming 2.4 release.

        Show
        Jukka Zitting added a comment - Sorry for the delay. I'll try to look more into this over the next few days. Marking for 2.3.6 so we keep this on radar for the upcoming 2.4 release.
        Hide
        Jukka Zitting added a comment -

        Attached a patch for enabling POST access for this feature. The patch also drops the conditional feature flag as I think this is now good enough for use as a stable API.

        Note that only the paths explicitly mentioned in the :include parameters (relative to the target resource of the URL) are included in the response, the target resource itself is by default not included.

        Currently only normal paths are supported as :include parameters. In a future version, depending on applicable use cases, we can extend this with support for path wildcards and accompanying :exclude patterns.

        Show
        Jukka Zitting added a comment - Attached a patch for enabling POST access for this feature. The patch also drops the conditional feature flag as I think this is now good enough for use as a stable API. Note that only the paths explicitly mentioned in the :include parameters (relative to the target resource of the URL) are included in the response, the target resource itself is by default not included. Currently only normal paths are supported as :include parameters. In a future version, depending on applicable use cases, we can extend this with support for path wildcards and accompanying :exclude patterns.
        Hide
        angela added a comment -

        looks fine. thanks

        Show
        angela added a comment - looks fine. thanks
        Hide
        Jukka Zitting added a comment -

        Thanks! Committed in revision 1221335.

        Show
        Jukka Zitting added a comment - Thanks! Committed in revision 1221335.

          People

          • Assignee:
            Jukka Zitting
            Reporter:
            Christian Stocker
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development