Uploaded image for project: 'Sling'
  1. Sling
  2. SLING-848

Support getting versioned resources by using uri path parameters

    Details

      Description

      Getting versioned content should be support thorough uri path parameters, like /something/hello;v=1.1
      For jcr based resources the value of the version should either point to a version name or label.

      In order to not change our existing apis, we introduce a new utility method which removes all uri path parameters
      and returns a map of these. Every resource provider could use this utility method and then decide to act on these
      parameters.
      If a requested version does not exists, a 404 is returned.
      If the requested node does not directly point to a versionable node, the algorithm checks the parent hierarchy until a versionable node is found, and tries to get the version of this node and then goes down the path again. If the versionable node does not have the requested version or the child, a 404 is returned.

        Issue Links

          Activity

          Hide
          tripod Tobias Bocanegra added a comment -

          you mean: something/hello.html;v=1.1 ?
          this will set a resource resolver option v=1.1 ?

          so in general, the resource resolver should have a new method:

          getResource(String path, Sring versionName)

          or should there be a 'VersionSelector' interface, that is resposible to select the proper version for a requested path?

          Show
          tripod Tobias Bocanegra added a comment - you mean: something/hello.html;v=1.1 ? this will set a resource resolver option v=1.1 ? so in general, the resource resolver should have a new method: getResource(String path, Sring versionName) or should there be a 'VersionSelector' interface, that is resposible to select the proper version for a requested path?
          Hide
          cziegeler Carsten Ziegeler added a comment -

          No, I don't want to change the existing API: the idea is to provide an utility method which can be used by all resource provider implementations.
          And change the jcr implementation to support this version handling.

          Show
          cziegeler Carsten Ziegeler added a comment - No, I don't want to change the existing API: the idea is to provide an utility method which can be used by all resource provider implementations. And change the jcr implementation to support this version handling.
          Hide
          andreas@apache.org Andreas Hartmann added a comment -

          A problem with predefined parameter names is that they restrict the available space for application parameter names. In Lenya, we prefix all framework-level request parameters with "lenya.", e.g. for revisions we use lenya.revision=xyz. I'm not sure if this also applies to URI path parameters, though.

          Show
          andreas@apache.org Andreas Hartmann added a comment - A problem with predefined parameter names is that they restrict the available space for application parameter names. In Lenya, we prefix all framework-level request parameters with "lenya.", e.g. for revisions we use lenya.revision=xyz. I'm not sure if this also applies to URI path parameters, though.
          Hide
          tripod Tobias Bocanegra added a comment -

          but do you want to add this to the resource resolver, or the request resolver?

          so should a ResourceResolver.getResource("/something/hello;v=1.1") return a different version?
          or should a request to "/something/hello.html;v=1.1" trigger a resource resolution to the respective version? (which eventually needs the former, too)

          Show
          tripod Tobias Bocanegra added a comment - but do you want to add this to the resource resolver, or the request resolver? so should a ResourceResolver.getResource("/something/hello;v=1.1") return a different version? or should a request to "/something/hello.html;v=1.1" trigger a resource resolution to the respective version? (which eventually needs the former, too)
          Hide
          cziegeler Carsten Ziegeler added a comment -

          @Andreas: I don't think that we have to prefix the uri path parameters. The parameter is used to get the corresponding representation of the resource and I think it's unlikely that someone else is using the same at the same time.

          @Tobias: The check is added to the jcr resource provider which provides the resource for a given path (being it the path from the http request or a path passed in to the resource resolver). So we can handle this transparently in the resource provider. Noone else is affected by this.

          Show
          cziegeler Carsten Ziegeler added a comment - @Andreas: I don't think that we have to prefix the uri path parameters. The parameter is used to get the corresponding representation of the resource and I think it's unlikely that someone else is using the same at the same time. @Tobias: The check is added to the jcr resource provider which provides the resource for a given path (being it the path from the http request or a path passed in to the resource resolver). So we can handle this transparently in the resource provider. Noone else is affected by this.
          Hide
          fmeschbe Felix Meschberger added a comment -

          The point of using the URI parameter notation is based on a discussion I once had with Roy Fielding about encoding the version of a JCR node in an URL. He pointed me to this feature, which is also described in Section 3.3, Path, of RFC 3986 [1].

          I think, this functionality is as simple as it is flexible.

          We also chose this option because we explicitly did not want to extend the API at this point in time.

          One thing, we might well consider in the future, is that we might add some thing like a VersionSelector to the ResourceResolver, which is used by the ResourceProviders to further select versions.

          One thing to not forget about is, that this ;v= functionality is not only supported for ResourceResolver.resolve but also for ResourceResolver.getResource. Consquently, the ResourceResolver.map should generate URI strings, which also contain the ;v= URI parameters in case the resource is based on a certain version (of a Node or similar).

          Maybe the Resource.getPath() should also include the ;v= parameter since it is transparently supported by the ResourceResolver and its ResourceProviders.

          Finally, the ResourceMetadata map should probably also contain the an entry with key "v" and the respective value.

          [1] http://www.faqs.org/rfcs/rfc3986.html

          Show
          fmeschbe Felix Meschberger added a comment - The point of using the URI parameter notation is based on a discussion I once had with Roy Fielding about encoding the version of a JCR node in an URL. He pointed me to this feature, which is also described in Section 3.3, Path, of RFC 3986 [1] . I think, this functionality is as simple as it is flexible. We also chose this option because we explicitly did not want to extend the API at this point in time. One thing, we might well consider in the future, is that we might add some thing like a VersionSelector to the ResourceResolver, which is used by the ResourceProviders to further select versions. One thing to not forget about is, that this ;v= functionality is not only supported for ResourceResolver.resolve but also for ResourceResolver.getResource. Consquently, the ResourceResolver.map should generate URI strings, which also contain the ;v= URI parameters in case the resource is based on a certain version (of a Node or similar). Maybe the Resource.getPath() should also include the ;v= parameter since it is transparently supported by the ResourceResolver and its ResourceProviders. Finally, the ResourceMetadata map should probably also contain the an entry with key "v" and the respective value. [1] http://www.faqs.org/rfcs/rfc3986.html
          Hide
          tripod Tobias Bocanegra added a comment - - edited

          so consequently (according to rfc3986) , the v= parameter should be applicable to each segment. eg:

          /content/home;v=1.0/tobi;v=1.5

          (assuming /content/home/tobi is currently deleted and did only exist in the version 1.0 of the 'home')

          what is the "external" form of this?, probably:

          /content/home;v=1.0/tobi.html;v=1.5

          having the .html at the end makes no sense.

          but this makes every script logic that just appends the .html to the resource path pretty useless.
          the problem you have is:

          • if you include the v= to the resource.getPath() the resource gets unusable for "normal" scripts,
          • if you omit it from the path, the getResource() / getPath() is not symmetric

          so maybe this is not such a good idea after all?

          btw: i think the "component" of this issue is wrong.

          Show
          tripod Tobias Bocanegra added a comment - - edited so consequently (according to rfc3986) , the v= parameter should be applicable to each segment. eg: /content/home;v=1.0/tobi;v=1.5 (assuming /content/home/tobi is currently deleted and did only exist in the version 1.0 of the 'home') what is the "external" form of this?, probably: /content/home;v=1.0/tobi.html;v=1.5 having the .html at the end makes no sense. but this makes every script logic that just appends the .html to the resource path pretty useless. the problem you have is: if you include the v= to the resource.getPath() the resource gets unusable for "normal" scripts, if you omit it from the path, the getResource() / getPath() is not symmetric so maybe this is not such a good idea after all? btw: i think the "component" of this issue is wrong.
          Hide
          fmeschbe Felix Meschberger added a comment - - edited

          Yes, according to rfc3986

          /content/home;v=1.0/tobi;v=1.5

          would be perfectly valid. But we won't support it, since this leads into areas which are too complicated to grasp and which can almost not be described. For this reason, we define, that the ;v= URI parameter is only recognized and used on the last segment.

          Now, for the location of the ;v= specification. The intened location is of course to be at the end, as in

          /content/home/tobi.html;v=1.5

          If we would have Resource.getPath() include the ;v= we would of course also have to support things like

          /content/home/tobi;v=1.5.selector.html

          You are right, that this is neither nice nore very logic, though not necessairily violating the spec.

          Maybe we should think about this some more.

          Show
          fmeschbe Felix Meschberger added a comment - - edited Yes, according to rfc3986 /content/home;v=1.0/tobi;v=1.5 would be perfectly valid. But we won't support it, since this leads into areas which are too complicated to grasp and which can almost not be described. For this reason, we define, that the ;v= URI parameter is only recognized and used on the last segment. Now, for the location of the ;v= specification. The intened location is of course to be at the end, as in /content/home/tobi.html;v=1.5 If we would have Resource.getPath() include the ;v= we would of course also have to support things like /content/home/tobi;v=1.5.selector.html You are right, that this is neither nice nore very logic, though not necessairily violating the spec. Maybe we should think about this some more.
          Hide
          bdelacretaz Bertrand Delacretaz added a comment -

          I think

          /content/home/tobi.html;v=1.5

          breaks existing logic, as IMHO programmers assume that things that affect the representation come at the end of the URL.

          I much prefer

          /content/home/tobi;v=1.5.selector.html

          Where the URI parameters (which affect the selected content) come first, and the selectors and extensions (which affect the representation) last.

          Also, why not change the API? If we start supporting URI parameters as specified, the least surprising way would be to add a (backwards-compatible) getURIParameters method to the SlingHttpServletRequest interface. Hacking the content path downstream feels wrong.

          Show
          bdelacretaz Bertrand Delacretaz added a comment - I think /content/home/tobi.html;v=1.5 breaks existing logic, as IMHO programmers assume that things that affect the representation come at the end of the URL. I much prefer /content/home/tobi;v=1.5.selector.html Where the URI parameters (which affect the selected content) come first, and the selectors and extensions (which affect the representation) last. Also, why not change the API? If we start supporting URI parameters as specified, the least surprising way would be to add a (backwards-compatible) getURIParameters method to the SlingHttpServletRequest interface. Hacking the content path downstream feels wrong.
          Hide
          marcfaindu Marc added a comment -

          Supporting versions is a great feature. Just for the record of future bug hunters: Mac OS 10.4 + HTTP Basic Authentication cannot handle these links [1]. BTW this was also one of the reasons Rails dropped using ; in URLs [2]. One of the other reasons was caching but I'm not sure if this still applies.

          [1] https://bugs.webkit.org/show_bug.cgi?id=10073
          [2] http://dev.rubyonrails.org/changeset/6485

          Show
          marcfaindu Marc added a comment - Supporting versions is a great feature. Just for the record of future bug hunters: Mac OS 10.4 + HTTP Basic Authentication cannot handle these links [1] . BTW this was also one of the reasons Rails dropped using ; in URLs [2] . One of the other reasons was caching but I'm not sure if this still applies. [1] https://bugs.webkit.org/show_bug.cgi?id=10073 [2] http://dev.rubyonrails.org/changeset/6485
          Hide
          tripod Tobias Bocanegra added a comment -

          that's was i feared. using the ; parameters is not widely used, and causes problems, as marc pointed out.

          i think it's ok to use the parameters for the resource resolver - but then you can also extend the api. i would avoid using them on the URI from the request, especially, because the generation of such links is a bit special. i'd rather see a "global" query parameter or cookie that controls the version selection.

          Show
          tripod Tobias Bocanegra added a comment - that's was i feared. using the ; parameters is not widely used, and causes problems, as marc pointed out. i think it's ok to use the parameters for the resource resolver - but then you can also extend the api. i would avoid using them on the URI from the request, especially, because the generation of such links is a bit special. i'd rather see a "global" query parameter or cookie that controls the version selection.
          Hide
          andreas@apache.org Andreas Hartmann added a comment -

          @Tobias:

          > i'd rather see a "global" query parameter or cookie that controls the version selection.

          you mean that the version selection would apply to the whole repository (from the point of view of the user session)? IMO this would be a major restriction; the use case "View revision x of a document" is provided by virtually all CMS/DMS applications, and the URI path parameter would make it very easy to implement this use case.

          Show
          andreas@apache.org Andreas Hartmann added a comment - @Tobias: > i'd rather see a "global" query parameter or cookie that controls the version selection. you mean that the version selection would apply to the whole repository (from the point of view of the user session)? IMO this would be a major restriction; the use case "View revision x of a document" is provided by virtually all CMS/DMS applications, and the URI path parameter would make it very easy to implement this use case.
          Hide
          bdelacretaz Bertrand Delacretaz added a comment -

          > i'd rather see a "global" query parameter or cookie that controls the version selection.

          Aaargh no! Calling the REST gods to the rescue

          If the semicolon is a problem in URLs (damn browsers), why not one of the following:

          content/home/tobi=v=1.5.selector.html
          content/home/tobi/sling:v1.5.selector.html
          content/home/tobi.selector.html?sling:v=1.5

          Show
          bdelacretaz Bertrand Delacretaz added a comment - > i'd rather see a "global" query parameter or cookie that controls the version selection. Aaargh no! Calling the REST gods to the rescue If the semicolon is a problem in URLs (damn browsers), why not one of the following: content/home/tobi=v=1.5.selector.html content/home/tobi/sling:v1.5.selector.html content/home/tobi.selector.html?sling:v=1.5
          Hide
          cziegeler Carsten Ziegeler added a comment -

          I don't think that the semicolon is really a problem in most browsers as the java session id is appended to the url in this way for years. If browsers have
          problems with this (as Marc reported) then it's most likely that a lot of web sites are not working in these browsers today, so I guess sooner or later
          these problems will be fixed or the users switch to other browsers.

          Show
          cziegeler Carsten Ziegeler added a comment - I don't think that the semicolon is really a problem in most browsers as the java session id is appended to the url in this way for years. If browsers have problems with this (as Marc reported) then it's most likely that a lot of web sites are not working in these browsers today, so I guess sooner or later these problems will be fixed or the users switch to other browsers.
          Hide
          tripod Tobias Bocanegra added a comment -

          > content/home/tobi.selector.html?sling:v=1.5
          that's what i meant.

          Show
          tripod Tobias Bocanegra added a comment - > content/home/tobi.selector.html?sling:v=1.5 that's what i meant.
          Hide
          cziegeler Carsten Ziegeler added a comment -

          Ok, as I really would like to get this feature implemented let's recap:

          • the uri parameter is a good way of expressing the requested version
          • we only allow one version to be specified, so the uri parameter can only be added once
          • the uri parameter should be added to the end of the resource path but before selectors and extensions to make it simple to generate urls (like resource.getPath() + ".html")
          • the above shows that resource.getPath() should include the version information (if available)

          This all leads us to urls, like /content/somewhere/over/the/rainbow[versionInfo][.selectorstring][.extension]
          As the version information can contain dots, a url like /mycontent;v=1.0.a.b.c.html is not easy to pass as the passing algorithm can't decide where the version information ends and where the selectors start. Escaping the dot is an ugly (and I think forbidden) solution, so we should just add the version information in quotes:
          /mycontent;v='1.0.a'.b.c.html

          The sling resource resolver will extract uri parameters from the uri and pass them to the resource providers in an additional parameter. Currently the resource providers just get a path; we will extend this to be a path and a map of all uri parameters.
          The map of uri parameters will later on also end up in the resource meta data.

          Does this make sense?

          Show
          cziegeler Carsten Ziegeler added a comment - Ok, as I really would like to get this feature implemented let's recap: the uri parameter is a good way of expressing the requested version we only allow one version to be specified, so the uri parameter can only be added once the uri parameter should be added to the end of the resource path but before selectors and extensions to make it simple to generate urls (like resource.getPath() + ".html") the above shows that resource.getPath() should include the version information (if available) This all leads us to urls, like /content/somewhere/over/the/rainbow [versionInfo] [.selectorstring] [.extension] As the version information can contain dots, a url like /mycontent;v=1.0.a.b.c.html is not easy to pass as the passing algorithm can't decide where the version information ends and where the selectors start. Escaping the dot is an ugly (and I think forbidden) solution, so we should just add the version information in quotes: /mycontent;v='1.0.a'.b.c.html The sling resource resolver will extract uri parameters from the uri and pass them to the resource providers in an additional parameter. Currently the resource providers just get a path; we will extend this to be a path and a map of all uri parameters. The map of uri parameters will later on also end up in the resource meta data. Does this make sense?
          Hide
          fmeschbe Felix Meschberger added a comment -

          +1 with enhancements:

          The URI parameter parser should also recognize this: /a/path.selector.html;v=5.0 (version at the end with or without quotes)

          and the ResourceResolver.map(String) method might want to convert something like

          /a/patch;v='5.0'.sel.html

          to

          /a/patch.sel.html;v=5.0

          for nicer optics.

          Show
          fmeschbe Felix Meschberger added a comment - +1 with enhancements: The URI parameter parser should also recognize this: /a/path.selector.html;v=5.0 (version at the end with or without quotes) and the ResourceResolver.map(String) method might want to convert something like /a/patch;v='5.0'.sel.html to /a/patch.sel.html;v=5.0 for nicer optics.
          Hide
          tripod Tobias Bocanegra added a comment -

          i'm still not convinced that this does not lead to severe inconsistencies in the way resources are used.

          to achieve better stability, i also suggest to

          • mark those resources clearly with an own interface, eg: VersionedResource
          • add Resource.getName() which returns the last segment of the path, without any parameter
          • versioned resource should be adaptable to javax.jcr.Version (although this is NOT the actual resource, i.e. frozen node)
          Show
          tripod Tobias Bocanegra added a comment - i'm still not convinced that this does not lead to severe inconsistencies in the way resources are used. to achieve better stability, i also suggest to mark those resources clearly with an own interface, eg: VersionedResource add Resource.getName() which returns the last segment of the path, without any parameter versioned resource should be adaptable to javax.jcr.Version (although this is NOT the actual resource, i.e. frozen node)
          Hide
          fmeschbe Felix Meschberger added a comment -

          VersionedResource: I dont think we need that. What we might want to have, though is a new ResourceMetadata property which provides the URI parameters. This may be a map with all the URI parameters.

          Resource.getName(): We have ResousourceUtil.getName(Resource) which is supposed to return the name of the resource. This must of course be extended to "cut off" the URI parameter from the path, too. Whether we really need a Resource.getName() method is not related to this issue – it might make sense from a usability POV

          Resource.adaptTo(Version.class): This would be Resource.adaptTo(Node.class).getBaseVersion(), right ?

          Show
          fmeschbe Felix Meschberger added a comment - VersionedResource: I dont think we need that. What we might want to have, though is a new ResourceMetadata property which provides the URI parameters. This may be a map with all the URI parameters. Resource.getName(): We have ResousourceUtil.getName(Resource) which is supposed to return the name of the resource. This must of course be extended to "cut off" the URI parameter from the path, too. Whether we really need a Resource.getName() method is not related to this issue – it might make sense from a usability POV Resource.adaptTo(Version.class): This would be Resource.adaptTo(Node.class).getBaseVersion(), right ?
          Hide
          tripod Tobias Bocanegra added a comment -

          > Resource.adaptTo(Version.class): This would be Resource.adaptTo(Node.class).getBaseVersion(), right ?
          no, the node addressed by the versioned path is the jcr:frozenNode of the version. so actually it would be:
          Resource.adaptTo(Node.class).getParent()

          Show
          tripod Tobias Bocanegra added a comment - > Resource.adaptTo(Version.class): This would be Resource.adaptTo(Node.class).getBaseVersion(), right ? no, the node addressed by the versioned path is the jcr:frozenNode of the version. so actually it would be: Resource.adaptTo(Node.class).getParent()
          Hide
          tomek.rekawek Tomek Rękawek added a comment -

          Hello after 5 years . I dug up this issue as it was mentioned in the recent discussion on sling-users [1] regarding support for the JCR versioning API in Sling. It's also related to SLING-4318.

          I prepared implementation of the feature described in this JIRA. It has been published as a GitHub pull request [2]. Please find description of the proposed changes below:

          Parameters support in ResourceResolver

          The first part of the implementation is the semicolon parameters support in ResourceResolver and related classes. Methods getResource() and resolveResource() are able to parse extract semicolon parameters, add them to ResourceMetadata and pass to ResourceProvider as a Map<String, String>. Following types of parameters are supported:

          • /content/test;v='1.0'.html (after resource path, before extension)
          • /content/test.html;v=1.0 (after extension, no need to escape dot with quotes)

          It's possible to use several parameters in one URI:

          • /content/test;v='1.0';x=abc.html (resulting map: {v: 1.0, x: abc})

          Parameters are removed from the path at an early stage of URI processing in ResourceResolverImpl, so they shouldn't affect further path decomposition. Unit test for the parser provides a complete specification [3].

          Parsing itself is done with two FSMs. The first one [4] extracts the whole parameter string from the URI so the second [5] can split parameters into a map. Besides from changes in o.a.s.resourceresolver bundle I had to make following changes in the Sling API:

          • there are new methods related to parameter map in ResourceMetadata,
          • ResourceProvider#getResource() method takes a new argument (map of the parameters).

          Versioning parameter in JcrResourceProvider

          Now we can get the parameter map in the JcrResourceProvider and find an appropriate version of the node. The lookup itself is done in JcrResourceProvider#createResource() method. Following rules apply:

          • /content/test;v='1.0' reads version 1.0 of the test node,
          • /content/test/jcr:title;v='1.0' (where jcr:title is a property) reads version 1.0 of the test node and returns its jcr:title,
          • /content/test/abc/xyz;v='1.0' (where abc and xyz are not versionable) traverses the tree up until it finds a versionable node (=test) with version 1.0. Then appropriate subnode (abc/xyz) of the given version is returned.
          • if there is no given version in versionable node or there is no versionable ancestor, null is returned.

          Because node passed to the JcrPropertyResource and JcrNodeResource are in fact frozenNodes from the version storage, there are some changes in these classes, related to getPath() method (we don't want to return /jcr:system/jcr:versionStorage/... as a path or version number as a name). As a result, path and name returned by the versioned resources have the ;v= parameter appended at the end.

          There is a module [6] containing integration tests for the new behaviour, working on a real JCR repository via sling-mock and sling-mock-jackrabbit. I had to update dependency versions in two last modules, so they are aligned to the latest Sling API.

          I'm looking forward to feedback from the Sling community.

          [1] http://thread.gmane.org/gmane.comp.apache.sling.user/1610
          [2] https://github.com/apache/sling/pull/57
          [3] o.a.s.resourceresolver.impl.tree.params.PathParametersParserTest and o.a.s.r.impl.MockedResourceResolverImplTest#testVersions()
          [4] o.a.s.resourceresolver.impl.tree.params.PathParser
          [5] o.a.s.resourceresolver.impl.tree.params.ParametersParser
          [6] o.a.s.jcr.repository.it-resource-versioning

          Show
          tomek.rekawek Tomek Rękawek added a comment - Hello after 5 years . I dug up this issue as it was mentioned in the recent discussion on sling-users [1] regarding support for the JCR versioning API in Sling. It's also related to SLING-4318 . I prepared implementation of the feature described in this JIRA. It has been published as a GitHub pull request [2]. Please find description of the proposed changes below: Parameters support in ResourceResolver The first part of the implementation is the semicolon parameters support in ResourceResolver and related classes. Methods getResource() and resolveResource() are able to parse extract semicolon parameters, add them to ResourceMetadata and pass to ResourceProvider as a Map<String, String>. Following types of parameters are supported: /content/test;v='1.0'.html (after resource path, before extension) /content/test.html;v=1.0 (after extension, no need to escape dot with quotes) It's possible to use several parameters in one URI: /content/test;v='1.0';x=abc.html (resulting map: {v: 1.0, x: abc}) Parameters are removed from the path at an early stage of URI processing in ResourceResolverImpl, so they shouldn't affect further path decomposition. Unit test for the parser provides a complete specification [3]. Parsing itself is done with two FSMs. The first one [4] extracts the whole parameter string from the URI so the second [5] can split parameters into a map. Besides from changes in o.a.s.resourceresolver bundle I had to make following changes in the Sling API: there are new methods related to parameter map in ResourceMetadata, ResourceProvider#getResource() method takes a new argument (map of the parameters). Versioning parameter in JcrResourceProvider Now we can get the parameter map in the JcrResourceProvider and find an appropriate version of the node. The lookup itself is done in JcrResourceProvider#createResource() method. Following rules apply: /content/test;v='1.0' reads version 1.0 of the test node, /content/test/jcr:title;v='1.0' (where jcr:title is a property) reads version 1.0 of the test node and returns its jcr:title, /content/test/abc/xyz;v='1.0' (where abc and xyz are not versionable) traverses the tree up until it finds a versionable node (=test) with version 1.0. Then appropriate subnode (abc/xyz) of the given version is returned. if there is no given version in versionable node or there is no versionable ancestor, null is returned. Because node passed to the JcrPropertyResource and JcrNodeResource are in fact frozenNodes from the version storage, there are some changes in these classes, related to getPath() method (we don't want to return /jcr:system/jcr:versionStorage/... as a path or version number as a name). As a result, path and name returned by the versioned resources have the ;v= parameter appended at the end. There is a module [6] containing integration tests for the new behaviour, working on a real JCR repository via sling-mock and sling-mock-jackrabbit. I had to update dependency versions in two last modules, so they are aligned to the latest Sling API. I'm looking forward to feedback from the Sling community. [1] http://thread.gmane.org/gmane.comp.apache.sling.user/1610 [2] https://github.com/apache/sling/pull/57 [3] o.a.s.resourceresolver.impl.tree.params.PathParametersParserTest and o.a.s.r.impl.MockedResourceResolverImplTest#testVersions() [4] o.a.s.resourceresolver.impl.tree.params.PathParser [5] o.a.s.resourceresolver.impl.tree.params.ParametersParser [6] o.a.s.jcr.repository.it-resource-versioning
          Hide
          cziegeler Carsten Ziegeler added a comment -

          I had a brief look at the patch and it adds a new marker interface to the ResourcerResolver (to make it parameterizable). I can't clearly see why this is needed - and if possible we should avoid such additions to the resource api.

          Show
          cziegeler Carsten Ziegeler added a comment - I had a brief look at the patch and it adds a new marker interface to the ResourcerResolver (to make it parameterizable). I can't clearly see why this is needed - and if possible we should avoid such additions to the resource api.
          Hide
          tomek.rekawek Tomek Rękawek added a comment -

          Actually, it adds a new marker interface to the ResourceProvider. I think it's compliant with the previous discussion:

          The sling resource resolver will extract uri parameters from the uri and pass them to the resource providers in an additional parameter. Currently the resource providers just get a path; we will extend this to be a path and a map of all uri parameters.
          The map of uri parameters will later on also end up in the resource meta data.

          In the first version I modified the ResourceProvider interface, but Bertrand pointed out that it would be incompatible with existing ResourceProviders. That's why there is the new ParametrizableResourceProvider.

          Show
          tomek.rekawek Tomek Rękawek added a comment - Actually, it adds a new marker interface to the ResourceProvider. I think it's compliant with the previous discussion: The sling resource resolver will extract uri parameters from the uri and pass them to the resource providers in an additional parameter. Currently the resource providers just get a path; we will extend this to be a path and a map of all uri parameters. The map of uri parameters will later on also end up in the resource meta data. In the first version I modified the ResourceProvider interface, but Bertrand pointed out that it would be incompatible with existing ResourceProviders. That's why there is the new ParametrizableResourceProvider.
          Hide
          cziegeler Carsten Ziegeler added a comment -

          Right, sorry, yes ResourceProvider it is

          What is currently in this map?

          Show
          cziegeler Carsten Ziegeler added a comment - Right, sorry, yes ResourceProvider it is What is currently in this map?
          Hide
          tomek.rekawek Tomek Rękawek added a comment -

          All parameters parsed from the URI. For the URI like

          /content/test;city=London;state=UK.html
          

          it'll contain two values:

          [city: London, state: UK]
          
          Show
          tomek.rekawek Tomek Rękawek added a comment - All parameters parsed from the URI. For the URI like /content/test;city=London;state=UK.html it'll contain two values: [city: London, state: UK]
          Hide
          cziegeler Carsten Ziegeler added a comment -

          Ok, thanks - I agree that passing this in the string will a) break existing providers and b) require reparsing in each provider.

          Show
          cziegeler Carsten Ziegeler added a comment - Ok, thanks - I agree that passing this in the string will a) break existing providers and b) require reparsing in each provider.
          Hide
          tomek.rekawek Tomek Rękawek added a comment -

          Patch merged in r1658445.

          Show
          tomek.rekawek Tomek Rękawek added a comment - Patch merged in r1658445.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user trekawek closed the pull request at:

          https://github.com/apache/sling/pull/57

          Show
          githubbot ASF GitHub Bot added a comment - Github user trekawek closed the pull request at: https://github.com/apache/sling/pull/57
          Hide
          cziegeler Carsten Ziegeler added a comment -

          Thanks for taking this up.

          I had a brief look at the applied code and I think the parameters are not added to the metadata in all cases, e.g. if
          getResource(Resource, String) is called, this calls getResourceInternal(String) which then calls getAbsoluteResourceInternal.

          In general I think we should add the parameters to the metadata in all places where we do a setResolutionPath or setResolutionPathInfo; but not in any other place, which I think there is one or two in the source.

          Show
          cziegeler Carsten Ziegeler added a comment - Thanks for taking this up. I had a brief look at the applied code and I think the parameters are not added to the metadata in all cases, e.g. if getResource(Resource, String) is called, this calls getResourceInternal(String) which then calls getAbsoluteResourceInternal. In general I think we should add the parameters to the metadata in all places where we do a setResolutionPath or setResolutionPathInfo; but not in any other place, which I think there is one or two in the source.
          Hide
          tomek.rekawek Tomek Rękawek added a comment - - edited

          Thanks for the feedback!

          I had a brief look at the applied code and I think the parameters are not added to the metadata in all cases, e.g. if getResource(Resource, String) is called, this calls getResourceInternal(String) which then calls getAbsoluteResourceInternal.

          You are right.

          In general I think we should add the parameters to the metadata in all places where we do a setResolutionPath or setResolutionPathInfo; but not in any other place, which I think there is one or two in the source.

          Good idea, it'll make the metadata-related code more consistent. Please find the patch attached (SLING-848-metadata.patch).

          Show
          tomek.rekawek Tomek Rękawek added a comment - - edited Thanks for the feedback! I had a brief look at the applied code and I think the parameters are not added to the metadata in all cases, e.g. if getResource(Resource, String) is called, this calls getResourceInternal(String) which then calls getAbsoluteResourceInternal. You are right. In general I think we should add the parameters to the metadata in all places where we do a setResolutionPath or setResolutionPathInfo; but not in any other place, which I think there is one or two in the source. Good idea, it'll make the metadata-related code more consistent. Please find the patch attached ( SLING-848 -metadata.patch).
          Hide
          cziegeler Carsten Ziegeler added a comment -

          Tomek Rękawek Patch lgtm

          Show
          cziegeler Carsten Ziegeler added a comment - Tomek Rękawek Patch lgtm
          Hide
          tomek.rekawek Tomek Rękawek added a comment -

          I'll merge it as soon as I get my account working (now id.apache.org claims there is no such user as tomekr).

          Show
          tomek.rekawek Tomek Rękawek added a comment - I'll merge it as soon as I get my account working (now id.apache.org claims there is no such user as tomekr).
          Hide
          sseifert@pro-vision.de Stefan Seifert added a comment -

          please note that in rev. 1660759 i reverted part of your change to sling mock in rev. 1659489, 1658445:

          • revert updating sling api/resource resolver/jcr resource dependenciens - using sling mock in a project should not enforce using the latest version of those
          • keep adding of getThreadResourceResolver() method, but throw UnsupportedOperationException

          did you have a specific reason why you updated those dependencies?

          i had to remove the line

                  bundleContext.registerService(PathMapper.class.getName(), new PathMapper(), null);
          

          in the mock JCR resource resolver factory as well, is it important, or can it be added in projects using the latest sling api as well.

          if we need to do special support for the latest apis we may have to branch the sling mock code base, but i want to delay this as long as possible to avoid maintaining multiple branches.

          Show
          sseifert@pro-vision.de Stefan Seifert added a comment - please note that in rev. 1660759 i reverted part of your change to sling mock in rev. 1659489, 1658445: revert updating sling api/resource resolver/jcr resource dependenciens - using sling mock in a project should not enforce using the latest version of those keep adding of getThreadResourceResolver() method, but throw UnsupportedOperationException did you have a specific reason why you updated those dependencies? i had to remove the line bundleContext.registerService(PathMapper.class.getName(), new PathMapper(), null ); in the mock JCR resource resolver factory as well, is it important, or can it be added in projects using the latest sling api as well. if we need to do special support for the latest apis we may have to branch the sling mock code base, but i want to delay this as long as possible to avoid maintaining multiple branches.
          Hide
          tomek.rekawek Tomek Rękawek added a comment -

          Hello Stefan,

          I'm sorry that I did these changes in the *mock modules without consulting you. Please find my reasons below:

          1. This SLING-848 issue changes a few important modules: Sling API, Resource resolver and JCR resource provider.
          2. I created an integration test it-resource-versioning
          3. In this test I wanted to use sling-mock (as I need to mock the resource resolver)
          4. I updated the Sling API, Resource resolver and JCR resource bundles in the sling-mock as I needed to have the new features from SLING-848 enabled in the mocked resolver.
          5. I added the empty getThreadResourceResolver() as it was required by the new Sling API (it wasn't my change, but simply a thing that happened between Sling API 2.4.0 and 2.8.1-SNAPSHOT)
          6. I needed the PathMapper for the similar reason - JcrResourceProviderFactory started referencing it in version 2.4.2

          Now I see that I can solve point 4 and 5 without a need to modify the sling-mock, simply overriding versions of appropriate bundles in my it-resource-versioning/pom.xml, which I did in rev. 1660831. I still need the PathMapper to have the new JcrResourceProviderFactory working.

          I made a workaround in rev. 1660831, copying the MockJcrResourceResolverFactory to my testing module and adding PathMapper there. I think it's the only solution without updating jcr resource dependency in sling-mock to 2.4.2.

          Show
          tomek.rekawek Tomek Rękawek added a comment - Hello Stefan, I'm sorry that I did these changes in the *mock modules without consulting you. Please find my reasons below: This SLING-848 issue changes a few important modules: Sling API, Resource resolver and JCR resource provider. I created an integration test it-resource-versioning In this test I wanted to use sling-mock (as I need to mock the resource resolver) I updated the Sling API, Resource resolver and JCR resource bundles in the sling-mock as I needed to have the new features from SLING-848 enabled in the mocked resolver. I added the empty getThreadResourceResolver() as it was required by the new Sling API (it wasn't my change, but simply a thing that happened between Sling API 2.4.0 and 2.8.1-SNAPSHOT) I needed the PathMapper for the similar reason - JcrResourceProviderFactory started referencing it in version 2.4.2 Now I see that I can solve point 4 and 5 without a need to modify the sling-mock, simply overriding versions of appropriate bundles in my it-resource-versioning/pom.xml, which I did in rev. 1660831. I still need the PathMapper to have the new JcrResourceProviderFactory working. I made a workaround in rev. 1660831, copying the MockJcrResourceResolverFactory to my testing module and adding PathMapper there. I think it's the only solution without updating jcr resource dependency in sling-mock to 2.4.2.
          Hide
          sseifert@pro-vision.de Stefan Seifert added a comment -

          no problem, it was just not obvious to me that this ticket related the sling mock as well.

          please create a new sling ticket concerning sling mock to think about a better solution for point 6 without the code duplication, and reference it with this ticket. i will have to think about it. perhaps we can make MockJcrResourceResolverFactory ab bit more pluggable for this. i fear this is not the last time we will face such issues as the referenced/mocked bundles itself are further developed.

          Show
          sseifert@pro-vision.de Stefan Seifert added a comment - no problem, it was just not obvious to me that this ticket related the sling mock as well. please create a new sling ticket concerning sling mock to think about a better solution for point 6 without the code duplication, and reference it with this ticket. i will have to think about it. perhaps we can make MockJcrResourceResolverFactory ab bit more pluggable for this. i fear this is not the last time we will face such issues as the referenced/mocked bundles itself are further developed.
          Hide
          tomek.rekawek Tomek Rękawek added a comment -

          I created SLING-4437.

          Show
          tomek.rekawek Tomek Rękawek added a comment - I created SLING-4437 .
          Hide
          cziegeler Carsten Ziegeler added a comment -

          Tomek Rękawek Are there any http based tests for this?

          Show
          cziegeler Carsten Ziegeler added a comment - Tomek Rękawek Are there any http based tests for this?
          Hide
          tomek.rekawek Tomek Rękawek added a comment -

          I created a simple http test in the SLING-4804. It helped me discovering a bug, also fixed in the SLING-4804. The patch modifies the Sling Engine - it would be great if you can take a look on it.

          Show
          tomek.rekawek Tomek Rękawek added a comment - I created a simple http test in the SLING-4804 . It helped me discovering a bug, also fixed in the SLING-4804 . The patch modifies the Sling Engine - it would be great if you can take a look on it.
          Hide
          alexander.klimetschek Alexander Klimetschek added a comment - - edited

          Tomek Rękawek Carsten Ziegeler

          This change looks like it makes it impossible to read resources containing a ";" in their name via getResource(path). In case of the JCR resource provider, these can be created via the JCR API.

          This is because a path like /content/something;else will be parsed as /content/something in getResource(path) (within getResourceInternal()) and thus not found. I am using 1.5.22 of org.apache.sling.resourceresolver.

          This creates a problem for application code creating resources through another API (such as JCR) that is not aware of this semantic, or for existing content that includes ";" in resource names. We (Adobe AEM) have a lot of code that works like this and now fails if a ";" is included in the path:

          Node node = parent.addNode("something;else");
          Resource resource = resourceResolver.getResource(node.getPath());
          

          I wonder if all Resource creation APIs and utilities within Sling are also adapted consistently at this point?

          IIUC, this type of parameter parsing for URLs should only be done in resolve(), but not in getResource()?

          Show
          alexander.klimetschek Alexander Klimetschek added a comment - - edited Tomek Rękawek Carsten Ziegeler This change looks like it makes it impossible to read resources containing a ";" in their name via getResource(path). In case of the JCR resource provider, these can be created via the JCR API. This is because a path like /content/something;else will be parsed as /content/something in getResource(path) (within getResourceInternal()) and thus not found. I am using 1.5.22 of org.apache.sling.resourceresolver. This creates a problem for application code creating resources through another API (such as JCR) that is not aware of this semantic, or for existing content that includes ";" in resource names. We (Adobe AEM) have a lot of code that works like this and now fails if a ";" is included in the path: Node node = parent.addNode( "something; else " ); Resource resource = resourceResolver.getResource(node.getPath()); I wonder if all Resource creation APIs and utilities within Sling are also adapted consistently at this point? IIUC, this type of parameter parsing for URLs should only be done in resolve(), but not in getResource()?
          Hide
          reschke Julian Reschke added a comment -

          Out of curiosity: will percent-escaping the semicolon disable the special handling?

          Show
          reschke Julian Reschke added a comment - Out of curiosity: will percent-escaping the semicolon disable the special handling?
          Hide
          tomek.rekawek Tomek Rękawek added a comment -

          Alexander Klimetschek, according to the discussion in this JIRA, the semicolon parameters are supported both in the getResource() and resolve() methods (see ResourceVersioningTest.java).

          I can see 3 options here:

          • disable the parameters support in getResource(),
          • make the whole parameter support configurable,
          • make the getResource() parameters support configurable.
          Show
          tomek.rekawek Tomek Rękawek added a comment - Alexander Klimetschek , according to the discussion in this JIRA, the semicolon parameters are supported both in the getResource() and resolve() methods (see ResourceVersioningTest.java ). I can see 3 options here: disable the parameters support in getResource(), make the whole parameter support configurable, make the getResource() parameters support configurable.
          Hide
          cziegeler Carsten Ziegeler added a comment - - edited

          I don't think a configuration would work as this means its either on or off, so as soon as you have both use cases, you're in a dead end. But I would like to understand the problem space a little bit more.
          If your resource path is using a semicolon and you're using this in a url, you have to encode it - this has always been the case as the semicolon denotes uri parameters and these are stripped off by servlet engines like jetty. With the implementation of this issue, we just use the uri parameter to get the version, so we're using additional information.
          When implementing this additional support we decided to support the same syntax in getResource() to allow to get a version of a resource. Otherwise there would be no way to specify this.
          Now, if we revert something we did for this issue, we break other peoples code using exactly that feature.
          On the other hand, getResource() should not have magic, as it should use the path as is.
          Maybe there is any easy way out, the paths we care for are of this sort:
          /content/test;v='1.0'
          so there is a name and a value.
          Based on that , we could change getResource into:

          • no semicolon : use path as is
          • semicolon but no equals sign in uri parameter, use path as is
          • semicolon and equals sign, use as (version) parameter

          While this is maybe not the easiest to understand, it will implement both use cases without a configuation.
          And as the quick check is checking for a semicolon, there shouldn't be a performance penality

          And potentially we should implement the same logic in resolve()

          Show
          cziegeler Carsten Ziegeler added a comment - - edited I don't think a configuration would work as this means its either on or off, so as soon as you have both use cases, you're in a dead end. But I would like to understand the problem space a little bit more. If your resource path is using a semicolon and you're using this in a url, you have to encode it - this has always been the case as the semicolon denotes uri parameters and these are stripped off by servlet engines like jetty. With the implementation of this issue, we just use the uri parameter to get the version, so we're using additional information. When implementing this additional support we decided to support the same syntax in getResource() to allow to get a version of a resource. Otherwise there would be no way to specify this. Now, if we revert something we did for this issue, we break other peoples code using exactly that feature. On the other hand, getResource() should not have magic, as it should use the path as is. Maybe there is any easy way out, the paths we care for are of this sort: /content/test;v='1.0' so there is a name and a value. Based on that , we could change getResource into: no semicolon : use path as is semicolon but no equals sign in uri parameter, use path as is semicolon and equals sign, use as (version) parameter While this is maybe not the easiest to understand, it will implement both use cases without a configuation. And as the quick check is checking for a semicolon, there shouldn't be a performance penality And potentially we should implement the same logic in resolve()
          Hide
          tomek.rekawek Tomek Rękawek added a comment -

          I like the equals sign idea.

          Actually, it works like this already Take a look on the PathParserTest.

          Show
          tomek.rekawek Tomek Rękawek added a comment - I like the equals sign idea. Actually, it works like this already Take a look on the PathParserTest .
          Hide
          reschke Julian Reschke added a comment -

          Making the semantics of ";" depend on whether it is followed by an equals sign IMHO is a hack. It just makes what happens even more obscure.

          Show
          reschke Julian Reschke added a comment - Making the semantics of ";" depend on whether it is followed by an equals sign IMHO is a hack. It just makes what happens even more obscure.
          Hide
          tomek.rekawek Tomek Rękawek added a comment -

          Making the semantics of ";" depend on whether it is followed by an equals sign IMHO is a hack. It just makes what happens even more obscure.

          I'm not sure about this. The URI parameter format is defined by the RFC. If an URI contains something that doesn't match the specification, it shouldn't be treated as a parameter.

          Show
          tomek.rekawek Tomek Rękawek added a comment - Making the semantics of ";" depend on whether it is followed by an equals sign IMHO is a hack. It just makes what happens even more obscure. I'm not sure about this. The URI parameter format is defined by the RFC. If an URI contains something that doesn't match the specification, it shouldn't be treated as a parameter.
          Hide
          reschke Julian Reschke added a comment -

          The URI parameter format is defined by the RFC.

          Source, please. All I can see in RFC 3986 is that ";" is in the reserved set.

          Show
          reschke Julian Reschke added a comment - The URI parameter format is defined by the RFC. Source, please. All I can see in RFC 3986 is that ";" is in the reserved set.
          Hide
          cziegeler Carsten Ziegeler added a comment -

          Julian Reschke While I agree that the proposal is not the nicest solution (I wrote that above already), we have to be a little bit pragmatic here and not break people's code again.
          If there is a better alternative, I'm all ears.

          Show
          cziegeler Carsten Ziegeler added a comment - Julian Reschke While I agree that the proposal is not the nicest solution (I wrote that above already), we have to be a little bit pragmatic here and not break people's code again. If there is a better alternative, I'm all ears.
          Hide
          tomek.rekawek Tomek Rękawek added a comment -

          Julian Reschke

          Source, please. All I can see in RFC 3986 is that ";" is in the reserved set.

          Sorry, you're right, RFC doesn't explicitly define this form of ;key=val parameters, it's up to the application developers. Nevertheless, we've defined such format in this JIRA.

          we have to be a little bit pragmatic here and not break people's code again.

          Please notice that the parameters have worked in this way (looking for the equals sign or passing the path as-is if it can't be found) for two years now, since this issue has been resolved.

          I think that the Alex'es example (parent.addNode("something;else")) is not exact - it should include the equals sign somewhere. Alexander Klimetschek - could you confirm if the example you've provided is really a way to reproduce this issue?

          Show
          tomek.rekawek Tomek Rękawek added a comment - Julian Reschke Source, please. All I can see in RFC 3986 is that ";" is in the reserved set. Sorry, you're right, RFC doesn't explicitly define this form of ;key=val parameters, it's up to the application developers. Nevertheless, we've defined such format in this JIRA. we have to be a little bit pragmatic here and not break people's code again. Please notice that the parameters have worked in this way (looking for the equals sign or passing the path as-is if it can't be found) for two years now, since this issue has been resolved. I think that the Alex'es example ( parent.addNode("something;else") ) is not exact - it should include the equals sign somewhere. Alexander Klimetschek - could you confirm if the example you've provided is really a way to reproduce this issue?
          Hide
          alexander.klimetschek Alexander Klimetschek added a comment - - edited

          The original filename that triggered this issue is actually _qwertyuiopasdfghjklzxcvbnm,.;'`1234567890-=~!@$^&(1)_+'.jpg. It does have an equals sign later in the string.

          I could then also reproduce it with something;one=two.

          It is not reproduceable with something;else.

          Agree with Carsten, OSGi configuration is not solving the issue. The Resource API is typically used by many different application parts that wouldn't be able to decide on a common configuration value.

          Two ideas from my side:

          1.
          It seems to me only resolve() needs to handle these parameters with the path/URL, as it is designed for passing through the raw URL path from a servlet for example. A client using getResource() OTOH is programmatic and "knows" what it's doing and could pass the parameters explicitly in a map, as proposed early above in this issue by having an extra getResource(String path, String versionName) or more generic getResource(String path, Map parameters).

          The only downside I can see is that adding a new method to the ResourceResolver for this purpose might not be desirable to keep the interface simple to understand for people new to Sling (and the resolve vs. getResource difference is already a small trap). Furthermore, code that retrieves versioned resources and relies on getResource() (can someone point me to an example?) would have to switch - either to resolve() or the new getResource(String, Map). This raises the point of backwards compatibility, but I think it makes sense to address the first regression (altering of the valid namespace).

          And one problem we'd have to test: once this resource has been created, it might be part of a URL and run as is through resolve(), and that should work as well. Not sure if it would do...

          2.
          Maybe simply try to retrieve the resource with the full path if the cut off path (if parameters were found) doesn't return a resource.

          Not sure if that creates issues with different permissions and the cut off path is not found due to missing read permission. A getOrCreate style method might run into this, but likely would not be able to create the full path variant then.

          Show
          alexander.klimetschek Alexander Klimetschek added a comment - - edited The original filename that triggered this issue is actually _qwertyuiopasdfghjklzxcvbnm,.;'`1234567890-=~!@$^&(1)_+'.jpg . It does have an equals sign later in the string. I could then also reproduce it with something;one=two . It is not reproduceable with something;else . Agree with Carsten, OSGi configuration is not solving the issue. The Resource API is typically used by many different application parts that wouldn't be able to decide on a common configuration value. Two ideas from my side: 1. It seems to me only resolve() needs to handle these parameters with the path/URL, as it is designed for passing through the raw URL path from a servlet for example. A client using getResource() OTOH is programmatic and "knows" what it's doing and could pass the parameters explicitly in a map, as proposed early above in this issue by having an extra getResource(String path, String versionName) or more generic getResource(String path, Map parameters) . The only downside I can see is that adding a new method to the ResourceResolver for this purpose might not be desirable to keep the interface simple to understand for people new to Sling (and the resolve vs. getResource difference is already a small trap). Furthermore, code that retrieves versioned resources and relies on getResource() (can someone point me to an example?) would have to switch - either to resolve() or the new getResource(String, Map) . This raises the point of backwards compatibility, but I think it makes sense to address the first regression (altering of the valid namespace). And one problem we'd have to test: once this resource has been created, it might be part of a URL and run as is through resolve() , and that should work as well. Not sure if it would do... 2. Maybe simply try to retrieve the resource with the full path if the cut off path (if parameters were found) doesn't return a resource. Not sure if that creates issues with different permissions and the cut off path is not found due to missing read permission. A getOrCreate style method might run into this, but likely would not be able to create the full path variant then.
          Hide
          cziegeler Carsten Ziegeler added a comment -

          I personally think it is very unlike that one really has a node in the repository which contains both a semicolon and an equals sign..this sounds like a really terrible name to me. Looking at the quoted example, this really looks like a test string. So I'm wondering how likely this is to happen in real world.

          A new method just for the use case to pass in the version info has been discarded before and as you note, adding it now, doesn't really help as it will break current usage.

          I was thinking about your second solution as well, not sure if it is a good idea.

          But before we continue, can someone verify, that using a semicolon and equals sign in a url really works?

          Show
          cziegeler Carsten Ziegeler added a comment - I personally think it is very unlike that one really has a node in the repository which contains both a semicolon and an equals sign..this sounds like a really terrible name to me. Looking at the quoted example, this really looks like a test string. So I'm wondering how likely this is to happen in real world. A new method just for the use case to pass in the version info has been discarded before and as you note, adding it now, doesn't really help as it will break current usage. I was thinking about your second solution as well, not sure if it is a good idea. But before we continue, can someone verify, that using a semicolon and equals sign in a url really works?
          Hide
          fielding Roy T. Fielding added a comment -

          Note that the semicolon-delimited syntax for path segment parameters was defined in RFC1808 because I wanted a reserved syntax to support versions. These parameters are supposed to be supplied at the end of a path segment, not arbitrarily in the middle. They are not specifically defined in RFC3986 because they are not needed for the generic syntax (scheme-independent parsing), but semicolons are reserved for this purpose. In other words, a path that contains a semicolon is supposed to be percent-encoded if it is not being used as a parameter delimiter. An "=" is not necessary (e.g., VMS files used a name;NN syntax).

          Of course, that is not going to help us for existing deployments if we are not automatically percent-encoding pathnames containing ";". We should have been doing that all along.

          I think the problem here is that the resolver is assuming that ";" indicates path parameters and then removes them from the path resolution. What it should be doing is either manage the segment names properly (meaning that a non-encoded ";" is impossible) or be smarter about matching the resource resolution so that ";" is allowable in unencoded form. IOW, what Alex said above.

          Also, filename extensions do not need to be at the end of path segments, and I certainly hope we aren't using single-quotes around parameter values because that isn't supported by any syntax.

          Show
          fielding Roy T. Fielding added a comment - Note that the semicolon-delimited syntax for path segment parameters was defined in RFC1808 because I wanted a reserved syntax to support versions. These parameters are supposed to be supplied at the end of a path segment, not arbitrarily in the middle. They are not specifically defined in RFC3986 because they are not needed for the generic syntax (scheme-independent parsing), but semicolons are reserved for this purpose. In other words, a path that contains a semicolon is supposed to be percent-encoded if it is not being used as a parameter delimiter. An "=" is not necessary (e.g., VMS files used a name;NN syntax). Of course, that is not going to help us for existing deployments if we are not automatically percent-encoding pathnames containing ";". We should have been doing that all along. I think the problem here is that the resolver is assuming that ";" indicates path parameters and then removes them from the path resolution. What it should be doing is either manage the segment names properly (meaning that a non-encoded ";" is impossible) or be smarter about matching the resource resolution so that ";" is allowable in unencoded form. IOW, what Alex said above. Also, filename extensions do not need to be at the end of path segments, and I certainly hope we aren't using single-quotes around parameter values because that isn't supported by any syntax.
          Hide
          tomek.rekawek Tomek Rękawek added a comment -

          The current state is as follows:

          1. Parameters has to be put at the end of the path

          Valid:

          /content/mysite/page;foo=bar
          

          Invalid:

          /content/mysite;foo=bar/page
          

          2. Parameters can be put before or after extension

          /content/mysite/page;foo=bar.html
          /content/mysite/page.html;foo=bar
          

          3. Single-quotes can be used to escape values

          /content/mysite/page;foo='bar'
          /content/mysite/page;foo=bar
          

          4. Single-quotes has to be used to escape values containing dots, if the parameters are specified before extension
          Valid:

          /content/mysite/page.html;v=1.0
          /content/mysite/page;v='1.0'.html
          

          Invalid:

          /content/mysite/page.html;v=1.0.html
          

          If I understand correctly, the rule 1 is OK, but 2, 3 and 4 controversial and Roy T. Fielding suggestion is to get rid of them. In other words:

          • parameters can only be put after extension,
          • single-quotes can't be used to escape the parameter value,
          • the path should be resolved in a "smart" way, so if there's a node matching the path (even containing parameters), it should be used directly.

          Is this right?

          Show
          tomek.rekawek Tomek Rękawek added a comment - The current state is as follows: 1. Parameters has to be put at the end of the path Valid: /content/mysite/page;foo=bar Invalid: /content/mysite;foo=bar/page 2. Parameters can be put before or after extension /content/mysite/page;foo=bar.html /content/mysite/page.html;foo=bar 3. Single-quotes can be used to escape values /content/mysite/page;foo='bar' /content/mysite/page;foo=bar 4. Single-quotes has to be used to escape values containing dots, if the parameters are specified before extension Valid: /content/mysite/page.html;v=1.0 /content/mysite/page;v='1.0'.html Invalid: /content/mysite/page.html;v=1.0.html If I understand correctly, the rule 1 is OK, but 2, 3 and 4 controversial and Roy T. Fielding suggestion is to get rid of them. In other words: parameters can only be put after extension, single-quotes can't be used to escape the parameter value, the path should be resolved in a "smart" way, so if there's a node matching the path (even containing parameters), it should be used directly. Is this right?
          Hide
          fielding Roy T. Fielding added a comment -

          To hopefully clarify, a path segment is each part between slash characters. I don't know if Sling has any use for parameters on hierarchical segments (e.g., versioned folders), but it would be odd for Sling to disallow them on parts of the URI it might not control.

          IOW, rule 1 might be OK but is overly restrictive for segments we don't own; 2 is just plain weird (if that is desired, use selectors instead of parameters); and both 3 and 4 are horribly ugly and unlikely to interop because many reference parsers will stop on single-quotes.

          Instead of single-quotes, it is better to append parameters and encode specific characters that we think might confuse data with delimiters. In general, URIs are meant to be LR parsed without any look-ahead for matching brackets/parens/quotes.

          Show
          fielding Roy T. Fielding added a comment - To hopefully clarify, a path segment is each part between slash characters. I don't know if Sling has any use for parameters on hierarchical segments (e.g., versioned folders), but it would be odd for Sling to disallow them on parts of the URI it might not control. IOW, rule 1 might be OK but is overly restrictive for segments we don't own; 2 is just plain weird (if that is desired, use selectors instead of parameters); and both 3 and 4 are horribly ugly and unlikely to interop because many reference parsers will stop on single-quotes. Instead of single-quotes, it is better to append parameters and encode specific characters that we think might confuse data with delimiters. In general, URIs are meant to be LR parsed without any look-ahead for matching brackets/parens/quotes.
          Hide
          alexander.klimetschek Alexander Klimetschek added a comment - - edited

          Carsten Ziegeler

          Looking at the quoted example, this really looks like a test string. So I'm wondering how likely this is to happen in real world.


          Yes, it's a test string, and I agree it should be pretty rare.

          However, my experience tells me that especially at such a fundamental layer as the resource API, rare or not rare is not something you can decide as the framework provider and it might even change over time. What if someone actually stores names with such parameters deliberately as resources for some meta-reason? There is also no other restriction that the resource API imposes on names (other than slashes as path delimiters) as far as I know, and this case seems not strong enough to start imposing a limit after the API has existed for ~7 years (before this change). Hence I think these names should be safely supported.

          Furthermore, it isn't even clearly documented anywhere, and as the discussion shows, we don't have an exact description of the format and what an "invalid" resource path would be, assuming nothing is changed.

          I would look at option 2 and see if that works (after the general questions are answered).

          Although I think it's still a problem of mixing URLs (where these parameters come in) and the resource namespace, which came in by sharing the same implementation (getResourceInternal) between resolve() and getResource(), while it should probably be separated, so the parameter parsing happens for resolve() only and other means given to retrieve things with parameter if you are a client using getResource() semantics.

          Show
          alexander.klimetschek Alexander Klimetschek added a comment - - edited Carsten Ziegeler Looking at the quoted example, this really looks like a test string. So I'm wondering how likely this is to happen in real world. Yes, it's a test string, and I agree it should be pretty rare. However , my experience tells me that especially at such a fundamental layer as the resource API, rare or not rare is not something you can decide as the framework provider and it might even change over time. What if someone actually stores names with such parameters deliberately as resources for some meta-reason? There is also no other restriction that the resource API imposes on names (other than slashes as path delimiters) as far as I know, and this case seems not strong enough to start imposing a limit after the API has existed for ~7 years (before this change). Hence I think these names should be safely supported. Furthermore, it isn't even clearly documented anywhere, and as the discussion shows, we don't have an exact description of the format and what an "invalid" resource path would be, assuming nothing is changed. I would look at option 2 and see if that works (after the general questions are answered). Although I think it's still a problem of mixing URLs (where these parameters come in) and the resource namespace, which came in by sharing the same implementation (getResourceInternal) between resolve() and getResource(), while it should probably be separated, so the parameter parsing happens for resolve() only and other means given to retrieve things with parameter if you are a client using getResource() semantics.

            People

            • Assignee:
              tomek.rekawek Tomek Rękawek
              Reporter:
              cziegeler Carsten Ziegeler
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development