Sling / SLING-249

Allow mapping nodes to internet domains

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: JCR Resource 2.0.2
    • Fix Version/s: JCR Resource 2.0.4
    • Component/s: JCR
    • Labels:
      None

      Description

      Sling should support hosting multiple domains, with different JCR roots.
      E.g.:
      http://www.domain1.com could map to /content/domain1.com
      http://www.domain2.com could map to /content/domain2.com

      While developing a website, the fully qualified domain might not be available. Ideally, the mapping could be configured in a flexible way. One option would be to maintain a set of regular expressions to match against URLs. Each regexp would then match to a path in the JCR.
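
A minimal sketch of the regexp idea, assuming an ordered list of host patterns where the first match wins. The class and method names are hypothetical and not part of Sling; the `localhost` rule only illustrates the development case where the real domain is not available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical sketch: ordered host regexps, each pointing at a JCR root path. */
final class RegexDomainMapping {
    private static final class Rule {
        final Pattern host;
        final String jcrRoot;

        Rule(Pattern host, String jcrRoot) {
            this.host = host;
            this.jcrRoot = jcrRoot;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    void add(String hostRegex, String jcrRoot) {
        rules.add(new Rule(Pattern.compile(hostRegex), jcrRoot));
    }

    /** Returns the JCR path for the host and URL path, using the first rule whose regexp matches the host. */
    Optional<String> resolve(String host, String urlPath) {
        for (Rule rule : rules) {
            if (rule.host.matcher(host).matches()) {
                return Optional.of(rule.jcrRoot + urlPath);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        RegexDomainMapping mapping = new RegexDomainMapping();
        mapping.add("www\\.domain1\\.com", "/content/domain1.com");
        mapping.add("www\\.domain2\\.com", "/content/domain2.com");
        // Development host standing in for domain1 (an assumption for illustration)
        mapping.add("localhost(:\\d+)?", "/content/domain1.com");
        System.out.println(mapping.resolve("www.domain2.com", "/home.html").orElse("no mapping"));
        System.out.println(mapping.resolve("localhost:8080", "/home.html").orElse("no mapping"));
    }
}
```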

          Activity

          Felix Meschberger added a comment -

          .... and when implementing we should not forget to also implement the backwards mapping to cut off any inserted path part in the ResourceResolver.map(String path) implementation.

          Bertrand Delacretaz added a comment -

          Can't you use mod_proxy for that, or do you have a use case that wouldn't be covered by mod_proxy?

          Tobias Bocanegra added a comment -

          Would it be sufficient to 'calculate' the mapping information per request and use this mapping for all subsequent resolves?

          E.g., for a request to http://www.domain1.com/home.html

          the following mapping table could be created:

          / -> /content/www_domain1_com/
          apps -> /apps/www_domain1_com
          apps -> /apps
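
A rough sketch of how such a per-request table could be derived from the host name, assuming the dots in the host are simply replaced by underscores as in the rows above. The class name and the `Row` type are hypothetical; a host with a port number is not handled here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: builds the mapping table for one request from its host name. */
final class RequestMappingTable {
    /** One row: a request path prefix and the repository path it is rewritten to. */
    static final class Row {
        final String from;
        final String to;

        Row(String from, String to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return from + " -> " + to;
        }
    }

    static List<Row> forHost(String host) {
        String key = host.replace('.', '_'); // www.domain1.com becomes www_domain1_com
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("/", "/content/" + key + "/"));
        rows.add(new Row("apps", "/apps/" + key)); // host-specific scripts are tried first
        rows.add(new Row("apps", "/apps"));        // shared scripts as the fallback
        return rows;
    }

    public static void main(String[] args) {
        for (Row row : forHost("www.domain1.com")) {
            System.out.println(row);
        }
    }
}
```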

          Vidar S. Ramdal added a comment -

          I suppose mod_proxy would let me do this, but it requires yet another server. Having the functionality in Sling would be much more elegant.

          Vidar S. Ramdal added a comment -

          @Tobias Bocanegra:
          Yes, such a mapping looks good, but it still needs to be more flexible.
          E.g., you would want http://www.domain1.com and http://domain1.com to resolve to the same JCR node.

          eric marts added a comment -

          This should be part of the request processing logic, IMHO.

          Vidar S. Ramdal added a comment -

          What are the odds of this making the 1.0 release?

          Bertrand Delacretaz added a comment -

          I'm not planning to work on this, but if someone provides an implementation that has no negative impact on existing features (and includes automated tests), I'd be happy to integrate it.

          Felix Meschberger added a comment -

          I am now taking this issue up after it has been lying around for too long ...

          And here is my proposal:

          The JcrResourceResolver is extended to support path mappings considering the Host request header.

          • Configuration is provided to map the Host header to a resource prefix.
          • Additional configuration is added to define a default mapping.
          • The default configuration ignores the Host headers and uses the request URL unmodified for further mappings.

          In contrast to other mappings, which take only the request URL into consideration, this configuration also takes a request header into consideration. Therefore this solution will only have an effect on the implementation of the ResourceResolver.resolve(HttpServletRequest) method. The implementation of ResourceResolver.resolve(String) is not affected by this extension.

          Furthermore, the functionality of the ResourceResolver.map(String) method is not modified: this addition has no influence on that method, which continues to return an (absolute) resource path. Instead we add a new API method ResourceResolver.map(HttpServletRequest, String), which returns a URL constructed from the request and resource path as follows:

          1. The resource path is first mapped by calling the ResourceResolver.map(String) method
          2. The resulting path is then applied to the reverse virtual host mapping.
          3. A URL is returned, composed as follows:

          • Scheme from the request
          • Host from virtual host mapping, defaulting to Host header from request
          • Path consisting of context path and mapped resource path
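
The three steps can be sketched with plain strings, assuming a single virtual host mapping (one resource prefix and one host) and a path that has already been through ResourceResolver.map(String). The class, method and parameter names are hypothetical; this is not the Sling implementation.

```java
/** Hypothetical sketch of the URL composition described in steps 1 to 3. */
final class MappedUrlSketch {
    /**
     * @param scheme      scheme taken from the request
     * @param requestHost value of the Host header of the request
     * @param contextPath servlet context path, possibly empty
     * @param vhostPrefix resource prefix of the virtual host mapping, or null if none is configured
     * @param vhostHost   host of the virtual host mapping, or null to fall back to the Host header
     * @param mappedPath  result of step 1, i.e. the path returned by map(String)
     */
    static String buildUrl(String scheme, String requestHost, String contextPath,
                           String vhostPrefix, String vhostHost, String mappedPath) {
        String host = requestHost;
        String path = mappedPath;
        // Step 2: reverse virtual host mapping, cutting off the prefix if the path lies below it
        boolean belowPrefix = vhostPrefix != null
                && (mappedPath.equals(vhostPrefix) || mappedPath.startsWith(vhostPrefix + "/"));
        if (belowPrefix) {
            path = mappedPath.substring(vhostPrefix.length());
            if (path.isEmpty()) {
                path = "/";
            }
            if (vhostHost != null) {
                host = vhostHost;
            }
        }
        // Step 3: scheme, host, then context path plus mapped resource path
        return scheme + "://" + host + contextPath + path;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http", "localhost:8080", "",
                "/content/domain1.com", "www.domain1.com", "/content/domain1.com/home.html"));
    }
}
```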

          For consistency with the new map(HttpServletRequest, String) method (and also to provide more flexibility for resource resolution) the resolve(HttpServletRequest) method is deprecated in favor of a new resolve(HttpServletRequest, String) method. The deprecated method is defined such that implementations must call resolve(HttpServletRequest, String) where the string argument is the HttpServletRequest.getPathInfo() value.
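
The delegation rule for the deprecated method could look roughly like this. `RequestLike` is a hypothetical stand-in for HttpServletRequest so that the sketch compiles without the servlet API, and the default method is only a compact way to show the contract, not how the Sling API declares it.

```java
/** Hypothetical stand-in for HttpServletRequest, reduced to the one method used here. */
interface RequestLike {
    String getPathInfo();
}

/** Hypothetical sketch of the two resolve variants; resources are plain strings here. */
interface ResolverSketch {
    String resolve(RequestLike request, String path);

    /** Deprecated variant: must behave like resolve(request, request.getPathInfo()). */
    @Deprecated
    default String resolve(RequestLike request) {
        return resolve(request, request.getPathInfo());
    }
}

final class DeprecatedResolveDemo {
    public static void main(String[] args) {
        RequestLike request = () -> "/home.html";
        ResolverSketch resolver = (req, path) -> "resource at " + path;
        System.out.println(resolver.resolve(request));
    }
}
```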

          Vidar S. Ramdal added a comment -

          Great that you're taking this task!

          What about resources that should be shared among domains?
          Let's say "domain root nodes" are stored under /content/domain1.com, /content/domain2.com etc.
          There might be some scripts, CSS, graphics etc that we want to be available for all hosts, let's say they're stored under /shared. When rendering HTML for the root node of http://domain1.com, we want to link to /shared/style.css. This means the web browser sends a request to the server, including the Host header - which will cause Sling to look for /content/domain1.com/shared/style.css - which does not exist.

          Felix Meschberger added a comment -

          Good point. How about this: First look for the resource with the virtual host mapping applied. If not found, look for the resource without the virtual host mapping applied.
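
A small sketch of that two-step lookup, assuming the repository is represented by a set of existing paths. The names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch: try the host-specific path first, then the unmapped path. */
final class FallbackLookup {
    static Optional<String> find(Set<String> existingPaths, String hostPrefix, String requestPath) {
        String mapped = hostPrefix + requestPath;
        if (existingPaths.contains(mapped)) {
            return Optional.of(mapped);      // found with the virtual host mapping applied
        }
        if (existingPaths.contains(requestPath)) {
            return Optional.of(requestPath); // shared resource, found without the mapping
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> repository = new HashSet<>(Arrays.asList(
                "/content/domain1.com/index.html", "/shared/style.css"));
        System.out.println(find(repository, "/content/domain1.com", "/shared/style.css").orElse("not found"));
    }
}
```

One consequence of this order is that a host-specific resource shadows a shared one with the same path.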

          Vidar S. Ramdal added a comment -

          That will work.

          Another thing: At some point, you'll probably want multiple domains sharing the same content root. As mentioned in comment 12569799 above, you'll want "www.domain1.com" and "domain1.com" to refer to the same content. There might be other cases too, where the two domain names are lexically unrelated, but should refer to the same content.

          So instead of just relying on a simple mapping, how about introducing a nodetype called "domainroot":
          [sling:domainRoot]
          mixin

          • sling:domains (string) multiple

          Then a domain root would be resolved by doing a query like //*[contains(sling:domains, request.getHeader("Host"))]
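
An in-memory equivalent of that lookup, as a sketch only: each domain root carries its multi-valued sling:domains list, and the host is compared against every value. The class is hypothetical, the JCR query itself is not executed, and the host is assumed to carry no port number.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: finds the domain root whose sling:domains values contain the host. */
final class DomainRootLookup {
    /** Path of a domain root mapped to the values of its sling:domains property. */
    private final Map<String, List<String>> domainsByRoot = new LinkedHashMap<>();

    void addDomainRoot(String path, String... domains) {
        domainsByRoot.put(path, Arrays.asList(domains));
    }

    Optional<String> rootFor(String host) {
        for (Map.Entry<String, List<String>> entry : domainsByRoot.entrySet()) {
            for (String domain : entry.getValue()) {
                if (domain.equalsIgnoreCase(host)) { // host names are case-insensitive
                    return Optional.of(entry.getKey());
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        DomainRootLookup lookup = new DomainRootLookup();
        lookup.addDomainRoot("/content/domain1", "www.domain1.com", "domain1.com", "example.org");
        System.out.println(lookup.rootFor("example.org").orElse("no domain root"));
    }
}
```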

          Felix Meschberger added a comment -

          My current approach is to add a mapping configuration like for the existing URL mapping.

          The basic form would be:

          www.domain1.com-/domain1
          otherdomain.com-/other

          This would map www.domain1.com to /domain1 and otherdomain.com to /other.

          We could also assume these mappings to be regular expressions, e.g.

          (www.)?domain1.com-/domain1
          otherdomain.com-/other

          which would map both www.domain1.com and domain1.com to /domain1

          This would of course present some reverse mapping issues in that it is unclear how to reverse map a resource /domain1/statics/site.css. So we would have

          (www.)?domain1.com>/domain1
          www.domain1.com</domain1
          otherdomain.com-/other

          Thus splitting incoming and outgoing mapping.

          Of course another approach without using regular expressions would be to use multiple entries mapping to the same root path:

          www.domain1.com-/domain1
          domain1.com-/domain1
          otherdomain.com-/other

          Here resolution of incoming requests would still resolve domain1.com and www.domain1.com to /domain1. But reverse mapping would map /domain1/statics/site.css to www.domain1.com/statics/site.css since the mappings are applied in a first-match-applies approach.

          WDYT ?
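
To make the three entry forms concrete, here is a sketch of a parser and a first-match resolver. It assumes that "-" means both directions, ">" incoming only and "<" outgoing only, and that the separator is the first of these characters directly followed by "/" (host names may themselves contain "-"). All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical sketch of the host mapping entries discussed above. */
final class HostMappings {
    private static final class Entry {
        final String host;      // host name; treated as a regexp when resolving incoming requests
        final String root;      // resource root path, e.g. /domain1
        final boolean incoming; // usable for resolving requests
        final boolean outgoing; // usable for reverse mapping

        Entry(String host, String root, boolean incoming, boolean outgoing) {
            this.host = host;
            this.root = root;
            this.incoming = incoming;
            this.outgoing = outgoing;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Parses "host-/root", "host>/root" or "host</root". */
    void add(String line) {
        int sep = -1;
        for (int i = 0; i + 1 < line.length(); i++) {
            char c = line.charAt(i);
            if ((c == '-' || c == '>' || c == '<') && line.charAt(i + 1) == '/') {
                sep = i;
                break;
            }
        }
        if (sep < 0) {
            throw new IllegalArgumentException("Not a mapping entry: " + line);
        }
        char kind = line.charAt(sep);
        entries.add(new Entry(line.substring(0, sep), line.substring(sep + 1), kind != '<', kind != '>'));
    }

    /** Incoming: the first entry whose host regexp matches decides the resource path. */
    Optional<String> resolve(String host, String path) {
        for (Entry e : entries) {
            if (e.incoming && Pattern.matches(e.host, host)) {
                return Optional.of(e.root + path);
            }
        }
        return Optional.empty();
    }

    /** Outgoing: the first entry whose root is a prefix of the resource path decides the host. */
    Optional<String> map(String resourcePath) {
        for (Entry e : entries) {
            if (e.outgoing && (resourcePath.equals(e.root) || resourcePath.startsWith(e.root + "/"))) {
                return Optional.of(e.host + resourcePath.substring(e.root.length()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        HostMappings mappings = new HostMappings();
        mappings.add("(www.)?domain1.com>/domain1");
        mappings.add("www.domain1.com</domain1");
        mappings.add("otherdomain.com-/other");
        System.out.println(mappings.resolve("domain1.com", "/index.html").orElse("no mapping"));
        System.out.println(mappings.map("/domain1/statics/site.css").orElse("no mapping"));
    }
}
```

With the non-regexp variant (two "-" entries for /domain1), the same map method returns the host of whichever entry is listed first, which is the first-match behaviour described above.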

          Bertrand Delacretaz added a comment -

          > how about introducing a nodetype called "domainroot":
          > [sling:domainRoot]
          > mixin
          > - sling:domains (string) multiple
          > Then a domain root would be resolved by doing a query like //*[contains(sling:domains, request.getHeader("Host"))]

          I like this idea, it is similar to how we treat the vanityUrl stuff - and as the domain mappings have to do with specific content trees, it makes perfect sense to me to have this info in the content as opposed to OSGi configs.

          It might seem more costly than a regexp-based configuration, but this domainRoot info is easy to observe and cache: the mapper does a query on sling:domainRoot when it starts, disables itself if not found, and observes that node type for future updates.

          Alexander Klimetschek added a comment -

          > how about introducing a nodetype called "domainroot":
          > [sling:domainRoot]

          +1

          Vidar S. Ramdal added a comment -

          @Felix
          I hadn't thought of reverse mapping, and I'm not sure anymore if I see a case where regexps would be really useful. So I would prefer the multi-domain-to-one-node approach.

          Vidar S. Ramdal added a comment -

          @Bertrand

          I'm not sure about caching domainRoots. At least in our case, we need to add domains at run time. So at least we need a simple way to re-scan the domain info.

          Thinking about it, a query like //*[contains(sling:domains, request.getHeader("Host"))] allows domain roots to be located all over the tree, at any level. We might not want domain roots located under other domain roots. Maybe we should specify that domain roots must be located directly under jcr:root (or another specified location) - in which case the query should be something like /*[contains(sling:domains, request.getHeader("Host"))]

          Bertrand Delacretaz added a comment -

          > I'm not sure about caching domainRoots. At least in our case, we need to add domains run-time.
          > So at least we need a simple way to re-scan the domain info.

          I think that would work with observation, the DomainRootMapperService would:

          1. Run your suggested query when it starts, to find out about all existing sling:domains nodes
          2. Build its mapping table from that info
          3. Observe the repository for any changes to sling:domains nodes
          4. When such changes occur, update the mapping table

          In this way changes are (almost) immediately taken into account, but you only run the query at startup; running it on every request might be expensive. And sling:domains nodes can be anywhere in the repository.
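
The four steps could be sketched as follows, without the JCR API: the startup query result and the observation events are passed in as plain method calls. The class and its methods are hypothetical, and a change is applied as "remove old hosts, add new ones", which is not atomic for concurrent readers.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a mapping table kept current by observation events. */
final class DomainRootTable {
    private final Map<String, String> hostToRoot = new ConcurrentHashMap<>();

    /** Steps 1 and 2: called once at startup with the query result (root path to its sling:domains values). */
    void buildFrom(Map<String, List<String>> queryResult) {
        queryResult.forEach(this::onChanged);
    }

    /** Steps 3 and 4: called by an observation listener when a domain root is added or modified. */
    void onChanged(String rootPath, List<String> domains) {
        onRemoved(rootPath); // drop hosts that no longer point at this root
        for (String domain : domains) {
            hostToRoot.put(domain, rootPath);
        }
    }

    /** Called by an observation listener when a domain root is removed. */
    void onRemoved(String rootPath) {
        hostToRoot.values().removeIf(rootPath::equals);
    }

    /** Per-request lookup: a map access, no query. Returns null if the host is unknown. */
    String rootFor(String host) {
        return hostToRoot.get(host);
    }

    public static void main(String[] args) {
        DomainRootTable table = new DomainRootTable();
        Map<String, List<String>> startup = new HashMap<>();
        startup.put("/content/domain1", Arrays.asList("www.domain1.com", "domain1.com"));
        table.buildFrom(startup);
        table.onChanged("/content/domain2", Arrays.asList("www.domain2.com")); // domain added at run time
        System.out.println(table.rootFor("www.domain2.com"));
    }
}
```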

          Felix Meschberger added a comment -

          Caching and cleanup:
          Yes, we need a cache, and we use JCR Observation to manage it (that is, simply clear it on changes).

          Queries:
          We will probably not use a query, because the current query implementation is optimized for full text searches, which is not exactly what we would need to do. Therefore these queries will be expensive. Instead, the resource tree is scanned for sling:virtualPath properties and the results are cached internally.

          Domain Roots:
          As explained in [1] the easiest way is to use the existing sling:vanityPath property for all needs and not introduce a new property.

          [1] http://markmail.org/message/aa3civ2m4ql3cnkx
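
A sketch of the "clear on changes" cache: the scan is represented by a `Supplier`, the cache is filled lazily on the next lookup, and any observation event simply discards it. The names are hypothetical and the repository scan itself is not shown.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch: a cache that is cleared on any change and rebuilt by rescanning. */
final class ClearOnChangeCache {
    private final Supplier<Map<String, String>> scanner; // performs the (expensive) tree scan
    private Map<String, String> cache;                   // null means "needs a rescan"

    ClearOnChangeCache(Supplier<Map<String, String>> scanner) {
        this.scanner = scanner;
    }

    /** Returns the target path for a vanity path, rescanning only if the cache was cleared. */
    synchronized String lookup(String vanityPath) {
        if (cache == null) {
            cache = scanner.get();
        }
        return cache.get(vanityPath);
    }

    /** Called from an observation listener on any relevant repository change. */
    synchronized void onRepositoryChange() {
        cache = null;
    }

    public static void main(String[] args) {
        ClearOnChangeCache cache = new ClearOnChangeCache(
                () -> Collections.singletonMap("/welcome", "/content/domain1/home"));
        System.out.println(cache.lookup("/welcome")); // triggers the first scan
        cache.onRepositoryChange();                   // the next lookup rescans
        System.out.println(cache.lookup("/welcome"));
    }
}
```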

          Bertrand Delacretaz added a comment -

          > We will probably not use query, because the current query implementation is optimized at doing
          > full text searches...
          >...Instead the resource tree is scanned for sling:virtualPath properties and caches them internally...

          A query like

          //element(*, sling:whateverMixinWeChoose)

          is certainly more expensive than scanning a few nodes that you know where to find, but do you really want to scan the entire repository at startup? That won't scale.

          As the query would run only on service startup (observation takes over after that), I don't see a problem with using a query.

          Felix Meschberger added a comment -

          > //element(*, sling:whateverMixinWeChoose)

          The problem with this is that (1) it seems to be really expensive (running it once is probably OK) and (2) it might block indefinitely: if a huge property is being indexed at the same time, the index is locked and so are queries.

          We might of course start off with a query and then, as experience may tell, convert to something else.

          The advantage of a query is of course, that it is much simpler to use on our part and is much more expressive than scanning the repository.

          Felix Meschberger added a comment -

          In Rev. 720647 I implemented a first shot at a new JCR ResourceResolver called JcrResourceResolver2. In addition I added configuration to the JcrResourceResolverFactoryImpl to select the new (default) or old resource resolver. This selection may also be made by setting the "resource.resolver.new" framework property (e.g. in sling.properties) to true (use new resolver) or false (use old resolver).

          This property and its corresponding configuration option are temporary and will be removed again as soon as the new resource resolver is acceptable and the old resource resolver is dropped.

          For more information on this new resource resolver, refer to the wiki page at http://cwiki.apache.org/SLING/flexible-resource-resolution.html
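
How the switch might be read, as a sketch only: the property name and its default (new resolver) come from the comment above, while the class and method are hypothetical and do not reflect the actual factory code.

```java
import java.util.Properties;

/** Hypothetical sketch: choosing the resolver from the "resource.resolver.new" property. */
final class ResolverSelection {
    static final String PROPERTY = "resource.resolver.new";

    /** Returns true for the new resolver; a missing property selects the new resolver as well. */
    static boolean useNewResolver(Properties frameworkProperties) {
        return Boolean.parseBoolean(frameworkProperties.getProperty(PROPERTY, "true"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(PROPERTY, "false"); // e.g. as set in sling.properties
        System.out.println(useNewResolver(props) ? "new resolver" : "old resolver");
    }
}
```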

          Felix Meschberger added a comment -

          And here comes the new API of the ResourceResolver in Rev. 720676.

          Felix Meschberger added a comment -

          My assumption is that this issue is complete.

          Vidar, could you please confirm? Thanks.

          Vidar S. Ramdal added a comment -

          Looks good, thanks!

          Felix Meschberger added a comment -

          wrong component


            People

            • Assignee:
              Felix Meschberger
              Reporter:
              Vidar S. Ramdal
            • Votes:
              0
              Watchers:
              0
