Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: search
    • Labels:
      None
    • Environment:

      Tested on macosx 10.4.7, JDK 1.5.0_06

      Description

      Patch that implements server-side XSLT transforms of query results.

      The filter is activated by using select/html instead of select/ to run queries, and optionally adding a transform parameter to specify the XSLT transform to use, for example:

      http://localhost:8983/solr/select/html?q=usage&transform=my.xsl

      In this case, my.xsl is looked up in solr/conf/xslt/ when using the example configuration. The default transform (solr/conf/xslt/query-to-html.xsl) outputs a simplistic HTML format.

      Performance is suboptimal, as the filter reparses the XML output generated by Solr. Modifying the XMLWriter to output to a ContentHandler would be more efficient, but I didn't have time to go that far.
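      To make that cost concrete, here is a minimal sketch of a reparse-and-transform step using the standard JAXP javax.xml.transform API. The class and method names are hypothetical, the sample XML is a simplified stand-in for a Solr response, and this is not the patch's actual filter code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ReparseTransformSketch {
    // Applies an XSLT stylesheet to XML that has already been serialized to a
    // String. The Transformer has to parse that String a second time, which is
    // the overhead mentioned in the description.
    static String transform(String xml, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Simplified stand-in for a Solr XML response (not the real format).
        String xml = "<response><result numFound=\"2\"/></response>";
        String xslt = "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/\">found: "
                + "<xsl:value-of select=\"response/result/@numFound\"/>"
                + "</xsl:template></xsl:stylesheet>";
        System.out.println(transform(xml, xslt));
    }
}
```

      Writing SAX events straight into a ContentHandler, as suggested in the description, would avoid the second parse performed by transform().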

      The TransformerProvider trivially caches the last Transformer used; this could be improved with an LRU cache of several transformers. I haven't checked whether Solr's infrastructure already contains such an animal.
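      A minimal sketch of the LRU idea, assuming java.util.LinkedHashMap in access order is acceptable. LruCache is a hypothetical name; in this context the keys would presumably be XSLT file names and the values compiled javax.xml.transform.Templates objects, which (unlike Transformer instances) are documented as safe to share between threads.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache, e.g. XSLT file name -> compiled Templates.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: get() moves the entry to the most recently used end.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put(); evicts the least recently used entry
        // once the cache holds more than maxEntries items.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a.xsl", "A");
        cache.put("b.xsl", "B");
        cache.get("a.xsl");      // a.xsl becomes most recently used
        cache.put("c.xsl", "C"); // evicts b.xsl
        System.out.println(cache.keySet());
    }
}
```

      LinkedHashMap is not synchronized, so an instance shared by concurrent requests would need to be wrapped, for example with Collections.synchronizedMap.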

      The patch is all new files, except for adding this in web.xml before the first <servlet>:

      <filter>
      <filter-name>xslt</filter-name>
      <filter-class>org.apache.solr.xslt.XSLTServletFilter</filter-class>
      </filter>

      <!-- apply the XSLT filter when select/html is used to make queries -->
      <filter-mapping>
      <filter-name>xslt</filter-name>
      <url-pattern>/select/html/*</url-pattern>
      </filter-mapping>

      I've left the client-side XSLT stuff (stylesheet parameter) as is for the moment.

      1. SOLR-49.diff
        21 kB
        Hoss Man
      2. solr-XSLTResponseWriter-20061016.tar.gz
        5 kB
        Bertrand Delacretaz
      3. solr-XSLTResponseWriter-20060922.tar.gz
        4 kB
        Bertrand Delacretaz
      4. solr-XSLTResponseWriter-files.tar.gz
        2 kB
        Bertrand Delacretaz
      5. xslt-filter-files.tar.gz
        3 kB
        Bertrand Delacretaz

        Activity

        Bertrand Delacretaz added a comment -

        New files

        Yonik Seeley added a comment -

        Thanks Bertrand, I've not used servlet filters before.

        What do people think the tradeoffs are of using a different URL "/select/html" vs. a different response writer "wt=html&wt.xslt=..."?

        Does anyone else have an opinion on whether this should be committed?

        Bertrand Delacretaz added a comment -

        In retrospect, I think a different response writer is more consistent with the way other output formats are generated, and it shouldn't be hard to implement that way.

        The content-type should also be selectable by a request parameter, with text/html as the default I guess.

        Bertrand Delacretaz added a comment -

        Here's a new patch; I've reworked the code into an XSLTResponseWriter.

        It must be configured like this in solrconfig.xml:

        <queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter"/>

        And the following request parameters activate it:

        wt = xslt
        tr = my-xslt-transform.xsl
        ct = text/html (which is the default value)

        (don't you love terse param names?)
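        For illustration, a small sketch that builds such a request URL with java.net.URLEncoder. The host, port and /solr/select path are borrowed from the earlier example URL, and XsltQueryUrl is a hypothetical helper, not part of the patch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class XsltQueryUrl {
    // Appends URL-encoded request parameters to a base URL, in map order.
    static String build(String base, Map<String, String> params)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator)
              .append(URLEncoder.encode(e.getKey(), "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "usage");
        params.put("wt", "xslt");                 // selects the XSLT response writer
        params.put("tr", "my-xslt-transform.xsl"); // transform to apply
        params.put("ct", "text/html");            // content type (the default)
        System.out.println(build("http://localhost:8983/solr/select", params));
    }
}
```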

        The XSLT transform is read using SolrConfig.config.openResource(...), so it must be available in the solr/conf directory when running the Solr example config.

        Bertrand Delacretaz added a comment -

        Here's yet another version which takes the Content-Type from the XSLT transform.

        The code and these instructions replace the previous versions:

        It must be configured like this in solrconfig.xml:

        <!--
        XSLT response writer (SOLR-49)
        Changes to XSLT transforms are taken into account every xsltCacheLifetimeSeconds at most.
        -->
        <queryResponseWriter
        name="xslt"
        class="org.apache.solr.request.XSLTResponseWriter"
        xsltCacheLifetimeSeconds="5"
        />

        The following request parameters activate the XSLTResponseWriter:

        wt = xslt
        tr = my-xslt-transform.xsl

        The Content-Type comes from the xsl:output element of the XSLT transform:

        <xsl:output media-type="text/html"/>
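        A sketch of how a writer can read that media type with the standard JAXP API (Templates.getOutputProperties and OutputKeys.MEDIA_TYPE). The class name and the text/xml fallback are assumptions for illustration, not taken from the patch.

```java
import java.io.StringReader;
import java.util.Properties;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

class ContentTypeFromXslt {
    // Compiles the stylesheet and returns the media-type declared by its
    // xsl:output element, falling back to text/xml (an assumed default).
    static String mediaType(String xslt) throws TransformerConfigurationException {
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
        Properties props = templates.getOutputProperties();
        String mediaType = props == null ? null : props.getProperty(OutputKeys.MEDIA_TYPE);
        return mediaType != null ? mediaType : "text/xml";
    }

    public static void main(String[] args) throws TransformerConfigurationException {
        String xslt = "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output media-type=\"text/html\"/>"
                + "<xsl:template match=\"/\"/></xsl:stylesheet>";
        System.out.println(mediaType(xslt));
    }
}
```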

        Finally, when first used, the TransformerProvider warns about the possible performance implications of its simplistic cache:

        ATTENTION: The TransformerProvider's simplistic XSLT caching mechanism is
        not appropriate for high load scenarios, unless a single XSLT transform is used
        and xsltCacheLifetimeSeconds is set to a sufficiently high value.
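        The caching behaviour this warning describes can be sketched as a single-entry cache with an expiry time. This is a hypothetical illustration, not the TransformerProvider source: the loader function stands in for compiling an XSLT, and the injectable clock exists only to make the sketch testable (in production it would presumably be System::currentTimeMillis).

```java
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical single-entry cache: keeps only the last value loaded and
// reloads it when a different key is requested or when it is older than
// the configured lifetime.
class SingleEntryExpiringCache<V> {
    private final long lifetimeMillis;
    private final Function<String, V> loader;
    private final LongSupplier clockMillis;
    private String lastKey;
    private V lastValue;
    private long loadedAtMillis;

    SingleEntryExpiringCache(long lifetimeSeconds, Function<String, V> loader,
                             LongSupplier clockMillis) {
        this.lifetimeMillis = lifetimeSeconds * 1000L;
        this.loader = loader;
        this.clockMillis = clockMillis;
    }

    synchronized V get(String key) {
        long now = clockMillis.getAsLong();
        boolean reload = lastKey == null
                || !lastKey.equals(key)
                || now - loadedAtMillis >= lifetimeMillis;
        if (reload) {
            lastValue = loader.apply(key);
            lastKey = key;
            loadedAtMillis = now;
        }
        return lastValue;
    }
}
```

        If two transforms are requested alternately, every call reloads, which is why the warning recommends a single transform and a sufficiently long xsltCacheLifetimeSeconds.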

        Bertrand Delacretaz added a comment -

        Here's the latest incarnation, using a more Solrish way of initializing the response writer.

        The solr-XSLTResponseWriter-20061016.tar.gz attachment replaces all the previous patches.

        I have added an init(NamedList args) method to the QueryResponseWriter (solr-49.patch in the attached file), which means that the solrconfig.xml part has changed:

        <!--
        XSLT response writer (SOLR-49)
        Changes to XSLT transforms are taken into account every xsltCacheLifetimeSeconds at most.
        -->
        <queryResponseWriter
        name="xslt"
        class="org.apache.solr.request.XSLTResponseWriter"
        >
        <int name="xsltCacheLifetimeSeconds">5</int>
        </queryResponseWriter>
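        A hypothetical sketch of how a writer might pick up the <int name="xsltCacheLifetimeSeconds"> value in such an init hook. To keep it runnable without Solr, java.util.Map stands in for NamedList, and InitializableWriter, XsltWriterInitSketch and the default of 5 seconds are illustrative assumptions; an <int> element would presumably arrive as an Integer.

```java
import java.util.Map;

// Stand-in for the init(NamedList args) hook; Map replaces NamedList here.
interface InitializableWriter {
    void init(Map<String, Object> args);
}

class XsltWriterInitSketch implements InitializableWriter {
    static final int DEFAULT_LIFETIME_SECONDS = 5; // hypothetical default

    private int xsltCacheLifetimeSeconds = DEFAULT_LIFETIME_SECONDS;

    @Override
    public void init(Map<String, Object> args) {
        Object value = args == null ? null : args.get("xsltCacheLifetimeSeconds");
        if (value instanceof Number) {
            xsltCacheLifetimeSeconds = ((Number) value).intValue();
        } else if (value != null) {
            // Reject misconfigured values instead of silently ignoring them.
            throw new IllegalArgumentException(
                    "xsltCacheLifetimeSeconds must be an int, got: " + value);
        }
        // Absent parameter: keep the default.
    }

    int getXsltCacheLifetimeSeconds() {
        return xsltCacheLifetimeSeconds;
    }
}
```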

        Apart from that, the patch works as indicated in my previous comment.

        Hoss Man added a comment -

        I'm going to try and review this again today.

        Hoss Man added a comment -

        SOLR-49.diff is solr-XSLTResponseWriter-20061016.tar.gz in svn patch form with a few small tweaks...

        1) I removed the javadocs on the init() method in each of the concrete QueryResponseWriter classes so they would inherit the interface docs.
        2) I added a small unit test to demonstrate that the stylesheet was being applied

        I think this is committable as is, but one small thing occurred to me that I wanted to get consensus on first: right now this can be used to expose any file in the ${solr.home}/conf by trying to use it as an XSLT. Should it respect the <gettableFiles> directive in solrconfig.xml (which might be annoying, since it requires explicitly listing each file), or should we change this to only look at files in some new ${solr.home}/xslt (or ${solr.home}/conf/xslt) directory?

        Another minor nit: the name query-to-html.xsl suggests it would render the "query" as HTML, not the results of the query... maybe we should just call it "example.xsl"?

        Yonik Seeley added a comment -

        > right now this can be used to expose any file in the ${solr.home}/conf by trying to use it as an XSLT

        It's good that you noted it, but I think it's fine for now:

        • conf doesn't contain personal data like logs could...
        • Solr is a backend system with all the doors left wide open... IMO, locking a window isn't currently worth the effort, esp if it makes it harder to use.
        Bertrand Delacretaz added a comment -

        I'm with Yonik on the security thing, Solr is wide open as is anyway.

        OTOH, forcing the XSLT files to be under conf/xslt would avoid cluttering the conf directory; I think it's a good idea.
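        A minimal sketch of how a tr value could be confined to a conf/xslt directory using java.nio.file path normalization. XsltPathResolver is a hypothetical name and this is not the committed code; note that it does not resolve symbolic links.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class XsltPathResolver {
    // Resolves the requested transform name against the xslt directory and
    // rejects names that escape it, such as "../solrconfig.xml" or an
    // absolute path.
    static Path resolve(Path xsltDir, String tr) {
        Path base = xsltDir.toAbsolutePath().normalize();
        Path candidate = base.resolve(tr).normalize();
        if (!candidate.startsWith(base) || candidate.equals(base)) {
            throw new IllegalArgumentException("XSLT must be under " + base + ": " + tr);
        }
        return candidate;
    }

    public static void main(String[] args) {
        Path dir = Paths.get("solr", "conf", "xslt");
        System.out.println(resolve(dir, "example.xsl"));
    }
}
```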

        > query-to-html.xsl seems like it would render the "query" as html...

        You mean that the filename gives this impression, right? No problem with a rename, example.xsl is fine, or maybe xslt-writer-example.xsl.

        Hoss Man added a comment -

        Committed with both the subdirectory change and the rename to "example.xsl".

        Thanks again Bertrand.

        Bertrand Delacretaz added a comment -

        Thanks for committing, I have documented this at http://wiki.apache.org/solr/XsltResponseWriter

        Hoss Man added a comment -

        This bug was modified as part of a bulk update using the criteria...

        • Marked ("Resolved" or "Closed") and "Fixed"
        • Had no "Fix Version" versions
        • Was listed in the CHANGES.txt for 1.1

        The Fix Version for all 38 issues found was set to 1.1; email notification
        was suppressed to prevent excessive email.

        For a list of all the issues modified, search jira comments for this
        (hopefully) unique string: 20080415hossman3


          People

          • Assignee:
            Hoss Man
            Reporter:
            Bertrand Delacretaz
          • Votes:
            0
            Watchers:
            0
