Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.3, 4.0-ALPHA
    • Fix Version/s: 3.4, 4.0-ALPHA
    • Component/s: update
    • Labels:
      None

      Description

      An update request handler that can accept a tr param, allowing the indexing of any XML content that is passed to solr, so long as there is an XSLT stylesheet in solr/conf/xslt that can transform it to the <add><doc/></add> format.

      Could be used, for example, to allow Solr to ingest docbook directly, without any preprocessing.
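As a rough illustration of the kind of transform described above, the sketch below applies a stylesheet to an arbitrary XML document and produces the <add><doc/></add> format using only the JDK's javax.xml.transform API. The input vocabulary (<article>, <title>), the field names, and the class and method names are assumptions made up for this example; they are not taken from the patch. The result is returned as a String only so it can be displayed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AddDocTransformSketch {

    // Hypothetical stylesheet: maps <article id=".."><title>..</title></article>
    // to Solr's <add><doc><field .../></doc></add> update format.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/article'>"
      + "<add><doc>"
      + "<field name='id'><xsl:value-of select='@id'/></field>"
      + "<field name='title'><xsl:value-of select='title'/></field>"
      + "</doc></add>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms the raw bytes of an input document and returns the serialized result.
    static String toAddDoc(byte[] inputXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer(
            new StreamSource(new ByteArrayInputStream(STYLESHEET.getBytes(StandardCharsets.UTF_8))));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new StreamSource(new ByteArrayInputStream(inputXml)),
                              new StreamResult(out));
        // The default output encoding of the xml output method is UTF-8.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Calling AddDocTransformSketch.toAddDoc with the bytes of <article id='42'><title>Hello</title></article> should yield an <add> element containing one <doc> with an id field and a title field.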

      1. xslt-update-handler.patch
        18 kB
        Uwe Schindler
      2. xslt-update-handler.patch
        15 kB
        Upayavira

        Issue Links

          Activity

          Upayavira added a comment -

          Patch for XsltUpdateRequestHandler, along with a test case for it

          Uwe Schindler added a comment -

          XML is binary data, so you should not convert it to Strings. Ideally the already transformed DOM tree or SAX stream would be passed directly to the importer. I know this is not easily possible, so the most correct way would be to pass the binary byte[] directly and reparse it.

          I will investigate passing the SAX events / XSL DOM tree around directly, which is possible, as the transformer API can also pipe directly to StAX, which is used by the underlying XMLImporter.

          Uwe Schindler added a comment -

          Also, you forgot to pass the content-type charset to the StreamSource. I will post an improved patch fixing both problems soon.

          Thanks for the patch!
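The two review points above (keep the XML as bytes, and honour the charset from the content type) can be sketched roughly as follows. This is an illustration built on the standard StreamSource constructors, not the code from either patch; the helper names and the charsetName parameter (standing in for a charset parsed from the request's Content-Type, or null if none was sent) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

class CharsetAwareSourceSketch {

    // Builds a StreamSource from the raw request body.
    // charsetName is a hypothetical value taken from the Content-Type header, or null.
    static StreamSource sourceFor(byte[] body, String charsetName) {
        InputStream in = new ByteArrayInputStream(body);
        if (charsetName == null) {
            // No charset given: hand the parser the bytes so it can detect the
            // encoding itself from the BOM or the XML declaration.
            return new StreamSource(in);
        }
        // Charset given: decode the bytes with exactly that charset.
        return new StreamSource(new InputStreamReader(in, Charset.forName(charsetName)));
    }

    // Parses the source with an identity transform and returns the root element's text.
    static String rootText(StreamSource source) throws TransformerException {
        DOMResult result = new DOMResult();
        TransformerFactory.newInstance().newTransformer().transform(source, result);
        return ((Document) result.getNode()).getDocumentElement().getTextContent();
    }
}
```

The point of the byte-based branch is that a body encoded as ISO-8859-1 and declaring that encoding in its XML declaration is decoded by the parser itself, whereas converting it to a String first would require guessing the encoding.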

          Upayavira added a comment -

          Great! I was sure I'd missed stuff. Happy to improve stuff here too (e.g. port to 3.x).

          Uwe Schindler added a comment -

          Here is an improved patch. This implementation does not internally serialize the XML again to a stream and read it using StAX; instead, it uses the XSL ResultTreeFragment (RTF), which is always built as a DOM tree by XSL transformers, and feeds it to StAX. So we don't need any stupid serialize/deserialize step in between. This patch also respects the content-type parameter of the input, like XMLLoader. The intermediate buffering is needed because we need to change from push to pull APIs.

          This patch also fixes a small issue in XSLTResponseWriter, which likewise failed to correctly log transformation warn/error events to slf4j.
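A minimal sketch of the two ideas in this comment, assuming only the JDK's javax.xml.transform API: the transform result is collected as a DOM tree (DOMResult) instead of being serialized and reparsed, and an ErrorListener captures warn/error events so they can be logged. The class names are hypothetical, and the listener collects messages in a list where the patch is described as logging to slf4j. The hand-off from the DOM tree to StAX is left out, since whether a StAX reader can be created from a DOMSource may depend on the StAX implementation in use.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

class DomResultWithLoggingSketch {

    // Stand-in for a logging listener: records events instead of calling a logger.
    static class CollectingErrorListener implements ErrorListener {
        final List<String> messages = new ArrayList<>();

        @Override
        public void warning(TransformerException e) {
            messages.add("WARN: " + e.getMessage());
        }

        @Override
        public void error(TransformerException e) throws TransformerException {
            messages.add("ERROR: " + e.getMessage());
            throw e; // record the event, then let the transformation fail
        }

        @Override
        public void fatalError(TransformerException e) throws TransformerException {
            messages.add("FATAL: " + e.getMessage());
            throw e;
        }
    }

    // Runs the stylesheet over the input bytes and returns the result as a DOM
    // document, with no intermediate serialization.
    static Document transformToDom(byte[] xslt, byte[] xml, ErrorListener listener)
            throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setErrorListener(listener);      // events while compiling the stylesheet
        Transformer transformer =
            factory.newTransformer(new StreamSource(new ByteArrayInputStream(xslt)));
        transformer.setErrorListener(listener);  // events while transforming
        DOMResult result = new DOMResult();
        transformer.transform(new StreamSource(new ByteArrayInputStream(xml)), result);
        return (Document) result.getNode();
    }
}
```

The returned Document can then be walked directly or handed on to whatever consumes the <add> tree.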

          Uwe Schindler added a comment -

          Merging to 3.x should be simple, too!

          Uwe Schindler added a comment -

          Committed trunk revision: 1141999
          Committed 3.x revision: 1142003

          Thanks Upayavira, the idea is great and also useful for me (if PANGAEA/panFMP moves to Solr, but since we now have faceting in Lucene I don't think we will take this step)!

          Hoss Man added a comment -

          Hmmm... from a user perspective does it really make sense for this to be an entirely new RequestHandler?

          Wouldn't it make more sense if users could just continue to use XmlUpdateRequestHandler along with a tr param indicating the transform to apply first?

          Uwe Schindler added a comment -

          I was thinking about that; it would be easy to implement, as the current code would simply be moved to XMLLoader.

          Should I add a patch relative to what's currently committed?

          Uwe Schindler added a comment -

          On the other hand, this one is similar to XSLTResponseWriter, which is also separate from XMLResponseWriter. XMLResponseWriter could also take an optional tr param and then transform? So the current solution is more consistent.

          Upayavira added a comment -

          I considered the same thing, making the XmlUpdateRequestHandler accept tr, but opted not to for the same reason as Uwe. Whichever way, consistency is a good thing!

          David Smiley added a comment -

          Just a side comment...
          I've been posting arbitrary XSLT and transforming it before this patch using the DIH ContentStreamDataSource: http://wiki.apache.org/solr/DataImportHandler#ContentStreamDataSource

          Robert Muir added a comment -

          bulk close for 3.4


            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Upayavira
            • Votes:
              0
              Watchers:
              1
