SOLR-285

Server Side XSLT for update processing

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: update
    • Labels:
      None

      Description

      Ideally, we should support a way for people to send XML ContentStreams to Solr and have server-side XSLT processing convert them (much like the XSLTResponseWriter supports server-side XSLT processing of responses).
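
To make the request concrete, here is a minimal sketch of the kind of conversion being asked for, using only the JDK's javax.xml.transform API. The stylesheet, the input element names (items, item, id, title) and the class name are hypothetical illustrations, not part of the attached patch; only the <add><doc><field> target shape comes from Solr's XML update format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class UpdateXsltSketch {

    // Hypothetical stylesheet: maps <items><item><id/><title/></item></items>
    // onto Solr's <add><doc><field name="..."/></doc></add> update format.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/items'>"
      + "<add><xsl:apply-templates select='item'/></add>"
      + "</xsl:template>"
      + "<xsl:template match='item'>"
      + "<doc>"
      + "<field name='id'><xsl:value-of select='id'/></field>"
      + "<field name='name'><xsl:value-of select='title'/></field>"
      + "</doc>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms posted XML into Solr update XML, buffering the result as a String. */
    static String toSolrXml(String postedXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(postedXml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            toSolrXml("<items><item><id>1</id><title>Hello</title></item></items>"));
    }
}
```

Note that this sketch buffers the whole transformed document in memory before anything could be indexed.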

          Activity

          Hoss Man added a comment -

          this is mainly just a proof of concept ... there is a lot of room for improvement here .. this reuses the same TransformerProvider as the XSLTResponseWriter but doesn't even try to use the cache (even if it did, using it in conjunction with XSLTResponseWriter would constantly invalidate the cache)

          the biggest improvement would be to find some way to pipeline the XSLT transformation into the StAX parsing ... i tried to at least use a DOMResult for the transformer and a DOMSource for the XMLStreamReader but i got this exception...
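
The invalidation concern suggests the shared cache holds only one stylesheet at a time. Assuming that is the case, one possible direction is a cache keyed by stylesheet name, so that update and response stylesheets do not evict each other. The sketch below uses only standard JAXP calls (Templates objects are documented as thread-safe); the class and method names are hypothetical and this is not how TransformerProvider is implemented.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

public class TemplatesCacheSketch {

    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> cache = new ConcurrentHashMap<>();

    /** Compiles a stylesheet once and stores it under the given name. */
    public synchronized void register(String name, String xsl)
            throws TransformerConfigurationException {
        // TransformerFactory is not guaranteed to be thread-safe, hence synchronized.
        cache.put(name, factory.newTemplates(new StreamSource(new StringReader(xsl))));
    }

    /** Returns a fresh Transformer; Transformer instances must not be shared across threads. */
    public Transformer newTransformer(String name) throws TransformerConfigurationException {
        Templates templates = cache.get(name);
        if (templates == null) {
            throw new IllegalArgumentException("unknown stylesheet: " + name);
        }
        return templates.newTransformer();
    }

    /** Number of compiled stylesheets currently held. */
    public int size() {
        return cache.size();
    }
}
```

A real cache would also need an expiry or reload policy for stylesheets that change on disk, which this sketch leaves out.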

          SEVERE: java.lang.UnsupportedOperationException: XMLInputFactory.createXMLStreamReader(javax.xml.transform.dom.DOMSource) not yet implemented
          at com.bea.xml.stream.MXParserFactory.createXMLStreamReader(MXParserFactory.java:70)

          ...oh well.
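
Whether an XMLInputFactory accepts a DOMSource depends on the StAX implementation in use, as the exception above shows. A defensive sketch, assuming only standard JAXP and StAX calls, tries the DOMSource first and falls back to serializing the DOM into a byte buffer. The identity transform stands in for the real XSLT step, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DomToStaxSketch {

    /** Opens a StAX reader over a DOM tree, falling back to a serialized copy. */
    static XMLStreamReader readerFor(DOMSource dom)
            throws XMLStreamException, TransformerException {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        try {
            // Works only if the StAX implementation supports DOMSource.
            return inputFactory.createXMLStreamReader(dom);
        } catch (UnsupportedOperationException | XMLStreamException e) {
            // Fallback: serialize the DOM and parse the bytes (a second in-memory copy).
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            TransformerFactory.newInstance().newTransformer()
                .transform(dom, new StreamResult(buf));
            return inputFactory.createXMLStreamReader(
                new ByteArrayInputStream(buf.toByteArray()));
        }
    }

    /** Transforms XML into a DOMResult, then counts start elements via StAX. */
    static int countElements(String xml) throws Exception {
        DOMResult result = new DOMResult();
        // Identity transform: a stand-in for the real stylesheet.
        TransformerFactory.newInstance().newTransformer()
            .transform(new StreamSource(new StringReader(xml)), result);
        XMLStreamReader reader = readerFor(new DOMSource(result.getNode()));
        int count = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                count++;
            }
        }
        reader.close();
        return count;
    }
}
```

Either path still holds the whole transformed document in memory, so this only works around the exception and does not pipeline anything.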

          patch also includes a simple rss2solr.xml stylesheet that does some very simplistic/silly transformations to match the example schema.xml

          comments from people who understand javax.xml.* better than i do would be greatly appreciated.

          Thomas Peuss added a comment -

          The major problem I have with this solution is that it holds the whole transformed document in memory. My suggestion is to use a stream transformation technology like Joost (http://joost.sourceforge.net/). Here is a small snippet showing what you can do:

          // pContext is assumed to be defined elsewhere (not shown in this snippet).
          PipedReader read = new PipedReader();
          PipedWriter writer = new PipedWriter(read);
          Processor proc =
              new Processor(new InputSource(new FileReader("order-group.stx")), pContext);
          StreamEmitter emitter =
              StreamEmitter.newEmitter(writer, "UTF-8", proc.outputProperties);
          proc.setContentHandler(emitter);
          proc.setLexicalHandler(emitter);

          // parse() writes into the pipe; for output larger than the pipe buffer
          // it has to run on a different thread than the code reading from it.
          proc.parse(new InputSource(new FileReader("order.xml")));

          BufferedReader bufRead = new BufferedReader(read);
          System.out.println(bufRead.readLine());

          You can then hand this Reader directly to the XmlUpdateHandler, without buffering the transformed document in memory.
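
A runnable sketch of the piped hand-off follows. Because the Joost calls cannot be verified here, a JDK identity Transformer stands in as the producer; that substitution, the class name and the method name are assumptions for illustration. The point it shows is structural: the producer writes to the PipedWriter on its own thread while the consumer pulls from the PipedReader with StAX, so only the pipe's small buffer sits between them.

```java
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PipedTransformSketch {

    /** Counts <doc> elements in the transformed stream without buffering all of it. */
    static int countDocs(String inputXml) throws Exception {
        PipedReader read = new PipedReader();
        PipedWriter writer = new PipedWriter(read);

        // Producer: runs the transformation on its own thread. Writing and reading
        // a pipe from the same thread would block once the pipe buffer fills up.
        Thread producer = new Thread(() -> {
            // try-with-resources closes the writer, which signals end of stream.
            try (PipedWriter w = writer) {
                TransformerFactory.newInstance().newTransformer()  // stand-in for the STX processor
                    .transform(new StreamSource(new StringReader(inputXml)),
                               new StreamResult(w));
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        producer.start();

        // Consumer: a pull parser reading straight from the pipe.
        XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(read);
        int docs = 0;
        while (xml.hasNext()) {
            if (xml.next() == XMLStreamConstants.START_ELEMENT
                    && "doc".equals(xml.getLocalName())) {
                docs++;
            }
        }
        xml.close();
        producer.join();
        return docs;
    }
}
```

With an XSLT producer the input side is typically still read into memory as a tree, so the memory saving described here applies mainly to a streaming processor such as the STX one suggested above.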

          The downside of course is that you have to provide STX-files instead of XSL-files. But the syntax is very similar.

          What do you think?

          Hoss Man added a comment -

          FYI for people interested in the STX/Joost approach Thomas described, he opened SOLR-370 to track that.

          personally i think they both have merits: XSLT is something that (in theory) more people are familiar with, while the STX stuff seems to be more efficient for large amounts of data.

          Erick Erickson added a comment -

          Cleaning up old JIRAs, re-open if necessary.


            People

            • Assignee: Unassigned
            • Reporter: Hoss Man
            • Votes: 1
            • Watchers: 2