Solr - SOLR-1725

Script based UpdateRequestProcessorFactory

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 4.0-BETA, 5.0
    • Component/s: update
    • Labels:

      Description

      A script based UpdateRequestProcessorFactory (uses the JDK 6 script engine support). The main goal of this plugin is to make it possible to configure/write update processors without the need to write and package Java code.

      The update request processor factory enables writing update processors as scripts located under the conf directory of the Solr home (solr.solr.home). The factory accepts one (mandatory) configuration parameter named scripts, which takes a comma-separated list of file names. When multiple scripts are defined, their execution order is determined by the lexicographical order of the script file names (so scriptA.js will be executed before scriptB.js).

      The script language is resolved from the script file extension (that is, a *.js file will be treated as a JavaScript script); an extension is therefore mandatory.

      Each script file is expected to define one or more functions with the same signatures as the methods of UpdateRequestProcessor. It is not required to define all of them, only those that are required by the processing logic (see the example script after the variable list below).

      The following variables are defined as globals for each script:

      • req - The SolrQueryRequest
      • rsp - The SolrQueryResponse
      • logger - A logger that can be used for logging purposes in the script
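
      A minimal JavaScript sketch of such a script (illustrative only - the field name and log messages are made up; the cmd.solrDoc accessor follows the examples later in this issue):

      function processAdd(cmd) {
        var doc = cmd.solrDoc;               // the SolrInputDocument being added
        doc.addField('processed_b', true);   // hypothetical marker field
        logger.info('script handled add for request ' + req);
      }

      function processDelete(cmd) {
        logger.info('script saw a delete');
      }

      function finish() {
        logger.info('update chain finished');
      }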
      Attachments

      1. SOLR-1725.patch (71 kB) - Hoss Man
      2. SOLR-1725.patch (71 kB) - Hoss Man
      3. SOLR-1725.patch (70 kB) - Hoss Man
      4. SOLR-1725.patch (69 kB) - Hoss Man
      5. SOLR-1725.patch (52 kB) - Hoss Man
      6. SOLR-1725.patch (34 kB) - Erik Hatcher
      7. SOLR-1725.patch (33 kB) - Erik Hatcher
      8. SOLR-1725.patch (34 kB) - Erik Hatcher
      9. SOLR-1725.patch (58 kB) - Erik Hatcher
      10. SOLR-1725.patch (59 kB) - Uri Boness
      11. SOLR-1725.patch (54 kB) - Uri Boness
      12. SOLR-1725.patch (36 kB) - Uri Boness
      13. SOLR-1725.patch (34 kB) - Uri Boness
      14. SOLR-1725.patch (38 kB) - Uri Boness
      15. SOLR-1725-rev1.patch (60 kB) - Simon Rosenthal

        Activity

        Uri Boness added a comment -

        Initial implementation. Includes a simple test (probably more tests are required). Builds a script engine per script file - each file has its own scope.

        This patch also introduces a new interface - SolrResourceLoaderAware - which can be used by any plugin loaded by SolrCore. (Any plugin implementing this interface will be injected with the SolrCore's resource loader.) The ScriptUpdateRequestProcessorFactory uses the resource loader to load the scripts from the conf directory of the Solr home.

        Mark Miller added a comment -

        This is a great idea.

        This patch also introduces a new Interface - SolrResourceLoaderAware

        What about the existing ResourceLoaderAware?

        Uri Boness added a comment -

        What about the existing ResourceLoaderAware?

        Woops... missed that one out ... I'll check it out and update the patch

        Uri Boness added a comment -

        Is there any reason for currently limiting the classes that can be ResourceLoaderAware? This limitation is explicit in SolrResourceLoader (line: 584):

        awareCompatibility.put(
              ResourceLoaderAware.class, new Class[] {
                CharFilterFactory.class,
                TokenFilterFactory.class,
                TokenizerFactory.class,
                FieldType.class
              }
            );
        

        If the type is not one of these classes, an exception is thrown.

        Mark Miller added a comment -

        I believe it's somewhat arbitrary, based on what makes sense - so adding something to it shouldn't be a problem.

        Uri Boness added a comment - edited

        Right... ok... I'll add another class to this list (I just don't understand why would you want to limit the types that can be *Aware - in a way it defeats the whole idea of the *Aware abstraction).

        Uri Boness added a comment -

        A new patch, this time leverages the already existing ResourceLoaderAware interface

        Mark Miller added a comment - edited

        I believe the history was, on development of *aware - hey, here is a patch that restricts this to certain classes if we want? Anyone object? - and no one did.

        edit

        Actually there was some further discussion -

        https://issues.apache.org/jira/browse/SOLR-414

        Uri Boness added a comment -

        Thanks for the reference

        Koji Sekiguchi added a comment -

        I like the idea, Uri. I've not looked into the patch yet - does it depend on Java 6? I think Solr supports Java 5. There is ScriptTransformer in DIH which uses javax.script, but it looks like ScriptEngineManager is loaded at runtime.

        Uri Boness added a comment -

        Yes, it depends on Java 6. I guess the concern is mainly for the unit tests? (at runtime it shouldn't really matter)

        Uri Boness added a comment -

        Sorry... of course it matters for the build

        Uri Boness added a comment -

        Third try, this time Java 5 compatible.

        Uri Boness added a comment -

        This new patch cleans things up a bit. The main change here is the addition of the script engine abstraction. Supporting JDK 6 script engine at the moment is a bit tricky as Solr is built on JDK 5. So this support must be done using reflection at runtime which makes the code a bit of a mess. The abstraction helps to keep the code clean.

        Uri Boness added a comment -

        The DIH ScriptTransformer can really be cleaned up using this patch as well. I didn't add it to this patch as I didn't know whether it was a good idea to put too much into one patch.

        Mark Miller added a comment -

        We might want to think about making this a contrib?

        How do people feel about putting in core Solr code/functionality that requires Java 6?

        So far you have been able to run all of Solr with just Java 5. Do we want to go down the road of some non contrib features only working with something higher than 5?

        Should that only be allowed as a contrib?

        We certainly want the functionality I think, but as far as I can tell this would break new ground in terms of Solr's jvm version requirements (noting that Uri has made it so that everything still builds and runs with 5, you just can't use this functionality unless you are on 6)

        Any opinions?

        Uri Boness added a comment -

        If we move this to contrib, we'll need to extract the script engine abstraction into a separate contrib utils library (so the DIH will be able to utilize it). I believe this can create a bit of a mess just for this small (though useful) functionality. Is there a real reason for not keeping it in the core?

        I think either way, if people will want to use it they'll need to read somewhere how... I think it'd be nice to save them the extra effort of putting an extra jar file in the lib directory - the configuration (writing the script and configuring the update processors) they'll need to adjust anyway. The only thing that we must stress in the documentation (both in the schema and in the wiki) is that they can only use this feature in Java 6.

        Two additional things to note:
        1. JDK 5 has reached the end of service life (EOSL) already and is not actively supported by Sun (/Oracle).
        2. The general recommendation is to run Solr on Java 6 anyways (due to some threading issues in Java 5).

        Jan Høydahl added a comment -

        This is very much wanted!

        Would it make more sense to execute the scripts in the order they are named in the scripts param? If I have two pipelines/chains, that need to use the same scripts but in different orders, I'm in trouble.

        Isn't solr/lib/scripts a more natural location of code than solr/conf?

        How to pass parameters to each script, to facilitate reusable scripts instead of hardcoded ones?

        To overcome some of these limitations, why not reuse the existing pipeline mechanism to define even the chain, i.e. allow only one script at a time? Then the order of scripts is dictated by the order of <processor> tags in the ProcessorChain, and we reuse the parameter passing logic. A positive side effect is that you can compose a ProcessorChain with a mix and match of Java and script based Processors. Class/script instantiation needs to be optimized, of course.

        Example use case: Say you have an XML input with structured data where one of the fields is the file name of a PDF. You want to convert the PDF using a Tika processor (hopefully to come) and then sanitize Author metadata from the parsed PDF. This could then look like this in solrconfig.xml:

        <updateRequestProcessorChain name="xmlwithpdf">
          <processor class="solr.FileReaderProcessorFactory">
            <str name="filenameField">filename</str>
            <str name="outputField">binarydata</str>
          </processor>
          <processor class="solr.TikaProcessorFactory">
            <str name="inputField">binarydata</str>
            <str name="fmap.author">tikaauthor</str>
            <str name="fmap.content">text</str>
          </processor>
          <processor class="solr.ScriptUpdateProcessorFactory">
            <str name="script">author_samitizer.js</str>
            <str name="inputField">tikaauthor</str>
            <str name="outputField">author</str>
            <str name="discardRegex">Microsoft Word.|Adobe Distiller.|PDF995.*|Unknown</str>
            <bool name="overwriteExisting">false</bool>
          </processor>
          <processor class="solr.LogUpdateProcessorFactory" />
          <processor class="solr.RunUpdateProcessorFactory" />
        </updateRequestProcessorChain>
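
        (Purely as illustration, a hypothetical sketch of what author_samitizer.js might look like, assuming the inputField/outputField/discardRegex parameters above were exposed to the script; that mechanism is still being discussed in this issue.)

        function processAdd(cmd) {
          var doc = cmd.solrDoc;
          var author = doc.getFieldValue(inputField);
          // drop values matching the configured discard pattern, otherwise copy them over
          if (author != null && ('' + author).match(discardRegex) == null) {
            doc.addField(outputField, author);
          }
        }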

        Uri Boness added a comment -

        Would it make more sense to execute the scripts in the order they are named in the scripts param? If I have two pipelines/chains, that need to use the same scripts but in different orders, I'm in trouble.

        Absolutely! The reason why it is currently lexicographically ordered is due to an initial (different) implementation that I had. I'll change it and add a patch for it.

        Uri Boness added a comment -

        How to pass parameters to each script, to facilitate reusable scripts instead of hardcoded ones?

        Another great idea... I'll put it in the same patch.

        To overcome some of these limitations, why not reuse the existing pipeline mechanism to define even the chain, i.e. allow only one script at a time? Then the order of scripts is dictated by the order of <processor> tags in the ProcessorChain, and we reuse the parameter passing logic. A positive side effect is that you can compose a ProcessorChain with a mix and match of Java and script based Processors. Class/script instantiation needs to be optimized, of course.

        Well, once you have the parameters support, you'll be able to do that even if the processor supports multiple scripts.

        Chris A. Mattmann added a comment -

        Is there any reason for SOLR not to move to JDK 1.6?

        Chris A. Mattmann added a comment -

        oh and one other thing – I really like this patch, Uri! I'm looking to integrate it into a data processing project here at JPL. Great idea!

        Uri Boness added a comment -

        A new patch which incorporates some of the ideas suggested above, in particular:

        1. The order of script execution is based on the order in which they are configured in solrconfig.xml
        2. It is now possible to configure global scope variables which will be available to the scripts. For example:

        <processor class="solr.ScriptUpdateProcessorFactory">
            <str name="scripts">updateProcessor.js</str>
            <lst name="params">
                <bool name="boolValue">true</bool>
                <int name="intValue">3</int>
            </lst>
        </processor>
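
        On the script side, this would presumably look something like the following (sketch only - it assumes the configured params are bound as global variables in the script's scope, as described above; the names match the snippet):

        function processAdd(cmd) {
          // boolValue and intValue come from the <lst name="params"> above
          if (boolValue) {
            cmd.solrDoc.addField('configured_i', intValue);
          }
        }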
        
        Uri Boness added a comment -

        oh and one other thing - I really like this patch, Uri! I'm looking to integrate it into a data processing project here at JPL. Great idea!

        Thanks

        Lance Norskog added a comment -

        This is growing into the DataImportHandler.
        Is it possible to just create a super-simple configuration language that turns into the DIH language? That way you get the benefit of this simplified format without having to support another execution engine.

        <updateRequestProcessorChain name="xmlwithpdf">
        <processor class="solr.FileReaderProcessorFactory">
        <str name="filenameField">filename</str>
        <str name="outputField">binarydata</str>

        Jan Høydahl added a comment -

        Lance, what do you mean by "DIH language"?

        In my example xml that you quoted, the two first processors, FileReaderProcessorFactory and TikaProcessorFactory, were supposed to be (imagined) ordinary Java processors, not scripting ones.

        Uri, I'd prefer if the manner of configuration was as similar as possible, i.e. if we could get rid of the <lst name="params"> part, and instead pass all top-level params directly to the script (except the "scripts" param itself).

        Even better if the definition of a processor was in a separate xml section and then referred to by name in each chain, but that is a bigger change outside the scope of this patch.

        Uri Boness added a comment -

        Lance, I lost you a bit as well.

        Uri, I'd prefer if the manner of configuration was as similar as possible, i.e. if we could get rid of the <lst name="params"> part, and instead pass all top-level params directly to the script (except the "scripts" param itself).

        Hmm... personally I prefer configurations that clearly indicate their purpose. Leaving out the params list will make things a bit confusing - some parameters are available to the scripts, others are not... it's not really clear.

        manner of configuration was as similar as possible

        The configurations are similar. All elements in solrconfig.xml have one standard way of configuration, which can be anything from a lst, bool, str, etc. Tomorrow a new processor will pop up which will also require a lst configuration... and that's fine.

        Even better if the definition of a processor was in a separate xml section and then refer by name only in each chain, but that is a bigger change outside scope of this patch.

        Well, indeed that's a bigger change. Like everything, this kind of configuration has its pros & cons.

        I guess it's best if people will just state their preferences regarding how they would like to see this processor configured and based on that I'll adjust the patch.

        Hoss Man added a comment -

        Some random comments/questions from the peanut gallery...

        1) what is the value add in making ScriptUpdateProcessorFactory support multiple "scripts"? ... wouldn't it be simpler to require that users declare multiple instances of ScriptUpdateProcessorFactory (which the processor chain already executes in sequence) than to add sequential processing to the ScriptUpdateProcessor?

        2) The NamedList init args can be as deep of a data structure as you want, so something like this would be totally feasible (if desired) ...

        <processor class="solr.ScriptUpdateProcessorFactory">
          <lst name="scripts">
            <lst name="updateProcessor1.js">
              <bool name="someParamName">true</bool>
              <int name="someOtherParamName">3</int>
            </lst>
            <lst name="updateProcessor2.js">
              <bool name="fooParam">true</bool>
              <str name="barParam">3</str>
            </lst>
          </lst>
          <lst name="otherProcessorOPtionsIfNeeded">
            ...
          </lst>
        </processor>
        
        Uri Boness added a comment -

        1) what is the value add in making ScriptUpdateProcessorFactory support multiple "scripts"? ... wouldn't it be simpler to require that users declare multiple instances of ScriptUpdateProcessorFactory (which the processor chain already executes in sequence) than to add sequential processing to the ScriptUpdateProcessor?

        Well... to my taste it makes the configuration cleaner (no need to define several script processors). The thing is, you have the choice here - either specify several scripts (comma separated) or split them across several processors.

        2) The NamedList init args can be as deep of a data structure as you want, so something like this would be totally feasible (if desired) ...

        That's definitely another option.

        The only thing is that you'd probably want some way to define shared parameters (shared between the scripts that is) and not be forced to specify them several times for each script. I guess you can do something like this:

        <processor class="solr.ScriptUpdateProcessorFactory">
          <lst name="sharedParams">
            <bool name="paramName">true</bool>
          </lst>
          <lst name="scripts">
            <lst name="updateProcessor1.js">
              <bool name="someParamName">true</bool>
              <int name="someOtherParamName">3</int>
            </lst>
            <lst name="updateProcessor2.js">
              <bool name="fooParam">true</bool>
              <str name="barParam">3</str>
            </lst>
          </lst>
          <lst name="otherProcessorOPtionsIfNeeded">
            ...
          </lst>
        </processor>
        
        Jan Høydahl added a comment -

        It looks logical and nice.

        However, I'm leaning towards keeping it very simple. The simplest is one script per processor, since that will always work.

        As more and more update processors are written, in Java, JS, Jython and more, it would be a clear benefit if administrators don't need to care about the underlying implementation, but can use the same way of configuring each one - that's why I opt for the top-level param structure as default.

        I have years of experience with FAST document processing, which is really a killer feature, mainly because it's so dead simple. Drop in a python script with a deployment descriptor and start using it in your pipelines. You don't care if the implementation is pure Python, a C library wrapper or whatever, you just care about what parameters to give it. I see this patch as one big step towards the same simplicity with Solr!

        Yonik Seeley added a comment -

        Cool feature!

        Performance:

        • It looks like scripts are read from the resource loader and parsed again (eval) for every update request. This can be pretty expensive, esp for those scripting languages that generate java class files instead of using an interpreter. One way to combat this would be to cache and reuse them.

        Interface:

        • Should we have a way to specify a script in-line (in solrconfig.xml)?
        • Or even cooler... allow passing of scripts as parameters in the update request! Think about the power of pointing Solr to a CSV file and also providing document transformers & field manipulators on the fly!
        • This seems to raise the visibility of the UpdateCommand classes, directly exposing them to users w/o plugins. We should perhaps consider interface cleanups on these classes at the same time as this issue.
        • Examples! Using javascript (since it's both fast and included in JDK6), let's see what the scripts are for some common usecases. This both helps improve the design as well as lets other people give feedback w/o having to read through code.
        Uri Boness added a comment -

        Performance:

        It looks like scripts are read from the resource loader and parsed again (eval) for every update request. This can be pretty expensive, esp for those scripting languages that generate java class files instead of using an interpreter. One way to combat this would be to cache and reuse them.

        Yes, indeed the scripts are evaluated per request, but for a reason. One of the goals here is to keep the scripts as close as possible to the update processor interface, so the functions in the scripts have the same signatures as the methods in the processor. But in order for the scripts to be flexible I decided to introduce some global scoped variables which are accessible in the functions (currently the Solr request, the response and a logger are there). The problem is that the API only defines 3 scopes where you can register variables, and the lowest one is the engine itself. Since the evaluation of a script is done at the engine level as well, when using this API together with the global variables I don't think you can escape the need for creating an engine per request (and thus also evaluating the scripts).

        But I agree with you that if there is a way around it, caching the evaluated/compiled scripts will definitely boost things up. I'll need to investigate this further and come up with alternatives (I already have some ideas using ThreadLocals).

        Should we have a way to specify a script in-line (in solrconfig.xml)?

        Personally I prefer keeping the solrconfig.xml as clean as possible. I do however think that a standardization of Solr scripting support in general could be great (for example, have a scripts folder under solr.solr.home where all the scripts are placed, or come up with a standard configuration structure for the scripts... perhaps something in the direction Hoss suggested above).

        This seems to raise the visibility of the UpdateCommand classes, directly exposing them to users w/o plugins. We should perhaps consider interface cleanups on these classes at the same time as this issue.

        +1

        Examples! Using javascript (since it's both fast and included in JDK6), let's see what the scripts are for some common usecases. This both helps improve the design as well as lets other people give feedback w/o having to read through code.

        Yep... that would probably be very helpful. Basically, I think anyone who's ever written an update processor can try to convert it to a script and see how it works. The usual use case for me is just to add a few fields which are derived from the other fields, but perhaps there are some other more interesting use cases out there. I guess these examples should be put in the Wiki, right?
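
        For instance, a sketch of that "derived fields" use case (field names are made up; cmd.solrDoc follows the conventions used elsewhere in this issue):

        function processAdd(cmd) {
          var doc = cmd.solrDoc;
          var first = doc.getFieldValue('first_name_s');
          var last = doc.getFieldValue('last_name_s');
          // derive a new field from two existing ones
          if (first != null && last != null) {
            doc.addField('full_name_s', first + ' ' + last);
          }
        }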

        Uri Boness added a comment -

        Been looking more into it and I think there's a nice way in which we can cache the evaluated scripts. But... (and there's always a but) to make it work cleanly we need to be able to extend the scripting support, which means we need to be able to compile the code in Java 6.

        And this brings us back to Mark's comment above on how do we want to do that.

        Yonik Seeley added a comment -

        As you pointed out, Java5 is EOL'd already and Sun/Oracle doesn't even let you download JDK5 anymore w/o registration.
        Wouldn't hurt my feelings to move to Java6. After all, the SolrCloud stuff we're working on uses zookeeper which requires 1.6.

        Uri Boness added a comment -

        Well then... I just hope others will not shed tears as well and we can make Solr 1.5 Java 6 compiled

        David Smiley added a comment -

        This capability is awesome, just as its DIH equivalent is. It's been over a year since any activity here. As time passes, the case for moving to Java 6 increases. Short of that, I don't see a problem with this working like the DIH ScriptTransformer does, by using the Java 6 features via reflection.

        Grant Ingersoll added a comment -

        As time passes, the case for moving to Java 6 increases.

        Solr trunk is on 1.6.

        Bill Bell added a comment -

        Is there a reason why this is not committed? It seems pretty awesome!!

        Simon Willnauer added a comment -

        Is there a reason why this is not committed? It seems pretty awesome!!

        Indeed this looks good... somebody should bring it up to date, I guess.

        Simon Rosenthal added a comment -

        With the hope that this can be committed to trunk soon, I updated the patch to work with the reorganized sources in trunk, and made a couple of other small changes so that the tests would compile.
        Some tests fail - I'm seeing
        [junit] ERROR: SolrIndexSearcher opens=30 closes=28
        [junit] junit.framework.AssertionFailedError: ERROR: SolrIndexSearcher opens=30 closes=28

        in the ScriptUpdateProcessorFactoryTest

        Jan Høydahl added a comment -

        As a Solr 4 (alpha) release is getting more realistic, perhaps we should just aim for this as a 4.0-only feature and go with a clean Java 6 requirement, which makes everything simpler!

        Erik Hatcher added a comment -

        Here's a patch updated to current trunk.

        Erik Hatcher added a comment -

        I'm picking this much needed issue up. I've updated Simon's last patch to trunk. I even did a quick (commented out in ScriptEngineTest) test to see if JRuby worked, and it did. I haven't looked yet at what needs to be removed/refactored since we can assume Java 6 now, but that'll get cleaned up before committing.

        What else is needed here?

        Andrzej Bialecki added a comment -

        Erik, good stuff, very much needed!

        My comments relate to a use case that would require extending this functionality to be able to re-configure the processor on the fly (without a core reload) and to support non-file based scripts (e.g. parsing scripts from SolrParams). Currently these classes are not extensible at all, but only small modifications would be required to make them much more powerful and reusable:

        • I'd like to be able to re-configure SUPFactory programmatically and not via init args. The implementation appears to handle this by not throwing an exception if <script></script> exists and is empty. That's fine, but if it wasn't intentional then please don't block it
        • could you please change initEngines(req,rsp) and associated private classes (and members in SUPFactory) to protected? I'd like to be able to customize these parts, too.
        • actually, I don't like ScriptFile at all, and it's baked into EngineInfo so that I can't get rid of it - in my use case I wouldn't have a file, and anyway the whole idea of re-reading actual files on each request is doubtful, in terms of performance... I'd rather see something like this:
        protected static class Script {
          private final String scriptName;
          private final String type;
          private final byte[] script;

          protected Script(String scriptName, String type, byte[] content) {
            ...
          }
          ...
        }

        I.e. to separate the actual content of the script from the way we obtain it, and do the reading as a part of init(args), so that it's customizable. The benefit of this is also that we would cache the script content so that no IO would be needed on each request. Then initEngines(..) could simply open a reader on the cached bytes to eval() the script.

        Erik Hatcher added a comment -

        Updated patch that removes the Java 5 support (removing a lot of now unnecessary code) and other fiddlings.

        Erik Hatcher added a comment -

        The latest patch also refactors the test case a bit, creating a common base class for useful stuff borrowed from the field mutating update processor tests.

        I'm having issues with the tests in the latest patches. Running "ant -Dtestcase=ScriptUpdateProcessorFactoryTest test" I get, after it "hangs" for a couple of minutes:

           [junit4] ERROR   0.00s | ScriptUpdateProcessorFactoryTest (suite)
           [junit4]    > Throwable #1: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=3 closes=1
        

        I've not yet figured out what is causing this, as the test class itself is fairly straightforward.

        Also, testMultipleScripts test case is a bit odd... setting one of the factories to enabled=false (so it's not really testing multiple, though awkwardly it actually is anyway for some reason).

        Still fiddling, but figured I'd post an updated patch in case anyone else has thoughts on these issues.

        Erik Hatcher added a comment -

        Updated patch, removing the "enabled" concept which wasn't working well for the tests only. Also fixed up tests that were not decrementing searcher count causing the test monitor to raise a red flag.

        Erik Hatcher added a comment -

        New patch, which allows script files to not have to implement every method.

        Erik Hatcher added a comment -

        With the latest patch, you can implement an update processor more simply (not having to implement every process* and finish() method), like this:

        function processAdd(cmd) {
          doc = cmd.solrDoc;
        
          doc.addField('foo_s', 'bar');
        }
        

        I think there needs to be a simpler mode such that a script can be assumed to be simply under processAdd (it's not going to be a common activity to need to implement a processAnythingElse) where the script is handed a SolrInputDocument cleanly and not (necessarily) having to see the AddUpdateCommand object at all.

        Thoughts on how that should work? (at the ScriptEngine level, it's cleaner to eval than it is to invoke a function). Maybe something like:

            <processor class="solr.ScriptUpdateProcessorFactory">
              <str name="add">add_doc.js</str>
            </processor>
        

        where add_doc.js is evaluated only during processAdd and handed a SolrInputDocument "doc" (along with the request and logger objects of course). We can still keep it such that a full "update processor" can be implemented with script to have the full power of all update processor methods.
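
        Under that proposal, add_doc.js might be nothing more than a few top-level statements (hypothetical sketch; "doc", "req" and "logger" are assumed to be bound as globals as described above):

        // add_doc.js - evaluated per added document in this proposed mode
        var title = doc.getFieldValue('title');
        if (title != null) {
          doc.addField('title_length_i', ('' + title).length);
        }
        logger.info('add_doc.js handled doc ' + doc.getFieldValue('id'));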

        Erik Hatcher added a comment -

        Is there really a need for multiple scripts specified for a single update processor? We could simplify the implementation and testing if we kept it at one script per ScriptUpdateProcessorFactory. I appreciate the flexibility that plural scripts offer, but practically speaking it's not really necessary, as an implementer could combine all script activities into one script file, or specify multiple ScriptUpdateProcessorFactory instances.

        Erik Hatcher added a comment -

        Just FYI, with the current patch, I've gotten the following to work:

          <updateRequestProcessorChain name="script">
            <processor class="solr.ScriptUpdateProcessorFactory">
              <str name="scripts">updateProcessor.js</str>
            </processor>
            <processor class="solr.LogUpdateProcessorFactory" />
            <processor class="solr.RunUpdateProcessorFactory" />
          </updateRequestProcessorChain>
        
        
          <updateRequestProcessorChain name="ruby">
            <processor class="solr.ScriptUpdateProcessorFactory">
              <str name="scripts">updateProcessor.rb</str>
            </processor>
            <processor class="solr.LogUpdateProcessorFactory" />
            <processor class="solr.RunUpdateProcessorFactory" />
          </updateRequestProcessorChain>
        
          <updateRequestProcessorChain name="python">
            <processor class="solr.ScriptUpdateProcessorFactory">
              <str name="scripts">updateProcessor.py</str>
            </processor>
            <processor class="solr.LogUpdateProcessorFactory" />
            <processor class="solr.RunUpdateProcessorFactory" />
          </updateRequestProcessorChain>
        

        using the following scripts:

        JavaScript

        function processAdd(cmd) {
          doc = cmd.solrDoc;
        
          doc.addField('foo_s', 'bar');
        }
        

        JRuby

        def processAdd(cmd)
          doc = cmd.solrDoc
        
          doc.addField('foo_s', 'bar')
        
          $logger.info("Added field to #{doc}")
        end
        

        Jython

        def processAdd(cmd):
          doc = cmd.solrDoc
          doc.addField('foo_s', 'bar')
        

        I simply added jruby.jar and jython.jar to my local lucene/solr/lib directory and rebuilt the example. Pretty cool!

        Erik Hatcher added a comment -

        Another TODO is to get this to work with a scripting language implementation JAR file being added as a "plugin" somehow. I'm not yet sure what it'd take to be able to keep solr.war as-is, and have an external jruby.jar picked up. Anyone with insight into how to make this work?

        Lance Norskog added a comment -

        I simply added jruby.jar and jython.jar to my local lucene/solr/lib directory and rebuilt the example.

        Next post, 3 minutes later:

        Another TODO is to get this to work with a scripting language implementation JAR file being added as a "plugin" somehow. I'm not yet sure what it'd take to be able to keep solr.war as-is, and have an external jruby.jar picked up.

        Doesn't the first comment answer the second? What other deployment are you describing?

        Erik Hatcher added a comment -

        Doesn't the first comment answer the second? What other deployment are you describing?

        No, it doesn't. lucene/solr/lib (we're talking about a source checkout dir structure here) is a build time directory, not used at run-time. Dropping those jars in there causes them to get built ("and rebuilt the example", i.e. ran "ant example") into solr.war.

        Lance Norskog added a comment -

        So, parking them into solr.home/lib or solr.home/cores/collection/lib does not work? The scripting engine will not be dynamically loaded?

        Erik Hatcher added a comment -

        So, parking them into solr.home/lib or solr.home/cores/collection/lib does not work? The scripting engine will not be dynamically loaded?

        My comment above still stands: "Another TODO is to get this to work with a scripting language implementation JAR file being added as a "plugin" somehow. I'm not yet sure what it'd take to be able to keep solr.war as-is, and have an external jruby.jar picked up.".

        In other words, a TODO. I didn't try it, but I'm fairly certain there's no way that'd work currently, as somehow we'd need to incorporate the SolrResourceLoader into the picture to see that "classpath" for anything we load. Surely we'll need some SolrResourceLoader#newInstance action, or something. I dunno yet. Wanna iterate on this patch with me some, Lance? Feel free to take some stabs at it.

        Hoss Man added a comment -

        My comment above still stands: "Another TODO is to get this to work with a scripting language implementation JAR file being added as a "plugin" somehow.

        I played around with this on the train today and confirmed that we can do runtime loading of jars that include script engines if we change the ScriptEngineManager instantiation to use the one-arg constructor and pass in resourceLoader.getClassLoader().

        A few other notes based on reviewing the patch and playing around with it. Barring objections i'll probably take a stab at addressing these tomorrow or Friday...

        • i don't see any mechanism for scripts to indicate that processing should "stop" – ie: the way a java UpdateProcessor would just return w/o calling super.foo. we should add/test/doc some functionality to look at the result of the invokeFunction call to support this
        • the tests seem to assert that the side effects of the scripts happen (ie: that the "testcase" records the function names) but i don't see any assertions that the expected modifications of the update commands are happening (ie: that documents are being modified in processAdd)
        • we need to test that request params are easily accessible (i'm not sure how well the SolrQueryRequest class works in various scripting languages, so we might need to pull out the SolrParams and expose them directly - either way we need to demonstrate doing it in a test) – see the sketch after this list
        • whitespace/comma/pipe splitting of the script names is a bad meme. we should stop doing that and require that multiple scripts be specified as multiple <str> params
          • we can add convenience code to support <arr name="script"><str><str></arr> style as well
        • "ScriptFile" and it's extension parsing is very primitive and broken on any file with "." in it's name. We should just use the helper method for parsing filename extensions that already exists in commons-io
        • from what i can tell looking at the ScriptEngine javadocs, it's possible that a ScriptEngine might exist w/o a specific file extension, or that multiple engines could support the same extension(s) we should offer an init param that lets the user specify a ScriptEngine by "shortname" to override whatever extension might be found
        • currently, problems with scripts not being found, or engines for scripts not being found, aren't reported until first request tries to use them - we should error check all of this in init (or inform) and fail fast.
          • ditto for the assumption in invokeFunction that we can cast every ScriptEngine to Invocable – we should at check this on init/inform and fail fast
        • the way the various UpdateProcessor methods are implemented to be lenient about any scripts that don't explicitly implement a method seems kludgy – isn't there anyway we can introspect the engine to ask if a function exists?
          • in particular, when i did some testing with jruby, i found that it didn't work at all - i guess jruby was throwing a ScriptException instead of NoSuchMethodException?
        undefined method `processCommit' for main:Object (NoMethodError)
        org.jruby.embed.InvokeFailedException: (NoMethodError) undefined method `processCommit' for main:Object
        	at org.jruby.embed.internal.EmbedRubyObjectAdapterImpl.call(EmbedRubyObjectAdapterImpl.java:403)
        	at org.jruby.embed.internal.EmbedRubyObjectAdapterImpl.callMethod(EmbedRubyObjectAdapterImpl.java:189)
        	at org.jruby.embed.ScriptingContainer.callMethod(ScriptingContainer.java:1386)
        	at org.jruby.embed.jsr223.JRubyEngine.invokeFunction(JRubyEngine.java:262)
        	at org.apache.solr.update.processor.ScriptUpdateProcessorFactory$ScriptUpdateProcessor.invokeFunction(ScriptUpdateProcessorFactory.java:221)
        	at org.apache.solr.update.processor.ScriptUpdateProcessorFactory$ScriptUpdateProcessor.processCommit(ScriptUpdateProcessorFactory.java:202)
        
        Hoss Man added a comment -

        Updated patch with lots of changes, here's the notes i kept as i went along...

        • updated patch to trunk
        • added classloader to ScriptEngineManager constructor
        • fixed InputStreamReader to use UTF-8
        • fixed Reader usage to ensure it's closed
        • fixed file extension parsing using commons-io
        • renamed some test files to make it more clear at a glance what they were for
        • more tests and asserts to verify that changes are working (including adding field values)
        • added logic to allow script functions to return "false" to end processing of this command (see the sketch after this list)
        • fixed init to call super.init
        • removed splitting of string "scripts" init param, replaced with multivalued "script" init param
        • generallize "params" to allow any init param type (not just NamedList)
        • add support for forcing engine name
        • add details about supported engines if we can't find one by extension/name
        • updated javadocs
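
        To illustrate the "return false" item above, here is a sketch (not taken from the patch) of a processAdd that drops documents lacking an id; returning false ends processing of the command instead of passing it down the chain:

        function processAdd(cmd) {
          var doc = cmd.solrDoc;
          if (doc.getFieldValue('id') == null) {
            logger.warn('skipping document with no id');
            return false; // later processors (and the index) never see this command
          }
          // returning nothing lets the command continue down the chain
        }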

        Things i still want to deal with:

        • make test assume() javascript/js engine is available - from what i can tell JREs aren't required to support it
        • check engine(s) & scripts exist in inform
        • add tests of bad configs (bad engine name, bad extension)
        • polish javadocs
        • deal with problem of processor functions that don't exist in script.

        The last one is a biggee. from what i can tell, the ScriptEngine api doesn't give us any sort of "reflection" and the more i think about it, the more silently ignoring when a function doesn't exist seems like a really bad idea – what if the script has a function name with a typo in it? what if a stray "cat" command causes someome to accidently truncate a script file to 0 bytes? (so it's still a legal script, but doesn't have any functions?) ... i think we should fail hard if any method used isn't found – so people have to implement every method that they use (ie: if you never use "rollback" it's no big deal that you don't have a processRollback(...) method)

        Robert Muir added a comment -

        fixed InputStreamReader to use UTF-8

        I couldn't resist: I think instead of

        InputStream input = resourceLoader.openResource(fileName);
        return new InputStreamReader(input, "UTF-8");
        

        I would try something like

        InputStream input = resourceLoader.openResource(fileName);
        CharsetDecoder decoder = IOUtils.CHARSET_UTF_8.newDecoder()
                      .onMalformedInput(CodingErrorAction.REPORT)
                      .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new InputStreamReader(input, decoder);
        

        we could maybe add some sugar like this somewhere: I think we should be throwing exceptions on invalid encoding when possible. I can't stand how java silently defaults to replacing errors with \uFFFD

        Robert Muir added a comment -

        We do have sugar for this already actually: IOUtils.getDecodingReader(InputStream, Charset)

        Hoss Man added a comment -

        Updated patch, changes...

        • refactor TestBadConfig and BadIndexSchemaTest to use common base class
        • add TestBadConfig tests that bad script processor configs should fail on init
        • add initEngine call to inform so errors are triggered at startup
        • tweak SolrCore init error reporting to make the underlying msg more visible
        • tweak script processor error reporting to have better error msgs
        • test wrapping of errors thrown by the script functions
        • explicitly check that ScriptEngine is Invocable
        • make tests use assume() on existence of javascript engine since JVMs aren't guaranteed to support it
        • make processor fail hard if a method is missing from a script
        • revert unnecessary SolrResourceLoader change
        • fixed charset handling per rmuir's suggestions

        ...the only thing from my previous list i didn't get to today was polishing up the javadocs (still some TODOs in there about scoping)

        FWIW: Erik and i had a (VERY) brief exchange on IRC last night about the "NoSuchMethod" problem in which he pointed me to his earlier comment about a simpler syntax for just dealing with "processAdd" as its own script, which got me reading a lot of the older comments on this issue (i fully admit: i glossed over most of them when i first started looking at this issue a few days ago because when skimming it looked like they were mostly all about Java5 stuff), which got me thinking more about scoping and caching of ScriptEngine objects.

        In general, i think what we have here is definitely usable (if nothing else, then for rapid prototyping and experimentation), and provides a fairly direct representation of an "UpdateProcessor implemented as a script" - ie: isolated instances are constructed for each request (with no risk of multiple requests polluting each other) and individual functions/methods are called for each command in that request.

        But there are probably other approaches we could take, in slightly different directions, that deviate from the existing UpdateProcessor model to get other gains:

        • Construct ScriptEngine instances that are scoped to the processor factory (and live for the lifespan of the solr core), and use eval or invokeFunction for the individual methods, passing req, rsp, and cmd as args (requires the user to be more careful about variable usage in their script implementations)
        • let the user specify distinct script files for each method they want to implement and require that the ScriptEngines are "Compilable", so we can generate a CompiledScript during processor factory initialization, such that the CompiledScript's lifespan is the same as the solr core, and call CompiledScript.eval(Bindings) with Bindings containing the req/rsp/cmd in each process method – ie: the entire script file is a single function (should be much more efficient, but has a lot of preconditions, and like the above suggestion requires the user to be careful about the variables in their scripts)
        • etc...

        ...the bottom line being i think we should commit this, and iterate in other issues on other ways scripting could be exposed as update processors.

        with that in mind, the one pro-active thing i think we should still do here is to change the name to something like "StatelessScriptUpdateProcessorFactory" since the scripts are re-evaluated from scratch on every request. that will help us differentiate it from any alternative future factory implementations where we try to support compiling/caching/reusing scripts.

        thoughts?

        Hoss Man added a comment -

        Updated patch renaming the factory to StatelessScriptUpdateProcessorFactory and polishing the javadocs (mainly around explaining the scope for each script)

        plan to commit tomorrow

        Erik Hatcher added a comment -

        Hoss - I've been following along. Well done Hoss (and Robert) on the state of this now. +1. We can certainly iterate on the fine details, but this is a superb start. Thanks again for all the polish and cleanup that has developed. I'm going to be trying this out fully and reporting back. Let's start some wiki documentation on this too.

        Erik Hatcher added a comment -

        Let's start some wiki documentation on this too.

        For starters, I put a ScriptUpdateProcessor section here, but it'll deserve its own page and then some probably. - http://wiki.apache.org/solr/UpdateRequestProcessor

        Erik Hatcher added a comment -

        Hoss - in StatelessScriptUpdateProcessorFactory, there's this:

              Reader scriptSrc = scriptFile.openReader(resourceLoader);
        
              try {
                engine.eval(scriptFile.openReader(resourceLoader)
        ...
        

        Are those two consecutive openReader calls redundant or needed?

        Andrzej - does ScriptFile now address your needs with it being ResourceLoader savvy? I'm not sure it does - I think you were asking for scripts to be loaded from the factory configuration directly? Just checking if we're still missing a use case Andrzej had in mind.

        Erik Hatcher added a comment -

        I'm not fond of a script having to implement all the functions, which seems to be the way the latest patch works. I tried a JRuby script that simply has this:

        def processAdd(cmd)
          doc = cmd.solrDoc
        
          doc.addField('foo_s', 'bar')
        
          $logger.info("Added field to #{doc}")
        end
        

        Which fails because "finish" isn't found.

        It's deceiving, because there is a test case to go along with conditional.updateprocessor.js, which just has processAdd, but the test case only calls processAdd, not a full document update from the outside. So at the very least, the .js scripts should all be fully fleshed out to what would work for real. But I really think we should default to a no-op on any method that doesn't exist when we try to invoke it. Is that so bad?

        Hoss Man added a comment -

        Are those two consecutive openReader calls redundant or needed?

        that was a very bad bug, thank you for catching that.

        Andrzej - does ScriptFile now address your needs with it being ResourceLoader savvy? I'm not sure it does - I think you were asking for scripts to be loaded from the factory configuration directly? Just checking if we're still missing a use case Andrzej had in mind.

        We are most definitely missing what ab was asking about – i was alluding to his comments when i mentioned that we should probably iterate on other script-based update processors that scope things differently and have longer life cycles. The fact that we do not attempt to cache the bytes of the file is one way this processor is "stateless": you can actually edit the script files w/o needing a core reload and your changes automatically get picked up.

        The majority of what is in this factory is really around parsing init args and dealing with script files. The amount of code needed to interact with the ScriptEngineManager and ScriptEngines is so tiny it seems like what he's describing could be done just as easily with a completely distinct processor, rather than by worrying about extending this one – hence i didn't worry about making ScriptFile or EngineInfo public (but who knows, maybe if/when we iterate on other versions, we'll find that a lot of this can be refactored into a base class)

        So at the very least, the .js scripts should all be fully fleshed out to what would work for real.

        good catch .. i thought i already did that.

        But I really think we should default to no-op on all methods that don't exist when tried to invoke. Is that so bad?

        I think it would be horrific. Consider a small typo in your example...

        def processadd(cmd)
          doc = cmd.solrDoc
        
          doc.addField('foo_s', 'bar')
        
          $logger.info("Added field to #{doc}")
        end
        

        ...it will silently do nothing, which is terrible.

        I would rather people have to write a few no-op methods and get good errors than rip their hair out – or suffer data loss – because they have a subtle typo in a function name that they don't notice

        Updated patch fixes the double reader open and adds the NOOP methods to all example scripts (except for "missing.functions.updateprocessor.js" because that's the entire point of the file)
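
        For reference, a fully fleshed-out JavaScript script along those lines might look like the following sketch – the function names mirror the UpdateRequestProcessor methods (assumed here to be the full set the processor invokes), with real logic only in processAdd and explicit no-ops everywhere else:

        function processAdd(cmd)          { cmd.solrDoc.addField('foo_s', 'bar'); }
        function processDelete(cmd)       { }
        function processMergeIndexes(cmd) { }
        function processCommit(cmd)       { }
        function processRollback(cmd)     { }
        function finish()                 { }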

        Erik Hatcher added a comment -

        I would rather people have to write a few no-op methods and get good errors than rip their hair out – or suffer data loss – because they have a subtle typo in a function name that they don't notice

        No argument there. I think this just demands that we implement an add-only type of capability such that the entire script is implicitly inside a processAdd call. The most common use case here is processAdd, and we should streamline it to be easy and less error-prone for the common needs (why even have a "processAdd" method wrapper at all if we're aiming for less hair pulling?). We can iterate on that after a commit of what's there though.... having a full update processor via script is great too, for sure.

        Erik Hatcher added a comment -

        Do we really need to support multiple scripts inside a definition of a (Stateless)ScriptUpdateProcessorFactory? It just seems like added looping – why not just define two different StatelessScriptUpdateProcessorFactory instances, each with an individual script? (or, if these were my scripts, combine their logic into a single script)

        Lance Norskog added a comment -

        What if I have a main file and a library file? How would that work?

        David Smiley added a comment -

        +1 to Erik's proposal of streamlining the common case of needing just processAdd() – I shouldn't be required to even define a processAdd() method in my script for this common case. I'd rather not have subclasses for this. Perhaps the configuration in solrconfig for this processor would have a processAddScript="myscript.rb" and a processDeleteScript= ... etc. as an alternative to script="script.rb" – if you use the one-file script then you are required to define each method. And, how about if scriptMethods="processAdd,processCommit" then you just need to define the ones you list.

        Like Erik, I too have been following the awesome progress here as of late – thanks for your hard work Hoss (and Erik before him). It has come far.

        Hoss Man added a comment -

        I think this just demands that we implement an add-only type of capability such that the entire script is implicitly inside a processAdd call.

        ...

        Perhaps the configuration in solrconfig for this processor would have a processAddScript="myscript.rb" and a processDeleteScript= ... etc. as an alternative to script="script.rb"

        That's all fine and good and i have no objection to any of it – but as i tried to explain before, those alternative ideas still have a raft of questions related to what the lifecycle of the scripts should be, what the bindings should be for the relevant objects (SolrQueryRequest, AddDocCmd, etc...) and how they should be evaluated (CompiledScript vs a script that is evaled on every processAdd, etc...).

        Hence my point that i think we should commit this as "StatelessScriptUpdateProcessorFactory", where the script processing mirrors the lifecycle of a native java "UpdateProcessor", and iterate with other approaches, using other factories, in other jira issues – if we can refactor common stuff out then great, but we shouldn't try to overthink generalizing the internals of this implementation in anticipation of a hypothetical future class that will likely be just as easy to write independently.

        Do we really need to support multiple scripts inside a definition of a (Stateless)ScriptUpdateProcessorFactory? It just seems like added looping – why not just define two different StatelessScriptUpdateProcessorFactory instances, each with an individual script? (or, if these were my scripts, combine their logic into a single script)

        good question. That looping code was already there when i started looking at the patch – i left it in mainly because:

        • it was already written and didn't overly complicate things
        • it seemed like it would probably be easier/simpler for a lot of users to just add a <str name="script">foo.js</str> when they wanted to add a script than to add an entire new <processor>...</processor>
        • we use a single ScriptEngineManager per request, per UpdateProcessor instance. In theory it will be more efficient for some languages to generate ScriptEngines for each script from the same ScriptEngineManager than from distinct ScriptEngineManagers (ie: imagine if your scripting language was Java: configuring two scripts in a single <processor> means you spin up one JVM per request; if you put each script in its own <processor> you spin up 2 JVMs per request)
        • according to the javax.script javadocs, because we use a single ScriptEngineManager per request, in theory any variable in "global" scope will be common across all the script files (for that request). (In my JVM, this doesn't work for multiple javascript scripts that try to refer to the same global vars; no idea if other javax.script implementations support it)

        What if I have a main file and a library file? How would that work?

        No freaking clue...

        • The javax.script APIs provide no mechanism for Java code to specify that modules should be loaded before evaluating a script, or any way to configure where the engine should look for modules if a script attempts to load them using its own native syntax
        • javascript doesn't even have a language mechanism (that i know of) for a script file to specify that it wants to "import" another file/script, so i don't even know of a decent way to test what happens if you try it in a language that does ... (ie: will it try to use the ScriptEngineManager's classloader? will it try to read from the system default path for that language's libs?)
        Hoss Man added a comment -

        patch updated to trunk.

        i plan to commit & backport to 4x in the next 24 hours.

        i'll also create some other (related) jiras for tracking ideas discussed here that didn't make it into this "StatelessScriptUpdateProcessorFactory", but i don't promise i'll catch them all – folks should feel free to open & link issues on their own with the things they are most concerned about.

        Erik Hatcher added a comment -

        i plan to commit & backport to 4x in the next 24 hours.

        Hoss - you go! Thank you for wrangling this one and polishing out the pedantic details needed to get it to this state. Way +1.

        Hoss Man added a comment -

        Committed revision 1360931. trunk
        Committed revision 1360952. 4x

        Mikhail Khludnev added a comment -

        Hoss,

        Congrats! Btw, have you seen one more cool update processor factory, SOLR-3585?

        Erik Hatcher added a comment -

        Committed an example script/configuration to trunk (r1366588) and 4_x (r1366589).

        Steve Rowe added a comment -

        The tests committed here 3 weeks ago have never succeeded under the Jenkins trunk and branch_4x maven builds. (For some reason failure notification emails aren't making it to the dev list.) E.g. https://builds.apache.org/job/Lucene-Solr-Maven-trunk/554/. Javascript engine appears to not be found. I don't understand why this would be the case, though, since the Ant tests succeed running under the same JVM.

        Erik Hatcher added a comment -

        The tests committed here 3 weeks ago have never succeeded under the Jenkins trunk and branch_4x maven builds.

        Odd. Maybe something mentioned here http://stackoverflow.com/questions/6558055/is-osgi-fundamentally-incompatible-with-jsr-223-scripting-language-discovery is relevant? I haven't found anything else seemingly relevant yet on this issue.

        Steve Rowe added a comment -

        Maybe something mentioned here http://stackoverflow.com/questions/6558055/is-osgi-fundamentally-incompatible-with-jsr-223-scripting-language-discovery is relevant?

        Thanks for looking Erik, but I'm not sure if it's relevant.

        I get an NPE on lucene.zones.apache.org using this program with the OpenJDK VM at /usr/local/openjdk6/ (the default javac/java on that box). Jenkins Java 6 jobs running Ant use this same VM (via /home/hudson/tools/java/latest1.6 -> openjdk6 -> /usr/local/openjdk6).

        I don't get it. How do these tests pass under Ant? I can't see any obviously named jars under solr/lib/ that would provide the javascript engine...

        Erik Hatcher added a comment -

        How do these tests pass under Ant?

        Maybe this is due to some libraries Ant itself is including in the classpath of the tests running?

        I'll go ahead and re-open this issue so it is red-flagged as something we should resolve before 4.0 final release.

        Perhaps we can include a scripting implementation in Solr, at least for testing purposes but maybe also to ship with, to ensure this works out of the box on all JVMs. jruby.jar would be nice to always have handy.

        Erik Hatcher added a comment -

        re-opening to have the tests (specifically the failing Maven run) looked at.

        Erik Hatcher added a comment -

        marking the re-opening as "critical" to fix, hopefully at least before 4.0 final.

        Uwe Schindler added a comment -

        Hi, this is not a problem at all. OpenJDK on FreeBSD contains no scripting engine, so it was added to Ant's lib path; this is why it works under Ant on the FreeBSD Jenkins. Rhino is the JavaScript engine, missing in OpenJDK builds for legal reasons. Rhino is shipped with the official JDKs and is mandatory, so that's just a FreeBSD packaging issue. Steve should add it to the Maven builds, too.

        You can resolve the issue.

        Steve Rowe added a comment -

        Thanks Uwe, I'll add Rhino to the Maven builds.

        Steve Rowe added a comment -

        OpenJDK on FreeBSD contains no scripting engine, so it was added to Ant's lib path.

        How? I've found the necessary jars, at /usr/home/hudson/tools/java/openjdk-missing-libs/, but I can't see how Ant's lib path includes them. I looked at ~hudson/.profile, and lib/ and bin/ant under /usr/home/hudson/tools/ant/apache-ant-1.8.2 - none of these refer to the directory containing js.jar and script-js.jar.

        I'm asking because I'd like to set Maven up similarly to Ant.

        Robert Muir added a comment -

        I think they are added to ~hudson/.ant/lib ?

        Robert Muir added a comment -
        [rcmuir@lucene /home/hudson/.ant/lib]$ ls -la
        total 1843
        drwxr-xr-x  2 hudson  hudson       5 Mar 30 15:46 .
        drwxr-xr-x  3 hudson  hudson       8 May 13 12:41 ..
        -rw-r--r--  1 hudson  hudson  947592 Mar 30 15:45 ivy-2.2.0.jar
        -rw-r--r--  1 hudson  hudson  701049 Jul 27  2006 js.jar
        -rw-r--r--  1 hudson  hudson   34607 Oct 16  2006 script-js.jar
        
        Steve Rowe added a comment -

        Thanks Robert, I see them now.

        Hoss Man added a comment -

        I (think i) fixed the assumptions in these tests to actually skip properly if the engines aren't available...

        Committed revision 1369874. - trunk
        Committed revision 1369875. - 4x
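
        The general shape of such a guard (a sketch only, with illustrative class and method names, not necessarily the code committed in r1369874/r1369875) is a JUnit assumption that turns a missing engine into a skipped test rather than a failure:

        import javax.script.ScriptEngineManager;
        import org.junit.Assume;
        import org.junit.Test;

        public class ScriptEngineAwareTest {
            @Test
            public void testJavascriptUpdateScript() {
                // Skip (rather than fail) on JVMs that ship no JavaScript engine.
                Assume.assumeNotNull(new ScriptEngineManager().getEngineByName("javascript"));
                // ... the actual test logic would run here ...
            }
        }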

        Steve Rowe added a comment -

        After Hoss's commits, both ASF Jenkins Maven jobs have run, and under both jobs, tests that previously were failing under Maven due to the lack of a javascript engine in the classpath are now being skipped.

        After those jobs started, I committed a change to dev/nightly/common-maven.sh that includes the two rhino jars in the Maven JVM boot class path: r1369936.

        I've enqueued the Maven jobs again on ASF Jenkins.

        Uwe Schindler added a comment -

        I think this is all fine:

        • The Java 6 spec requires a JavaScript engine to be shipped with the JDK; it is just missing from FreeBSD's package (there is an issue open upstream). If JavaScript is not there for Java 6, it is like a missing UTF-8 charset.
        • I strongly -1 shipping with additional scripting engines. No need for that. If user Foo wants to script Solr with engine Bar, he can add the SPI jar to the classpath. No need to ship it. This is why SPI was invented!

        The only other thing we should maybe fix is to set Solr's classloader as the context classloader, too. SPIs cannot be loaded from $SOLR_HOME/lib, because the context classloader does not see those jars. We fixed that for the codec and analyzer SPI jars in Solr, but the most correct solution would be to let Solr's threads see the ResourceLoader as the context classloader. Then you could add scripting engines, XML parsers, charset providers, locales, ... just like plugins, codecs, or analyzer factories, into the Solr home's lib folder without adding them to the WAR.
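
        A hypothetical sketch of that idea (not the actual SOLR-3716 change; the class and method names are illustrative): the no-arg ScriptEngineManager constructor discovers engines through the thread context classloader, so temporarily pointing that at Solr's resource loader would let engine jars dropped into $SOLR_HOME/lib be found.

        import javax.script.ScriptEngine;
        import javax.script.ScriptEngineManager;
        import org.apache.solr.core.SolrCore;

        class ContextClassLoaderSketch {
            static ScriptEngine lookupWithCoreLoader(SolrCore core, String engineName) {
                ClassLoader solrLoader = core.getResourceLoader().getClassLoader();
                Thread t = Thread.currentThread();
                ClassLoader previous = t.getContextClassLoader();
                t.setContextClassLoader(solrLoader);
                try {
                    // SPI discovery now also sees jars under $SOLR_HOME/lib.
                    return new ScriptEngineManager().getEngineByName(engineName);
                } finally {
                    t.setContextClassLoader(previous);
                }
            }
        }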

        Hoss Man added a comment -

        Java 6 spec requires a JavaScript engine to be shipped with JDK

        i didn't know that ... i couldn't find anything in the docs that suggested certain engines were mandatory, hence the assumptions in the test (the Maven tests just indicated that those assumptions were broken)

        I strongly -1 shipping with additional scripting engines

        i didn't see anyone suggesting that ... no argument there.

        We should maybe only fix Solr's classloader to be set as context classloader, too.

        that sounds like an orthogonal issue ... great idea, didn't know it was possible, please go ahead and do it, but let's track it in its own issue

        Hoss Man added a comment -

        I strongly -1 shipping with additional scripting engines

        i didn't see anyone suggesting that ... no argument there.

        sorry .. i overlooked that part of Erik's comment .. i'm with Uwe: let's let users add their own script engines as plugins

        Uwe Schindler added a comment - edited

        Hoss, you are right: it is not required that JS is available. The Java 6 spec (http://download.oracle.com/otndocs/jcp/j2se-1.6.0-pr-spec-oth-JSpec/) says:

        JSR 223: Scripting for the Java Platform
        A large percentage of Java developers also use scripting languages. While the Java language is suitable for many tasks, and especially for writing robust, long-lived applications, scripting languages are useful for many other tasks.

        JSR 223 defines a framework for connecting interpreters of arbitrary scripting languages to Java programs. It includes facilities for locating the available scripting engines, invoking scripts from Java code and vice versa, and making Java application objects visible to scripts. The framework is divided into two parts, the Scripting API and an optional Web Scripting Framework. This feature will incorporate just the Scripting API into this version of the Java SE platform.

        There will be no requirement that any particular scripting language be supported by the platform; implementors may choose to include support for the scripting language(s) of their choice as they see fit.

        [ JSR 223; javax.script ]

        But all JDKs on all platforms except FreeBSD contain one. So we should have the "error messages" printed on failure to look up the engine, and the assumption in the test, as you committed.

        But as Hoss says, too: no need to ship engines. It's just bloat, because there are millions of them.
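
        The explicit error message mentioned above could look roughly like this (an illustrative sketch with made-up names, not the factory's actual code):

        import javax.script.ScriptEngine;
        import javax.script.ScriptEngineManager;

        class EngineLookup {
            static ScriptEngine requireEngineForExtension(String ext) {
                ScriptEngine engine = new ScriptEngineManager().getEngineByExtension(ext);
                if (engine == null) {
                    // Fail loudly with a useful message instead of NPE-ing later.
                    throw new IllegalStateException("No JSR-223 ScriptEngine found for extension '" + ext
                        + "'; add an engine jar (e.g. Rhino for JavaScript) to the classpath");
                }
                return engine;
            }
        }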

        Uwe Schindler added a comment -

        I opened SOLR-3716.

        Steve Rowe added a comment - edited

        On IRC, Uwe suggested adding the Rhino jars to $JAVA_HOME/jre/lib/ext/ on the FreeBSD ASF Jenkins lucene slave (lucene.zones.apache.org) to allow both Ant and Maven build jobs to enable scripting tests. I copied js.jar and script-js.jar from /usr/home/hudson/tools/java/openjdk-missing-libs/ to /usr/local/openjdk{6,7}/jre/lib/ext/, and the ScriptEngineTest tests under the Maven branch_4x job have succeeded, except for testJRuby(), which was skipped (as expected).

        I also removed js.jar and script-js.jar from ~hudson/.ant/lib/.

        Robert Muir added a comment -

        rmuir20120906-bulk-40-change

        Hoss Man added a comment -

        This feature (and the fixes to the tests that caused it to be re-opened) made it into 4.0-beta


          People

          • Assignee:
            Hoss Man
            Reporter:
            Uri Boness
          • Votes:
            10
            Watchers:
            13
