Solr / SOLR-8542

Integrate Learning to Rank into Solr

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.4, master (7.0)
    • Component/s: None
    • Labels: None

      Description

      This is a ticket to integrate learning to rank machine learning models into Solr. Solr Learning to Rank (LTR) provides a way for you to extract features directly inside Solr for use in training a machine learned model. You can then deploy that model to Solr and use it to rerank your top X search results. This concept was previously presented by the authors at Lucene/Solr Revolution 2015.
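
      As a minimal illustration (the myModel name here is a placeholder), a rerank request of the kind discussed in the comments below looks like:

          http://localhost:8983/solr/techproducts/query?q=test&fl=*,score&rq={!ltr model=myModel reRankDocs=25}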


      Solr Reference Guide documentation:

      Source code and README files:

      1. SOLR-8542-trunk.patch
        414 kB
        Michael Nilsson
      2. SOLR-8542-branch_5x.patch
        414 kB
        Michael Nilsson
      3. SOLR-8542.patch
        570 kB
        Christine Poerschke

        Issue Links

          Activity

          mnilsson Michael Nilsson added a comment -

          Attached the patch against trunk which contains our LTR code as a contrib module, plus a readme.md going over how to use it.

          mnilsson Michael Nilsson added a comment -

          Attached a patch for the ltr contrib module against branch_5x as well

          cpoerschke Christine Poerschke added a comment -

          Hello Joshua, Michael and Diego. Thanks for your patch for this new feature.

Just to say that I've started taking a look at yesterday's SOLR-8542-trunk.patch and have three simple observations so far, in no particular order:

          • Many of the lines in solr/contrib/ltr/README.txt are very long. Having said that, I do not know what the recommended maximum line length for README files is and am perhaps just using the wrong browser or editor to read.
          • Binary diff for solr/contrib/ltr/test-lib/jcl-over-slf4j-1.7.7.jar seems to form part of the patch, unintentionally so probably.
• Running ant validate after applying the patch locally points out 'tabs instead of spaces' and 'invalid logging pattern' for some of the files.

          (The https://en.wikipedia.org/wiki/Learning_to_rank page mentioned in the README.txt for reading up on learning to rank will be my commute reading.)

          elyograg Shawn Heisey added a comment -

          I personally use 80 columns for files like README.txt, but from other people's additions to CHANGES.txt, I know that others are using more. I am frequently viewing text files like this in ssh or on terminals, so I find lines longer than 80 characters to be annoying. For source code, I edit in an IDE more often than with vi, so longer lines are not really a problem there.

          githubbot ASF GitHub Bot added a comment -

          GitHub user diegoceccarelli opened a pull request:

          https://github.com/apache/lucene-solr/pull/217

          SOLR-8542: Integrate Learning to Rank into Solr

See https://issues.apache.org/jira/browse/SOLR-8542

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/bloomberg/lucene-solr trunk-learning-to-rank-plugin

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/217.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #217


          commit 336db4ccf6434e690a745a4af88b5d9c21edc25e
          Author: Diego Ceccarelli <dceccarelli4@bloomberg.net>
          Date: 2016-01-13T22:29:17Z

          SOLR-8542: Integrate Learning to Rank into Solr

          Solr Learning to Rank (LTR) provides a way for you to extract features
          directly inside Solr for use in training a machine learned model. You
          can then deploy that model to Solr and use it to rerank your top X
          search results. This concept was previously presented by the authors at
          Lucene/Solr Revolution 2015


          diegoceccarelli Diego Ceccarelli added a comment - - edited

Thanks Christine and Shawn for your comments.
The above patch for the current trunk fixes the problems that you highlighted:

          • now the README fits in 80 columns
          • ant validate works.
          • solr/contrib/ltr/test-lib/jcl-over-slf4j-1.7.7.jar is not part of the patch

          The patch also contains some example files and an explanation (reported in the JIRA description) on
          how to test the plugin on the techproducts example of Solr.

          ichattopadhyaya Ishan Chattopadhyaya added a comment -

          Exciting stuff!
Even though I haven't yet tried out the patch here, I was wondering how easy it would be to plug in some of the RankLib stuff in the future? There's SOLR-8183 for this. I was wondering if the framework developed here is generic enough to have some of those algorithms (and others) plugged in. Personally, I'm interested in the GBDT algorithm (since I've used that in a previous project) and MART seems close to that.

          diegoceccarelli Diego Ceccarelli added a comment - - edited

Hi Ishan, thanks for pointing out SOLR-8183, I didn't know about that; it seems quite related.
We can plug in RankLib by creating a new class representing the new LTR model, extending ModelMetadata, for example:

public class RankLibModel extends ModelMetadata {

  Ranker rankLibRanker;
  RankerFactory rankerFactory = new RankerFactory();
  DenseDataPoint documentFeatures = new DenseDataPoint(); // this constructor is missing, we will need a way to create a datapoint

  public RankLibModel(String name, String type, List<Feature> features,
      String featureStoreName, Collection<Feature> allFeatures,
      NamedParams params) {
    super(name, type, features, featureStoreName, allFeatures, params);
    // the file containing the model is a parameter
    String ranklibModelFile = getParams().getParam("model-file");
    // load the model
    rankLibRanker = rankerFactory.loadModel(ranklibModelFile);
  }

  @Override
  public float score(float[] modelFeatureValuesNormalized) {
    // set the feature vector in the datapoint object
    documentFeatures.setFeatureVector(modelFeatureValuesNormalized);
    // predict the score using the ranklib model
    return rankLibRanker.eval(documentFeatures);
  }
}

This code will load a particular RankLib model, using the file specified in the model store configuration.
If you send Solr a model configuration file like this:

{
    "type": "org.apache.solr.ltr.ranking.RankLibModel",
    "name": "ranklib-GBDT",
    "features": [
        {"name": "isInStock"},
        {"name": "price"},
        {"name": "originalScore"},
        {"name": "productNameMatchQuery"}
    ],
    "params": {
        "model-file": "/data/ranking/ranking-GBDT.txt"
    }
}

          The plugin will create a RankLib model by using the model in /data/ranking/ranking-GBDT.txt and you'll be able
          to use it at ranking time using its name ranklib-GBDT, adding the ltr param to the query:

          http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr model=ranklib-GBDT reRankDocs=25} 
          

At query time, the features isInStock, price, originalScore, and productNameMatchQuery will be computed and provided to the score(float[] modelFeatureValuesNormalized) method in order to get the new predicted score for each document. If RankLib's licence is compatible, I think we could plug this into the plugin. Any comments?
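
(For reference, a minimal sketch of sending such a model configuration to Solr might look like the following; the /schema/mstore endpoint and the file name are assumptions, based on the model store endpoint mentioned later in this ticket:)

    curl -XPUT 'http://localhost:8983/solr/techproducts/schema/mstore' \
      --data-binary @ranklib-GBDT-model.json \
      -H 'Content-type:application/json'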

          ajinkyakale Ajinkya Kale added a comment -

          +1 to RankLib inside this plugin. Will save re-implementations of LTR algorithms.

          upayavira Upayavira added a comment -

          Why mstore and fstore on the schema api? Can't we have schema/feature-store and schema/model-store? They are way more self-explanatory and make the LTR stuff that little bit more accessible.

          cpoerschke Christine Poerschke added a comment -

Hi Diego - thanks for the patch update:

          • I started looking at the code in https://github.com/apache/lucene-solr/pull/217.patch today but am still a bit undecided on how to best share comments, e.g. reviews.apache.org vs. github pull request comments vs. in this JIRA log vs. some other way? Code comments in this JIRA log would probably distract from what the new feature is about. reviews.apache.org seems to have a nicer diff than github but does it require extra step(s) after updating the github pull request (I have not used reviews.apache.org so far).
          • ticket cross-reference: LUCENE-6971 removed StorableField and StoredDocument yesterday/today (217.patch from the day-before-yesterday used them in a few places)
          teofili Tommaso Teofili added a comment -

I do not seem to be able to browse the PR at https://github.com/apache/lucene-solr/pull/217.patch. Is the attached patch supposed to be the one to review instead?

          diegoceccarelli Diego Ceccarelli added a comment -

Hi Tommaso, it was removed during the transition from svn to git. We'll reopen the PR today.

          githubbot ASF GitHub Bot added a comment -

          GitHub user diegoceccarelli opened a pull request:

          https://github.com/apache/lucene-solr/pull/4

          SOLR-8542: Integrate Learning to Rank into Solr

          Solr Learning to Rank (LTR) provides a way for you to extract features
          directly inside Solr for use in training a machine learned model. You
          can then deploy that model to Solr and use it to rerank your top X
          search results. This concept was previously presented by the authors at
          Lucene/Solr Revolution 2015

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/bloomberg/lucene-solr master-ltr-plugin-rfc

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/4.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #4


          commit 1bee2ad0ce64b2f091e34f7fb42e00387616c987
          Author: Diego Ceccarelli <dceccarelli4@bloomberg.net>
          Date: 2016-01-13T22:29:17Z

          SOLR-8542: Integrate Learning to Rank into Solr

          Solr Learning to Rank (LTR) provides a way for you to extract features
          directly inside Solr for use in training a machine learned model. You
          can then deploy that model to Solr and use it to rerank your top X
          search results. This concept was previously presented by the authors at
          Lucene/Solr Revolution 2015


          mnilsson Michael Nilsson added a comment -

          We have reopened the pull request now into master, which used to be trunk before the svn->git conversion. Next week we will start addressing the comments posted thus far in the ticket as well.

          cpoerschke Christine Poerschke added a comment -

Hello. Just a quick note to say that I'm resuming actively looking at this ticket, today focused mainly on the solr/contrib/ltr/src/java/org/apache/solr/ltr/rest classes.

          code comments/questions:

          • In ManagedFeatureStore and ManagedModelStore the doDeleteChild method makes no storeManagedData method call - oversight?
          • ManagedFeatureStore.doGet throws an exception when the childId concerned is not present, might it just return a response without features?
• The ManagedResource.doPut->ManagedFeatureStore.applyUpdatesToManagedData->update->addFeature calling chain, it seems, could throw an exception when a name being updated/added already exists. The REST wikipedia page mentions PUT and DELETE being idempotent - should repeats of the same name simply replace the existing entry for that name?

          observations (question to follow):

          • ManagedFeatureStore.addFeature calls NameValidator.check and could throw an InvalidFeatureNameException exception
          • ManagedFeatureStore.createFeature would throw an exception if Class.forName(type) finds no class or f.init(name, params, id) throws an exception
          • ManagedModelStore.applyUpdatesToManagedData->update->makeModelMetaData throws an exception when the data has no features field or when there are other 'invalid input' type problems
          • LTRComponent uses ManagedFeatureStore and ManagedModelStore
          • LTRQParserPlugin uses ManagedModelStore, and ManagedModelStore in turn uses ManagedFeatureStore

question (for everyone, and perhaps more REST than LTR related really):

          • To what extent should the REST/ManagedResource class be only representing state and/or to what extent should it also contain 'invalid input' type logic and associated error handling?
          • If the represented state could be logically valid as well as invalid, might the state representation and use of the represented state be separated out, perhaps something along these lines in LTRComponent.inform(SolrCore core)?
          core.getRestManager().addManagedResource(LTRParams.FSTORE_END_POINT, ManagedFeatureStoreInfo.class);
          ManagedFeatureStoreInfo fri = (ManagedFeatureStoreInfo) core.getRestManager().getManagedResource(LTRParams.FSTORE_END_POINT);
          
          core.getRestManager().addManagedResource(LTRParams.MSTORE_END_POINT, ManagedModelStoreInfo.class);
          ManagedModelStoreInfo mri = (ManagedModelStoreInfo) core.getRestManager().getManagedResource(LTRParams.MSTORE_END_POINT);
          
          LTRModelStore ltr_ms;
          try {
            ltr_ms = new LTRModelStore(fri, mri);
          } catch ... {
            // exception handling here
          }
          // TODO: do something here so that ltr_ms is available to LTRQParserPlugin
          // question: would feature store and model store changes still propagate through to ltr_ms?
          
          cpoerschke Christine Poerschke added a comment -

          Continued looking at this ticket's patch/pull request - cool stuff! Comments and questions to follow. Thank you.

          cpoerschke Christine Poerschke added a comment -

The branch behind the https://github.com/apache/lucene-solr/pull/4 above is master-ltr-plugin-rfc and I've just created a master-ltr-plugin-rfc-cpoerschke-comments branch off that.

In (unrelated) SOLR-8621 we also had an in-progress branch, and its usage and intentions emerged and were clarified over time; based on that, perhaps it's helpful to suggest usage up-front here:

          • master-ltr-plugin-rfc branches off (Jan 29th) master
          • master-ltr-plugin-rfc-cpoerschke-comments branches off (Feb 24th) master-ltr-plugin-rfc
          • 'git merge' and 'git rebase' and 'git --force push' will be avoided
          • further commits to master-ltr-plugin-rfc* are anticipated
          • 'git cherry-pick' of changes from master to master-ltr-plugin-rfc* will be done where helpful (e.g. SOLR-8600 was cherry-picked from master to master-ltr-plugin-rfc-cpoerschke-comments)
          • cherry-picking between master-ltr-plugin-rfc* branches welcome and will be done where helpful
          • at some point in the future activity on master-ltr-plugin-rfc* branches will cease and if required a new (say) master-ltr-plugin-rfc-march branch off (Mar 1?th) master will be created
          • at the very end everything will be squashed and rebased onto latest master and then committed as a single commit

          Does that sound workable or too complicated? Alternatives, comments, etc. welcome as usual. (And to clarify, suggested usage here is specific for this SOLR-8542 ticket only, any general recommended usage type discussions would be for elsewhere.)
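
(For concreteness, the cherry-pick step mentioned in the list above would be along these lines, with a placeholder commit hash:)

    git checkout master-ltr-plugin-rfc-cpoerschke-comments
    git cherry-pick <sha-of-the-SOLR-8600-commit-on-master>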

          cpoerschke Christine Poerschke added a comment -

          Question related to Feature Engineering - is that the right term? - and feature extraction.

          LTRQParserPlugin.java#L117 mentions

          For training a new model offline you need feature vectors, but dont yet have a model.

and README.txt#L280 mentions, for now, using a dummy model, e.g.

          fv=true&fl=*,score,[features]&rq={!ltr model=dummyModel reRankDocs=25}

          to extract features.

          If it is known already, could you outline what the replacement for the above fv/fl/dummyModel combination is likely to look like?

          Semi-related to that:

          • would the efi.* parameters move out of the rq then since candidate features to be returned in the response might reference external feature info?
          • might it be useful to have optional version and/or comment string elements in the feature and model JSON format? Illustration:
            {
              "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
              "name":  "documentRecency",
              "comment": "Initial version, we may have to tweak the recip function arguments later.",
              "params": {
                  "q": "{!func}recip( ms(NOW,publish_date), 3.16e-11, 1, 1)"
              }
            }
            ...
            {
                "type":"org.apache.solr.ltr.ranking.RankSVMModel",
                "name":"myModelName",
                "version": "1.0",
                "comment": "features and parameters determined using XYZ with ABC data, ticket reference: 12345",
                "features":[
                    ...
                ],
                "params":{
                    ...
                }
            }
            
          cpoerschke Christine Poerschke added a comment -

          Question related to the optional "store" element in the features and model JSON.

          Could you clarify/outline when/how the "store" element would be used? Illustration:

          ###### features.json
          [
          {
            "name":"isBook",
            # absence of "store" element means default store
            "type":"org.apache.solr.ltr.feature.impl.OriginalScoreFeature",
            "params":{}
          },
          {
            "name": "isBook", # same feature name but different store (and different type and/or params)
            "store": "someStore",
            "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
            "params":{ "fq": ["{!terms f=category}book"] }
          }
          ]
          ...
          ###### model.json
          {
              "type":"org.apache.solr.ltr.ranking.RankSVMModel",
              "name":"myModelName",
              "name":"myStore", # can this model reference features from another store (in this example assume the myStore store has no isBook feature)?
              "features":[
                  { "name": "userTextTitleMatch"},
                  { "name": "originalScore"},
                  { "name": "isBook"}
              ],
              "params":{
                  "weights": {
                      "userTextTitleMatch": 1.0,
                      "originalScore": 0.5,
                      "isBook": 0.1
                  }
          
              }
          }
          

          Are feature and model stores local to each solr config or can they be shared across configs? Illustration:

          ###### extract from zookeeper data:
          /collections 
           /collections/collection1
           DATA:
               {"configName":"configA"}
           /collections/collection2
           DATA:
               {"configName":"configB"}
          
          /configs
           /configs/configA
            /configs/configA/solrconfig.xml
            /configs/configA/schema.xml
           /configs/configB
            /configs/configB/solrconfig.xml
            /configs/configB/schema.xml
          
          ???/features.json
          ???/model.json
          
          mnilsson Michael Nilsson added a comment -

          Hey Christine, I've posted a response to most of your comments thus far below.

          doDeleteChild method makes no storeManagedData method call
          We have a ticket for this that we'll fix along with other improvements for our next commit.

          ManagedFeatureStore.doGet throws an exception when the childId concerned is not present
We could return a response with no features if desired; we are currently using the error response to differentiate between a feature store not existing and one existing without any features added to it yet.

          ManagedResource.doPut addFeature could throw an exception when a name being updated/added already exists. Should repeats of the same name simply replace the existing entry for that name?
          Typically when you have models deployed using some features, you don't want to "update" an existing feature. You should instead add a new feature with your updates and deploy a newly trained model using it, because you don't want the meaning/value of the original feature used by historical models to change. This is to ensure reproducible results when testing an old model that used the old version of the feature. We use this error to prevent this from happening.
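
(As a purely illustrative sketch of that add-rather-than-update approach, a revised definition would be registered under a new feature name rather than replacing the existing entry; the isBookOrEbook name and its fq value below are hypothetical:)

    {
      "name": "isBookOrEbook",
      "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
      "params": { "fq": ["{!terms f=category}book,ebook"] }
    }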

          LTRComponent state + use of state separation. Would feature store and model store changes still propagate through to ltr_ms
          If you deploy new features to your feature store, you would want to start extracting those features, which means we should propagate them down. We could make feature stores write-once, and any new features would require a new feature store with all the old ones copied over to avoid this, but that might be cumbersome to the user and leave lots of old feature stores around until the user cleans them up.
          Question: The only reason we currently have the LTRComponent is so that it can register the Model and Feature stores as managed resources because it can be SolrCore aware. Is there a way we can do this without the use of a component?

          Branch/commit process
          Everything you said sounds do-able. The only question I have is regarding "'git merge' and 'git rebase' and 'git --force push' will be avoided". Agreed about git force, but if at the end we're going to make a new master-ltr-plugin-rfc-march branch, and everything is going to be squashed and rebased, why not allow merges into the master-ltr-plugin-rfc to keep up to date with master changes instead of cherry-picking everything one by one into it?

          Feature engineering dummy model replacement
          Currently you have to use a dummy model to reference what features you want extracted like you said.

          fv=true&fl=*,score,[features]&rq={!ltr model=dummyModel reRankDocs=25}

The only reason you need the model is because it has a FeatureStore, which has all the features you are looking to extract. Instead, we are planning on allowing you to specify which FeatureStore you want to use for feature extraction directly in the features Document Transformer. We will also remove the superfluous fv=true parameter, since the document transformer already identifies the fact that you want to extract features. The new expected sample request for feature extraction would probably look something like this instead:

          fl=*,score,[features featureStore=MyFeatures]
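
(As a complete request, and with a hypothetical feature store name, that would be along the lines of:)

    http://localhost:8983/solr/techproducts/query?q=test&fl=*,score,[features featureStore=MyFeatures]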

would the efi.* parameters move out of the rq
We will probably also move efi.* out as well, since you need them for both feature extraction and reranking with a model.

          might it be useful to have optional version and/or comment string elements in the feature
I think the comment section would be a good idea. The version touches on what I mentioned earlier about updates vs. adds. We'll have to think about the best way to handle this since you don't want to lose/replace versions 1 and 2 when you deploy version 3 of a feature.

          Could you clarify/outline when/how the "store" element would be used?
A FeatureStore is a list of features that you want to extract (and use for training, logging, or in a model for reranking). In the majority of cases, you will probably just have one feature store, and all iterations of your models will use the same feature store, with any new features added to the store. A model cannot use features from other stores. It may be the case that a single collection services many different applications. If each of those applications wants to rerank its results differently and only cares about a subset of features, then they could each make their own FeatureStores with their, say, 100 features for extraction instead of pulling out the thousands of other features that all the other teams made for that same collection.

          Are feature and model stores local to each solr config or can they be shared across configs?
          The feature and model stores are currently tied locally to each collection/config, like managed stopwords/synonyms. If you wanted to have comparable scores for searches across multiple collections for a unified search list, you have to deploy that model to each of the collections.

          cpoerschke Christine Poerschke added a comment -

          Hi Michael, thanks for the response above. Based on it, some follow-on questions/observations below.

          Typically ... you don't want to "update" an existing feature. You should instead add a new feature with your updates and deploy a newly trained model using it ... all iterations of your models will use the same feature store, with any new features added to the store. A model cannot use features from other stores. ...

If features present in a feature store aren't normally updated because existing models use them, and if models cannot use features from other stores, I wonder if combining features.json and model.json content might be a viable option? A models.json illustration is shown below; please see also the solrconfig.xml related illustration and observations that follow it.

          ###### models.json
          [
          {
              "type":"org.apache.solr.ltr.ranking.RankSVMModel",
              "name":"myFirstModelName",
              "features":[
                  { "name": "originalScore",
                    "type":"org.apache.solr.ltr.feature.impl.OriginalScoreFeature",
                    "params":{}
                  },
                  { "name": "isBook",
                    "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
                    "params":{ "fq": ["{!terms f=category}book"] }
                  }
              ],
              "params":{
                  "weights": {
                      "originalScore": 0.5,
                      "isBook": 0.1
                  }
          
              }
          },
          {
              "type":"org.apache.solr.ltr.ranking.RankSVMModel",
              "name":"mySecondModelName",
              ...
          }
          ]
          
          cpoerschke Christine Poerschke added a comment -

          ... Question: The only reason we currently have the LTRComponent is so that it can register the Model and Feature stores as managed resources because it can be SolrCore aware. Is there a way we can do this without the use of a component?

Not answering the managed resources part of the question directly, but having noticed that the features.json/model.json needs to be accompanied by various solrconfig.xml changes in practice, I wonder if configuring models as a plugin within solrconfig.xml might be something to explore?


          current (features|model).json and solrconfig.xml configuration:

          ###### features.json
          ...
          ###### firstModel.json
          ...
          ###### secondModel.json
          ...
          ###### solrconfig.xml
          ...
          <queryParser name="ltr" class="org.apache.solr.ltr.ranking.LTRQParserPlugin" />
          ...
          <transformer name="features" class="org.apache.solr.ltr.ranking.LTRFeatureLoggerTransformerFactory"/>
          ...
          <searchComponent name="ltrComponent" class="org.apache.solr.ltr.ranking.LTRComponent"/>
          ...
          <requestHandler name="/query" class="solr.SearchHandler">
            ...
            <arr name="last-components">
              <str>ltrComponent</str>
            </arr>
          </requestHandler>
          ...
          

          potential alternative solrconfig.xml configuration:

          ###### solrconfig.xml
          ...
          <!-- no queryParser name="ltr" element since LTRQParserPlugin is in QParserPlugin.standardPlugins -->
          <!-- no transformer name="features" since LTRFeatureLoggerTransformerFactory is in TransformerFactory.defaultFactories -->
          
          <reRankModelFactory name="myFirstModelName" class="solr.SVMRerankModelFactory">
            <!-- model features -->
            <str name="features">originalScore,isBook</str>
            <str name="originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
            <str name="isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
            <str name="isBook.fq">{!terms f=category}book</str>
            <!-- model parameters -->
            <float name="weights.originalScore">0.5</float>
            <float name="weights.isBook">0.1</float>
          </reRankModelFactory>
          
          <reRankModelFactory class="solr.SVMRerankModelFactory">
            <str name="">mySecondModelName</str>
            ...
          </reRankModelFactory>
          ...
          

          The most obvious implication of having a new solrconfig.xml element instead of (features|model).json managed resources would be that solr/core rather than solr/contrib/ltr contains the code.

          • From an end-user perspective this means 'Learning to Rank' support out-of-the-box i.e. no need to build and deploy extra jar files plus no need to configure LTRQParserPlugin and LTRFeatureLoggerTransformerFactory queryParser and transformer elements. Though note that <reRankModelFactory class="mycompany.MyCustomReRankModelFactory"> customisation is supported if something other than the out-of-the-box models is required.
          • One of the out-of-the-box factories could be a features-only factory similar to the 'dummyModel' mentioned above, e.g.
            <reRankModelFactory name="featuresOnly" class="solr.NoRerankingFactory">
              <str name="features">originalScore,isBook</str>
              <str name="originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
              <str name="isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
              <str name="isBook.fq">{!terms f=category}book</str>
            </reRankModelFactory>
            

          A concern might be that the reRankModelFactory element(s) would bloat solrconfig.xml and that the element(s) being embedded in solrconfig.xml would be more difficult to edit than one or two json files.

          • The bloat concern can be addressed via xi:include e.g.
            ###### solrconfig.xml
            ...
            <xi:include href="solrconfig-reRankModelFactory-myFirstModelName.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
            ...
            ###### solrconfig-reRankModelFactory-myFirstModelName.xml
            <reRankModelFactory name="myFirstModelName" class="solr.SVMRerankModelFactory">
              <!-- model features -->
              <str name="features">originalScore,isBook</str>
              <str name="originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
              <str name="isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
              <str name="isBook.fq">{!terms f=category}book</str>
              <!-- model parameters -->
              <float name="weights.originalScore">0.5</float>
              <float name="weights.isBook">0.1</float>
            </reRankModelFactory>
            
• xml vs. json representation is a fair point; if the feature engineering process usually outputs json files then perhaps a simple utility script could help convert that json into a reRankModelFactory xml element for solrconfig.xml (a sketch of such a conversion follows below).
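
Purely as an illustration of that utility-script idea, and not part of the patch, a minimal Java sketch might look like the following. It assumes Jackson is available for JSON parsing, assumes the combined models.json shape shown in the previous comment, and ignores per-feature params such as isBook.fq:

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    import java.io.File;
    import java.util.Iterator;
    import java.util.Map;

    public class ModelJsonToReRankModelFactory {

      public static void main(String[] args) throws Exception {
        // args[0]: path to a model json file in the combined format illustrated above
        JsonNode model = new ObjectMapper().readTree(new File(args[0]));

        StringBuilder xml = new StringBuilder();
        xml.append("<reRankModelFactory name=\"").append(model.get("name").asText())
           .append("\" class=\"solr.SVMRerankModelFactory\">\n");

        // comma-separated list of feature names
        StringBuilder names = new StringBuilder();
        for (JsonNode feature : model.get("features")) {
          if (names.length() > 0) names.append(',');
          names.append(feature.get("name").asText());
        }
        xml.append("  <str name=\"features\">").append(names).append("</str>\n");

        // one <str name="featureName.class"> element per feature
        for (JsonNode feature : model.get("features")) {
          xml.append("  <str name=\"").append(feature.get("name").asText()).append(".class\">")
             .append(feature.get("type").asText()).append("</str>\n");
        }

        // model weights (the only model parameters handled by this sketch)
        Iterator<Map.Entry<String, JsonNode>> weights = model.path("params").path("weights").fields();
        while (weights.hasNext()) {
          Map.Entry<String, JsonNode> w = weights.next();
          xml.append("  <float name=\"weights.").append(w.getKey()).append("\">")
             .append(w.getValue().asDouble()).append("</float>\n");
        }

        xml.append("</reRankModelFactory>");
        System.out.println(xml);
      }
    }

Running it against a model json file in that shape would print a reRankModelFactory element roughly like the myFirstModelName illustration above.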

          A factory approach could naturally support arbitrary models including chaining or nesting of models. (A factory approach is of course also possible with json format.)

          <reRankModelFactory name="myTwoPassModelName" class="solr.MultiPassRerankModelFactory">
            <str name="passPrefixes">simple,complex</str>
          
            <!-- simple model factory -->
            <str name="simple.class">solr.SVMRerankModelFactory</str>
            <!-- simple model features -->
            <str name="simple.features">originalScore,isBook</str>
            <str name="simple.originalScore.class">org.apache.solr.ltr.feature.impl.OriginalScoreFeature</str>
            <str name="simple.isBook.class">org.apache.solr.ltr.feature.impl.SolrFeature</str>
            <str name="simple.isBook.fq">{!terms f=category}book</str>
            <!-- simple model parameters -->
            <float name="simple.weights.originalScore">0.5</float>
            <float name="simple.weights.isBook">0.1</float>
          
            <!-- complex model factory -->
            <str name="complex.class">mycompany.MyComplexRerankModelFactory</str>
            <!-- complex model features -->
            <str name="complex.features">x,y</str>
            <str name="complex.x.class">...</str>
            <str name="complex.x.aaa">...</str>
            <int name="complex.x.bbb">...</int>
            <str name="complex.y.class">...</str>
            <int name="complex.y.zzz">...</int>
            <!-- complex model parameters -->
            <float name="complex.something.configurable">0.42</float>
            ...
          </reRankModelFactory>
          
          cpoerschke Christine Poerschke added a comment -

          Hoss Man and Steve Rowe - would you have any thoughts on 'managed resource(s)' vs. 'solrconfig.xml plugin(s)' alternatives w.r.t. feature and model representation/configuration? Thanks.

          cpoerschke Christine Poerschke added a comment -

          ... The only question I have is regarding "'git merge' and 'git rebase' and 'git --force push' will be avoided". Agreed about git force, but if at the end we're going to make a new master-ltr-plugin-rfc-march branch, and everything is going to be squashed and rebased, why not allow merges into the master-ltr-plugin-rfc to keep up to date with master changes instead of cherry-picking everything one by one into it?

          My impression was that 'git rebase' (against master) could be run for master-ltr-plugin-rfc but then it would have to be followed by a 'git --force push' (and that is the thing to avoid). 'git merge' to pull in changes from master onto the master-ltr-plugin-rfc is perhaps possible without a force push, haven't tried that.

          In terms of transition from master-ltr-plugin-rfc to master-ltr-plugin-rfc-march branch, for that anything can be used in my opinion, rebase/merge/squash/etc. since it's starting a fresh branch.

          Not sure if that answered your question?

          steve_rowe Steve Rowe added a comment -

          Just read through the comments on the issue, but haven't looked at any code yet.

          I think you're asking about using managed resources or solrconfig.xml plugins as configuration locations. I think that relatively short config stuff fits naturally in solrconfig.xml, and managed resource infrastructure is set up to enable modifications to structured data in resources via API (is that enabled here? probably not, just whole-resource CRUD, I'm guessing). So I'd guess solrconfig.xml is a better fit here.

          Note that new usages of solrconfig.xml config should consider how they can be addressed via the Config API.
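          For illustration, the Config API pattern typically looks like `curl http://localhost:8983/solr/collection1/config -H 'Content-type:application/json' -d '{"add-queryparser": {"name":"ltr","class":"org.apache.solr.ltr.ranking.LTRQParserPlugin"}}'` (the ltr query parser from the README is used here purely as an example of the add-<component> command shape); whether and how the proposed reRankModelFactory element could be added or updated the same way is exactly the open question.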

          One other consideration you didn't mention: shouldn't the Solr blob store be considered for storage/versioning/sharing of models? (Skimming here makes me think that they are stored in Zk as files with per-collection config.)

          alessandro.benedetti Alessandro Benedetti added a comment -

          Really interesting stuff
          I think it could be useful to have more details about the training phase.
          I briefly reviewed both the pull request and the documentation, and it seems that, to try the demo, an already trained model is provided.
          The only lines related to the training seem to be:
          "A good library for training LambdaMART ( http://sourceforge.net/p/lemur/wiki/RankLib/ ).
          +You will need to convert the RankLib model format to the format specified above." (similar documentation exists for the linear SVM approach).

          It would be cool to have more documentation about the training as well, explaining how to train the model starting from:

          • an example point-wise training set
          • a set of defined features

          A step-by-step tutorial would be awesome!
          Anyway, I will proceed with studying the plugin and try to do that on my own following the third-party training tutorials.
          Well done again,

          Cheers

          alessandro.benedetti Alessandro Benedetti added a comment - - edited

          A couple of questions for some specific use cases:

          1) Grouping
          How does the plugin behave with grouping? Let's assume we have 1000 docs in 5 groups; even if we return only 1 doc per group, I assume the plugin will:
          1) first re-score the top K >> x (so re-scoring 1000 docs)
          2) group the results
          3) return the 5 groups (each one, for example, with its top document)
          Or will it be possible to re-rank only the top document per group (so only the 5 top documents)?

          According to the official Solr re-rank documentation:
          "Combining Ranking Queries With Other Solr Features
          The "rq" parameter and the re-ranking feature in general works well with other Solr features. For example, it can be used in conjunction with the collapse parser to re-rank the group heads after they've been collapsed. It also preserves the order of documents elevated by the elevation component. And it even has it's own custom explain so you can see how the re-ranking scores were derived when looking at debug information."
          Can we assume this would happen with the LTR plugin as well ?

          2) Join - Parent Search
          Let's assume we return parents based on a query on the children.
          Just wondering how to combine the block join query parser with the LTR re-rank, to re-rank only the parents (without re-scoring the children).
          Also in this case, according to the re-rank documentation, it seems to be compatible; is the plugin going to work with that as well?

          I will take a look on my own at these topics, but any thoughts would be much appreciated

          diegoceccarelli Diego Ceccarelli added a comment - - edited

          We decided to decouple models and features because:

          • the general use case is that you use a particular model (relying on a set of features) to rank your documents, but you also want to compute (and log) new features for training a new model to use in the future. All the features in a feature store will be computed, but the model will receive only the requested features (this also allows updating the feature store with new features without affecting the model)
          • two models could use the same feature, but normalize the feature values in a different way (see the Normalizer class)
          diegoceccarelli Diego Ceccarelli added a comment - - edited

          Alessandro, thanks for the questions:

          1. At the moment RankQuery (on which LTR relies) is not supported in grouping (but we are working on that - see SOLR-8776); I think the correct solution would be to perform steps 1, 2, 3. Maybe we can move the discussion to SOLR-8776 since it affects RankQueries and grouping in general. The easy solution is to use collapsing instead of grouping: collapsing is supported by RankQuery and we tested that LTR works as well (an example request is sketched below).
          2. Join - Parent Search. I would say that if RankQuery supports block join it should work, but we didn't check.
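          For illustration only (the collapse field productGroupId is hypothetical, and myModelName is the example model from the README): a collapsing query combined with LTR reranking would look roughly like `q=ipod&fq={!collapse field=productGroupId}&rq={!ltr model=myModelName reRankDocs=100}&fl=*,score`, i.e. the documents are collapsed first and the reranking model is then applied to the group heads.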
          diegoceccarelli Diego Ceccarelli added a comment -

          I had the same idea. My only concern is: would it then be possible to update solrconfig.xml without bouncing Solr? With managed resources we would be able to add a feature/model at runtime and start to use it. Would it be possible to get the same behavior with the solr config? (...and first, do we want it?)

          cpoerschke Christine Poerschke added a comment -

          Thanks Steve for bringing Solr blob store into consideration. Related links:

          • SOLR-8773 'Make blob store usage intuitive and robust'
          • Blob Store API in the Apache Solr Reference Guide's Configuration APIs section
          alessandro.benedetti Alessandro Benedetti added a comment -

          Diego,
          thanks for the reply!
          Just verified:
          1) as the documentation specifies, the re-rank component works on the collapsed results, so we can assume LTR re-rank will work as well.
          2) just tried the Block Join Parent Query Parser with the re-rank query parser, and it is working (the parents returned are re-ranked according to the re-rank parameters). I assume the LTR query parser will work in that scenario as well (see the example below).
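          For example (the field and query values here are purely illustrative): `q={!parent which="is_parent:true"}comment_text:camera&rq={!ltr model=myModelName reRankDocs=25}` would return parent documents and then rerank those parents with the model, without re-scoring the children.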
          Thanks for your help !

          Cheers

          alessandro.benedetti Alessandro Benedetti added a comment - - edited

          Maybe I still don't have a clear picture, but isn't the model generated externally, with a training set and a training library (that uses the feature vectors as well), and then fed to Solr (in the JSON format described, with the different weights and components calculated automatically)?

          In that case, I don't see it as a part of solrconfig.xml.
          Furthermore, as Diego pointed out, are we sure we want to require a core reload each time we add a feature/model?
          I see a better fit in having a managed resource (like the synonyms, for example), with the possibility of adding features and models at runtime, without any core reload or restart necessary.

          githubbot ASF GitHub Bot added a comment -

          Github user alessandrobenedetti commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/4#discussion_r55499494

          — Diff: solr/contrib/ltr/README.txt —
          @@ -0,0 +1,330 @@
          +Apache Solr Learning to Rank
          +========
          +
          +This is the main [learning to rank integrated into solr](http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp)
          +repository.
          +[Read up on learning to rank](https://en.wikipedia.org/wiki/Learning_to_rank)
          +
          +Apache Solr Learning to Rank (LTR) provides a way for you to extract features
          +directly inside Solr for use in training a machine learned model. You can then
          +deploy that model to Solr and use it to rerank your top X search results.
          +
          +
          +# Changes to solrconfig.xml
          +```xml
          +<config>
          + ...
          +
          + <!-- Query parser used to rerank top docs with a provided model -->
          + <queryParser name="ltr" class="org.apache.solr.ltr.ranking.LTRQParserPlugin" />
          +
          + <!-- Transformer that will encode the document features in the response.
          + For each document the transformer will add the features as an extra field
          + in the response. The name of the field we will be the the name of the
          + transformer enclosed between brackets (in this case [features]).
          + In order to get the feature vector you will have to
          + specify that you want the field (e.g., fl="*,[features]) -->
          + <transformer name="features" class="org.apache.solr.ltr.ranking.LTRFeatureLoggerTransformerFactory" />
          +
          +
          + <!-- Component that hooks up managed resources for features and models -->
          + <searchComponent name="ltrComponent" class="org.apache.solr.ltr.ranking.LTRComponent"/>
          + <requestHandler name="/query" class="solr.SearchHandler">
          + <lst name="defaults">
          + <str name="echoParams">explicit</str>
          + <str name="wt">json</str>
          + <str name="indent">true</str>
          + <str name="df">id</str>
          + </lst>
          + <arr name="last-components">
          + <!-- Use the component in your requestHandler -->
          + <str>ltrComponent</str>
          + </arr>
          + </requestHandler>
          +
          + <query>
          + ...
          +
          + <!-- Cache for storing and fetching feature vectors -->
          + <cache name="QUERY_DOC_FV"
          + class="solr.search.LRUCache"
          + size="4096"
          + initialSize="2048"
          + autowarmCount="4096"
          + regenerator="solr.search.NoOpRegenerator" />
          + </query>
          +
          +</config>
          +
          +```
          +
          +
          +# Build the plugin
          +In the solr/contrib/ltr directory run
          +`ant dist`
          +
          +# Install the plugin
          +In your solr installation, navigate to your collection's lib directory.
          +In the solr install example, it would be solr/collection1/lib.
          +If lib doesn't exist you will have to make it, and then copy the plugin's jar there.
          +
          +`cp lucene-solr/solr/dist/solr-ltr-X.Y.Z-SNAPSHOT.jar mySolrInstallPath/solr/myCollection/lib`
          +
          +Restart your collection using the admin page and you are good to go.
          +You can find more detailed instructions [here](https://wiki.apache.org/solr/SolrPlugins).
          +
          +
          +# Defining Features
          +In the learning to rank plugin, you can define features in a feature space
          +using standard Solr queries. As an example:
          +
          +###### features.json
          +```json
          +[
          +{ "name": "isBook",
          + "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
          + "params":{ "fq": ["

          {!terms f=category}

          book"] }
          +},
          +{
          + "name": "documentRecency",
          + "type": "org.apache.solr.ltr.feature.impl.SolrFeature",
          + "params": {
          + "q": "

          {!func}

          recip( ms(NOW,publish_date), 3.16e-11, 1, 1)"
          + }
          +},
          +{
          + "name":"originalScore",
          + "type":"org.apache.solr.ltr.feature.impl.OriginalScoreFeature",
          + "params":{}
          +},
          +{
          + "name" : "userTextTitleMatch",
          + "type" : "org.apache.solr.ltr.feature.impl.SolrFeature",
          + "params" : { "q" : "

          {!field f=title}

          $

          {user_text}

          " }
          +}
          +]
          +```
          +
          +Defines four features. Anything that is a valid Solr query can be used to define
          +a feature.
          +
          +### Filter Query Features
          +The first feature isBook fires if the term 'book' matches the category field
          +for the given examined document. Since in this feature q was not specified,
          +either the score 1 (in case of a match) or the score 0 (in case of no match)
          +will be returned.
          +
          +### Query Features
          +In the second feature (documentRecency) q was specified using a function query.
          +In this case the score for the feature on a given document is whatever the query
          +returns (1 for docs dated now, 1/2 for docs dated 1 year ago, 1/3 for docs dated
          +2 years ago, etc..) . If both an fq and q is used, documents that don't match
          +the fq will receive a score of 0 for the documentRecency feature, all other
          +documents will receive the score specified by the query for this feature.
          +
          +### Original Score Feature
          +The third feature (originalScore) has no parameters, and uses the
          +OriginalScoreFeature class instead of the SolrFeature class. Its purpose is
          +to simply return the score for the original search request against the current
          +matching document.
          +
          +### External Features
          +Users can specify external information that can to be passed in as
          +part of the query to the ltr ranking framework. In this case, the
          +fourth feature (userTextPhraseMatch) will be looking for an external field
          +called 'user_text' passed in through the request, and will fire if there is
          +a term match for the document field 'title' from the value of the external
          +field 'user_text'. See the "Run a Rerank Query" section for how
          +to pass in external information.
          +
          +### Custom Features
          +Custom features can be created by extending from
          +org.apache.solr.ltr.ranking.Feature, however this is generally not recommended.
          +The majority of features should be possible to create using the methods described
          +above.
          +
          +# Defining Models
          +Currently the Learning to Rank plugin supports 2 main types of
          +ranking models: [Ranking SVM](http://www.cs.cornell.edu/people/tj/publications/joachims_02c.pdf)
          +and [LambdaMART](http://research.microsoft.com/pubs/132652/MSR-TR-2010-82.pdf)
          +
          +### Ranking SVM
          +Currently only a linear ranking svm is supported. Use LambdaMART for
          +a non-linear model. If you'd like to introduce a bias set a constant feature
          +to the bias value you'd like and make a weight of 1.0 for that feature.
          +
          +###### model.json
          +```json
          +{
          + "type":"org.apache.solr.ltr.ranking.RankSVMModel",
          + "name":"myModelName",
          + "features":[
          + { "name": "userTextTitleMatch"},
          + { "name": "originalScore"},
          + { "name": "isBook"}
          + ],
          + "params":{
          + "weights": {
          + "userTextTitleMatch": 1.0,
          + "originalScore": 0.5,
          + "isBook": 0.1
          + }
          +
          + }
          +}
          +```
          +
          +This is an example of a toy Ranking SVM model. Type specifies the class to be
          +using to interpret the model (RankSVMModel in the case of Ranking SVM).
          +Name is the model identifier you will use when making request to the ltr
          +framework. Features specifies the feature space that you want extracted
          +when using this model. All features that appear in the model params will
          +be used for scoring and must appear in the features list. You can add
          +extra features to the features list that will be computed but not used in the
          +model for scoring, which can be useful for logging.
          +Params are the Ranking SVM parameters.
          +
          +Good library for training SVM's (https://www.csie.ntu.edu.tw/~cjlin/liblinear/ ,
          +https://www.csie.ntu.edu.tw/~cjlin/libsvm/) . You will need to convert the
          +libSVM model format to the format specified above.
          +
          +### LambdaMART
          +
          +###### model2.json
          +```json
          +{
          + "type":"org.apache.solr.ltr.ranking.LambdaMARTModel",
          + "name":"lambdamartmodel",
          + "features":[
          + { "name": "userTextTitleMatch"},
          + { "name": "originalScore"}
          + ],
          + "params":{
          + "trees": [
          + {
          + "weight" : 1,
          + "tree": {
          + "feature": "userTextTitleMatch",
          + "threshold": 0.5,
          + "left" :

          { + "value" : -100 + }

          ,
          + "right": {
          + "feature" : "originalScore",
          + "threshold": 10.0,
          + "left" :

          { + "value" : 50 + }

          ,
          + "right" :

          { + "value" : 75 + }

          + }
          + }
          + },
          + {
          + "weight" : 2,
          + "tree":

          { + "value" : -10 + }

          + }
          + ]
          + }
          +}
          +```
          +This is an example of a toy LambdaMART. Type specifies the class to be using to
          +interpret the model (LambdaMARTModel in the case of LambdaMART). Name is the
          +model identifier you will use when making request to the ltr framework.
          +Features specifies the feature space that you want extracted when using this
          +model. All features that appear in the model params will be used for scoring and
          +must appear in the features list. You can add extra features to the features
          +list that will be computed but not used in the model for scoring, which can
          +be useful for logging. Params are the LambdaMART specific parameters. In this
          +case we have 2 trees, one with 3 leaf nodes and one with 1 leaf node.
          +
          +A good library for training LambdaMART ( http://sourceforge.net/p/lemur/wiki/RankLib/ ).
          +You will need to convert the RankLib model format to the format specified above.
          +
          +# Deploy Models and Features
          +To send features run
          +
          +`curl -XPUT 'http://localhost:8983/solr/collection1/schema/fstore' --data-binary @/path/features.json -H 'Content-type:application/json'`
          +
          +To send models run
          +
          +`curl -XPUT 'http://localhost:8983/solr/collection1/schema/mstore' --data-binary @/path/model.json -H 'Content-type:application/json'`
          +
          +
          +# View Models and Features
          +`curl -XGET 'http://localhost:8983/solr/collection1/schema/fstore'`
          +`curl -XGET 'http://localhost:8983/solr/collection1/schema/mstore'`
          +
          +
          +# Run a Rerank Query
          +Add to your original solr query
          +`rq={!ltr model=myModelName reRankDocs=25}`
          +
          +The model name is the name of the model you sent to solr earlier.
          +The number of documents you want reranked, which can be larger than the
          +number you display, is reRankDocs.
          +
          +### Pass in external information for external features
          +Add to your original solr query
          +`rq={!ltr reRankDocs=3 model=externalmodel efi.field1='text1' efi.field2='text2'}`
          +
          +Where "field1" specifies the name of the customized field to be used by one
          +or more of your features, and text1 is the information to be pass in. As an
          +example that matches the earlier shown userTextTitleMatch feature one could do:
          +
          +`rq={!ltr reRankDocs=3 model=externalmodel efi.user_text='Casablanca' efi.user_intent='movie'}`
          +
          +# Extract features
          +To extract features you need to use the feature vector transformer + set the
          +fv parameter to true (this required parameter will be removed in the future).
          +For now you need to also use a dummy model with all the features you want to
          +extract inside the features parameter list of the model (this limitation will
          +also be changed in the future so you can extract features without a dummy model).
          +
          +`fv=true&fl=*,score,[features]&rq={!ltr model=dummyModel reRankDocs=25}`
          +
          +## Test the plugin with solr/example/techproducts in 6 steps
          +
          +Solr provides some simple example of indices. In order to test the plugin with
          +the techproducts example please follow these steps
          +
          +1. compile solr and the examples
          +
          + cd solr
          + ant dist
          + ant example
          — End diff –

          I think ant example is deprecated in the current master branch; we should point out that with recent releases, ant server is necessary!

          alessandro.benedetti Alessandro Benedetti added a comment -

          Just started playing with training a LambdaMART model with RankLib.
          Which tool did you use to parse the RankLib model into the JSON format compatible with the LTR plugin (by default RankLib returns an XML file describing the trained model)?
          Any suggestion would be useful!

          githubbot ASF GitHub Bot added a comment -

          Github user alessandrobenedetti commented on a diff in the pull request:

          https://github.com/apache/lucene-solr/pull/4#discussion_r55557481

          — Diff: solr/contrib/ltr/src/java/org/apache/solr/ltr/ranking/ModelQuery.java —
          @@ -0,0 +1,540 @@
          +package org.apache.solr.ltr.ranking;
          +
          +/*
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +import java.io.IOException;
          +import java.util.ArrayList;
          +import java.util.Collection;
          +import java.util.HashMap;
          +import java.util.List;
          +import java.util.Map;
          +import java.util.Set;
          +
          +import org.apache.lucene.index.LeafReaderContext;
          +import org.apache.lucene.index.Term;
          +import org.apache.lucene.search.DisiPriorityQueue;
          +import org.apache.lucene.search.DisiWrapper;
          +import org.apache.lucene.search.DisjunctionDISIApproximation;
          +import org.apache.lucene.search.DocIdSetIterator;
          +import org.apache.lucene.search.Explanation;
          +import org.apache.lucene.search.IndexSearcher;
          +import org.apache.lucene.search.Query;
          +import org.apache.lucene.search.Scorer;
          +import org.apache.lucene.search.Weight;
          +import org.apache.lucene.search.Scorer.ChildScorer;
          +import org.apache.solr.ltr.feature.ModelMetadata;
          +import org.apache.solr.ltr.feature.norm.Normalizer;
          +import org.apache.solr.ltr.feature.norm.impl.IdentityNormalizer;
          +import org.apache.solr.ltr.log.FeatureLogger;
          +import org.apache.solr.request.SolrQueryRequest;
          +
          +/**
          + * The ranking query that is run, reranking results using the ModelMetadata
          + * algorithm
          + */
          +public class ModelQuery extends Query {
          +
          + // contains a description of the model
          + protected ModelMetadata meta;
          + // feature logger to output the features.
          + private FeatureLogger fl = null;
          + // Map of external parameters, such as query intent, that can be used by
          + // features
          + protected Map<String,String> efi;
          + // Original solr query used to fetch matching documents
          + protected Query originalQuery;
          + // Original solr request
          + protected SolrQueryRequest request;
          +
          + public ModelQuery(ModelMetadata meta) {
          + this.meta = meta;
          + }
          +
          + public ModelMetadata getMetadata() {
          + return meta;
          + }
          +
          + public void setFeatureLogger(FeatureLogger fl) {
          + this.fl = fl;
          + }
          +
          + public FeatureLogger getFeatureLogger() {
          + return this.fl;
          + }
          +
          + public Collection<Feature> getAllFeatures() {
          + return meta.getAllFeatures();
          + }
          +
          + public void setOriginalQuery(Query mainQuery) {
          + this.originalQuery = mainQuery;
          + }
          +
          + public void setExternalFeatureInfo(Map<String,String> externalFeatureInfo) {
          + this.efi = externalFeatureInfo;
          + }
          +
          + public void setRequest(SolrQueryRequest request) {
          + this.request = request;
          + }
          +
          + @Override
          + public int hashCode() {
          + final int prime = 31;
          + int result = super.hashCode();
          + result = prime * result + ((meta == null) ? 0 : meta.hashCode());
          + result = prime * result
          + + ((originalQuery == null) ? 0 : originalQuery.hashCode());
          + result = prime * result + ((efi == null) ? 0 : originalQuery.hashCode());
          — End diff –

          I think this is a typo.
          It should be :
          result = prime * result + ((efi == null) ? 0 : efi.hashCode());

          This is a small thing, but it actually makes the system unusable when you experiment with different efi variable values. Basically the cache is always hit, even if your efi variables change dynamically.
          Anyway it is really a minimal fix.

          Show
          githubbot ASF GitHub Bot added a comment - Github user alessandrobenedetti commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/4#discussion_r55557481 — Diff: solr/contrib/ltr/src/java/org/apache/solr/ltr/ranking/ModelQuery.java (new file: Apache license header, imports, the ModelQuery fields and accessors, and the start of hashCode()) — reviewed hunk:

              + @Override
              + public int hashCode() {
              +   final int prime = 31;
              +   int result = super.hashCode();
              +   result = prime * result + ((meta == null) ? 0 : meta.hashCode());
              +   result = prime * result
              +       + ((originalQuery == null) ? 0 : originalQuery.hashCode());
              +   result = prime * result + ((efi == null) ? 0 : originalQuery.hashCode());

          — End diff – I think this is a typo. It should be: result = prime * result + ((efi == null) ? 0 : efi.hashCode()); This is a small thing, but it currently makes the system unusable when you experiment with different efi variable values. Basically the cache is always hit, even if your efi variables change dynamically. Anyway, it is really a minimal fix.
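          For reference, a minimal sketch of the method with the reviewer's fix applied — only the lines quoted in the hunk above are shown, so this is not necessarily the complete method:

              @Override
              public int hashCode() {
                final int prime = 31;
                int result = super.hashCode();
                // hash each field that also participates in equals()
                result = prime * result + ((meta == null) ? 0 : meta.hashCode());
                result = prime * result
                    + ((originalQuery == null) ? 0 : originalQuery.hashCode());
                // fixed: hash efi itself rather than originalQuery a second time
                result = prime * result + ((efi == null) ? 0 : efi.hashCode());
                return result;
              }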
          Hide
          alessandro.benedetti Alessandro Benedetti added a comment -

          I think it is necessary to contribute the module configuration for IntelliJ IDEA as well:

          dev-tools/idea/solr/contrib/ltr is needed for a nice integration with IntelliJ IDEA.

          I am not sure whether anything is needed for Eclipse.

          Hide
          jpantony Joshua Pantony added a comment -

          Hey Alessandro, thanks for all the interest! We actually wrote our own script to convert RankLib output into the LTR plugin format. Do you think it would be prudent to add that to this push? It seemed somewhat outside the scope of this ticket because we wanted the plugin to be as agnostic to the model training as possible, but I can see the logic in having some library-specific utilities.

          I'll add some more documentation for the training phase.

          Hide
          alexflower Alex added a comment -

          Hi guys, great plug-in. Using Solr search queries as features is really cool.

          As far as I understand, at the moment the training happens outside Solr. It would be really awesome if the training happened inside Solr too. I don't have any idea how this could be done, but I hope you guys have something in mind.

          Hide
          alessandro.benedetti Alessandro Benedetti added a comment -

          Hi Joshua,
          I was assuming you had some sort of script/app transformer to parse the XML and build your JSON.
          I think it could definitely be useful to have it as well.

          I understand and agree that you didn't want to force the user into any specific training library (and the related model output).
          But in the end, the plugin currently supports two learned models (linear SVM and LambdaMART), so I think it can be really helpful to provide users with a step-by-step guide to run an example end to end.

          I think the next step could be to add the training component to Solr as well.
          I will describe a possible basic approach in another post in this issue.

          Hide
          alessandro.benedetti Alessandro Benedetti added a comment -

          As I briefly discussed with Diego, regarding how to include the training in Solr as well:
          A simple integration could be:

          1) Select a supported training library for linear SVM and one for LambdaMART (the libraries already suggested in the README could be a starting point).

          2) Create an update request handler that accepts the training set (the format of the training set would be clearly described in the documentation, e.g. LETOR).
          This update handler would basically take the training set file and the related parameters supported by the chosen library and proceed with the training,
          using default configuration parameters where possible, to make user interaction as easy as possible.
          The update handler would then extract the document features (revisiting the cache could be interesting here, to improve the recycling of feature extraction).

          3) The update request handler would train the model by internally calling the selected library with all the parameters provided. The generated model would be converted into the supported JSON format and stored in the model store.

          This sample approach could be made as sophisticated as we want (we can add flexibility in the choice of library and make it easy to extend).
          A further next step could be to add a layer of signal processing directly in Solr, to build the training set as well
          (a sort of REST API that takes as input the document, query id and rating score, and automatically creates a training-set entry stored in some smart way).
          Then we could trigger the model generation, or set up a schedule to refresh the model automatically.
          We could even take into account only certain periods, store training data in different places, clean the training set automatically from time to time, etc.
          Now I am going off topic, but there are a lot of things to do with the training to ease the integration.
          Happy to discuss them and get new ideas to improve the plugin, which I think is going to be really, really valuable for the Solr community.
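          As a point of reference, training data in the LETOR / SVMrank style mentioned above is usually a plain-text file with one judged document per line, roughly as sketched here (the relevance labels, qids and feature ids are purely illustrative):

              3 qid:101 1:0.53 2:12.0 3:1.0 # doc=MA147LL/A query="ipod"
              1 qid:101 1:0.05 2:3.0  3:0.0 # doc=IW-02     query="ipod"
              2 qid:102 1:0.42 2:7.5  3:1.0 # doc=9885A004  query="camera"

          Each line is <relevance> qid:<query id> <featureId>:<value> ..., optionally followed by a comment.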

          Hide
          alessandro.benedetti Alessandro Benedetti added a comment - - edited

          I will continue adding observations in here, feel free to reorganize them later.

          EFI
          Let's assume we have a problem where we decided to decompose categorical features.
          This means that potentially we can decompose a categorical feature into N binary features.

          The original categorical feature can be single-valued, which means that when calling the rank query component we don't want to send N efis,
          e.g.

          &rq={!ltr model=lambdaModel4 reRankDocs=25 efi.isFromLondon=1 efi.isFromLiverpool=0 efi.isFromManchester=0 ...}

          but only one,
          e.g.

          &rq={!ltr model=lambdaModel4 reRankDocs=25 efi.isFromLondon=1}

          The others would default to 0.

          At the moment the plugin complains with java.lang.NumberFormatException: For input string: "${efi.isFromManchester}".
          We should add a default of 0 when the efi is not passed.
          Maybe I simply missed the syntax to do that; I tried a standard way like ${efi.isFromManchester:0} in the feature JSON definition, but it doesn't work.

          Just let me know if we have a better channel than JIRA for these observations.

          Hide
          mnilsson Michael Nilsson added a comment - - edited

          Thanks for all of the feedback Alessandro, we're actively working on some of your comments so far! Nice catch on the hash function, and we're looking into adding default values for the external feature information (efi). As a part of this pull request we do not plan on adding training built into Solr, but that would be a very good next enhancement. However, to help people in the Solr community get started with training and testing with machine learned ranking models, we are putting together some scripts and updating our readme to incorporate actual steps to train a model with libsvm instead of using the sample model.json file we provided. This should make it a lot easier for people to pick this up and start using a real ranking model based off their own data. We're keeping track of both JIRA comments and Github pull request comments on our end so they don't get lost. This is working ok so far, but if others have better suggestions we're open to them too.

          Hide
          aanilpala Ahmet Anil Pala added a comment -

          hi guys,

          great initiative! I love it. However, I will have some comments regarding some issues I am experiencing with LTR.

          • Can we have a feature that is actually an external file field? I tried it with FieldValueFeature and got an NPE:

          at org.apache.solr.ltr.feature.impl.FieldValueFeature$FieldValueFeatureWeight$FieldValueFeatureScorer.score(FieldValueFeature.java:93)

          If this has not been implemented yet, it would be nice to have it.

          • I need to reload the core whenever I want to curl new features after wiping the old version. If I don't do it, I get the following:

          "Bad Request (400) - Expected Map to create a new ManagedResource but received a java.util.ArrayList\n\tat org.apache.solr.rest.RestManager$RestManagerManagedResource.doPut(RestManager.java:523)

          • I know this is a long shot, but it is worth asking. Apparently, LTR has been designed for relatively 'simple' machine-learned models (pointwise, or pairwise-constraint training without kernels, like SVM with a linear kernel). Are there plans to implement a version that can rank the search results based on a classifier that works on document pairs and tells which one should be ranked higher than the other, as opposed to a model that calculates a score for a single document and then reorders the results by that score?
          Hide
          aanilpala Ahmet Anil Pala added a comment -

          Update: I got external file fields working through the 'q' parameter in org.apache.solr.ltr.feature.impl.SolrFeature. It works fine, although I still think FieldValueFeature should provide access to them in addition to the non-EFF fields.
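          For illustration, a hypothetical SolrFeature definition along those lines — the exact JSON keys and the function-query syntax are assumptions here and may differ from the plugin's README, so treat this as a sketch only:

              {
                "name"   : "popularityEff",
                "class"  : "org.apache.solr.ltr.feature.impl.SolrFeature",
                "params" : { "q" : "{!func}popularity" }
              }

          where popularity is assumed to be an ExternalFileField defined in the schema and referenced through a function query.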

          Hide
          alessandro.benedetti Alessandro Benedetti added a comment -

          Hi gents,
          I am going to start using the plugin more closely.
          I think it is likely that I will find small bugs (like the cache issue for EFI features, etc.) or improvements.
          What is the latest version of the code available?
          How can I contribute back improvements and bug fixes?

          Is this the latest version: https://github.com/bloomberg/lucene-solr/commits/master-ltr-plugin-rfc-cpoerschke-comments ?

          It could make sense to create a separate repo containing only the plugin, self-contained, without the entire Solr codebase.
          That way I could branch from there and, from time to time, open pull requests to include bug fixes if approved.

          What do you think? Diego Ceccarelli Michael Nilsson ?

          Cheers

          Hide
          diegoceccarelli Diego Ceccarelli added a comment -

          Thanks Alessandro,
          Please refer to the plugin master branch https://github.com/bloomberg/lucene-solr/tree/master-ltr-plugin-rfc, we are going to merge there Christine's changes.

          > How can I contribute back improvements and bug fixes?
          GitHub PRs are welcome.

          > It could make sense to create a separate repo containing only the plugin, self-contained, without the entire Solr codebase.

          I'm not against having a separate repo only with the plugin. What do you think, Christine Poerschke?

          Hide
          diegoceccarelli Diego Ceccarelli added a comment -

          Great!

          Hide
          jpantony Joshua Pantony added a comment -

          Hi, thanks for the interest! Was there a specific algorithm you had in mind that is currently not supported? Often it is possible to formulate comparisons in the training phase in such a way that you can still compare just one score in the live phase. Let's use rankSVM (a pairwise approach) as an example. Given documents D1 and D2, and the feature vector represented by the function V(D), if we know that D1 > D2, we can formulate this in the training stage as the objective function (V(D1) - V(D2)) * W > 0. Here we have created an objective function by directly comparing pairs of documents D1 and D2, hence it is pairwise. In the live phase, given documents D1, D2, D3 and D4, we "could" do a direct pairwise approach, i.e.:

          (V(D1) - V(D2)) * W > 0 ?,
          (V(D1) - V(D3)) * W > 0 ?,
          (V(D1) - V(D4)) * W > 0 ?,
          (V(D2) - V(D3)) * W > 0 ?,
          (V(D2) - V(D4)) * W > 0 ?,
          (V(D3) - V(D4)) * W > 0 ?

          However, this is computationally inefficient. In this case, if we do a direct comparison using the original objective function that we trained on, we'd need to do 6 dot products. Using some basic math, in the live phase we can change (V(D1) - V(D2)) * W > 0 into V(D1) * W > V(D2) * W. Now all I need to do in a live setting is calculate V(D1) * W, V(D2) * W, V(D3) * W, V(D4) * W. Once we do that we can just sort the numbers and voilà, we've done pairwise comparisons in the same time complexity as a pointwise approach. Of course, don't just take my word for it; read this paper: http://www.cs.cornell.edu/people/tj/publications/joachims_02c.pdf (note I vastly simplified rankSVM here for ease of dialogue).

          So all that being said, I'll circle back to my original question: was there a specific algorithm you had in mind that we don't easily support? If so, I'm happy to add it in some future patch (no promise on when, though). [It should be noted there is some debate / grey area around whether LambdaMART is listwise or pairwise, but it is generally considered among the strongest performing methods.]
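          To make the point above concrete, here is a small self-contained sketch (illustrative weights and feature vectors, not plugin code) that scores each document once with W * V(D) and sorts, reproducing the pairwise ordering with one dot product per document:

              import java.util.Arrays;
              import java.util.Comparator;

              public class PointwiseScoringSketch {

                // dot product W . V(D)
                static double dot(double[] w, double[] v) {
                  double s = 0.0;
                  for (int i = 0; i < w.length; i++) {
                    s += w[i] * v[i];
                  }
                  return s;
                }

                public static void main(String[] args) {
                  final double[] w = {0.4, -0.2, 1.0};   // learned weight vector (illustrative)
                  final double[][] v = {                  // V(D1)..V(D4) (illustrative)
                      {1.0, 0.5, 0.2},
                      {0.3, 0.8, 0.9},
                      {0.7, 0.1, 0.4},
                      {0.2, 0.2, 0.1}
                  };
                  Integer[] order = {0, 1, 2, 3};
                  // one dot product per document, then sort descending by score
                  Arrays.sort(order,
                      Comparator.comparingDouble((Integer i) -> dot(w, v[i])).reversed());
                  for (int i : order) {
                    System.out.println("D" + (i + 1) + " score=" + dot(w, v[i]));
                  }
                }
              }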

          Hide
          aanilpala Ahmet Anil Pala added a comment - - edited

          Hi, thanks for the answer.

          Well, nothing in particular. I have experimented with NNs and SVMs with RBF kernels, and they are promising, especially in cases where the target attribute is the result of a complex interaction of the inputs, which is likely to be the case if you are modelling some user behaviour. What is different about SVMs with polynomial kernels is that, although training can be done in a pairwise fashion (constraint training), in the 'live phase' the distance of an example from the separating hyperplane can be used to score the documents. This is possible because we can 'distribute' the W over the kernel as you did above:

          W(K(V(D_1), V(D_2))) > 0
          W(V(D_1) - V(D_2)) > 0 where K(A,B) = A - B
          W*V(D_1) - W*V(D_2) > 0

          However, some kernels do not allow this. For example, the RBF kernel: RBF(D_1, D_2) = e^(-0.5*||D_1-D_2||^2). This is also an example of the 'kernel trick', where the non-linear feature mapping the kernel performs is implicit. In this case, we cannot use the SVM as a scorer, as our learned W is supposed to be multiplied by the kernel value of the document pair in the 'live phase' for the predictions. Therefore, in his paper Joachims didn't use SVMs with kernels. He explains it as follows:

          "If Kernels are not used, this property makes the application of the learned retrieval function very efficient. Fast algorithms exists for computing rankings based on linear functions by means of inverted indices"

          As you said, LambdaMART is a promising model. I like it especially because it is a hierarchical model, so LTR can treat different search cases differently (e.g. different hours of the day, different ranking formulas). However, I'd love to be able to at least use my pairwise NN model (built with the fann library) in Solr using LTR. But then the 'reordering' of the products would be based on a classifier, and some near-optimal algorithm for using a classifier for reordering must be used. Solutions for this do exist, although I don't know their performance implications. The following paper covers some of them: http://arxiv.org/pdf/1105.5464.pdf

          Hide
          jpantony Joshua Pantony added a comment -

          Okay, makes sense. You are correct that we are limited in some cases. Other examples of algorithms that wouldn't currently adapt well are things like ListNet and BoltzRank (similar to your problem). Technically, support for this could be added at the rescorer level. We made a conscious effort to focus our initial code on something that allowed for some of the more popular algorithms and also had fast performance. I'd love to add support for more. That being said, if some friendly developer wanted to add that support, we'd love a pull request. Our public branch can be found at: https://github.com/bloomberg/lucene-solr/tree/master-ltr-plugin-rfc .

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user mnilsson23 opened a pull request:

          https://github.com/apache/lucene-solr/pull/40

          SOLR-8542: Integrate Learning to Rank into Solr

          Solr Learning to Rank (LTR) provides a way for you to extract features
          directly inside Solr for use in training a machine learned model. You
          can then deploy that model to Solr and use it to rerank your top X
          search results. This concept was previously presented by the authors at
          Lucene/Solr Revolution 2015.

          See the [README](https://github.com/bloomberg/lucene-solr/tree/master-ltr-plugin-release/solr/contrib/ltr) for more information on how to get started.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/bloomberg/lucene-solr master-ltr-plugin-release

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/lucene-solr/pull/40.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #40


          commit 073de9b2719abe91e106119b23b977e521e8b32f
          Author: Diego Ceccarelli <dceccarelli4@bloomberg.net>
          Date: 2016-01-13T22:29:17Z

          SOLR-8542: Integrate Learning to Rank into Solr

          Solr Learning to Rank (LTR) provides a way for you to extract features
          directly inside Solr for use in training a machine learned model. You
          can then deploy that model to Solr and use it to rerank your top X
          search results. This concept was previously presented by the authors at
          Lucene/Solr Revolution 2015

          commit b2bbe8c13122280ee5a76149bfb55fd1b7324279
          Author: Michael Nilsson <mnilsson23@bloomberg.net>
          Date: 2016-05-25T22:13:05Z

          Learning to Rank plugin updates

          • Updated our documentation about the training phase and how to train a real model for those that are not familiar with this process. We provided a step by step example building a rankSVM model externally, and supplied a sample script which does this using liblinear.
          • Formatted the code based on the lucene eclipse style
          • Updated the hashCode and equals functions of the ModelQuery as Alessandro Benedetti pointed out
          • Renamed ModelMetadata, the class you would subclass to add a new model for scoring docs, to LTRScoringAlgorithm
          • Cleaned up the LTRScoringAlgorithm to no longer have a type parameter
          • Added IntelliJ support. Thank you Alessandro Benedetti for adding it
          • Renamed mstore and fstore endpoints to feature-store and model-store as per Upayavira's suggestion
          • Added support for default efi parameters using the same Solr standard in solrconfig. When defining a feature in the config, put ${isFromManchester:0} to get 0 as a default, and you won't have to specify it in the request's efi params. Thanks for the enhancement suggestion Alessandro Benedetti
          • Removed the fv=true param requirement for extracting features.
          • You do not have to provide a "dummy model" first for extracting features, so you can request the transformer without the need of an rq ranking query. Inside the transformer you can provide a store=myFeatureStore param, and it will extract all features from that feature store directly. You can also provide local efi params if needed when extracting without an rq.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user diegoceccarelli commented on the pull request:

          https://github.com/apache/lucene-solr/pull/4#issuecomment-222163577

          thanks Alessandro, we integrated part of your PR in the new patch.

          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user diegoceccarelli closed the pull request at:

          https://github.com/apache/lucene-solr/pull/4

          Hide
          mnilsson Michael Nilsson added a comment -

          Hi everyone! We just made a push with many changes that were requested by you guys, plus a few other things.
          We have also updated to the latest Solr master branch as of a few days ago. Just as a heads up, we replaced the old pull request with a new one due to some history changes when merging with the latest master. Below you'll find a list of some of the items we changed.

          • Updated our documentation about the training phase and how to train a real model for those that are not familiar with this process. We provided a step by step example building a rankSVM model externally, and supplied a sample script which does this using liblinear.
          • Formatted the code based on the lucene eclipse style
          • Updated the hashCode and equals functions of the ModelQuery as Alessandro Benedetti pointed out
          • Renamed ModelMetadata, the class you would subclass to add a new model for scoring docs, to LTRScoringAlgorithm
          • Cleaned up the LTRScoringAlgorithm to no longer have a type parameter
          • Added IntelliJ support. Thank you Alessandro Benedetti for adding it
          • Renamed mstore and fstore endpoints to feature-store and model-store as per Upayavira's suggestion
          • Added support for default efi parameters using the same Solr standard in solrconfig. When defining a feature in the config, put ${isFromManchester:0} to get 0 as a default, and you won't have to specify it in the request's efi params (see the example after this list). Thanks for the enhancement suggestion Alessandro Benedetti
          • Removed the fv=true param requirement for extracting features.
          • You do not have to provide a "dummy model" first for extracting features, so you can request the transformer without the need of an rq ranking query. Inside the transformer you can provide a store=myFeatureStore param, and it will extract all features from that feature store directly. You can also provide local efi params if needed when extracting without an rq.
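          To illustrate the default-efi syntax above, a hypothetical feature definition might look roughly like the sketch below — the class name follows the package layout discussed in this thread, and the exact JSON keys ("class", "params", "value") are assumptions that may differ from the actual README examples:

              {
                "name"   : "isFromManchester",
                "class"  : "org.apache.solr.ltr.feature.impl.ValueFeature",
                "params" : { "value" : "${isFromManchester:0}" }
              }

          With a definition like this, a request such as rq={!ltr model=myModel efi.isFromManchester=1} overrides the default, and omitting the efi parameter falls back to 0.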

          Please read through the README for more information on the plugin, and how to train your own external model.
          Also, we have opened up the ability to create issues in our Github repository where the plugin currently lives.
          Please feel free to make or suggest issues, and we will keep track of them there instead of in this long list of comments.
          Thanks for the support everyone, and expect more frequent updates in the future.

          Hide
          mnilsson Michael Nilsson added a comment - - edited

          Hello everyone! We have just made a push to the Solr LTR contrib module pull request in preparation for upstreaming into Solr's master branch. We've made a lot of changes since May. We're up to date with the latest master, and ant validate passes. We fixed ant documentation-lint issues encountered in the contrib module, but linting stopped in changes.html so there might be some lingering lint issues.
          We welcome any comments on the contrib module, and please feel free to take a look at the README to get started. We will also be at this year's Lucene Solr Revolution if you want to stop by and ask us anything in person as well!

          Hide
          alessandro.benedetti Alessandro Benedetti added a comment -

          Well done guys! Impressive!

          Just a couple of observations and ideas that might help:

          Feature Caching Improvements: https://github.com/bloomberg/lucene-solr/issues/172
          LambdaMART explain summarization: https://github.com/bloomberg/lucene-solr/issues/173

          I cannot wait to see the plugin in the official release!

          Hide
          cpoerschke Christine Poerschke added a comment -

          Just a quick note for the log here to say that I have snapshotted the pull request to the https://github.com/apache/lucene-solr/tree/jira/solr-8542 branch and created LEGAL-276 re: the potential patent concerns question.

          Hide
          cpoerschke Christine Poerschke added a comment -

          And another quick note for the log here to say that I have snapshotted the updated pull request to the https://github.com/apache/lucene-solr/tree/jira/solr-8542-v2 branch and updated the LEGAL-276 ticket re: the thus changed understanding as far as any potential patent concerns go.

          Hide
          adeppa adeppa added a comment -

          Hi Team,

          Could anyone help me integrate LTR into Solr 5.1.0? If I need to apply a patch, please help me with that.

          Thanks
          Adeppa

          Hide
          mnilsson Michael Nilsson added a comment -

          Hey adeppa,

          So our plan is to get this merged into master, roughly Solr 7.x, very soon. We will then be working on backporting the commit/patch to 6.x so it can be rolled out in a Solr release. We would strongly recommend you upgrade to 6.x to get access to a sturdier and more performant Solr version with access to new features like this plugin.

          If upgrading to 6.x is not possible, you could cherry-pick the commit into your own branch_5x Solr repo and resolve any conflicts. However, there have been many changes between branch_5x and what's in master which affect the code the plugin was built on, so the backporting would take some effort.

          -Mike
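          For what it's worth, a hypothetical sketch of the cherry-pick approach described above (the commit sha is a placeholder for whichever LTR commit you want to backport):

              git checkout branch_5x
              git cherry-pick <ltr-commit-sha>
              # resolve any conflicts, then run the usual checks, e.g.
              ant validate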

          Hide
          adeppa adeppa added a comment -

          Hi Mike,

          Thanks for the information. I am not able to upgrade to Solr 6.x right now. I tried the patch above, but it is not working and still shows many errors. My current Solr version is 5.1.0; please help me apply the patch to my current Solr source.

          Thanks
          Adeppa

          Hide
          cpoerschke Christine Poerschke added a comment -

          Attaching patch generated as diff between 'master' and https://github.com/apache/lucene-solr/tree/jira/solr-8542-v2 - master commit to follow shortly.

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 5a66b3bc089e4b3e73b1c41c4cdcd89b183b85e7 in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5a66b3b ]

          SOLR-8542: Adds Solr Learning to Rank (LTR) plugin for reranking results with machine learning models. (Michael Nilsson, Diego Ceccarelli, Joshua Pantony, Jon Dorando, Naveen Santhapuri, Alessandro Benedetti, David Grohmann, Christine Poerschke)

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 9eb806a23339a4c6ade88ac86da889b8b889a936 in lucene-solr's branch refs/heads/master from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9eb806a ]

          SOLR-8542: Add maven config and improve IntelliJ config.

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 5a66b3bc089e4b3e73b1c41c4cdcd89b183b85e7 in lucene-solr's branch refs/heads/apiv2 from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5a66b3b ]

          SOLR-8542: Adds Solr Learning to Rank (LTR) plugin for reranking results with machine learning models. (Michael Nilsson, Diego Ceccarelli, Joshua Pantony, Jon Dorando, Naveen Santhapuri, Alessandro Benedetti, David Grohmann, Christine Poerschke)

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 9eb806a23339a4c6ade88ac86da889b8b889a936 in lucene-solr's branch refs/heads/apiv2 from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9eb806a ]

          SOLR-8542: Add maven config and improve IntelliJ config.

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 2c752b04cb63c0b6638f14959839b15fa1fa3e5a in lucene-solr's branch refs/heads/master from Michael Nilsson
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2c752b0 ]

          SOLR-8542: disallow reRankDocs<1 i.e. must rerank at least 1 document
          (Michael Nilsson via Christine Poerschke)

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 86a515789f6e4626d71480c7fdf38c33b71ded93 in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=86a5157 ]

          SOLR-8542, SOLR-9746: prefix solr/contrib/ltr's search and response.transform packages with ltr

          Hide
          adeppa adeppa added a comment - - edited

          Michael Nilsson
          Hi Team,

          I am working with the LTR master-branch code against Solr 6.3. When I try to integrate the code in Eclipse, I get compile-time errors in a couple of classes, i.e. FieldLengthFeatureWeight and LTRScoringQuery, after adding the unimplemented methods to a couple of classes, i.e. FieldValueFeatureWeight, SolrFeatureWeight and ValueFeatureWeight.

          In the LTRScoringQuery class the error is on:
          @Override
          public ModelWeight createWeight(IndexSearcher searcher, boolean needsScores, float boost)
          throws IOException

          Note: if I remove @Override, the error goes away; does that have any impact?

          And in the FieldLengthFeatureWeight class the error is on:
          public FieldLengthFeatureScorer(FeatureWeight weight,
          NumericDocValues norms) throws IOException {
          super(weight, norms);

          Note: here the super(weight, norms) call shows an error,
          and on:
          @Override
          public float score() throws IOException {

          final long l = norms.longValue();
          Note: the norms.longValue() statement shows an error.
          Please help me resolve the above errors.

          Thanks
          Adeppa

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit bfc3690d5203cee20550450bac3771e5c2b85cbf in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bfc3690 ]

          SOLR-8542: couple of tweaks (Michael Nilsson, Diego Ceccarelli, Christine Poerschke)

          • removed code triplication in ManagedModelStore
          • LTRScoringQuery.java tweaks
          • FeatureLogger.makeFeatureVector(...) can now safely be called repeatedly (though that doesn't happen at present)
          • make Feature.FeatureWeight.extractTerms a no-op; (OriginalScore|SolrFeature)Weight now implement extractTerms
          • LTRThreadModule javadocs and README.md tweaks
          • add TestFieldValueFeature.testBooleanValue test; replace "T"/"F" magic string use in FieldValueFeature
          • add TestOriginalScoreScorer test; add OriginalScoreScorer.freq() method
          • in TestMultipleAdditiveTreesModel revive dead explain test
          jira-bot ASF subversion and git services added a comment -

          Commit a511b30a50672365d46c3d052e19a9fedd228e2e in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a511b30 ]

          SOLR-8542: Adds Solr Learning to Rank (LTR) plugin for reranking results with machine learning models. (Michael Nilsson, Diego Ceccarelli, Joshua Pantony, Jon Dorando, Naveen Santhapuri, Alessandro Benedetti, David Grohmann, Christine Poerschke)

          jira-bot ASF subversion and git services added a comment -

          Commit 084809b77cc6b62be5f6f888d78574487cb3ec5b in lucene-solr's branch refs/heads/branch_6x from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=084809b ]

          SOLR-8542: Add maven config and improve IntelliJ config.

          jira-bot ASF subversion and git services added a comment -

          Commit f87d672be749fde603f592021bba875fd01e0f01 in lucene-solr's branch refs/heads/branch_6x from Michael Nilsson
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f87d672 ]

          SOLR-8542: disallow reRankDocs<1 i.e. must rerank at least 1 document
          (Michael Nilsson via Christine Poerschke)

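          For context, reRankDocs controls how many of the top documents returned by the original query are rescored by the LTR model, so a value below 1 would leave nothing to rerank. A minimal rerank request sketch, assuming a model named myModel has already been uploaded to the model store (the collection, model name and efi parameter below are illustrative placeholders, not part of this commit):

            http://localhost:8983/solr/techproducts/query?q=test
              &rq={!ltr model=myModel reRankDocs=100 efi.user_query='test'}
              &fl=id,score
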
          jira-bot ASF subversion and git services added a comment -

          Commit 252c6e9385ba516887543eb1968c8654b35b2b81 in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=252c6e9 ]

          SOLR-8542, SOLR-9746: prefix solr/contrib/ltr's search and response.transform packages with ltr

          jira-bot ASF subversion and git services added a comment -

          Commit 3e2657214e103290142d0facfc860cb01f6e033e in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3e26572 ]

          SOLR-8542: couple of tweaks (Michael Nilsson, Diego Ceccarelli, Christine Poerschke)

          • removed code triplication in ManagedModelStore
          • LTRScoringQuery.java tweaks
          • FeatureLogger.makeFeatureVector(...) can now safely be called repeatedly (though that doesn't happen at present)
          • make Feature.FeatureWeight.extractTerms a no-op; (OriginalScore|SolrFeature)Weight now implement extractTerms
          • LTRThreadModule javadocs and README.md tweaks
          • add TestFieldValueFeature.testBooleanValue test; replace "T"/"F" magic string use in FieldValueFeature
          • add TestOriginalScoreScorer test; add OriginalScoreScorer.freq() method
          • in TestMultipleAdditiveTreesModel revive dead explain test
          jira-bot ASF subversion and git services added a comment -

          Commit 9e8dd854cda6d56cc8d498cc23d138eeb74732fd in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9e8dd85 ]

          SOLR-8542: master-to-branch_6x backport changes (Michael Nilsson, Naveen Santhapuri, Christine Poerschke)

          • removed 'boost' arg from LTRScoringQuery.createWeight signature
          • classes extending Weight now implement normalize and getValueForNormalization
          • FieldLengthFeatureScorer tweaks
          cpoerschke Christine Poerschke added a comment -

          Hi Adeppa,

          Just a quick note to say that the branch_6x commits below (and specifically the "master-to-branch_6x backport changes" commit) might help with the compile-time errors you describe.

          In terms of official Solr 6.3 backporting of the LTR plugin, we do not plan to backport to branch_6_3, but branch_6x will turn into "Solr 6.4" when the next release happens.

          Regards,

          Christine

          cpoerschke Christine Poerschke added a comment -

          done:

          • master commit(s)
          • branch_6x commit(s)

          next steps:

          • Solr Reference Guide documentation (https://cwiki.apache.org/confluence/display/solr/Internal+-+TODO+List as starting point)
          • (to avoid duplication) reduce solr/contrib/ltr/README.md content to point to the appropriate Solr Reference Guide section(s)
          adeppa adeppa added a comment -

          Hi Christine

          I have made some changes accordingly for Solr 6.3; I will create a separate PR and share it with you. If you have time, please review and validate my changes.

          Thanks
          Adeppa

          adeppa adeppa added a comment -

          Hi Christine,
          Could you help me with how to create model.json and feature.json? I don't have a clear idea about them - please give me some guidance.

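          For reference, features and models are defined as JSON and uploaded to the plugin's managed feature store and model store. A minimal sketch, assuming a LinearModel over two simple features (the feature names, field and weights below are illustrative placeholders; the exact classes and schema are described in the contrib/ltr README and the Solr Reference Guide):

            features.json:
            [
              { "name" : "originalScore",
                "class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
                "params" : { } },
              { "name" : "titleLength",
                "class" : "org.apache.solr.ltr.feature.FieldLengthFeature",
                "params" : { "field" : "title" } }
            ]

            model.json:
            {
              "class" : "org.apache.solr.ltr.model.LinearModel",
              "name" : "myModel",
              "features" : [
                { "name" : "originalScore" },
                { "name" : "titleLength" }
              ],
              "params" : {
                "weights" : { "originalScore" : 1.0, "titleLength" : 0.1 }
              }
            }

          These are typically uploaded with curl, e.g. a PUT of features.json to /solr/<collection>/schema/feature-store and of model.json to /solr/<collection>/schema/model-store.
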
          jira-bot ASF subversion and git services added a comment -

          Commit c8542b2bd0470af9f8d64bb8133f31828b342604 in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c8542b2 ]

          SOLR-8542: techproducts example now includes (disabled) learning-to-rank support (enable via -Dsolr.ltr.enabled=true)

          additional changes as follows:

          • LTRFeatureLoggerTransformerFactory:
            • feature values cache name configurable (instead of hard-coded value that needs to match solrconfig.xml configuration)
            • javadocs (example and parameters)
          • CSV FeatureLogger:
            • removed delimiter and separator assumptions in tests
            • changed delimiter and separator (from "key:val;key:val" to "key=val,key=val")
            • configurable (key value) delimiter and (features) separator
          • JSON FeatureLogger:
            • defer support for this (removing MapFeatureLogger class)
          • adds 'training libraries' to (Linear|MultipleAdditiveTrees)Model javadocs

          (Diego Ceccarelli, Michael Nilsson, Christine Poerschke)

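          To illustrate the CSV feature logging format referred to above: with LTR enabled (e.g. via -Dsolr.ltr.enabled=true for the techproducts example) and assuming the feature logger transformer is registered under the name "features" in solrconfig.xml, requesting fl=[features] alongside a rerank query returns each document's feature vector as key=val pairs separated by commas. A sketch with placeholder feature names and values:

            http://localhost:8983/solr/techproducts/query?q=test
              &rq={!ltr model=myModel reRankDocs=100}
              &fl=id,score,[features]

            "docs" : [
              { "id" : "...", "score" : 1.23,
                "[features]" : "originalScore=1.23,titleLength=4.0" } ]
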
          jira-bot ASF subversion and git services added a comment -

          Commit 4852851d85fc874e3d6fb48faac98d0552873b80 in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4852851 ]

          SOLR-8542: techproducts example now includes (disabled) learning-to-rank support (enable via -Dsolr.ltr.enabled=true)

          additional changes as follows:

          • LTRFeatureLoggerTransformerFactory:
            • feature values cache name configurable (instead of hard-coded value that needs to match solrconfig.xml configuration)
            • javadocs (example and parameters)
          • CSV FeatureLogger:
            • removed delimiter and separator assumptions in tests
            • changed delimiter and separator (from "key:val;key:val" to "key=val,key=val")
            • configurable (key value) delimiter and (features) separator
          • JSON FeatureLogger:
            • defer support for this (removing MapFeatureLogger class)
          • adds 'training libraries' to (Linear|MultipleAdditiveTrees)Model javadocs

          (Diego Ceccarelli, Michael Nilsson, Christine Poerschke)

          jira-bot ASF subversion and git services added a comment -

          Commit ac3f1bb339df530d6d4484f26c9ab2da17bd28df in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ac3f1bb ]

          SOLR-8542: reduce direct solrconfig-ltr.xml references in solr/contrib/ltr tests

          jira-bot ASF subversion and git services added a comment -

          Commit 01846cbb4ccfdc9237cbd0af631b8d000448b0f8 in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=01846cb ]

          SOLR-8542: reduce direct solrconfig-ltr.xml references in solr/contrib/ltr tests

          jira-bot ASF subversion and git services added a comment -

          Commit f62874e47a0c790b9e396f58ef6f14ea04e2280b in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f62874e ]

          SOLR-8542: change default feature vector format (to 'dense' from 'sparse')

          also: increase test coverage w.r.t. 'sparse' vs. 'dense' vs. 'default' feature vector format

          jira-bot ASF subversion and git services added a comment -

          Commit b7c75a3a1c7524994cb2413afa82562e30eaadcb in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b7c75a3 ]

          SOLR-8542: change default feature vector format (to 'dense' from 'sparse')

          also: increase test coverage w.r.t. 'sparse' vs. 'dense' vs. 'default' feature vector format

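          To make the 'sparse' vs 'dense' distinction above concrete: for a document where some features evaluate to zero, a sparse feature vector omits those entries whereas a dense vector (now the default) lists every feature in the store. A sketch with placeholder feature names:

            dense  : "titleLength=4.0,isInStock=0.0,originalScore=1.23"
            sparse : "titleLength=4.0,originalScore=1.23"
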
          jira-bot ASF subversion and git services added a comment -

          Commit eb2a8ba2eec0841f03bbcf7807e602f7164a606e in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=eb2a8ba ]

          SOLR-8542: README and solr/contrib/ltr/example changes

          details:

          • reduced README in favour of equivalent Solr Ref Guide content and (new) example/README
          • solr/contrib/ltr/example improvements and fixes

          also:

          • stop supporting '*' in Managed(Feature|Model)Store.doDeleteChild
          jira-bot ASF subversion and git services added a comment -

          Commit 94dad5b68e5a46ea820514a43e3a759ef3c57716 in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=94dad5b ]

          SOLR-8542: README and solr/contrib/ltr/example changes

          details:

          • reduced README in favour of equivalent Solr Ref Guide content and (new) example/README
          • solr/contrib/ltr/example improvements and fixes

          also:

          • stop supporting '*' in Managed(Feature|Model)Store.doDeleteChild
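
          Regarding the doDeleteChild change above: wildcard '*' deletes are no longer supported, so entries are removed from the managed stores by name instead. A sketch, with placeholder store and model names (paths as per the managed REST endpoints used by the plugin; verify against the Learning To Rank documentation):

            curl -XDELETE 'http://localhost:8983/solr/techproducts/schema/model-store/myModel'
            curl -XDELETE 'http://localhost:8983/solr/techproducts/schema/feature-store/myFeatureStore'
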
          cpoerschke Christine Poerschke added a comment - - edited

          The Solr Reference Guide content for SOLR-8542 (Integrate Learning to Rank into Solr) is currently on the tentatively named https://cwiki.apache.org/confluence/display/solr/Result+Reranking page. Suggestions for alternative page names would be very welcome.

          The "Result Reranking" page would be placed after the "Result Grouping" and "Result Clustering" pages, suggestions for alternative placements would be welcome also.

          The following files currently mention the "Result Reranking" page:

          • https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr/README.md
          • https://github.com/apache/lucene-solr/blob/master/solr/contrib/ltr/example/README.md
          • https://github.com/apache/lucene-solr/blob/master/solr/server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml
          jira-bot ASF subversion and git services added a comment -

          Commit 88450c70bb4daa3ca6c4750581bddeaad9bea6f9 in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=88450c7 ]

          SOLR-8542: expand 'Assemble training data' content in solr/contrib/ltr/README

          (Diego Ceccarelli via Christine Poerschke in response to SOLR-9929 enquiry from Jeffery Yuan.)

          cpoerschke Christine Poerschke added a comment -

          The bot did not (yet) update for it here, but there is an equivalent 'master' branch commit, as per the bot's update on SOLR-9929 itself.

          cpoerschke Christine Poerschke added a comment -

          Thanks everyone!

          ctargett Cassandra Targett added a comment -

          Christine Poerschke: About the docs in the Ref Guide - thanks, by the way! - I've started to take a look and will have more feedback but for now I'm wondering if there is a reason why you didn't name the page in the Ref Guide something like "Learning to Rank", or "Machine Learned Ranking"? The current name feels like it is hiding the true topic of the page, but I haven't studied the topic enough to know if there is a reason for doing that in this case.

          markus17 Markus Jelsma added a comment -

          I agree with Cassandra because it also allows for confusion with the reranking post filter. "Machine Learned Ranking" covers the topic nicely, I believe.

          cpoerschke Christine Poerschke added a comment -

          Hi Cassandra and Markus - thanks for your input. "Result Reranking" remains very much a tentative name, happy to change it.

          How about "Learning To Rank" as a sub-page of the "Query Re-Ranking" i.e.

          * Searching
            * ...
            * Query Re-Ranking
              * Learning To Rank
            * Transforming Result Documents
            * ...
            * Result Grouping
            * Result Clustering
            * ...
          

          instead of the current

          * Searching
            * ...
            * Query Re-Ranking
            * Transforming Result Documents
            * ...
            * Result Grouping
            * Result Clustering
            * ...
            * Result Reranking
          

          where the tentatively named "Result Reranking" page is a sibling of "Result Grouping" and "Result Clustering"?
          Regarding the alternative of "Machine Learned Ranking", how about reserving that for future use (similar to the "Parameter Substitution" reservation) e.g. for it to become a "routing page" directing users to the "Learning To Rank" page, the "Logistic Regression Text Classification" content mentioned in "Streaming Expressions" and whatever else will come along in future in terms of machine learned ranking?

          ctargett Cassandra Targett added a comment -

          How about "Learning To Rank" as a sub-page of the "Query Re-Ranking"...

          +1 Christine Poerschke, I like that idea.

          Regarding the alternative of "Machine Learned Ranking", how about reserving that for future use

          Ah, I get what you're saying. There will be features in the future (hopefully) that would make an umbrella page named "Machine Learned Ranking" worth having so we shouldn't use it now. How about renaming it to "Learning to Rank", then?

          cpoerschke Christine Poerschke added a comment -

          How about "Learning To Rank" as a sub-page of the "Query Re-Ranking" ... +1 ... I like that idea.

          Learning To Rank is now the documentation page. I renamed and relocated the page and updated 'code' and 'ref guide' references to it. http://git-wip-us.apache.org/repos/asf/lucene-solr/commit/987e2650 and http://git-wip-us.apache.org/repos/asf/lucene-solr/commit/9b03e384 are the 'code' commits.

          Regarding the alternative of "Machine Learned Ranking", how about reserving that for future use ...

          Tentative page name and draft content now at https://cwiki.apache.org/confluence/display/solr/Machine+Learning+and+Solr because I think there's actually already enough functionality (Learning To Rank from SOLR-8542 here and Joel Bernstein's Logistic Regression Text Classification) to bring the "future use" into the present - what do you think?

          varunthacker Varun Thacker added a comment -

          Hi Christine,

          I was trying to play around with LTR but I don't see anything under /contrib/ltr in the Solr binary. I see /contrib/ltr/example/config.json on git. Am I missing something here?

          cpoerschke Christine Poerschke added a comment -

          Hi Varun,

          https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank has a Quick Start Example using the techproducts example which is included in the solr binary distribution. The solr/contrib/ltr/example content is intentionally not included in the binary distribution but it is (as you say) available in the git repo.

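          For anyone following along, the Quick Start runs against the techproducts example shipped in the binary distribution, with the (disabled by default) LTR support switched on via the system property mentioned in the commits above, e.g.:

            bin/solr start -e techproducts -Dsolr.ltr.enabled=true

          The solr/contrib/ltr/example scripts (such as train_and_upload_demo_model.py) are only available in the git repository.
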
          varunthacker Varun Thacker added a comment -

          Oh I see what happened.

          On https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank I was reading the "Training example" section, which took me to https://github.com/apache/lucene-solr/tree/releases/lucene-solr/6.4.0/solr/contrib/ltr/example and then I wondered why contrib/ltr/example/config.json wasn't there.

          So I have a few questions:
          1. Should we state explicitly that this example is not shipped in the binary? Out of curiosity, is there any reason why we don't?
          2. The "Installation" section of https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank states that we need "all JARs under contrib/ltr/lib". I don't see anything under the binary. Is it safe to remove it?

          BTW I love the documentation. Very thorough.

          cpoerschke Christine Poerschke added a comment -

          ... Should we state explicitly that this example is not shipped in the binary? ...

          Done.

          ... Any reason why we don't, out of curiosity? ...

          As is, the example folder content is not intended or suitable for production use. The train_and_upload_demo_model.py script name intends to convey that, but inclusion of the folder in the release could be misunderstood to mean that the example content is maintained and ready to use to the same extent as the solr/bin scripts.

          ... we need "all JARs under contrib/ltr/lib". I don't see anything under the binary. Is it safe to remove it? ...

          Good catch! I was inspired by the "Installation" section of https://cwiki.apache.org/confluence/display/solr/Result+Clustering and missed that contrib/ltr/lib is empty. I just updated the documentation and created SOLR-10451 for the techproducts solrconfig.xml update and to prune the empty (except for README.txt) contrib/ltr folder out of the Solr binary release.

          ... BTW I love the documentation. ...

          Thanks!


          PS: Thanks for the interest and feedback here. Let's wrap up here and continue or start any further conversations outside of this (completed) JIRA ticket in the usual places e.g. as per http://lucene.apache.org/solr/community.html#mailing-lists-irc

          • the Solr User Mailing list for usage and configuration related questions and problems
          • the Developer List for code and development related discussions
          • the Comments section of https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank for documentation related corrections or suggestions.

            People

            • Assignee: Christine Poerschke (cpoerschke)
            • Reporter: Joshua Pantony (jpantony)
            • Votes: 17
            • Watchers: 48
