Solr / SOLR-8271

Use SchemaSimilarityFactory as default when no explicit (top-level) similarity is configured

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.0
    • Component/s: None
    • Labels:
      None

      Description

      Idea spun out of SOLR-8057...

      As far as I can tell, the chief reason SchemaSimilarityFactory wasn't made the implicit default in IndexSchema when it was introduced is how it differed (and still differs) from DefaultSimilarity/ClassicSimilarity with respect to multi-clause queries; see the notes on queryNorm and coord in SchemaSimilarityFactory's class javadoc. Users were expected to think about this tradeoff when making a conscious choice to switch from DefaultSimilarity/ClassicSimilarity to SchemaSimilarityFactory. But (again, AFAICT) these discrepancies don't exist between SchemaSimilarityFactory's PerFieldSimilarityWrapper and BM25Similarity.

      So, assuming luceneMatchVersion >= 6.0 and BM25 as the implicit default, we should be able to safely switch to using SchemaSimilarityFactory as our default (it internally uses BM25 for fieldTypes that don't override the similarity). That would make it much easier for people to declare per-fieldType similarity overrides: just edit the fieldType, without also needing to explicitly declare SchemaSimilarityFactory.
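
      To illustrate the kind of per-fieldType override described above, here is a minimal schema sketch. The fieldType name text_custom_bm25 and the k1/b values are hypothetical, chosen only for illustration. The assumption is that no top-level similarity element is declared, so the implicit default applies to every other fieldType.

```xml
<!-- Hypothetical fieldType; the name and parameter values are illustrative only. -->
<fieldType name="text_custom_bm25" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
  <!-- Per-fieldType override. With SchemaSimilarityFactory as the implicit
       default, no top-level <similarity/> declaration should be needed. -->
  <similarity class="solr.BM25SimilarityFactory">
    <float name="k1">1.5</float>
    <float name="b">0.6</float>
  </similarity>
</fieldType>
```

      Before this change, an override like this would only take effect if the schema also declared solr.SchemaSimilarityFactory as the global similarity.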

      1. SOLR-8271.patch
        13 kB
        Hoss Man
      2. SOLR-8271.patch
        4 kB
        Hoss Man

          Activity

          hossman Hoss Man added a comment -

          Initial simple patch, currently causes failures in TestCloudSchemaless & ChangedSchemaMergeTest.

          These are the same failures I noted in the early attempts at SOLR-8057. Earlier today I thought that was because I was being silly in that old patch and needed to use the SolrResourceLoader to create the sim factory so that SolrCoreAware.inform would be called appropriately in all situations, which I do in this patch. But the failures persist. Digging into it, I realized the same problem could easily be reproduced via configs, so this issue is currently blocked until we can get to the bottom of SOLR-8280.

          hossman Hoss Man added a comment -

          OK, now that SOLR-8280 is resolved, the same basic patch as before (with one minor conflict resolution) passes all tests. In this new patch I've removed most of the explicit declarations of solr.SchemaSimilarityFactory from the various test schema files, since it's now the implicit default. A few special circumstances...

          • schema-sim.xml - used by TestPerFieldSimilarityClassic, which overrides the luceneMatchVersion, so we have to be explicit that we want SchemaSimilarityFactory to be the global sim.
          • schema-class-name-shortening-on-serialization.xml - this file exists for TestClassNameShortening, so we need to leave the "short" name in this config to test the proper behavior.
          • TestSchemaSimilarityResource.testGetSchemaSimilarity - previously expected the short class name for SchemaSimilarityFactory, since that's what was explicitly mentioned in schema-rest.xml, so I changed the test to expect the FQN for the class, which is the default behavior of the REST API for implicitly defined instances.

          ...I think we're good to go.
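
          For the schema-sim.xml case above, an explicit global declaration might look like the following sketch. This is not the actual test file: the schema name and version attribute are assumptions, and the only point is the top-level similarity element, which is still required when luceneMatchVersion is below 6.0 and the implicit default is therefore not SchemaSimilarityFactory.

```xml
<!-- Hypothetical schema skeleton; name and version are assumptions. -->
<schema name="example-explicit-sim" version="1.6">
  <!-- Explicit global similarity: required for per-fieldType overrides to
       apply when luceneMatchVersion < 6.0. -->
  <similarity class="solr.SchemaSimilarityFactory"/>
</schema>
```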

          jira-bot ASF subversion and git services added a comment -

          Commit 1715393 from hossman@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1715393 ]

          SOLR-8271: Change implicit default Similarity to use SchemaSimilarityFactory when luceneMatchVersion >= 6


            People

            • Assignee: Hoss Man (hossman)
            • Reporter: Hoss Man (hossman)
            • Votes: 0
            • Watchers: 1
