Solr
  1. Solr
  2. SOLR-515

SimilarityFactory patch: facilitate parameterizable Similarity implementations

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.3
    • Component/s: search
    • Labels:
      None

      Description

      Solr currently allows a pluggable Lucene Similarity to be specified as:

      <similarity class="org.apache.lucene.search.DefaultSimilarity"/>

      This patch does not change this syntax at all, but detects whether a Similarity or a SimilarityFactory is specified. The new SimilarityFactory class passes a NamedList from the config file into a getSimilarity(NamedList) method.

      Yes, I used an interface, damn it! Let the battles continue. I've spoken with my code on the issue. But sure, I'll acquiesce on the topic and turn it into an abstract class if I must - stupid programming languages! object-oriented programming, not interface or abstract oriented programming All I ask is ya show me a good case where this interface won't suit your needs, and I'll reply that instead of thinking the problem is the interface, consider it is how the interface is used - it's implicitly designed to be simply that, an interface. You want a different way to configure, don't like NamedLists for some reason maybe?, we change IndexSchema Similarity construction smarts, perhaps creating another interface. Same diff, sort of.

      I'm proud of the unit test, no XPath in sight.

      1. similarity_factory.patch
        31 kB
        Erik Hatcher
      2. similarity_factory.patch
        7 kB
        Erik Hatcher
      3. similarity_factory.patch
        6 kB
        Erik Hatcher
      4. similarity_factory.patch
        6 kB
        Erik Hatcher

        Activity

        Hide
        Erik Hatcher added a comment -

        And yeah, if you don't like NamedLists.... SolrParams is more pleasant to work with and just sounds more appropriate.

        This patch obsoletes the earlier one by switching the SimilarityFactory interface to use SolrParams instead.

        Show
        Erik Hatcher added a comment - And yeah, if you don't like NamedLists.... SolrParams is more pleasant to work with and just sounds more appropriate. This patch obsoletes the earlier one by switching the SimilarityFactory interface to use SolrParams instead.
        Hide
        Erik Hatcher added a comment -

        A paste from the unit test. A SimilarityFactory is used in this manner:

        <similarity class="org.apache.solr.schema.CustomSimilarityFactory">
        <str name="echo">is there an echo?</str>
        </similarity>

        This calls CustomSimilarityFactory#getSimilarity(SolrParams) to construct the Similarity implementation used.

        Show
        Erik Hatcher added a comment - A paste from the unit test. A SimilarityFactory is used in this manner: <similarity class="org.apache.solr.schema.CustomSimilarityFactory"> <str name="echo">is there an echo?</str> </similarity> This calls CustomSimilarityFactory#getSimilarity(SolrParams) to construct the Similarity implementation used.
        Hide
        Erik Hatcher added a comment -

        Comment on the latest patch - I had a revelation/awakening about abstract versus interface. abstract it is. for a couple of reasons, it's handy to have the SimilarityFactory be able to tell you how it was configured, and just plain ol' OO-ness of object =data+behavior. A shame Java has to make a distinction, but whatever.

        Show
        Erik Hatcher added a comment - Comment on the latest patch - I had a revelation/awakening about abstract versus interface. abstract it is. for a couple of reasons, it's handy to have the SimilarityFactory be able to tell you how it was configured, and just plain ol' OO-ness of object =data+behavior. A shame Java has to make a distinction, but whatever.
        Hide
        Hoss Man added a comment -

        1) it would be nice to make this use one of the existing PluginLoader APIs (MapPluginLoader/MapInitializedPlugin or NamedListPluginLoader/NamedListInitializedPlugin since we've been talking about trying to standardize on them ... i don't think we have any types of Plugin's at the moment that use SolrParams.

        2) existing IndexSchema.init code will throuw ClassCast immediatly if someone attempts to use a class that isn't really a Similarity (because it attempts cast right away) ... reading your patch, i think this error will be defered until first use – which might be a red herring for people, better to test it ASAP.

        3) i was going to suggest skipping the factory and going with something like...

        abstract class SolrSimilarity implements Similarity {
             abstract void init(NamedList);
        }
        

        ...but then it occurred to me that with a factory pattern we can give custom Similarities the opportunity to precompute stats about the index before their use...

        public abstract class SimilarityFactory {
          // init code here
          /** Get an uninformed Similarity instance */
          public abstract Similarity getSimilarity();
          /** Get a Similiarity instance for use by an IndexWriter updating a known IndexSchema */
          public Similarity getWriterSimilarity(IndexSchema schema) { return getSimilarity(); }
          /** Get a Similiarity instance for use by an SolrIndexSearcher */
          public Similarity getSearchSimilarity(SolrIndexSearcher searcher) { return getSimilarity(); }
        }
        

        ...with IndexSchema.getSimilarity() being deprecated, but still functional by calling factory.getSimilarity()

        An API like that, and a lot more interesting similarity implementations become possible.

        Show
        Hoss Man added a comment - 1) it would be nice to make this use one of the existing PluginLoader APIs (MapPluginLoader/MapInitializedPlugin or NamedListPluginLoader/NamedListInitializedPlugin since we've been talking about trying to standardize on them ... i don't think we have any types of Plugin's at the moment that use SolrParams. 2) existing IndexSchema.init code will throuw ClassCast immediatly if someone attempts to use a class that isn't really a Similarity (because it attempts cast right away) ... reading your patch, i think this error will be defered until first use – which might be a red herring for people, better to test it ASAP. 3) i was going to suggest skipping the factory and going with something like... abstract class SolrSimilarity implements Similarity { abstract void init(NamedList); } ...but then it occurred to me that with a factory pattern we can give custom Similarities the opportunity to precompute stats about the index before their use... public abstract class SimilarityFactory { // init code here /** Get an uninformed Similarity instance */ public abstract Similarity getSimilarity(); /** Get a Similiarity instance for use by an IndexWriter updating a known IndexSchema */ public Similarity getWriterSimilarity(IndexSchema schema) { return getSimilarity(); } /** Get a Similiarity instance for use by an SolrIndexSearcher */ public Similarity getSearchSimilarity(SolrIndexSearcher searcher) { return getSimilarity(); } } ...with IndexSchema.getSimilarity() being deprecated, but still functional by calling factory.getSimilarity() An API like that, and a lot more interesting similarity implementations become possible.
        Hide
        Erik Hatcher added a comment -

        1) it would be nice to make this use one of the existing PluginLoader APIs (MapPluginLoader/MapInitializedPlugin or NamedListPluginLoader/NamedListInitializedPlugin since we've been talking about trying to standardize on them ... i don't think we have any types of Plugin's at the moment that use SolrParams.

        I looked at the plugin stuff, but it seemed that it wasn't appropriate for a single implementation like this, but rather something like response writers that have multiple implementations that can be picked up dynamically. Wouldn't going with the plugin architecture make your proposed SimilarityFactory unnecessary? You could have a similarity registered as default, one as "writer", and one as "search". Right? Or am I missing something?

        Show
        Erik Hatcher added a comment - 1) it would be nice to make this use one of the existing PluginLoader APIs (MapPluginLoader/MapInitializedPlugin or NamedListPluginLoader/NamedListInitializedPlugin since we've been talking about trying to standardize on them ... i don't think we have any types of Plugin's at the moment that use SolrParams. I looked at the plugin stuff, but it seemed that it wasn't appropriate for a single implementation like this, but rather something like response writers that have multiple implementations that can be picked up dynamically. Wouldn't going with the plugin architecture make your proposed SimilarityFactory unnecessary? You could have a similarity registered as default, one as "writer", and one as "search". Right? Or am I missing something?
        Hide
        Hoss Man added a comment -

        I looked at the plugin stuff, but it seemed that it wasn't appropriate for a single implementation like this

        Hmmm... I may be misremembering what the Loaders currently provide. Either way: I'd prefer that new types of plugins implement one of the *InitializedPlugin interfaces for uniformity moving forward.

        Wouldn't going with the plugin architecture make your proposed SimilarityFactory unnecessary? You could have a similarity registered as default, one as "writer", and one as "search". Right?

        I'm not suggesting just having 2 Similarities per SolrCore, i'm suggesting that a SimilarityFactory be asked to provide a Similarity to use each time a new IndexWriter or a new SolrIndexSearcher is constructed. if the factory wants to use maintain and use 1 or 2 Similarity instances for everything it can, or each call to getSearchSimilarity could compute some stats on every field in the index and return a new custom instance based on the data.

        Show
        Hoss Man added a comment - I looked at the plugin stuff, but it seemed that it wasn't appropriate for a single implementation like this Hmmm... I may be misremembering what the Loaders currently provide. Either way: I'd prefer that new types of plugins implement one of the *InitializedPlugin interfaces for uniformity moving forward. Wouldn't going with the plugin architecture make your proposed SimilarityFactory unnecessary? You could have a similarity registered as default, one as "writer", and one as "search". Right? I'm not suggesting just having 2 Similarities per SolrCore, i'm suggesting that a SimilarityFactory be asked to provide a Similarity to use each time a new IndexWriter or a new SolrIndexSearcher is constructed. if the factory wants to use maintain and use 1 or 2 Similarity instances for everything it can, or each call to getSearchSimilarity could compute some stats on every field in the index and return a new custom instance based on the data.
        Hide
        Grant Ingersoll added a comment -

        I'd suggest we seriously start thinking about Spring and stop recreating yet another Web Framework! All these workarounds to make Solr some pseudo IOC container make my head hurt. All this initialization stuff has been solved so nicely in Spring that we could gut out all of this complicated stuff and just focus on creating a less complicated Solr with more search capabilities and less in the way of initialization capabilities. I honestly can't keep my head straight anymore between Loaders, CoreAware, InitializedPlugin, inform(), init() etc.

        Show
        Grant Ingersoll added a comment - I'd suggest we seriously start thinking about Spring and stop recreating yet another Web Framework! All these workarounds to make Solr some pseudo IOC container make my head hurt. All this initialization stuff has been solved so nicely in Spring that we could gut out all of this complicated stuff and just focus on creating a less complicated Solr with more search capabilities and less in the way of initialization capabilities. I honestly can't keep my head straight anymore between Loaders, CoreAware, InitializedPlugin, inform(), init() etc.
        Hide
        Grant Ingersoll added a comment -

        change to minor

        Show
        Grant Ingersoll added a comment - change to minor
        Hide
        Erik Hatcher added a comment -

        Ok, here's a patch that uses NamedListInitializedPlugin. I don't see the advantage to this over the previous patch though, other than NamedListInitializedPlugin requires a name="..." attribute which is non-sensical in this context since there is only ever one SimilarityFactory per IndexSchema.

        I'd like to commit either this variant or the previous one soon, definitely before 1.3.

        Comments?

        Show
        Erik Hatcher added a comment - Ok, here's a patch that uses NamedListInitializedPlugin. I don't see the advantage to this over the previous patch though, other than NamedListInitializedPlugin requires a name="..." attribute which is non-sensical in this context since there is only ever one SimilarityFactory per IndexSchema. I'd like to commit either this variant or the previous one soon, definitely before 1.3. Comments?
        Hide
        Erik Hatcher added a comment -

        I'm -1 on the latest patch, and will commit the previous patch unless there are objections. I see no advantage to using NamedListInitializedPlugin. We'll iterate from there after it's committed if need be

        Show
        Erik Hatcher added a comment - I'm -1 on the latest patch, and will commit the previous patch unless there are objections. I see no advantage to using NamedListInitializedPlugin. We'll iterate from there after it's committed if need be
        Hide
        Erik Hatcher added a comment -

        Ok, putting a stake in the ground on this one and committed the previous patch (using SolrParams instead of the NamedListInsanity). From here we can add Hoss' tricks of index and search-time Similarity's etc.

        Show
        Erik Hatcher added a comment - Ok, putting a stake in the ground on this one and committed the previous patch (using SolrParams instead of the NamedListInsanity). From here we can add Hoss' tricks of index and search-time Similarity's etc.

          People

          • Assignee:
            Erik Hatcher
            Reporter:
            Erik Hatcher
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development