Lucene - Core
  1. Lucene - Core
  2. LUCENE-1763

MergePolicy should require an IndexWriter upon construction

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      MergePolicy does not require an IW upon construction, but requires one to be passed as method arg to various methods. This gives the impression as if a single MP instance can be shared across various IW instances, which is not true for all MPs (if at all). In addition, LogMergePolicy uses the IW instance passed to these methods incosistently, and is currently exposed to potential NPEs.

      This issue will change MP to require an IW instance, however for back-compat reasons the following changes will be made:

      1. A new MP ctor w/ IW as arg will be introduced. Additionally, for back-compat a default ctor will also be declared which will assign null to the member IW.
      2. Methods that require IW will be deprecated, and new ones will be declared.
        • For back-compat, the new ones will not be made abstract, but will throw UOE, with a comment that they will become abstract in 3.0.
      3. All current MP impls will move to use the member instance.
      4. The code which calls MP methods will continue to use the deprecated methods, passing an IW even that it won't be necessary --> this is strictly for back-compat.

      In 3.0, we'll remove the deprecated default ctor and methods, and change the code to not call the IW method variants anymore.

      I hope that I didn't leave anything out. I'm sure I'll find out when I work on the patch .

        Activity

        Hide
        Shai Erera added a comment -

        Hmm ... I see that MP's methods are package private, so nobody can really extend MP, but only LMP, right? I wonder why is that? I still need to come up w/ new method names because the code still needs to call the methods w/ the IndexWriter arg (for back-compat), but I wonder if we shouldn't make the methods public?

        Show
        Shai Erera added a comment - Hmm ... I see that MP's methods are package private, so nobody can really extend MP, but only LMP, right? I wonder why is that? I still need to come up w/ new method names because the code still needs to call the methods w/ the IndexWriter arg (for back-compat), but I wonder if we shouldn't make the methods public?
        Hide
        Michael McCandless added a comment -

        How about we:

        • Simply change the methods. Yes it's technically a break in back-compat, but since they are package private, and so advanced (I think very few people have customized their merge policy/scheduler), a compile time error on upgrade seems fine.
        • Make the APIs public (perhaps add a unit test, outside of oal.index package, asserting that all that's required is in fact public)
        • Mark the APIs as subject to change.
        Show
        Michael McCandless added a comment - How about we: Simply change the methods. Yes it's technically a break in back-compat, but since they are package private, and so advanced (I think very few people have customized their merge policy/scheduler), a compile time error on upgrade seems fine. Make the APIs public (perhaps add a unit test, outside of oal.index package, asserting that all that's required is in fact public) Mark the APIs as subject to change.
        Hide
        Shai Erera added a comment -

        I don't mind doing that ... but note that LMP's methods are public (it overrides and declare them public) and so I was thinking that someone could potentially have written his own LMP (no one can write their own MP today). But if you're fine w/ me doing that, it's fine by me as well.

        BTW - I don't need to come up w/ new names after all, since by just adding the same method, w/o the IW arg changes its signature. But I agree that having just the right form makes more sense.

        Show
        Shai Erera added a comment - I don't mind doing that ... but note that LMP's methods are public (it overrides and declare them public) and so I was thinking that someone could potentially have written his own LMP (no one can write their own MP today). But if you're fine w/ me doing that, it's fine by me as well. BTW - I don't need to come up w/ new names after all, since by just adding the same method, w/o the IW arg changes its signature. But I agree that having just the right form makes more sense.
        Hide
        Michael McCandless added a comment -

        I think subclassing LMP is also extremely advanced, ie, it's OK to make an exception to our back-compat policy.

        Show
        Michael McCandless added a comment - I think subclassing LMP is also extremely advanced, ie, it's OK to make an exception to our back-compat policy.
        Hide
        Shai Erera added a comment -

        Adds a ctor w/ IndexWriter to MergePolicy, LogMergePolicy, and its extensions.
        Fixed tests and IndexWriter code
        Fixed tags

        All tests pass

        Show
        Shai Erera added a comment - Adds a ctor w/ IndexWriter to MergePolicy, LogMergePolicy, and its extensions. Fixed tests and IndexWriter code Fixed tags All tests pass
        Hide
        Michael McCandless added a comment -

        Patch looks good Shai! Only change I made was to have IndexWriter writer be final in MergePolicy. I'll commit in a day or two...

        Show
        Michael McCandless added a comment - Patch looks good Shai! Only change I made was to have IndexWriter writer be final in MergePolicy. I'll commit in a day or two...
        Hide
        Michael McCandless added a comment -

        Thanks Shai!

        Show
        Michael McCandless added a comment - Thanks Shai!
        Hide
        Mark Miller added a comment -

        I think subclassing LMP is also extremely advanced, ie, it's OK to make an exception to our back-compat policy.

        I don't disagree, especially since MergePolicy was already marked as experimental and subject to change - but just as a history note: its not just subclasses it breaks - I'm sure its rare, but Solr instantiates a MergePolicy with newInstance, which now fails because you have to pass an IndexWriter.

        Show
        Mark Miller added a comment - I think subclassing LMP is also extremely advanced, ie, it's OK to make an exception to our back-compat policy. I don't disagree, especially since MergePolicy was already marked as experimental and subject to change - but just as a history note: its not just subclasses it breaks - I'm sure its rare, but Solr instantiates a MergePolicy with newInstance, which now fails because you have to pass an IndexWriter.
        Hide
        Michael McCandless added a comment -

        Solr instantiates a MergePolicy with newInstance, which now fails because you have to pass an IndexWriter.

        Sigh. Sorry How to fix? Can Solr seek a Constructor that takes an IndexWriter instance as its single arg, instead?

        Show
        Michael McCandless added a comment - Solr instantiates a MergePolicy with newInstance, which now fails because you have to pass an IndexWriter. Sigh. Sorry How to fix? Can Solr seek a Constructor that takes an IndexWriter instance as its single arg, instead?
        Hide
        Shai Erera added a comment -

        Yes I had to fix CreateIndexTask for that. We can add an empty ctor, but that would be a problem, as it will set the writer member to null. That's what I originally meant when I suggested to add the new behavior but not use it yet and deprecate the other methods and ctors.

        Is it a problem for Solr to follow the logic as in CreateIndexTask? I'm not asking from the technical perspective - that's obviously not a problem, but release-schedule wise? Maybe you can have a "fixpack release" which will be compliant w/ the 2.9 back-compat breaks?

        Show
        Shai Erera added a comment - Yes I had to fix CreateIndexTask for that. We can add an empty ctor, but that would be a problem, as it will set the writer member to null. That's what I originally meant when I suggested to add the new behavior but not use it yet and deprecate the other methods and ctors. Is it a problem for Solr to follow the logic as in CreateIndexTask? I'm not asking from the technical perspective - that's obviously not a problem, but release-schedule wise? Maybe you can have a "fixpack release" which will be compliant w/ the 2.9 back-compat breaks?
        Hide
        Mark Miller added a comment -

        Thanks guys - we can fix this in Solr no problem - the SolrIndexWriter is creating the policy, so its easy enough to pass it to the constructor. I hadn't evaluated a fix yet, was just pointing out the back compat break beyond package private stuff.

        Certainly an easy adjustment on Solr's end though, and it seems to me that back compat break was a fair one.

        Show
        Mark Miller added a comment - Thanks guys - we can fix this in Solr no problem - the SolrIndexWriter is creating the policy, so its easy enough to pass it to the constructor. I hadn't evaluated a fix yet, was just pointing out the back compat break beyond package private stuff. Certainly an easy adjustment on Solr's end though, and it seems to me that back compat break was a fair one.

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development