Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-1763

MergePolicy should require an IndexWriter upon construction

Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • None
    • 2.9
    • core/index
    • None
    • New

    Description

      MergePolicy does not require an IW upon construction, but requires one to be passed as method arg to various methods. This gives the impression as if a single MP instance can be shared across various IW instances, which is not true for all MPs (if at all). In addition, LogMergePolicy uses the IW instance passed to these methods incosistently, and is currently exposed to potential NPEs.

      This issue will change MP to require an IW instance, however for back-compat reasons the following changes will be made:

      1. A new MP ctor w/ IW as arg will be introduced. Additionally, for back-compat a default ctor will also be declared which will assign null to the member IW.
      2. Methods that require IW will be deprecated, and new ones will be declared.
        • For back-compat, the new ones will not be made abstract, but will throw UOE, with a comment that they will become abstract in 3.0.
      3. All current MP impls will move to use the member instance.
      4. The code which calls MP methods will continue to use the deprecated methods, passing an IW even that it won't be necessary --> this is strictly for back-compat.

      In 3.0, we'll remove the deprecated default ctor and methods, and change the code to not call the IW method variants anymore.

      I hope that I didn't leave anything out. I'm sure I'll find out when I work on the patch .

      Attachments

        1. LUCENE-1763.patch
          46 kB
          Shai Erera

        Activity

          shaie Shai Erera added a comment -

          Hmm ... I see that MP's methods are package private, so nobody can really extend MP, but only LMP, right? I wonder why is that? I still need to come up w/ new method names because the code still needs to call the methods w/ the IndexWriter arg (for back-compat), but I wonder if we shouldn't make the methods public?

          shaie Shai Erera added a comment - Hmm ... I see that MP's methods are package private, so nobody can really extend MP, but only LMP, right? I wonder why is that? I still need to come up w/ new method names because the code still needs to call the methods w/ the IndexWriter arg (for back-compat), but I wonder if we shouldn't make the methods public?

          How about we:

          • Simply change the methods. Yes it's technically a break in back-compat, but since they are package private, and so advanced (I think very few people have customized their merge policy/scheduler), a compile time error on upgrade seems fine.
          • Make the APIs public (perhaps add a unit test, outside of oal.index package, asserting that all that's required is in fact public)
          • Mark the APIs as subject to change.
          mikemccand Michael McCandless added a comment - How about we: Simply change the methods. Yes it's technically a break in back-compat, but since they are package private, and so advanced (I think very few people have customized their merge policy/scheduler), a compile time error on upgrade seems fine. Make the APIs public (perhaps add a unit test, outside of oal.index package, asserting that all that's required is in fact public) Mark the APIs as subject to change.
          shaie Shai Erera added a comment -

          I don't mind doing that ... but note that LMP's methods are public (it overrides and declare them public) and so I was thinking that someone could potentially have written his own LMP (no one can write their own MP today). But if you're fine w/ me doing that, it's fine by me as well.

          BTW - I don't need to come up w/ new names after all, since by just adding the same method, w/o the IW arg changes its signature. But I agree that having just the right form makes more sense.

          shaie Shai Erera added a comment - I don't mind doing that ... but note that LMP's methods are public (it overrides and declare them public) and so I was thinking that someone could potentially have written his own LMP (no one can write their own MP today). But if you're fine w/ me doing that, it's fine by me as well. BTW - I don't need to come up w/ new names after all, since by just adding the same method, w/o the IW arg changes its signature. But I agree that having just the right form makes more sense.

          I think subclassing LMP is also extremely advanced, ie, it's OK to make an exception to our back-compat policy.

          mikemccand Michael McCandless added a comment - I think subclassing LMP is also extremely advanced, ie, it's OK to make an exception to our back-compat policy.
          shaie Shai Erera added a comment -

          Adds a ctor w/ IndexWriter to MergePolicy, LogMergePolicy, and its extensions.
          Fixed tests and IndexWriter code
          Fixed tags

          All tests pass

          shaie Shai Erera added a comment - Adds a ctor w/ IndexWriter to MergePolicy, LogMergePolicy, and its extensions. Fixed tests and IndexWriter code Fixed tags All tests pass

          Patch looks good Shai! Only change I made was to have IndexWriter writer be final in MergePolicy. I'll commit in a day or two...

          mikemccand Michael McCandless added a comment - Patch looks good Shai! Only change I made was to have IndexWriter writer be final in MergePolicy. I'll commit in a day or two...

          Thanks Shai!

          mikemccand Michael McCandless added a comment - Thanks Shai!
          markrmiller@gmail.com Mark Miller added a comment -

          I think subclassing LMP is also extremely advanced, ie, it's OK to make an exception to our back-compat policy.

          I don't disagree, especially since MergePolicy was already marked as experimental and subject to change - but just as a history note: its not just subclasses it breaks - I'm sure its rare, but Solr instantiates a MergePolicy with newInstance, which now fails because you have to pass an IndexWriter.

          markrmiller@gmail.com Mark Miller added a comment - I think subclassing LMP is also extremely advanced, ie, it's OK to make an exception to our back-compat policy. I don't disagree, especially since MergePolicy was already marked as experimental and subject to change - but just as a history note: its not just subclasses it breaks - I'm sure its rare, but Solr instantiates a MergePolicy with newInstance, which now fails because you have to pass an IndexWriter.

          Solr instantiates a MergePolicy with newInstance, which now fails because you have to pass an IndexWriter.

          Sigh. Sorry How to fix? Can Solr seek a Constructor that takes an IndexWriter instance as its single arg, instead?

          mikemccand Michael McCandless added a comment - Solr instantiates a MergePolicy with newInstance, which now fails because you have to pass an IndexWriter. Sigh. Sorry How to fix? Can Solr seek a Constructor that takes an IndexWriter instance as its single arg, instead?
          shaie Shai Erera added a comment -

          Yes I had to fix CreateIndexTask for that. We can add an empty ctor, but that would be a problem, as it will set the writer member to null. That's what I originally meant when I suggested to add the new behavior but not use it yet and deprecate the other methods and ctors.

          Is it a problem for Solr to follow the logic as in CreateIndexTask? I'm not asking from the technical perspective - that's obviously not a problem, but release-schedule wise? Maybe you can have a "fixpack release" which will be compliant w/ the 2.9 back-compat breaks?

          shaie Shai Erera added a comment - Yes I had to fix CreateIndexTask for that. We can add an empty ctor, but that would be a problem, as it will set the writer member to null. That's what I originally meant when I suggested to add the new behavior but not use it yet and deprecate the other methods and ctors. Is it a problem for Solr to follow the logic as in CreateIndexTask? I'm not asking from the technical perspective - that's obviously not a problem, but release-schedule wise? Maybe you can have a "fixpack release" which will be compliant w/ the 2.9 back-compat breaks?
          markrmiller@gmail.com Mark Miller added a comment -

          Thanks guys - we can fix this in Solr no problem - the SolrIndexWriter is creating the policy, so its easy enough to pass it to the constructor. I hadn't evaluated a fix yet, was just pointing out the back compat break beyond package private stuff.

          Certainly an easy adjustment on Solr's end though, and it seems to me that back compat break was a fair one.

          markrmiller@gmail.com Mark Miller added a comment - Thanks guys - we can fix this in Solr no problem - the SolrIndexWriter is creating the policy, so its easy enough to pass it to the constructor. I hadn't evaluated a fix yet, was just pointing out the back compat break beyond package private stuff. Certainly an easy adjustment on Solr's end though, and it seems to me that back compat break was a fair one.
          tomoko Tomoko Uchida added a comment -

          This issue was moved to GitHub issue: #2837.

          tomoko Tomoko Uchida added a comment - This issue was moved to GitHub issue: #2837 .

          People

            mikemccand Michael McCandless
            shaie Shai Erera
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: