Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Now that IndexWriterConfig is in place, I'd like to move MergePolicy to it as well. The change is not straightforward and so I've kept it for a separate issue. MergePolicy requires in its ctor an IndexWriter, however none can be passed to it before an IndexWriter actually exists. And today IW may create an MP just for it to be overridden by the application one line afterwards. I don't want to make iw member of MP non-final, or settable by extending classes, however it needs to remain protected so they can access it directly. So the proposed changes are:

      • Add a SetOnce object (to o.a.l.util), or Immutable, which can only be set once (hence its name). It'll have the signature SetOnce<T> w/ synchronized set<T> and T get(). T will be declared volatile, so that get() won't be synchronized.
      • MP will define a protected final SetOnce<IndexWriter> writer instead of the current writer. NOTE: this is a bw break. any suggestions are welcomed.
      • MP will offer a public default ctor, together with a set(IndexWriter).
      • IndexWriter will set itself on MP using set(this). Note that if set will be called more than once, it will throw an exception (AlreadySetException - or does someone have a better suggestion, preferably an already existing Java exception?).

      That's the core idea. I'd like to post a patch soon, so I'd appreciate your review and proposals.

      1. LUCENE-2320.patch
        130 kB
        Shai Erera
      2. LUCENE-2320.patch
        126 kB
        Shai Erera
      3. LUCENE-2320.patch
        127 kB
        Shai Erera
      4. LUCENE-2320.patch
        125 kB
        Shai Erera
      5. LUCENE-2320.patch
        109 kB
        Shai Erera

        Activity

        Hide
        Michael McCandless added a comment -

        Thanks Shai!

        Show
        Michael McCandless added a comment - Thanks Shai!
        Hide
        Shai Erera added a comment -

        Thanks Mike !

        Show
        Shai Erera added a comment - Thanks Mike !
        Hide
        Michael McCandless added a comment -

        The patch looks great Shai – I plan to commit in a day or two.

        I added @lucene.experimental to SetOnce's jdocs, and also removed stale javadoc in MP and MS saying that you need access to package-private APIs (unrelated to this issue but spotted it .

        Show
        Michael McCandless added a comment - The patch looks great Shai – I plan to commit in a day or two. I added @lucene.experimental to SetOnce's jdocs, and also removed stale javadoc in MP and MS saying that you need access to package-private APIs (unrelated to this issue but spotted it .
        Hide
        Shai Erera added a comment -

        Mike - are you reviewing it? I think I fixed all mentioned comments.

        Show
        Shai Erera added a comment - Mike - are you reviewing it? I think I fixed all mentioned comments.
        Hide
        Shai Erera added a comment -

        Fixed a copy-paste comment error in IndexWriter (introduced in LUCENE-2294).

        Show
        Shai Erera added a comment - Fixed a copy-paste comment error in IndexWriter (introduced in LUCENE-2294 ).
        Hide
        Michael McCandless added a comment -

        Shai this patch looks good – thanks! Somehow you keep getting yourself sucked into the issues that need big patches to fix....

        Show
        Michael McCandless added a comment - Shai this patch looks good – thanks! Somehow you keep getting yourself sucked into the issues that need big patches to fix....
        Hide
        Mark Miller added a comment -

        +1 - I've had to do this in the past too. Just dropping tests doesn't seem like the way to go in many cases.

        Show
        Mark Miller added a comment - +1 - I've had to do this in the past too. Just dropping tests doesn't seem like the way to go in many cases.
        Hide
        Michael McCandless added a comment -

        I think it's OK to add stubs to src/* under backwards branch, in cases like this? Ie when an experimental API is changed.

        Just removing the tests that use the affected API isn't really an option here – eg some tests explicitly set up a LogDocMergePolicy (not sure exactly why) and we in general can't just remove that.

        Show
        Michael McCandless added a comment - I think it's OK to add stubs to src/* under backwards branch, in cases like this? Ie when an experimental API is changed. Just removing the tests that use the affected API isn't really an option here – eg some tests explicitly set up a LogDocMergePolicy (not sure exactly why) and we in general can't just remove that.
        Hide
        Uwe Schindler added a comment -

        In that case just remove the test in backwards. If you just replicate it in the same way like in trunk, it does not make sense.

        Show
        Uwe Schindler added a comment - In that case just remove the test in backwards. If you just replicate it in the same way like in trunk, it does not make sense.
        Hide
        Shai Erera added a comment -

        Uwe, I'm pretty familiar w/ how backwards goes .. I've had a lot of bw breaks in my contributions history . This patch + issue removes the MP ctor which accepts IW and exposes the default ctor only. That's a bw break, which is documented in CHANGES as well as was agreed on here because MP is experimental and gives us the freedom to do that (not that it's such a drastic change). Therefore I had to update the src/java section of bw, so that its tests would compile against MPs that expose the default ctor, and not the one accepting IW.

        Show
        Shai Erera added a comment - Uwe, I'm pretty familiar w/ how backwards goes .. I've had a lot of bw breaks in my contributions history . This patch + issue removes the MP ctor which accepts IW and exposes the default ctor only. That's a bw break, which is documented in CHANGES as well as was agreed on here because MP is experimental and gives us the freedom to do that (not that it's such a drastic change). Therefore I had to update the src/java section of bw, so that its tests would compile against MPs that expose the default ctor, and not the one accepting IW.
        Hide
        Uwe Schindler added a comment -

        Its normally not the idea of backwards tests to change the src/java part of the backwards part, as this would hide a backwards break. You should only change only src/tests in backwards! src/java is only for compiling a JAR file of the old lucene version, if you change it, you test against the wrong classes!

        Uwe

        Show
        Uwe Schindler added a comment - Its normally not the idea of backwards tests to change the src/java part of the backwards part, as this would hide a backwards break. You should only change only src/tests in backwards! src/java is only for compiling a JAR file of the old lucene version, if you change it, you test against the wrong classes! Uwe
        Hide
        Shai Erera added a comment -

        Sorry ... I generated the patch on the wrong backwards folder (the one before Uwe's changes) . I hope this time it's ok ...

        Show
        Shai Erera added a comment - Sorry ... I generated the patch on the wrong backwards folder (the one before Uwe's changes) . I hope this time it's ok ...
        Hide
        Shai Erera added a comment -

        Updating to the latest revision. This should be ok now.

        Show
        Shai Erera added a comment - Updating to the latest revision. This should be ok now.
        Hide
        Shai Erera added a comment -

        Attached patch w/ removing the IW-related ctors from MPs, as well as fixing backwards. All tests pass, including javadocs

        Show
        Shai Erera added a comment - Attached patch w/ removing the IW-related ctors from MPs, as well as fixing backwards. All tests pass, including javadocs
        Hide
        Michael McCandless added a comment -

        Patch looks good Shai! I'd rather go with the SetOnce approach than introduce a single-use factory for IW to create the MP.

        But, I don't think we should keep the MP ctors that take IW? Ie, you make the MP then call .setIW on it? We can just remove them (and advertise this in the CHANGES bw break entry) since it's an experimental API...

        Show
        Michael McCandless added a comment - Patch looks good Shai! I'd rather go with the SetOnce approach than introduce a single-use factory for IW to create the MP. But, I don't think we should keep the MP ctors that take IW? Ie, you make the MP then call .setIW on it? We can just remove them (and advertise this in the CHANGES bw break entry) since it's an experimental API...
        Hide
        Shai Erera added a comment -

        But it's MP which requires IW. So how will your policeman (like the name ) proposal prevent it? I think that setting IW on MP is not such a bad thing. If MP needs it then it needs. The question now is to what length do we want to go w/ it: make it sort of final (in which case SetOnce makes sense) or settle w/ a setIW which is simpler.

        This issue is more about moving MP into IWC than refactor MP. I'd like to keep it focused on that as much as possible. I don't mean that we should stop discussing the refactoring, just to say it can be done separately. After MP moves to IWC and all code is converted to use the new API, refactoring MP internally should not affect the API level, right?

        If u agree w/ that, then how do u propose to continue? W/ SetOnce or a simple setter?

        Show
        Shai Erera added a comment - But it's MP which requires IW. So how will your policeman (like the name ) proposal prevent it? I think that setting IW on MP is not such a bad thing. If MP needs it then it needs. The question now is to what length do we want to go w/ it: make it sort of final (in which case SetOnce makes sense) or settle w/ a setIW which is simpler. This issue is more about moving MP into IWC than refactor MP. I'd like to keep it focused on that as much as possible. I don't mean that we should stop discussing the refactoring, just to say it can be done separately. After MP moves to IWC and all code is converted to use the new API, refactoring MP internally should not affect the API level, right? If u agree w/ that, then how do u propose to continue? W/ SetOnce or a simple setter?
        Hide
        Earwin Burrfoot added a comment -

        We could split MergePolicy in two - class that represents the policy (config/factory) and class that acts on that policy (instance).

        So IW gets a MergePolicy that has no outside references, and creates a MergePoliceman from it, supplying 'this' on construction.
        Thus, circular reference still exists, but is contained for good.

        Not sure I totally love the idea myself though.

        Show
        Earwin Burrfoot added a comment - We could split MergePolicy in two - class that represents the policy (config/factory) and class that acts on that policy (instance). So IW gets a MergePolicy that has no outside references, and creates a MergePoliceman from it, supplying 'this' on construction. Thus, circular reference still exists, but is contained for good. Not sure I totally love the idea myself though.
        Hide
        Shai Erera added a comment -

        The thing is that we were at that position already, before I changed it so that MP requires writer up front. The reason was, like Mike mentioned, that writer had to be passed on all method calls, for really no good reason. A MP is usually coupled w/ an IW instance and I don't think we should opt for decoupling them.

        Most of this patch removes MP setting from IW to IWC (and hence changes test code to use the new API). The SetOnce juggling is done only to ensure an IW is set exactly once on MP, and allows us to resolve that circular dependency. We can do two things:

        1. Continue w/ SetOnce as introduced in this patch.
        2. Introduce a setIndexWriter on MP which anyone can call, even more than once.

        With (1) I don't think we complicate anything, and SetOnce can be useful in other places as well. (2) is really like passing writer on all method calls, so let's at least not have it as part of all methods signature. I prefer (1) slightly over (2) but am fine w/ (2) as well. I wouldn't want to change MP back to require IW on all its methods.

        Show
        Shai Erera added a comment - The thing is that we were at that position already, before I changed it so that MP requires writer up front. The reason was, like Mike mentioned, that writer had to be passed on all method calls, for really no good reason. A MP is usually coupled w/ an IW instance and I don't think we should opt for decoupling them. Most of this patch removes MP setting from IW to IWC (and hence changes test code to use the new API). The SetOnce juggling is done only to ensure an IW is set exactly once on MP, and allows us to resolve that circular dependency. We can do two things: Continue w/ SetOnce as introduced in this patch. Introduce a setIndexWriter on MP which anyone can call, even more than once. With (1) I don't think we complicate anything, and SetOnce can be useful in other places as well. (2) is really like passing writer on all method calls, so let's at least not have it as part of all methods signature. I prefer (1) slightly over (2) but am fine w/ (2) as well. I wouldn't want to change MP back to require IW on all its methods.
        Hide
        Michael McCandless added a comment -

        Or, maybe, we should think of MergePolicy API that doesn't require one to keep a reference to IW?

        Looks like IW is used pretty widely: for messaging (when infoStream is set), for retrieving the merges, for getting the Directory, and for getting number of deleted docs for a given segment. I guess an option would be to simply pass it around everywhere. Then we wouldn't have to break the circular dependendy.

        This is what MergeScheduler appears to do – it's passed to .merge, and then each bg thread in CMS holds a reference to the writer (since it needs to ask for followon merges).

        Show
        Michael McCandless added a comment - Or, maybe, we should think of MergePolicy API that doesn't require one to keep a reference to IW? Looks like IW is used pretty widely: for messaging (when infoStream is set), for retrieving the merges, for getting the Directory, and for getting number of deleted docs for a given segment. I guess an option would be to simply pass it around everywhere. Then we wouldn't have to break the circular dependendy. This is what MergeScheduler appears to do – it's passed to .merge, and then each bg thread in CMS holds a reference to the writer (since it needs to ask for followon merges).
        Hide
        Earwin Burrfoot added a comment -

        Or, maybe, we should think of MergePolicy API that doesn't require one to keep a reference to IW?
        The more you struggle to control such circular multistage initializations and check their validity, the more their ugliness stands out and screams - refactor meee!

        Show
        Earwin Burrfoot added a comment - Or, maybe, we should think of MergePolicy API that doesn't require one to keep a reference to IW? The more you struggle to control such circular multistage initializations and check their validity, the more their ugliness stands out and screams - refactor meee!
        Hide
        Shai Erera added a comment -

        Patch with the proposed changes including:

        • SetOnce (+ test)
        • Deprecations in IW
        • Adds MP to IWC
        • Changes needed in MP and extensions
        • Converted tests code to not use the deprecated API.

        All tests pass, including core, contrib and backwards

        Show
        Shai Erera added a comment - Patch with the proposed changes including: SetOnce (+ test) Deprecations in IW Adds MP to IWC Changes needed in MP and extensions Converted tests code to not use the deprecated API. All tests pass, including core, contrib and backwards
        Hide
        Shai Erera added a comment -

        Ok, I'll update that part of CHANGES. The conversion is very straightforward - instead of calling writer.doXYZ(), the code becomes writer.get().doXYZ().

        Show
        Shai Erera added a comment - Ok, I'll update that part of CHANGES. The conversion is very straightforward - instead of calling writer.doXYZ(), the code becomes writer.get().doXYZ().
        Hide
        Michael McCandless added a comment -

        This sounds good Shai!

        This is technically a back-compat break, but, these are experimental APIs, and this is an easy fix for users who have impl'd their own MPs, so I think it's OK to make an exception. Make sure you note it in the bw break section in CHANGES...

        Show
        Michael McCandless added a comment - This sounds good Shai! This is technically a back-compat break, but, these are experimental APIs, and this is an easy fix for users who have impl'd their own MPs, so I think it's OK to make an exception. Make sure you note it in the bw break section in CHANGES...

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development