Lucene - Core
  1. Lucene - Core
  2. LUCENE-4876

IndexWriterConfig.clone should clone the MergeScheduler

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3, 4.5, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      ConcurrentMergeScheduler has a List<MergeThread> member to track the running merging threads, so IndexWriterConfig.clone should clone the merge scheduler so that both IndexWriterConfig instances are independant.

      1. LUCENE-4876.patch
        21 kB
        Shai Erera
      2. LUCENE-4876.patch
        21 kB
        Shai Erera
      3. LUCENE-4876.patch
        11 kB
        Shai Erera
      4. LUCENE-4876.patch
        37 kB
        Adrien Grand
      5. LUCENE-4876.patch
        34 kB
        Adrien Grand

        Activity

        Hide
        Michael McCandless added a comment -

        +1

        Show
        Michael McCandless added a comment - +1
        Hide
        Adrien Grand added a comment -

        Patch:

        • MergeScheduler implements Cloneable
        • IndexDeletionPolicy is now an abstract class (so that it can provide a default clone impl) and implements Cloneable
        • InfoStream implements Cloneable (there is no need for this today but I assumed that some people might be interested to display line numbers or other things that would require adding a state to the InfoStream, I've no strong feeling about it and can remove it if you think it shouldn't implement Cloneable)
        • MergeSchedulers and IndexDeletionPolicies have been fixed so that clones don't share state with the instance they've been cloned from
        • IndexWriterConfig clones mergeScheduler and delPolicy (in addition to mergePolicy, flushPolicy and indexerThreadPool which were already cloned)
        • Most of the patch changes are due to the fact that many tests assumed that the IndexDeletionPolicy instance passed to IndexWriterConfig was the same one as the one used by IndexWriter (which is not true now since IndexWriter clones the provided config in its constructor and we now clone del policies in IndexWriterConfig.clone).
        Show
        Adrien Grand added a comment - Patch: MergeScheduler implements Cloneable IndexDeletionPolicy is now an abstract class (so that it can provide a default clone impl) and implements Cloneable InfoStream implements Cloneable (there is no need for this today but I assumed that some people might be interested to display line numbers or other things that would require adding a state to the InfoStream, I've no strong feeling about it and can remove it if you think it shouldn't implement Cloneable) MergeSchedulers and IndexDeletionPolicies have been fixed so that clones don't share state with the instance they've been cloned from IndexWriterConfig clones mergeScheduler and delPolicy (in addition to mergePolicy, flushPolicy and indexerThreadPool which were already cloned) Most of the patch changes are due to the fact that many tests assumed that the IndexDeletionPolicy instance passed to IndexWriterConfig was the same one as the one used by IndexWriter (which is not true now since IndexWriter clones the provided config in its constructor and we now clone del policies in IndexWriterConfig.clone).
        Hide
        Michael McCandless added a comment -

        +1, looks great!

        IndexDeletionPolicy changed from an interface to abstract class ... I think this is fine (even though it's not marked @lucene.experimental): it's obviously very expert. Just make sure to advertise this break in the back compat break CHANGES section.

        Does PersistentSnapshotDeletionPolicy need clone() too? Hmm that doesn't seem straightforward ... (can't clone the open IndexWriter). Maybe just add to its javadocs that you should never try to share one instance of it across writers?

        Show
        Michael McCandless added a comment - +1, looks great! IndexDeletionPolicy changed from an interface to abstract class ... I think this is fine (even though it's not marked @lucene.experimental): it's obviously very expert. Just make sure to advertise this break in the back compat break CHANGES section. Does PersistentSnapshotDeletionPolicy need clone() too? Hmm that doesn't seem straightforward ... (can't clone the open IndexWriter). Maybe just add to its javadocs that you should never try to share one instance of it across writers?
        Hide
        Adrien Grand added a comment -

        Does PersistentSnapshotDeletionPolicy need clone() too?

        At first, I though about making its clone() method throw an exception but we can't because IndexWriter constructor always clones the provided IndexWriterConfig. I'll add warnings about sharing in the javadocs.

        Show
        Adrien Grand added a comment - Does PersistentSnapshotDeletionPolicy need clone() too? At first, I though about making its clone() method throw an exception but we can't because IndexWriter constructor always clones the provided IndexWriterConfig. I'll add warnings about sharing in the javadocs.
        Hide
        Adrien Grand added a comment -

        New patch:

        • Added CHANGES entries
        • Added documentation to PersistentSnapshotDeletionPolicy to make clear that instances of this classes must not be shared across IndexWriters
        • Some Solr tests were failing because Solr expects SolrCore.solrDelPolicy to be the same instance as IndexWriter.getConfig().getIndexDeletionPolicy(). There is sensible code relying on it (SnapShooter/ReplicationHandler in particular) so I preferred emulating the old behavior by making IndexDeletetionPolicyWrapper.clone() return 'this' for the moment. This is not a problem because each core has its own private deletion policy and never opens more than one IndexWriter with it.
        Show
        Adrien Grand added a comment - New patch: Added CHANGES entries Added documentation to PersistentSnapshotDeletionPolicy to make clear that instances of this classes must not be shared across IndexWriters Some Solr tests were failing because Solr expects SolrCore.solrDelPolicy to be the same instance as IndexWriter.getConfig().getIndexDeletionPolicy(). There is sensible code relying on it (SnapShooter/ReplicationHandler in particular) so I preferred emulating the old behavior by making IndexDeletetionPolicyWrapper.clone() return 'this' for the moment. This is not a problem because each core has its own private deletion policy and never opens more than one IndexWriter with it.
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.
        Hide
        selckin added a comment -

        The new requirement to implement clone() is not very well documented in the release notes or in the javadoc, and can silently break things on upgrading

        Show
        selckin added a comment - The new requirement to implement clone() is not very well documented in the release notes or in the javadoc, and can silently break things on upgrading
        Hide
        Michael McCandless added a comment -

        Alas I'm not sure we did the right thing here; it's very counter intuitive that the IndexDeletionPolicy that you set on your IWC is not the one used by IndexWriter, and it confuses users: http://lucene.markmail.org/thread/ihxkc44ba4mpnkpo

        Why, again, must we clone everything coming off of IWC? Who is sharing these instances across different IndexWriter instances?

        Show
        Michael McCandless added a comment - Alas I'm not sure we did the right thing here; it's very counter intuitive that the IndexDeletionPolicy that you set on your IWC is not the one used by IndexWriter, and it confuses users: http://lucene.markmail.org/thread/ihxkc44ba4mpnkpo Why, again, must we clone everything coming off of IWC? Who is sharing these instances across different IndexWriter instances?
        Hide
        Shai Erera added a comment -

        Perhaps we can do a minor change – stop calling IWC.clone() by IW on init. We keep clone() on IWC, and the rest of the objects, and tell users that it's their responsibility to call IWC.clone() before passing to IW? That's line a 1-liner change (well + clarifying the jdocs), that will make 99% of the users happy. The rest should just do new IW(dir, conf.clone()) ... that's simple enough?

        Show
        Shai Erera added a comment - Perhaps we can do a minor change – stop calling IWC.clone() by IW on init. We keep clone() on IWC, and the rest of the objects, and tell users that it's their responsibility to call IWC.clone() before passing to IW? That's line a 1-liner change (well + clarifying the jdocs), that will make 99% of the users happy. The rest should just do new IW(dir, conf.clone()) ... that's simple enough?
        Hide
        Michael McCandless added a comment -

        +1

        Show
        Michael McCandless added a comment - +1
        Hide
        Yonik Seeley added a comment -

        We keep clone() on IWC, and the rest of the objects, and tell users that it's their responsibility to call IWC.clone()

        +1

        Show
        Yonik Seeley added a comment - We keep clone() on IWC, and the rest of the objects, and tell users that it's their responsibility to call IWC.clone() +1
        Hide
        Shai Erera added a comment -

        I thought it will be an easy change, but some tests failed, as they relied on the private cloning. I was able to fix all of them except TestSnapshotDeletionPolicy.testRollbackToOldSnapshot.

        I don't know if the test was always buggy, or SDP is buggy, but here's what happens if I don't clone IWC (which I put a nocommit on):

        • First IW instance is created with the non-cloned SDP, the snapshots (IndexCommit instances) match the ones in the SDP instance. That is, assertSame passes.
        • Second time we create an IW instance, not cloning IWC, SDP.onInit receives new IndexCommit instances, and swaps them in its map, thereby changing the SDP.commits instances vs the test's snapshots instances, which causes assertSame to fail.

        This is the result of SDP.onInit which despite already having IC instances, swaps them with the new ones given by IW. I think this is correct, since that's a new IW and we should use the new IndexCommit references. In that case, the test should be relaxed to use assertEquals and not assertSame? I've put a nocommit next to calling assertSnapshotExists – changed to not check "same" IC, and the test passes without cloning ... what do you think?

        Show
        Shai Erera added a comment - I thought it will be an easy change, but some tests failed, as they relied on the private cloning. I was able to fix all of them except TestSnapshotDeletionPolicy.testRollbackToOldSnapshot. I don't know if the test was always buggy, or SDP is buggy, but here's what happens if I don't clone IWC (which I put a nocommit on): First IW instance is created with the non-cloned SDP, the snapshots (IndexCommit instances) match the ones in the SDP instance. That is, assertSame passes. Second time we create an IW instance, not cloning IWC, SDP.onInit receives new IndexCommit instances, and swaps them in its map, thereby changing the SDP.commits instances vs the test's snapshots instances, which causes assertSame to fail. This is the result of SDP.onInit which despite already having IC instances, swaps them with the new ones given by IW. I think this is correct, since that's a new IW and we should use the new IndexCommit references. In that case, the test should be relaxed to use assertEquals and not assertSame? I've put a nocommit next to calling assertSnapshotExists – changed to not check "same" IC, and the test passes without cloning ... what do you think?
        Hide
        Adrien Grand added a comment -

        We keep clone() on IWC, and the rest of the objects, and tell users that it's their responsibility to call IWC.clone() before passing to IW? That's line a 1-liner change (well + clarifying the jdocs), that will make 99% of the users happy. The rest should just do new IW(dir, conf.clone()) ... that's simple enough?

        Even though most users probably don't reuse their IndexWriterConfig objects, doing so should be safe and I'm a little scared of what could happen if a ConcurrentMergeScheduler was mistakenly shared by two different IndexWriters for example.

        Maybe another option for this issue would be to replace all these objects (MergePolicy, MergeScheduler, etc.) in IndexWriterConfig by factories for these objects that accept an IndexWriter as an argument (and maybe other objects depending on the factory). This would make it clear that IndexWriter has its own instance of these objects and reusing IndexWriterConfig instances would still be safe. An interesting side-effect is that we wouldn't need these SetOnce<?> in DWPT, FlushPolicy, and MergePolicy anymore, and ConcurrentMergeScheduler.indexWriter could be made final.

        Show
        Adrien Grand added a comment - We keep clone() on IWC, and the rest of the objects, and tell users that it's their responsibility to call IWC.clone() before passing to IW? That's line a 1-liner change (well + clarifying the jdocs), that will make 99% of the users happy. The rest should just do new IW(dir, conf.clone()) ... that's simple enough? Even though most users probably don't reuse their IndexWriterConfig objects, doing so should be safe and I'm a little scared of what could happen if a ConcurrentMergeScheduler was mistakenly shared by two different IndexWriters for example. Maybe another option for this issue would be to replace all these objects (MergePolicy, MergeScheduler, etc.) in IndexWriterConfig by factories for these objects that accept an IndexWriter as an argument (and maybe other objects depending on the factory). This would make it clear that IndexWriter has its own instance of these objects and reusing IndexWriterConfig instances would still be safe. An interesting side-effect is that we wouldn't need these SetOnce<?> in DWPT, FlushPolicy, and MergePolicy anymore, and ConcurrentMergeScheduler.indexWriter could be made final.
        Hide
        Shai Erera added a comment -

        This is currently impossible because of SetOnce. If you try to share the same IWC between two writers, MP will throw an exception that its IW cannot be set again. I think we're pretty safe. I'm more bothered by having confusing APIs – take a look at TestSDP (also SearcherAndTaxoRevision!): you create an IWC, want to get a handle to its IndexDelPolicy, yet you cannot do that until after IW has been created and you need to obtain it from its config. Given that most apps probably manage only one IW, and that creating IWC is not heavy, such that you gain anything by reusing it, I think it's ok that IW won't privately clone it?

        Show
        Shai Erera added a comment - This is currently impossible because of SetOnce. If you try to share the same IWC between two writers, MP will throw an exception that its IW cannot be set again. I think we're pretty safe. I'm more bothered by having confusing APIs – take a look at TestSDP (also SearcherAndTaxoRevision!): you create an IWC, want to get a handle to its IndexDelPolicy, yet you cannot do that until after IW has been created and you need to obtain it from its config. Given that most apps probably manage only one IW, and that creating IWC is not heavy, such that you gain anything by reusing it, I think it's ok that IW won't privately clone it?
        Hide
        selckin added a comment -

        This is also encouraging to just 'return this' from clone() in DeletionPolicy, it is how i fixed my app, how solr fixed it, and elasticsearch does it as well, so for future changes you can't even assume the clone is being done, not to mention how cruel it is to expose clone semantics on end-users

        Show
        selckin added a comment - This is also encouraging to just 'return this' from clone() in DeletionPolicy, it is how i fixed my app, how solr fixed it, and elasticsearch does it as well, so for future changes you can't even assume the clone is being done, not to mention how cruel it is to expose clone semantics on end-users
        Hide
        Shai Erera added a comment -

        That's right. I view IWC.clone as a service to apps that cannot do new IndexWriterConfig, not a necessity for IW to function properly.

        Show
        Shai Erera added a comment - That's right. I view IWC.clone as a service to apps that cannot do new IndexWriterConfig, not a necessity for IW to function properly.
        Hide
        Adrien Grand added a comment -

        This is currently impossible because of SetOnce.

        The merge schedulers don't have a SetOnce<IndexWriter> so if a user replaces the MergePolicy and all objects that have a SetOnce in its IndexWriterConfig and forgets the merge scheduler, the problem remains.

        I don't really like this SetOnce<?> trick. If a variable should only be set once, it should be final and set in the constructor?

        how cruel it is to expose clone semantics on end-users

        I fully agree. In this issue I tried to make clone consistently used across stateful objects held by an IndexWriterConfig object but ideally IndexWriterConfig should only carry stateless objects (in particular none of them should have an IndexWriter as a member) so that we never need to clone it or any of its members when reusing it.

        Show
        Adrien Grand added a comment - This is currently impossible because of SetOnce. The merge schedulers don't have a SetOnce<IndexWriter> so if a user replaces the MergePolicy and all objects that have a SetOnce in its IndexWriterConfig and forgets the merge scheduler, the problem remains. I don't really like this SetOnce<?> trick. If a variable should only be set once, it should be final and set in the constructor? how cruel it is to expose clone semantics on end-users I fully agree. In this issue I tried to make clone consistently used across stateful objects held by an IndexWriterConfig object but ideally IndexWriterConfig should only carry stateless objects (in particular none of them should have an IndexWriter as a member) so that we never need to clone it or any of its members when reusing it.
        Hide
        Shai Erera added a comment -

        The purpose behind IWC was to centralize all IW setters and ctors into one config object. This is not only about IW being a member of MP or not. There are other settings which you many not want to share with other IWs. I think we should have a SetOnce(IndexWriter) (or AtomicBoolean isUsed) members on IWC which IW will set in the ctor. That will prevent sharing IWC with anything else. It's a fast enough detection. I think I'll do that. Let's also remember that most likely apps are NOT sharing IWC, or reusing one. And those that need to create multiple instances, should have no problem creating new instances of IWC (even without calling clone).

        As for making the rest of the classes stateless ... it may work for MP, but IDP such as Snapshot maintains state. What will you do then?

        We should go for the simplest solution. I'll try to make the changes to IWC so that it's not share-able.

        Show
        Shai Erera added a comment - The purpose behind IWC was to centralize all IW setters and ctors into one config object. This is not only about IW being a member of MP or not. There are other settings which you many not want to share with other IWs. I think we should have a SetOnce(IndexWriter) (or AtomicBoolean isUsed) members on IWC which IW will set in the ctor. That will prevent sharing IWC with anything else. It's a fast enough detection. I think I'll do that. Let's also remember that most likely apps are NOT sharing IWC, or reusing one. And those that need to create multiple instances, should have no problem creating new instances of IWC (even without calling clone). As for making the rest of the classes stateless ... it may work for MP, but IDP such as Snapshot maintains state. What will you do then? We should go for the simplest solution. I'll try to make the changes to IWC so that it's not share-able.
        Hide
        Shai Erera added a comment -

        Patch adds a SetOnce<IndexWriter> to IWC. IW now sets itself on the given conf in the ctor. Some tests had to be adjusted to clone their IWC. I think it's quite simple.

        Core tests pass, I'll run the rest. I also need to resolve the nocommit around SnapshotDeletionPolicy.

        Show
        Shai Erera added a comment - Patch adds a SetOnce<IndexWriter> to IWC. IW now sets itself on the given conf in the ctor. Some tests had to be adjusted to clone their IWC. I think it's quite simple. Core tests pass, I'll run the rest. I also need to resolve the nocommit around SnapshotDeletionPolicy.
        Hide
        Michael McCandless added a comment -

        In that case, the test should be relaxed to use assertEquals and not assertSame?

        +1

        Show
        Michael McCandless added a comment - In that case, the test should be relaxed to use assertEquals and not assertSame? +1
        Hide
        Shai Erera added a comment -

        All tests pass (including Lucene and Solr). I also resolved the nocommit by modifying the test. It tests rolling back the index, it shouldn't care about whether the IndexCommit being snapshotted is the exact same reference. Also, SDP seems to be working correctly by updating the IC reference. It makes no sense to use an old reference.

        So now you cannot share IWC because of the SetOnce<IndexWriter). Users cannot make a mistake, yet simple apps don't need to do weird things to get a handle to the objects they set on IWC.

        If there are no objections, I will commit it.

        Show
        Shai Erera added a comment - All tests pass (including Lucene and Solr). I also resolved the nocommit by modifying the test. It tests rolling back the index, it shouldn't care about whether the IndexCommit being snapshotted is the exact same reference. Also, SDP seems to be working correctly by updating the IC reference. It makes no sense to use an old reference. So now you cannot share IWC because of the SetOnce<IndexWriter). Users cannot make a mistake, yet simple apps don't need to do weird things to get a handle to the objects they set on IWC. If there are no objections, I will commit it.
        Hide
        Adrien Grand added a comment -

        The SetOnce<IndexWriter> on IWC addresses my main concern. Thanks Shai!

        Show
        Adrien Grand added a comment - The SetOnce<IndexWriter> on IWC addresses my main concern. Thanks Shai!
        Hide
        Michael McCandless added a comment -

        +1, patch looks great. Thanks Shai!

        Show
        Michael McCandless added a comment - +1, patch looks great. Thanks Shai!
        Hide
        ASF subversion and git services added a comment -

        Commit 1507704 from Shai Erera in branch 'dev/trunk'
        [ https://svn.apache.org/r1507704 ]

        LUCENE-4876: don't clone IndexWriterConfig by IndexWriter; prevent config from being shared with multiple writers

        Show
        ASF subversion and git services added a comment - Commit 1507704 from Shai Erera in branch 'dev/trunk' [ https://svn.apache.org/r1507704 ] LUCENE-4876 : don't clone IndexWriterConfig by IndexWriter; prevent config from being shared with multiple writers
        Hide
        ASF subversion and git services added a comment -

        Commit 1507705 from Shai Erera in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1507705 ]

        LUCENE-4876: don't clone IndexWriterConfig by IndexWriter; prevent config from being shared with multiple writers

        Show
        ASF subversion and git services added a comment - Commit 1507705 from Shai Erera in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1507705 ] LUCENE-4876 : don't clone IndexWriterConfig by IndexWriter; prevent config from being shared with multiple writers
        Hide
        Shai Erera added a comment -

        Committed to trunk and 4x. Thanks Mike and Adrien!

        Show
        Shai Erera added a comment - Committed to trunk and 4x. Thanks Mike and Adrien!
        Hide
        Adrien Grand added a comment -

        4.5 release -> bulk close

        Show
        Adrien Grand added a comment - 4.5 release -> bulk close

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development