Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9, 6.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We originally added this clone to allow a single IWC to be re-used against more than one IndexWriter, but I think this is a mis-feature: it adds complexity to hairy classes (merge policy/scheduler, DW thread pool, etc.), I think it's buggy today.

      I think we should just disallow sharing: you must make a new IWC for a new IndexWriter.

      1. LUCENE-5708.patch
        36 kB
        Michael McCandless
      2. LUCENE-5708.patch
        33 kB
        Michael McCandless
      3. LUCENE-5708.patch
        0.7 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Initial patch, tests seem to pass. IWC already detects if it's illegally re-used across more than one IW.

        Show
        Michael McCandless added a comment - Initial patch, tests seem to pass. IWC already detects if it's illegally re-used across more than one IW.
        Hide
        Michael McCandless added a comment -

        Woops, wrong patch ... this one should work.

        Show
        Michael McCandless added a comment - Woops, wrong patch ... this one should work.
        Hide
        Shai Erera added a comment -

        I think the way you fixed some tests that used clone is incorrect. You should at least call newIndexWriterConfig(random) w/ the same random and seed, so the exact IWC is created each time. At least, that's what these tests now rely on, even if they don't break. Otherwise, they just create a random IWC each time they open a writer, which is not the intention I believe.

        Show
        Shai Erera added a comment - I think the way you fixed some tests that used clone is incorrect. You should at least call newIndexWriterConfig(random) w/ the same random and seed, so the exact IWC is created each time. At least, that's what these tests now rely on, even if they don't break. Otherwise, they just create a random IWC each time they open a writer, which is not the intention I believe.
        Hide
        Michael McCandless added a comment -

        Hmm which tests rely on using the same IWC? I thought I was improving the tests by switching up the config...

        Show
        Michael McCandless added a comment - Hmm which tests rely on using the same IWC? I thought I was improving the tests by switching up the config...
        Hide
        Adrien Grand added a comment -

        I don't know about these tests that expect the same config but I'm +1 in general to remove all that cloning. It looks to me that we should be able to make some fields final now that we don't have a clone method anymore (eg. MergePolicy.writer)?

        Show
        Adrien Grand added a comment - I don't know about these tests that expect the same config but I'm +1 in general to remove all that cloning. It looks to me that we should be able to make some fields final now that we don't have a clone method anymore (eg. MergePolicy.writer)?
        Hide
        Shai Erera added a comment - - edited

        Hmm which tests rely on using the same IWC?

        Hmm ... I don't remember. All I remember is that while I worked on preventing sharing IWC between writers (LUCENE-4876), there were a bunch of tests that reused the IWC. I fixed them by simply cloning it, but I admit I didn't check if initializing a new IWC each time serves their purpose. I just assume that if so many tests did that, there ought to be a reason beyond just convenience, but I could be wrong.

        What I'm worried is that by not cloning Jenkins will trip (which is good!), or worse - that those tests will stop asserting what they asserted before. So I just wanted to point that out. If we're ready to take the risk, I'm fine with it, because eventually we're discussing tests here .. there's nothing functionally impossible from an app's perspective.

        Show
        Shai Erera added a comment - - edited Hmm which tests rely on using the same IWC? Hmm ... I don't remember. All I remember is that while I worked on preventing sharing IWC between writers ( LUCENE-4876 ), there were a bunch of tests that reused the IWC. I fixed them by simply cloning it, but I admit I didn't check if initializing a new IWC each time serves their purpose. I just assume that if so many tests did that, there ought to be a reason beyond just convenience, but I could be wrong. What I'm worried is that by not cloning Jenkins will trip (which is good!), or worse - that those tests will stop asserting what they asserted before. So I just wanted to point that out. If we're ready to take the risk, I'm fine with it, because eventually we're discussing tests here .. there's nothing functionally impossible from an app's perspective.
        Hide
        Simon Willnauer added a comment -

        part of the problem is that we holding IW state on MergePolicy and MergeScheduler. Both classes should get the IW passed to the relevant methods so we can share them across as many instances we want...

        Show
        Simon Willnauer added a comment - part of the problem is that we holding IW state on MergePolicy and MergeScheduler. Both classes should get the IW passed to the relevant methods so we can share them across as many instances we want...
        Hide
        Michael McCandless added a comment -

        New patch, I think it's ready; I'll commit soon...

        It looks to me that we should be able to make some fields final now that we don't have a clone method anymore

        +1, I fixed a few of these, and found a couple more "implements Cloneable" to remove.

        (eg. MergePolicy.writer)

        Looks like we'll remove IW as a field in MP/MS with LUCENE-5711.

        Show
        Michael McCandless added a comment - New patch, I think it's ready; I'll commit soon... It looks to me that we should be able to make some fields final now that we don't have a clone method anymore +1, I fixed a few of these, and found a couple more "implements Cloneable" to remove. (eg. MergePolicy.writer) Looks like we'll remove IW as a field in MP/MS with LUCENE-5711 .
        Hide
        ASF subversion and git services added a comment -

        Commit 1598489 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1598489 ]

        LUCENE-5708: remove IWC.clone

        Show
        ASF subversion and git services added a comment - Commit 1598489 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1598489 ] LUCENE-5708 : remove IWC.clone
        Hide
        ASF subversion and git services added a comment -

        Commit 1598492 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1598492 ]

        LUCENE-5708: remove IWC.clone

        Show
        ASF subversion and git services added a comment - Commit 1598492 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1598492 ] LUCENE-5708 : remove IWC.clone
        Hide
        ASF subversion and git services added a comment -

        Commit 1598496 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1598496 ]

        LUCENE-5708: fix test bug

        Show
        ASF subversion and git services added a comment - Commit 1598496 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1598496 ] LUCENE-5708 : fix test bug
        Hide
        ASF subversion and git services added a comment -

        Commit 1598497 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1598497 ]

        LUCENE-5708: fix test bug

        Show
        ASF subversion and git services added a comment - Commit 1598497 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1598497 ] LUCENE-5708 : fix test bug
        Hide
        ASF subversion and git services added a comment -

        Commit 1598502 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1598502 ]

        LUCENE-5708: fix test bug

        Show
        ASF subversion and git services added a comment - Commit 1598502 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1598502 ] LUCENE-5708 : fix test bug
        Hide
        ASF subversion and git services added a comment -

        Commit 1598503 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1598503 ]

        LUCENE-5708: fix test bug

        Show
        ASF subversion and git services added a comment - Commit 1598503 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1598503 ] LUCENE-5708 : fix test bug
        Hide
        Robert Muir added a comment -

        Can we remove the "fix test bugs" commits and fix the true underlying issue?

        Tests arent testing what they should, we need to fix the disease or back this change out.

        Show
        Robert Muir added a comment - Can we remove the "fix test bugs" commits and fix the true underlying issue? Tests arent testing what they should, we need to fix the disease or back this change out.
        Hide
        Michael McCandless added a comment -

        Tests arent testing what they should, we need to fix the disease or back this change out.

        I'm happy to back this change out if you want.

        But, I'm confused: are you talking about how these tests hardwire the PF to Lucene41? I agree we should fix that, but this is a pre-existing issue.

        Show
        Michael McCandless added a comment - Tests arent testing what they should, we need to fix the disease or back this change out. I'm happy to back this change out if you want. But, I'm confused: are you talking about how these tests hardwire the PF to Lucene41? I agree we should fix that, but this is a pre-existing issue.
        Hide
        Robert Muir added a comment -

        All these "PostingsFormat" tests wire their codec to themselves. This allows us to implement generic base classes with tests that all codecs should pass, as well as codec-specific tests (e.g. in lucene41) that test particular corner cases of importance.

        this worked well in knocking out bugs for postings, so the whole scheme was duplicated to docvalues, storedfields, vectors, everything.

        Now here comes this commit, and these tests (which are important to ensure the index format is working) are no longer testing what they are supposed to. For example the tests in Lucene41 package explicitly test special cases of that codec that would otherwise be extraordinarily rare in the existing random tests. If they are executing against random codecs or even random configurations then they just became useless.

        So thats why I'm concerned: I see this commit causing these failures, and I know we just experienced a significant loss of test coverage to the index format. We are relying upon tests to fail to detect this, but unfortunately 'loss of test coverage' doesn't always trigger a jenkins build.

        So maybe instead of playing whack-a-mole with jenkins tests failures, we should pay more attention reviewing all changes to unit tests where clone() was previously used. The patch is buggy here, and I just want to ensure its taken seriously so that we do not lose coverage

        Show
        Robert Muir added a comment - All these "PostingsFormat" tests wire their codec to themselves. This allows us to implement generic base classes with tests that all codecs should pass, as well as codec-specific tests (e.g. in lucene41) that test particular corner cases of importance. this worked well in knocking out bugs for postings, so the whole scheme was duplicated to docvalues, storedfields, vectors, everything. Now here comes this commit, and these tests (which are important to ensure the index format is working) are no longer testing what they are supposed to. For example the tests in Lucene41 package explicitly test special cases of that codec that would otherwise be extraordinarily rare in the existing random tests. If they are executing against random codecs or even random configurations then they just became useless. So thats why I'm concerned: I see this commit causing these failures, and I know we just experienced a significant loss of test coverage to the index format. We are relying upon tests to fail to detect this, but unfortunately 'loss of test coverage' doesn't always trigger a jenkins build. So maybe instead of playing whack-a-mole with jenkins tests failures, we should pay more attention reviewing all changes to unit tests where clone() was previously used. The patch is buggy here, and I just want to ensure its taken seriously so that we do not lose coverage
        Hide
        Michael McCandless added a comment -

        I agree that the distinction of "randomly test this PF via the generic
        base test class" and the "specifically test tricky corner cases for
        this particular PF" (e.g. TestBlockPostingsFormat/2/3.java) is
        important, and the specific IWC settings for those tests are
        necessary.

        I reviewed all the test changes more closely, and found a couple other
        places that needed to carry over explicit IWC changes after pulling a
        random IWC (I'll commit shortly).

        I think this is net/net good vs the clone we had before: it means we
        are still randomly changing the things the test didn't care about, and
        fixing the settings that it does.

        Show
        Michael McCandless added a comment - I agree that the distinction of "randomly test this PF via the generic base test class" and the "specifically test tricky corner cases for this particular PF" (e.g. TestBlockPostingsFormat/2/3.java) is important, and the specific IWC settings for those tests are necessary. I reviewed all the test changes more closely, and found a couple other places that needed to carry over explicit IWC changes after pulling a random IWC (I'll commit shortly). I think this is net/net good vs the clone we had before: it means we are still randomly changing the things the test didn't care about, and fixing the settings that it does.
        Hide
        ASF subversion and git services added a comment -

        Commit 1598543 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1598543 ]

        LUCENE-5708: fix these tests to also 'mimic' previous IWC.clone

        Show
        ASF subversion and git services added a comment - Commit 1598543 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1598543 ] LUCENE-5708 : fix these tests to also 'mimic' previous IWC.clone
        Hide
        ASF subversion and git services added a comment -

        Commit 1598545 from Michael McCandless in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1598545 ]

        LUCENE-5708: fix these tests to also 'mimic' previous IWC.clone

        Show
        ASF subversion and git services added a comment - Commit 1598545 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1598545 ] LUCENE-5708 : fix these tests to also 'mimic' previous IWC.clone

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development