Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9, 6.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We originally added this clone to allow a single IWC to be re-used against more than one IndexWriter, but I think this is a mis-feature: it adds complexity to hairy classes (merge policy/scheduler, DW thread pool, etc.), I think it's buggy today.

      I think we should just disallow sharing: you must make a new IWC for a new IndexWriter.

      1. LUCENE-5708.patch
        0.7 kB
        Michael McCandless
      2. LUCENE-5708.patch
        33 kB
        Michael McCandless
      3. LUCENE-5708.patch
        36 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1598545 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1598545 ]

          LUCENE-5708: fix these tests to also 'mimic' previous IWC.clone

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1598545 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1598545 ] LUCENE-5708 : fix these tests to also 'mimic' previous IWC.clone
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1598543 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1598543 ]

          LUCENE-5708: fix these tests to also 'mimic' previous IWC.clone

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1598543 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1598543 ] LUCENE-5708 : fix these tests to also 'mimic' previous IWC.clone
          Hide
          mikemccand Michael McCandless added a comment -

          I agree that the distinction of "randomly test this PF via the generic
          base test class" and the "specifically test tricky corner cases for
          this particular PF" (e.g. TestBlockPostingsFormat/2/3.java) is
          important, and the specific IWC settings for those tests are
          necessary.

          I reviewed all the test changes more closely, and found a couple other
          places that needed to carry over explicit IWC changes after pulling a
          random IWC (I'll commit shortly).

          I think this is net/net good vs the clone we had before: it means we
          are still randomly changing the things the test didn't care about, and
          fixing the settings that it does.

          Show
          mikemccand Michael McCandless added a comment - I agree that the distinction of "randomly test this PF via the generic base test class" and the "specifically test tricky corner cases for this particular PF" (e.g. TestBlockPostingsFormat/2/3.java) is important, and the specific IWC settings for those tests are necessary. I reviewed all the test changes more closely, and found a couple other places that needed to carry over explicit IWC changes after pulling a random IWC (I'll commit shortly). I think this is net/net good vs the clone we had before: it means we are still randomly changing the things the test didn't care about, and fixing the settings that it does.
          Hide
          rcmuir Robert Muir added a comment -

          All these "PostingsFormat" tests wire their codec to themselves. This allows us to implement generic base classes with tests that all codecs should pass, as well as codec-specific tests (e.g. in lucene41) that test particular corner cases of importance.

          this worked well in knocking out bugs for postings, so the whole scheme was duplicated to docvalues, storedfields, vectors, everything.

          Now here comes this commit, and these tests (which are important to ensure the index format is working) are no longer testing what they are supposed to. For example the tests in Lucene41 package explicitly test special cases of that codec that would otherwise be extraordinarily rare in the existing random tests. If they are executing against random codecs or even random configurations then they just became useless.

          So thats why I'm concerned: I see this commit causing these failures, and I know we just experienced a significant loss of test coverage to the index format. We are relying upon tests to fail to detect this, but unfortunately 'loss of test coverage' doesn't always trigger a jenkins build.

          So maybe instead of playing whack-a-mole with jenkins tests failures, we should pay more attention reviewing all changes to unit tests where clone() was previously used. The patch is buggy here, and I just want to ensure its taken seriously so that we do not lose coverage

          Show
          rcmuir Robert Muir added a comment - All these "PostingsFormat" tests wire their codec to themselves. This allows us to implement generic base classes with tests that all codecs should pass, as well as codec-specific tests (e.g. in lucene41) that test particular corner cases of importance. this worked well in knocking out bugs for postings, so the whole scheme was duplicated to docvalues, storedfields, vectors, everything. Now here comes this commit, and these tests (which are important to ensure the index format is working) are no longer testing what they are supposed to. For example the tests in Lucene41 package explicitly test special cases of that codec that would otherwise be extraordinarily rare in the existing random tests. If they are executing against random codecs or even random configurations then they just became useless. So thats why I'm concerned: I see this commit causing these failures, and I know we just experienced a significant loss of test coverage to the index format. We are relying upon tests to fail to detect this, but unfortunately 'loss of test coverage' doesn't always trigger a jenkins build. So maybe instead of playing whack-a-mole with jenkins tests failures, we should pay more attention reviewing all changes to unit tests where clone() was previously used. The patch is buggy here, and I just want to ensure its taken seriously so that we do not lose coverage
          Hide
          mikemccand Michael McCandless added a comment -

          Tests arent testing what they should, we need to fix the disease or back this change out.

          I'm happy to back this change out if you want.

          But, I'm confused: are you talking about how these tests hardwire the PF to Lucene41? I agree we should fix that, but this is a pre-existing issue.

          Show
          mikemccand Michael McCandless added a comment - Tests arent testing what they should, we need to fix the disease or back this change out. I'm happy to back this change out if you want. But, I'm confused: are you talking about how these tests hardwire the PF to Lucene41? I agree we should fix that, but this is a pre-existing issue.
          Hide
          rcmuir Robert Muir added a comment -

          Can we remove the "fix test bugs" commits and fix the true underlying issue?

          Tests arent testing what they should, we need to fix the disease or back this change out.

          Show
          rcmuir Robert Muir added a comment - Can we remove the "fix test bugs" commits and fix the true underlying issue? Tests arent testing what they should, we need to fix the disease or back this change out.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1598503 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1598503 ]

          LUCENE-5708: fix test bug

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1598503 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1598503 ] LUCENE-5708 : fix test bug
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1598502 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1598502 ]

          LUCENE-5708: fix test bug

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1598502 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1598502 ] LUCENE-5708 : fix test bug
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1598497 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1598497 ]

          LUCENE-5708: fix test bug

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1598497 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1598497 ] LUCENE-5708 : fix test bug
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1598496 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1598496 ]

          LUCENE-5708: fix test bug

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1598496 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1598496 ] LUCENE-5708 : fix test bug
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1598492 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1598492 ]

          LUCENE-5708: remove IWC.clone

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1598492 from Michael McCandless in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1598492 ] LUCENE-5708 : remove IWC.clone
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1598489 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1598489 ]

          LUCENE-5708: remove IWC.clone

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1598489 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1598489 ] LUCENE-5708 : remove IWC.clone
          Hide
          mikemccand Michael McCandless added a comment -

          New patch, I think it's ready; I'll commit soon...

          It looks to me that we should be able to make some fields final now that we don't have a clone method anymore

          +1, I fixed a few of these, and found a couple more "implements Cloneable" to remove.

          (eg. MergePolicy.writer)

          Looks like we'll remove IW as a field in MP/MS with LUCENE-5711.

          Show
          mikemccand Michael McCandless added a comment - New patch, I think it's ready; I'll commit soon... It looks to me that we should be able to make some fields final now that we don't have a clone method anymore +1, I fixed a few of these, and found a couple more "implements Cloneable" to remove. (eg. MergePolicy.writer) Looks like we'll remove IW as a field in MP/MS with LUCENE-5711 .
          Hide
          simonw Simon Willnauer added a comment -

          part of the problem is that we holding IW state on MergePolicy and MergeScheduler. Both classes should get the IW passed to the relevant methods so we can share them across as many instances we want...

          Show
          simonw Simon Willnauer added a comment - part of the problem is that we holding IW state on MergePolicy and MergeScheduler. Both classes should get the IW passed to the relevant methods so we can share them across as many instances we want...
          Hide
          shaie Shai Erera added a comment - - edited

          Hmm which tests rely on using the same IWC?

          Hmm ... I don't remember. All I remember is that while I worked on preventing sharing IWC between writers (LUCENE-4876), there were a bunch of tests that reused the IWC. I fixed them by simply cloning it, but I admit I didn't check if initializing a new IWC each time serves their purpose. I just assume that if so many tests did that, there ought to be a reason beyond just convenience, but I could be wrong.

          What I'm worried is that by not cloning Jenkins will trip (which is good!), or worse - that those tests will stop asserting what they asserted before. So I just wanted to point that out. If we're ready to take the risk, I'm fine with it, because eventually we're discussing tests here .. there's nothing functionally impossible from an app's perspective.

          Show
          shaie Shai Erera added a comment - - edited Hmm which tests rely on using the same IWC? Hmm ... I don't remember. All I remember is that while I worked on preventing sharing IWC between writers ( LUCENE-4876 ), there were a bunch of tests that reused the IWC. I fixed them by simply cloning it, but I admit I didn't check if initializing a new IWC each time serves their purpose. I just assume that if so many tests did that, there ought to be a reason beyond just convenience, but I could be wrong. What I'm worried is that by not cloning Jenkins will trip (which is good!), or worse - that those tests will stop asserting what they asserted before. So I just wanted to point that out. If we're ready to take the risk, I'm fine with it, because eventually we're discussing tests here .. there's nothing functionally impossible from an app's perspective.
          Hide
          jpountz Adrien Grand added a comment -

          I don't know about these tests that expect the same config but I'm +1 in general to remove all that cloning. It looks to me that we should be able to make some fields final now that we don't have a clone method anymore (eg. MergePolicy.writer)?

          Show
          jpountz Adrien Grand added a comment - I don't know about these tests that expect the same config but I'm +1 in general to remove all that cloning. It looks to me that we should be able to make some fields final now that we don't have a clone method anymore (eg. MergePolicy.writer)?
          Hide
          mikemccand Michael McCandless added a comment -

          Hmm which tests rely on using the same IWC? I thought I was improving the tests by switching up the config...

          Show
          mikemccand Michael McCandless added a comment - Hmm which tests rely on using the same IWC? I thought I was improving the tests by switching up the config...
          Hide
          shaie Shai Erera added a comment -

          I think the way you fixed some tests that used clone is incorrect. You should at least call newIndexWriterConfig(random) w/ the same random and seed, so the exact IWC is created each time. At least, that's what these tests now rely on, even if they don't break. Otherwise, they just create a random IWC each time they open a writer, which is not the intention I believe.

          Show
          shaie Shai Erera added a comment - I think the way you fixed some tests that used clone is incorrect. You should at least call newIndexWriterConfig(random) w/ the same random and seed, so the exact IWC is created each time. At least, that's what these tests now rely on, even if they don't break. Otherwise, they just create a random IWC each time they open a writer, which is not the intention I believe.
          Hide
          mikemccand Michael McCandless added a comment -

          Woops, wrong patch ... this one should work.

          Show
          mikemccand Michael McCandless added a comment - Woops, wrong patch ... this one should work.
          Hide
          mikemccand Michael McCandless added a comment -

          Initial patch, tests seem to pass. IWC already detects if it's illegally re-used across more than one IW.

          Show
          mikemccand Michael McCandless added a comment - Initial patch, tests seem to pass. IWC already detects if it's illegally re-used across more than one IW.

            People

            • Assignee:
              mikemccand Michael McCandless
              Reporter:
              mikemccand Michael McCandless
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development