Solr
  1. Solr
  2. SOLR-1275

Add expungeDeletes to DirectUpdateHandler2

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Component/s: update
    • Labels:
      None

      Description

      expungeDeletes is a useful method somewhat like optimize is offered by IndexWriter that can be implemented in DirectUpdateHandler2.

      1. SOLR-1275.patch
        20 kB
        Jason Rutherglen
      2. SOLR-1275.patch
        5 kB
        Yonik Seeley
      3. SOLR-1275.patch
        5 kB
        Yonik Seeley
      4. SOLR-1275.patch
        19 kB
        Jason Rutherglen
      5. SOLR-1275.patch
        10 kB
        Jason Rutherglen
      6. SOLR-1275.patch
        5 kB
        Noble Paul

        Activity

        Hide
        Noble Paul added a comment -

        So , this should be an extra option in the CommitUpdateComand ?

        Show
        Noble Paul added a comment - So , this should be an extra option in the CommitUpdateComand ?
        Hide
        Noble Paul added a comment -

        Untested patch. for review

        Show
        Noble Paul added a comment - Untested patch. for review
        Hide
        Otis Gospodnetic added a comment -

        Patch looks good to me (also not tested it)

        Show
        Otis Gospodnetic added a comment - Patch looks good to me (also not tested it)
        Hide
        Jason Rutherglen added a comment -

        I can add some tests? (would like to get familiar with Solr's test framework)

        Show
        Jason Rutherglen added a comment - I can add some tests? (would like to get familiar with Solr's test framework)
        Hide
        Noble Paul added a comment - - edited

        jason, is there a way to test if expungeDeletes is indeed called?

        Show
        Noble Paul added a comment - - edited jason, is there a way to test if expungeDeletes is indeed called?
        Hide
        Jason Rutherglen added a comment -

        Sure, I'll have time at some point to post it, we can use the one from Lucene. Basically create deletes in multiple segments, then call expungeDeletes, check the number of segments left.

        Show
        Jason Rutherglen added a comment - Sure, I'll have time at some point to post it, we can use the one from Lucene. Basically create deletes in multiple segments, then call expungeDeletes, check the number of segments left.
        Hide
        Yonik Seeley added a comment -

        Pushing this off to 1.5 given that it's very expert use, and it's not even clear what the use cases are.

        Show
        Yonik Seeley added a comment - Pushing this off to 1.5 given that it's very expert use, and it's not even clear what the use cases are.
        Hide
        Jason Rutherglen added a comment -

        Given the simplicity of the functionality I don't see a reason to put this off. expungeDeletes has been in INdexWriter since 2.4?

        Show
        Jason Rutherglen added a comment - Given the simplicity of the functionality I don't see a reason to put this off. expungeDeletes has been in INdexWriter since 2.4?
        Hide
        Yonik Seeley added a comment -

        Apologies - my brain went the wrong way when I saw "expungeDeletes" and I temporarily thought it meant throwing away the deletedDocs BitVector (i.e. undeleting all previously deleted documents). So there are definitely use cases for this. Seems simple enough, I'll move it back to 1.4

        Show
        Yonik Seeley added a comment - Apologies - my brain went the wrong way when I saw "expungeDeletes" and I temporarily thought it meant throwing away the deletedDocs BitVector (i.e. undeleting all previously deleted documents). So there are definitely use cases for this. Seems simple enough, I'll move it back to 1.4
        Hide
        Jason Rutherglen added a comment -
        • Added DirectUpdateHandlerTest.testExpungeDeletes which creates
          a bunch of docs (i.e. several segments), terms known to be in
          specific segments are found and deleted. Expunge deletes is
          called, the segments are fewer and the segments with the deleted
          terms no longer exist.
        • Added expungeDeletes functionality to DirectUpdateHandler2
        • The patch randomly didn't create enough of the expected
          segments, this could be due to ConcurrentMergeScheduler. I
          couldn't reproduce consistently.
        Show
        Jason Rutherglen added a comment - Added DirectUpdateHandlerTest.testExpungeDeletes which creates a bunch of docs (i.e. several segments), terms known to be in specific segments are found and deleted. Expunge deletes is called, the segments are fewer and the segments with the deleted terms no longer exist. Added expungeDeletes functionality to DirectUpdateHandler2 The patch randomly didn't create enough of the expected segments, this could be due to ConcurrentMergeScheduler. I couldn't reproduce consistently.
        Hide
        Noble Paul added a comment -

        The patch looks fine to me . I plan to commit this shortly

        Show
        Noble Paul added a comment - The patch looks fine to me . I plan to commit this shortly
        Hide
        Noble Paul added a comment -

        committed: r805774
        thanks Jason

        Show
        Noble Paul added a comment - committed: r805774 thanks Jason
        Hide
        Noble Paul added a comment -

        the testcase fails

        Show
        Noble Paul added a comment - the testcase fails
        Hide
        Noble Paul added a comment -

        committed r805888

        The test passes now.

        Show
        Noble Paul added a comment - committed r805888 The test passes now.
        Hide
        Noble Paul added a comment -

        The tests are passing ,but for the time being I wish to keep this open . Let us review the change in DUH2#commit() once again and ensure that the fix is right

        Show
        Noble Paul added a comment - The tests are passing ,but for the time being I wish to keep this open . Let us review the change in DUH2#commit() once again and ensure that the fix is right
        Hide
        Jason Rutherglen added a comment -
        • Added solrconfig-serialms.xml which uses a
          SerialMergeScheduler (instead of ConcurrentMergeScheduler).
          Otherwise we get inconsistent numbers of segments at variable
          times (because the merging is happening asynchronously)
        • Commit is called every 500 docs to insure we get enough
          segments
        • Test passes
        Show
        Jason Rutherglen added a comment - Added solrconfig-serialms.xml which uses a SerialMergeScheduler (instead of ConcurrentMergeScheduler). Otherwise we get inconsistent numbers of segments at variable times (because the merging is happening asynchronously) Commit is called every 500 docs to insure we get enough segments Test passes
        Hide
        Yonik Seeley added a comment -

        The standard solrconfig.xml in the test directory has a LogDocMergePolicy and maxBufferedDocs=10 to generate more segments.
        In some other Solr tests, I add a few docs, then do a commit, add a few docs, then do another commit to get 2 segments for example. Shouldn't that work to get a fixed number of segments?

        Show
        Yonik Seeley added a comment - The standard solrconfig.xml in the test directory has a LogDocMergePolicy and maxBufferedDocs=10 to generate more segments. In some other Solr tests, I add a few docs, then do a commit, add a few docs, then do another commit to get 2 segments for example. Shouldn't that work to get a fixed number of segments?
        Hide
        Jason Rutherglen added a comment -

        Because maxBufferedDocs=10, and the previous patch was creating
        3000 docs, a lot of merging was happening in the background. SMS
        gives predictable merging for any number of documents which is
        more suitable for testing.

        Show
        Jason Rutherglen added a comment - Because maxBufferedDocs=10, and the previous patch was creating 3000 docs, a lot of merging was happening in the background. SMS gives predictable merging for any number of documents which is more suitable for testing.
        Hide
        Yonik Seeley added a comment -

        How about this much simpler test that's faster to execute, and should be very easy to fix if/when behavior changes in the future (which it certainly could with NRT stuff). It also tests more of the complete loop too by going through the XML update command.

        Show
        Yonik Seeley added a comment - How about this much simpler test that's faster to execute, and should be very easy to fix if/when behavior changes in the future (which it certainly could with NRT stuff). It also tests more of the complete loop too by going through the XML update command.
        Hide
        Yonik Seeley added a comment -

        Hmmm, it's harder to see in diff format... the test just boils down to this:

          public void testExpungeDeletes() throws Exception {
            assertU(adoc("id","1"));
            assertU(adoc("id","2"));
            assertU(commit());
        
            assertU(adoc("id","3"));
            assertU(adoc("id","2"));
            assertU(adoc("id","4"));
            assertU(commit());
        
            SolrQueryRequest sr = req("q","foo");
            SolrIndexReader r = sr.getSearcher().getReader();
            assertTrue(r.maxDoc() > r.numDocs());   // should have deletions
            assertTrue(r.getLeafReaders().length > 1);  // more than 1 segment
            sr.close();
        
            assertU(commit("expungeDeletes","true"));
        
            sr = req("q","foo");
            r = sr.getSearcher().getReader();
            assertEquals(r.maxDoc(), r.numDocs());  // no deletions
            assertTrue(r.getLeafReaders().length > 1);  // still more than 1 segment
            sr.close();
          }
        
        Show
        Yonik Seeley added a comment - Hmmm, it's harder to see in diff format... the test just boils down to this: public void testExpungeDeletes() throws Exception { assertU(adoc( "id" , "1" )); assertU(adoc( "id" , "2" )); assertU(commit()); assertU(adoc( "id" , "3" )); assertU(adoc( "id" , "2" )); assertU(adoc( "id" , "4" )); assertU(commit()); SolrQueryRequest sr = req( "q" , "foo" ); SolrIndexReader r = sr.getSearcher().getReader(); assertTrue(r.maxDoc() > r.numDocs()); // should have deletions assertTrue(r.getLeafReaders().length > 1); // more than 1 segment sr.close(); assertU(commit( "expungeDeletes" , " true " )); sr = req( "q" , "foo" ); r = sr.getSearcher().getReader(); assertEquals(r.maxDoc(), r.numDocs()); // no deletions assertTrue(r.getLeafReaders().length > 1); // still more than 1 segment sr.close(); }
        Hide
        Jason Rutherglen added a comment -

        I like using XML, I was wondering why in
        TermVectorComponentTest.testDistributed we're manually creating
        the distributed part? I'm currently picking art
        TestDistributedSearch to test out a distributed component (a bit
        of work).

        expungeDeletes is supposed to remove segments that contains
        deletes. The patch posted does this by checking the segment
        names. This will work fine with NRT.

        Show
        Jason Rutherglen added a comment - I like using XML, I was wondering why in TermVectorComponentTest.testDistributed we're manually creating the distributed part? I'm currently picking art TestDistributedSearch to test out a distributed component (a bit of work). expungeDeletes is supposed to remove segments that contains deletes. The patch posted does this by checking the segment names. This will work fine with NRT.
        Hide
        Yonik Seeley added a comment -

        Isn't it much simpler to just check that the segments have no deletes after expungeDeletes is called?
        Is there something my proposed patch doesn't test that you think it should?

        Show
        Yonik Seeley added a comment - Isn't it much simpler to just check that the segments have no deletes after expungeDeletes is called? Is there something my proposed patch doesn't test that you think it should?
        Hide
        Jason Rutherglen added a comment -

        > Isn't it much simpler

        Calling SR.undelete would remove the deletes and the test would
        pass?

        Show
        Jason Rutherglen added a comment - > Isn't it much simpler Calling SR.undelete would remove the deletes and the test would pass?
        Hide
        Jason Rutherglen added a comment -

        I'll check in a new patch that's faster (i.e. indexes fewer docs).

        Show
        Jason Rutherglen added a comment - I'll check in a new patch that's faster (i.e. indexes fewer docs).
        Hide
        Yonik Seeley added a comment -

        Calling SR.undelete would remove the deletes and the test would pass?

        Simple to fix... check against the exact number of documents instead of checking that there are no deletes.

        Show
        Yonik Seeley added a comment - Calling SR.undelete would remove the deletes and the test would pass? Simple to fix... check against the exact number of documents instead of checking that there are no deletes.
        Hide
        Jason Rutherglen added a comment -
        • Reduced the num docs indexed to make the test faster
        • expungeDeletes merges away segments with deletes, if we don't
          test to insure the segments with deletes are in fact gone, we're
          unfortunately not testing the functionality of the method
        • Incorporated Yonik's additions
        • commit("expungeDeletes","true"); // doesn't work
        Show
        Jason Rutherglen added a comment - Reduced the num docs indexed to make the test faster expungeDeletes merges away segments with deletes, if we don't test to insure the segments with deletes are in fact gone, we're unfortunately not testing the functionality of the method Incorporated Yonik's additions commit("expungeDeletes","true"); // doesn't work
        Hide
        Yonik Seeley added a comment -

        if we don't test to insure the segments with deletes are in fact gone, we're unfortunately not testing the functionality of the method.

        Do we need to? We should be testing that the method is called and look for evidence that it has done it's work. All of the corner cases of testing if it did it's work correctly would seem to belong in Lucene unit tests?

        Show
        Yonik Seeley added a comment - if we don't test to insure the segments with deletes are in fact gone, we're unfortunately not testing the functionality of the method. Do we need to? We should be testing that the method is called and look for evidence that it has done it's work. All of the corner cases of testing if it did it's work correctly would seem to belong in Lucene unit tests?
        Hide
        Noble Paul added a comment -

        Do we need to? We should be testing that the method is called and look for evidence that it has done it's work.

        I guess , we should only test if the message indeed got called . Whether the call really does the work need not be tested

        Show
        Noble Paul added a comment - Do we need to? We should be testing that the method is called and look for evidence that it has done it's work. I guess , we should only test if the message indeed got called . Whether the call really does the work need not be tested
        Hide
        Grant Ingersoll added a comment -

        What's the status on this one?

        Show
        Grant Ingersoll added a comment - What's the status on this one?
        Hide
        Noble Paul added a comment -

        here is some confusion on whether the testcase is right/good enough. Yonik may have a final word on this

        Show
        Noble Paul added a comment - here is some confusion on whether the testcase is right/good enough. Yonik may have a final word on this
        Hide
        Yonik Seeley added a comment -

        Yonik may have a final word on this

        Hey, there aren't any benevolent dictators around here Arguments should stand on their own.

        I was just confused why the test was so complicated, and didn't look forward to maintaining it in the face of future changes. But I guess we can always cross that bridge when we come to it.

        Are the current tests failing for anyone?

        Show
        Yonik Seeley added a comment - Yonik may have a final word on this Hey, there aren't any benevolent dictators around here Arguments should stand on their own. I was just confused why the test was so complicated, and didn't look forward to maintaining it in the face of future changes. But I guess we can always cross that bridge when we come to it. Are the current tests failing for anyone?
        Hide
        Mark Miller added a comment -

        Hey, there aren't any benevolent dictators around here

        I always suspected we were being guided by selfish dictators

        Show
        Mark Miller added a comment - Hey, there aren't any benevolent dictators around here I always suspected we were being guided by selfish dictators
        Hide
        Noble Paul added a comment -

        the tests never failed in my box or in the nightly builds.
        Jason , did you observe it failing?

        Show
        Noble Paul added a comment - the tests never failed in my box or in the nightly builds. Jason , did you observe it failing?
        Hide
        Grant Ingersoll added a comment -

        Tests pass for me, Noble, can this be resolved now?

        Show
        Grant Ingersoll added a comment - Tests pass for me, Noble, can this be resolved now?
        Hide
        Noble Paul added a comment -

        marking this as resolved. let us open another issue if the testcase needs to be changed

        Show
        Noble Paul added a comment - marking this as resolved. let us open another issue if the testcase needs to be changed
        Hide
        Jason Rutherglen added a comment -

        The DirectUpdateHandlerTest fails as solrconfig-serialms.xml didn't make it into trunk. Which patch was committed?

        Show
        Jason Rutherglen added a comment - The DirectUpdateHandlerTest fails as solrconfig-serialms.xml didn't make it into trunk. Which patch was committed?
        Hide
        Noble Paul added a comment -

        solrconfig-serialms.xml is not in the trunk. which testcase and patch has this?

        Show
        Noble Paul added a comment - solrconfig-serialms.xml is not in the trunk. which testcase and patch has this?
        Hide
        Jason Rutherglen added a comment -

        The latest patch dated 2009-08-24 03:44 PM.

        Show
        Jason Rutherglen added a comment - The latest patch dated 2009-08-24 03:44 PM.
        Hide
        Grant Ingersoll added a comment -

        Bulk close for Solr 1.4

        Show
        Grant Ingersoll added a comment - Bulk close for Solr 1.4

          People

          • Assignee:
            Unassigned
            Reporter:
            Jason Rutherglen
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 48h
              48h
              Remaining:
              Remaining Estimate - 48h
              48h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development