Lucene - Core
  1. Lucene - Core
  2. LUCENE-1708

Improve the use of isDeleted in the indexing code

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      A spin off from here: http://www.nabble.com/Some-thoughts-around-the-use-of-reader.isDeleted-and-hasDeletions-td23931216.html.
      Two changes:

      1. Optimize SegmentMerger work when a reader has no deletions.
      2. IndexReader.document() will no longer check if the document is deleted.

      Will post a patch shortly

      1. LUCENE-1708.patch
        29 kB
        Michael McCandless
      2. LUCENE-1708.patch
        29 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        Patch includes changes to SegmentMerger, IndexReader, SegmentReader, CHANGES, and TestSegmentReader (on tags as well).

        I think this is ready to commit (all tests pass). Before commit, we should complete the "Shai Erera via" in CHANGES.

        Show
        Shai Erera added a comment - Patch includes changes to SegmentMerger, IndexReader, SegmentReader, CHANGES, and TestSegmentReader (on tags as well). I think this is ready to commit (all tests pass). Before commit, we should complete the "Shai Erera via" in CHANGES.
        Hide
        Michael McCandless added a comment -

        I see lots of cleanups snuck in here

        Show
        Michael McCandless added a comment - I see lots of cleanups snuck in here
        Hide
        Michael McCandless added a comment -

        Attached another iteration on the patch, basically cosmetic changes (removing un-needed args, naming things more verbosely, etc.). I think it's ready to be committed. I'll wait a day or two.

        Show
        Michael McCandless added a comment - Attached another iteration on the patch, basically cosmetic changes (removing un-needed args, naming things more verbosely, etc.). I think it's ready to be committed. I'll wait a day or two.
        Hide
        Shai Erera added a comment -

        I see lots of cleanups snuck in here

        Not all cleanups are bad, i.e. changing to iteration-based loops, rather than calling list.size() in each loop and list.get, is better, and cleaner . I admit though that some of the cleanups just helped me understand the code better.

        Show
        Shai Erera added a comment - I see lots of cleanups snuck in here Not all cleanups are bad, i.e. changing to iteration-based loops, rather than calling list.size() in each loop and list.get , is better, and cleaner . I admit though that some of the cleanups just helped me understand the code better.
        Hide
        Michael McCandless added a comment -

        Not all cleanups are bad,

        Oh, they are all good cleanups! I just wish they were in a dedicated "cleanup" issue, somehow, instead... because then it's harder for me to focus on the "real" changes (though I realize "clean as you go" is mighty convenient).

        Show
        Michael McCandless added a comment - Not all cleanups are bad, Oh, they are all good cleanups! I just wish they were in a dedicated "cleanup" issue, somehow, instead... because then it's harder for me to focus on the "real" changes (though I realize "clean as you go" is mighty convenient).
        Hide
        Michael McCandless added a comment -

        Thanks Shai!

        Show
        Michael McCandless added a comment - Thanks Shai!
        Hide
        Mark Miller added a comment -

        I cant find the back compat discussion on this. Did I miss?

        I do see where mike mentions the isDeleted change should happen with deprecation, but this patch appears to just make the change. Our current back compat policy doesn't allow that. It doesn't even really allow the really special exceptions that we have had to make (and I think it should spell that out, as well as our 'experiemental trick'). We have not officially allowed a relaxed back compat policy right?

        I'm not trying to police back compat, but I think we need to do our best to live up to it until its officially changed. Having had to make a couple exceptions doesn't mean we can just toss it for this release.

        Sorry if I missed the relevant discussion on this - didn't see anything in the attached email thread.

        Show
        Mark Miller added a comment - I cant find the back compat discussion on this. Did I miss? I do see where mike mentions the isDeleted change should happen with deprecation, but this patch appears to just make the change. Our current back compat policy doesn't allow that. It doesn't even really allow the really special exceptions that we have had to make (and I think it should spell that out, as well as our 'experiemental trick'). We have not officially allowed a relaxed back compat policy right? I'm not trying to police back compat, but I think we need to do our best to live up to it until its officially changed. Having had to make a couple exceptions doesn't mean we can just toss it for this release. Sorry if I missed the relevant discussion on this - didn't see anything in the attached email thread.
        Hide
        Mark Miller added a comment -

        I suppose, to be fair, we do mention that we might change runtime behaviour and document it - its just that we don't usually say, code around it.

        I guess its simple enough here thats its not really a big deal. I was just surprised I saw no mention of back compat in the discussion other than Mike mentioning that the change should be made through deprecation early on in the attached email thread.

        Show
        Mark Miller added a comment - I suppose, to be fair, we do mention that we might change runtime behaviour and document it - its just that we don't usually say, code around it. I guess its simple enough here thats its not really a big deal. I was just surprised I saw no mention of back compat in the discussion other than Mike mentioning that the change should be made through deprecation early on in the attached email thread.
        Hide
        Shai Erera added a comment -

        There is a paragraph in CHANGES under "Changes to Runtime Behavior" that explains this. I think it was on the email thread and not on this issue, that people preferred the runtime change vs. the deprecation and a new method name for document(), under the assumption that it's very unlikely that someone relies on IndexReader.document() checking for isDeleted (i.e., it passes a document which may or may not be deleted).

        Show
        Shai Erera added a comment - There is a paragraph in CHANGES under "Changes to Runtime Behavior" that explains this. I think it was on the email thread and not on this issue, that people preferred the runtime change vs. the deprecation and a new method name for document(), under the assumption that it's very unlikely that someone relies on IndexReader.document() checking for isDeleted (i.e., it passes a document which may or may not be deleted).
        Hide
        Mark Miller added a comment - - edited

        There is a paragraph in CHANGES under "Changes to Runtime Behavior" that explains this.

        Right, I saw that - I just wondered about the discussion to do it.

        I think it was on the email thread and not on this issue, that people preferred the runtime change vs. the deprecation and a new method name for document(), under the assumption that it's very unlikely that someone relies on IndexReader.document() checking for isDeleted (i.e., it passes a document which may or may not be deleted).

        Thanks - thats the discussion I wasn't able to spot.

        I see - it was a quick back and forth at the end. Got lazy towards the end Thanks for pointing out.

        Show
        Mark Miller added a comment - - edited There is a paragraph in CHANGES under "Changes to Runtime Behavior" that explains this. Right, I saw that - I just wondered about the discussion to do it. I think it was on the email thread and not on this issue, that people preferred the runtime change vs. the deprecation and a new method name for document(), under the assumption that it's very unlikely that someone relies on IndexReader.document() checking for isDeleted (i.e., it passes a document which may or may not be deleted). Thanks - thats the discussion I wasn't able to spot. I see - it was a quick back and forth at the end. Got lazy towards the end Thanks for pointing out.
        Hide
        Shai Erera added a comment -

        it was a quick back and forth at the end. Got lazy towards the end

        Sometimes it's a valid way to pass a decision .

        Show
        Shai Erera added a comment - it was a quick back and forth at the end. Got lazy towards the end Sometimes it's a valid way to pass a decision .

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development