Lucene - Core
  1. Lucene - Core
  2. LUCENE-2802

DirectoryReader ignores NRT SegmentInfos in #isOptimized()

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 3.1, 4.0-ALPHA
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      DirectoryReader only takes shared (with IW) SegmentInfos into account in DirectoryReader#isOptimized(). This can return true even if the actual realtime reader sees more than one segments.

      public boolean isOptimized() {
          ensureOpen();
         // if segmentsInfos changes in IW this can return false positive
          return segmentInfos.size() == 1 && !hasDeletions();
        }
      

      DirectoryReader should check if this reader has a non-nul segmentInfosStart and use that instead

      1. LUCENE-2802.patch
        4 kB
        Simon Willnauer
      2. LUCENE-2802.patch
        5 kB
        Simon Willnauer
      3. LUCENE-2802.patch
        0.6 kB
        Simon Willnauer

        Issue Links

          Activity

          Hide
          Simon Willnauer added a comment -

          here is a patch

          Show
          Simon Willnauer added a comment - here is a patch
          Hide
          Michael McCandless added a comment -

          Nice catch Simon! This is also a thread safety issue since IR should not touch the writer's segmentInfos outside of sync(IW).

          Show
          Michael McCandless added a comment - Nice catch Simon! This is also a thread safety issue since IR should not touch the writer's segmentInfos outside of sync(IW).
          Hide
          Simon Willnauer added a comment -

          Nice catch Simon! This is also a thread safety issue since IR should not touch the writer's segmentInfos outside of sync(IW).

          it seem like there is more about all that in DR - we should really only use the uncloned SegmentInfos if we are not in NRT mode #getVersion uses it too which is wrong.
          I actually rely on the isOptimized in several tests and run into a NPE due to that though so we should really fix DR to use a private SegmentInfos or restrict the uncloned one for the isCurrent comparison

          Show
          Simon Willnauer added a comment - Nice catch Simon! This is also a thread safety issue since IR should not touch the writer's segmentInfos outside of sync(IW). it seem like there is more about all that in DR - we should really only use the uncloned SegmentInfos if we are not in NRT mode #getVersion uses it too which is wrong. I actually rely on the isOptimized in several tests and run into a NPE due to that though so we should really fix DR to use a private SegmentInfos or restrict the uncloned one for the isCurrent comparison
          Hide
          Earwin Burrfoot added a comment -

          Heh, I've mentioned this earlier in LUCENE-2355.

          Show
          Earwin Burrfoot added a comment - Heh, I've mentioned this earlier in LUCENE-2355 .
          Hide
          Simon Willnauer added a comment -

          I changed DirectoryReader to use the cloned version of SegmentInfos instead of two of them inconsistently. The only failure I get is on TestIndexWriterReader line 105

              r1.close();
              writer.close();
              assertTrue(r2.isCurrent());
          

          where the writer is closed and afterwards it checks if the r2 reader still is the current one which failes since the writer.close() method changes the version of the SegmentInfos. In my opinion this is actually the semantics I would expect from #isCurrent(), the question is if we would want to change the semantics to return false from #isCurrent if the writer we used to obtain the reader from is closed.

          I think we should consider it for consistency and simplicity though.

          Show
          Simon Willnauer added a comment - I changed DirectoryReader to use the cloned version of SegmentInfos instead of two of them inconsistently. The only failure I get is on TestIndexWriterReader line 105 r1.close(); writer.close(); assertTrue(r2.isCurrent()); where the writer is closed and afterwards it checks if the r2 reader still is the current one which failes since the writer.close() method changes the version of the SegmentInfos. In my opinion this is actually the semantics I would expect from #isCurrent(), the question is if we would want to change the semantics to return false from #isCurrent if the writer we used to obtain the reader from is closed. I think we should consider it for consistency and simplicity though.
          Hide
          Simon Willnauer added a comment -

          here is a patch that removes the mutable state from DirectoryReader in the NRT case. The actual reason IMO why this has been introduced was that the RT reader returns true from #isCurrent() if the wirter was closed which is actually wrong since closing a writer changes the index and the reader should see that change.

          I also added a testcase for is current to check the semantics

          Show
          Simon Willnauer added a comment - here is a patch that removes the mutable state from DirectoryReader in the NRT case. The actual reason IMO why this has been introduced was that the RT reader returns true from #isCurrent() if the wirter was closed which is actually wrong since closing a writer changes the index and the reader should see that change. I also added a testcase for is current to check the semantics
          Hide
          Simon Willnauer added a comment -

          we need to backport to 3.x too

          Show
          Simon Willnauer added a comment - we need to backport to 3.x too
          Hide
          Earwin Burrfoot added a comment -

          Patch looks cool.

          Show
          Earwin Burrfoot added a comment - Patch looks cool.
          Hide
          Simon Willnauer added a comment -

          here is a final patch - i will commit later!

          Show
          Simon Willnauer added a comment - here is a final patch - i will commit later!
          Hide
          Simon Willnauer added a comment -

          Committed revision 1043277.

          I keep it open for backporting

          Show
          Simon Willnauer added a comment - Committed revision 1043277. I keep it open for backporting
          Hide
          Simon Willnauer added a comment -

          Ported back to 3x in revision 1043418.

          Show
          Simon Willnauer added a comment - Ported back to 3x in revision 1043418.
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1

            People

            • Assignee:
              Simon Willnauer
              Reporter:
              Simon Willnauer
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development