Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      Patch Available

      Description

      This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).
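The intended reopen() contract — return the same instance when the index is unchanged, otherwise a new reader that shares unchanged parts with the old one — can be sketched as a toy model. This is illustrative only (not the Lucene API; `Reader` and `version` are made-up names):

```java
// Toy model of the reopen() contract (illustrative only, NOT the Lucene API):
// reopen() must be a no-op returning the same instance when the index is
// unchanged, and return a fresh reader when the index has advanced.
public class ReopenSketch {
    static class Reader {
        final long version;                       // index version this reader saw
        Reader(long version) { this.version = version; }

        Reader reopen(long currentVersion) {
            // Unchanged index: reuse this instance (and its open files).
            if (currentVersion == version) return this;
            // Changed index: hand back a new reader; a real implementation
            // would share unchanged segments with the old one.
            return new Reader(currentVersion);
        }
    }

    public static void main(String[] args) {
        Reader r = new Reader(1);
        if (r.reopen(1) != r) throw new AssertionError("no-op reopen must return same instance");
        if (r.reopen(2) == r) throw new AssertionError("changed index must yield a new reader");
    }
}
```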

      1. varient-no-isCloneSupported.BROKEN.patch
        53 kB
        Hoss Man
      2. MySegmentReader.java
        0.7 kB
        Otis Gospodnetic
      3. MyMultiReader.java
        0.6 kB
        Otis Gospodnetic
      4. lucene-743-take9.patch
        79 kB
        Michael Busch
      5. lucene-743-take8.patch
        80 kB
        Michael Busch
      6. lucene-743-take7.patch
        76 kB
        Michael Busch
      7. lucene-743-take6.patch
        76 kB
        Michael Busch
      8. lucene-743-take5.patch
        62 kB
        Michael Busch
      9. lucene-743-take4.patch
        59 kB
        Michael Busch
      10. lucene-743-take3.patch
        54 kB
        Michael Busch
      11. lucene-743-take2.patch
        52 kB
        Michael Busch
      12. lucene-743-take10.patch
        80 kB
        Michael Busch
      13. lucene-743.patch
        21 kB
        Michael Busch
      14. lucene-743.patch
        24 kB
        Hoss Man
      15. lucene-743.patch
        28 kB
        Michael Busch
      16. IndexReaderUtils.java
        5 kB
        Otis Gospodnetic

        Issue Links

          Activity

          Michael Busch added a comment -

          Committed! Phew!!!

          Michael Busch added a comment -

          Changes:

          • Updated to current trunk.
          • Removed println in SegmentReader.

          I'm going to commit this soon!

          Michael Busch added a comment -

          Thanks for the review, Mike! I'll remove the println.

          Ok, I think this patch has been reviewed a bunch of times and
          should be ready to commit now. I'll wait another day and commit
          it then if nobody objects.

          Michael McCandless added a comment -

          Patch looks good. Only thing I found was this leftover
          System.out.println, in SegmentReader.java:

            System.out.println("refCount " + getRefCount());
          
          Michael Busch added a comment -

          Changes:

          • Close FieldsReader in SegmentReader#doClose() even if
            referencedReader!=null
          • Call clone.decRef() in the finally clause of
            SegmentReader#reopenSegment()
          • decRef referencedReader before closing other resources
            in SegmentReader#doClose()
          • Removed IndexReader#doCloseUnsharedResources().
          Michael Busch added a comment -

          I think also if we do decide to do this we should open a new issue for it?

          +1

          I'll open a new issue.

          Yonik Seeley added a comment -

          > Since our goal is to make IndexReader read-only in the future
          > (LUCENE-1030), do you really think we need to add this?

          flush() would make reopen() useful in more cases, and LUCENE-1030 is further off (not Lucene 2.3, right?)
          Anyway, flush() would be considered a write operation like setNorm() & deleteDocument() and could be deprecated along with them in the future if that's how we decide to go.

          > I think also if we do decide to do this we should open a new issue for it?

          Yes, that's fine.

          Michael McCandless added a comment -

          So how about a public IndexReader.flush() method

          I think also if we do decide to do this we should open a new issue for it?

          Michael McCandless added a comment -

          Hmm but actually we could change the order in close() so that
          referencedSegmentReader.decRefReaderNotNorms() is done first even
          if the following close() operations don't succeed?

          +1

          Michael Busch added a comment -

          So how about a public IndexReader.flush() method

          Since our goal is to make IndexReader read-only in the future
          (LUCENE-1030), do you really think we need to add this?

          Michael Busch added a comment -
          • You should also close fieldsReader when referencedSegmentReader !=
            null, right? (in SegmentReader.doClose)

          Yes, will do!

          • In the new try/finally in reopenSegment: if you first setup
            referencedSegmentReader, then can't that finally clause just be
            clone.decRef() instead of duplicating code for decRef'ing norms,
            closeNorms(), etc.?

          Hmm, what if then in clone.close() an exception is thrown from
          FieldsReader.close() or singleNormStream.close(). In that case it
          would not decRef the referenced reader.

          Hmm but actually we could change the order in close() so that
          referencedSegmentReader.decRefReaderNotNorms() is done first even
          if the following close() operations don't succeed?
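The reordering being proposed can be sketched with a toy ref count. This is illustrative, not the actual SegmentReader code: the point is that decRef'ing the referenced reader first means a failure while closing other resources can no longer skip it.

```java
// Toy sketch of the close() ordering discussed above (illustrative names, not
// the actual SegmentReader code): decRef the referenced reader FIRST, so its
// count still drops even if closing the remaining resources throws.
public class CloseOrderSketch {
    static void closeOtherResources() {
        throw new RuntimeException("simulated close() failure");
    }

    // Returns the referenced reader's refCount after close() was attempted.
    static int closeDecRefFirst(int referencedRefCount) {
        try {
            referencedRefCount--;      // release the referenced reader first ...
            closeOtherResources();     // ... so a failure here cannot skip it
        } catch (RuntimeException e) {
            // the close failed, but the referenced reader was already released
        }
        return referencedRefCount;
    }

    public static void main(String[] args) {
        if (closeDecRefFirst(1) != 0) throw new AssertionError();
    }
}
```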

          Yonik Seeley added a comment -

          So how about a public IndexReader.flush() method so that one could also reopen readers that were used for changes?

          Usecase:

          reader.deleteDocument()
          reader.flush()
          writer = new IndexWriter()
          writer.addDocument()
          writer.close()
          reader.reopen()
          reader.deleteDocument()

          Michael McCandless added a comment -

          Awesome! Thanks so much for pointing me there, Mike! I was getting a
          little suicidal here already ...

          No problem, I lost some hairs tracking that one down too!!

          OK, latest patch looks good! I love the new threaded unit test.

          Only two smallish comments:

          • You should also close fieldsReader when referencedSegmentReader !=
            null, right? (in SegmentReader.doClose)
          • In the new try/finally in reopenSegment: if you first setup
            referencedSegmentReader, then can't that finally clause just be
            clone.decRef() instead of duplicating code for decRef'ing norms,
            closeNorms(), etc.?
          Michael Busch added a comment -

          OK, all tests pass now, including the thread-safety test.
          I ran it several times on different JVMs.

          Changes:

          • As Mike suggested I added a try ... finally block to
            SegmentReader#reopenSegment() which cleans up after an
            exception was hit.
          • Added some additional comments.
          • Minor improvements to TestIndexReaderReopen
          Michael Busch added a comment -

          > I think the cause of the intermittent failure in the test is a missing
          > try/finally in doReopen to properly close/decRef everything on
          > exception.

          Awesome! Thanks so much for pointing me there, Mike! I was getting a
          little suicidal here already ...

          I should have read the comment in SegmentReader#initialize more
          carefully:

              } finally {
          
                // With lock-less commits, it's entirely possible (and
                // fine) to hit a FileNotFound exception above.  In
                // this case, we want to explicitly close any subset
                // of things that were opened so that we don't have to
                // wait for a GC to do so.
                if (!success) {
                  doClose();
                }
              }
          

          While debugging, it's easy to miss such an exception, because
          SegmentInfos.FindSegmentsFile#run() ignores it. But it's good that it
          logs such an exception, I just have to remember to print out the
          infoStream next time.

          So it seems that this was indeed the cause for the failing test case.
          I made the change and so far the tests didn't fail anymore (ran it
          about 10 times so far). I'll run it another few times on a different
          JVM and submit an updated patch in a short while if it doesn't fail
          again.

          Michael McCandless added a comment -

          I think the cause of the intermittent failure in the test is a missing
          try/finally in doReopen to properly close/decRef everything on
          exception.

          Because of lockless commits, a commit could be in-process while you
          are re-opening, in which case you could hit an IOException and you
          must therefore decRef those norms you had incRef'd (and, close eg the
          newly opened FieldsReader).
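The missing cleanup can be simulated with plain counters (not the actual doReopen code; field names are made up): incRef what the clone will share, and decRef it all in the finally clause whenever the reopen does not succeed.

```java
import java.util.HashMap;
import java.util.Map;

// Simulation of the missing try/finally (plain counters, not the actual
// doReopen code): incRef the norms the clone will share, and roll every
// incRef back in the finally clause if the reopen fails partway, e.g. on
// an IOException caused by a concurrent lockless commit.
public class ReopenRollbackSketch {
    static void doReopenWork() {
        throw new RuntimeException("simulated IOException during reopen");
    }

    // Attempts a reopen that fails; returns the per-field norm ref counts.
    static Map<String, Integer> failedReopen() {
        Map<String, Integer> refCounts = new HashMap<>();
        refCounts.put("title", 1);
        refCounts.put("body", 1);
        boolean success = false;
        try {
            refCounts.replaceAll((field, rc) -> rc + 1);  // incRef shared norms
            doReopenWork();                               // fails mid-reopen
            success = true;
        } catch (RuntimeException e) {
            // swallowed here so the sketch can show the resulting counts
        } finally {
            if (!success)
                refCounts.replaceAll((field, rc) -> rc - 1);  // decRef on failure
        }
        return refCounts;   // back at the original counts: nothing leaked
    }

    public static void main(String[] args) {
        failedReopen().values().forEach(rc -> {
            if (rc != 1) throw new AssertionError("leaked a norm reference");
        });
    }
}
```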

          Michael Busch added a comment -

          Changes:

          • Updated patch to current trunk (I just realized that the
            latest didn't apply cleanly anymore)
          • MultiSegmentReader now decRefs the subReaders correctly
            in case an exception is thrown during reopen()
          • Small changes in TestIndexReaderReopen.java

          The thread-safety test still sometimes fails. The weird
          thing is that the test verifies that the re-opened
          readers always return correct results. The only problem
          is that the refCount value is not always 0 at the end
          of the test. I'm starting to think that the testcase
          itself has a problem? Maybe someone else can take a look;
          it's probably something really obvious, but I'm already
          starting to feel dizzy while pondering about thread-safety.
          Michael Busch added a comment -

          Changes in this patch:

          • Fixed ParallelReader and MultiReader so that they don't incRef the subreaders anymore in case reopen() is a NOOP (i.e. reopen() doesn't return a new instance)
          • In the new constructor in MultiSegmentReader it was possible to hit a NullPointerException while filling the norms cache, because I didn't check for null after retrieving the old reader from the HashMap. I fixed this.
          • SegmentReader now also overrides decRef() and implements decRefReaderNotNorms().
          • As Mike suggested I removed "boolean sharedNorms" from SegmentReader. Now in MultiSegmentReader I compare the norm instances from the old and the new subReaders and copy the bytes to the new cache in case they are ==.
          • In SegmentReader I changed norms to be a HashMap instead of a Hashtable.
          • Norm.decRef() and Norm.incRef() are synchronized now.
          • SegmentReader#norms(String field, byte[] bytes, int offset) now synchronizes on the norm object that is to be read.
          • SegmentReader#reopen() now opens a new FieldsReader because it is not thread-safe.
          • SegmentReader.Norm has a new boolean variable "useSingleNormStream". SegmentReader#norms(String field, byte[] bytes, int offset) checks if it is true. If yes, then the readers' singleNormStream is used, otherwise norm.in. This is necessary so that a reopened SegmentReader always uses its own singleNormStream and to avoid synchronization on the singleNormStream.
          • I added a bunch of code to TestIndexReaderReopen to test the thread-safety of reopen(). It starts 150 threads: some modify the index (some delete docs, some add docs and some set norms), some reopen readers and check if the re-opened ones deliver the same results as fresh ones.
          • assertReaderClosed now checks if the norms are closed and also checks recursively if all subReaders are closed.

          Still outstanding:

          • On the IBM JVM all tests pass. On Sun, the thread-safety test sometimes fails. When it fails, then in assertReaderClosed, because the refCounts of either the norms or some subReaders aren't 0 at the end of the test. At this point I'm not sure why and I'm still debugging. I just wanted to submit the patch to give others the chance to review the patch or possibly (hopefully) find the problem before me.
          Thomas Peuss added a comment -

          Finding concurrency issues with a unit test is hard, because the potential problems lie in the time domain and not in the code domain.

          From my experience, the following things can have an impact on the results of such a test:

          • Running on SP or SMP machines. SMP machines (the more cores the better) reveal concurrency issues much earlier.
          • The Java implementation you are using. IBM's and Sun's thread implementations behave slightly differently, for example.
          • The OS you are running on. This may seem odd at first, but remember that modern Java implementations rely heavily on the threading implementation of the OS.
          • The processor platform you are running on. NUMA vs. UMA (which is AMD vs. Intel). The timing of threads can differ because of this.

          And be prepared that one time your test runs through without a problem and on the next run it breaks...

          Just my € 0.02

          Yonik Seeley added a comment -

          Sorry, I hadn't kept up with this issue wrt what was going to be legal (and we should definitely only test what will be legal in the MT test). So that removes the deletedDocs issue I guess.

          Michael Busch added a comment -

          I just did a quick partial review of SegmentReader for thread safety only and ran across some potential issues

          OK, let's scratch my "ready to commit" comment

          A question about thread-safety here. I agree that we must
          fix all possible problems concerning two or more
          IndexReaders in read-mode, like the FieldsReader issue.

          On the other hand: We're saying that performing write
          operations on a re-opened reader results in undefined
          behavior. Some of the issues you mentioned, Yonik, should
          only apply in case one of the shared readers is used to
          perform index modifications, right? Then the question is:
          how much sense does it make to make reopen() thread-safe
          in the write case then?

          So I think the multi-threaded testcase should not
          perform index modifications using readers involved in a
          reopen()?

          Michael McCandless added a comment -

          Or reverse it... write the test first so we have confidence that it can at least uncover some of these issues.
          The test should do as little synchronization as possible of its own so it doesn't mask a lack of synchronization in the core.
          It should be possible to uncover the unsynchronized concurrent use of IndexInput at least, and hopefully some of the refcounting issues too.

          Excellent, I agree!

          Yonik Seeley added a comment -

          We should fix all the synchronization issues you've found, create this
          unit test, and then iterate from there.

          Or reverse it... write the test first so we have confidence that it can at least uncover some of these issues.
          The test should do as little synchronization as possible of its own so it doesn't mask a lack of synchronization in the core.
          It should be possible to uncover the unsynchronized concurrent use of IndexInput at least, and hopefully some of the refcounting issues too.

          Michael McCandless added a comment -

          This is complex enough that in addition to review, I think we need a
          good multi-threaded test - 100 or 1000 threads over a ram directory,
          all changing, querying, retrieving docs, reopening, closing, etc.

          +1

          We should fix all the synchronization issues you've found, create this
          unit test, and then iterate from there.

          Michael McCandless added a comment -

          OK, reviewed the latest patch:

          • In this code:
            // singleNormFile means multiple norms share this file
            if (fileName.endsWith("." + IndexFileNames.NORMS_EXTENSION)) {
              clone.singleNormStream = d.openInput(fileName, readBufferSize);            
            }
              

            I think the comment should be removed (it doesn't apply) and also
            won't this incorrectly open the singleNormStream more than once if
            more than one field does not have separate norms? I think you should
            init that to null and then only reopen it, once, if it's still null?

          • In MultiSegmentReader, the logic that copies over unchanged norms
            from the old norms cache can be simplified. I think you can just
            look up the old Norm instance & the new Norm instance and if they
            are == then you can copy bytes over? This would also let you remove
            "sharedNorms" entirely, which is good because it's not just a
            boolean thing anymore, since some Norm instances are shared and some
            aren't.
          • I think you also need to override decRef (and add
            decRefReaderNotNorms) to SegmentReader? Because now there is a
            mismatch: incRef incr's the Norm RC's, but, decRef does not. So I
            think norms are not getting closed? I think we should modify the
            "assertReaderClosed()" in the unit test to verify (when appropriate)
            that also the RC of all Norm instances is also 0 (ie
            assertTrue(SR.normsClosed())). Then, make sure SR calls
            referencedSegmentReader.decRefReaderNotNorms instead of decRef. I
            think you then don't need to track "closedNorms" boolean, at all.
            You simply always decRef the norms whenever SR.decRef is called.
            The doCloseUnsharedResources is still needed to close the
            singleNormStream.
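The open-once suggestion can be sketched as follows (illustrative names; a plain counter stands in for Directory.openInput): initialize the shared stream to null and open it lazily, so several fields sharing the norms file never trigger a second open.

```java
import java.util.List;

// Sketch of the open-once fix suggested above (illustrative names; a counter
// stands in for Directory.openInput): start with singleNormStream == null and
// open it on demand, so several fields sharing the norms file never reopen it.
public class SingleNormStreamSketch {
    int opens = 0;
    Object singleNormStream = null;   // init to null, open on first use

    Object openInput() { opens++; return new Object(); }

    // Returns how many times the shared stream was opened for these fields.
    static int opensFor(List<String> fieldsSharingNormsFile) {
        SingleNormStreamSketch clone = new SingleNormStreamSketch();
        for (String field : fieldsSharingNormsFile) {
            if (clone.singleNormStream == null)        // only open it once
                clone.singleNormStream = clone.openInput();
        }
        return clone.opens;
    }

    public static void main(String[] args) {
        if (opensFor(List.of("title", "body", "date")) != 1)
            throw new AssertionError("shared norms file opened more than once");
    }
}
```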
          Yonik Seeley added a comment -

          It also looks like Norm.incRef is used in an unsafe manner (unsynchronized, or synchronized on the reader), and also Norm.decRef() is called inside a synchronized(norms) block, but an individual Norm may be shared across multiple Hashtables, right?

          I don't think that norms even needs to be a synchronized Hashtable... it could be changed to a HashMap since its contents never change, right?
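Yonik's point can be illustrated with a toy sketch (simplified, hypothetical classes — not Lucene's actual code): guard each Norm's ref count with the Norm's own monitor, so one instance stays safe even when shared across several readers' norm maps, while the maps themselves can be plain HashMaps because their contents never change after construction.

```java
import java.util.HashMap;
import java.util.Map;

// Toy Norm: ref count guarded by the Norm's own monitor, not the reader's.
class Norm {
    private int refCount = 1;          // the creating reader holds the first reference
    private boolean closed = false;

    synchronized void incRef() {
        assert refCount > 0;
        refCount++;
    }

    synchronized void decRef() {
        assert refCount > 0;
        if (--refCount == 0) {
            closed = true;             // stand-in for closing the norm stream
        }
    }

    synchronized boolean isClosed() {
        return closed;
    }
}

public class NormSharingDemo {
    public static void main(String[] args) {
        Norm norm = new Norm();
        // Map contents never change after construction, so plain HashMaps
        // (rather than synchronized Hashtables) are sufficient here.
        Map<String, Norm> readerA = new HashMap<>();
        Map<String, Norm> readerB = new HashMap<>();
        readerA.put("field", norm);
        norm.incRef();                 // second reader shares the same instance
        readerB.put("field", norm);

        readerA.get("field").decRef(); // reader A closes
        System.out.println(norm.isClosed()); // false: reader B still holds a ref
        readerB.get("field").decRef(); // reader B closes
        System.out.println(norm.isClosed()); // true
    }
}
```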

          Yonik Seeley added a comment -

          I just did a quick partial review of SegmentReader for thread safety only and ran across some potential issues:

          • It looks like fieldsReader is shared between clones(), and that isn't thread-safe (synchronization is done at the SegmentReader level, and now there is more than one)
          • maybe the same issue with deletedDocs? mutual exclusion is no longer enforced.
          • it looks like the norms Hashtable could be shared... looping over the individual norms and calling incRef doesn't seem safe for a number of reasons (for example, you might miss some just being added)
          • reading new norms isn't safe...
            synchronized norms(String field, byte[] bytes, int offset) uses the "norm" IndexInput that is shared. Synchronization on a single reader no longer guarantees mutual exclusion.

          There's probably other stuff, but I stopped looking. Since we are sharing things now, every method that was synchronized is now potentially unsafe. Synchronizing on the object being shared is probably a much better strategy now.

          This is complex enough that in addition to review, I think we need a good multi-threaded test - 100 or 1000 threads over a ram directory, all changing, querying, retrieving docs, reopening, closing, etc.
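The kind of stress test suggested here can be sketched in miniature (hypothetical names, plain Java instead of Lucene readers): many threads concurrently incRef/decRef one shared resource, mimicking concurrent reopen/close, and we verify the count never corrupts. A real Lucene test would do the same with readers over a RAM directory — changing, querying, reopening, closing.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Toy shared resource with an atomic ref count.
class SharedResource {
    private final AtomicInteger refCount = new AtomicInteger(1);

    void incRef() {
        int rc = refCount.incrementAndGet();
        assert rc > 1;                 // count never drops below 1 while threads hold refs
    }

    boolean decRef() {
        int rc = refCount.decrementAndGet();
        assert rc >= 0;
        return rc == 0;                // true when the resource should really be closed
    }

    int refCount() { return refCount.get(); }
}

public class ConcurrentRefCountTest {
    public static void main(String[] args) throws InterruptedException {
        final SharedResource res = new SharedResource();
        final int threads = 100, iterations = 1000;
        final CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                for (int j = 0; j < iterations; j++) {
                    res.incRef();      // "reopen": share the resource
                    res.decRef();      // "close": release the share
                }
                done.countDown();
            }).start();
        }
        done.await();
        System.out.println(res.refCount()); // 1: only the original reference is left
    }
}
```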

          Michael Busch added a comment -

          OK, I think it's finally working now!

          SegmentReader now overrides incRef() and increments the reader's RC,
          as well as the RCs of all norms. I further added the private method
          incRefReaderNotNorms() to SegmentReader, which is called in
          reopenSegment(), because it takes care of incrementing the RCs of
          all shared norms.

          I also added the method doCloseUnsharedResources() to IndexReader,
          which is a NOOP by default. It is called when a reader is closed,
          even if its RC > 0. SegmentReader overwrites this method and
          closes (=decRef) the norms in it. The SegmentReader then remembers
          that it closed the norms already and won't close them again in
          doClose(), which is called when its RC finally drops to 0.

          I also made the change you suggested, Mike, to only reload the
          field norms that actually changed. SegmentReader.openNorms() now
          checks if there is already a norm for a field in the Hashtable,
          and only loads it if it's not there. reopenSegment() puts all
          norms in the new SegmentReader that haven't changed.

          I added some new tests to verify the norms ref counting. All unit
          tests pass now. So I think this is ready to commit, but I'd feel
          more comfortable if you could review it again before I commit it.
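The scheme described above can be sketched with simplified toy classes (hypothetical names, not the real SegmentReader): incRef() bumps the reader and all of its norms, while incRefReaderNotNorms() bumps only the reader, which is what a reopen needs when the new reader shares files but handles changed norms itself.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal ref-counted base: constructor initializes the count to 1.
class RefCounted {
    protected int refCount = 1;
    synchronized void incRef()   { assert refCount > 0; refCount++; }
    synchronized int  decRef()   { assert refCount > 0; return --refCount; }
    synchronized int  refCount() { return refCount; }
}

class NormSketch extends RefCounted {}

class SegmentReaderSketch extends RefCounted {
    final List<NormSketch> norms = new ArrayList<>();

    @Override
    synchronized void incRef() {
        super.incRef();
        for (NormSketch n : norms) n.incRef(); // MultiReader-style sharing: norms included
    }

    synchronized void incRefReaderNotNorms() {
        super.incRef();                        // reopen-style sharing: norms excluded
    }
}

public class IncRefDemo {
    public static void main(String[] args) {
        SegmentReaderSketch sr = new SegmentReaderSketch();
        sr.norms.add(new NormSketch());
        sr.incRef();                 // e.g. a MultiReader that keeps using the norms
        sr.incRefReaderNotNorms();   // e.g. a reopened reader sharing files only
        System.out.println(sr.refCount());              // 3
        System.out.println(sr.norms.get(0).refCount()); // 2: untouched by the second call
    }
}
```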

          Michael McCandless added a comment -

          Hmm I was thinking about this before (that's actually why I put that
          method in there). But I don't think this is gonna work. For example,
          let's say we use a MultiReader that has two SegmentReader SR1 and SR2.
          Now only SR2 changed, we reopen the MR which increases the refCount on
          SR1, because it shares that SR. Now we close the old MultiReader, which
          calls close() on SR1. If now SegmentReader.close() calls Norm.decRef(),
          then it will close the norms even though they are still used by the new
          MultiReader.

          Ugh, you're right. The challenge is sometimes a reference to SR means
          "I will use norms" (this is when MultiReader incRefs) but other times
          it means "I will not use norms" (this is when SR incRefs due to
          reopen).

          OK, I like your original proposal: SR overrides incRef() and incrs its
          RC as well as the RC for each Norm it's using. Then, in SR's
          reopenSegment, you carefully incRef the "original" SR without
          incRef'ing its Norms (except for those Norms you will keep).

          Likewise, SR overrides decRef() to decr its RC and RC for each Norm.
          But, when a reopened SR1.doClose() is called, you must carefully
          decRef the RC of the original SR but not decRef each of its Norms
          (except for those you had actually shared).

          This way when MR calls SR.incRef/decRef then all Norms and the SR's RC
          are incr'd/decr'd. But when SR1 shares resources with an original SR
          it only incr's/decr's the refCount of the SR.

          Michael Busch added a comment -

          One more comment: I think you can partially share Norm instances? Eg

          Good idea! Will make the change.

          Michael Busch added a comment -

          How about if SegmentReader.close() always calls Norm.decRef(),
          immediately, for each Norm it has open? E.g. you could implement
          doCloseUnsharedResources in SegmentReader and do it there. This way,

          Hmm I was thinking about this before (that's actually why I put that
          method in there). But I don't think this is gonna work. For example,
          let's say we use a MultiReader that has two SegmentReader SR1 and SR2.
          Now only SR2 changed, we reopen the MR which increases the refCount on
          SR1, because it shares that SR. Now we close the old MultiReader, which
          calls close() on SR1. If now SegmentReader.close() calls Norm.decRef(),
          then it will close the norms even though they are still used by the new
          MultiReader.

          Michael McCandless added a comment -

          Looks great! All tests pass for me.

          OK I see. I made this change as well. I also made the change that
          there is no chain, but one starting SegmentReader which all re-opened
          ones reference (see below). Now this starting SegmentReader won't close
          its norms until all other readers that reference it are closed (RC=0),
          because only then doClose() is called, which calls closeNorms().
          Do you see an easy way how to improve this?

          How about if SegmentReader.close() always calls Norm.decRef(),
          immediately, for each Norm it has open? E.g. you could implement
          doCloseUnsharedResources in SegmentReader and do it there. This way,
          if the SegmentReader has been closed but it shares resources (and not
          the Norms) with reopened SegmentReaders then its Norms would all
          decRef to 0 & be closed.

          Also make sure that if a SegmentReader is decRef'd to 0 and close was
          never called, that also in this case you remember to call Norm.decRef
          for all open norms.

          One more comment: I think you can partially share Norm instances? Eg
          if I have 2 fields that have norms, but only one of them changed since
          I opened this SegmentReader, then the reopened SegmentReader could
          share the Norm instance of the field that didn't change with the old
          SegmentReader? But right now you're re-loading all the Norms.

          Otherwise no more comments!

          Michael Busch added a comment -

          Patch looks great! I'm still working through it but found a few small
          issues...

          Thanks Mike! Very good review and feedback!

          It might be good to put a "assert refCount > 0" at various places like...

          Agreed. I added those asserts to incRef() and decRef(). I didn't add it
          to ensureOpen(), because it throws an AlreadyClosedException anyway, and
          some testcases check if this exception is thrown.

          I'm seeing a failure in contrib/memory testcase:

          Oops, I fixed this already. I changed the (deprecated) ctor
          IndexReader.IndexReader(Directory) to call this() which sets the refCount
          to 1. The test passes then. I made this fix yesterday, I think I just
          forgot to update the patch file before I submitted it, sorry about this.

          • In multiple places you catch an IOException and undo the attempted
            re-open, but shouldn't this be a try/finally instead so you also
            clean up on hitting any unchecked exceptions?

          Yes of course! Changed it.
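The try/finally point above can be sketched generically (hypothetical helper names, not the actual reopen code): a success flag plus finally ensures partially-opened readers are released on any failure, unchecked exceptions included, not just IOException.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReopenCleanupSketch {
    static List<String> released = new ArrayList<>();

    static void release(String reader) { released.add(reader); }

    static String reopenAll(boolean fail) throws IOException {
        List<String> opened = new ArrayList<>();
        boolean success = false;
        try {
            opened.add("SR1");
            if (fail) throw new RuntimeException("unchecked failure mid-reopen");
            opened.add("SR2");
            success = true;
            return "MSR(" + opened + ")";
        } finally {
            if (!success) {
                // Undo the attempted reopen no matter what was thrown;
                // a plain catch (IOException) would miss unchecked exceptions.
                for (String r : opened) release(r);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try {
            reopenAll(true);
        } catch (RuntimeException e) {
            System.out.println(released); // [SR1]: cleaned up despite the unchecked exception
        }
    }
}
```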

          • I think you need an explicit refCount for the Norm class in
            SegmentReader.

          OK I see. I made this change as well. I also made the change that
          there is no chain, but one starting SegmentReader which all re-opened
          ones reference (see below). Now this starting SegmentReader won't close
          its norms until all other readers that reference it are closed (RC=0),
          because only then doClose() is called, which calls closeNorms().
          Do you see an easy way how to improve this?
          Hmm, probably I have to definalize IndexReader.incRef() and decRef()
          and override them in SegmentReader. Then SegmentReader.incRef() would
          also incRef the norms, SegmentReader.decRef() would decRef the norms,
          and somehow a clone that references the reader but not the norms
          (because they changed) would only incRef the reader itself, but not
          the norms... Or do you see an easier way?

          • If you have a long series of reopens, then, all SegmentReaders in
            the chain will remain alive. So this is a [small] memory leak
            with time. I think if you changed referencedSegmentReader to
            always be the starting SegmentReader then this chain is broken

          Good point. Ok I changed this and also the test cases that check the refCount
          values.

          Michael McCandless added a comment -

          OK I think this patch is very close! I finished reviewing it –
          here's some more feedback:

          • In multiple places you catch an IOException and undo the attempted
            re-open, but shouldn't this be a try/finally instead so you also
            clean up on hitting any unchecked exceptions?
          • I think you need an explicit refCount for the Norm class in
            SegmentReader.
            .
            Say I've done a chain of 10 re-opens for SegmentReader and each
            time only the segment's norms have changed. I've closed all but
            the last SegmentReader. At this point all 10 SegmentReaders are
            still alive (RC > 0) and holding open all file handles for their
            copies of the norms. So this will leak file handles/RAM with each
            reopen?
            .
            To fix this, I think you just need to add refCount into Norm class
            & set refCount to 1 in the constructor. Then, each
            SegmentReader calls Norm.decRef(), not Norm.close(), when it's
            done. When refCount hits 0 then the Norm closes itself. Finally,
            during re-open you should share a Norm instance (rather than open
            a new one) if it had not changed from the previous SegmentReader.
            .
            For singleNormStream, I think each reopened SegmentReader should
            always re-open this descriptor and then we can forcefully close
            this stream when the SegmentReader is closed (what you are doing
            now). Ie the SegmentReader fully owns singleNormStream.
          • If you have a long series of reopens, then, all SegmentReaders in
            the chain will remain alive. So this is a [small] memory leak
            with time. I think if you changed referencedSegmentReader to
            always be the starting SegmentReader then this chain is broken
            and after 10 reopens only the original SegmentReader and the most
            recent one will remain alive (assuming I closed all SegmentReaders
            but the most recent one).
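The sharing rule proposed here can be sketched as a toy (hypothetical names): on reopen, a Norm whose field did not change is shared via incRef, while a changed field gets a freshly loaded Norm; decRef closes a Norm when its count hits 0.

```java
import java.util.HashMap;
import java.util.Map;

// Toy Norm: refCount set to 1 in the constructor, closed when it reaches 0.
class Norm {
    int refCount = 1;
    boolean closed = false;
    synchronized void incRef() { assert refCount > 0; refCount++; }
    synchronized void decRef() { assert refCount > 0; if (--refCount == 0) closed = true; }
}

public class ReopenNormsSketch {
    static Map<String, Norm> reopenNorms(Map<String, Norm> old, String changedField) {
        Map<String, Norm> fresh = new HashMap<>();
        for (Map.Entry<String, Norm> e : old.entrySet()) {
            if (e.getKey().equals(changedField)) {
                fresh.put(e.getKey(), new Norm());   // reload only what changed
            } else {
                e.getValue().incRef();               // share the unchanged Norm
                fresh.put(e.getKey(), e.getValue());
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        Map<String, Norm> old = new HashMap<>();
        old.put("title", new Norm());
        old.put("body", new Norm());
        Map<String, Norm> reopened = reopenNorms(old, "body");
        System.out.println(reopened.get("title") == old.get("title")); // true: shared
        System.out.println(reopened.get("body") == old.get("body"));   // false: reloaded
    }
}
```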
          Michael McCandless added a comment -

          Patch looks great! I'm still working through it but found a few small
          issues...

          It might be good to put an "assert refCount > 0" at various places like
          decRef(), incRef(), ensureOpen()? That would require changing the
          constructors to init refCount=1 rather than incRef() it to 1.

          I'm seeing a failure in contrib/memory testcase:

              [junit] *********** FILE=./NOTICE.txt
              [junit] Fatal error at query=Apache, file=./NOTICE.txt, anal=org.apache.lucene.analysis.SimpleAnalyzer@341960
              [junit] ------------- ---------------- ---------------
              [junit] Testcase: testMany(org.apache.lucene.index.memory.MemoryIndexTest):	Caused an ERROR
              [junit] this IndexReader is closed
              [junit] org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
              [junit] 	at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:158)
              [junit] 	at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:632)
              [junit] 	at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:64)
              [junit] 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:143)
              [junit] 	at org.apache.lucene.search.Searcher.search(Searcher.java:118)
              [junit] 	at org.apache.lucene.search.Searcher.search(Searcher.java:97)
              [junit] 	at org.apache.lucene.index.memory.MemoryIndexTest.query(MemoryIndexTest.java:412)
              [junit] 	at org.apache.lucene.index.memory.MemoryIndexTest.run(MemoryIndexTest.java:313)
              [junit] 	at org.apache.lucene.index.memory.MemoryIndexTest.testMany(MemoryIndexTest.java:234)
          

          I think it's because MemoryIndexReader (private class in
          MemoryIndex.java) calls super(null) =
          IndexReader.IndexReader(Directory) in its constructor, which does not
          initialize the refCount to 1? If I insert incRef() into
          IndexReader.IndexReader(Directory) constructor, the test passes, but
          who else is using that constructor (ie will this double-incref in
          those cases?).

          Michael Busch added a comment -

          Ok here is the next one ...

          This patch implements the refCounting as discussed with Mike and Yonik
          above.

          Other changes/improvements/comments:

          • ensureOpen() is now also called in MultiReader.reopen() and
            ParallelReader.reopen(). (thanks, Mike)
          • in case an exception occurs during reopen() it is taken care of
            closing or decreasing the refCount of already created readers.
            Also old readers should not be affected in case an exception occurs.
          • I improved how norms are re-opened in a MultiSegmentReader (MSR).
            It now checks which parts of the normsCache haven't changed and
            copies those to the new normsCache. Because I'm imagining Yonik with
            his thread-safety hat on now, another comment about this: In case
            a MSR is refreshed, then the synchronized MSR.reopen() method has the
            lock on the old MSR. This method creates the new MSR and the values
            from the old cache are copied to the new cache in the constructor, i.e.
            while the lock on the old MSR is still being held.
          • added new constructors to MultiReader and ParallelReader that
            increase the refCount on the subReaders and thus prevent closing the
            subReaders on close(). (thanks, Yonik)

          I also made the changes suggested by Hoss (thanks!):

          • changed the "successfully reopened" comments in the javadocs
          • added comments to the javadocs saying that write operations on the
            re-opened reader will result in undefined behavior unless the old
            reader is closed
          • FilterIndexReader.reopen() is not implemented, i.e. it will throw an
            UnsupportedOperationException.

          All unit tests pass.

          Michael McCandless added a comment -

          Oh and Mike, thanks for bearing with me

          Thank you for bearing with me!

          What about a new constructor for MultiReader/ParallelReader that implements more sensible semantics (increment refcount on readers passed to it, and decrement on close())?

          +1

          Michael Busch added a comment -

          What about a new constructor for MultiReader/ParallelReader that implements more sensible semantics (increment refcount on readers passed to it, and decrement on close())?

          Yeah, when reference counting is implemented then such a constructor should be easy to add.
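A toy sketch of the constructor being discussed (simplified, hypothetical classes): when closeSubReaders is false, the constructor incRefs the subreaders it is handed and close() merely decRefs them, so ownership stays with the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Toy subreader with a ref count; closed when the count reaches 0.
class SubReader {
    int refCount = 1;
    boolean closed = false;
    synchronized void incRef() { refCount++; }
    synchronized void decRef() { if (--refCount == 0) closed = true; }
}

public class MultiReaderSketch {
    private final List<SubReader> subReaders = new ArrayList<>();

    MultiReaderSketch(List<SubReader> readers, boolean closeSubReaders) {
        for (SubReader r : readers) {
            if (!closeSubReaders) r.incRef(); // caller keeps ownership of the subreaders
            subReaders.add(r);
        }
    }

    void close() {
        for (SubReader r : subReaders) r.decRef(); // balanced: never force-closes shared readers
    }

    public static void main(String[] args) {
        SubReader sr = new SubReader();
        List<SubReader> subs = new ArrayList<>();
        subs.add(sr);
        MultiReaderSketch mr = new MultiReaderSketch(subs, false);
        mr.close();
        System.out.println(sr.closed); // false: the caller still owns sr
        sr.decRef();
        System.out.println(sr.closed); // true
    }
}
```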

          Yonik Seeley added a comment -

          What about a new constructor for MultiReader/ParallelReader that implements more sensible semantics (increment refcount on readers passed to it, and decrement on close())?

          Michael Busch added a comment -

          I think this is OK?

          This was essentially the reason why I suggested to use two refcount values:
          one to control when to close a reader, and one to control when to close
          its (shared) resources in the case of SegmentReader. That approach would not
          alter the behaviour of IndexReader.close().
          But I agree that your approach is simpler and I also think it is okay to
          change ensureOpen() and accept the slight API change.

          So I'll go ahead and implement the refcount approach unless anybody objects.

          Oh and Mike, thanks for bearing with me

          Michael McCandless added a comment -
          • MultiReader/ParallelReader must not incref the subreaders on open()
            as you said. But on reopen() it must incref the subreaders that
            haven't changed and thus are shared with the old MultiReader/
            ParallelReader. This further means, that the re-opened MultiReader/
            ParallelReader must remember which of the subreaders to decref on
            close(), right?

          Hmm, right. MultiReader/ParallelReader must keep track of whether it
          should call decref() or close() on each of its child readers, when it
          itself is closed.

          • If we change ensureOpen() like you suggest, then the user might
            still be able to use reader1 (in my example), even after
            reader1.close() was explicitly called. Probably not a big deal?

          I think this is OK?

          Michael Busch added a comment -

          With that change, plus the change below, your example works fine.

          Two things:

          • MultiReader/ParallelReader must not incref the subreaders on open()
            as you said. But on reopen() it must incref the subreaders that
            haven't changed and thus are shared with the old MultiReader/
            ParallelReader. This further means, that the re-opened MultiReader/
            ParallelReader must remember which of the subreaders to decref on
            close(), right?
          • If we change ensureOpen() like you suggest, then the user might
            still be able to use reader1 (in my example), even after
            reader1.close() was explicitly called. Probably not a big deal?
          Michael McCandless added a comment -

          I think we are forced to keep this semantics, for backwards
          compatibility. But I don't really think MultiReader/ParallelReader
          should actually be this aggressive. Maybe in the future we can add
          ctors for MultiReader/ParallelReader that accept a "doClose" boolean
          to turn this off.

          Actually I retract this: it's no longer necessary as long as we change
          ensureOpen to assert that RC > 0 instead of closed==false.

          I think this is actually a nice unexpected side-effect of using
          reference counting: it resolves this overly aggressive behavior of
          MultiReader/ParallelReader.

          Michael McCandless added a comment -

          If you do:

          IndexReader reader1 = IndexReader.open(index1);  
          IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
          multiReader1.close();
          

          then today multiReader1.close() also closes reader1. That's why I consciously omitted reader1.close().

          Ahh, I missed that MultiReader is allowed to close all readers that
          were passed into it, when it is closed. OK, let's leave
          reader1.close() out of the example.

          It's somewhat "aggressive" of MultiReader/ParallelReader to do that?
          If you go and use those same sub-readers in other MultiReaders, then
          the closing of the first MultiReader will break the other ones?

          I think we are forced to keep this semantics, for backwards
          compatibility. But I don't really think MultiReader/ParallelReader
          should actually be this aggressive. Maybe in the future we can add
          ctors for MultiReader/ParallelReader that accept a "doClose" boolean
          to turn this off.

          Anyway, it's simple to preserve this semantics with reference
          counting. It just means that IndexReader / MultiReader do not incref
          the readers they receive, and, when they are done with those readers,
          they must call their close(), not decref. Ie they "borrow the
          reference" that was passed in, rather than incref'ing their own
          reference, to the child readers.

          With that change, plus the change below, your example works fine.

          Consequently, if you do

          IndexReader reader1 = IndexReader.open(index1);  
          IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
          IndexReader multiReader2 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index3)});
          multiReader1.close();
          

          then multiReader2 is not usable anymore, because multiReader1.close() closes reader1. But that can be explicitly avoided by the user because it's known that multiReader1 and multiReader2 share the same reader.

          This is why I don't like the semantics we have today – I don't think
          it's right that the multiReader1.close() breaks multiReader2.

          Now, with the reopen() code:

          IndexReader reader1 = IndexReader.open(index1);  // optimized index, reader1 is a SegmentReader
          IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
          ... // modify index2
          IndexReader multiReader2 = multiReader1.reopen();  
          // only index2 changed, so multiReader2 uses reader1 and has to increment the refcount of reader1
          

          The user gets a new reader instance from multiReader.reopen(), but can't tell which of the subreaders has been reopened and which are shared. That's why multiReader1.close() should not close reader1 in this case and we need refcounting in order to make this work.

          Both of these cases are easy to fix with reference counting: we just
          have to change ensureOpen() to assert that RC > 0 instead of
          closed==false. Ie, a reader may still be used as long as its RC is
          still non-zero.
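The suggested change to ensureOpen() could be sketched as follows. This is an illustrative model, not the actual IndexReader source; the field names (`refCount`, `isOpen`) are assumptions here.

```java
// Sketch: ensureOpen() keyed off the reference count rather than a
// closed flag, so a reader stays usable while any other reader still
// holds a reference to it, even after its own close() was called.
class RefCountedReader {
    private int refCount = 1;
    private boolean isOpen = true; // guards against double-close only

    protected synchronized void ensureOpen() {
        // old version: if (!isOpen) throw ...
        if (refCount <= 0) {
            throw new IllegalStateException("this reader is closed");
        }
    }

    synchronized void incRef() { refCount++; }

    synchronized void close() {
        if (isOpen) {       // a second close() must not decref again
            isOpen = false;
            decRef();
        }
    }

    synchronized void decRef() { refCount--; }
    synchronized int getRefCount() { return refCount; }
}
```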

          Michael Busch added a comment -

          I'm assuming in your example you meant for reader2 and reader3 to also
          be SegmentReaders?

          Yes that's what I meant. Sorry, I didn't make that clear.

          Also in your example let's insert missing "reader1.close()" as the
          very first close? (Else it will never be closed because it's RC never
          hits 0).

          Doesn't what you describe change the semantics of MultiReader.close()?

          If you do:

          IndexReader reader1 = IndexReader.open(index1);  
          IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
          multiReader1.close();
          

          then today multiReader1.close() also closes reader1. That's why I consciously omitted reader1.close().

          Consequently, if you do

          IndexReader reader1 = IndexReader.open(index1);  
          IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
          IndexReader multiReader2 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index3)});
          multiReader1.close();
          

          then multiReader2 is not usable anymore, because multiReader1.close() closes reader1. But that can be explicitly avoided by the user because it's known that multiReader1 and multiReader2 share the same reader.

          Now, with the reopen() code:

          IndexReader reader1 = IndexReader.open(index1);  // optimized index, reader1 is a SegmentReader
          IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
          ... // modify index2
          IndexReader multiReader2 = multiReader1.reopen();  
          // only index2 changed, so multiReader2 uses reader1 and has to increment the refcount of reader1
          

          The user gets a new reader instance from multiReader.reopen(), but can't tell which of the subreaders has been reopened and which are shared. That's why multiReader1.close() should not close reader1 in this case and we need refcounting in order to make this work.

          So do you suggest that a MultiReader should increment the refcounts when it is opened as well (in the constructor)? I believe we can implement it like this, but as I said it changes the semantics of MultiReader.close() (and ParallelReader.close() is, I believe, the same). A user would then have to close subreaders manually.

          Michael McCandless added a comment -

          It's not nearly this complex (we don't need two ref counts). If we
          follow the simple rule that "every time reader X wants to use reader
          Y, it increfs it" and "whenever reader X is done using reader Y, it
          decrefs it", all should work correctly.

          Also we should think of "close()" as the way that the external user
          does the decref of their reader. We just special-case this call, by
          setting isOpen=false, to make sure we don't double decref on a double
          close call.

          Let's walk through your example...

          I'm assuming in your example you meant for reader2 and reader3 to also
          be SegmentReaders? Ie, the changes that are happening to the
          single-segment index1 are just changes to norms and/or deletes. If
          not, the example is less interesting because reader1 will be closed
          sooner

          Also in your example let's insert missing "reader1.close()" as the
          very first close? (Else it will never be closed because it's RC never
          hits 0).

          When reader1 is created it has RC 1.

          When multiReader1 is created, reader1 now has RC 2.

          When multiReader2 is created, reader1 now has RC 3.

          When reader2 is created (by reader1.reopen()), it incref's reader1
          because it's sharing the sub-readers in reader1. So reader1 now has
          RC 4.

          When reader3 was created (by reader2.reopen()), it incref's reader2
          because it's sharing the sub-readers reader2 contains. So reader1 is
          still at RC 4 and reader2 is now at RC 2.

          Now, we close.

          After reader1.close() is called, reader1 sets isOpen=false (to prevent
          double close by the user) and RC drops to 3.

          With multiReader1.close(), multiReader1 is now at RC 0, and so it
          decrefs all readers it was using, and so reader1 RC is now 2.

          With multiReader2.close(), likewise it is now at RC 0 and so it
          decrefs all readers it was using, and so reader1 RC is now 1.

          With reader2.close(), it decrefs its own RC, however that brings its
          RC to 1 (reader3 is still referring to it) and so it does not decref
          the reader1 that it's referring to.

          Finally, with reader3.close(), it is now at RC 0 and so it decrefs the
          reader2 it refers to. This brings reader2's RC to 0, and so reader2
          decrefs the reader1 that it's referring to. Which brings reader1's RC
          to 0, and so reader1 finally closes all its internal sub-readers.

          Michael Busch added a comment - - edited

          Hi Mike,

          I'm not sure if I fully understand your comment. Consider the following (quite constructed) example:

          IndexReader reader1 = IndexReader.open(index1);  // optimized index, reader1 is a SegmentReader
          IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
          ... // modify index2
          IndexReader multiReader2 = multiReader1.reopen();  
          // only index2 changed, so multiReader2 uses reader1 and has to increment the refcount of reader1
          ... // modify index1
          IndexReader reader2 = reader1.reopen();
          // reader2 is a new instance of SegmentReader that shares resources with reader1
          ... // modify index1
          IndexReader reader3 = reader2.reopen();
          // reader3 is a new instance of SegmentReader that shares resources with reader1 and reader2
          

          Now the user closes the readers in this order:

          1. multiReader1.close();
          2. multiReader2.close();
          3. reader2.close();
          4. reader3.close();

          reader1 should be marked as closed after 2., right? Because multiReader1.close() and multiReader2.close() have to decrement reader1's refcount. But the underlying files have to remain open until after 4., because reader2 and reader3 use reader1's resources.

          So don't we need 2 refcount values in reader1? One that tells us when the reader itself can be marked as closed, and one that tells when the resources can be closed? Then multiReader1 and multiReader2 would decrement the first refCount, whereas reader2 and reader3 both have to "know" reader1, so that they can decrement the second refcount.

          I hope I'm just completely confused now and someone tells me that the whole thing is much simpler

          Michael McCandless added a comment -

          > But if a reader is shared, how do you tell two real closes from an erroneous double-close?
          > Perhaps have a top-level close() that is only invoked once via isClosed, and a protected doClose() that a multi-reader would use that actually does the decref?

          Yes I think that's the right approach.

          > It is quite different when it comes to SegmentReaders. SegmentReader.reopen() checks SegmentInfos if there is a segment with the same name but newer normGen or delGen. If there is one then a new SegmentReader instance is created that reuses resources like e. g. TermInfosReader and loads the new generation of the del or norm file.
          >
          > Now you can end up having a bunch of SegmentReaders that share the same resources but don't know about each other. The reference counting would have to happen somewhere else, e. g. in the TermInfosReader. Of course this is doable, but it's a special case and more complicated compared to the "restrict reopen() to only be called once"-approach.

          For SegmentReader, I think on reopen(), everything would be shared
          except norms and/or deletedDocs right? In which case couldn't all
          cascaded reopens hold onto the original SegmentReader & call doClose()
          on it when they are closed? (Ie it is the "owner" of all the shared
          parts of a SegmentReader). Then, deletedDocs needs no protection (it
          has no close()), and for Norms we could push the reference counting
          down into it as well?

          We wouldn't need to push reference counting into all the readers that
          a SegmentReader holds because those are always shared when a
          SegmentReader is reopened?

          Michael Busch added a comment -

          The implementation seems simple. When a reader is opened, it starts with RC 1. When it is closed, it decrefs the RC and marks itself closed (to make sure double-close does not re-decref the RC). When a MultiReader needs to use the reader, it calls incref. And when that MultiReader is done with it, it calls decref. Whenever the RC hits 0 it's safe to free all resources.

          Wouldn't that work?

          You're right, for a MultiReader and ParallelReader this would work and wouldn't be hard to implement.

          It is quite different when it comes to SegmentReaders. SegmentReader.reopen() checks the SegmentInfos to see whether there is a segment with the same name but a newer normGen or delGen. If there is one, then a new SegmentReader instance is created that reuses resources such as the TermInfosReader and loads the new generation of the del or norm file.

          Now you can end up having a bunch of SegmentReaders that share the same resources but don't know about each other. The reference counting would have to happen somewhere else, e. g. in the TermInfosReader. Of course this is doable, but it's a special case and more complicated compared to the "restrict reopen() to only be called once"-approach.
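That per-segment reopen decision could be sketched roughly as follows. This is not the actual SegmentReader internals; the class and field names (`SegmentSketch`, `termIndex`, the generation parameters) are assumptions for illustration.

```java
// Rough sketch of the per-segment reopen decision: reuse the heavy
// shared structures, reload only the pieces whose generation advanced.
class SegmentSketch {
    final String name;
    final long delGen;
    final long normGen;
    final Object termIndex; // stands in for TermInfosReader etc.

    SegmentSketch(String name, long delGen, long normGen, Object termIndex) {
        this.name = name;
        this.delGen = delGen;
        this.normGen = normGen;
        this.termIndex = termIndex;
    }

    // Returns this reader if nothing changed; otherwise a new instance
    // sharing the term index but carrying the newer generations, which
    // is where the new .del / norm files would be loaded.
    SegmentSketch reopen(long newDelGen, long newNormGen) {
        if (newDelGen == delGen && newNormGen == normGen) {
            return this; // unchanged segment: share the whole reader
        }
        return new SegmentSketch(name, newDelGen, newNormGen, termIndex);
    }
}
```

This is exactly why the instances "don't know about each other": both the old and new sketch hold the same `termIndex`, so something has to count references to the shared resource before it can be closed safely.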

          Yonik Seeley added a comment -

          > When it is closed, it decrefs the RC and marks itself closed (to make sure double-close does not re-decref the RC)

          But if a reader is shared, how do you tell two real closes from an erroneous double-close?
          Perhaps have a top-level close() that is only invoked once via isClosed, and a protected doClose() that a multi-reader would use that actually does the decref?

          Michael McCandless added a comment -

          I think reference counting would solve this issue quite nicely. How come we want to avoid reference counting? It seems like the right solution here.

          The implementation seems simple. When a reader is opened, it starts with RC 1. When it is closed, it decrefs the RC and marks itself closed (to make sure double-close does not re-decref the RC). When a MultiReader needs to use the reader, it calls incref. And when that MultiReader is done with it, it calls decref. Whenever the RC hits 0 it's safe to free all resources.

          Wouldn't that work?

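The counting scheme Mike describes can be sketched with a toy class (all names here are hypothetical, not Lucene API): the opener holds reference 1, each sharer calls incRef()/decRef(), close() is made idempotent so a double close cannot decrement twice, and resources are freed when the count hits zero:

```java
// Hypothetical sketch of the proposed reference-counting scheme.
class RefCountedReader {
    private int refCount = 1;      // the opener holds the first reference
    private boolean closed = false;
    boolean resourcesFreed = false;

    synchronized void incRef() {   // e.g. a MultiReader borrows this reader
        if (refCount <= 0) {
            throw new IllegalStateException("reader already freed");
        }
        refCount++;
    }

    synchronized void decRef() {   // the borrower is done with this reader
        if (--refCount == 0) {
            resourcesFreed = true; // stands in for closing files etc.
        }
    }

    // close() is idempotent: a double close must not re-decref the count
    synchronized void close() {
        if (!closed) {
            closed = true;
            decRef();
        }
    }
}
```

With this in place, a MultiReader never closes its sub-readers directly; it only releases its references, and the last release frees the resources.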
          Michael Busch added a comment -

          Hmm one other thing: how should IndexReader.close() work? If we re-open a reader (a is the old reader, b is the new one), and then someone calls a.close(), then a should not close any resources that it shares with b.

          One way to make this work would be reference counting. Since we want to avoid that, we could simply restrict reopen() from being called twice for the same reader. Then there would never be more than 2 readers sharing the same resources. The old reader a would remember that reopen() was called already and would not close the shared resources on close(). However, the new reader b would close all resources, meaning that reader a would not work anymore after b.close() was called.
          Thoughts?

          Yonik Seeley added a comment -

          A reader which is being used for deletes or setting norms is always current (it owns the write lock), so there should never be the need to re-open such a reader.

          I was thinking about the "add some docs" then "delete some docs" scenario:
          One currently needs to close() the deleting reader to open an IndexWriter. If IndexReader.commit() was public, then one could simply flush changes, and then call reopen() after the IndexWriter was done adding new documents. But perhaps longer term, all deletions should be done via the IndexWriter anyway.

          if you re-open an existing reader which was not used for deletes before and use the new instance (b) to perform deletes

          Ah, thanks for that clarification. I guess that should remain undefined.

          Michael Busch added a comment -

          How about just defining the behavior such that any pending changes are flushed. That would make it more useful because you could then reopen readers you used for deletes.

          Hmm, I'm not sure I understand. A reader which is being used for deletes or setting norms is always current (it owns the write lock), so there should never be the need to re-open such a reader.

          However, if you re-open an existing reader which was not used for deletes before and use the new instance (b) to perform deletes, it will result in undefined behavior for the old reader (a):

          IndexReader a = .....
          ....
          IndexReader b = a.reopen();
          b.deleteDocument(...);
          
          Yonik Seeley added a comment -

          As for reopen() I'd like to go with Hoss' suggestion for now and add warning comments to reopen() saying that using a re-opened IndexReader with closeOldReader==false for write operations will result in undefined behavior.

          How about just defining the behavior such that any pending changes are flushed. That would make it more useful because you could then reopen readers you used for deletes. An alternative would be a method to explicitly flush changes on a reader, giving one the ability to then reopen it, but I like the former better since it avoids adding another API call.

          Michael Busch added a comment -

          I just opened LUCENE-1030 and would like to move discussions related to "read-only" IndexReaders to that issue.

          As for reopen() I'd like to go with Hoss' suggestion for now and add warning comments to reopen() saying that using a re-opened IndexReader with closeOldReader==false for write operations will result in undefined behavior. Any objections?

          Michael Busch added a comment -

          So I think deleteDocument(int id) should be moved to a subclass. Same with setNorms?

          Or we could take the approach you suggested in http://www.gossamer-threads.com/lists/lucene/java-dev/52017.

          That would mean adding a callback after flush to get a current IndexReader to get the docids, and then using the IndexWriter to perform deleteDocument(docId) or setNorm(). These methods could also take an IndexReader as argument, e.g. deleteDocument(IndexReader reader, int docId), which would throw an IOException if the passed-in reader is stale (i.e. docids have changed since the reader was opened). Just as IndexReader does it today. Does this make sense or am I missing something?

          Yonik Seeley added a comment -

          Having a read-only IndexReader would (should?) mean being able to remove "synchronized" from some things like isDeleted()... a nice performance win for multi-processor systems for things that didn't have access to the deleted-docs bitvec.

          > If our goal is to make them read-only (we can delete via IndexWriter already)

          But you can only delete-by-term.
          It's more powerful to be able to delete by docid, however I manage to come up with it.
          So I think deleteDocument(int id) should be moved to a subclass. Same with setNorms?

          Doug Cutting added a comment -

          Got it. IndexWriter only works with SegmentReaders anyway, not with an arbitrary IndexReader implementation: IndexReader is extensible, but IndexWriter is not. I'd (momentarily) forgotten that. Nevermind.

          Michael Busch added a comment -

          I think that's a fine direction. Note however that IndexWriter implements delete by calling IndexReader.delete(). That method could be made package-private, so that users cannot call it, but then this makes it impossible for someone to subclass IndexReader from a different package. So perhaps delete() needs to move to a subclass of IndexReader? That gets messy...

          Actually all of the lock/commit logic moved from IndexReader to DirectoryIndexReader already, and the delete logic is in SegmentReader, which subclasses DirectoryIndexReader. So we could remove the deleteDocument() API from IndexReader but leave it in DirectoryIndexReader. Then it would still be possible to extend IndexReader from a different package just as today, and IndexWriter could use DirectoryIndexReader for performing deletes. These changes should be trivial.

          Doug Cutting added a comment -

          Hmm, so what are our long-term plans for indexreaders? If our goal is to
          make them read-only (we can delete via IndexWriter already), then I think
          adding those warning comments to reopen(), as you suggest Hoss, would be
          sufficient.

          I think that's a fine direction. Note however that IndexWriter implements delete by calling IndexReader.delete(). That method could be made package-private, so that users cannot call it, but then this makes it impossible for someone to subclass IndexReader from a different package. So perhaps delete() needs to move to a subclass of IndexReader? That gets messy...

          Michael Busch added a comment -

          > in deciding "to clone or not clone" for the purposes of implementing
          > reopen, it may make sense to step back and ask two broader questions...

          I agree!

          > 1) what was the motivation for approaching reopen using clones in the
          > first place

          Good summarization! You are right, I started the clone approach because
          IndexReaders are not "read only" objects.

          > "if reopen(closeOld=false) returns a new reader it will share state with
          > the current reader, attempting to do the following operations on this
          > new reader will result in undefined behavior"

          This would mean that we simply warn the user that performing write
          operations with the re-opened indexreader will result in undefined behavior,
          whereas with Mike's approach we would prevent such an undefined behavior by
          using reference counting.

          Hmm, so what are our long-term plans for indexreaders? If our goal is to
          make them read-only (we can delete via IndexWriter already), then I think
          adding those warning comments to reopen(), as you suggest Hoss, would be
          sufficient.

          If everybody likes this approach I'll go ahead and submit a new patch.

          Hoss Man added a comment -

          in deciding "to clone or not clone" for the purposes of implementing reopen, it may make sense to step back and ask two broader questions...

          1) what was the motivation for approaching reopen using clones in the first place
          2) what does it inherently mean to clone an indexreader.

          the answer to the first question, as i understand it, relates to the issue of indexreaders not being "read only" objects ... operations can be taken which modify the reader's state, and those operations can be flushed to disk when the reader is closed. so if readerB = readerA.reopen(closeOld=false) and then readerA.delete(...) is called, there is ambiguity as to what should happen in readerB. so the clone route seems pretty straightforward if and only if we have an unambiguous concept of cloning a reader (because then the clone approach to reopen becomes unambiguous as well). alternately, since reopen is a new method, we can resolve the ambiguity any way we want, as long as it's documented ... in theory we should pick a solution that seems to serve the most benefit ... perhaps that solution is to document reopen with "if reopen(closeOld=false) returns a new reader it will share state with the current reader, attempting to do the following operations on this new reader will result in undefined behavior"

          the answer to the second is only important if we want to go the cloning route ... but it is pretty important in a larger sense than just reopening ... if we start to say that any of the IndexReader classes are Cloneable we have to be very clear about what that means in all cases where someone might clone it in addition to reopen ... in particular, i worry about what it means to clone a reader which has already had "write" operations performed on it (something the clone based version of reopen will never do because a write operation indicates the reader must be current), but some client code might as soon as we add the Cloneable interface to a class.

          Michael McCandless added a comment -

          > We have to be very careful about cross-referencing multiple readers.
          > If for some reason any references between two or more readers are
          > not cleared after one was closed, then that reader might not become
          > GC'd. I'm not saying it's not possible or even very hard, we just
          > have to make sure those things can't ever happen.

          One correction: there should be no cross-referencing, right? The only
          thing that's happening is incrementing & decrementing the "referrer
          count" for a reader? (But, you're right: the "copy on write" approach
          to cloning would have cross-referencing).

          I think the downside of this proposed "shared readers" (non-cloning)
          approach is that you can't delete/setNorm on the clone until you close
          the original. But I think that's fairly minor? Also if you really
          need a full deep copy of your reader you could just open a new one
          (ie, not use "reopen")?

          I think the big upside of "shared readers" is reopening becomes quite
          a bit more resource efficient: the cost of re-opening a reader would
          be in proportion to what has actually changed in the index. EG, if
          your index has added a few tiny segments since you last opened then
          the reopen would be extremely fast, but with cloning you are forced
          to make a full deep copy of all other [unchanged] segments.

          Michael Busch added a comment -

          > This way if you have a reader X and you did reopen to get Y and did
          > reopen again to get Z then the shared sub-readers between X, Y and Z
          > would not allow any write operations until 2 of the three had been
          > closed. I think that would work?

          Yes I think it would work. However, this approach has two downsides IMO:

          • reopen() becomes more complicated and restricted for the user. With
            the cloning approach the user doesn't have to care about when index
            modifications are not allowed. IndexReader instances returned by open()
            or reopen() can be used exactly the same without any restrictions.
          • We have to be very careful about cross-referencing multiple readers.
            If for some reason any references between two or more readers are not
            cleared after one was closed, then that reader might not become GC'd.
            I'm not saying it's not possible or even very hard, we just have to
            make sure those things can't ever happen.

          Of course the cloning approach has disadvantages too. For custom
          readers clone() has to be implemented in order to make reopen() work
          correctly. Also reopen() is more expensive in case of
          closeOldReader=false. Well we could certainly consider the lazy clone
          approach that you suggested, Mike, but we have to be careful about the
          cross-referencing issue again.

          So I'm really not sure which way is the better one. I think I'm slightly
          in favor of the cloning approach, so that reopen() returns instances
          that are completely independent of each other, which seems cleaner IMO.
          What do others think?

          Michael McCandless added a comment -

          > > Actually if we went back to the sharing (not cloning) approach,
          > > could we insert a check for any writing operation against the
          > > re-opened reader that throws an exception if the original reader
          > > is not yet closed?
          >
          > Interesting, yes that should work in case we have two readers (the
          > original one and the re-opened one). But what happens if the user
          > calls reopen twice to get two re-opened instances back? Then there
          > would be three instances, and without cloning the two re-opened ones
          > would also share the same resources. Is this a desirable use case or
          > would it be okay to restrict reopen() so that it can only create one
          > new instance?

          Hmmm good point.

          Actually, we could allow more than one re-open call if we take the
          following approach: every time a cloned Reader "borrows" a reference
          to a sub-reader, it increments a counter (call it the "referrer
          count"). When the Reader is closed, it decrements the count (by 1)
          for each of the sub-readers.

          Then, any reader should refuse to do a writing operation if its
          "referrer" count is greater than 1, because it's being shared across
          more than one referrer.

          This way if you have a reader X and you did reopen to get Y and did
          reopen again to get Z then the shared sub-readers between X, Y and Z
          would not allow any write operations until 2 of the three had been
          closed. I think that would work?

          BTW this would also allow for very efficient "snapshots" during
          searching: keeping multiple readers "alive", each searching a
          different point-in-time commit of the index, because they would all
          share the underlying segment readers that they have in common. Vs
          cloning which would have to make many copies of each segment reader.

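The "referrer count" guard sketched in this comment can be illustrated with a toy class. All names below are hypothetical (not Lucene API); the point is only that a shared sub-reader refuses write operations while more than one referrer is alive:

```java
// Hypothetical sketch of the proposed "referrer count" write guard:
// every reader that borrows this sub-reader registers as a referrer,
// and write operations are rejected while the sub-reader is shared.
class SharedSubReader {
    private int referrers = 1;   // the original reader

    synchronized void addReferrer() {     // called by reopen() when sharing
        referrers++;
    }

    synchronized void removeReferrer() {  // called when a referrer closes
        referrers--;
    }

    synchronized void deleteDocument(int docId) {
        if (referrers > 1) {
            throw new IllegalStateException(
                "reader is shared by " + referrers
                + " referrers; close the others before writing");
        }
        // ... perform the actual delete here ...
    }
}
```

So with readers X, Y, and Z sharing a sub-reader, the sub-reader counts three referrers and allows deletes again only after two of the three have been closed.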
          Michael Busch added a comment -

          Thanks all for the reviews and comments!

          There seem to be some issues/open questions concerning the cloning.
          Before I comment on them I think it would make sense to decide whether
          we want to stick with the cloning approach or not. Mike suggests this
          approach:

          > Actually if we went back to the sharing (not cloning) approach, could
          > we insert a check for any writing operation against the re-opened
          > reader that throws an exception if the original reader is not yet
          > closed?

          Interesting, yes that should work in case we have two readers (the
          original one and the re-opened one). But what happens if the user calls
          reopen twice to get two re-opened instances back? Then there would be
          three instances, and without cloning the two re-opened ones would also
          share the same resources. Is this a desirable use case or would it be
          okay to restrict reopen() so that it can only create one new instance?

          Yonik Seeley added a comment -

          > I'm wondering about the case where one thread calls reopen while another thread is updating norms or deleting docs.

          Hmmm, there is cause for concern (and I should have had my mt-safe hat on).
          Reopen is synchronized on the reader, and so are norms access and doc deletes, but from a quick look:

          • reopen() may be synchronized, but clone() called on sub-readers isn't in a synchronized context that the sub-reader cares about. For example, MultiReader.reopen has the lock on the MultiReader, but calls subreader.clone(), which iterates over the norms in a non-thread-safe way.
          • IndexInput objects that are in use should never be cloned... (like what is done in FieldsReader.clone())
          Hoss Man added a comment -

          a rough variation on Michael's latest patch that makes the changes along two of the lines i was suggesting before regarding FilterIndexReader and using "instanceof Cloneable" instead of "isCloneSupported()". two important things to note:

          1) i didn't change any of the documentation, i was trying to change as little as possible so the two patches could be compared side-by-side

          2) this patch is currently broken. TestIndexReaderReopen gets an exception i can't understand ... but i have to stop now, and i wanted to post what i had in case people had comments.

          ...now that i've done this exercise, i'm not convinced that it's a better way to go, but it does seem like isCloneSupported isn't necessary – that's the whole point of the Cloneable interface.

          Hoss Man added a comment -

          Okay, read the patch. I'm on board with the need for Cloneable now ... it's all about isolation. if "r.reopen(false) == r" then the client is responsible for recognizing that it's got the same instance and can make the judgement call about reusing the instance or cloning depending on its needs ... internally in things like MultiReader we have to assume we need a clone for isolation.

          Questions and comments...

          1. CompoundFileReader, FieldsReader, IndexReader, and BitVector all have clone methods where they silently catch and ignore CloneNotSupportedExceptions from super.clone()... if we don't expect the exception to ever happen, we should just let the exception propagate up the stack (there is no down side to declaring that clone() throws CloneNotSupportedException). If we think the exception might happen, but it's not a big deal if it does and we can ignore it, then we should put a comment in the catch block to that effect. i'm not clear which are the cases here.

          2. i don't remember if the old patches had this issue as well, but having "return new FilterIndexReader(reopened);" in FilterIndexReader doesn't really help anyone using FilterIndexReader – they're going to want an instance of their own subclass. to meet the contract of FilterIndexReader, we should implement all "abstract" methods and delegate to the inner reader - so in theory we don't need a new instance of FilterIndexReader, we could just return in.reopen(closeOld) ... my gut says it would be better to leave the method out entirely so that the default impl which throws UnSupOpEx is used — that way subclasses who want to use reopen must define their own (instead of getting confused when their filtering logic stops working after they reopen for the first time)

          3. instead of having an isCloneSupported() method, why don't we just remove the "implements Cloneable" declaration from IndexReader and put it on the concrete subclasses – then use "instanceof Cloneable" to determine if something is cloneable? ... for that matter, the only place isCloneSupported is really used is in IndexReader.clone, where an exception is thrown if clone is not supported ... subclasses which know they are Cloneable don't need this from the base class, subclasses which don't implement Cloneable but are used in combination with core classes whose reopen method requires it could be skipped by the caller (since they don't implement Cloneable) ..

          4. javadocs frequently refer to "reopened successfully" and "refreshed successfully" when what it really means is "reopen() returned a newer/fresher reader" ... this may confuse people who are used to thinking of "successfully" as "without error"

          5. random thought: are there any synchronization issues we're missing here? I'm wondering about the case where one thread calls reopen while another thread is updating norms or deleting docs. is there any chance of inconsistent state?
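The alternative in point 3 — dropping isCloneSupported() in favor of an instanceof check — could look roughly like this. These are simplified stand-in classes for illustration, not code from the patch:

```java
// Sketch of the "instanceof Cloneable" alternative: the base class does not
// implement Cloneable; only concrete subclasses that opt in do, and the
// guard simply checks the runtime type before delegating to Object.clone().
class ReaderBase {
    public ReaderBase cloneIfSupported() {
        if (!(this instanceof Cloneable)) {
            throw new UnsupportedOperationException(
                getClass().getName() + " does not implement Cloneable");
        }
        try {
            return (ReaderBase) super.clone();     // Object.clone()
        } catch (CloneNotSupportedException e) {
            // unreachable: we just verified the instance is Cloneable
            throw new RuntimeException(e);
        }
    }
}

// A subclass that opts in to cloning simply declares Cloneable.
class CloneableReader extends ReaderBase implements Cloneable { }

// A subclass that does not opt in; cloneIfSupported() throws for it.
class PlainReader extends ReaderBase { }
```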

          Michael McCandless added a comment -

          A couple other questions/points:

          • Just to verify: if you have a DirectoryIndexReader that is holding
            the write lock (because it has made changes to deletes/norms) then
            calling reopen on this reader should always return 'this', right?
            Because it has the write lock, it must be (better be!) current.

          This might be a good place to add an assert: if you are not
          current, assert that you don't have the write lock, and if you
          have the write lock, assert that you are current.

          • I think you should add "ensureOpen()" at the top of
            MultiReader.reopen(...)?
          • If an Exception is hit during reopen, what is the resulting state
            of your original reader? I think, ideally, it is unaffected by
            the attempt to re-open? EG if you're on the hairy edge of file
            descriptor limits, and the attempt to re-open failed because you
            hit the limit, ideally your original reader is unaffected (even if
            you specified closeOldReader=true).
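The invariant in the first bullet — a current reader's reopen() returns 'this', a stale one returns a fresh instance — can be modeled with a toy version check. ToyIndex and ToyReader are illustrative stand-ins, not Lucene classes:

```java
// Toy model of the reopen() contract: the reader remembers the index
// version it was opened against and only creates a new instance when
// the index has moved on.
class ToyIndex {
    long version = 0;                  // bumped on every "commit"
}

class ToyReader {
    private final ToyIndex index;
    private final long openedVersion;

    ToyReader(ToyIndex index) {
        this.index = index;
        this.openedVersion = index.version;
    }

    boolean isCurrent() { return openedVersion == index.version; }

    ToyReader reopen() {
        // a current reader returns itself; a stale one returns a new instance
        return isCurrent() ? this : new ToyReader(index);
    }
}
```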
          Michael McCandless added a comment -

          Actually if we went back to the sharing (not cloning) approach, could
          we insert a check for any writing operation against the re-opened
          reader that throws an exception if the original reader is not yet
          closed?

          In Michael's example above, on calling mr2.deleteDoc, you would hit an
          exception because mr2 would check and see that its "original" reader
          mr is not yet closed. But once you've closed mr, then the call
          succeeds.

          I think this would let us have our cake and eat it too: re-opening
          would be very low cost for unchanged readers (no clones created), and,
          you can still do deletes, etc, after you have closed your prior
          reader. You control when your prior reader gets closed, to allow for
          warming time and for queries to finish on the old reader.

          Would this work?

          Michael McCandless added a comment -

          > > Too bad so much needs to be cloned in the case that
          > > closeOldReader==false... maybe someday in the future we can have
          > > read-only readers.
          >
          > Yeah, the cloning part was kind of tedious. Read-only readers would
          > indeed make our life much easier here. I'm wondering how many people
          > are using the IndexReader to alter the norms anyway?

          I think the closeOldReader=false case is actually quite important.

          Because in production, I think you'd have to use that, so that your
          old reader stays alive and is used to service incoming queries, up
          until the point where the re-opened reader is "fully warmed".

          Since fully warming could take a long time (minutes?) you need that
          old reader to stay open.

          Can we take a copy-on-write approach? EG, this is how OSs handle
          virtual memory pages when forking a process. This would mean that whenever
          a clone has been made of a SegmentReader, the two cross-reference one
          another. Then, whenever either needs to alter deleted docs, it first
          makes a copy. Likewise for the norms.

          This would mean that "read-only" uses of the cloned reader never
          pay the cost of copying the deleted docs bit vector nor norms.
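The copy-on-write idea can be sketched for the deleted-docs bits like so. CowDeletedDocs is a hypothetical stand-in (backed by java.util.BitSet rather than Lucene's BitVector):

```java
import java.util.BitSet;

// Minimal copy-on-write sketch: a clone shares the underlying BitSet with
// its origin until one side needs to write, at which point that side makes
// a private copy first. Read-only clones therefore never pay the copy cost.
class CowDeletedDocs {
    private BitSet bits;
    private boolean shared;   // true while another instance may reference 'bits'

    CowDeletedDocs(int size) {
        bits = new BitSet(size);
        shared = false;
    }

    private CowDeletedDocs(BitSet sharedBits) {
        bits = sharedBits;
        shared = true;
    }

    CowDeletedDocs cowClone() {
        shared = true;                      // both sides now share 'bits'
        return new CowDeletedDocs(bits);
    }

    void delete(int doc) {
        if (shared) {                       // copy before the first write
            bits = (BitSet) bits.clone();
            shared = false;
        }
        bits.set(doc);
    }

    boolean isDeleted(int doc) { return bits.get(doc); }
}
```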

          Michael Busch added a comment -

          > why use clones at all? why not just use the original reader (if the "index" that reader represents hasn't changed, why clone it?)

          Let's say you have a MultiReader with two subreaders:

          IndexReader ir1 = IndexReader.open(index1);
          IndexReader ir2 = IndexReader.open(index2);
          IndexReader mr = new MultiReader(new IndexReader[] {ir1, ir2});
          

          Now index1 changes and you reopen the MultiReader and keep the old one open:

          IndexReader mr2 = mr.reopen();
          

          ir1 would now be reopened and let's assume we wouldn't clone ir2. If you use mr2 now to e.g. delete a doc and that doc happens to be in index2, then mr would also see the changes, because both MultiReaders share the same subreader ir2 and are thus not independent of each other.

          > why not make reopen always return this.clone()

          clone() might be an expensive operation. We only need to clone if at least one of the subreaders has changed.

          Hoss Man added a comment -

          I haven't looked at the patch yet (i really really plan to this weekend, barring catastrophe) but i'm confused as to why you went the cloning route (along with the complexity of the api changes to indicate when it is/isn't supported) ... based on the comments, it seems to boil down to...

          > As an example for how the clone() method is used let me describe how MultiReader.reopen()
          > works: it tries to reopen each of its subreaders. If at least one subreader could be reopened
          > successfully, then a new MultiReader instance is created and the reopened subreaders are
          > added to it. Each of the old MultiReader's subreaders that was not reopened (because of no
          > index changes) is now cloned() and added to the new MultiReader.

          that seems like circular logic: the clone method is used so that the sub readers can be cloned

          why use clones at all? why not just use the original reader (if the "index" that reader represents hasn't changed, why clone it?)

          And if (for reasons that aren't clear to me) it is important for MultiReader to use a clone of its subreaders when their reopen method returns "this", then shouldn't clients do the same thing? ... why not make reopen always return this.clone() if the index hasn't changed? (which, now that i think about it, would also help by punting on the isCloneSupported issue – each class would already know if it was cloneable)

          maybe this will make more sense once i read the patch ... i just wanted to throw it out there in case someone had a chance to reply before i get a chance.

          robert engels added a comment -

          Nice to see all the good work on this. We are still on a 1.9 derivative.

          Hopefully we'll be able to move to stock 2.X release in the future.

          Michael Busch added a comment -

          > Too bad so much needs to be cloned in the case that closeOldReader==false... maybe someday in the future we can have read-only readers.

          Yeah, the cloning part was kind of tedious. Read-only readers would indeed make our life much easier here. I'm wondering how many people are using the IndexReader to alter the norms anyway?

          Well, thanks for reviewing the patch, Yonik!

          Yonik Seeley added a comment -

          I think this looks pretty good Michael!
          Too bad so much needs to be cloned in the case that closeOldReader==false... maybe someday in the future we can have read-only readers.

          Michael Busch added a comment -

          I ran new performance tests with the latest patch similar to the tests I explained in an earlier comment on this issue.

          I'm using again a 4.5M index. In each round I delete one document and (re)open the IndexReader thereafter. Here are the numbers for 5000 rounds:

                                            Time    Speedup
          open                              703s
          reopen(closeOldReader==false)      62s    11x
          reopen(closeOldReader==true)       16s    44x

          Now in each round I delete one document and also set the norm for one random document. Numbers for 1000 rounds:

                                            Time    Speedup
          open                              166s
          reopen(closeOldReader==false)      33s    5.0x
          reopen(closeOldReader==true)       29s    5.7x

          I think these numbers look pretty good. We get a quite decent speedup even if the old readers are not closed.

          I would like to commit this in a couple of days to get ready for Lucene 2.3. It would be nice if someone could review the latest patch! Hoss? Others?

          Michael Busch added a comment -

          I'm attaching a new version of the patch that has a lot of changes compared to the last patch:

          • I factored most of the reopen logic into the subclasses of IndexReader. Now that we have the DirectoryIndexReader layer, this was possible in a more elegant way.
          • IndexReader.reopen() now does not close the old readers by default. This was somewhat tricky, because now the IndexReaders must be cloneable. I changed IndexReader to implement the Cloneable interface and implemented clone() for all Lucene built-in IndexReaders. However, there are probably custom IndexReader implementations out there that do not implement clone() and reopen() should throw an exception when an attempt is made to reopen such a reader. But I don't want to throw an exception in IndexReader.clone() itself, because then it would not be possible anymore for subclasses to recursively call the native Object.clone() via super.clone(). To solve this conflict I added the method
              /**
               * Returns true, if this IndexReader instance supports the clone() operation.
               */
              protected boolean isCloneSupported();
            

          to IndexReader which returns false by default. IndexReader.clone() checks if the actual implementation supports clone() (i. e. the above method returns true). If not, it throws an UnsupportedOperationException, if yes, it returns super.clone().

          I was not sure about whether to throw an (unchecked) UnsupportedOperationException or a CloneNotSupportedException in this case. I decided to throw UnsupportedOperationException even though it's not really following the clone() guidelines, because it avoids the necessity to catch the CloneNotSupportedException every time clone() is called (in the reopen() methods of all IndexReader implementations).

          As an example for how the clone() method is used let me describe how MultiReader.reopen() works: it tries to reopen each of its subreaders. If at least one subreader could be reopened successfully, then a new MultiReader instance is created and the reopened subreaders are added to it. Each of the old MultiReader's subreaders that was not reopened (because of no index changes) is now cloned() and added to the new MultiReader.

          • I also added the new method
              /**
               * In addition to {@link #reopen()} this method offers the ability to close
               * the old IndexReader instance. This speeds up the reopening process for
               * certain IndexReader implementations and reduces memory consumption, because
               * resources of the old instance can be reused for the reopened IndexReader
               * as it avoids the need of copying the resources.
               * <p>
               * The reopen performance especially benefits if IndexReader instances returned 
               * by one of the <code>open()</code> methods are reopened with 
               * <code>closeOldReader==true</code>.
               * <p>
               * Certain IndexReader implementations ({@link MultiReader}, {@link ParallelReader})
               * require that the subreaders support the clone() operation (see {@link #isCloneSupported()})
               * in order to perform reopen with <code>closeOldReader==false</code>.  
               */
              public synchronized IndexReader reopen(boolean closeOldReader);
            

          As the javadoc says it has two benefits: 1) it speeds up reopening and reduces resource consumption, and 2) it allows reopening readers that use non-cloneable subreaders.

          Please let me know what you think about these changes, especially about the clone() implementation.
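The clone() gating described above — a default isCloneSupported() returning false, with clone() throwing an unchecked UnsupportedOperationException unless a subclass opts in — reduces to roughly this shape. These are simplified stand-in classes, not the actual patch code:

```java
// Simplified sketch of the isCloneSupported() pattern: the base class
// implements Cloneable (so subclasses can still call super.clone()
// internally), but gates the public clone() on a flag defaulting to false.
class BaseReader implements Cloneable {
    protected boolean isCloneSupported() { return false; }

    public Object clone() {
        if (!isCloneSupported()) {
            // unchecked, so callers need not catch CloneNotSupportedException
            throw new UnsupportedOperationException(
                "clone() is not supported by " + getClass().getName());
        }
        try {
            return super.clone();
        } catch (CloneNotSupportedException e) {
            throw new RuntimeException(e);   // cannot happen: we implement Cloneable
        }
    }
}

// A subclass that opts in simply overrides the flag.
class CloneableSegmentReader extends BaseReader {
    protected boolean isCloneSupported() { return true; }
}
```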

          Nat added a comment -

          Please also consider making an option where the reopen can be automated (i.e. when the index is updated) instead of having to call it explicitly. Thread safety should be taken into account as well.
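Nat's suggestion could be approximated outside the core with a small polling helper that periodically checks for staleness and swaps readers. Below is a rough, hypothetical sketch: `RefreshableReader` and `AutoReopener` are illustrative stand-ins, not Lucene classes, with `isCurrent()`/`reopen()` mirroring the methods discussed in this issue.

```java
// Hypothetical sketch of an automated reopen, per Nat's suggestion.
// RefreshableReader stands in for IndexReader; isCurrent()/reopen()
// mirror the methods discussed in this issue.
interface RefreshableReader {
    boolean isCurrent();
    RefreshableReader reopen();
    void close();
}

class AutoReopener {
    private volatile RefreshableReader current;

    AutoReopener(RefreshableReader initial) { current = initial; }

    RefreshableReader get() { return current; }

    // Called periodically (e.g. from a timer thread). Synchronized so
    // concurrent refreshes cannot race on the current reference.
    synchronized void maybeRefresh() {
        if (!current.isCurrent()) {
            RefreshableReader newReader = current.reopen();
            if (newReader != current) {
                RefreshableReader old = current;
                current = newReader;
                old.close(); // unsafe if searches are still in flight,
                             // which is why reference counting comes up
                             // later in this thread
            }
        }
    }
}
```

Closing the old reader eagerly is the weak point here; a production version would need the reference-counting discussed elsewhere in this issue.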

          Michael Busch added a comment -

          We should first refactor segmentInfos into IndexReader's subclasses.

          Hoss Man added a comment -

          > I'm not sure if the code would become cleaner if we did that. Sometimes a SegmentReader would then have to
          > return a MultiSegmentReader instance and vice versa. So we'd probably have to duplicate some of the code in
          > these two classes.

i don't think there would be anything wrong with SegmentReader.reopen returning a MultiSegmentReader in some cases (or vice versa) but it definitely seems wrong to me for a parent class to be explicitly casting "this" to one of two known subclasses ... making reopen abstract in the base class (or throwing UnsupportedOperationException for API back compatibility) seems like the only safe way to ensure any future IndexReader subclasses work properly.
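The API shape Hoss proposes here — reopen() on the base class, failing fast for subclasses that don't support it — might look like the following sketch. Class names are illustrative, not the actual Lucene ones.

```java
// Sketch of a base-class reopen() that throws unless a subclass
// knows how to refresh itself. Illustrative names only.
abstract class BaseReader {
    // Default for unknown subclasses: stays source-compatible,
    // but fails fast at runtime.
    public BaseReader reopen() {
        throw new UnsupportedOperationException(
            getClass().getName() + " does not support reopen()");
    }
}

class RefreshingReader extends BaseReader {
    private final long version;
    RefreshingReader(long version) { this.version = version; }

    // A subclass that does support reopen() returns itself when nothing
    // changed and a new instance otherwise -- possibly even of a
    // different concrete type, which is legal under this contract.
    public BaseReader reopen() {
        long latest = latestVersionOnDisk();
        return latest == version ? this : new RefreshingReader(latest);
    }

    long latestVersionOnDisk() { return version; } // stub
}
```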

          Michael Busch added a comment -

          > IndexReader a, b, c = ...
          > MultiReader ab = new MultiReader({a, b})
          > MultiReader bc = new MultiReader({b, c})
          > ...b changes on disk...
          > ab.reopen(); // this shouldn't affect bc;
          >
          > one possibility would be for the semantics of reopen to close old readers only
          > if it completely owns them; ie: MultiReader should never close anything in
          > reopen, MultiSegmentReader should close all of the subreaders since it opened
          > them in the first place

          So if 'b' in your example is a MultiSegmentReader, then the reopen() call
          triggered from MultiReader.reopen() would close old readers, because it owns them,
          thus 'bc' wouldn't work anymore. So it depends on the caller of
          MultiSegmentReader.reopen() whether or not to close the subreaders. I think this
          is kind of messy. Well instead of reopen() we could add
          reopen(boolean closeOldReaders), but then...

          > IndexReader r = ...
          > ...
          > IndexReader tmp = r.reopen();
          > if (tmp != r) r.close();
          > r = tmp;
          > ...

          ... is actually easy enough as you pointed out, so that the extra complexity is not
          really worth it IMO.

          > In general i think the safest thing to do is for reopen to never close.

          So yes, I agree.

          > why is the meat of reopen still in "IndexReader" with a "if (reader instanceof
          > SegmentReader)" style logic in it? can't the different reopen mechanisms be
          > refactored down into SegmentReader and MultiSegmentReader respectively?

          I'm not sure if the code would become cleaner if we did that. Sometimes a
          SegmentReader would then have to return a MultiSegmentReader instance and vice
          versa. So we'd probably have to duplicate some of the code in these two classes.

          Hoss Man added a comment -

          (note: i haven't looked at the latest patch in detail, just commenting on the comments)

          One key problem i see with automatically closing things in reopen is MultiReader: it's perfectly legal to do something like this pseudocode...

          IndexReader a, b, c = ...
          MultiReader ab = new MultiReader({a, b})
          MultiReader bc = new MultiReader({b, c})
          ...b changes on disk...
          ab.reopen(); // this shouldn't affect bc;

          one possibility would be for the semantics of reopen to close old readers only if it completely owns them; ie: MultiReader should never close anything in reopen, MultiSegmentReader should close all of the subreaders since it opened them in the first place ... things get into a grey area with SegmentReader though.

          In general i think the safest thing to do is for reopen to never close. Yonik's comment showcases one of the most compelling reasons why it can be important for clients to be able to keep using an old IndexReader instance, and it's easy enough for clients that want the old one closed to do something like...

          IndexReader r = ...
          ...
          IndexReader tmp = r.reopen();
          if (tmp != r) r.close();
          r = tmp;
          ...

          (one question that did jump out at me while grepping the patch for where old readers were being closed: why is the meat of reopen still in "IndexReader" with "if (reader instanceof SegmentReader)" style logic in it? can't the different reopen mechanisms be refactored down into SegmentReader and MultiSegmentReader respectively? shouldn't the default impl of IndexReader throw an UnsupportedOperationException?)

          Yonik Seeley added a comment -

          > I think closing the old reader is fine. What do others think? Is keeping the old
          > reader after a reopen() a useful usecase?

          In a multi-threaded environment, one wants to open a new reader, but needs to wait until all requests finish before closing the old reader. Seems like reference counting is the only way to handle that case.
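The reference counting Yonik describes could be sketched roughly as below: each search thread takes a reference on the reader it is using, and resources are released only when the last reference is dropped. This is a hypothetical stand-in, not Lucene's actual API.

```java
// Minimal reference-counting sketch for the multi-threaded scenario:
// close() from the owner is just another decRef(), so in-flight
// searches keep the reader alive until they finish.
class RefCountedReader {
    private int refCount = 1;            // the opener holds one reference
    private boolean resourcesReleased = false;

    synchronized void incRef() {
        if (refCount <= 0) throw new IllegalStateException("already closed");
        refCount++;
    }

    synchronized void decRef() {
        if (--refCount == 0) {
            resourcesReleased = true;    // a real impl would close files here
        }
    }

    void close() { decRef(); }

    synchronized boolean isReleased() { return resourcesReleased; }
}
```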

          Michael Busch added a comment -

          I ran some quick performance tests with this patch:

          1) The test opens an IndexReader, deletes one document by random docid, closes the Reader.
          So this reader doesn't have to open the dictionary or the norms.
          2) Another reader is opened (or alternatively reopened) and one TermQuery is executed, so
          this reader has to read the norms and the dictionary.

          I run these two steps 5000 times in a loop.

          First run: Index size: 4.5M, optimized

          • 1) + TermQuery: 103 sec
          • 1) + 2) (open): 806 sec, so open() takes 703 sec
          • 1) + 2) (reopen): 118 sec, so reopen() takes 15 sec ==> Speedup: 46.9 X

          Second run: Index size: 3.3M, 24 segments (14x 230.000, 10x 10.000)

          • 1) + TermQuery: 235 sec
          • 1) + 2) (open): 1162 sec, so open() takes 927 sec
          • 1) + 2) (reopen): 321 sec, so reopen() takes 86 sec ==> Speedup: 10.8X
          Michael Busch added a comment -

          Now, after LUCENE-781, LUCENE-970 and LUCENE-832 are committed, I updated the latest
          patch here, which was now easier because MultiReader is now separated into two classes.

          Notes:

          • As Hoss suggested, the reopen() method is now a non-static method on IndexReader.
          • MultiReader and ParallelReader now override reopen() to reopen the subreaders
            recursively.
          • FilteredReader also overrides reopen(). It checks if the underlying reader has

          I think the general contract of reopen() should be to always return a new IndexReader
          instance if it was successfully refreshed and return the same instance otherwise,
          because IndexReaders are used as keys in caches.
          A remaining question here is if the old reader(s) should be closed then or not.
          This patch closes the old readers for now, if we want to change that we probably have
          to add some reference counting mechanism, as Robert suggested already. Then I would
          also have to change the SegmentReader.reopen() implementation to clone resources like
          the dictionary, norms and delete bits.
          I think closing the old reader is fine. What do others think? Is keeping the old
          reader after a reopen() a useful usecase?
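The identity contract described above — same instance when nothing changed, new instance otherwise — is exactly what makes reader-keyed caches work. A small hypothetical sketch (`PerReaderCache` is illustrative, not a Lucene class; a plain `Object` stands in for the reader):

```java
import java.util.HashMap;
import java.util.Map;

// Why reopen()'s identity contract matters: a cache keyed by reader
// instance stays valid exactly as long as reopen() keeps returning the
// same object, and is naturally invalidated when a new instance appears.
class PerReaderCache {
    private final Map<Object, int[]> cache = new HashMap<Object, int[]>();

    int[] getSortValues(Object reader) {
        int[] values = cache.get(reader);
        if (values == null) {
            values = computeExpensiveSortValues(reader);
            cache.put(reader, values);
        }
        return values;
    }

    int[] computeExpensiveSortValues(Object reader) { return new int[8]; } // stub
}
```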

          Hoss Man added a comment -

          an extremely hackish refactoring of the previous patch that demonstrates the method working recursively and dealing with MultiReaders constructed over multiple subReaders.

          a few notes:

          1) nowhere near enough tests of the subReader situation
          2) the refactoring is very very ugly and brutish ... most of the code is still in IndexReader just because it needs so many things that are private – things that really seem like they should be pushed down into SegmentReader (or made protected)
          3) test triggered an apparent NPE in MultiReader.isOptimized() when there are subReaders, i hacked around this in the test, see usages of assertIndexEqualsZZZ vs assertIndexEquals
          4) the FilterIndexReader situation is ... interesting. in theory FilterIndexReader should really be abstract (since if you aren't subclassing it anyway, why do you want it?)

          Hoss Man added a comment -

          > Yeah we could do that. However, this might not be so easy to implement.
          > For example, if a user creates a MultiReader instance and adds whatever
          > subreaders, we would have to recursively refresh the underlying readers.
          > But if the MultiReader was created automatically by IndexReader.open() just
          > calling refresh on the subreaders is not enough. New SegmentReaders have to
          > be opened for new segments.

          ...this being the curse that is MultiReader – it can serve two very different purposes.

          You seem to have already solved the multisegments in a single directory approach; the MultiReader over many subreaders part actually seems much easier to me (just call your open method on all of the subreaders) — the only tricky part is detecting which behavior should be used when. This could be driven by a simple boolean property of MultiReader indicating whether it owns its directory and we need to look for new segments or not – in which case we just need to refresh the subreaders. (My personal preference would be to change MultiReader so "this.directory" is null if it was open over several other subReaders, right now it's just assigned to the first one arbitrarily, but there may be other consequences of changing that)

          Incidentally: I don't think it's crucial that this be done as a recursive method. The same approach i describe could be added to a static utility like what you've got, I just think that if it's possible to do it recursively we should, so that if someone does write their own MultiReader or SegmentReader subclass they can still benefit from any core reopening logic as long as they do their part to "reopen" their extensions.
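The recursive contract described here — each reader checks its own data and returns itself or a replacement, much like Query.rewrite() — could be sketched as follows. `Node` and `MultiNode` are hypothetical stand-ins for IndexReader and a MultiReader that owns no directory of its own.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of recursive reopen in the style of Query.rewrite(): each
// reader returns itself if nothing underneath changed, otherwise a
// fresh instance. Node stands in for IndexReader; MultiNode only
// delegates to its subreaders.
abstract class Node {
    abstract Node reopen();
}

class MultiNode extends Node {
    final List<Node> subs;
    MultiNode(List<Node> subs) { this.subs = subs; }

    Node reopen() {
        List<Node> reopened = new ArrayList<Node>(subs.size());
        boolean changed = false;
        for (Node sub : subs) {
            Node r = sub.reopen();      // recurse; each sub decides for itself
            changed |= (r != sub);
            reopened.add(r);
        }
        // Identity contract: same instance when nothing changed.
        return changed ? new MultiNode(reopened) : this;
    }
}
```

Note that unchanged subreaders are reused as-is, which is where the clone-or-share question discussed in this issue comes in.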

          Michael Busch added a comment -

          First version of my patch:

          • Adds the static method IndexReader.open(IndexReader)
            that returns a new instance of IndexReader in case
            the reader could be updated. If it was up to date
            then the passed-in instance is returned. Only
            IndexReader instances that were created by one of
            the IndexReader.open() methods can be refreshed.
          • SegmentReader.reopen(SegmentInfos) looks in the
            SegmentInfos for a segment with the same name. If
            one could be found then either the deleted docs or
            the norms were updated, otherwise the segment name
            would have changed. reopen() clones the
            SegmentReader and either loads the deleted docs,
            the norms or both. Then the clone is returned and
            the original SegmentReader is marked as closed.
          • If the index has only one segment, then
            IndexReader.open(IndexReader) checks if the passed
            in IndexReader can be refreshed. This is only
            possible if it is not a MultiReader and if the segment
            name has not changed. Otherwise a new SegmentReader
            instance is returned and the old reader is closed.
          • If the index has multiple segments, then
            IndexReader.open(IndexReader) creates a new
            MultiReader and tries to reuse the passed-in
            reader (in case it's a SegmentReader) or its
            subreaders (in case it's a MultiReader). For new
            segments new SegmentReaders are created. Old
            readers that couldn't be reused are closed.
          • Adds the new testcase TestIndexReaderReopen. It
            includes the method
            assertIndexEquals(IndexReader, IndexReader) that
            checks whether both IndexReaders have the same
            content. The testcase creates an index and
            performs different modifications on that index.
            One IndexReader is refreshed after each index
            modification and compared to a fresh IndexReader
            which is opened after each modification.

          This first version is for review and not ready to
          commit. I want to add more extensive tests and
          probably clean up the code.

          All tests pass.
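The assertIndexEquals idea from the test case — walk two readers and require identical observable state — could look roughly like this. `SimpleReader` is a hypothetical stand-in exposing only the pieces compared here, not Lucene's IndexReader API.

```java
// Sketch of the assertIndexEquals pattern from TestIndexReaderReopen:
// compare a freshly opened reader against a reopened one, doc by doc.
interface SimpleReader {
    int maxDoc();
    boolean isDeleted(int docId);
    String document(int docId);
}

class IndexAssert {
    static void assertIndexEquals(SimpleReader expected, SimpleReader actual) {
        if (expected.maxDoc() != actual.maxDoc())
            throw new AssertionError("maxDoc differs");
        for (int i = 0; i < expected.maxDoc(); i++) {
            if (expected.isDeleted(i) != actual.isDeleted(i))
                throw new AssertionError("deleted state differs at doc " + i);
            if (!expected.isDeleted(i)
                    && !expected.document(i).equals(actual.document(i)))
                throw new AssertionError("document differs at doc " + i);
        }
    }
}
```

A fuller version would also compare norms, term dictionaries, and postings, as the real test case does.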

          Michael Busch added a comment -

          > i somehow missed seeing this issues before ...

          actually, me too... first I came across this thread:

          http://www.gossamer-threads.com/lists/lucene/java-dev/31592?search_string=refresh;#31592

          in which Doug suggested adding a static method IndexReader.open(IndexReader)
          which would either return the passed in IndexReader instance in case it is
          up to date or return a new, refreshed instance.

          I started implementing this, using Doug's and Robert's ideas, and then realized
          that there was already this open issue. But the code here is quite outdated.

          > in generally we should probably try to approach reopening a reader as a
          > recursive operation

          Yeah we could do that. However, this might not be so easy to implement.
          For example, if a user creates a MultiReader instance and adds whatever
          subreaders, we would have to recursively refresh the underlying readers.
          But if the MultiReader was created automatically by IndexReader.open() just
          calling refresh on the subreaders is not enough. New SegmentReaders have to
          be opened for new segments.

          Also the recursive walk would have to take place within the FindSegmentsFile
          logic.

          I decided therefore to only allow IndexReaders to be refreshed if they were
          created by one of the IndexReader.open() methods. I'm going to submit a first
          version of my patch soon. Do you think this is too limiting?

          Hoss Man added a comment -

          i somehow missed seeing this issue before ... i don't really understand the details, but a few comments that come to mind...

          1) this approach seems to assume that when reopening a MyMultiReader, the sub readers will all be MySegmentReaders .. assuming we generalize this to MultiReader/SegmentReader, this wouldn't work in the case where people are using a MultiReader containing other MultiReaders ... not to mention the possibility of people who have written their own IndexReader implementations.
          in general we should probably try to approach reopening a reader as a recursive operation if possible where each type of reader is responsible for checking to see if its underlying data has changed, if not return itself, if so return a new reader in its place (much like rewrite works for Queries)

          2) there is no more commit lock correct? ... is this approach something that can still be valid using the current backoff/retry mechanism involved with opening segments?

          robert engels added a comment -

          A generic version probably needs to implement reference counting on the Segments or IndexReader in order to know when they can be safely closed.

          Otis Gospodnetic added a comment -

          In a direct email to me, Robert said: "All of the files can be prepended with the ASL."


            People

            • Assignee:
              Michael Busch
              Reporter:
              Otis Gospodnetic
            • Votes:
              3
              Watchers:
              1

              Dates

              • Created:
                Updated:
                Resolved:
