Lucene - Core
  LUCENE-1726

IndexWriter.readerPool create new segmentReader outside of sync block

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Won't Fix
    • Affects Version/s: 2.4.1
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I think we will want to do something like what field cache does
      with CreationPlaceholder for IndexWriter.readerPool. Otherwise
      we have the (I think somewhat problematic) issue of all other
      readerPool.get* methods waiting for an SR to warm.

      It would be good to implement this for 2.9.
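      The FieldCache-style CreationPlaceholder idea might be sketched roughly like this (a hypothetical illustration, not the actual patch — `PlaceholderPool`, `Entry`, and `createExpensiveValue` are made-up names): a placeholder is published under the pool lock, the expensive value is built outside it, and other callers block only on that one entry rather than on the whole pool.

      ```java
      import java.util.HashMap;
      import java.util.Map;

      // Hypothetical sketch of the CreationPlaceholder pattern: insert a
      // placeholder under the short pool lock, build the expensive value
      // (e.g. warming a SegmentReader) outside it, then publish and notify.
      public class PlaceholderPool {
          // Placeholder inserted while the real value is still being created.
          static final class Entry {
              volatile Object value; // null until creation finishes
          }

          private final Map<String, Entry> map = new HashMap<>();

          public Object get(String key) throws InterruptedException {
              Entry entry;
              boolean creator = false;
              synchronized (map) {                 // short critical section
                  entry = map.get(key);
                  if (entry == null) {
                      entry = new Entry();
                      map.put(key, entry);
                      creator = true;
                  }
              }
              if (creator) {
                  Object v = createExpensiveValue(key);  // outside the pool lock
                  synchronized (entry) {
                      entry.value = v;
                      entry.notifyAll();
                  }
                  return v;
              }
              synchronized (entry) {               // wait only on this entry
                  while (entry.value == null) {
                      entry.wait();
                  }
                  return entry.value;
              }
          }

          protected Object createExpensiveValue(String key) {
              return "reader-for-" + key;          // stands in for opening an SR
          }

          public static void main(String[] args) throws Exception {
              PlaceholderPool pool = new PlaceholderPool();
              System.out.println(pool.get("_1"));
              System.out.println(pool.get("_1") == pool.get("_1"));
          }
      }
      ```

      The point of the pattern is that a thread asking for a *different* key never waits behind a slow creation in progress.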

      1. LUCENE-1726.patch
        39 kB
        Michael McCandless
      2. LUCENE-1726.patch
        18 kB
        Michael McCandless
      3. LUCENE-1726.trunk.test.patch
        4 kB
        Jason Rutherglen
      4. LUCENE-1726.patch
        13 kB
        Jason Rutherglen
      5. LUCENE-1726.patch
        13 kB
        Jason Rutherglen
      6. LUCENE-1726.patch
        9 kB
        Jason Rutherglen
      7. LUCENE-1726.patch
        9 kB
        Jason Rutherglen

        Activity

        Jason Rutherglen added a comment -

        We don't block accessing readers in the IW.readerPool when a new
        segmentReader is being warmed/instantiated. This is important
        when new segmentReaders on large new segments are being accessed
        for the first time. Otherwise today IW.getReader may wait while
        the new SR is being created.

        • IW.readerPool map values are now of type MapValue
        • We synchronize on the MapValue in methods that access the SR
        • The synchronization over the entire readerPool.get method is removed
        • All tests pass
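        A hypothetical condensation of the scheme those bullets describe (illustrative names only — `MapValue`, `open`, and `ReaderPoolSketch` are not the actual patch): the pool lock is held only long enough to find or insert the wrapper, and opening/warming the SegmentReader happens while holding only that wrapper's monitor.

        ```java
        import java.util.HashMap;
        import java.util.Map;

        // Sketch: per-segment MapValue locking so the slow open/warm step
        // runs outside the pool-wide lock.
        public class ReaderPoolSketch {
            static final class MapValue {
                Object reader;                      // pooled SegmentReader stand-in
            }

            private final Map<String, MapValue> readerMap = new HashMap<>();

            Object get(String segment) {
                MapValue mv;
                synchronized (this) {               // pool lock: map lookup only
                    mv = readerMap.get(segment);
                    if (mv == null) {
                        mv = new MapValue();
                        readerMap.put(segment, mv);
                    }
                }
                // No lock is held at this point -- other get() calls proceed freely.
                synchronized (mv) {                 // per-segment lock while warming
                    if (mv.reader == null) {
                        mv.reader = open(segment);  // the slow part, off the pool lock
                    }
                    return mv.reader;
                }
            }

            Object open(String segment) {
                return "SR(" + segment + ")";
            }

            public static void main(String[] args) {
                ReaderPoolSketch pool = new ReaderPoolSketch();
                System.out.println(pool.get("_1"));
                System.out.println(pool.get("_1") == pool.get("_1"));
            }
        }
        ```

        Note the moment between leaving `synchronized (this)` and entering `synchronized (mv)` where no lock is held; that gap matters in the discussion that follows.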
        Michael McCandless added a comment -

        Can we make the MapValue strongly typed? Eg name it SegmentReaderValue, and it has a single member "SegmentReader reader".

        getIfExists has duplicate checks for null (mv != null is checked twice and mv.value != null too).

        I think there is a thread hazard here, in particular a risk that one thread decrefs a reader just as another thread is trying to get it, and the reader in fact gets closed while the other thread has an mv.reader != null and illegally increfs that. I think you'll have to do the sr.incRef inside the synchronized(this), but I don't think that entirely resolves it.

        I'm going to move this out of 2.9; I don't think it should block it.

        Jason Rutherglen added a comment -
        • New SRMapValue is strongly typed
        • All tests pass

        I think there is a thread hazard here, in particular a
        risk that one thread decrefs a reader just as another thread is
        trying to get it, and the reader in fact gets closed while the
        other thread has an mv.reader != null and illegally increfs
        that. I think you'll have to do the sr.incRef inside the
        synchronized(this), but I don't think that entirely resolves
        it.

        Are you referring to a decRef on a reader outside of IW? The
        asserts we have did help in catching synchronization errors.
        It's unclear to me how to recreate the described use case such
        that it breaks things. Do we need a test case that fails with the
        current patch?

        Michael McCandless added a comment -

        The hazard is something like this, for a non-NRT IndexWriter (ie, no
        pooling):

        • Thread #1 is a merge; it checks out a reader & starts running
        • Thread #2 is applyDeletes (or opening an NRT reader); it calls
          readerPool.get, enters the first sync block to pull out the
          SRMapValue which has non-null reader, then leaves the sync block
        • Thread #1 calls release, which decRefs the reader & closes it
        • Thread #2 resumes, sees it has a non-null mv.reader and incRefs
          it, which is illegal (reader was already closed).
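        The interleaving above can be replayed deterministically in a single thread (a hypothetical `RefCounted` stand-in for SegmentReader's ref counting; the check-then-act in `incRef` is simplified for illustration):

        ```java
        import java.util.concurrent.atomic.AtomicInteger;

        // Deterministic replay of the hazard: "thread #2" reads a non-null
        // reference, "thread #1" decRefs it to zero (closing it), then
        // thread #2's incRef is illegal because the reader is closed.
        class RefCounted {
            private final AtomicInteger refCount = new AtomicInteger(1);

            void incRef() {
                if (refCount.get() <= 0) {
                    throw new IllegalStateException("incRef on closed reader");
                }
                refCount.incrementAndGet();
            }

            void decRef() {
                if (refCount.decrementAndGet() == 0) {
                    System.out.println("reader closed");
                }
            }
        }

        public class HazardDemo {
            public static void main(String[] args) {
                RefCounted reader = new RefCounted();   // mv.reader, refCount == 1

                // Thread #2: observes mv.reader != null, then loses the CPU...
                RefCounted observed = reader;

                // Thread #1: releases its reference; count hits 0, reader closes.
                reader.decRef();

                // Thread #2 resumes and tries to incRef the closed reader.
                try {
                    observed.incRef();
                } catch (IllegalStateException e) {
                    System.out.println("caught: " + e.getMessage());
                }
            }
        }
        ```

        Holding a lock across both the null check and the incRef is what closes this window; checking, releasing the lock, and then incRef'ing does not.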
        Jason Rutherglen added a comment -

        Shouldn't we be seeing an exception in TestStressIndexing2 (or
        another test class) when the mv.reader.incRef occurs and the
        reader is already closed?

        Michael McCandless added a comment -

        Yes, we should eventually see a failure; I think it's just rare. Maybe try making a new test that constantly indexes docs, w/ low merge factor & maxBufferedDocs (so lots of flushing/merging happens) in one thread and constantly opens an NRT reader in another thread, to tease it out?

        Jason Rutherglen added a comment -

        When I moved the sync block around in readerPool.get, tests
        would fail and/or hang. I'm not sure yet where we'd add the
        sync(this) block.

        I'll work on reproducing the above mentioned issue, thanks for
        the advice.

        Jason Rutherglen added a comment -

        Added a test case (for now a separate test class) that runs for
        5 minutes, mergeFactor 2, maxBufferedDocs 10, 4 threads
        alternately adding and deleting docs. I haven't seen the error
        we're looking for yet. CPU isn't maxing out (which it probably
        should be, indicating possible blocking?), so it may need to run
        longer?

        Jason Rutherglen added a comment -

        Each thread in the test only performs adds or deletes (rather than both) and now we get a "MockRAMDirectory: cannot close: there are still open files" exception.

        Jason Rutherglen added a comment - edited

        I tried the test on trunk and get the same error. They're all
        docstore-related files, so maybe extra doc stores are being
        opened?

         
           [junit] MockRAMDirectory: cannot close: there are still open files:
        {_s4.fdt=2, _g2.fdx=2, _s4.fdx=2, _g2.tvf=2, _dw.fdx=2, _g2.tvd=2,
        _g2.tvx=2, _ks.tvf=2, _n9.tvx=2, _ks.tvx=2, _n9.fdx=2, _ks.fdx=2,
        _dw.cfx=1, _n9.tvf=2, _cp.cfx=1, _s4.tvf=2, _dw.tvx=2, _87.fdx=2,
        _fr.tvx=2, _87.tvf=2, _fr.tvd=2, _87.fdt=2, _ks.tvd=2, _s4.tvd=2,
        _dw.tvd=2, _n9.fdt=2, _g2.fdt=2, _87.tvd=2, _fr.fdt=2, _dw.fdt=2,
        _dj.cfx=1, _s4.tvx=2, _ks.fdt=2, _n9.tvd=2, _fr.tvf=2, _fr.fdx=2,
        _dw.tvf=2, _87.tvx=2}
           [junit] java.lang.RuntimeException: MockRAMDirectory: cannot close:
        there are still open files: {_s4.fdt=2, _g2.fdx=2, _s4.fdx=2,
        _g2.tvf=2, _dw.fdx=2, _g2.tvd=2, _g2.tvx=2, _ks.tvf=2, _n9.tvx=2,
        _ks.tvx=2, _n9.fdx=2, _ks.fdx=2, _dw.cfx=1, _n9.tvf=2, _cp.cfx=1,
        _s4.tvf=2, _dw.tvx=2, _87.fdx=2, _fr.tvx=2, _87.tvf=2, _fr.tvd=2,
        _87.fdt=2, _ks.tvd=2, _s4.tvd=2, _dw.tvd=2, _n9.fdt=2, _g2.fdt=2,
        _87.tvd=2, _fr.fdt=2, _dw.fdt=2, _dj.cfx=1, _s4.tvx=2, _ks.fdt=2,
        _n9.tvd=2, _fr.tvf=2, _fr.fdx=2, _dw.tvf=2, _87.tvx=2}
           [junit] 	at org.apache.lucene.store.MockRAMDirectory.close(MockRAMDirectory.java:278)
           [junit] 	at org.apache.lucene.index.Test1726.testIndexing(Test1726.java:48)
           [junit] 	at org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:88)
        
        Michael McCandless added a comment -

        Hmm... I'll dig into this test case.

        Jason Rutherglen added a comment -

        Mike,

        I was wondering if you can recommend techniques or tools for
        debugging this type of multithreading issue? (i.e. how do you go
        about figuring this type of issue out?)

        Michael McCandless added a comment -

        I don't have any particular tools...

        First I simplify the test as much as possible while still hitting the
        failure (eg this failure happens w/ only 2 threads), then see if the
        error will happen if I turn on IndexWriter's infoStream (it doesn't
        for this, so far). If so, I scrutinize the series of events to find
        the hazard; else, I turn off infoStream and add back in a small number
        of prints, as long as failure still happens.

        Often I use a simple Python script that runs the test over & over
        until a failure happens, saving the log, and then scrutinize that.

        It's good to start with a rough guess, eg this failure is w/ only doc
        stores so it seems likely the merging logic that opens doc stores just
        before kicking off the merge may be to blame.
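        The "run it until it fails, then scrutinize the saved log" loop mentioned above, sketched here in Java (the original comment uses a Python script; `runOnce` is a hypothetical stand-in for launching the actual test, wired to "fail" on the 7th run purely for demonstration):

        ```java
        // Rerun a flaky test until it fails, so the failing run's log can
        // be kept and examined.
        public class RepeatUntilFailure {
            static int calls = 0;

            // Stand-in for executing the JUnit test once; false == failure.
            // Deterministically "fails" on the 7th run for this demo.
            static boolean runOnce() {
                calls++;
                return calls < 7;
            }

            public static void main(String[] args) {
                int iter = 0;
                while (true) {
                    iter++;
                    if (!runOnce()) {
                        // In practice: save the infoStream log for this run here.
                        System.out.println("failure on iteration " + iter);
                        break;
                    }
                }
            }
        }
        ```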

        Michael McCandless added a comment -

        OK the problem happens when a segment is first opened by a merge that
        doesn't need to merge the doc stores; later, an NRT reader is opened
        that separately opens the doc stores of the same [pooled]
        SegmentReader, but then it's the merge that closes the read-only clone
        of the reader.

        In this case the separately opened (by the NRT reader) doc stores are
        not closed by the merge thread. It's the mirror image of LUCENE-1639.

        I've fixed it by pulling all shared readers in a SegmentReader into a
        separate static class (CoreReaders). Cloned SegmentReaders share the
        same instance of this class, so that if a clone later opens the doc
        stores, any prior ancestor (that the clone was created from) will
        also close those readers if it's the one whose decRef drops the
        count to 0.

        I did something similar for LUCENE-1609 (which I'll now hit conflicts
        on after committing this... sigh).

        I plan to commit in a day or so.

        Jason Rutherglen added a comment -

        The test now passes; it needs to go into the patch, perhaps in
        TestIndexWriterReader? Great work on this; it's easier to
        understand SegmentReader now that all the shared objects are in
        one object (CoreReaders). It should make debugging go more
        smoothly.

        Is there a reason we're not synchronizing on SR.core in
        openDocStores? Couldn't we synchronize on core for the cloning
        methods?

        Michael McCandless added a comment -

        Is there a reason we're not synchronizing on SR.core in openDocStores?

        I was going to say "because IW synchronizes" but in fact it doesn't,
        properly, because when merging we go and open doc stores in an
        unsynchronized context. So I'll synchronize(core) in
        SR.openDocStores.

        Couldn't we synchronize on core for the cloning methods?

        I don't think that's needed? The core is simply carried over to the
        newly cloned reader.

        Jason Rutherglen added a comment -

        I don't think that's needed? The core is simply carried
        over to the newly cloned reader.

        Right, however wouldn't it be somewhat cleaner to sync on core
        for all clone operations, given we don't want those to occur
        (external to IW) at the same time? Ultimately we want core to be
        the controller of its resources rather than the SR being cloned?

        I ran the test with the SRMapValue sync code (4 threads), with
        the sync on SR.core in openDocStores, for 10 minutes on a 2-core
        Windows XP laptop with Java 6.14, and saw no errors. Then the same
        with 2 threads for 5 minutes and no errors. I'll keep running it to
        see if we can get an error.

        I'm still a little confused as to why we're going to see the bug
        if readerPool.get is syncing on the SRMapValue. I guess there's
        a slight possibility of the error, and perhaps a more randomized
        test would produce it.

        Michael McCandless added a comment -

        Attached new patch (the patch looks worse than it is, because many
        things moved into the CoreReaders class):

        • Moved more stuff into CoreReaders (fieldInfos, dir, segment, etc.)
          and moved methods down as well (ctor, openDocStores, decRef).
        • Made members final when possible, else synchronized access to
          getting them (to avoid running afoul of the JMM).

        Right however wouldn't it be somewhat cleaner to sync on core
        for all clone operations given we don't want those to occur
        (external to IW) at the same time? Ultimately we want core to be
        the controller of it's resources rather than the SR being cloned?

        In fact, I'm not sure why cloning/reopening a segment needs to be
        synchronized at all.

        Sure it'd be weird for an app to send multiple threads down,
        attempting to reopen/clone the same SR or core at once, but from
        Lucene's standpoint there's nothing "wrong" with doing so, I think?

        (Though, DirectoryReader does need its sync when it's transferring the
        write lock due to reopen on a reader w/ pending changes).

        I ran the test with the SRMapValue sync code, (4 threads) with
        the sync on SR.core in openDocStore for 10 minutes, 2 core
        Windows XML laptop Java 6.14 and no errors. Then same with 2
        threads for 5 minutes and no errors. I'll keep on running it to
        see if we can get an error.

        You could try inserting a testPoint (see the other testPoints in
        IndexWriter) after the SRMapValue is pulled from the hash but before
        we check if its reader is null, and then modify the threads in your
        test to randomly yield on that testPoint (by subclassing IW)? Ie
        "exacerbate" the path that exposes the hazard.
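        The testPoint suggestion might be sketched like this (hypothetical names throughout — `WriterWithTestPoints`, `afterMapValueLookup`, and `lookupMapValue` are illustrative, not Lucene's actual hooks): the writer calls a named no-op hook at the interesting spot, and the test subclass randomly yields there to widen the unsynchronized window.

        ```java
        import java.util.Random;

        // Production code exposes a named hook; tests override it to
        // "exacerbate" the racy interleaving.
        class WriterWithTestPoints {
            // Called at interesting points; a no-op in production.
            protected void testPoint(String name) {}

            Object getFromPool(String segment) {
                Object mv = lookupMapValue(segment);   // done under the pool lock
                testPoint("afterMapValueLookup");      // <-- hazard window
                return mv;                             // then sync on mv, check reader
            }

            Object lookupMapValue(String segment) {
                return "mv(" + segment + ")";
            }
        }

        public class YieldingWriterDemo {
            public static void main(String[] args) {
                Random random = new Random(0);
                WriterWithTestPoints w = new WriterWithTestPoints() {
                    @Override
                    protected void testPoint(String name) {
                        if ("afterMapValueLookup".equals(name) && random.nextBoolean()) {
                            Thread.yield();  // give the other thread a chance to sneak in
                        }
                    }
                };
                System.out.println(w.getFromPool("_1"));
            }
        }
        ```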

        I'm still a little confused as to why we're going to see the bug
        if readerPool.get is syncing on the SRMapValue. I guess there's
        a slight possibility of the error, and perhaps a more randomized
        test would produce it.

        The hazard exists because there's a time when no synchronization is
        held. Ie, you retrieve SRMapValue from the hash while sync'd on
        ReaderPool. You then leave that sync entirely (this is where hazard
        comes in), and next you sync on the SRMapValue. Another thread can
        sneak in and close the SRMapValue.reader during the time that no sync
        is held.

        Jason Rutherglen added a comment -

        try inserting a testPoint (see the other testPoints in
        IndexWriter) after the SRMapValue is pulled from the hash but before
        we check if its reader is null

        I added the test point, but tested without the yield, the method
        itself was enough of a delay to expose the exception.

        Jason Rutherglen added a comment -

        I haven't really figured out a clean way to move the reader
        creation out of the reader pool synchronization. It turns out to
        be somewhat tricky, unless we redesign our synchronization.

        One thing that came to mind is passing a lock object to SR's
        core (which would be the same lock on SRMapValue), which the
        incref/decref could sync on as well. Otherwise we've got
        synchronization in many places, IW, IW.readerPool, SR, SR.core.
        It would seem to make things brittle? Perhaps listing out the
        various reasons we're synchronizing, to see if we can
        consolidate some of them will help?

        Jason Rutherglen added a comment -

        Another idea is to instantiate the SR.core.ref outside of the
        IW.readerPool, and pass it into the newly created reader. Then
        when we obtain the SRMapValue, incRef so we're keeping track of its
        usage, which (I believe) should be in line with the normal usage
        of SR.ref (meaning don't close the reader if the SRMV is checked
        out). This way we know when the SRMV is in use, and different
        threads don't clobber each other creating and closing SRs via
        the readerPool.

        Michael McCandless added a comment -

        I haven't really figured out a clean way to move the reader
        creation out of the reader pool synchronization. It turns out to
        be somewhat tricky, unless we redesign our synchronization.

        Maybe we should simply hold off for now?

        I don't think this sync is costing much in practice, now.
        Ie, IndexReader.open is not concurrent when opening its segments; nor
        would we expect multiple threads to be calling IndexWriter.getReader
        concurrently.

        There is a wee bit of concurrency we are preventing, ie for a merge or
        applyDeletes to get a reader just as an NRT reader is being opened,
        but realistically 1) that's not very costly, and 2) we can't gain that
        concurrency back anyway because we synchronize on IW when opening the
        reader.

        Jason Rutherglen added a comment -

        I was thinking the sync on all of readerPool could delay someone
        trying to call IW.getReader who would wait for a potentially
        large new segment to be warmed. However because IW.mergeMiddle
        isn't loading the term index, IW.getReader will pay the cost of
        loading the term index. So yeah, it doesn't seem necessary.


          People

          • Assignee: Michael McCandless
          • Reporter: Jason Rutherglen
          • Votes: 0
          • Watchers: 0

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated: 48h
              Remaining: 48h
              Logged: Not Specified
