Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3, 5.0
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      If an application wants to use an IndexSearcher and TaxonomyReader in a SearcherManager-like fashion, it cannot use a separate SearcherManager, and say a TaxonomyReaderManager, because the IndexSearcher and TaxoReader instances need to be in sync. That is, the IS-TR pair must match, or otherwise the category ordinals that are encoded in the search index might not match the ones in the taxonomy index.

      This can happen if someone reopens the IndexSearcher's IndexReader, but does not refresh the TaxonomyReader, and the category ordinals that exist in the reopened IndexReader are not yet visible to the TaxonomyReader instance.

      I'd like to create a SearcherTaxoManager (which is a ReferenceManager) which manages an IndexSearcher and TaxonomyReader pair. Then an application will call:

      SearcherTaxoPair pair = manager.acquire();
      try {
        IndexSearcher searcher = pair.searcher;
        TaxonomyReader taxoReader = pair.taxoReader;
        // do something with them
      } finally {
        manager.release(pair);
        pair = null;
      }
      
      1. LUCENE-3786-3x-nocommit.patch
        22 kB
        Shai Erera
      2. LUCENE-3786.patch
        16 kB
        Michael McCandless
      3. LUCENE-3786.patch
        18 kB
        Michael McCandless
      4. LUCENE-3786.patch
        18 kB
        Michael McCandless
      5. LUCENE-3786.patch
        19 kB
        Michael McCandless

        Activity

        Hide
        Shai Erera added a comment -

        Removing 3.6 Fix version. If I'll make it by the release, I'll put it back.

        Show
        Shai Erera added a comment - Removing 3.6 Fix version. If I'll make it by the release, I'll put it back.
        Hide
        Michael McCandless added a comment -

        Patch w/ test.

        Show
        Michael McCandless added a comment - Patch w/ test.
        Hide
        Shai Erera added a comment -

        I actually started working on that a long time ago and had a patch which I never posted because it wasn't ready. Back then I was thinking of how can the search and taxonomy indexes be synced. I.e. in your patch, opening a pair on a Directory assumes that the pair is "valid", which may not be the case at all. For instance, when you commit such pair, you need to first commit IW, and only then TW. That way, TW always contains ordinals that are >= than what IW knows about, in which case they are "valid". So if the commit to TW succeeds and the commit to IW fails, you potentially don't end up in an inconsistent state. However, if an app makes a mistake and commits them in a different order, the pair may not be valid. I'm willing to live with a documentation that says "you should commit the pair like so...".

        But, if the app's commit logic is fine, yet it opened IW and TW with OpenMode.CREATE, then the taxonomy will include ordinals that may be completely unrelated to what IW stores, right? For that reason, I created (in the un-posted patch) a taxonomy timestamp class (which we can treat as version or something similar) which is written to both TW's and IW's commitData, and the manager checks for that at initialization to ensure the two actually match.

        The taxonomy writer records an index.epoch property on the internal IW commitData, which keeps track of how many times it has been recreated (opened w/ OpenMode.CREATE, or replaceTaxonomy). TaxoReader uses that on openIfChanged, returning a new instance if it is the case. I think I had issues recording that in the IW commitData as well, because you need to first commit IW, but the epoch on TW commitData is unknown until it is committed... and pulling it from DirTaxoWriter's member is dangerous because in between you might get a replaceTaxo call which increments the epoch.

        It's not that simple to keep these two in sync, so I put it aside until I have time to get back to it. Thanks for re-initiating!

        What do you think about this? This recreate thing is delicate. Since apps can call IW.deleteAll() as well as TW.replaceTaxonomy, I wish that we give a solution that works in all cases, rather than say this manager doesn't work if these methods were called.

        I think that we also need an object that manages IW and TW pair, so that a faceted search app calls commit() on it, and it handles the delicate commit order + whatever metadata we need to commit to make these two in sync. I wonder if it should be this manager? Then it can always take IW and TW pair, and offer both the acquire/release logic as well as commit.

        About the patch:

        • Why does the test uses newFSDirectory? I didn't read through, but can't it work with RAMDirectory too?
        • Manager.decRef()-- I think you should searcher.reader.incRef() if taxoReader.decRef() failed?
        • It's odd that acquire() throws IOE ... I realize it's because the decRef call in tryIncRef. I don't know if it's critical, but if it is, you may want to throw RuntimeEx?
        Show
        Shai Erera added a comment - I actually started working on that a long time ago and had a patch which I never posted because it wasn't ready. Back then I was thinking of how can the search and taxonomy indexes be synced. I.e. in your patch, opening a pair on a Directory assumes that the pair is "valid", which may not be the case at all. For instance, when you commit such pair, you need to first commit IW, and only then TW. That way, TW always contains ordinals that are >= than what IW knows about, in which case they are "valid". So if the commit to TW succeeds and the commit to IW fails, you potentially don't end up in an inconsistent state. However, if an app makes a mistake and commits them in a different order, the pair may not be valid. I'm willing to live with a documentation that says "you should commit the pair like so...". But, if the app's commit logic is fine, yet it opened IW and TW with OpenMode.CREATE, then the taxonomy will include ordinals that may be completely unrelated to what IW stores, right? For that reason, I created (in the un-posted patch) a taxonomy timestamp class (which we can treat as version or something similar) which is written to both TW's and IW's commitData, and the manager checks for that at initialization to ensure the two actually match. The taxonomy writer records an index.epoch property on the internal IW commitData, which keeps track of how many times it has been recreated (opened w/ OpenMode.CREATE, or replaceTaxonomy). TaxoReader uses that on openIfChanged, returning a new instance if it is the case. I think I had issues recording that in the IW commitData as well, because you need to first commit IW, but the epoch on TW commitData is unknown until it is committed... and pulling it from DirTaxoWriter's member is dangerous because in between you might get a replaceTaxo call which increments the epoch. It's not that simple to keep these two in sync, so I put it aside until I have time to get back to it. Thanks for re-initiating! What do you think about this? This recreate thing is delicate. Since apps can call IW.deleteAll() as well as TW.replaceTaxonomy, I wish that we give a solution that works in all cases, rather than say this manager doesn't work if these methods were called. I think that we also need an object that manages IW and TW pair, so that a faceted search app calls commit() on it, and it handles the delicate commit order + whatever metadata we need to commit to make these two in sync. I wonder if it should be this manager? Then it can always take IW and TW pair, and offer both the acquire/release logic as well as commit. About the patch: Why does the test uses newFSDirectory? I didn't read through, but can't it work with RAMDirectory too? Manager.decRef()-- I think you should searcher.reader.incRef() if taxoReader.decRef() failed? It's odd that acquire() throws IOE ... I realize it's because the decRef call in tryIncRef. I don't know if it's critical, but if it is, you may want to throw RuntimeEx?
        Hide
        Shai Erera added a comment -

        Here's the patch I ended up with when working on this on top of 3.x (don't remember if it was 3.5 or 3.6). This is not intended for commit, but you many want to look at the manager and its validation logic. Not sure how much of it is still relevant, except the recreate scenario.

        Show
        Shai Erera added a comment - Here's the patch I ended up with when working on this on top of 3.x (don't remember if it was 3.5 or 3.6). This is not intended for commit, but you many want to look at the manager and its validation logic. Not sure how much of it is still relevant, except the recreate scenario.
        Hide
        Michael McCandless added a comment -

        OK I discussed these tricky issues with Shai ... with the non-NRT case
        (app commits and then calls .maybeRefresh) there are some big
        challenges.

        First off, the app must always commit IW first then TW. But second
        off, even if it does that, there is at least this multi-threaded case
        where .maybeRefresh can screw up:

        • Thread 1 (indexer) commits IW1
        • Thread 1 (indexer) commits TW1
        • Thread 2 (indexer) commits IW2
        • Thread 3 (searcher) maybeRefresh opens IW2
        • Thread 3 (searcher) maybeRefresh opens TW1
        • Thread 1 (indexer) commits TW2

        That will then lead to confusing AIOOBEs during facet counting...

        Net/net I think there's too much hair around supporting the non-NRT
        case, and I think for starters we should just support NRT, ie you must
        pass IW and TW to STM's ctor. Then STM is agnostic to what commits
        are being done ... commit is only for durability purposes.

        We must still document that you cannot do IW.deleteAll /
        TW.replaceTaxonomy (I'll add it).

        Why does the test uses newFSDirectory?

        Just because it's using the LineFileDocs, which have biggish docs in
        them. Add in -Dtests.nightly, -Dtests.multiplier=3, and it could
        maybe be we are pushing the 512 MB RAM limit...

        Manager.decRef()-- I think you should searcher.reader.incRef() if taxoReader.decRef() failed?

        Hmm this isn't so simple: that decRef could have closed the reader. I
        suppose I could do a "best effort" tryIncRef so that if the app
        somehow catches the exception and retries the decRef we don't
        prematurely close the reader ...

        It's odd that acquire() throws IOE ... I realize it's because the decRef call in tryIncRef. I don't know if it's critical, but if it is, you may want to throw RuntimeEx?

        I think it's OK to add IOE to the signature?

        Show
        Michael McCandless added a comment - OK I discussed these tricky issues with Shai ... with the non-NRT case (app commits and then calls .maybeRefresh) there are some big challenges. First off, the app must always commit IW first then TW. But second off, even if it does that, there is at least this multi-threaded case where .maybeRefresh can screw up: Thread 1 (indexer) commits IW1 Thread 1 (indexer) commits TW1 Thread 2 (indexer) commits IW2 Thread 3 (searcher) maybeRefresh opens IW2 Thread 3 (searcher) maybeRefresh opens TW1 Thread 1 (indexer) commits TW2 That will then lead to confusing AIOOBEs during facet counting... Net/net I think there's too much hair around supporting the non-NRT case, and I think for starters we should just support NRT, ie you must pass IW and TW to STM's ctor. Then STM is agnostic to what commits are being done ... commit is only for durability purposes. We must still document that you cannot do IW.deleteAll / TW.replaceTaxonomy (I'll add it). Why does the test uses newFSDirectory? Just because it's using the LineFileDocs, which have biggish docs in them. Add in -Dtests.nightly, -Dtests.multiplier=3, and it could maybe be we are pushing the 512 MB RAM limit... Manager.decRef()-- I think you should searcher.reader.incRef() if taxoReader.decRef() failed? Hmm this isn't so simple: that decRef could have closed the reader. I suppose I could do a "best effort" tryIncRef so that if the app somehow catches the exception and retries the decRef we don't prematurely close the reader ... It's odd that acquire() throws IOE ... I realize it's because the decRef call in tryIncRef. I don't know if it's critical, but if it is, you may want to throw RuntimeEx? I think it's OK to add IOE to the signature?
        Hide
        Shai Erera added a comment -

        I think it's OK to add IOE to the signature?

        Ok.

        that decRef could have closed the reader

        Hmm ... if we assume that this TR/IR pair is managed only by that manager, then an IOE thrown from decRef could only be caused by closing the reader, right?
        So if you successfully IR.decRef() but fail to TR.decRef(), it means that IR is closed already right? Therefore there's no point to even tryIncRef?

        Just because it's using the LineFileDocs

        Ahh ok. As I said, I didn't read the test through. I will review the patch after you post a new version.

        Show
        Shai Erera added a comment - I think it's OK to add IOE to the signature? Ok. that decRef could have closed the reader Hmm ... if we assume that this TR/IR pair is managed only by that manager, then an IOE thrown from decRef could only be caused by closing the reader, right? So if you successfully IR.decRef() but fail to TR.decRef(), it means that IR is closed already right? Therefore there's no point to even tryIncRef? Just because it's using the LineFileDocs Ahh ok. As I said, I didn't read the test through. I will review the patch after you post a new version.
        Hide
        Michael McCandless added a comment -

        New patch, just handling the NRT case.

        Show
        Michael McCandless added a comment - New patch, just handling the NRT case.
        Hide
        Michael McCandless added a comment -

        that decRef could have closed the reader

        Hmm ... if we assume that this TR/IR pair is managed only by that manager, then an IOE thrown from decRef could only be caused by closing the reader, right?
        So if you successfully IR.decRef() but fail to TR.decRef(), it means that IR is closed already right? Therefore there's no point to even tryIncRef?

        You're right ... so I just left the two decRefs in the patch ...

        Show
        Michael McCandless added a comment - that decRef could have closed the reader Hmm ... if we assume that this TR/IR pair is managed only by that manager, then an IOE thrown from decRef could only be caused by closing the reader, right? So if you successfully IR.decRef() but fail to TR.decRef(), it means that IR is closed already right? Therefore there's no point to even tryIncRef? You're right ... so I just left the two decRefs in the patch ...
        Hide
        Shai Erera added a comment -

        Few comments:

        • This assert in the test assertEquals(1, results.size()); is kinda moot because we always return a FacetResult, even if empty. Perhaps you can assert that if the acquired reader.maxDoc is > 0, the returned FacetResult.rootNode has at least one child with count that is at least 1?
        • Maybe change the end of the test to a single-line IOUtils.close()?
        • You wrote previously that the test uses LineFileDocs, but I don't see it. It seems it only adds facets to documents? If so, can it go back to newDirectory()?
        • It's good that you identify replaceTaxonomy, makes the code safer.
        • TR.getTaxoEpoch: maybe instead of adding it to TR you can use the one on DTW (make it public, @lucene.internal)? It's odd that it documents that this epoch is returned only for an NRT TR, because the epoch is recorded on the taxo index commit data, so conceptually there's no reason why it shouldn't always return it. Yet, since this epoch is used internally, between TW and TR, I prefer not to expose it too much. Hmmm, but then you may hit a false positive where the returned TR is valid, yet just in between the checks the app called replaceTaxo. But I think that's ok since it means the check will fail on the next refresh attempt. Really, if ever DTW.epoch changes, we should fail.
        • I don't know how important it is, but perhaps given the short discussion we had above, it would be good to add a 1-liner to decRef why the method seems unprotected, but in reality it's the best we can do?
        • In refreshIfNeeded, I understand this code newReader.decRef() is equivalent to closing newReader (if epoch has changed). But after I received a question yesterday from a someone who did not understand why we don't call close(), perhaps we should, for clarity?

        Otherwise this looks great! When I worked on it in the past, DTR wasn't NRT and the sync was a nightmare. Making it NRT really simplified this manager!

        Show
        Shai Erera added a comment - Few comments: This assert in the test assertEquals(1, results.size()); is kinda moot because we always return a FacetResult, even if empty. Perhaps you can assert that if the acquired reader.maxDoc is > 0, the returned FacetResult.rootNode has at least one child with count that is at least 1? Maybe change the end of the test to a single-line IOUtils.close()? You wrote previously that the test uses LineFileDocs, but I don't see it. It seems it only adds facets to documents? If so, can it go back to newDirectory()? It's good that you identify replaceTaxonomy, makes the code safer. TR.getTaxoEpoch: maybe instead of adding it to TR you can use the one on DTW (make it public, @lucene.internal)? It's odd that it documents that this epoch is returned only for an NRT TR, because the epoch is recorded on the taxo index commit data, so conceptually there's no reason why it shouldn't always return it. Yet, since this epoch is used internally, between TW and TR, I prefer not to expose it too much. Hmmm, but then you may hit a false positive where the returned TR is valid, yet just in between the checks the app called replaceTaxo. But I think that's ok since it means the check will fail on the next refresh attempt. Really, if ever DTW.epoch changes, we should fail. I don't know how important it is, but perhaps given the short discussion we had above, it would be good to add a 1-liner to decRef why the method seems unprotected, but in reality it's the best we can do? In refreshIfNeeded, I understand this code newReader.decRef() is equivalent to closing newReader (if epoch has changed). But after I received a question yesterday from a someone who did not understand why we don't call close(), perhaps we should, for clarity? Otherwise this looks great! When I worked on it in the past, DTR wasn't NRT and the sync was a nightmare. Making it NRT really simplified this manager!
        Hide
        Michael McCandless added a comment -

        Thanks for all the feedback Shai, I incorporated it all except for
        this one:

        TR.getTaxoEpoch: maybe instead of adding it to TR you can use the one on DTW (make it public, @lucene.internal)? It's odd that it documents that this epoch is returned only for an NRT TR, because the epoch is recorded on the taxo index commit data, so conceptually there's no reason why it shouldn't always return it. Yet, since this epoch is used internally, between TW and TR, I prefer not to expose it too much. Hmmm, but then you may hit a false positive where the returned TR is valid, yet just in between the checks the app called replaceTaxo. But I think that's ok since it means the check will fail on the next refresh attempt. Really, if ever DTW.epoch changes, we should fail.

        I don't like that cutting over to DTW would open up the thread hazard
        that we fail to catch the replace ... admittedly it'd be rare but why
        open it up? The added Expert/@lucene.internal method seems minor ...

        When I worked on it in the past, DTR wasn't NRT and the sync was a nightmare. Making it NRT really simplified this manager!

        Thank you for doing all the hard work first (making DTR NRT)

        Show
        Michael McCandless added a comment - Thanks for all the feedback Shai, I incorporated it all except for this one: TR.getTaxoEpoch: maybe instead of adding it to TR you can use the one on DTW (make it public, @lucene.internal)? It's odd that it documents that this epoch is returned only for an NRT TR, because the epoch is recorded on the taxo index commit data, so conceptually there's no reason why it shouldn't always return it. Yet, since this epoch is used internally, between TW and TR, I prefer not to expose it too much. Hmmm, but then you may hit a false positive where the returned TR is valid, yet just in between the checks the app called replaceTaxo. But I think that's ok since it means the check will fail on the next refresh attempt. Really, if ever DTW.epoch changes, we should fail. I don't like that cutting over to DTW would open up the thread hazard that we fail to catch the replace ... admittedly it'd be rare but why open it up? The added Expert/@lucene.internal method seems minor ... When I worked on it in the past, DTR wasn't NRT and the sync was a nightmare. Making it NRT really simplified this manager! Thank you for doing all the hard work first (making DTR NRT)
        Hide
        Shai Erera added a comment -

        I reviewed again, and now that you switch to calling close() instead of decRef(), I think the close() should be done via IOUtils.close, to avoid a potential close failure (newReader.close()) and leave behind a dangling TR?

        Also test() also has these 5 close() statements which can be folded into one IOUtils. But that's just style.

        I don't like that cutting over to DTW would open up the thread hazard that we fail to catch the replace

        What I meant is that if instead of checking epoch on TR you check on DTW, you won't (I think!) get into that hazard. That is: (1) reopen TR (2) check if DTW.epoch is different than the one the manager was created with. The only false positive in this case is that the TR might be valid (i.e. in between 1 and 2 a replaceTaxo was called), but I think that's ok to throw the exception, since on the next refresh you'll fail anyway?

        Anyway, I don't have any strong feelings about it. It is indeed safer to put it on TR and let TR provide evidence "for itself". If you want to keep it, can you make getEpoch public on both DTW and DTR, to be consistent.

        Show
        Shai Erera added a comment - I reviewed again, and now that you switch to calling close() instead of decRef(), I think the close() should be done via IOUtils.close, to avoid a potential close failure (newReader.close()) and leave behind a dangling TR? Also test() also has these 5 close() statements which can be folded into one IOUtils. But that's just style. I don't like that cutting over to DTW would open up the thread hazard that we fail to catch the replace What I meant is that if instead of checking epoch on TR you check on DTW, you won't (I think!) get into that hazard. That is: (1) reopen TR (2) check if DTW.epoch is different than the one the manager was created with. The only false positive in this case is that the TR might be valid (i.e. in between 1 and 2 a replaceTaxo was called), but I think that's ok to throw the exception, since on the next refresh you'll fail anyway? Anyway, I don't have any strong feelings about it. It is indeed safer to put it on TR and let TR provide evidence "for itself". If you want to keep it, can you make getEpoch public on both DTW and DTR, to be consistent.
        Hide
        Michael McCandless added a comment -

        I reviewed again, and now that you switch to calling close() instead of decRef(), I think the close() should be done via IOUtils.close, to avoid a potential close failure (newReader.close()) and leave behind a dangling TR?

        Good, I'll fix that.

        Also test() also has these 5 close() statements which can be folded into one IOUtils. But that's just style.

        Woops, I missed that one ... I'll fix.

        What I meant is that if instead of checking epoch on TR you check on DTW, you won't (I think!) get into that hazard.

        Ahh, right, as long as I check taxoWriter after the reopen: good! I'll fix to just use DTW...

        Show
        Michael McCandless added a comment - I reviewed again, and now that you switch to calling close() instead of decRef(), I think the close() should be done via IOUtils.close, to avoid a potential close failure (newReader.close()) and leave behind a dangling TR? Good, I'll fix that. Also test() also has these 5 close() statements which can be folded into one IOUtils. But that's just style. Woops, I missed that one ... I'll fix. What I meant is that if instead of checking epoch on TR you check on DTW, you won't (I think!) get into that hazard. Ahh, right, as long as I check taxoWriter after the reopen: good! I'll fix to just use DTW...
        Hide
        Michael McCandless added a comment -

        New patch w/ last round of changes ... thanks Shai!

        Show
        Michael McCandless added a comment - New patch w/ last round of changes ... thanks Shai!
        Hide
        Shai Erera added a comment -

        Looks good, +1. Thanks for doing the work Mike!

        Show
        Shai Erera added a comment - Looks good, +1. Thanks for doing the work Mike!
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development