Lucene - Core
  LUCENE-2474

Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey)

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Allow to plug in a Cache Eviction Listener to IndexReader to eagerly clean custom caches that use the IndexReader (getFieldCacheKey).

      A spin-off of https://issues.apache.org/jira/browse/LUCENE-2468. Basically, it makes a lot of sense to cache things based on IndexReader#getFieldCacheKey; even Lucene itself uses it, for example with the CachingWrapperFilter. FieldCache enjoys being called explicitly to purge its cache when possible (which is tricky to know from the "outside", especially when using NRT - reader attack of the clones).

      The provided patch allows plugging in a CacheEvictionListener which will be called when the cache should be purged for an IndexReader.
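      As a minimal sketch (hypothetical names; CacheEvictionListener and onEviction are illustrative and not necessarily the names used in the attached patch), an external cache keyed by getFieldCacheKey could use such a hook like this:

      import java.util.Map;
      import java.util.concurrent.ConcurrentHashMap;

      // Hypothetical listener interface; names are illustrative only.
      interface CacheEvictionListener {
        void onEviction(Object fieldCacheKey);
      }

      // An external cache keyed by IndexReader#getFieldCacheKey() registers a
      // listener so its entries are dropped eagerly when the reader goes away,
      // instead of relying on weak references or manual purging.
      class CustomFilterCache implements CacheEvictionListener {
        private final Map<Object, Object> cache = new ConcurrentHashMap<Object, Object>();

        public void put(Object fieldCacheKey, Object value) {
          cache.put(fieldCacheKey, value);
        }

        public Object get(Object fieldCacheKey) {
          return cache.get(fieldCacheKey);
        }

        @Override
        public void onEviction(Object fieldCacheKey) {
          cache.remove(fieldCacheKey); // eager cleanup when the reader core is gone
        }
      }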

      1. LUCENE-2474.patch
        20 kB
        Michael McCandless
      2. LUCENE-2474.patch
        10 kB
        Michael McCandless
      3. LUCENE-2474.patch
        6 kB
        Shay Banon
      4. LUCENE-2574.patch
        43 kB
        Michael McCandless
      5. MapBackedSet.java
        1 kB
        Shay Banon

        Activity

        Shay Banon added a comment -

        First revision of the patch, tell me what you think...

        Michael McCandless added a comment -

        Should we rename this to "CloseEventListener"? Ie, when an IR is closed it'll notify those subscribers who asked to find out?

        Also, shouldn't the FieldCache's listener be created/installed from FieldCache, not from IR? Ie, when FieldCache creates an entry it should at that point ask the reader to notify it when that reader is closed?

        Yonik Seeley added a comment -

        There's a lot of little issues here I think:

        • "close event" doesn't really describe the behavior since an event is not generated on every close of every reader as one might expect.
        • This implementation is problematic since higher level readers don't propagate the event listeners to subreaders... i.e. I need to walk the tree myself and add a listener to every reader, and on a reopen() I would need to walk the tree again and add listeners only to the new readers that have a new coreKey.

        We've talked before about putting caches directly on the readers - that still seems like the most straightforward approach?
        We really need one cache that doesn't care about deletions, and one cache that does.

        Michael McCandless added a comment -

        "close event" doesn't really describe the behavior since an event is not generated on every close of every reader as one might expect.

        Right, it's really more like a "segment is unloaded" event. Ie a single segment can have many cloned/reopened SegmentReaders, all sharing the same "core" (= same cache entry eg in FieldCache)... when this event occurs it means all SegmentReaders for a given segment have been closed. But, then we also need to generate this for toplevel readers, since [horribly] such readers are still allowed into eg the FieldCache.

        This implementation is problematic since higher level readers don't propagate the event listeners to subreaders... i.e. I need to walk the tree myself and add a listener to every reader, and on a reopen() I would need to walk the tree again and add listeners only to the new readers that have a new coreKey.

        I think we should just fix that, ie so your listener is propagated to the subs and to reopened readers (and their subs and their reopens, etc.).

        We've talked before about putting caches directly on the readers - that still seems like the most straightforward approach?

        This would be great, but I'm not sure I'd call it straightforward. I think a separate baby step (ie this proposed approach) is fine for today?

        We really need one cache that doesn't care about deletions, and one cache that does.

        And maybe norms, since they too can change in cloned SegmentReaders that otherwise share the same core. Or, maybe we make the "core" a first class object, and you interact with it to cache things that don't care about changes to deletions/norms. Or, the core could just be the first SegmentReader to be opened on this segment. Or something.

        I think Earwin has also worked out some sort of caching model w/ IndexReaders... Earwin, how do you handle timely eviction?

        Shay Banon added a comment -

        Right, I was thinking that it's a low level API that you can just add to the low level readers, but I agree, it will be nicer to have it on the high level as well. Regarding the close method name, I guess we can name it similar to the FieldCache one, maybe purge?

        > We've talked before about putting caches directly on the readers - that still seems like the most straightforward approach?

        not sure I understand that. Do you mean getting FieldCache into the readers? And then what about cached filters? And other custom caching constructs that rely on the same mechanism as the CachingWrapperFilter?

        I think that if one implements such caching, it's an advanced enough feature where you should know how to handle deletes and other tidbits (if you need to).

        > We really need one cache that doesn't care about deletions, and one cache that does.

        Isn't that up to the cache to decide? That cache can be anything (internally implemented in Lucene or externally) that follows the mechanism of caching based on (segment) readers. As long as there are constructs to get the deleted docs to handle deletes (for example), then the implementation can use it.

        Yonik Seeley added a comment - - edited

        > > We've talked before about putting caches directly on the readers - that still seems like the most straightforward approach?
        > not sure I understand that. Do you mean getting FieldCache into the readers? And then what about cached filters?

        It would be a cache of anything... one element of that cache would be the FieldCache, there could be one for filters, or one entry per-filter.
        edit: Maybe a better way to think about it is like a ServletContext or something - it's just a way to attach anything arbitrary to a reader.

        > > We really need one cache that doesn't care about deletions, and one cache that does.
        > Isn't that up to the cache to decide?

        Not with this current patch, as there is no mechanism to get a callback when you do care about deletes. If I want to cache something that depends on deletions, I want to purge that cache when the actual reader is closed (as opposed to the reader's core cache key that is shared amongst all readers that just have different deletions). So if we go a "close event" route, we really want two different events... one for the close of a reader (i.e. deletes matter), and one for the close of the segment (deletes don't matter).
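        As a rough illustration of the "attach anything to a reader" idea above (a hypothetical helper, not an existing Lucene API), the reader would simply carry a generic attribute map, much like ServletContext attributes:

        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;

        // Hypothetical per-reader attachment map, similar in spirit to ServletContext
        // attributes: FieldCache entries, cached filters, etc. could be stored here and
        // would simply die with the reader, so no external purge hook would be needed.
        class ReaderAttachments {
          private final Map<String, Object> attributes = new ConcurrentHashMap<String, Object>();

          public void setAttribute(String key, Object value) {
            attributes.put(key, value);
          }

          public Object getAttribute(String key) {
            return attributes.get(key);
          }
        }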

        Shay Banon added a comment -

        > It would be a cache of anything... one element of that cache would be the FieldCache, there could be one for filters, or one entry per-filter.
        > edit: Maybe a better way to think about it is like a ServletContext or something - it's just a way to attach anything arbitrary to a reader.

        Got you. My personal taste is to try and keep those readers as lightweight as possible, and have the proper constructs in place to allow them to be used externally for caching, without having them manage it as well.

        > Not with this current patch, as there is no mechanism to get a callback when you do care about deletes. If I want to cache something that depends on deletions, I want to purge that cache when the actual reader is closed (as opposed to the reader's core cache key that is shared amongst all readers that just have different deletions). So if we go a "close event" route, we really want two different events... one for the close of a reader (i.e. deleted matter), and one for the close of the segment (deletes don't matter).

        I think that a cache that is affected by deletes is a problematic cache to begin with, so I was thinking that maybe it should be discouraged by not allowing for it. Especially with NRT. My idea was to simply expand the purge capability that the FC gets for free to other external custom components.

        Also, if we did have a type safe separation between segment readers and compound readers, I would not have added the ability to register a listener on the compound readers, just the segment readers, as this will encourage people to write caches that only work on segment readers (since the registration for the "purge event" will happen within the cache, and it should work only with segment readers). That was why my patch does not take compound readers into account.

        Michael McCandless added a comment -

        My idea was to simply expand the purge capability that the FC gets for free to other external custom components.

        I think that's a good baby step, for this issue, today, just for segment readers.

        Separately, eventually, we can tackle the bigger challenge of allowing caching on any reader.

        Also, if we did have a type safe separation between segment readers and compound readers, I would not have added the ability to register a listener on the compound readers, just the segment readers, as this will encourage people to write caches that only work on segment readers (since the registration for the "purge event" will happen within the cache, and it should work only with segment readers). That was why my patch does not take compound readers into account.

        I very much want to get type safe IndexReaders done before 4.0...

        But: I think we'd want to have composite reader just forward the registration down to the atomic readers? (And, forward on reopen).

        Shay Banon added a comment -

        But: I think we'd want to have composite reader just forward the registration down to the atomic readers? (And, forward on reopen).

        I am not sure that you would want to do it. Any caching layer or an external component that is properly written would work on the low level segment readers; it would not even compile against compound readers. This will help direct people to write proper code and deal only with segment readers.

        Yonik Seeley added a comment -

        > > But: I think we'd want to have composite reader just forward the registration down to the atomic readers? (And, forward on reopen).

        +1

        > >I am not sure that you would want to do it.

        So a user doesn't have to walk the tree and re-register its listeners with every new segment on a reopen.

        > Any caching layer or an external component that is properly written would work on the low level segment readers

        It's completely proper to cache stuff wrt the top level reader - Solr does so (and will continue to provide that option). Some people update their index infrequently, and the performance gains of having cached information on a unified view of the index are a win for them.

        Michael McCandless added a comment -

        I started to implement the "forwards to all subs and to all reopened readers" and... it's kinda hairy. I mean there are TONS of places where we make new readers (Earwin's working on improving this, I think, under LUCENE-2355).

        So then I wondered: what if we just make this a static method, eg on IndexReader, add/removeReaderFinishedListener? (Or we could put it on FieldCache). That'd be a tiny change...
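        A minimal sketch of what such a static registry might look like (method names follow the comment above; the actual patch may differ):

        import java.util.Collections;
        import java.util.HashSet;
        import java.util.Set;

        // Hypothetical sketch of a static listener registry on IndexReader.
        class ReaderFinishedListeners {
          interface ReaderFinishedListener {
            void finished(Object coreCacheKey);
          }

          private static final Set<ReaderFinishedListener> LISTENERS =
              Collections.synchronizedSet(new HashSet<ReaderFinishedListener>());

          static void addReaderFinishedListener(ReaderFinishedListener l) {
            LISTENERS.add(l);
          }

          static void removeReaderFinishedListener(ReaderFinishedListener l) {
            LISTENERS.remove(l);
          }

          // Called internally when the last reader sharing a segment core is closed.
          // Note: the listeners run while holding the set's lock, which is exactly
          // the concern raised further down in this thread.
          static void notifyFinished(Object coreCacheKey) {
            synchronized (LISTENERS) {
              for (ReaderFinishedListener l : LISTENERS) {
                l.finished(coreCacheKey);
              }
            }
          }
        }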

        Jason Rutherglen added a comment -

        make this a static method, eg on IndexReader, add/removeReaderFinishedListener? (Or we could put it on FieldCache). That'd be a tiny change...

        This makes the most sense, however it feels temporary, as we should probably move to a unified IWC/IRC config where all parameters are set and shared for writers and readers? This way we can eventually coordinate things like IO scheduling, eg, LUCENE-2793's IOContext. Also shouldn't there simply be a reader event listener and perhaps even a writer event listener?

        Michael McCandless added a comment -

        I think we can generalize this to any event and to writer in the future... for today, just letting something external be notified when a reader is "gone", just as FieldCache is privately notified today, is a good baby step.

        Michael McCandless added a comment -

        OK, here's a patch exposing the readerFinishedListeners as static methods on IndexReader.

        It was also nice to consolidate all the various places we were previously purging the FieldCache.

        Earwin Burrfoot added a comment -

        Earwin's working on improving this, I think, under LUCENE-2355

        I stalled, and then there were just so many changes under trunk, so I have to restart now. Thanks for another kick.

        Shay Banon added a comment -

        OK, here's a patch exposing the readerFinishedListeners as static methods on IndexReader.

        I think we should use a CopyOnWriteArrayList so calling the listeners will not happen under a global synchronized block. If maintaining set behavior is required, then I can patch with a ConcurrentHashSet implementation or we can simply replace it with a CHM with PRESENT, or any other solution that does not require calling the listeners under a global sync block.
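        A small sketch of the CopyOnWriteArrayList variant being suggested (hypothetical wiring): iteration happens over an immutable snapshot, so listeners are invoked without holding any lock on the collection:

        import java.util.List;
        import java.util.concurrent.CopyOnWriteArrayList;

        // Hypothetical sketch: notifying listeners from a CopyOnWriteArrayList.
        class NotifyWithoutGlobalLock {
          interface ReaderFinishedListener {
            void finished(Object coreCacheKey);
          }

          private final List<ReaderFinishedListener> listeners =
              new CopyOnWriteArrayList<ReaderFinishedListener>();

          void addListener(ReaderFinishedListener l) {
            listeners.add(l);
          }

          void notifyFinished(Object coreCacheKey) {
            // The for-each iterator works on a snapshot of the backing array, so
            // potentially costly listener callbacks never run under a global lock.
            for (ReaderFinishedListener l : listeners) {
              l.finished(coreCacheKey);
            }
          }
        }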

        Michael McCandless added a comment -

        Ahh, I get it – invoking the listeners (on cache evict) is dangerous to do under a global lock since they could conceivably be costly.

        I had switched to Set to try to prevent silliness in the event that an app adds the same listener over & over (w/o removing it), and also to not have O(N^2) cost when removing listeners. I mean, it is an expert API, but I still think we should attempt to be defensive against silliness?

        How about CHM? (There is no builtin CHS, right? And HS just wraps an HM anyway...).
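        For reference, the JDK has no built-in ConcurrentHashSet class, but from Java 6 onward an equivalent Set view can be created over a ConcurrentHashMap (this helper does not exist on Java 5, which is presumably why a standalone MapBackedSet is offered later in this issue):

        import java.util.Collections;
        import java.util.Set;
        import java.util.concurrent.ConcurrentHashMap;

        class ConcurrentListenerSet {
          // Java 6+: set semantics backed by a ConcurrentHashMap; iteration is weakly
          // consistent and never throws ConcurrentModificationException, so listeners
          // can be notified without a global lock.
          static final Set<Object> LISTENERS =
              Collections.newSetFromMap(new ConcurrentHashMap<Object, Boolean>());
        }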

        Shay Banon added a comment -

        Yea, I got the reasoning for Set, we can use that, CHM with PRESENT. If you want, I can attach a simple MapBackedSet that makes any Map a Set.

        Still, I think that using CopyOnWriteArrayList is best here. I don't think that adding and removing listeners is something that will be done often in an app. But I might be mistaken. In this case, traversal over listeners is much better on CopyOnWriteArrayList compared to CHM.

        Yonik Seeley added a comment -

        Still, I think that using CopyOnWriteArrayList is best here.

        Agree - I think we should optimize for good/correct behavior.

        I'd like even more for there to be just a single CopyOnWriteArrayList per "top-level" reader that is then propagated to all sub/segment readers, including new ones on a reopen. But I guess Mike indicated that was currently too hard/hairy.

        The static is really non-optimal though - among other problems, it requires systems with multiple readers (that want to do different things with different readers, such as maintaining separate caches) to figure out what top-level reader a segment reader is associated with. And given that we are dealing with IndexReader instances in the callbacks, and not ReaderContext objects, this seems impossible?

        Michael McCandless added a comment -

        Still, I think that using CopyOnWriteArrayList is best here.

        OK I'll switch back to COWAL... it makes me nervous though. I like
        being defensive and the added cost of CHM iteration really should be
        negligible here.

        I'd like even more for there to be just a single CopyOnWriteArrayList per "top-level" reader that is then propagated to all sub/segment readers, including new ones on a reopen. But I guess Mike indicated that was currently too hard/hairy.

        This did get hairy... eg if you make a MultiReader (or ParallelReader)
        w/ subs... what should happen to their listeners? Ie what if the subs
        already have listeners enrolled?

        It also spooked me that apps may think they have to re-register after
        re-open (if we stick w/ ArrayList) since then the list'd just grow...
        it's trappy.

        And, if you pull an NRT reader from IW (which is what reopen does
        under the hood for an NRT reader), how to share its listeners? Ie,
        we'd have to add a setter to IW as well, so it's also "single source"
        (propagates on reopen).

        This is why I fell back to a simple static as the baby step for now.

        The static is really non-optimal though - among other problems, it requires systems with multiple readers (and wants to do different things with different readers, such as maintain separate caches) to figure out what top-level reader a segment reader is associated with. And given that we are dealing with IndexReader instances in the callbacks, and not ReaderContext objects, this seems impossible?

        ReaderContext doesn't really make sense here?

        Ie, the listener is invoked when any/all composite readers sharing a
        given segment have now closed (ie when the RC for that segment's core
        drops to 0), or when a composite reader is closed.

        Also, in practice, is it really so hard for the app to figure out
        which SR goes to which of their caches? Isn't this "typically" a
        containsKey against the app level caches...?

        Michael McCandless added a comment -

        New patch – cut back to COWAL.

        Yonik Seeley added a comment -

        eg if you make a MultiReader (or ParallelReader) w/ subs... what should happen to their listeners?

        This doesn't have to be a super-flexible public API - it's clearly expert level and we can impose limitations that make sense.
        A MultiReader is just a wrapper - you don't reopen it, so it could just start off with an empty listener list, the subs could all retain their listener lists and an addListener() could just delegate to the contained readers.

        ReaderContext doesn't really make sense here?

        Right, it doesn't (which circles around to the fact that the callbacks should be set on the top-reader)

        Isn't this "typically" a containsKey against the app level caches...?

        In a (solr) multi-core environment, you don't know which caches to check, or even how to get to those caches.
        You would either need to add all of them and check all of them on a purge (which has other problems, including performance
        problems for highly multi-tenanted uses where people have thousands of cores), or have one big static cache (which has other problems such as not being configurable per-core).

        I'm not against this patch, I'm just pointing out that the "staticness" of it (which I hope we can continue to move away from over time) severely limits its usefulness, and it doesn't represent incremental progress if we do want to get rid of statics.

        Actually, I am against the last patch you posted, as it clearly has nothing to do with this issue
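        A sketch of the MultiReader delegation suggested above (hypothetical types): the composite reader keeps no listener list of its own and simply forwards registration to the readers it wraps:

        import java.util.List;

        // Hypothetical sketch: a composite (Multi/Parallel) reader forwards listener
        // registration to its sub-readers instead of tracking listeners itself.
        class CompositeReaderSketch {
          interface ReaderFinishedListener {
            void finished(Object coreCacheKey);
          }

          interface SubReader {
            void addReaderFinishedListener(ReaderFinishedListener l);
          }

          private final List<SubReader> subReaders;

          CompositeReaderSketch(List<SubReader> subReaders) {
            this.subReaders = subReaders;
          }

          void addReaderFinishedListener(ReaderFinishedListener l) {
            for (SubReader sub : subReaders) {
              sub.addReaderFinishedListener(l); // each segment-level reader tracks its own listeners
            }
          }
        }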

        Michael McCandless added a comment -

        Actually, I am against the last patch you posted, as it clearly has nothing to do with this issue

        Woops! Heh.

        A MultiReader is just a wrapper - you don't reopen it, so it could just start off with an empty listener list, the subs could all retain their listener lists and an addListener() could just delegate to the contained readers.

        Well, it does have a reopen (reopens the subs & wraps in a new MR), but I guess delegation would work for MR. And, same for ParallelReader.

        And I think the NRT case should work fine, since we don't expose IW.getReader anymore (hmm – this was never backported to 3.x?) – if you new IndexReader(IW), it creates a single collection holding all listeners, and then shares it w/ all SRs.

        Michael McCandless added a comment -

        Patch (for trunk), with non-static solution.

        I cut back to sync(HashSet) because it's easy, now, for dups to be added. EG the collection is shared to all sub-readers, and FC registers its listener to all subs... and, each of FC's typed caches will register the listener.

        DirReader shares the collection to all subs and to all reopens.

        MultiReader, ParReader delegate (but also track/invoke their own listeners).

        I added a private reader listeners collection to IW, which is only used to make sure all NRT readers opened from the writer share the same collection (ie, this is not a public API in IW).

        Subclasses of IR are now responsible for initializing the reader listeners collection (if the add/removeReaderFinishedListener could be invoked).
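        Roughly, the shared-collection approach described above can be pictured like this (hypothetical types; the real patch wires the collection through the concrete reader classes):

        import java.util.Collections;
        import java.util.HashSet;
        import java.util.Set;

        // Hypothetical sketch of the non-static design: the top-level reader creates one
        // listener collection and hands the same instance to every sub-reader and every
        // reopened reader, so a listener registered once keeps firing across reopens,
        // and duplicate registrations collapse in the Set.
        class TopLevelReaderSketch {
          interface ReaderFinishedListener {
            void finished(Object coreCacheKey);
          }

          final Set<ReaderFinishedListener> readerFinishedListeners =
              Collections.synchronizedSet(new HashSet<ReaderFinishedListener>());

          SubReaderSketch newSubReader() {
            return new SubReaderSketch(readerFinishedListeners); // same instance, not a copy
          }

          static class SubReaderSketch {
            final Set<ReaderFinishedListener> readerFinishedListeners;

            SubReaderSketch(Set<ReaderFinishedListener> shared) {
              this.readerFinishedListeners = shared;
            }
          }
        }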

        Shay Banon added a comment -

        A MapBackedSet implementation that can wrap a CHM to get a concurrent set implementation. We can consider using that instead of the sync set and copy-on-read when notifying listeners.
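        A minimal sketch of what such a MapBackedSet can look like (assuming the attached class works roughly along these lines; the actual file may differ in details):

        import java.util.AbstractSet;
        import java.util.Iterator;
        import java.util.Map;

        // A Set view over any Map (e.g. a ConcurrentHashMap), giving a concurrent set
        // without wrapping every operation in a global synchronized block.
        final class MapBackedSet<E> extends AbstractSet<E> {
          private final Map<E, Boolean> map;

          MapBackedSet(Map<E, Boolean> map) {
            this.map = map;
          }

          @Override
          public int size() {
            return map.size();
          }

          @Override
          public boolean contains(Object o) {
            return map.containsKey(o);
          }

          @Override
          public boolean add(E e) {
            return map.put(e, Boolean.TRUE) == null;
          }

          @Override
          public boolean remove(Object o) {
            return map.remove(o) != null;
          }

          @Override
          public Iterator<E> iterator() {
            return map.keySet().iterator();
          }
        }

        Wrapping a new ConcurrentHashMap in such a class is what the "MBS(CHM)" shorthand in the follow-up comment refers to.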

        Michael McCandless added a comment -

        OK, I cutover to MBS(CHM)! Thanks Shay.

        Uwe Schindler added a comment -

        I would make the map wrapper final; else no comments for now, but I want to look into the code to see if, for perf reasons, more AbstractSet methods should be overridden. Is this map backed set some impl from e.g. Commons Collections or Google Collections?

        Grant Ingersoll added a comment -

        Bulk close for 3.1


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Shay Banon
          • Votes:
            0
            Watchers:
            2
