Lucene - Core
  1. Lucene - Core
  2. LUCENE-2135

IndexReader.close should forcefully evict entries from FieldCache

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.3, 3.0.2, 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff of java-user thread "heap memory issues when sorting by a string field".

      We rely on WeakHashMap to hold our FieldCache, keyed by reader. But this lacks immediacy on releasing the reference, after a reader is closed.

      WeakHashMap can't free the key until the reader is no longer referenced by the app. And, apparently, WeakHashMap has a further impl detail that requires invoking one of its methods for it to notice that a key has just become only weakly reachable.

      To fix this, I think on IR.close we should evict entries from the FieldCache, as long as the sub-readers are truly closed (refCount dropped to 0).

      1. LUCENE-2135.patch
        11 kB
        Michael McCandless
      2. LUCENE-2135.patch
        11 kB
        Michael McCandless
      3. LUCENE-2135.patch
        6 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          Michael McCandless added a comment -

          This is not unlike what we had to do in creating CloseableThreadLocal... that was another case where the underlying impl failed to free things as immediately as we'd like.

          Show
          Michael McCandless added a comment - This is not unlike what we had to do in creating CloseableThreadLocal... that was another case where the underlying impl failed to free things as immediately as we'd like.
          Hide
          Earwin Burrfoot added a comment -

          A better approach is to don IR-keyed weakHashMaps completely and bind everything you need onto IR itself. That's how I do it and it works like a charm.

          Show
          Earwin Burrfoot added a comment - A better approach is to don IR-keyed weakHashMaps completely and bind everything you need onto IR itself. That's how I do it and it works like a charm.
          Hide
          Uwe Schindler added a comment -
          Show
          Uwe Schindler added a comment - LUCENE-831 ...
          Hide
          Yonik Seeley added a comment -

          A better approach is to don IR-keyed weakHashMaps completely and bind everything you need onto IR itself. That's how I do it and it works like a charm.

          That would be nice... I'd love to see a
          Map<Object,Object> IndexReader.getInfo()
          That was usable by anyone (not just the field cache) to associate stuff with a reader.

          Show
          Yonik Seeley added a comment - A better approach is to don IR-keyed weakHashMaps completely and bind everything you need onto IR itself. That's how I do it and it works like a charm. That would be nice... I'd love to see a Map<Object,Object> IndexReader.getInfo() That was usable by anyone (not just the field cache) to associate stuff with a reader.
          Hide
          Earwin Burrfoot added a comment -

          I'd love to see a Map<Object,Object> IndexReader.getInfo()

          I'm currently using - <T> T IndexReader.component(Class<T> key)
          Plus a bundle of factories passed to IR on construction. Factories are called after IR is initialized, and also for child IRs and reopens. In case of reopens, besides new IR they are handed the component they produced for the current one (probably better just to pass old IR).

          I can try to conjure a patch this weekend.

          Show
          Earwin Burrfoot added a comment - I'd love to see a Map<Object,Object> IndexReader.getInfo() I'm currently using - <T> T IndexReader.component(Class<T> key) Plus a bundle of factories passed to IR on construction. Factories are called after IR is initialized, and also for child IRs and reopens. In case of reopens, besides new IR they are handed the component they produced for the current one (probably better just to pass old IR). I can try to conjure a patch this weekend.
          Hide
          Uwe Schindler added a comment -

          I'm currently using - <T> T IndexReader.component(Class<T> key)

          Thats much better than untyped. Same like AttributeSource.

          +1 for a patch.

          Show
          Uwe Schindler added a comment - I'm currently using - <T> T IndexReader.component(Class<T> key) Thats much better than untyped. Same like AttributeSource. +1 for a patch.
          Hide
          Christian Kohlschütter added a comment -

          Please see LUCENE-2133 for a refactoring of FieldCache, which also addresses these problems.

          Show
          Christian Kohlschütter added a comment - Please see LUCENE-2133 for a refactoring of FieldCache, which also addresses these problems.
          Hide
          Earwin Burrfoot added a comment -

          Please see LUCENE-2133 for a refactoring of FieldCache, which also addresses these problems.

          It doesn't address a problem of adding custom components to IR. It also does complicate IR beyond that unholy mess it already is.

          I think it's better to have an ability to add 'any' kind of component to IR, and then implement whateverCaches over it.

          Show
          Earwin Burrfoot added a comment - Please see LUCENE-2133 for a refactoring of FieldCache, which also addresses these problems. It doesn't address a problem of adding custom components to IR. It also does complicate IR beyond that unholy mess it already is. I think it's better to have an ability to add 'any' kind of component to IR, and then implement whateverCaches over it.
          Hide
          Christian Kohlschütter added a comment -

          I haven't followed the aforementioned discussion on the mailing list, but I think this issue covers a few things that are not mentioned explicitly here. Maybe it is a good idea to summarize the actual problems/challenges/benefits in a few sentences?

          What I understand is that you plan to add arbitrary (cacheable) attributes to IndexReader. I suggest to move these features to the IndexCache proposed in LUCENE-2133. Especially when using decorating IndexReaders (things like "ReadOnlyIndexReader") you would not want to store attributes separately from the decorated IndexReader. The same probably applies to SegmentReader with all its clones.

          IndexCache would provide a common base for the extensions you mentioned. (i.e. you are welcome to apply your patches on top of LUCENE-2133).

          Show
          Christian Kohlschütter added a comment - I haven't followed the aforementioned discussion on the mailing list, but I think this issue covers a few things that are not mentioned explicitly here. Maybe it is a good idea to summarize the actual problems/challenges/benefits in a few sentences? What I understand is that you plan to add arbitrary (cacheable) attributes to IndexReader. I suggest to move these features to the IndexCache proposed in LUCENE-2133 . Especially when using decorating IndexReaders (things like "ReadOnlyIndexReader") you would not want to store attributes separately from the decorated IndexReader. The same probably applies to SegmentReader with all its clones. IndexCache would provide a common base for the extensions you mentioned. (i.e. you are welcome to apply your patches on top of LUCENE-2133 ).
          Hide
          Michael McCandless added a comment -

          I would love to see a bigger solution here, but in the interim, I
          think we should fix the current FieldCache (patch attached).

          The patch adds FieldCache.purge to the interface. This is technically
          a break in back-compat, to any external impls of FieldCache, but
          that's such an insanely expert & difficult thing that I think it's
          fine to make an exception.

          A few tests (incl back-compat) needed fixing, because they were
          closing the reader in-between to calls to getInts and then incorrectly
          asserting the int[]'s were the same.

          Show
          Michael McCandless added a comment - I would love to see a bigger solution here, but in the interim, I think we should fix the current FieldCache (patch attached). The patch adds FieldCache.purge to the interface. This is technically a break in back-compat, to any external impls of FieldCache, but that's such an insanely expert & difficult thing that I think it's fine to make an exception. A few tests (incl back-compat) needed fixing, because they were closing the reader in-between to calls to getInts and then incorrectly asserting the int[]'s were the same.
          Hide
          Uwe Schindler added a comment -

          In 2.9. we wrote in the BW section, that FieldCache interface is no BW problem as nobody ever can implement it (because the FileCacheImpl singleton is the only used one). Ok, you can implement it without any use.

          Show
          Uwe Schindler added a comment - In 2.9. we wrote in the BW section, that FieldCache interface is no BW problem as nobody ever can implement it (because the FileCacheImpl singleton is the only used one). Ok, you can implement it without any use.
          Hide
          Uwe Schindler added a comment -

          In my opinion, all IndexReaders should call purge, mabe put it on toplevel IR.close default impl? Because if you request FieldCache from Top-level (which you should not do, but you can), it should also be purged.

          Show
          Uwe Schindler added a comment - In my opinion, all IndexReaders should call purge, mabe put it on toplevel IR.close default impl? Because if you request FieldCache from Top-level (which you should not do, but you can), it should also be purged.
          Hide
          Michael McCandless added a comment -

          all IndexReaders should call purge, mabe put it on toplevel IR.close default impl?

          Hmm... this actually gets tricky to get right, because of the FieldCacheKey.

          EG on closing a SegmentReader that's a clone of another, you don't want to evict it from the FieldCache.

          I guess I could fix each of the IndexReader subclasses to evict themselves from the cache. Let me look into that...

          Show
          Michael McCandless added a comment - all IndexReaders should call purge, mabe put it on toplevel IR.close default impl? Hmm... this actually gets tricky to get right, because of the FieldCacheKey. EG on closing a SegmentReader that's a clone of another, you don't want to evict it from the FieldCache. I guess I could fix each of the IndexReader subclasses to evict themselves from the cache. Let me look into that...
          Hide
          Yonik Seeley added a comment -

          Hmm... this actually gets tricky to get right, because of the FieldCacheKey.

          It's almost like we want two caches... one with entries that are independent of any changes in deleted docs (like the current FieldCache), and one that isn't.

          Show
          Yonik Seeley added a comment - Hmm... this actually gets tricky to get right, because of the FieldCacheKey. It's almost like we want two caches... one with entries that are independent of any changes in deleted docs (like the current FieldCache), and one that isn't.
          Hide
          Michael McCandless added a comment -

          New patch, also evicts the other subclasses of IR from FieldCache.

          Show
          Michael McCandless added a comment - New patch, also evicts the other subclasses of IR from FieldCache.
          Hide
          Uwe Schindler added a comment -

          +1

          I just noticed, it is even possible to retrieve a field cache from the FilterIndexReader and that would be a duplicate of the dlegate's cache. Very ugly.

          Show
          Uwe Schindler added a comment - +1 I just noticed, it is even possible to retrieve a field cache from the FilterIndexReader and that would be a duplicate of the dlegate's cache. Very ugly.
          Hide
          Michael McCandless added a comment -

          it is even possible to retrieve a field cache from the FilterIndexReader and that would be a duplicate of the dlegate's cache.

          Yeah, not good. Should we default getFieldCacheKey to delegate? A subclass of FIR would presumably need to then override if their filtering altered what's in the field cache.

          Show
          Michael McCandless added a comment - it is even possible to retrieve a field cache from the FilterIndexReader and that would be a duplicate of the dlegate's cache. Yeah, not good. Should we default getFieldCacheKey to delegate? A subclass of FIR would presumably need to then override if their filtering altered what's in the field cache.
          Hide
          Uwe Schindler added a comment -

          +1 Good idea, just add a note to the method javadocs, that you have to override this, if you change the contents by the filter.

          Show
          Uwe Schindler added a comment - +1 Good idea, just add a note to the method javadocs, that you have to override this, if you change the contents by the filter.
          Hide
          Christian Kohlschütter added a comment -

          Honestly, please have a look at LUCENE-2133, I really think it is a good starting point to solve all these problems. Could we perhaps merge the two issues (LUCENE-2133 and LUCENE-2135)?

          A quick summary of LUCENE-2133:

          The patch allows one or more IndexReaders to share common cache information (whatever this is), stored in the same "IndexCache" instance. The IndexCache is designed to contain any cacheable/volatile information that can be regenerated from the IndexReader.

          For example: all clones of SegmentReader share the same SegmentReaderIndexCache with the original instance, containing the ThreadLocals of the "core reader".
          By default (for all IndexReader classes) the IndexCache provides access to the "IndexFieldCache" (a non-static reimplementation of FieldCache).

          To provide arbitrary cacheable objects we could now extend IndexCache by a simple HashMap (it does not need to be a WeakHashMap, since the IndexCache is closed and purged as soon as the original IndexReader is closed).

          If you wish so, with the help of IndexCache we might even easily implement two different field caches for the same IndexCache instance, one that changes with deleted docs and another one that does not. Basically we may add any other kind of cache at a later point without touching IndexReader again. To re-use Earwin Burrfoot statement from above, that would then not "complicate IndexReader beyond that unholy mess it already is."

          Show
          Christian Kohlschütter added a comment - Honestly, please have a look at LUCENE-2133 , I really think it is a good starting point to solve all these problems. Could we perhaps merge the two issues ( LUCENE-2133 and LUCENE-2135 )? A quick summary of LUCENE-2133 : The patch allows one or more IndexReaders to share common cache information (whatever this is), stored in the same "IndexCache" instance. The IndexCache is designed to contain any cacheable/volatile information that can be regenerated from the IndexReader. For example: all clones of SegmentReader share the same SegmentReaderIndexCache with the original instance, containing the ThreadLocals of the "core reader". By default (for all IndexReader classes) the IndexCache provides access to the "IndexFieldCache" (a non-static reimplementation of FieldCache). To provide arbitrary cacheable objects we could now extend IndexCache by a simple HashMap (it does not need to be a WeakHashMap, since the IndexCache is closed and purged as soon as the original IndexReader is closed). If you wish so, with the help of IndexCache we might even easily implement two different field caches for the same IndexCache instance, one that changes with deleted docs and another one that does not. Basically we may add any other kind of cache at a later point without touching IndexReader again. To re-use Earwin Burrfoot statement from above, that would then not "complicate IndexReader beyond that unholy mess it already is."
          Hide
          Michael McCandless added a comment -

          I definitely plan to have a look at LUCENE-2133, but that's a rather large (and, good, on first read!) change to Lucene. I just don't think it should hold this small change up.

          Show
          Michael McCandless added a comment - I definitely plan to have a look at LUCENE-2133 , but that's a rather large (and, good, on first read!) change to Lucene. I just don't think it should hold this small change up.
          Hide
          Michael McCandless added a comment -

          New patch, adds override to FIR.getFieldCacheKey

          Show
          Michael McCandless added a comment - New patch, adds override to FIR.getFieldCacheKey
          Hide
          Earwin Burrfoot added a comment -

          To provide arbitrary cacheable objects we could now extend IndexCache by a simple HashMap (it does not need to be a WeakHashMap, since the IndexCache is closed and purged as soon as the original IndexReader is closed).

          If I'm reading your patch right, to add something as a user to DirectoryReader+SegmentReader, I have to extend SegmentReaderIndexCache, IndexCache, then extend SegmentReader and DirectoryReader, and override all methods in DirectoryReader that create SegmentReader.

          My aim is to be able to bind stuff to readers without overriding them, delegating, or touching in any manner except providing certain factories on creation.

          Show
          Earwin Burrfoot added a comment - To provide arbitrary cacheable objects we could now extend IndexCache by a simple HashMap (it does not need to be a WeakHashMap, since the IndexCache is closed and purged as soon as the original IndexReader is closed). If I'm reading your patch right, to add something as a user to DirectoryReader+SegmentReader, I have to extend SegmentReaderIndexCache, IndexCache, then extend SegmentReader and DirectoryReader, and override all methods in DirectoryReader that create SegmentReader. My aim is to be able to bind stuff to readers without overriding them, delegating, or touching in any manner except providing certain factories on creation.
          Hide
          Christian Kohlschütter added a comment - - edited

          If I'm reading your patch right, to add something as a user to DirectoryReader+SegmentReader, I have to extend SegmentReaderIndexCache, IndexCache, then extend SegmentReader and DirectoryReader, and override all methods in DirectoryReader that create SegmentReader.

          No, the functionality you intend to have is just not there at the moment. But it could be added directly to IndexCache (and thus, to all subclasses of IndexCache automatically).

          My aim is to be able to bind stuff to readers without overriding them, delegating, or touching in any manner except providing certain factories on creation.

          We could add getProperty/setProperty methods to IndexCache. You could then bind/get arbitrary objects as follows:

          IndexReader ir = // somehow create your new reader
          IndexCache cache = ir.getIndexCache();
          Object someObject = cache.getProperty(someKey);
          cache.setProperty("org.example.someCoolProperty", anotherObject);

          (Personally, I prefer standardized string keys to avoid collisions, just like in the Servlet API, for example)

          Again, this is not yet implemented, but could be done easily, without affecting any existing IndexReader or the other changes on FieldCache etc.

          Show
          Christian Kohlschütter added a comment - - edited If I'm reading your patch right, to add something as a user to DirectoryReader+SegmentReader, I have to extend SegmentReaderIndexCache, IndexCache, then extend SegmentReader and DirectoryReader, and override all methods in DirectoryReader that create SegmentReader. No, the functionality you intend to have is just not there at the moment. But it could be added directly to IndexCache (and thus, to all subclasses of IndexCache automatically). My aim is to be able to bind stuff to readers without overriding them, delegating, or touching in any manner except providing certain factories on creation. We could add getProperty/setProperty methods to IndexCache. You could then bind/get arbitrary objects as follows: IndexReader ir = // somehow create your new reader IndexCache cache = ir.getIndexCache(); Object someObject = cache.getProperty(someKey); cache.setProperty("org.example.someCoolProperty", anotherObject); (Personally, I prefer standardized string keys to avoid collisions, just like in the Servlet API, for example) Again, this is not yet implemented, but could be done easily, without affecting any existing IndexReader or the other changes on FieldCache etc.
          Hide
          Christian Kohlschütter added a comment -

          There is some discussion in LUCENE-2133 where we need a decision that also affects this issue. Could you please check and comment?

          Thanks!
          Christian

          Show
          Christian Kohlschütter added a comment - There is some discussion in LUCENE-2133 where we need a decision that also affects this issue. Could you please check and comment? Thanks! Christian
          Hide
          Michael McCandless added a comment -

          backport

          Show
          Michael McCandless added a comment - backport

            People

            • Assignee:
              Michael McCandless
              Reporter:
              Michael McCandless
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development