Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7410

Make cache keys and closed listeners less trappy

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      IndexReader currently exposes getCoreCacheKey(), getCombinedCoreAndDeletesKey(), addCoreClosedListener() and addReaderClosedListener(). They are typically used to manage resources whose lifetime needs to mimic the lifetime of segments/indexes, typically caches.

      I think this is trappy for various reasons:

      Memory leaks

      When maintaining a cache, entries are added to the cache based on the cache key and then evicted using the cache key that is given back by the close listener, so it is very important that both keys are the same.

      But if a filter reader happens to delegate get*Key() and not add*ClosedListener() or vice-versa then there is potential for a memory leak since the closed listener will be called on a different key and entries will never be evicted from the cache.

      Lifetime expectations

      The expectation of using the core cache key is that it will not change in case of deletions, but this is only true on SegmentReader and LeafReader impls that delegate to it. Other implementations such as composite readers or parallel leaf readers use the same key for "core" and "combined core and deletes".

      Throw-away wrappers cause cache trashing

      An application might want to either expose more (with a ParrallelReader or MultiReader) or less information (by filtering fields/docs that can be seen) depending on the user who is logged in. In that case the application would typically maintain a DirectoryReader and then wrap it per request depending on the logged user and throw away the wrapper once the request is completed.

      The problem is that these wrappers have their own cache keys and the application may build something costly and put it in a cache to throw it away a couple milliseconds later. I would rather like for such readers to have a way to opt out from caching on order to avoid this performance trap.

      Type safety

      The keys that are exposed are plain java.lang.Object instances, which requires caches to look like a Map<Object, ?> which makes it very easy to either try to get, put or remove on the wrong object since any object would be accepted.

      1. LUCENE-7410.patch
        165 kB
        Adrien Grand
      2. LUCENE-7410.patch
        159 kB
        Adrien Grand
      3. LUCENE-7410.patch
        140 kB
        Adrien Grand

        Activity

        Hide
        rcmuir Robert Muir added a comment -

        getCombinedCoreAndDeletesKey(): what uses this one? Can we remove it?

        Show
        rcmuir Robert Muir added a comment - getCombinedCoreAndDeletesKey() : what uses this one? Can we remove it?
        Hide
        jpountz Adrien Grand added a comment -

        Agreed. I'm also looking into removing the ability to remove closed listeners.

        Show
        jpountz Adrien Grand added a comment - Agreed. I'm also looking into removing the ability to remove closed listeners.
        Hide
        jpountz Adrien Grand added a comment -

        I have a patch which makes the situation better I think:

        • the ability to remove a listener is gone
        • there is no "core" cache key or listener on IndexReader anymore, only on LeafReader
        • cache key and listener registration have moved to the IndexReader.CacheHelper class so that it is clear which key relates to which listener. It also makes it very hard to propagate the cache key from a filtered reader without propagating the listener registration or vice versa, you cannot do it by mistake anymore.
        • IndexReader.addReaderClosedListener and IndexReader.getCombinedCoreAndDeletesKey have been replaced by IndexReader.getReaderCacheHelper, which returns null by default, meaning that the reader is not suited for caching
        • IndexReader.addCoreClosedListener and IndexReader.getCoreCacheKey() have been replaced by LeafReader.getCoreCacheHelper, which returns null by default, meaning that this leaf reader has no concept of "core" data
        • there is only one impl that actually implements LeafReader.getCoreCacheHelper: SegmentReader. All other impls either delegate to it or do not support caching on a core key.
        • there are only two impls that actually implement IndexReader.getReaderCacheHelper: SegmentReader and StandardDirectoryReader. All other impls either delegate to it or do not support caching.
        • the query cache and BitSetProducer for joins skip caching when LeafReader.getCoreCacheHelper returns null. Other APIs like segment-based replication or FieldCache fail with an exception since not being able to cache is a problem/performance trap in those cases.
        • IndexReader.CacheKey is used as a cache key to avoid type safety issues.

        On the cons side, I removed insanity checking since I could not implement it anymore with the new API but maybe it is not that much of an issue since cache insanity is not really possible anymore unless you really want it? I also found some usage of cache keys in Solr which can be dangerous since cache keys are shared between readers that have different content, I think it should be fine given how these readers are used (I left notes in the patch), but that is something we might still want to look into since it could cause very subtle bugs.

        Show
        jpountz Adrien Grand added a comment - I have a patch which makes the situation better I think: the ability to remove a listener is gone there is no "core" cache key or listener on IndexReader anymore, only on LeafReader cache key and listener registration have moved to the IndexReader.CacheHelper class so that it is clear which key relates to which listener. It also makes it very hard to propagate the cache key from a filtered reader without propagating the listener registration or vice versa, you cannot do it by mistake anymore. IndexReader.addReaderClosedListener and IndexReader.getCombinedCoreAndDeletesKey have been replaced by IndexReader.getReaderCacheHelper , which returns null by default, meaning that the reader is not suited for caching IndexReader.addCoreClosedListener and IndexReader.getCoreCacheKey() have been replaced by LeafReader.getCoreCacheHelper , which returns null by default, meaning that this leaf reader has no concept of "core" data there is only one impl that actually implements LeafReader.getCoreCacheHelper : SegmentReader . All other impls either delegate to it or do not support caching on a core key. there are only two impls that actually implement IndexReader.getReaderCacheHelper : SegmentReader and StandardDirectoryReader . All other impls either delegate to it or do not support caching. the query cache and BitSetProducer for joins skip caching when LeafReader.getCoreCacheHelper returns null. Other APIs like segment-based replication or FieldCache fail with an exception since not being able to cache is a problem/performance trap in those cases. IndexReader.CacheKey is used as a cache key to avoid type safety issues. On the cons side, I removed insanity checking since I could not implement it anymore with the new API but maybe it is not that much of an issue since cache insanity is not really possible anymore unless you really want it? I also found some usage of cache keys in Solr which can be dangerous since cache keys are shared between readers that have different content, I think it should be fine given how these readers are used (I left notes in the patch), but that is something we might still want to look into since it could cause very subtle bugs.
        Hide
        jpountz Adrien Grand added a comment -

        Any opinions?

        Show
        jpountz Adrien Grand added a comment - Any opinions?
        Hide
        rcmuir Robert Muir added a comment -

        At a glance a lot of these ideas sound great, but i need some time to look at the patch in detail, i only glanced thru it.

        things like switching the filterreader default to not caching at all, well thats arguably less trappy, but at the same time a risky change. Are all of our filter readers really correct/optimal now? Maybe its just better for these classes to redeclare the method as 'abstract', forcing implementations to deal with it? Personally I think thats the best solution for this case.

        Not a huge fan of the name CacheHelper, at the same time I don't have a better suggestion.

        Show
        rcmuir Robert Muir added a comment - At a glance a lot of these ideas sound great, but i need some time to look at the patch in detail, i only glanced thru it. things like switching the filterreader default to not caching at all, well thats arguably less trappy, but at the same time a risky change. Are all of our filter readers really correct/optimal now? Maybe its just better for these classes to redeclare the method as 'abstract', forcing implementations to deal with it? Personally I think thats the best solution for this case. Not a huge fan of the name CacheHelper, at the same time I don't have a better suggestion.
        Hide
        jpountz Adrien Grand added a comment -

        Thanks for having a look. I agree the name is not ideal, but could not come up with a better alternative either.

        Here is an updated patch that makes these methods abstract in IndexReader/LeafReader and not extended by the Filter* impls so that every singe implementation needs to decide what to do.

        Show
        jpountz Adrien Grand added a comment - Thanks for having a look. I agree the name is not ideal, but could not come up with a better alternative either. Here is an updated patch that makes these methods abstract in IndexReader/LeafReader and not extended by the Filter* impls so that every singe implementation needs to decide what to do.
        Hide
        jpountz Adrien Grand added a comment -

        I rebased this patch to current master. I would appreciate if someone can have a look as I think it'd be nice to make caching less trappy for 7.0.

        Show
        jpountz Adrien Grand added a comment - I rebased this patch to current master. I would appreciate if someone can have a look as I think it'd be nice to make caching less trappy for 7.0.
        Hide
        mikemccand Michael McCandless added a comment -

        +1, this is a great infrastructure API improvement, thanks Adrien Grand. I only saw some minor issues:

        Can/should we ensureOpen in the getReaderCacheHelper and
        addClosedListener impls? Just seems like it'd be some good extra
        safety to catch accidental mis-use?

        There is a typo (like -> live) here:

        +    // TODO: this is trappy as the expectation is that core keys like for a long
        

        And a missing d in StandarDirectoryReader in the exception message here:

        +    if (cacheHelper == null) {
        +      throw new IllegalStateException("StandarDirectoryReader must support caching");
        +    }
        

        And extra s here:

        +   * valuess updates may be considered equal. Consider using
        
        Show
        mikemccand Michael McCandless added a comment - +1, this is a great infrastructure API improvement, thanks Adrien Grand . I only saw some minor issues: Can/should we ensureOpen in the getReaderCacheHelper and addClosedListener impls? Just seems like it'd be some good extra safety to catch accidental mis-use? There is a typo (like -> live) here: + // TODO: this is trappy as the expectation is that core keys like for a long And a missing d in StandarDirectoryReader in the exception message here: + if (cacheHelper == null) { + throw new IllegalStateException("StandarDirectoryReader must support caching"); + } And extra s here: + * valuess updates may be considered equal. Consider using
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit df6f83072303b4891a296b700a50c743284d3c30 in lucene-solr's branch refs/heads/master from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=df6f830 ]

        LUCENE-7410: Make cache keys and close listeners less trappy.

        Show
        jira-bot ASF subversion and git services added a comment - Commit df6f83072303b4891a296b700a50c743284d3c30 in lucene-solr's branch refs/heads/master from Adrien Grand [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=df6f830 ] LUCENE-7410 : Make cache keys and close listeners less trappy.
        Hide
        jpountz Adrien Grand added a comment -

        Thanks Mike.

        Show
        jpountz Adrien Grand added a comment - Thanks Mike.
        Hide
        steve_rowe Steve Rowe added a comment -

        My Jenkins found a reproducing seed for a TestReaderClosed.testReaderChaining() failure, and git bisect running the repro line says:

        df6f83072303b4891a296b700a50c743284d3c30 is the first bad commit
        commit df6f83072303b4891a296b700a50c743284d3c30
        Author: Adrien Grand <jpountz@gmail.com>
        Date:   Tue Feb 28 14:21:30 2017 +0100
        
            LUCENE-7410: Make cache keys and close listeners less trappy.
        
           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestReaderClosed -Dtests.method=testReaderChaining -Dtests.seed=C4374342D2D99B8F -Dtests.slow=true -Dtests.locale=hi -Dtests.timezone=America/Dominica -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
           [junit4] FAILURE 0.04s J1 | TestReaderClosed.testReaderChaining <<<
           [junit4]    > Throwable #1: java.lang.AssertionError: Query failed, but not due to an AlreadyClosedException
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([C4374342D2D99B8F:530116CD6543D942]:0)
           [junit4]    > 	at org.apache.lucene.index.TestReaderClosed.testReaderChaining(TestReaderClosed.java:96)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
           [junit4]   2> NOTE: test params are: codec=Asserting(Lucene70): {field=PostingsFormat(name=LuceneVarGapFixedInterval)}, docValues:{}, maxPointsInLeafNode=1885, maxMBSortInHeap=6.663525927605304, sim=RandomSimilarity(queryNorm=true): {}, locale=hi, timezone=America/Dominica
           [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_77 (64-bit)/cpus=16,threads=1,free=82013152,total=289931264
        
        Show
        steve_rowe Steve Rowe added a comment - My Jenkins found a reproducing seed for a TestReaderClosed.testReaderChaining() failure, and git bisect running the repro line says: df6f83072303b4891a296b700a50c743284d3c30 is the first bad commit commit df6f83072303b4891a296b700a50c743284d3c30 Author: Adrien Grand <jpountz@gmail.com> Date: Tue Feb 28 14:21:30 2017 +0100 LUCENE-7410: Make cache keys and close listeners less trappy. [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestReaderClosed -Dtests.method=testReaderChaining -Dtests.seed=C4374342D2D99B8F -Dtests.slow=true -Dtests.locale=hi -Dtests.timezone=America/Dominica -Dtests.asserts=true -Dtests.file.encoding=US-ASCII [junit4] FAILURE 0.04s J1 | TestReaderClosed.testReaderChaining <<< [junit4] > Throwable #1: java.lang.AssertionError: Query failed, but not due to an AlreadyClosedException [junit4] > at __randomizedtesting.SeedInfo.seed([C4374342D2D99B8F:530116CD6543D942]:0) [junit4] > at org.apache.lucene.index.TestReaderClosed.testReaderChaining(TestReaderClosed.java:96) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=Asserting(Lucene70): {field=PostingsFormat(name=LuceneVarGapFixedInterval)}, docValues:{}, maxPointsInLeafNode=1885, maxMBSortInHeap=6.663525927605304, sim=RandomSimilarity(queryNorm=true): {}, locale=hi, timezone=America/Dominica [junit4] 2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_77 (64-bit)/cpus=16,threads=1,free=82013152,total=289931264
        Hide
        jpountz Adrien Grand added a comment -

        Thanks, I'll dig!

        Show
        jpountz Adrien Grand added a comment - Thanks, I'll dig!
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 540a23723104e250a4fce94042fb90c86fcf3720 in lucene-solr's branch refs/heads/master from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=540a237 ]

        LUCENE-7410: Make TestReaderClosed pass if the IndexSearcher wraps a threadpool.

        Show
        jira-bot ASF subversion and git services added a comment - Commit 540a23723104e250a4fce94042fb90c86fcf3720 in lucene-solr's branch refs/heads/master from Adrien Grand [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=540a237 ] LUCENE-7410 : Make TestReaderClosed pass if the IndexSearcher wraps a threadpool.
        Hide
        jpountz Adrien Grand added a comment -

        It was a test bug introduced by this change, which happens when the created IndexSearcher wraps a threadpool. I just pushed a fix.

        Show
        jpountz Adrien Grand added a comment - It was a test bug introduced by this change, which happens when the created IndexSearcher wraps a threadpool. I just pushed a fix.
        Hide
        steve_rowe Steve Rowe added a comment -

        Policeman Jenkins found a reproducing seed for a test method introduced on this issue https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/19092/ (at https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=df6f830 - seed reproduces with this hash):

           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestSlowCompositeReaderWrapper -Dtests.method=testOrdMapsAreCached -Dtests.seed=ED06E16D2F7E2C9F -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=en-BS -Dtests.timezone=Asia/Kashgar -Dtests.asserts=true -Dtests.file.encoding=UTF-8
           [junit4] FAILURE 0.01s J2 | TestSlowCompositeReaderWrapper.testOrdMapsAreCached <<<
           [junit4]    > Throwable #1: java.lang.AssertionError
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([ED06E16D2F7E2C9F:B1C7CA871734CF6C]:0)
           [junit4]    > 	at org.apache.solr.index.TestSlowCompositeReaderWrapper.testOrdMapsAreCached(TestSlowCompositeReaderWrapper.java:111)
           [junit4]    > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           [junit4]    > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           [junit4]    > 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           [junit4]    > 	at java.base/java.lang.reflect.Method.invoke(Method.java:547)
           [junit4]    > 	at java.base/java.lang.Thread.run(Thread.java:844)
           [junit4]   2> NOTE: test params are: codec=Asserting(Lucene70): {}, docValues:{sorted=DocValuesFormat(name=Lucene70), sorted_set=DocValuesFormat(name=Lucene70)}, maxPointsInLeafNode=1289, maxMBSortInHeap=5.014157490710387, sim=RandomSimilarity(queryNorm=true): {}, locale=en-BS, timezone=Asia/Kashgar
           [junit4]   2> NOTE: Linux 4.4.0-53-generic amd64/Oracle Corporation 9-ea (64-bit)/cpus=12,threads=1,free=204857864,total=518979584
        
        Show
        steve_rowe Steve Rowe added a comment - Policeman Jenkins found a reproducing seed for a test method introduced on this issue https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/19092/ (at https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=df6f830 - seed reproduces with this hash): [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestSlowCompositeReaderWrapper -Dtests.method=testOrdMapsAreCached -Dtests.seed=ED06E16D2F7E2C9F -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=en-BS -Dtests.timezone=Asia/Kashgar -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 0.01s J2 | TestSlowCompositeReaderWrapper.testOrdMapsAreCached <<< [junit4] > Throwable #1: java.lang.AssertionError [junit4] > at __randomizedtesting.SeedInfo.seed([ED06E16D2F7E2C9F:B1C7CA871734CF6C]:0) [junit4] > at org.apache.solr.index.TestSlowCompositeReaderWrapper.testOrdMapsAreCached(TestSlowCompositeReaderWrapper.java:111) [junit4] > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit4] > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [junit4] > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [junit4] > at java.base/java.lang.reflect.Method.invoke(Method.java:547) [junit4] > at java.base/java.lang.Thread.run(Thread.java:844) [junit4] 2> NOTE: test params are: codec=Asserting(Lucene70): {}, docValues:{sorted=DocValuesFormat(name=Lucene70), sorted_set=DocValuesFormat(name=Lucene70)}, maxPointsInLeafNode=1289, maxMBSortInHeap=5.014157490710387, sim=RandomSimilarity(queryNorm=true): {}, locale=en-BS, timezone=Asia/Kashgar [junit4] 2> NOTE: Linux 4.4.0-53-generic amd64/Oracle Corporation 9-ea (64-bit)/cpus=12,threads=1,free=204857864,total=518979584
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit fbc844d33439efc1c5c6fee5547715d1a1b0db83 in lucene-solr's branch refs/heads/master from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fbc844d ]

        LUCENE-7410: Fix test bug.

        Show
        jira-bot ASF subversion and git services added a comment - Commit fbc844d33439efc1c5c6fee5547715d1a1b0db83 in lucene-solr's branch refs/heads/master from Adrien Grand [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fbc844d ] LUCENE-7410 : Fix test bug.
        Hide
        jpountz Adrien Grand added a comment -

        Thanks Steve, it was a test bug, I just pushed a fix.

        Show
        jpountz Adrien Grand added a comment - Thanks Steve, it was a test bug, I just pushed a fix.

          People

          • Assignee:
            Unassigned
            Reporter:
            jpountz Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development