Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7409

Look into making mmapdirectory's unmap safer

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.2, 7.0
    • Component/s: core/store
    • Labels:
    • Lucene Fields:
      New

      Description

      I have seen a few bugs around this recently: of course its a bug in application code but a JVM crash is not good.

      I think we should see if we can prevent the crashes better than the current weak map, e.g. make it a safer option.

      I made an ugly prototype here: https://github.com/apache/lucene-solr/compare/master...rmuir:ace?expand=1

      It has a test that crashes the JVM without the patch but passes with.

      Hacky patch only implements readBytes() but has no problems with the luceneutil benchmark (1M):

      Report after iter 19:
                          Task    QPS base      StdDev   QPS patch      StdDev                Pct diff
                        IntNRQ      105.23     (17.6%)      100.42     (10.1%)   -4.6% ( -27% -   28%)
                       Respell      128.35     (13.2%)      125.88      (7.4%)   -1.9% ( -19% -   21%)
                        Fuzzy1      110.14     (17.2%)      108.28     (13.2%)   -1.7% ( -27% -   34%)
                     LowPhrase      337.02     (13.0%)      333.72      (9.3%)   -1.0% ( -20% -   24%)
                     MedPhrase      146.44     (12.9%)      145.55      (8.0%)   -0.6% ( -19% -   23%)
                   MedSpanNear       96.85     (13.1%)       96.57      (7.8%)   -0.3% ( -18% -   23%)
                  HighSpanNear       95.85     (13.9%)       96.33      (8.2%)    0.5% ( -18% -   26%)
                    HighPhrase      146.84     (13.6%)      148.40      (8.4%)    1.1% ( -18% -   26%)
                      HighTerm      295.15     (15.8%)      298.77      (9.5%)    1.2% ( -20% -   31%)
                   LowSpanNear      268.80     (12.4%)      272.16      (7.9%)    1.2% ( -16% -   24%)
                      Wildcard      284.09     (11.7%)      290.91      (8.9%)    2.4% ( -16% -   25%)
                       Prefix3      212.50     (15.4%)      217.76     (10.0%)    2.5% ( -19% -   32%)
                     OrHighLow      358.65     (15.0%)      368.93     (10.7%)    2.9% ( -19% -   33%)
                    AndHighMed      799.65     (13.2%)      834.74      (7.8%)    4.4% ( -14% -   29%)
               MedSloppyPhrase      229.36     (15.9%)      239.95      (9.8%)    4.6% ( -18% -   36%)
                        Fuzzy2       69.58     (14.6%)       72.82     (14.5%)    4.7% ( -21% -   39%)
                   AndHighHigh      426.98     (12.8%)      451.77      (7.3%)    5.8% ( -12% -   29%)
                       MedTerm     1361.11     (14.5%)     1450.90      (9.2%)    6.6% ( -14% -   35%)
                      PKLookup      266.61     (13.4%)      284.28      (8.4%)    6.6% ( -13% -   32%)
              HighSloppyPhrase      251.22     (16.9%)      268.32     (10.7%)    6.8% ( -17% -   41%)
                     OrHighMed      235.92     (17.2%)      253.12     (12.8%)    7.3% ( -19% -   45%)
                    OrHighHigh      186.79     (13.5%)      201.15      (9.7%)    7.7% ( -13% -   35%)
               LowSloppyPhrase      395.23     (15.9%)      425.93      (9.3%)    7.8% ( -15% -   39%)
                    AndHighLow     1128.28     (14.9%)     1242.11      (8.2%)   10.1% ( -11% -   38%)
                       LowTerm     3024.62     (12.9%)     3367.65      (9.7%)   11.3% (  -9% -   39%)
      

      We should do more testing. Maybe its totally the wrong tradeoff, maybe we only need handles for getters and everything inlines correctly, rather than needing a ton for every getXYZ() method...

      1. LUCENE-7409.patch
        23 kB
        Uwe Schindler
      2. StoreStore.java
        2 kB
        Dawid Weiss

        Activity

        Hide
        mikemccand Michael McCandless added a comment -

        Bulk close resolved issues after 6.2.0 release.

        Show
        mikemccand Michael McCandless added a comment - Bulk close resolved issues after 6.2.0 release.
        Hide
        thetaphi Uwe Schindler added a comment -

        I disabled the test on master, too.

        Testing showed that we still crush quite often on multi-cpu machines if you try long enough, but thats expected and cannot be prevented.

        We can reopen the issue if somebody has a better idea, but without full locking there is no chance.

        The best on this patch is also the removal of WeakIdentiyMap, so no weak references needed.

        Show
        thetaphi Uwe Schindler added a comment - I disabled the test on master, too. Testing showed that we still crush quite often on multi-cpu machines if you try long enough, but thats expected and cannot be prevented. We can reopen the issue if somebody has a better idea, but without full locking there is no chance. The best on this patch is also the removal of WeakIdentiyMap, so no weak references needed.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit de71ed740d989599cf736dc7e4392b75251815c7 in lucene-solr's branch refs/heads/master from Uwe Schindler
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=de71ed7 ]

        LUCENE-7409: Disable test also on master

        Show
        jira-bot ASF subversion and git services added a comment - Commit de71ed740d989599cf736dc7e4392b75251815c7 in lucene-solr's branch refs/heads/master from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=de71ed7 ] LUCENE-7409 : Disable test also on master
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 99785c0e0949f4bd47486ef8dd7ff6af7ff1e3c1 in lucene-solr's branch refs/heads/branch_6x from Uwe Schindler
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=99785c0 ]

        LUCENE-7409: Fix comments as suggested by Dawid

        Show
        jira-bot ASF subversion and git services added a comment - Commit 99785c0e0949f4bd47486ef8dd7ff6af7ff1e3c1 in lucene-solr's branch refs/heads/branch_6x from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=99785c0 ] LUCENE-7409 : Fix comments as suggested by Dawid
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 97f6bb7d7ff43f4501492eee07e6a9200b402b13 in lucene-solr's branch refs/heads/master from Uwe Schindler
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=97f6bb7 ]

        LUCENE-7409: Fix comments as suggested by Dawid

        Show
        jira-bot ASF subversion and git services added a comment - Commit 97f6bb7d7ff43f4501492eee07e6a9200b402b13 in lucene-solr's branch refs/heads/master from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=97f6bb7 ] LUCENE-7409 : Fix comments as suggested by Dawid
        Hide
        thetaphi Uwe Schindler added a comment -

        Hi, I can change the comment! I was about to do the same.

        MutableCallSite does more so your test class works fine, if I use a real MutableCallSite. The store-store-barrier there is used to really do ordering (but this is related to MutableCallSite's native implementation of setTarget).

        Show
        thetaphi Uwe Schindler added a comment - Hi, I can change the comment! I was about to do the same. MutableCallSite does more so your test class works fine, if I use a real MutableCallSite. The store-store-barrier there is used to really do ordering (but this is related to MutableCallSite's native implementation of setTarget).
        Hide
        dweiss Dawid Weiss added a comment -

        Sorry to chip in late, Uwe.

        +  /** not volatile, we use store-store barrier! */
        +  private boolean invalidated = false;
        
        +      // this should trigger a happens-before - so flushes all caches
        +      barrier.lazySet(0);
        

        I wouldn't mention barriers here (I'm also pretty confident lazyset doesn't create an edge in happens-before; it indicates memory ordering and as such isn't even clearly specified in the JMM), instead making the intent of this code clearer. Something like this:

        +  /** not volatile, see comments on visibility in [...] . */
        +  private boolean invalidated = false;
        
        +      // This call should hopefully flush any CPU caches and as a result make
        +      // the "invalidated" field update visible to other threads. We specifically
        +      // don't make "invalidated" field volatile for performance reasons, hoping the JVM
        +      // won't optimize away reads of that field and hardware should ensure
        +      // caches are in sync after this call. This isn't entirely "fool-proof" 
        +      // (see LUCENE-7409 discussion), but it has been shown to work in practice
        +      // and we count on this behavior.
        +      barrier.lazySet(0);
        
        Show
        dweiss Dawid Weiss added a comment - Sorry to chip in late, Uwe. + /** not volatile , we use store-store barrier! */ + private boolean invalidated = false ; + // this should trigger a happens-before - so flushes all caches + barrier.lazySet(0); I wouldn't mention barriers here (I'm also pretty confident lazyset doesn't create an edge in happens-before; it indicates memory ordering and as such isn't even clearly specified in the JMM), instead making the intent of this code clearer. Something like this: + /** not volatile , see comments on visibility in [...] . */ + private boolean invalidated = false ; + // This call should hopefully flush any CPU caches and as a result make + // the "invalidated" field update visible to other threads. We specifically + // don't make "invalidated" field volatile for performance reasons, hoping the JVM + // won't optimize away reads of that field and hardware should ensure + // caches are in sync after this call. This isn't entirely "fool-proof" + // (see LUCENE-7409 discussion), but it has been shown to work in practice + // and we count on this behavior. + barrier.lazySet(0);
        Hide
        thetaphi Uwe Schindler added a comment -

        I keep this issue open for a while with status "in progress" to not forget to disable the test after a while. I want to run it on master several times with various Jenkins machines.

        Show
        thetaphi Uwe Schindler added a comment - I keep this issue open for a while with status "in progress" to not forget to disable the test after a while. I want to run it on master several times with various Jenkins machines.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 4ae13ef31a0f18f119c119997687b4f6e6bfd8f0 in lucene-solr's branch refs/heads/branch_6x from Uwe Schindler
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4ae13ef ]

        LUCENE-7409: Disable test on stable branch

        Show
        jira-bot ASF subversion and git services added a comment - Commit 4ae13ef31a0f18f119c119997687b4f6e6bfd8f0 in lucene-solr's branch refs/heads/branch_6x from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4ae13ef ] LUCENE-7409 : Disable test on stable branch
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit f31ec6dd64b68bd947d6c82d9d5984af5b5d9688 in lucene-solr's branch refs/heads/branch_6x from Uwe Schindler
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f31ec6d ]

        LUCENE-7409: Changed MMapDirectory's unmapping to work safer, but still with no guarantees

        Show
        jira-bot ASF subversion and git services added a comment - Commit f31ec6dd64b68bd947d6c82d9d5984af5b5d9688 in lucene-solr's branch refs/heads/branch_6x from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f31ec6d ] LUCENE-7409 : Changed MMapDirectory's unmapping to work safer, but still with no guarantees
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 48cc5999369a1f99af159aa5eb756f5c6f118594 in lucene-solr's branch refs/heads/master from Uwe Schindler
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=48cc599 ]

        LUCENE-7409: Changed MMapDirectory's unmapping to work safer, but still with no guarantees

        Show
        jira-bot ASF subversion and git services added a comment - Commit 48cc5999369a1f99af159aa5eb756f5c6f118594 in lucene-solr's branch refs/heads/master from Uwe Schindler [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=48cc599 ] LUCENE-7409 : Changed MMapDirectory's unmapping to work safer, but still with no guarantees
        Hide
        thetaphi Uwe Schindler added a comment -

        Final patch. Committing in a moment!

        Show
        thetaphi Uwe Schindler added a comment - Final patch. Committing in a moment!
        Hide
        thetaphi Uwe Schindler added a comment -

        I committed a @Ignore with explanation to the test, but I commented it out. I'd like to run it a while on Jenkins, especially Multi-CPU machines like the Apache or Policeman one. Multi-CPU machines are morke likely to crush.

        Show
        thetaphi Uwe Schindler added a comment - I committed a @Ignore with explanation to the test, but I commented it out. I'd like to run it a while on Jenkins, especially Multi-CPU machines like the Apache or Policeman one. Multi-CPU machines are morke likely to crush.
        Hide
        rcmuir Robert Muir added a comment -

        I worried about that too Dawid. So I ran the test in a variety of ways, with -Xcomp, with tests.iters to reuse the same jvm over and over, and so on. Not saying thats perfect testing of those situations, but it still worked

        Show
        rcmuir Robert Muir added a comment - I worried about that too Dawid. So I ran the test in a variety of ways, with -Xcomp, with tests.iters to reuse the same jvm over and over, and so on. Not saying thats perfect testing of those situations, but it still worked
        Hide
        dweiss Dawid Weiss added a comment -

        Its just a better best-effort

        Absolutely. I don't know how to do it better (other than providing a coarse-grained buffered reads synchronized using heavier locks).

        As for tests, I'd be curious when it fails (and if). Uwe also pointed out that we don't know (and can't be sure of) what the test is actually running – with which optimizations, when, etc. It may be passing because it didn't collect enough stats yet to optimize... This is absurdly complicated when you look at such fine-details.

        Show
        dweiss Dawid Weiss added a comment - Its just a better best-effort Absolutely. I don't know how to do it better (other than providing a coarse-grained buffered reads synchronized using heavier locks). As for tests, I'd be curious when it fails (and if). Uwe also pointed out that we don't know (and can't be sure of) what the test is actually running – with which optimizations, when, etc. It may be passing because it didn't collect enough stats yet to optimize... This is absurdly complicated when you look at such fine-details.
        Hide
        rcmuir Robert Muir added a comment -

        Its just a better best-effort Maybe we should remove the test, since there are no guarantees here? First time it fails, what will we do? we will just disable it.

        Show
        rcmuir Robert Muir added a comment - Its just a better best-effort Maybe we should remove the test, since there are no guarantees here? First time it fails, what will we do? we will just disable it.
        Hide
        dweiss Dawid Weiss added a comment -

        Did you try your quick program with a non-lazy set (store-load barrier)?

        I don't have to – it'll hang as well. That was my point. If there is a heavy loop then any access.* calls won't invalidate anything since the ensureValid method can be inlined and the read of invalidated can be moved outside of the loop (since the loop body doesn't change its value such an optimization is valid). If you dump the assembly of the program I attached you'll see this is the case here.

        In practice, if bytebuffer is used inside the loop the optimization will most likely not take place due to inline limits and I/O calls inside. But it doesn't make the solution correct – only "practically correct".

        As for store-store vs. store-load: I think you should use store-load (because you wish other "readers" to pick up changes from cache, not just serialize stores to memory). But my understanding of this may be wrong as I always found this a bit confusing.

        Show
        dweiss Dawid Weiss added a comment - Did you try your quick program with a non-lazy set (store-load barrier)? I don't have to – it'll hang as well. That was my point. If there is a heavy loop then any access.* calls won't invalidate anything since the ensureValid method can be inlined and the read of invalidated can be moved outside of the loop (since the loop body doesn't change its value such an optimization is valid). If you dump the assembly of the program I attached you'll see this is the case here. In practice, if bytebuffer is used inside the loop the optimization will most likely not take place due to inline limits and I/O calls inside. But it doesn't make the solution correct – only "practically correct". As for store-store vs. store-load: I think you should use store-load (because you wish other "readers" to pick up changes from cache, not just serialize stores to memory). But my understanding of this may be wrong as I always found this a bit confusing.
        Hide
        thetaphi Uwe Schindler added a comment -

        Did you try your quick program with a non-lazy set (store-load barrier)?

        No it doesn't - just tested

        Show
        thetaphi Uwe Schindler added a comment - Did you try your quick program with a non-lazy set (store-load barrier)? No it doesn't - just tested
        Hide
        thetaphi Uwe Schindler added a comment -

        Dawid Weiss: Exactly that what I think it does. We cannot ensure that the other thread sees the change ASAP. This is why it originally also did not work with J9. The other trick we do here is Thread.yield() before the actual unmapping. This allows any "in flight" access to the bytebuffer to finish until our own thread gets schedulaed again. Of course this is also no guarantee, but helps with J9 and works around the race condition: the time between the check for invalidated until Unsafe read could allow another thread to unmap. The yield takes us out of scheduling.

        Did you try your quick program with a non-lazy set (store-load barrier)?

        Show
        thetaphi Uwe Schindler added a comment - Dawid Weiss : Exactly that what I think it does. We cannot ensure that the other thread sees the change ASAP. This is why it originally also did not work with J9. The other trick we do here is Thread.yield() before the actual unmapping. This allows any "in flight" access to the bytebuffer to finish until our own thread gets schedulaed again. Of course this is also no guarantee, but helps with J9 and works around the race condition: the time between the check for invalidated until Unsafe read could allow another thread to unmap. The yield takes us out of scheduling. Did you try your quick program with a non-lazy set (store-load barrier)?
        Hide
        thetaphi Uwe Schindler added a comment -

        I did some minor cleanups, its ready now:

        • renamed ByteBufferAccess to ByteBufferGuard
        • made the barrier an instance field (this is more correct)
        • I removed the assume for Oracle, because this also works with J9
        • added some javadocs
        Show
        thetaphi Uwe Schindler added a comment - I did some minor cleanups, its ready now: renamed ByteBufferAccess to ByteBufferGuard made the barrier an instance field (this is more correct) I removed the assume for Oracle, because this also works with J9 added some javadocs
        Hide
        dweiss Dawid Weiss added a comment -

        An example of a "hanging" worker even with storestore memflush.

        Show
        dweiss Dawid Weiss added a comment - An example of a "hanging" worker even with storestore memflush.
        Hide
        dweiss Dawid Weiss added a comment -

        I am still not sure that the store-store barrier does what we expect

        Depends on what your expectations are. I don't full grasp the intricacies of modern CPU barriers, but to my understanding your solution enforces a cache flush. Note that this does not mean a separate thread will always pick up the change to invalidated as reads of this field can be inlined (and then don't do any memory load and in consequence no fencing will work). I can demonstrate this on a relatively simple example – see the attached code and run it. The worker beautifully hangs on my machine. That it works "in practice" is very likely a result of what you said – hotspot optimizations are guarded by very complex rules and a call to a bytebuffer probably throws the reads of that field outside of the optimization scope.

        A don't think there is any other ("correct") solution other than a volatile (which enforces a memory read) in one form or another (AtomicBoolean). The performance degradation could be to some degree avoided if we only permitted larger (block-level) access to the underlying byte buffer and then guard-checked volatile on each such block-level access... Then the performance hit of reading a volatile would be probably insignificant (but adding another layer of indirection may be).

        Show
        dweiss Dawid Weiss added a comment - I am still not sure that the store-store barrier does what we expect Depends on what your expectations are. I don't full grasp the intricacies of modern CPU barriers, but to my understanding your solution enforces a cache flush. Note that this does not mean a separate thread will always pick up the change to invalidated as reads of this field can be inlined (and then don't do any memory load and in consequence no fencing will work). I can demonstrate this on a relatively simple example – see the attached code and run it. The worker beautifully hangs on my machine. That it works "in practice" is very likely a result of what you said – hotspot optimizations are guarded by very complex rules and a call to a bytebuffer probably throws the reads of that field outside of the optimization scope. A don't think there is any other ("correct") solution other than a volatile (which enforces a memory read) in one form or another (AtomicBoolean). The performance degradation could be to some degree avoided if we only permitted larger (block-level) access to the underlying byte buffer and then guard-checked volatile on each such block-level access... Then the performance hit of reading a volatile would be probably insignificant (but adding another layer of indirection may be).
        Hide
        thetaphi Uwe Schindler added a comment -

        OK,
        I take the issue and provide a patch soon. I will combine all commits into one during the merge to master later (although it is Robert's branch).

        I am still not sure that the store-store barrier does what we expect, but it has shown that it works better than before. So Robert is right - there are no guarantees, it just proves to be better.

        Uwe

        Show
        thetaphi Uwe Schindler added a comment - OK, I take the issue and provide a patch soon. I will combine all commits into one during the merge to master later (although it is Robert's branch). I am still not sure that the store-store barrier does what we expect, but it has shown that it works better than before. So Robert is right - there are no guarantees, it just proves to be better. Uwe
        Hide
        dweiss Dawid Weiss added a comment -

        This looks very good to me. Much cleaner than method handle approach (not to take anything away from Robert's work).

        Show
        dweiss Dawid Weiss added a comment - This looks very good to me. Much cleaner than method handle approach (not to take anything away from Robert's work).
        Hide
        rcmuir Robert Muir added a comment -

        Thanks Uwe for being persistent here! I had given up last night after the methodhandles screwed up all of our inlining

        The current patch has no downsides in the benchmarks (i also added sorting tasks from nightly and tested them, too). The crushing test fails easily with master, but i beast it hundreds of times with both oracle and J9 and it does not crash.

        It also removes the weakreferences stuff which I think is a nice improvement on its own.

        I don't think there is any guarantee to this kind of stuff, but practically this is a big improvement.

        Show
        rcmuir Robert Muir added a comment - Thanks Uwe for being persistent here! I had given up last night after the methodhandles screwed up all of our inlining The current patch has no downsides in the benchmarks (i also added sorting tasks from nightly and tested them, too). The crushing test fails easily with master, but i beast it hundreds of times with both oracle and J9 and it does not crash. It also removes the weakreferences stuff which I think is a nice improvement on its own. I don't think there is any guarantee to this kind of stuff, but practically this is a big improvement.
        Hide
        thetaphi Uwe Schindler added a comment -

        We may think of changing the store barrier's lazy write to a real write. From https://bugs.openjdk.java.net/browse/JDK-6275329 is the following quote:

        For people who like to think of these operations in terms of
        machine-level barriers on common multiprocessors, lazySet
        provides a preceeding store-store barrier (which is either
        a no-op or very cheap on current platforms), but no
        store-load barrier (which is usually the expensive part
        of a volatile-write)."

        I am not sure if the store-load helps more. At least store-store seems enough. Both is not 100% safe for inflight requests.

        Show
        thetaphi Uwe Schindler added a comment - We may think of changing the store barrier's lazy write to a real write. From https://bugs.openjdk.java.net/browse/JDK-6275329 is the following quote: For people who like to think of these operations in terms of machine-level barriers on common multiprocessors, lazySet provides a preceeding store-store barrier (which is either a no-op or very cheap on current platforms), but no store-load barrier (which is usually the expensive part of a volatile-write)." I am not sure if the store-load helps more. At least store-store seems enough. Both is not 100% safe for inflight requests.
        Hide
        thetaphi Uwe Schindler added a comment - - edited

        We added the remaining methods to make all accesses safe. Unfortunately the optimizations in Hotspot are incredible complex. We made it working inlined in getBytes(), but single Byte reads are not inlined, causing 60% slowdown.

        I then looked at source code of SwitchPoint and the MutableCallSite behind. It basically only inserts a store-store barrier using a static AtomicInteger (the well known Michael Busch trick from Twitter). Based on this information I implemented the same using standard static calls.

        The new variant also removes the weak hashmap as all accesses to ByteBuffers from the original and all clones go through one single instance of a "guard" that checks a boolean. On closing the IndexInput the boolean is changed and then it thros NPE, causing a AlreadyClosedException. As all clones are using the same "delegator guard" instance, it can be invalidated with a single call. After invalidating it accesses the AtomicInteger for the store-store barrier, then yields its threads (to produce a small pause for inflight requests) and finally unmaps.

        Here is the new branch: https://github.com/apache/lucene-solr/compare/master...rmuir:ace2

        (FYI: The old code is here, but its slows: https://github.com/apache/lucene-solr/compare/master...rmuir:ace)

        Show
        thetaphi Uwe Schindler added a comment - - edited We added the remaining methods to make all accesses safe. Unfortunately the optimizations in Hotspot are incredible complex. We made it working inlined in getBytes(), but single Byte reads are not inlined, causing 60% slowdown. I then looked at source code of SwitchPoint and the MutableCallSite behind. It basically only inserts a store-store barrier using a static AtomicInteger (the well known Michael Busch trick from Twitter). Based on this information I implemented the same using standard static calls. The new variant also removes the weak hashmap as all accesses to ByteBuffers from the original and all clones go through one single instance of a "guard" that checks a boolean. On closing the IndexInput the boolean is changed and then it thros NPE, causing a AlreadyClosedException. As all clones are using the same "delegator guard" instance, it can be invalidated with a single call. After invalidating it accesses the AtomicInteger for the store-store barrier, then yields its threads (to produce a small pause for inflight requests) and finally unmaps. Here is the new branch: https://github.com/apache/lucene-solr/compare/master...rmuir:ace2 (FYI: The old code is here, but its slows: https://github.com/apache/lucene-solr/compare/master...rmuir:ace )
        Hide
        rcmuir Robert Muir added a comment -

        My idea is to have a single getter to get the current buffer that uses the switchpoint and call the get current buffer on it. so we don't need the methodhandle with the inner get on the ByteBuffer!

        I looked into this but I think we should avoid it, as it is not, in fact, the method that we wish to invalidate: thats the getXXX(). So we may have to have a few handles: e.g. getLong()/getInt()/etc.

        Show
        rcmuir Robert Muir added a comment - My idea is to have a single getter to get the current buffer that uses the switchpoint and call the get current buffer on it. so we don't need the methodhandle with the inner get on the ByteBuffer! I looked into this but I think we should avoid it, as it is not, in fact, the method that we wish to invalidate: thats the getXXX(). So we may have to have a few handles: e.g. getLong()/getInt()/etc.
        Hide
        rcmuir Robert Muir added a comment -

        thanks Michael McCandless ! Making the tests detect these issues will help.

        As far as the SwitchPoint.invalidate, we should at least think about batching, to reduce costs. The API is designed around that. Maybe we could hook into e.g. deletePendingFiles() which gets called by IndexWriter (possibly with a hook in StandardDirectoryReader close when IW is not in use). This way, many maps could be invalidated at once and we aren't spending a ton of time waiting for safepoints. Maybe there are other approaches, we have to consider complexity too.

        Show
        rcmuir Robert Muir added a comment - thanks Michael McCandless ! Making the tests detect these issues will help. As far as the SwitchPoint.invalidate, we should at least think about batching, to reduce costs. The API is designed around that. Maybe we could hook into e.g. deletePendingFiles() which gets called by IndexWriter (possibly with a hook in StandardDirectoryReader close when IW is not in use). This way, many maps could be invalidated at once and we aren't spending a ton of time waiting for safepoints. Maybe there are other approaches, we have to consider complexity too.
        Hide
        mikemccand Michael McCandless added a comment -

        I pushed a small improvement to MockDirectoryWrapper to also detect when a clone is being used after the original IndexInput it was cloned from, is itself closed. Maybe this helps tests catch ACE issues as well...

        Show
        mikemccand Michael McCandless added a comment - I pushed a small improvement to MockDirectoryWrapper to also detect when a clone is being used after the original IndexInput it was cloned from, is itself closed. Maybe this helps tests catch ACE issues as well...
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1fa91537a9485ea08a0ff834d4306b414150d500 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1fa9153 ]

        LUCENE-7409: improve MockDirectoryWrapper's IndexInput to detect if a clone is being used after its parent was closed

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1fa91537a9485ea08a0ff834d4306b414150d500 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1fa9153 ] LUCENE-7409 : improve MockDirectoryWrapper's IndexInput to detect if a clone is being used after its parent was closed
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 04086fbfc4e1103605267024adad121077ae52d6 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=04086fb ]

        LUCENE-7409: improve MockDirectoryWrapper's IndexInput to detect if a clone is being used after its parent was closed

        Show
        jira-bot ASF subversion and git services added a comment - Commit 04086fbfc4e1103605267024adad121077ae52d6 in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=04086fb ] LUCENE-7409 : improve MockDirectoryWrapper's IndexInput to detect if a clone is being used after its parent was closed
        Hide
        thetaphi Uwe Schindler added a comment -

        Hi Robert,
        thanks for looking into this. Of course, if you hit SEGV its a bug in the application, but you get no idea where to look for the problem. If we better catch those before crushing, its better.

        Switchpoint is a good idea (we had it during the discussion on FOSDEM with Hotspot people, too), but those expected slowdowns by it. Maybe we have some slowdowns, but we dont see it. If the overall performance does not break we are fine.

        FYI, I have an idea how to remove the catch Throwable and the messy IOUtils.rethrow (which I personally don't like, we should replace it by my recent puzzler investigation). My idea is to have a single getter to get the current buffer that uses the switchpoint and call the get current buffer on it. so we don't need the methodhandle with the inner get on the ByteBuffer! If this proves to be as fast, we are fine. Otherwise we have to

        The current code does not handle the case where no unmapping is enabled effectively. We should pass a "noop" SafeUnmap to it (just the methodhandle without guard).

        I will look into it later this evening, still on travel!

        Show
        thetaphi Uwe Schindler added a comment - Hi Robert, thanks for looking into this. Of course, if you hit SEGV its a bug in the application, but you get no idea where to look for the problem. If we better catch those before crushing, its better. Switchpoint is a good idea (we had it during the discussion on FOSDEM with Hotspot people, too), but those expected slowdowns by it. Maybe we have some slowdowns, but we dont see it. If the overall performance does not break we are fine. FYI, I have an idea how to remove the catch Throwable and the messy IOUtils.rethrow (which I personally don't like, we should replace it by my recent puzzler investigation). My idea is to have a single getter to get the current buffer that uses the switchpoint and call the get current buffer on it. so we don't need the methodhandle with the inner get on the ByteBuffer! If this proves to be as fast, we are fine. Otherwise we have to The current code does not handle the case where no unmapping is enabled effectively. We should pass a "noop" SafeUnmap to it (just the methodhandle without guard). I will look into it later this evening, still on travel!
        Hide
        rcmuir Robert Muir added a comment -

        Note that the test will only reliably work with hotspot (see http://mail.openjdk.java.net/pipermail/mlvm-dev/2014-February/005654.html), so it needs to assumeTrue() that. On IBM J9 if you have an inflight readBytes() you can still crash. But still it is worth looking into as a better "best effort" IMO.

        Show
        rcmuir Robert Muir added a comment - Note that the test will only reliably work with hotspot (see http://mail.openjdk.java.net/pipermail/mlvm-dev/2014-February/005654.html ), so it needs to assumeTrue() that. On IBM J9 if you have an inflight readBytes() you can still crash. But still it is worth looking into as a better "best effort" IMO.

          People

          • Assignee:
            thetaphi Uwe Schindler
            Reporter:
            rcmuir Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development