Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      I'd like to generalize SearcherManager to a class which can manage instances of a certain type of interfaces. The reason is that today SearcherManager knows how to handle IndexSearcher instances. I have a SearcherManager which manages a pair of IndexSearcher and TaxonomyReader pair.

      Recently, few concurrency bugs were fixed in SearcherManager, and I realized that I need to apply them to my version as well. Which led me to think why can't we have an SM version which is generic enough so that both my version and Lucene's can benefit from?

      The way I see SearcherManager, it can be divided into two parts: (1) the part that manages the logic of acquire/release/maybeReopen (i.e., ensureOpen, protect from concurrency stuff etc.), and (2) the part which handles IndexSearcher, or my SearcherTaxoPair. I'm thinking that if we'll have an interface with incRef/decRef/tryIncRef/maybeRefresh, we can make SearcherManager a generic class which handles this interface.

      I will post a patch with the initial idea, and we can continue from there.

      1. LUCENE-3761.patch
        18 kB
        Shai Erera
      2. LUCENE-3761.patch
        19 kB
        Shai Erera
      3. LUCENE-3761.patch
        9 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        Initial patch. Introduces a new package 'thingy' (a temporary one, this will eventually move to o.a.l.search) with the class ThingyManager, a Thingy interface and a SearcherThingy implementation.

        As far as I can tell (if there are no bugs), this can replace SearcherManager as-is, aside from a 'nocommit' which I know how to handle, but didn't get to it yet.

        The approach is that ThingyManager receives a Thingy<G> instance and delegates calls to it.

        Robert and I discussed another approach - have ThingyManager abstract with a concrete (final) SearcherManager impl which overrides methods like incRef/decRef etc. I still didn't try to impl that approach, I think that I'll give it a try, later.

        Oh, and BTW, ThingyManager (even though a cool name !) will not be its final name ! . It's just easier to progress like that, without thinking too much about the name.

        Show
        Shai Erera added a comment - Initial patch. Introduces a new package 'thingy' (a temporary one, this will eventually move to o.a.l.search) with the class ThingyManager, a Thingy interface and a SearcherThingy implementation. As far as I can tell (if there are no bugs), this can replace SearcherManager as-is, aside from a 'nocommit' which I know how to handle, but didn't get to it yet. The approach is that ThingyManager receives a Thingy<G> instance and delegates calls to it. Robert and I discussed another approach - have ThingyManager abstract with a concrete (final) SearcherManager impl which overrides methods like incRef/decRef etc. I still didn't try to impl that approach, I think that I'll give it a try, later. Oh, and BTW, ThingyManager (even though a cool name !) will not be its final name ! . It's just easier to progress like that, without thinking too much about the name.
        Hide
        Michael McCandless added a comment -

        This looks nice!

        I like "Thingy". Very Dr. Seuss...

        I think maybeRefresh should match IR.openIfChanged? (So we learn from
        our prior mistakes w/ IR.reopen). Ie, name it openIfChanged
        (maybeRefresh sounds like it could be an in-place operation),
        return null (not this) if there is no change, and document that if it
        returns non-null it transfers a reference to you.

        Show
        Michael McCandless added a comment - This looks nice! I like "Thingy". Very Dr. Seuss... I think maybeRefresh should match IR.openIfChanged? (So we learn from our prior mistakes w/ IR.reopen). Ie, name it openIfChanged (maybeRefresh sounds like it could be an in-place operation), return null (not this) if there is no change, and document that if it returns non-null it transfers a reference to you.
        Hide
        Shai Erera added a comment -

        Option #2:

        ThingyManager<G> is an abstract class which implements all the concurrency administration and exposes the abstract methods tryIncRef(), decRef() and refreshIfNeeded().

        SearcherManager now extends ThingyManager<IndexSearcher> and implements just these 3 methods (in addition to isSearcherCurrent()).

        What I like about this approach is that SearcherManager remains a concrete class, so that code can reference it and not ThingyManager. Also, IMO it's a simplified impl vs. the composite ThingyManager/Thingy. AND besides the rename of maybeReopen to maybeRefresh, NONE of the code was affected by this refactoring.

        I've left the unneeded code as commented out in SearcherManager for easy comparison, but it should go away. TestSM passes (as well as all core tests), so I think that ThingyManager handles all concurrency cases as SearcherManager. However, it could use another inspecting eye .

        As for the name – now the name is less important b/c I don't think we'll reference ThingyManagers. I lean towards something like ReferenceManager / RefCountManager or remove Manager. Something simple. Suggestions are welcome.

        Show
        Shai Erera added a comment - Option #2: ThingyManager<G> is an abstract class which implements all the concurrency administration and exposes the abstract methods tryIncRef(), decRef() and refreshIfNeeded(). SearcherManager now extends ThingyManager<IndexSearcher> and implements just these 3 methods (in addition to isSearcherCurrent()). What I like about this approach is that SearcherManager remains a concrete class, so that code can reference it and not ThingyManager. Also, IMO it's a simplified impl vs. the composite ThingyManager/Thingy. AND besides the rename of maybeReopen to maybeRefresh, NONE of the code was affected by this refactoring. I've left the unneeded code as commented out in SearcherManager for easy comparison, but it should go away. TestSM passes (as well as all core tests), so I think that ThingyManager handles all concurrency cases as SearcherManager. However, it could use another inspecting eye . As for the name – now the name is less important b/c I don't think we'll reference ThingyManagers. I lean towards something like ReferenceManager / RefCountManager or remove Manager. Something simple. Suggestions are welcome.
        Hide
        Michael McCandless added a comment -

        Looks awesome – I like this 2nd approach better!

        And, actually... I think maybeRefresh is in fact a good name, because we are refreshing internal state to the ThingyManager (ie, this is different from IR.openIfChanged, which returns a new object to you and does not alter the state of the object you had passed in).

        I think ReferenceManager is good? RefCountManager seems too low level (ie, ref counting is an impl detail, just one way to manage references, and you are not in fact managing the ref counts...). Can't think of any other candidates.... naming is the hardest part

        Show
        Michael McCandless added a comment - Looks awesome – I like this 2nd approach better! And, actually... I think maybeRefresh is in fact a good name, because we are refreshing internal state to the ThingyManager (ie, this is different from IR.openIfChanged, which returns a new object to you and does not alter the state of the object you had passed in). I think ReferenceManager is good? RefCountManager seems too low level (ie, ref counting is an impl detail, just one way to manage references, and you are not in fact managing the ref counts...). Can't think of any other candidates.... naming is the hardest part
        Hide
        Shai Erera added a comment -

        I like ReferenceManager too. I'll produce a cleaner patch with correct naming and also a SearcherTaxoManager (For an IndexSearcher and TaxoReader pair).

        I'm not in front of the code now – do you think that NRTManager should be refactored too?

        Show
        Shai Erera added a comment - I like ReferenceManager too. I'll produce a cleaner patch with correct naming and also a SearcherTaxoManager (For an IndexSearcher and TaxoReader pair). I'm not in front of the code now – do you think that NRTManager should be refactored too?
        Hide
        Michael McCandless added a comment -

        do you think that NRTManager should be refactored too?

        That would be wonderful! It's rather... hairy.

        But I think it's tricky, because the maybeReopen takes a boolean (applyDeletes)... which is confusing. Maybe, we can change this, so that you must specify the applyDeletes up front to the ctor (I think there's no harm in making two NRTMgrs if you sometimes require deletes and other times don't... I mean resource wise it'd be no different that what NRTManager now does internally). If we did that, then I think NRTManager could subclass ReferenceManager? (And would no longer "contain" a SearcherManager inside it).

        Probably we should explore that on a new issue...

        Show
        Michael McCandless added a comment - do you think that NRTManager should be refactored too? That would be wonderful! It's rather... hairy. But I think it's tricky, because the maybeReopen takes a boolean (applyDeletes)... which is confusing. Maybe, we can change this, so that you must specify the applyDeletes up front to the ctor (I think there's no harm in making two NRTMgrs if you sometimes require deletes and other times don't... I mean resource wise it'd be no different that what NRTManager now does internally). If we did that, then I think NRTManager could subclass ReferenceManager? (And would no longer "contain" a SearcherManager inside it). Probably we should explore that on a new issue...
        Hide
        Shai Erera added a comment -

        Probably we should explore that on a new issue...

        I agree. Given what you wrote, I think it deserves its own inspection. Maybe I'll do the SearcherTaxoManager in its own issue too, I'll think about it.

        Show
        Shai Erera added a comment - Probably we should explore that on a new issue... I agree. Given what you wrote, I think it deserves its own inspection. Maybe I'll do the SearcherTaxoManager in its own issue too, I'll think about it.
        Hide
        Michael McCandless added a comment -

        OK I opened LUCENE-3769 to simplify NRTManager... I think it should be easy to cutover to ThingyManager after that!

        Show
        Michael McCandless added a comment - OK I opened LUCENE-3769 to simplify NRTManager... I think it should be easy to cutover to ThingyManager after that!
        Hide
        Simon Willnauer added a comment -

        I like this improvement while there is a serious problem with the current patch.

        protected G current;
        

        should either be volatile otherwise threads could spin unnecessarily long since tryInc will fail until the reference is re-read from main memory. I wonder if we should maybe make this a AtomicReference and remove the sync entirely. I don't like that this is actually syncing on "this" just for the swap purpose. Either we go for an AtomicRef or introduce a protected lock object.

        Can't think of any other candidates.... naming is the hardest part

        what about RefCountedResourceManager? Kind of long but hits the purpose...

        Show
        Simon Willnauer added a comment - I like this improvement while there is a serious problem with the current patch. protected G current; should either be volatile otherwise threads could spin unnecessarily long since tryInc will fail until the reference is re-read from main memory. I wonder if we should maybe make this a AtomicReference and remove the sync entirely. I don't like that this is actually syncing on "this" just for the swap purpose. Either we go for an AtomicRef or introduce a protected lock object. Can't think of any other candidates.... naming is the hardest part what about RefCountedResourceManager? Kind of long but hits the purpose...
        Hide
        Michael McCandless added a comment -

        I don't like that this is actually syncing on "this" just for the swap purpose.

        Hmm, I would prefer sticking with simple synchronized methods here... swapSearcher is called once during reopen, and by definition called from a single thread, so it will be uncontended. The added cost is surely minor, while keeping the code simple.

        Show
        Michael McCandless added a comment - I don't like that this is actually syncing on "this" just for the swap purpose. Hmm, I would prefer sticking with simple synchronized methods here... swapSearcher is called once during reopen, and by definition called from a single thread, so it will be uncontended. The added cost is surely minor, while keeping the code simple.
        Hide
        Simon Willnauer added a comment -

        Hmm, I would prefer sticking with simple synchronized methods here... swapSearcher is called once during reopen, and by definition called from a single thread, so it will be uncontended. The added cost is surely minor, while keeping the code simple.

        I didn't raise this due to perf reasons. I think using AtomicRef make is simpler since it encapsulates everything rather than using synchronized + volatile
        we'd replace this:

        final G oldThingy = current;
        current = newThingy;
        release(oldThingy);
        

        with:

        release(atomicRef.getAndSet(newThingy));
        

        seems simpler to me though... for read access you simply call atomicRef.get()

        Show
        Simon Willnauer added a comment - Hmm, I would prefer sticking with simple synchronized methods here... swapSearcher is called once during reopen, and by definition called from a single thread, so it will be uncontended. The added cost is surely minor, while keeping the code simple. I didn't raise this due to perf reasons. I think using AtomicRef make is simpler since it encapsulates everything rather than using synchronized + volatile we'd replace this: final G oldThingy = current; current = newThingy; release(oldThingy); with: release(atomicRef.getAndSet(newThingy)); seems simpler to me though... for read access you simply call atomicRef.get()
        Hide
        Michael McCandless added a comment -

        Hmm, but how to protect swapThingy in one thread while close() is called in another?

        Show
        Michael McCandless added a comment - Hmm, but how to protect swapThingy in one thread while close() is called in another?
        Hide
        Shai Erera added a comment -

        should either be volatile otherwise threads could spin unnecessarily long since tryInc will fail until the reference is re-read from main memory.

        I don't think that we need to make 'current' volatile. It's only changed from swapSearcher which is synchronized, and therefore as soon as it changes, all shared copies of that instance (in all threads) gets updated.

        There are many web pages that discuss volatile vs. synchronized (just Google those 3 words) and this page (http://www.javamex.com/tutorials/synchronization_volatile.shtml) even suggests that volatile is more expensive, because synchronization happens on each access to the variable, while in synchronized it's only inside the synced block.

        We check 'current' in ensureOpen() which happens on every API call, and I think that volatile would therefore be more expensive. Also, I'm not sure that using AtomicRef would be simpler code. And following Mike's comments, I prefer to have an explicit synced swapSearcher.

        Show
        Shai Erera added a comment - should either be volatile otherwise threads could spin unnecessarily long since tryInc will fail until the reference is re-read from main memory. I don't think that we need to make 'current' volatile. It's only changed from swapSearcher which is synchronized, and therefore as soon as it changes, all shared copies of that instance (in all threads) gets updated. There are many web pages that discuss volatile vs. synchronized (just Google those 3 words) and this page ( http://www.javamex.com/tutorials/synchronization_volatile.shtml ) even suggests that volatile is more expensive, because synchronization happens on each access to the variable, while in synchronized it's only inside the synced block. We check 'current' in ensureOpen() which happens on every API call, and I think that volatile would therefore be more expensive. Also, I'm not sure that using AtomicRef would be simpler code. And following Mike's comments, I prefer to have an explicit synced swapSearcher.
        Hide
        Simon Willnauer added a comment -

        Hmm, but how to protect swapThingy in one thread while close() is called in another?

        well essentially its the same just a while() with a CAS... I will take a stab at this soon.

        I don't think that we need to make 'current' volatile. It's only changed from swapSearcher which is synchronized, and therefore as soon as it changes, all shared copies of that instance (in all threads) gets updated.

        Shai, this is a common misconception lemme give you an example:

        
        public class Deadlocker {
         private static boolean ready;
        
         public static void startThread() {
             new Thread() {
                 public void run() {
                     System.out.println("T2: Waiting two seconds.");
                     try { sleep(2000); } catch (Exception e) { /* ignore */ }
                     System.out.println("T2: Setting ready = true");
                     setReady();
                 }
             }.start();
         }
        
         public static synchronized void setReady() {
           ready = true;
         }
        
         public static void main(String [] args) {
             startThread();
             System.out.println("T1: spinning.");
             while (!ready) {
                 // Do nothing.
             }
             System.out.println("T1: Ready!");
         }
        }
        

        if you run this on a 64 bit server vm this program will deadlock while on a 32bit client vm it won't. Now make "ready" volatile and you are good to go, you can even remove the sync entirely. I leave the rest to you to figure out why this happens... In general you
        need to use special mechanisms to guarantee that communication happens between these threads, as you would on a message passing system. Memory writes that happen in one thread can "leak through" and be seen by another thread, but this is by no means guaranteed. Without explicit communication, you can't guarantee which writes get seen by other threads, or even the order in which they get seen and this communication must be on both sides, reader and writer!

        Show
        Simon Willnauer added a comment - Hmm, but how to protect swapThingy in one thread while close() is called in another? well essentially its the same just a while() with a CAS... I will take a stab at this soon. I don't think that we need to make 'current' volatile. It's only changed from swapSearcher which is synchronized, and therefore as soon as it changes, all shared copies of that instance (in all threads) gets updated. Shai, this is a common misconception lemme give you an example: public class Deadlocker { private static boolean ready; public static void startThread() { new Thread () { public void run() { System .out.println( "T2: Waiting two seconds." ); try { sleep(2000); } catch (Exception e) { /* ignore */ } System .out.println( "T2: Setting ready = true " ); setReady(); } }.start(); } public static synchronized void setReady() { ready = true ; } public static void main( String [] args) { startThread(); System .out.println( "T1: spinning." ); while (!ready) { // Do nothing. } System .out.println( "T1: Ready!" ); } } if you run this on a 64 bit server vm this program will deadlock while on a 32bit client vm it won't. Now make "ready" volatile and you are good to go, you can even remove the sync entirely. I leave the rest to you to figure out why this happens... In general you need to use special mechanisms to guarantee that communication happens between these threads, as you would on a message passing system. Memory writes that happen in one thread can "leak through" and be seen by another thread, but this is by no means guaranteed. Without explicit communication, you can't guarantee which writes get seen by other threads, or even the order in which they get seen and this communication must be on both sides, reader and writer!
        Hide
        Michael McCandless added a comment -

        Hmm, but how to protect swapThingy in one thread while close() is called in another?

        well essentially its the same just a while() with a CAS... I will take a stab at this soon.

        That's exactly the kind of added code complexity I don't like. I can
        understand it if this were some hotspot... but that's not the case
        here.

        I think we should stick w/ simple synchronized methods.

        I agree we need the volatile...

        Show
        Michael McCandless added a comment - Hmm, but how to protect swapThingy in one thread while close() is called in another? well essentially its the same just a while() with a CAS... I will take a stab at this soon. That's exactly the kind of added code complexity I don't like. I can understand it if this were some hotspot... but that's not the case here. I think we should stick w/ simple synchronized methods. I agree we need the volatile...
        Hide
        Dawid Weiss added a comment -

        Bq. if you run this on a 64 bit server vm this program will deadlock while on a 32bit client vm it won't.

        Simon didn't mention that is the behavior under HotSpot, the result of running that code under other VMs and hardware architectures is in general unpredictable.

        The above behavior on HotSpot is in fact not a result of memory visibility problems (but it could be!) but of how the code is seen by HotSpot jit optimizers. If the code is compiled by c1 compiler (default on 32-bit jvms in -client mode) everything works (or tends to because the loop: while (!ready) {} always accesses physical memory. Once the code is on-stack-replaced with the c2 compiler (default second-tier optimizer for optimizer -server, it also explains why you need a delay in T2), c2's optimizer sees while (!ready) {} as a constant (because ready is not volatile and there are no happens-before with anything else) and promotes it outside the loop. The machine code becomes something like:

        if (!ready) {
          while(true) {/* spin */}
        }
        

        You can dump the assembly with debug versions of HotSpot and verify if I'm right. Another cool way of seing such opotimizations in reality is to use gcj -O3 and compile to assembly instead of an object file.

        (Sorry for being so verbose, this used to be part of a Java course I taught while in academia; I'd seen wide eyes on folks that had been writing Java code for a good few years).

        Show
        Dawid Weiss added a comment - Bq. if you run this on a 64 bit server vm this program will deadlock while on a 32bit client vm it won't. Simon didn't mention that is the behavior under HotSpot, the result of running that code under other VMs and hardware architectures is in general unpredictable. The above behavior on HotSpot is in fact not a result of memory visibility problems (but it could be!) but of how the code is seen by HotSpot jit optimizers. If the code is compiled by c1 compiler (default on 32-bit jvms in -client mode) everything works (or tends to because the loop: while (!ready) {} always accesses physical memory. Once the code is on-stack-replaced with the c2 compiler (default second-tier optimizer for optimizer -server, it also explains why you need a delay in T2), c2's optimizer sees while (!ready) {} as a constant (because ready is not volatile and there are no happens-before with anything else) and promotes it outside the loop. The machine code becomes something like: if (!ready) { while ( true ) {/* spin */} } You can dump the assembly with debug versions of HotSpot and verify if I'm right. Another cool way of seing such opotimizations in reality is to use gcj -O3 and compile to assembly instead of an object file. (Sorry for being so verbose, this used to be part of a Java course I taught while in academia; I'd seen wide eyes on folks that had been writing Java code for a good few years).
        Hide
        Simon Willnauer added a comment -

        I think we should stick w/ simple synchronized methods.

        I am good with this, not a big deal

        I agree we need the volatile...

        yeah absolutely

        Show
        Simon Willnauer added a comment - I think we should stick w/ simple synchronized methods. I am good with this, not a big deal I agree we need the volatile... yeah absolutely
        Hide
        Shai Erera added a comment -

        Thanks for the education guys. Simon's example clarified my confusion – if all access to a variable are synchronized, it doesn't need to be volatile. However in ThingyManager, only setting the variable is synced, but reading it isn't. That's why it needs to be volatile. And swapThingy needs to be synced for concurrency issues (close in parallel to maybeRefresh).

        I will declare it volatile. Will upload a patch soon.

        BTW, I'd like to rename maybeRefresh to refresh() because this method doesn't return an instance or anything, and to me it's just like calling refresh on say a web page - nothing guarantees that it will change. The method returns true/false depending on whether refresh was done. Are there any objections?

        Show
        Shai Erera added a comment - Thanks for the education guys. Simon's example clarified my confusion – if all access to a variable are synchronized, it doesn't need to be volatile. However in ThingyManager, only setting the variable is synced, but reading it isn't. That's why it needs to be volatile. And swapThingy needs to be synced for concurrency issues (close in parallel to maybeRefresh). I will declare it volatile. Will upload a patch soon. BTW, I'd like to rename maybeRefresh to refresh() because this method doesn't return an instance or anything, and to me it's just like calling refresh on say a web page - nothing guarantees that it will change. The method returns true/false depending on whether refresh was done. Are there any objections?
        Hide
        Shai Erera added a comment -

        Updated patch:

        • ThingyManager renamed to ReferenceManager
        • Declared 'current' volatile (thanks Simon!)
        • Added two tests to TestSM. While they could be under a TestReferenceManager new class, I didn't think that creating another class + a ReferenceManager extension is worth it.
        • Added a CHANGES entry under back-compat (following maybeReopen to maybeRefresh).

        If nobody objects, I'd like to rename maybeRefresh to just refresh, and commit it. Otherwise, I'll commit what I have.

        I've decided to deal with the SearcherTaxoManager in a different issue.

        Show
        Shai Erera added a comment - Updated patch: ThingyManager renamed to ReferenceManager Declared 'current' volatile (thanks Simon!) Added two tests to TestSM. While they could be under a TestReferenceManager new class, I didn't think that creating another class + a ReferenceManager extension is worth it. Added a CHANGES entry under back-compat (following maybeReopen to maybeRefresh). If nobody objects, I'd like to rename maybeRefresh to just refresh, and commit it. Otherwise, I'll commit what I have. I've decided to deal with the SearcherTaxoManager in a different issue.
        Hide
        Robert Muir added a comment -

        For 3.x, is it possible to have maybeReopen still (final+deprecated, calls maybeRefresh) ?

        This would prevent a back compat break

        Show
        Robert Muir added a comment - For 3.x, is it possible to have maybeReopen still (final+deprecated, calls maybeRefresh) ? This would prevent a back compat break
        Hide
        Simon Willnauer added a comment -

        For 3.x, is it possible to have maybeReopen still (final+deprecated, calls maybeRefresh) ?

        +1

        If nobody objects, I'd like to rename maybeRefresh to just refresh, and commit it. Otherwise, I'll commit what I have.

        I think we should reflect the "try" / "maybe" in the name since this operation is non-blocking. I designed this to work nicely with an async system where multiple refresh ops can come in concurrently. this method shouldn't block if we are already refreshing. That said I think we should either use maybe or try as a prefix for the method name.

        Show
        Simon Willnauer added a comment - For 3.x, is it possible to have maybeReopen still (final+deprecated, calls maybeRefresh) ? +1 If nobody objects, I'd like to rename maybeRefresh to just refresh, and commit it. Otherwise, I'll commit what I have. I think we should reflect the "try" / "maybe" in the name since this operation is non-blocking. I designed this to work nicely with an async system where multiple refresh ops can come in concurrently. this method shouldn't block if we are already refreshing. That said I think we should either use maybe or try as a prefix for the method name.
        Hide
        Michael McCandless added a comment -

        Patch looks great Shai!

        Show
        Michael McCandless added a comment - Patch looks great Shai!
        Hide
        Shai Erera added a comment -

        Thanks all for your comments. Robert, I added back maybeReopen to SearcherManager as deprecated.

        Committed rev 1243906 (3x).
        Ported to trunk 1244000 (trunk).

        I will open a separate issue for SearcherTaxoManager

        Show
        Shai Erera added a comment - Thanks all for your comments. Robert, I added back maybeReopen to SearcherManager as deprecated. Committed rev 1243906 (3x). Ported to trunk 1244000 (trunk). I will open a separate issue for SearcherTaxoManager

          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development