Lucene - Core
  1. Lucene - Core
  2. LUCENE-4025

ReferenceManager.maybeRefresh should allow the caller to block

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      ReferenceManager.maybeRefresh() returns a boolean today, specifying whether the maybeRefresh logic was executed by the caller or not. If it's false, it means that another thread is currently refreshing and the call returns immediately.

      I think that that's inconvenient to the caller. I.e., if you wanted to do something like:

      writer.commit();
      searcherManager.maybeRefresh();
      searcherManager.acquire();
      

      It'd be better if you could guarantee that when the maybeRefresh() call returned, the follow on acquire() will return a refreshed IndexSearcher. Even if you omit the commit instruction, it'd be good if you can guarantee that.

      I don't quite see the benefit of having the caller thread not block if there's another thread currently refreshing. In, I believe, most cases, you'd anyway have just one thread calling maybeRefresh(). Even if not, the only benefit of not blocking is if you have commit() followed by maybeRefresh() logic done by some threads, while other threads acquire searchers - maybe then you wouldn't care if another thread is currently doing the refresh?

      Actually, I tend to think that not blocking is buggy? I mean, what if two threads do commit() + maybeRefresh(). The first thread finishes commit, enters maybeRefresh(), acquires the lock and reopens the Reader. Then the second thread does its commit(), enters maybeRefresh, fails to obtain the lock and exits. Its changes won't be exposed by SM until the next maybeRefresh() is called.

      So it looks to me like current logic may be buggy in that sense?

      1. LUCENE-4025.patch
        4 kB
        Shai Erera
      2. LUCENE-4025.patch
        5 kB
        Shai Erera

        Activity

        Shai Erera created issue -
        Hide
        Michael McCandless added a comment -

        I'm torn here...

        Always blocking seems dangerous because "simple" apps, that call .maybeRefresh() from searching threads, will suddenly see all search threads block when a reopen is happening ... I think it's too trappy to do that. I think when it comes to concurrent code you should try hard to not have "blocks all threads" trappiness...

        But I also agree it's a hassle now for callers that want to ensure specific changes are now visible... though, we do have NRTManager for that use case.

        Maybe we can add an optional boolean (doWait, defaulting to false) to maybeRefresh? Or we a separate method for it (maybeRefreshAndWaitIfItsAlreadyHappening), or, we can make decoupled method (waitForRefreshToFinish) and change maybeRefresh to return 3 results (IS_CURRENT, DID_REFRESH, OTHER_THREAD_IS_REFRESHING)... or, something else?

        Show
        Michael McCandless added a comment - I'm torn here... Always blocking seems dangerous because "simple" apps, that call .maybeRefresh() from searching threads, will suddenly see all search threads block when a reopen is happening ... I think it's too trappy to do that. I think when it comes to concurrent code you should try hard to not have "blocks all threads" trappiness... But I also agree it's a hassle now for callers that want to ensure specific changes are now visible... though, we do have NRTManager for that use case. Maybe we can add an optional boolean (doWait, defaulting to false) to maybeRefresh? Or we a separate method for it (maybeRefreshAndWaitIfItsAlreadyHappening), or, we can make decoupled method (waitForRefreshToFinish) and change maybeRefresh to return 3 results (IS_CURRENT, DID_REFRESH, OTHER_THREAD_IS_REFRESHING)... or, something else?
        Hide
        Hoss Man added a comment -

        isn't this exactly why things like the ExecutorService and/or Future APIs exist? let the caller decide if they want the methods they call to be executed in the current thread or as part of a thread pool and/or block until the logic is executed (with a possible time limit)

        Show
        Hoss Man added a comment - isn't this exactly why things like the ExecutorService and/or Future APIs exist? let the caller decide if they want the methods they call to be executed in the current thread or as part of a thread pool and/or block until the logic is executed (with a possible time limit)
        Hide
        Shai Erera added a comment -

        I'm torn here...

        I'm also torn here .

        Always blocking seems dangerous because "simple" apps, that call .maybeRefresh() from searching threads, will suddenly see all search threads block when a reopen is happening

        maybeRefresh says that you should call this periodically. I consider SearcherManager kind of advanced API. Calling maybeRefresh() on every search is like calling IndexReader.openIfChanged before every search. If we need to, let's document the implications of the method.

        Not blocking might be buggy, while blocking might affect your performance. I think that the performance issue is really and edge case of stupidity, while the former is a simple expectation. maybeRefresh documents that if it returns false, it might be that the next acquire won't return the most up-to-date IndexSearcher, but it doesn't give you any way to ensure that.

        As for the extra API, can we start by unnecessarily complicating the API? It looks to me that the API is clear with plenty of sample code and documentation (Mike, you even wrote a couple of blog posts about it). It's just a matter of semantics and if we tell people to call maybeRefresh periodically, then let's help them by ensuring this call does something.

        Show
        Shai Erera added a comment - I'm torn here... I'm also torn here . Always blocking seems dangerous because "simple" apps, that call .maybeRefresh() from searching threads, will suddenly see all search threads block when a reopen is happening maybeRefresh says that you should call this periodically . I consider SearcherManager kind of advanced API. Calling maybeRefresh() on every search is like calling IndexReader.openIfChanged before every search. If we need to, let's document the implications of the method. Not blocking might be buggy, while blocking might affect your performance. I think that the performance issue is really and edge case of stupidity, while the former is a simple expectation. maybeRefresh documents that if it returns false, it might be that the next acquire won't return the most up-to-date IndexSearcher, but it doesn't give you any way to ensure that. As for the extra API, can we start by unnecessarily complicating the API? It looks to me that the API is clear with plenty of sample code and documentation (Mike, you even wrote a couple of blog posts about it). It's just a matter of semantics and if we tell people to call maybeRefresh periodically, then let's help them by ensuring this call does something.
        Hide
        Simon Willnauer added a comment -

        my take on this is that if you really want this point in time semantics you are describing you should use your own application locking on top. You can simply use a read/write lock and acquire the write lock if you enter the "commit" block which I assume you don't do too frequently. The other option is to simply call maybeRefresh in an empty loop. The non-blocking fashion is a big asset here IMO since you can't build a non-blocking app with blocking components but you can do the other way around.

        Not blocking might be buggy, while blocking might affect your performance.

        that is bogus IMO.

        Show
        Simon Willnauer added a comment - my take on this is that if you really want this point in time semantics you are describing you should use your own application locking on top. You can simply use a read/write lock and acquire the write lock if you enter the "commit" block which I assume you don't do too frequently. The other option is to simply call maybeRefresh in an empty loop. The non-blocking fashion is a big asset here IMO since you can't build a non-blocking app with blocking components but you can do the other way around. Not blocking might be buggy, while blocking might affect your performance. that is bogus IMO.
        Hide
        Shai Erera added a comment -

        Calling maybeRefresh in an empty loop is not good. You'd want to at least add some sleep in between calls. And then you need to decide on the sleep interval. Complicate things.

        What's not clear to me is why was this API made non-blocking in the first place? Did someone complain about it being blocking?

        The non-blocking fashion is a big asset here IMO since you can't build a non-blocking app with blocking components but you can do the other way around.

        That's generally a correct statement, but who said that in the context of a ReferenceManager you want to build a non-blocking app?

        Perhaps a maybeRefreshBlocking() will be the best compromise after all. That method won't return anything and will just block on reopenLock. I'll create a patch, let's see how it looks first. While I'm at it, I'll rename reopenLock to refreshLock (reopenLock was from the time it was in SearcherManager).

        Show
        Shai Erera added a comment - Calling maybeRefresh in an empty loop is not good. You'd want to at least add some sleep in between calls. And then you need to decide on the sleep interval. Complicate things. What's not clear to me is why was this API made non-blocking in the first place? Did someone complain about it being blocking? The non-blocking fashion is a big asset here IMO since you can't build a non-blocking app with blocking components but you can do the other way around. That's generally a correct statement, but who said that in the context of a ReferenceManager you want to build a non-blocking app? Perhaps a maybeRefreshBlocking() will be the best compromise after all. That method won't return anything and will just block on reopenLock. I'll create a patch, let's see how it looks first. While I'm at it, I'll rename reopenLock to refreshLock (reopenLock was from the time it was in SearcherManager).
        Hide
        Shai Erera added a comment -

        Patch adds maybeRefreshBlocking which blocks on refreshLock. It also:

        • Renames reopenLock to refreshLock
        • Shares the refresh logic between maybeRefresh and maybeRefreshBlocking.
        • Switches to use ReentrantLock instead of Semaphore(1):
          • It allows to protectively obtain the lock in doMaybeRefresh (see comment in the code)
          • It is better in general because it's equivalent to Semaphore(1), yet protects against an accidental bug where someone changes Semaphore(1) to Semaphore(>1).

        I'll add a CHANGES entry after the patch is reviewed and we agree on the approach.

        Show
        Shai Erera added a comment - Patch adds maybeRefreshBlocking which blocks on refreshLock. It also: Renames reopenLock to refreshLock Shares the refresh logic between maybeRefresh and maybeRefreshBlocking. Switches to use ReentrantLock instead of Semaphore(1): It allows to protectively obtain the lock in doMaybeRefresh (see comment in the code) It is better in general because it's equivalent to Semaphore(1), yet protects against an accidental bug where someone changes Semaphore(1) to Semaphore(>1). I'll add a CHANGES entry after the patch is reviewed and we agree on the approach.
        Shai Erera made changes -
        Field Original Value New Value
        Attachment LUCENE-4025.patch [ 12525157 ]
        Hide
        Michael McCandless added a comment -

        Approach & patch look good!

        Maybe change javadocs to state that "you must call maybeRefresh or maybeRefreshBlocking periodically" (now each one states you must call that one). Also maybe say "if another thread is currently refreshing, this method blocks until that thread completes".

        Show
        Michael McCandless added a comment - Approach & patch look good! Maybe change javadocs to state that "you must call maybeRefresh or maybeRefreshBlocking periodically" (now each one states you must call that one). Also maybe say "if another thread is currently refreshing, this method blocks until that thread completes".
        Hide
        Shai Erera added a comment -

        Thanks Mike. I updated the javadoc as you suggest and added a CHANGES entry.

        I think this is ready to commit.

        Show
        Shai Erera added a comment - Thanks Mike. I updated the javadoc as you suggest and added a CHANGES entry. I think this is ready to commit.
        Shai Erera made changes -
        Attachment LUCENE-4025.patch [ 12525171 ]
        Hide
        Michael McCandless added a comment -

        +1, looks good. Thanks Shai!

        Show
        Michael McCandless added a comment - +1, looks good. Thanks Shai!
        Hide
        Shai Erera added a comment -

        Committed rev 1332699.

        I also added randomly calling it from TestSearcherManager.

        Thanks Mike for the review !

        Show
        Shai Erera added a comment - Committed rev 1332699. I also added randomly calling it from TestSearcherManager. Thanks Mike for the review !
        Shai Erera made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Lucene Fields New [ 10121 ] New,Patch Available [ 10121,10120 ]
        Resolution Fixed [ 1 ]
        Uwe Schindler made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development