Lucene - Core
  1. Lucene - Core
  2. LUCENE-2041

Complete parallelizaton of ParallelMultiSearcher

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.9
    • Fix Version/s: 3.0
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      Patch Available

      Description

      ParallelMultiSearcher is parallel only for the method signatures of 'search'.

      Part of a query process calls the method docFreq(). There was a TODO comment to parallelize this. Parallelizing this method actually increases the performance of a query on multiple indexes, especially remotely.

      1. LUCENE-2041-refactor.patch
        23 kB
        Uwe Schindler
      2. LUCENE-2041-refactor.patch
        29 kB
        Uwe Schindler
      3. LUCENE-2041-final.patch
        32 kB
        Uwe Schindler
      4. LUCENE-2041.patch
        13 kB
        Joey Surls
      5. LUCENE-2041.patch
        17 kB
        Uwe Schindler
      6. LUCENE-2041.patch
        17 kB
        Simon Willnauer
      7. LUCENE-2041.patch
        19 kB
        Uwe Schindler
      8. LUCENE-2041.patch
        28 kB
        Simon Willnauer
      9. LUCENE_2041.patch
        17 kB
        Simon Willnauer

        Activity

        Hide
        Israel Tsadok added a comment -

        This has turned into a complete refactoring of the class, but I'd like to comment that the original intent of this patch was a bit misguided: docFreq() is not called during normal use of ParallelMultiSearcher. What does get called is docFreqs() on the sub-searchers. This is done in MultiSearcher.createWeight(), and is done sequentially.

        I created issue LUCENE-2128 to suggest a solution to this.

        Show
        Israel Tsadok added a comment - This has turned into a complete refactoring of the class, but I'd like to comment that the original intent of this patch was a bit misguided: docFreq() is not called during normal use of ParallelMultiSearcher. What does get called is docFreqs() on the sub-searchers. This is done in MultiSearcher.createWeight(), and is done sequentially. I created issue LUCENE-2128 to suggest a solution to this.
        Hide
        Uwe Schindler added a comment -

        (Heavy) committed revision: 834550

        Thanks Joey & Simon

        Show
        Uwe Schindler added a comment - (Heavy) committed revision: 834550 Thanks Joey & Simon
        Hide
        Simon Willnauer added a comment -

        +1 go for it - seems to be quite pretty compared to the current trunk version.

        Show
        Simon Willnauer added a comment - +1 go for it - seems to be quite pretty compared to the current trunk version.
        Hide
        Uwe Schindler added a comment -

        Merged our patches somehow, but left HitQueue generic param (I like this more, because it conforms to the sort inner class). I think this is final now.

        Ready to commit, all tests are running.

        Show
        Uwe Schindler added a comment - Merged our patches somehow, but left HitQueue generic param (I like this more, because it conforms to the sort inner class). I think this is final now. Ready to commit, all tests are running.
        Hide
        Uwe Schindler added a comment -

        Oh I didn't see your patch. But its not so different, I also did lots of code cleanup.

        I also used ReentrantLocks. I added a new DummyLock to utils, that is a replacement for ReentrantLock, but does simply nothing. By this, the synchronized around the HitQueue in MultiSearcher can be ommitted.

        Show
        Uwe Schindler added a comment - Oh I didn't see your patch. But its not so different, I also did lots of code cleanup. I also used ReentrantLocks. I added a new DummyLock to utils, that is a replacement for ReentrantLock, but does simply nothing. By this, the synchronized around the HitQueue in MultiSearcher can be ommitted.
        Hide
        Uwe Schindler added a comment -

        Here my latest patch, will commit soon.

        Uwe

        Show
        Uwe Schindler added a comment - Here my latest patch, will commit soon. Uwe
        Hide
        Simon Willnauer added a comment -

        Added some more final keywords and generalized HitQueue to PriorityQueue<ScoreDoc> in search (no sort)

        I guess we are set with this patch, good team work Uwe! Lets get this out for heavy committing!

        I get the impression that we should think about moving stuff like the Function interface out to utils. This could be useful in many situations though. Thoughts?

        Show
        Simon Willnauer added a comment - Added some more final keywords and generalized HitQueue to PriorityQueue<ScoreDoc> in search (no sort) I guess we are set with this patch, good team work Uwe! Lets get this out for heavy committing! I get the impression that we should think about moving stuff like the Function interface out to utils. This could be useful in many situations though. Thoughts?
        Hide
        Uwe Schindler added a comment -

        Here another possibility, removing more duplicated code: MultiSearcher contains now the merge Callables as static inner classes. These Callables are used directly without Executor in MultiSearcher, but with Executor in ParallelMultiSearcher.

        Show
        Uwe Schindler added a comment - Here another possibility, removing more duplicated code: MultiSearcher contains now the merge Callables as static inner classes. These Callables are used directly without Executor in MultiSearcher, but with Executor in ParallelMultiSearcher.
        Hide
        Uwe Schindler added a comment -

        Advanced patch:

        • Removed @SuppressWarnings by duplicating some code, but it is type-safe now
        • Added support for maxScore in sorted mode
        • Refactoring: made all internal classes static inner classes
        • Fix thread safety: hq.setFields() synchronized
        Show
        Uwe Schindler added a comment - Advanced patch: Removed @SuppressWarnings by duplicating some code, but it is type-safe now Added support for maxScore in sorted mode Refactoring: made all internal classes static inner classes Fix thread safety: hq.setFields() synchronized
        Hide
        Uwe Schindler added a comment -

        I will also readd support for MaxScore in the fielded case (it is still supported by Searchable.search(SortField))!!!

        Show
        Uwe Schindler added a comment - I will also readd support for MaxScore in the fielded case (it is still supported by Searchable.search(SortField))!!!
        Hide
        Uwe Schindler added a comment -

        I also changed some formatting, we require whitespace after control statements like for, while, catch,... and before {

        Show
        Uwe Schindler added a comment - I also changed some formatting, we require whitespace after control statements like for, while, catch,... and before {
        Hide
        Simon Willnauer added a comment -

        hehe - my patch was quicker but you comment appears first

        Show
        Simon Willnauer added a comment - hehe - my patch was quicker but you comment appears first
        Hide
        Simon Willnauer added a comment -

        Fixed the Java5 issue.
        Damned I should use java 5 on my notebook.

        simon

        Show
        Simon Willnauer added a comment - Fixed the Java5 issue. Damned I should use java 5 on my notebook. simon
        Hide
        Uwe Schindler added a comment -

        Updated patch.

        Show
        Uwe Schindler added a comment - Updated patch.
        Hide
        Uwe Schindler added a comment -

        String.isEmpty() is Java 6 only, changed this to length() == 0.

        Show
        Uwe Schindler added a comment - String.isEmpty() is Java 6 only, changed this to length() == 0.
        Hide
        Simon Willnauer added a comment -

        thanks uwe!

        Show
        Simon Willnauer added a comment - thanks uwe!
        Hide
        Uwe Schindler added a comment -

        I take it as communicated.

        Show
        Uwe Schindler added a comment - I take it as communicated.
        Hide
        Simon Willnauer added a comment -

        I drove this a bit further and refactored the whole code to use the java 1.5 concurrent utils. I seem to be somewhat cleaner with executors / callable and a little refactoring.

        simon

        Show
        Simon Willnauer added a comment - I drove this a bit further and refactored the whole code to use the java 1.5 concurrent utils. I seem to be somewhat cleaner with executors / callable and a little refactoring. simon
        Hide
        Uwe Schindler added a comment -

        Ya, I thought that's why the patch was full was from the formatted code. Sorry about that. I'll look for that in the future. I noticed the link in the wiki for the formatting under Helpful Resources doesn't work anymore.

        Fixed that and uploaded a new Eclipse Galieo style also containing our way for Java 5 Generics.

        Show
        Uwe Schindler added a comment - Ya, I thought that's why the patch was full was from the formatted code. Sorry about that. I'll look for that in the future. I noticed the link in the wiki for the formatting under Helpful Resources doesn't work anymore. Fixed that and uploaded a new Eclipse Galieo style also containing our way for Java 5 Generics.
        Hide
        Joey Surls added a comment -

        I agree with you on the Pooled Executor and concurrent utils, I was actually trying to copy the semantics of the search methods in place so someone could quickly grasp the code.

        Ya, I thought that's why the patch was full was from the formatted code. Sorry about that. I'll look for that in the future. I noticed the link in the wiki for the formatting under Helpful Resources doesn't work anymore.

        http://wiki.apache.org/lucene-java/HowToContribute

        Show
        Joey Surls added a comment - I agree with you on the Pooled Executor and concurrent utils, I was actually trying to copy the semantics of the search methods in place so someone could quickly grasp the code. Ya, I thought that's why the patch was full was from the formatted code. Sorry about that. I'll look for that in the future. I noticed the link in the wiki for the formatting under Helpful Resources doesn't work anymore. http://wiki.apache.org/lucene-java/HowToContribute
        Hide
        Simon Willnauer added a comment -

        moving to 3.1 for now. we will see how this turns out

        Show
        Simon Willnauer added a comment - moving to 3.1 for now. we will see how this turns out
        Hide
        Simon Willnauer added a comment -

        Hey Joey,
        good to see you coming up with patches, nice that you catch up on that TODO. You added support for parallel DocFreq which is good! Did you think about using an interface from the new concurrent utils instead of subclassing thread? I would like to see people using the new features which make code slightly more readable too. You would be able to simply throw the exception in the Callable#call() method and retrieve it once it has terminated.
        you could also use a Pooled Executor to run those threads so you don't have to spawn them each time you call docFreq on the multi searcher.

        one more thing, your patch is hard to read as you formated the source code. please try to do not format it - that would make it way easier to read and see what has been added / changed.

        Show
        Simon Willnauer added a comment - Hey Joey, good to see you coming up with patches, nice that you catch up on that TODO. You added support for parallel DocFreq which is good! Did you think about using an interface from the new concurrent utils instead of subclassing thread? I would like to see people using the new features which make code slightly more readable too. You would be able to simply throw the exception in the Callable#call() method and retrieve it once it has terminated. you could also use a Pooled Executor to run those threads so you don't have to spawn them each time you call docFreq on the multi searcher. one more thing, your patch is hard to read as you formated the source code. please try to do not format it - that would make it way easier to read and see what has been added / changed.

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Joey Surls
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 1h
              1h
              Remaining:
              Remaining Estimate - 1h
              1h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development