Lucene - Core / LUCENE-2006

Optimization for FieldDocSortedHitQueue

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0
    • Fix Version/s: 3.0
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      When updating core for generics, I found the following optimization for FieldDocSortedHitQueue:

      All FieldDoc values are Comparables (including the score or docid, if they
      appear as a SortField in a MultiSearcher or ParallelMultiSearcher). The code
      of lessThan seems very inefficient: it has a big switch statement on the
      SortField type, casts the value from Object to the underlying numeric wrapper
      type, calls Number.xxxValue() & co. on it, and then compares manually. As the
      concrete j.l.Number subclasses (Integer, Long, Float, Double, ...) are
      themselves Comparable, I see no reason to do this. Just call compareTo on the
      Comparable interface and we are happy. The big win is that it avoids the cast
      and the two xxxValue() method calls, as the wrappers' compareTo works more
      efficiently internally.

      The only special cases are String sort, where a Locale may be used, and
      score sorting, which runs backwards. But these are two if statements instead
      of the whole switch.

      I have not tested it for performance yet, but in my opinion it should be
      faster for MultiSearchers. All tests still pass (as they should).
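
To make the idea concrete, here is a minimal sketch of the comparison described above. It is not the contents of the attached patch: the class name, the `Kind` enum and the `compareField` signature are hypothetical stand-ins for Lucene's SortField handling, and the real code may obtain its Collator differently.

```java
import java.text.Collator;
import java.util.Locale;

class FieldCompareSketch {
    /** Hypothetical stand-in for the SortField types that need special handling. */
    enum Kind { SCORE, STRING, OTHER }

    /**
     * Compares one sort-field value of two hits. Returns a negative number if
     * a sorts before b, a positive number if it sorts after, and 0 for a tie.
     */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int compareField(Kind kind, Locale locale, boolean reverse,
                            Comparable a, Comparable b) {
        int c;
        if (kind == Kind.STRING && locale != null) {
            // Special case 1: locale-aware string ordering.
            c = Collator.getInstance(locale).compare((String) a, (String) b);
        } else {
            // General case: Integer, Long, Float, Double and String all
            // implement Comparable, so no switch on the type is needed.
            c = a.compareTo(b);
            if (kind == Kind.SCORE) {
                // Special case 2: higher scores sort first.
                c = -c;
            }
        }
        return reverse ? -c : c;
    }
}
```

A queue's lessThan would loop over the sort fields, call something like `compareField` for each one, and stop at the first non-zero result.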

      1. LUCENE-2006.patch
        4 kB
        Uwe Schindler

        Activity

        Uwe Schindler added a comment -

        Patch.

        Uwe Schindler added a comment -

        Mark Miller on java-dev:

        Nice! I like it. Even if it's not much faster (haven't checked either), I
        can't see it being much slower and it's cleaner code.

        I'd be happy to do some quick perf tests when I get a chance, but I'm +1
        on it.

        Uwe Schindler added a comment -

        Is there any MultiSearcher related task/alg in contrib/benchmark or somewhere in JIRA?

        Uwe Schindler added a comment -

        The reason why this code looked like this is simple (from the SVN log): at the beginning, the FieldDoc values were just "Object[] fields", so the casts were needed. After custom comparators were added, they became "Comparable". So there was no real performance reason for doing it in such a complicated and inefficient way.

        Mark Miller added a comment -

        Okay Uwe -

        I took a 2 GB zipped Wiki dump and used a SkipDocTask to create four unique indices of 64,000 docs each.

        Then I ran a search matching all docs and sorting on title, taking the average of 1000 runs and recording that average a few times for each method.

        I tried top-n values of 10, 100, and 1000.

        I couldn't measure a meaningful difference one way or the other. Let's do it.

        Uwe Schindler added a comment -

        I think it is because you only merge the top 1000 docs into the HitQueue. The merging of the HQs at the end of search is simple, because it only merges the top n docs of each queue. You would only see a difference if you sort all hits.

        I think we can commit this, too.
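
As a rough illustration of why the merge cost stays small, here is a sketch of merging per-searcher top lists into one bounded queue. It uses java.util.PriorityQueue rather than Lucene's own queue classes, and `TopNMergeSketch` and `mergeTopN` are hypothetical names, so treat it as a model of the behaviour and not as the actual MultiSearcher code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNMergeSketch {
    /** Merges per-searcher top lists, keeping only the n smallest values overall. */
    static <T extends Comparable<T>> List<T> mergeTopN(List<List<T>> perSearcherTop, int n) {
        List<T> merged = new ArrayList<>();
        if (n <= 0) {
            return merged;
        }
        // Heap whose head is the worst of the hits kept so far.
        PriorityQueue<T> queue = new PriorityQueue<>(Comparator.reverseOrder());
        for (List<T> top : perSearcherTop) {
            for (T hit : top) {
                if (queue.size() < n) {
                    queue.add(hit);
                } else if (hit.compareTo(queue.peek()) < 0) {
                    // The new hit beats the current worst, so replace it.
                    queue.poll();
                    queue.add(hit);
                }
            }
        }
        merged.addAll(queue);
        Collections.sort(merged);
        return merged;
    }
}
```

With four searchers each contributing at most n hits, only about 4 * n values ever reach the queue, so for n up to 1000 the per-comparison cost is a small part of the total search time.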

        Mark Miller added a comment -

        Right - I don't think we have to worry about things much over top 1000.

        And while I don't want to take the time to do top 4*64,000, for kicks I tried top 64,000 over a couple runs.

        It actually does show a 2-3% win with the new method once you get up that high.

        It's somethin' anyway.

        Uwe Schindler added a comment -

        OK, I will commit soon!

        Uwe Schindler added a comment -

        Committed revision: 829274

        Thanks Mark for perf testing!


          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Uwe Schindler