Lucene - Core
  1. Lucene - Core
  2. LUCENE-7019

explore two-phase iteration for GeoPoint query

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.5, 6.0
    • Component/s: modules/spatial
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This query today uses an approximation+confirm approach, but it all happens when you call scorer(), in a termsEnum loop.

      This causes several problems (even after https://issues.apache.org/jira/browse/LUCENE-7018) because it can do too much work, if queries have multiple values since the doc can be "confirmed" more than once.

      I think it would be better to delay this confirmation as much as possible, so that other parts of the query (e.g. other filters, conjunctions, etc) can eliminate checks as well.

      1. LUCENE-7019.patch
        6 kB
        Robert Muir
      2. LUCENE-7019.patch
        5 kB
        Robert Muir

        Activity

        Hide
        Robert Muir added a comment -

        Here's an initial patch, the tests seem happy.

        In order to keep the optimization where we only verify edges, we have to use an additional SparseBitSet.

        I didn't do any benchmarking yet: i expect that if you benchmark the query in isolation, it can only be very slightly slower (since it needs to set another bit for those edge terms). But I think it should be much better if e.g. AND'd with a termquery or similar.

        Show
        Robert Muir added a comment - Here's an initial patch, the tests seem happy. In order to keep the optimization where we only verify edges, we have to use an additional SparseBitSet. I didn't do any benchmarking yet: i expect that if you benchmark the query in isolation, it can only be very slightly slower (since it needs to set another bit for those edge terms). But I think it should be much better if e.g. AND'd with a termquery or similar.
        Hide
        Robert Muir added a comment -

        updated patch that eliminates more checks for e.g. multi-valued cases (by inverting the logic to "preapproved" docs).

        since here we talk about docs not terms, there is no reason to assume edge cases are "abnormal", so it basically just flips the logic.

        also rather than hardcoding sparse bitset, i added a crude heuristic to use a sparse impl only when the field is sparse, to try to reduce the overhead of this change.

        Show
        Robert Muir added a comment - updated patch that eliminates more checks for e.g. multi-valued cases (by inverting the logic to "preapproved" docs). since here we talk about docs not terms, there is no reason to assume edge cases are "abnormal", so it basically just flips the logic. also rather than hardcoding sparse bitset, i added a crude heuristic to use a sparse impl only when the field is sparse, to try to reduce the overhead of this change.
        Hide
        Adrien Grand added a comment -

        +1 to use two-phase iteration. The patch looks good.

        Show
        Adrien Grand added a comment - +1 to use two-phase iteration. The patch looks good.
        Hide
        Michael McCandless added a comment -

        +1

        Show
        Michael McCandless added a comment - +1
        Hide
        Michael McCandless added a comment -

        I think this is a quite nasty performance bug / adversary in the multi valued case ... I think we should fix it for 5.5.

        Show
        Michael McCandless added a comment - I think this is a quite nasty performance bug / adversary in the multi valued case ... I think we should fix it for 5.5.
        Hide
        Nicholas Knize added a comment -

        +1

        This is great! Some dirty benchmarks indicate a few 100 ms boost in query performance with the greatest boost on multi-valued queries. I also backported to 5.4 as this does a better job fixing the multi-valued performance bug than LUCENE-7018.

        Show
        Nicholas Knize added a comment - +1 This is great! Some dirty benchmarks indicate a few 100 ms boost in query performance with the greatest boost on multi-valued queries. I also backported to 5.4 as this does a better job fixing the multi-valued performance bug than LUCENE-7018 .
        Hide
        ASF subversion and git services added a comment -

        Commit a928e4b40652cad760cf2d596db08370c07dfc2f in lucene-solr's branch refs/heads/master from nknize
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a928e4b ]

        LUCENE-7019: add two-phase iteration to GeoPointTermQueryConstantScoreWrapper

        Show
        ASF subversion and git services added a comment - Commit a928e4b40652cad760cf2d596db08370c07dfc2f in lucene-solr's branch refs/heads/master from nknize [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a928e4b ] LUCENE-7019 : add two-phase iteration to GeoPointTermQueryConstantScoreWrapper
        Hide
        ASF subversion and git services added a comment -

        Commit b92ccc01f6daa43a2afb464c9112d53cbba9cc00 in lucene-solr's branch refs/heads/branch_5x from nknize
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b92ccc0 ]

        LUCENE-7019: add two-phase iteration to GeoPointTermQueryConstantScoreWrapper

        Show
        ASF subversion and git services added a comment - Commit b92ccc01f6daa43a2afb464c9112d53cbba9cc00 in lucene-solr's branch refs/heads/branch_5x from nknize [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b92ccc0 ] LUCENE-7019 : add two-phase iteration to GeoPointTermQueryConstantScoreWrapper
        Hide
        ASF subversion and git services added a comment -

        Commit b8f57232f2d3ea2304b530f10576922a665786b2 in lucene-solr's branch refs/heads/branch_5_4 from nknize
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b8f5723 ]

        LUCENE-7019: add two-phase iteration to GeoPointTermQueryConstantScoreWrapper

        Show
        ASF subversion and git services added a comment - Commit b8f57232f2d3ea2304b530f10576922a665786b2 in lucene-solr's branch refs/heads/branch_5_4 from nknize [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b8f5723 ] LUCENE-7019 : add two-phase iteration to GeoPointTermQueryConstantScoreWrapper

          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development