Lucene - Core
  1. Lucene - Core
  2. LUCENE-4978

Spatial search with point query won't find identical indexed point

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 4.1
    • Fix Version/s: 4.7.1, 4.8, 6.0
    • Component/s: modules/spatial
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Given a document with indexed POINT (10 20), when a search for INTERSECTS( POINT (10 20)) is issued, no results are returned.

      The work-around is to not search with a point shape, use a very small-radius circle or rectangle. (I'm marking this issue as "minor" because it's easy to do this).

      An unstated objective of the PrefixTree/grid approximation is that no matter what precision you use, an intersects query will find all true-positives. Due to approximations, it may also find some close false-positives. But in the case above, that unstated promise is violated. But it can also happen for query shapes other than points which do in fact barely enclose the point given at index time yet the indexed point is in-effect shifted to the center point of a cell which could be outside the query shape, and ultimately leading to a false-negative.

        Activity

        Hide
        David Smiley added a comment -

        The offending code is the implementation of IntersectsPrefixTreeFilter's VisitorTemplate.visitScanned:

                if (cell.getLevel() == grid.getMaxLevels() && !cell.isLeaf())
                  cShape = cell.getCenter();
                else
                  cShape = cell.getShape();
                if (queryShape.relate(cShape).intersects())
                  collectDocs(results);
        

        What's going on here is a performance optimization to use a point for the intersection instead of a rectangle. I never measured the before-after effect, and I'm not sure if I even thought of it in terms of a performance optimization in the first place although that's what it is.

        One possible solution is to see if the query shape is a rectangle or circle; if so then buffer the query shape by 1/2 a grid cell, which is enough to eliminate the possibility of a false-negative. If it's another shape (whether it be a point or a polygon) then use the cell's rect instead of its center point.

        Show
        David Smiley added a comment - The offending code is the implementation of IntersectsPrefixTreeFilter's VisitorTemplate.visitScanned: if (cell.getLevel() == grid.getMaxLevels() && !cell.isLeaf()) cShape = cell.getCenter(); else cShape = cell.getShape(); if (queryShape.relate(cShape).intersects()) collectDocs(results); What's going on here is a performance optimization to use a point for the intersection instead of a rectangle. I never measured the before-after effect, and I'm not sure if I even thought of it in terms of a performance optimization in the first place although that's what it is. One possible solution is to see if the query shape is a rectangle or circle; if so then buffer the query shape by 1/2 a grid cell, which is enough to eliminate the possibility of a false-negative. If it's another shape (whether it be a point or a polygon) then use the cell's rect instead of its center point.
        Hide
        oli mcc added a comment - - edited

        Hi David,
        I think I've uncovered this same issue via Elasticsearch, see issue 3795 and have verified it with some test cases I've written. Any chance you'd have time to take a look at this? I'm digging in myself, but am still just at a stage of getting a sense for the codebase.

        Show
        oli mcc added a comment - - edited Hi David, I think I've uncovered this same issue via Elasticsearch, see issue 3795 and have verified it with some test cases I've written. Any chance you'd have time to take a look at this? I'm digging in myself, but am still just at a stage of getting a sense for the codebase.
        Hide
        David Smiley added a comment -

        Hi Oli,
        This issue is indeed the root cause of the one you refer to in ES. I spent a little time on fixing the problem months ago but held off because I wanted to better understand the performance trade-off, and I hadn't yet developed a benchmark – through I have one now in LUCENE-2844.

        Correct me if I'm wrong but I heard ES has a point-query optimization. At least I thought I saw something like that when I looked through ES's docs a couple months ago. I would like to add such an optimization within Lucene-spatial which would effectively avoid this particular issue you hit because it would end up being a simple Lucene term query. This underlying issue would still exist though, it just wouldn't show up with a point query.

        If you want a quick solution that only addresses intersection with a Point query, then you could modify the code I reference in the comment above to not use cell.getCenter() when queryShape is an instance of Point. Make sense?

        To be clear though, the quick solution or a solution optimizing a point query doesn't actually address the underlying problem; it just fixes it for point queries only. It's still possible to index a point that fits inside a query rectangle extremely close to the edge, and depending on which side of the grid line the rectangle border is, you might not match the point.

        Show
        David Smiley added a comment - Hi Oli, This issue is indeed the root cause of the one you refer to in ES. I spent a little time on fixing the problem months ago but held off because I wanted to better understand the performance trade-off, and I hadn't yet developed a benchmark – through I have one now in LUCENE-2844 . Correct me if I'm wrong but I heard ES has a point-query optimization. At least I thought I saw something like that when I looked through ES's docs a couple months ago. I would like to add such an optimization within Lucene-spatial which would effectively avoid this particular issue you hit because it would end up being a simple Lucene term query. This underlying issue would still exist though, it just wouldn't show up with a point query. If you want a quick solution that only addresses intersection with a Point query, then you could modify the code I reference in the comment above to not use cell.getCenter() when queryShape is an instance of Point. Make sense? To be clear though, the quick solution or a solution optimizing a point query doesn't actually address the underlying problem; it just fixes it for point queries only. It's still possible to index a point that fits inside a query rectangle extremely close to the edge, and depending on which side of the grid line the rectangle border is, you might not match the point.
        Hide
        oli mcc added a comment - - edited

        Hi David,

        Sorry for the delay in following up, I've actually been spending some time familiarizing myself with the code and have some followup thoughts/questions.

        I think I get the issue now; the optimization short circuits to using the centre (a cheaper relation verification) in some cases, and as a result, the following two query types (a point and polygon) would not match due to no intersection with the centre.

        • It's not clear to me why the quick fix doesn't work in both cases you described however, maybe I'm missing the underlying problem. If we always use cell.getShape() won't we be guaranteed to match all points in the cell (potentially inefficient and with false positives, but not false negatives)?
        • I think this is purely a query time issue, would that be right? I'd like to start indexing documents now, but will avoid doing so if a fix to this will require a re-index.

        To hopefully answer your question, Elasticsearch does have a specific mapping type for geo points (GeoPointFieldMapper.java which I think follows the pattern you described.

        Show
        oli mcc added a comment - - edited Hi David, Sorry for the delay in following up, I've actually been spending some time familiarizing myself with the code and have some followup thoughts/questions. I think I get the issue now; the optimization short circuits to using the centre (a cheaper relation verification) in some cases, and as a result, the following two query types (a point and polygon) would not match due to no intersection with the centre. It's not clear to me why the quick fix doesn't work in both cases you described however, maybe I'm missing the underlying problem. If we always use cell.getShape() won't we be guaranteed to match all points in the cell (potentially inefficient and with false positives, but not false negatives)? I think this is purely a query time issue, would that be right? I'd like to start indexing documents now, but will avoid doing so if a fix to this will require a re-index. To hopefully answer your question, Elasticsearch does have a specific mapping type for geo points ( GeoPointFieldMapper.java which I think follows the pattern you described.
        Hide
        David Smiley added a comment -

        Quoting myself with emphasis:

        If you want a quick solution that only addresses intersection with a Point query, then you could modify the code I reference in the comment above to not use cell.getCenter() when queryShape is an instance of Point.

        So if you want to always avoid this problem (not just for point shapes), simply don't ever use cell.getCenter() regardless of what the query shape is. Always use cell.getShape().

        So you're on the right track. You are correct that this issue won't affect indexed data.

        Show
        David Smiley added a comment - Quoting myself with emphasis: If you want a quick solution that only addresses intersection with a Point query, then you could modify the code I reference in the comment above to not use cell.getCenter() when queryShape is an instance of Point . So if you want to always avoid this problem (not just for point shapes), simply don't ever use cell.getCenter() regardless of what the query shape is. Always use cell.getShape(). So you're on the right track. You are correct that this issue won't affect indexed data.
        Hide
        David Smiley added a comment -

        This patch addresses the issue simply by removing the optimization. I did some performance tests with rects & circles and it was very minor, although I didn't test polygons which should have a greater effect.

        While I was at it, I beefed up the tests further in ways that would have previously failed due to the false-negative. I removed an older test: RecursivePrefixTreeTest.geohashRecursiveRandom() which is hard to maintain and is now obsoleted by SpatialOpRecursivePrefixTreeTest which now uses geohashes.

        I'll commit this Monday.

        Show
        David Smiley added a comment - This patch addresses the issue simply by removing the optimization. I did some performance tests with rects & circles and it was very minor, although I didn't test polygons which should have a greater effect. While I was at it, I beefed up the tests further in ways that would have previously failed due to the false-negative. I removed an older test: RecursivePrefixTreeTest.geohashRecursiveRandom() which is hard to maintain and is now obsoleted by SpatialOpRecursivePrefixTreeTest which now uses geohashes. I'll commit this Monday.
        Hide
        ASF subversion and git services added a comment -

        Commit 1578741 from David Smiley in branch 'dev/trunk'
        [ https://svn.apache.org/r1578741 ]

        LUCENE-4978: spatial grid false-negatives at edge

        Show
        ASF subversion and git services added a comment - Commit 1578741 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1578741 ] LUCENE-4978 : spatial grid false-negatives at edge
        Hide
        ASF subversion and git services added a comment -

        Commit 1578742 from David Smiley in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1578742 ]

        LUCENE-4978: spatial grid false-negatives at edge

        Show
        ASF subversion and git services added a comment - Commit 1578742 from David Smiley in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1578742 ] LUCENE-4978 : spatial grid false-negatives at edge
        Hide
        ASF subversion and git services added a comment -

        Commit 1578889 from David Smiley in branch 'dev/trunk'
        [ https://svn.apache.org/r1578889 ]

        LUCENE-4978: invert biasContains on query side

        Show
        ASF subversion and git services added a comment - Commit 1578889 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1578889 ] LUCENE-4978 : invert biasContains on query side
        Hide
        ASF subversion and git services added a comment -

        Commit 1578891 from David Smiley in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1578891 ]

        LUCENE-4978: invert biasContains on query side

        Show
        ASF subversion and git services added a comment - Commit 1578891 from David Smiley in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1578891 ] LUCENE-4978 : invert biasContains on query side
        Hide
        Steve Rowe added a comment -

        David Smiley, any reason not to backport this to 4.7.1?

        Show
        Steve Rowe added a comment - David Smiley , any reason not to backport this to 4.7.1?
        Hide
        David Smiley added a comment -

        Oh I didn't know there was going to be a 4.7.1. I'll backport. Thanks for alerting me.

        Show
        David Smiley added a comment - Oh I didn't know there was going to be a 4.7.1. I'll backport. Thanks for alerting me.
        Hide
        Steve Rowe added a comment -

        I proposed 4.7.1 a week ago on the dev list: http://mail-archives.apache.org/mod_mbox/lucene-dev/201403.mbox/%3c49AB4538-88CC-4273-AC9B-B51FA3CB3F45@gmail.com%3e

        David, I'd like to cut a 4.7.1 RC today - do you think you'll have time to do the backport today? If not, I can take a crack at it.

        Show
        Steve Rowe added a comment - I proposed 4.7.1 a week ago on the dev list: http://mail-archives.apache.org/mod_mbox/lucene-dev/201403.mbox/%3c49AB4538-88CC-4273-AC9B-B51FA3CB3F45@gmail.com%3e David, I'd like to cut a 4.7.1 RC today - do you think you'll have time to do the backport today? If not, I can take a crack at it.
        Hide
        David Smiley added a comment -

        I'll do it now.

        Show
        David Smiley added a comment - I'll do it now.
        Hide
        ASF subversion and git services added a comment -

        Commit 1581017 from David Smiley in branch 'dev/branches/lucene_solr_4_7'
        [ https://svn.apache.org/r1581017 ]

        LUCENE-4978: spatial grid false-negatives at edge

        Show
        ASF subversion and git services added a comment - Commit 1581017 from David Smiley in branch 'dev/branches/lucene_solr_4_7' [ https://svn.apache.org/r1581017 ] LUCENE-4978 : spatial grid false-negatives at edge
        Hide
        ASF subversion and git services added a comment -

        Commit 1581019 from David Smiley in branch 'dev/branches/lucene_solr_4_7'
        [ https://svn.apache.org/r1581019 ]

        LUCENE-4978: invert biasContains on query side

        Show
        ASF subversion and git services added a comment - Commit 1581019 from David Smiley in branch 'dev/branches/lucene_solr_4_7' [ https://svn.apache.org/r1581019 ] LUCENE-4978 : invert biasContains on query side
        Hide
        Steve Rowe added a comment -

        Bulk close 4.7.1 issues

        Show
        Steve Rowe added a comment - Bulk close 4.7.1 issues

          People

          • Assignee:
            David Smiley
            Reporter:
            David Smiley
          • Votes:
            2 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development