Lucene - Core
  1. Lucene - Core
  2. LUCENE-5714

Improve tests for BBoxStrategy then port to 4x.

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10, 6.0
    • Component/s: modules/spatial
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      BBoxStrategy needs better tests before I'm comfortable seeing it in 4x. Specifically it should use random rectangles based validation (ones that may cross the dateline), akin to the other tests. And I think I see an equals/hashcode bug to be fixed in there too.

      One particular thing I'd like to see added is how to handle a zero-area case for AreaSimilarity. I think an additional feature in which you declare a minimum % area (relative to the query shape) would be good.

      It should be possible for the user to combine rectangle center-point to query shape center-point distance sorting as well. I think it is but I need to make sure it's possible without having to index a separate center point field.

      Another possibility (probably not to be addressed here) is a minimum ratio between width/height, perhaps 10%. A long but nearly no height line should not be massively disadvantaged relevancy-wise to an equivalently long diagonal road that has a square bbox.

        Issue Links

          Activity

          Hide
          ASF subversion and git services added a comment -

          Commit 1603216 from David Smiley in branch 'dev/trunk'
          [ https://svn.apache.org/r1603216 ]

          LUCENE-5714: small internal refactor to use varargs and remove redundant null initializations

          Show
          ASF subversion and git services added a comment - Commit 1603216 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1603216 ] LUCENE-5714 : small internal refactor to use varargs and remove redundant null initializations
          Hide
          David Smiley added a comment -

          Progress report:

          • I added an onslaught of randomized testing and it uncovered dateline edge cases (e.g. touches but doesn't cross the dateline) and world-wrap (-180 to 180) bugs that I fixed.
          • Simplified queries that weren't used in a geospatial context (no need to check dateline cross)
          • Removed supposed support for the Overlaps predicate, as it was written as identical to Intersects. This strategy never truly supported Overlaps even if it didn't complain. If someone wants it; create an issue and ideally upload the code too

          Next up is Similarity testing.

          Show
          David Smiley added a comment - Progress report: I added an onslaught of randomized testing and it uncovered dateline edge cases (e.g. touches but doesn't cross the dateline) and world-wrap (-180 to 180) bugs that I fixed. Simplified queries that weren't used in a geospatial context (no need to check dateline cross) Removed supposed support for the Overlaps predicate, as it was written as identical to Intersects. This strategy never truly supported Overlaps even if it didn't complain. If someone wants it; create an issue and ideally upload the code too Next up is Similarity testing.
          Hide
          David Smiley added a comment -

          This patch is really more than testing, it also includes LUCENE-5779 which is the change to AreaSimilarity's algorithm.

          Show
          David Smiley added a comment - This patch is really more than testing, it also includes LUCENE-5779 which is the change to AreaSimilarity's algorithm.
          Hide
          Ryan McKinley added a comment -

          +1 thanks

          Show
          Ryan McKinley added a comment - +1 thanks
          Hide
          Robert Muir added a comment -

          This isn't going to make 4.9

          Show
          Robert Muir added a comment - This isn't going to make 4.9
          Hide
          David Smiley added a comment -

          That's right Robert Muir; it was assigned as such prior to your release branch. I bumped it to 4.10.

          Show
          David Smiley added a comment - That's right Robert Muir ; it was assigned as such prior to your release branch. I bumped it to 4.10.
          Hide
          David Smiley added a comment -

          Another change to the API is, I think it's not needed to have a BBoxSimilarity interface. DistanceSimilarity can be tossed, and so could BBoxSimilarityValueSource. Instead, AreaSimilarity can be ShapeAreaValueSource that takes a ValueSource that produces shapes from it's objectVal(doc). This is in the same vein as DistanceToShapeValueSource. This underscores the pluggability with, say, SerializedDVStrategy with ValueSource's. It's plausible it will be faster to decode 4 numbers from a contiguous byte array than have to retrieve a number 4 times via DocValues. And the code shouldn't have to change accordingly – it's plug and play.

          Continuing this (definitely a separate JIRA issue), looking at the TODOs: these two methods move to SpatialStrategy:

          
            /**
             * Provides access to each rectangle per document as a ValueSource in which
             * {@link org.apache.lucene.queries.function.FunctionValues#objectVal(int)} returns a {@link
             * Shape}.
             */ //TODO raise to SpatialStrategy
            public ValueSource makeShapeValueSource() {
              return new BBoxValueSource(this);
            }
          
            @Override
            public ValueSource makeDistanceValueSource(Point queryPoint, double multiplier) {
              //TODO if makeShapeValueSource gets lifted to the top; this could become a generic impl.
              return new DistanceToShapeValueSource(makeShapeValueSource(), queryPoint, multiplier, ctx);
            }
          
          Show
          David Smiley added a comment - Another change to the API is, I think it's not needed to have a BBoxSimilarity interface. DistanceSimilarity can be tossed, and so could BBoxSimilarityValueSource. Instead, AreaSimilarity can be ShapeAreaValueSource that takes a ValueSource that produces shapes from it's objectVal(doc). This is in the same vein as DistanceToShapeValueSource. This underscores the pluggability with, say, SerializedDVStrategy with ValueSource's. It's plausible it will be faster to decode 4 numbers from a contiguous byte array than have to retrieve a number 4 times via DocValues. And the code shouldn't have to change accordingly – it's plug and play. Continuing this (definitely a separate JIRA issue), looking at the TODOs: these two methods move to SpatialStrategy: /** * Provides access to each rectangle per document as a ValueSource in which * {@link org.apache.lucene.queries.function.FunctionValues#objectVal( int )} returns a {@link * Shape}. */ //TODO raise to SpatialStrategy public ValueSource makeShapeValueSource() { return new BBoxValueSource( this ); } @Override public ValueSource makeDistanceValueSource(Point queryPoint, double multiplier) { //TODO if makeShapeValueSource gets lifted to the top; this could become a generic impl. return new DistanceToShapeValueSource(makeShapeValueSource(), queryPoint, multiplier, ctx); }
          Hide
          David Smiley added a comment -

          Latest patch:

          • BBoxSimilarity is gone; instead BBoxSimilarityValueSource is abstract (just one impl though)
          • Removed DistanceSimilarity as it's obsoleted by the generic DistanceToShapeValueSource introduced a couple months ago
          • AreaSimilarity renamed to BBoxOverlapRatioValueSource as it's a more meaningful name
          • BBoxOverlapRatioValueSource has a new minSideLength option that is applied to sides of the query, target, and intersection boxes. It's an optional way to handle point queries, which without this would basically match everything with the same score since there is no intersection area.
          • Added generic ShapeAreaValueSource (with geoArea boolean option) that basically just calls shape.getArea(). This is a good way of handling sorting the results of a point query for indexed rects.
          • setPrecisionType is gone; instead I'm trying a new scheme in which you get and set a FieldType. See LUCENE-5802. Use of DocValues is configurable and enabled by default.

          I think it's probably ready to be committed now.

          Show
          David Smiley added a comment - Latest patch: BBoxSimilarity is gone; instead BBoxSimilarityValueSource is abstract (just one impl though) Removed DistanceSimilarity as it's obsoleted by the generic DistanceToShapeValueSource introduced a couple months ago AreaSimilarity renamed to BBoxOverlapRatioValueSource as it's a more meaningful name BBoxOverlapRatioValueSource has a new minSideLength option that is applied to sides of the query, target, and intersection boxes. It's an optional way to handle point queries, which without this would basically match everything with the same score since there is no intersection area. Added generic ShapeAreaValueSource (with geoArea boolean option) that basically just calls shape.getArea(). This is a good way of handling sorting the results of a point query for indexed rects. setPrecisionType is gone; instead I'm trying a new scheme in which you get and set a FieldType. See LUCENE-5802 . Use of DocValues is configurable and enabled by default. I think it's probably ready to be committed now.
          Hide
          ASF subversion and git services added a comment -

          Commit 1608793 from David Smiley in branch 'dev/trunk'
          [ https://svn.apache.org/r1608793 ]

          LUCENE-5714, LUCENE-5779: Enhance BBoxStrategy & Overlap similarity. Configurable docValues / index usage.
          Add new ShapeAreaValueSource.

          Show
          ASF subversion and git services added a comment - Commit 1608793 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1608793 ] LUCENE-5714 , LUCENE-5779 : Enhance BBoxStrategy & Overlap similarity. Configurable docValues / index usage. Add new ShapeAreaValueSource.
          Hide
          ASF subversion and git services added a comment -

          Commit 1608987 from David Smiley in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1608987 ]

          LUCENE-5714, LUCENE-5779: Enhance BBoxStrategy & Overlap similarity. Configurable docValues / index usage.
          Add new ShapeAreaValueSource.

          Show
          ASF subversion and git services added a comment - Commit 1608987 from David Smiley in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1608987 ] LUCENE-5714 , LUCENE-5779 : Enhance BBoxStrategy & Overlap similarity. Configurable docValues / index usage. Add new ShapeAreaValueSource.
          Hide
          ASF subversion and git services added a comment -

          Commit 1608988 from David Smiley in branch 'dev/trunk'
          [ https://svn.apache.org/r1608988 ]

          LUCENE-5714: Remove unused private constants in BBoxStrategy

          Show
          ASF subversion and git services added a comment - Commit 1608988 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1608988 ] LUCENE-5714 : Remove unused private constants in BBoxStrategy
          Hide
          David Smiley added a comment -

          On additional thing I'd like to get in is to auto-extract the bounding box of a provided shape instead of throwing an error. It's not like this behavior would be unexpected; if you use this field it's quite intuitive that if you provide any old shape, it's going to index the bounding box. The current code is simply:

            public Field[] createIndexableFields(Shape shape) {
              if (shape instanceof Rectangle)
                return createIndexableFields((Rectangle)shape);
              throw new UnsupportedOperationException("Can only index a Rectangle, not " + shape);
            }
          

          Instead it would be:

            public Field[] createIndexableFields(Shape shape) {
              return createIndexableFields(shape.getBoundingBox());
            }
          

          Rectangle.getBoundingBox() returns "this", by the way.

          Show
          David Smiley added a comment - On additional thing I'd like to get in is to auto-extract the bounding box of a provided shape instead of throwing an error. It's not like this behavior would be unexpected; if you use this field it's quite intuitive that if you provide any old shape, it's going to index the bounding box. The current code is simply: public Field[] createIndexableFields(Shape shape) { if (shape instanceof Rectangle) return createIndexableFields((Rectangle)shape); throw new UnsupportedOperationException( "Can only index a Rectangle, not " + shape); } Instead it would be: public Field[] createIndexableFields(Shape shape) { return createIndexableFields(shape.getBoundingBox()); } Rectangle.getBoundingBox() returns "this", by the way.
          Hide
          ASF subversion and git services added a comment -

          Commit 1609134 from David Smiley in branch 'dev/trunk'
          [ https://svn.apache.org/r1609134 ]

          LUCENE-5714: TestBBoxStrategy needs to declare docValues requirement

          Show
          ASF subversion and git services added a comment - Commit 1609134 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1609134 ] LUCENE-5714 : TestBBoxStrategy needs to declare docValues requirement
          Hide
          ASF subversion and git services added a comment -

          Commit 1609188 from David Smiley in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1609188 ]

          LUCENE-5714: TestBBoxStrategy needs to declare docValues requirement

          Show
          ASF subversion and git services added a comment - Commit 1609188 from David Smiley in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1609188 ] LUCENE-5714 : TestBBoxStrategy needs to declare docValues requirement
          Hide
          ASF subversion and git services added a comment -

          Commit 1609209 from David Smiley in branch 'dev/trunk'
          [ https://svn.apache.org/r1609209 ]

          LUCENE-5714: Enhance TestBBoxStrategy with/without docvalues & indexed

          Show
          ASF subversion and git services added a comment - Commit 1609209 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1609209 ] LUCENE-5714 : Enhance TestBBoxStrategy with/without docvalues & indexed
          Hide
          ASF subversion and git services added a comment -

          Commit 1609210 from David Smiley in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1609210 ]

          LUCENE-5714: Enhance TestBBoxStrategy with/without docvalues & indexed

          Show
          ASF subversion and git services added a comment - Commit 1609210 from David Smiley in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1609210 ] LUCENE-5714 : Enhance TestBBoxStrategy with/without docvalues & indexed
          Hide
          ASF subversion and git services added a comment -

          Commit 1609468 from David Smiley in branch 'dev/trunk'
          [ https://svn.apache.org/r1609468 ]

          LUCENE-5714: BBoxStrategy should convert shapes to bounding box on indexing (but not search)

          Show
          ASF subversion and git services added a comment - Commit 1609468 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1609468 ] LUCENE-5714 : BBoxStrategy should convert shapes to bounding box on indexing (but not search)
          Hide
          ASF subversion and git services added a comment -

          Commit 1609469 from David Smiley in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1609469 ]

          LUCENE-5714: BBoxStrategy should convert shapes to bounding box on indexing (but not search)

          Show
          ASF subversion and git services added a comment - Commit 1609469 from David Smiley in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1609469 ] LUCENE-5714 : BBoxStrategy should convert shapes to bounding box on indexing (but not search)
          Hide
          ASF subversion and git services added a comment -

          Commit 1619163 from David Smiley in branch 'dev/trunk'
          [ https://svn.apache.org/r1619163 ]

          LUCENE-5714: TestBBoxStrategy work-around for Spatial4j Rect bug #85

          Show
          ASF subversion and git services added a comment - Commit 1619163 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1619163 ] LUCENE-5714 : TestBBoxStrategy work-around for Spatial4j Rect bug #85
          Hide
          ASF subversion and git services added a comment -

          Commit 1619165 from David Smiley in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1619165 ]

          LUCENE-5714: TestBBoxStrategy work-around for Spatial4j Rect bug #85

          Show
          ASF subversion and git services added a comment - Commit 1619165 from David Smiley in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1619165 ] LUCENE-5714 : TestBBoxStrategy work-around for Spatial4j Rect bug #85

            People

            • Assignee:
              David Smiley
              Reporter:
              David Smiley
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development