Lucene - Core
  1. Lucene - Core
  2. LUCENE-4419

Test RecursivePrefixTree indexing non-point data

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1, 5.0
    • Component/s: modules/spatial
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      RecursivePrefixTreeFilter was modified in ~July 2011 to support spatial filtering of non-point indexed shapes. It seems to work when playing with the capability but it isn't tested. It really needs to be as this is a major feature.

      I imagine an approach in which some randomly generated rectangles are indexed and then a randomly generated rectangle is queried. The right answer can be calculated brute-force and then compared with the filter. In order to deal with shape imprecision, the randomly generated shapes could be generated to fit a course grid (e.g. round everything to a 1 degree interval).

        Issue Links

          Activity

          Hide
          Chris Male added a comment -

          I really don't see the benefit of randomly generating Shapes. There isn't much to be revealed with a rectangle that say covers one small part of the pacific ocean and another rectangle which covers another small part. The number of possible Shapes is just too massive to ever reveal anything.

          What I feel would be better is if we defined Shapes that test particularly troublesome areas. Datelines, equators, poles. We can also include massive Shapes and tiny Shapes, circles, points, and whatever else we end up supporting.

          Having this standardized Shape suite would be a big benefit to testing all the Strategys. I don't think it would be particularly difficult to create and once created, it wouldn't require much maintenance at all.

          Show
          Chris Male added a comment - I really don't see the benefit of randomly generating Shapes. There isn't much to be revealed with a rectangle that say covers one small part of the pacific ocean and another rectangle which covers another small part. The number of possible Shapes is just too massive to ever reveal anything. What I feel would be better is if we defined Shapes that test particularly troublesome areas. Datelines, equators, poles. We can also include massive Shapes and tiny Shapes, circles, points, and whatever else we end up supporting. Having this standardized Shape suite would be a big benefit to testing all the Strategys. I don't think it would be particularly difficult to create and once created, it wouldn't require much maintenance at all.
          Hide
          David Smiley added a comment -

          I'm all for what you suggest – a test that could be used by multiple strategies. We're doing that already in fact in PortedSolr3Test. And the StrategyTestCase has methods that facilitate using test files of sample data, which is used by several tests such as TestPointVectorStrategy.

          I really don't see the benefit of randomly generating Shapes.

          I could have sworn you told me we should add that to the Spatial4j todo list.

          I like randomized tests because it can catch errors that a static test simply didn't test for. This helped out tremendously when I worked out the bugs in Circle-Rectangle intersection in Spatial4j.

          Show
          David Smiley added a comment - I'm all for what you suggest – a test that could be used by multiple strategies. We're doing that already in fact in PortedSolr3Test. And the StrategyTestCase has methods that facilitate using test files of sample data, which is used by several tests such as TestPointVectorStrategy. I really don't see the benefit of randomly generating Shapes. I could have sworn you told me we should add that to the Spatial4j todo list. I like randomized tests because it can catch errors that a static test simply didn't test for. This helped out tremendously when I worked out the bugs in Circle-Rectangle intersection in Spatial4j.
          Hide
          Chris Male added a comment -

          I'm all for what you suggest – a test that could be used by multiple strategies

          I didn't suggest that. I suggested a common suite of Shapes. I don't like the idea of having a single test for all Strategys since they work in different ways and support different things.

          I like randomized tests because it can catch errors that a static test simply didn't test for

          Theres a difference between randomized tests and randomized Shape generation (again I didn't suggest we stopped randomized testing). The world is massive, much of it isn't remotely interesting or challenging to our spatial implementations. Just generating arbitrary Shapes somewhere on the globe seems a total waste of time.

          If we have a standard set of Shapes then we can use randomized testing to handle the permutations between them, but we shouldn't waste days waiting for tests to hit an interesting Shape.

          Show
          Chris Male added a comment - I'm all for what you suggest – a test that could be used by multiple strategies I didn't suggest that. I suggested a common suite of Shapes. I don't like the idea of having a single test for all Strategys since they work in different ways and support different things. I like randomized tests because it can catch errors that a static test simply didn't test for Theres a difference between randomized tests and randomized Shape generation (again I didn't suggest we stopped randomized testing). The world is massive, much of it isn't remotely interesting or challenging to our spatial implementations. Just generating arbitrary Shapes somewhere on the globe seems a total waste of time. If we have a standard set of Shapes then we can use randomized testing to handle the permutations between them, but we shouldn't waste days waiting for tests to hit an interesting Shape.
          Hide
          David Smiley added a comment -

          The attached test SpatialOpRecursivePrefixTreeTest tests the RecursivePrefixTree extensively. It yielded the bugs in LUCENE-4585. It uses the QuadPrefixTree with random levels and random scan threshold for RecursivePrefixTreeFilter. Some randomly generated rectangles are asserted as a MUST match, and some are optional depending on the grid approximation.

          QuadPrefixTree was chosen because it supports non-geo, and this test has some simplistic logic in going from shape -> grid-snapped shape that would be more complicated in geospatial that ultimately RecursivePrefixTree doesn't care about any way – it's Spatial4j + GeohashPrefixTree that deal with that.

          This test added an evaluate() method to SpatialOperation which I found quite handy. With this in place and this test as a template, it should be easy to test for operations other than intersect once they are supported.

          Show
          David Smiley added a comment - The attached test SpatialOpRecursivePrefixTreeTest tests the RecursivePrefixTree extensively. It yielded the bugs in LUCENE-4585 . It uses the QuadPrefixTree with random levels and random scan threshold for RecursivePrefixTreeFilter. Some randomly generated rectangles are asserted as a MUST match, and some are optional depending on the grid approximation. QuadPrefixTree was chosen because it supports non-geo, and this test has some simplistic logic in going from shape -> grid-snapped shape that would be more complicated in geospatial that ultimately RecursivePrefixTree doesn't care about any way – it's Spatial4j + GeohashPrefixTree that deal with that. This test added an evaluate() method to SpatialOperation which I found quite handy. With this in place and this test as a template, it should be easy to test for operations other than intersect once they are supported.
          Hide
          David Smiley added a comment -

          Whoops; there's a stupid "if (i != 5) continue;" on line ~68 which I should have labelled with a NOCOMMIT that's in the patch.

          Show
          David Smiley added a comment - Whoops; there's a stupid "if (i != 5) continue;" on line ~68 which I should have labelled with a NOCOMMIT that's in the patch.
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] David Wayne Smiley
          http://svn.apache.org/viewvc?view=revision&revision=1418011

          LUCENE-4419: Test indexing non-point shapes; and add SpatialOperation.evaluate(s1,s2)

          Show
          Commit Tag Bot added a comment - [branch_4x commit] David Wayne Smiley http://svn.apache.org/viewvc?view=revision&revision=1418011 LUCENE-4419 : Test indexing non-point shapes; and add SpatialOperation.evaluate(s1,s2)
          Hide
          Commit Tag Bot added a comment -

          [trunk commit] David Wayne Smiley
          http://svn.apache.org/viewvc?view=revision&revision=1418007

          LUCENE-4419: Test indexing non-point shapes; and add SpatialOperation.evaluate(s1,s2)

          Show
          Commit Tag Bot added a comment - [trunk commit] David Wayne Smiley http://svn.apache.org/viewvc?view=revision&revision=1418007 LUCENE-4419 : Test indexing non-point shapes; and add SpatialOperation.evaluate(s1,s2)
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] David Wayne Smiley
          http://svn.apache.org/viewvc?view=revision&revision=1418011

          LUCENE-4419: Test indexing non-point shapes; and add SpatialOperation.evaluate(s1,s2)

          Show
          Commit Tag Bot added a comment - [branch_4x commit] David Wayne Smiley http://svn.apache.org/viewvc?view=revision&revision=1418011 LUCENE-4419 : Test indexing non-point shapes; and add SpatialOperation.evaluate(s1,s2)
          Hide
          Uwe Schindler added a comment -

          Closed after release.

          Show
          Uwe Schindler added a comment - Closed after release.

            People

            • Assignee:
              David Smiley
              Reporter:
              David Smiley
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development