Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1, 7.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Today the test hides possible failures by leniently handling quantization issues.

      We should fix it to do what geo2d tests now do: pre-quantized indexed points, but don't quantize query shapes.

      1. LUCENE-7168.patch
        5 kB
        Michael McCandless
      2. LUCENE-7168.patch
        11 kB
        Michael McCandless
      3. LUCENE-7168.patch
        57 kB
        Michael McCandless
      4. LUCENE-7168.patch
        41 kB
        Michael McCandless
      5. LUCENE-7168.patch
        40 kB
        Michael McCandless
      6. LUCENE-7168.patch
        13 kB
        Michael McCandless
      7. LUCENE-7168.patch
        12 kB
        Michael McCandless

        Activity

        Hide
        rcmuir Robert Muir added a comment -

        It seems also that the current test design (testing unquantized values, and trying to "correct" for quantization to satisfy that) may cause needless complexity in Geo3D?

        Because the query then tries to "correct for quantization", it computes a min/max in all 3 dimensions, and then passes these 6 values to a solid factory that returns one of 8 possible xyzsolid implementations that try to "correct" for it.

        Show
        rcmuir Robert Muir added a comment - It seems also that the current test design (testing unquantized values, and trying to "correct" for quantization to satisfy that) may cause needless complexity in Geo3D? Because the query then tries to "correct for quantization", it computes a min/max in all 3 dimensions, and then passes these 6 values to a solid factory that returns one of 8 possible xyzsolid implementations that try to "correct" for it.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Hi Robert,

        The reason for the 8 implementations is because there are degeneracy conditions. Specifically, when a dimension is immeasurably small, we still need to determine set membership but only when it is on the plane that represents that dimension. it's dicey to use two planes that are right on top of each other for determining "in set" in that case. There is a degeneracy condition for a single point as well.

        This is independent of the BKD implementation's quantization effects; Geo3D would want to do this anyway. There are similar issues with bounding boxes that have degeneracies of various kinds.

        Show
        kwright@metacarta.com Karl Wright added a comment - Hi Robert, The reason for the 8 implementations is because there are degeneracy conditions. Specifically, when a dimension is immeasurably small, we still need to determine set membership but only when it is on the plane that represents that dimension. it's dicey to use two planes that are right on top of each other for determining "in set" in that case. There is a degeneracy condition for a single point as well. This is independent of the BKD implementation's quantization effects; Geo3D would want to do this anyway. There are similar issues with bounding boxes that have degeneracies of various kinds.
        Hide
        rcmuir Robert Muir added a comment -

        I don't see how it can be immeasurably small. We use a 32-bit quantization so we know exactly how small things can be?

        Show
        rcmuir Robert Muir added a comment - I don't see how it can be immeasurably small. We use a 32-bit quantization so we know exactly how small things can be?
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        "Immeasurably" means within the resolution of the mathematics underlying planes, given the numerical resolution of a double. It is summarized with the constant Vector.MINIMUM_RESOLUTION.

        Show
        kwright@metacarta.com Karl Wright added a comment - "Immeasurably" means within the resolution of the mathematics underlying planes, given the numerical resolution of a double. It is summarized with the constant Vector.MINIMUM_RESOLUTION.
        Hide
        rcmuir Robert Muir added a comment -

        But we do not have the numeric resolution of a double. It is less, only 32 bits.

        Show
        rcmuir Robert Muir added a comment - But we do not have the numeric resolution of a double. It is less, only 32 bits.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        That's an implementation detail. All of the stuff in spatial3D/geom knows nothing about that and operates from strict principles of the math alone.

        Show
        kwright@metacarta.com Karl Wright added a comment - That's an implementation detail. All of the stuff in spatial3D/geom knows nothing about that and operates from strict principles of the math alone.
        Hide
        rcmuir Robert Muir added a comment -

        Its an important implementation detail. If we can make simplifications around things like this, then I think we should.

        We can't just let quantization only make things more complicated.

        Show
        rcmuir Robert Muir added a comment - Its an important implementation detail. If we can make simplifications around things like this, then I think we should. We can't just let quantization only make things more complicated.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Handling geometric degeneracy conditions properly, I agree, should happen regardless of what is being done elsewhere in Lucene.

        Show
        kwright@metacarta.com Karl Wright added a comment - Handling geometric degeneracy conditions properly, I agree, should happen regardless of what is being done elsewhere in Lucene.
        Hide
        rcmuir Robert Muir added a comment -

        I think this is the wrong way to look at it? If some of these situations are merely theoretical and cannot actually happen in lucene, then why do we need to support them?

        Show
        rcmuir Robert Muir added a comment - I think this is the wrong way to look at it? If some of these situations are merely theoretical and cannot actually happen in lucene, then why do we need to support them?
        Hide
        rcmuir Robert Muir added a comment -

        And I have not tested the numbers involved either: I am just looking to simplify.

        Show
        rcmuir Robert Muir added a comment - And I have not tested the numbers involved either: I am just looking to simplify.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        While XYZSolids are used by Lucene for BKD trees, I cannot say whether or not they would never be used by anyone else. The functionality is general, after all. In fact, the functionality was introduced because we had issues without it from the BKD tree implementation. Perhaps those problems are fixed but whether degeneracy support for that use case is needed still depends strongly on the relative values of the "known" minimum 32-bit resolution of the tree and the actual value of Vector.MINIMUM_RESOLUTION. So at the moment I'm not even convinced that the BKD implementation doesn't need degeneracy support.

        I already made all the variants of XYZSolids package-private and lucene.internal. Is this enough to hide this complication from users?

        Show
        kwright@metacarta.com Karl Wright added a comment - While XYZSolids are used by Lucene for BKD trees, I cannot say whether or not they would never be used by anyone else. The functionality is general, after all. In fact, the functionality was introduced because we had issues without it from the BKD tree implementation. Perhaps those problems are fixed but whether degeneracy support for that use case is needed still depends strongly on the relative values of the "known" minimum 32-bit resolution of the tree and the actual value of Vector.MINIMUM_RESOLUTION. So at the moment I'm not even convinced that the BKD implementation doesn't need degeneracy support. I already made all the variants of XYZSolids package-private and lucene.internal. Is this enough to hide this complication from users?
        Hide
        rcmuir Robert Muir added a comment -

        While XYZSolids are used by Lucene for BKD trees, I cannot say whether or not they would never be used by anyone else. The functionality is general, after all.

        Right but I think we should try to simplify and reduce to the subset of logic that lucene needs to index and run its queries. We sorta have to do this, otherwise everything is completely open-ended ("well someone MIGHT have a use case") and we can't do APIs that way.

        I already made all the variants of XYZSolids package-private and lucene.internal. Is this enough to hide this complication from users?

        This is a good step and will at least defer me from trying too hard to simplify this stuff for now. Because it at least hides all this complexity from the user and that is the worst of the worst. Having some hairy stuff that is package-private is much less evil.

        Show
        rcmuir Robert Muir added a comment - While XYZSolids are used by Lucene for BKD trees, I cannot say whether or not they would never be used by anyone else. The functionality is general, after all. Right but I think we should try to simplify and reduce to the subset of logic that lucene needs to index and run its queries. We sorta have to do this, otherwise everything is completely open-ended ("well someone MIGHT have a use case") and we can't do APIs that way. I already made all the variants of XYZSolids package-private and lucene.internal. Is this enough to hide this complication from users? This is a good step and will at least defer me from trying too hard to simplify this stuff for now. Because it at least hides all this complexity from the user and that is the worst of the worst. Having some hairy stuff that is package-private is much less evil.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Right but I think we should try to simplify and reduce to the subset of logic that lucene needs to index and run its queries. We sorta have to do this, otherwise everything is completely open-ended ("well someone MIGHT have a use case") and we can't do APIs that way.

        Yes, but it's just another shape. Presumably you would not want to peel all of the shapes that aren't explicitly used by lucene-core out of the package? I'd make the case for it on that basis alone.

        If the "shape API footprint" is worrying you, we should discuss what shapes you think people will want to search for and what they won't, and then come up with a place for the shapes you don't like to live. I'm less willing to judge that up front, however.

        Show
        kwright@metacarta.com Karl Wright added a comment - Right but I think we should try to simplify and reduce to the subset of logic that lucene needs to index and run its queries. We sorta have to do this, otherwise everything is completely open-ended ("well someone MIGHT have a use case") and we can't do APIs that way. Yes, but it's just another shape. Presumably you would not want to peel all of the shapes that aren't explicitly used by lucene-core out of the package? I'd make the case for it on that basis alone. If the "shape API footprint" is worrying you, we should discuss what shapes you think people will want to search for and what they won't, and then come up with a place for the shapes you don't like to live. I'm less willing to judge that up front, however.
        Hide
        rcmuir Robert Muir added a comment -

        I think (ultimately) we should have Geo3DPoint.java and whatever it uses in its newXXXQuery() methods is all we need, nothing more

        Remember we can always choose to expose more advanced functionality at any time. We can revive things from git and so on. We can try to repackage things (but we must be careful this doesnt cause more harm than good) to be better.

        We can add new methods like the newPathQuery (which seems useful and general) that are things Geo3D can do that nobody else can do: stuff like this is great.

        We can also try to start rounding out an expert API that does more stuff: make Custom3DPoint or Advanced3DPoint or whatever we want to call it that is even more expert: e.g. custom planet models/distance metrics/crazy shapes. I just think we should restrict ourselves to real use-cases there too to keep things bounded.

        I do think we have to err towards simplicity rather than worrying about experts. But we can still make expert stuff possible.

        Show
        rcmuir Robert Muir added a comment - I think (ultimately) we should have Geo3DPoint.java and whatever it uses in its newXXXQuery() methods is all we need, nothing more Remember we can always choose to expose more advanced functionality at any time. We can revive things from git and so on. We can try to repackage things (but we must be careful this doesnt cause more harm than good) to be better. We can add new methods like the newPathQuery (which seems useful and general) that are things Geo3D can do that nobody else can do: stuff like this is great. We can also try to start rounding out an expert API that does more stuff: make Custom3DPoint or Advanced3DPoint or whatever we want to call it that is even more expert: e.g. custom planet models/distance metrics/crazy shapes. I just think we should restrict ourselves to real use-cases there too to keep things bounded. I do think we have to err towards simplicity rather than worrying about experts. But we can still make expert stuff possible.
        Hide
        nknize Nicholas Knize added a comment -

        e.g. custom planet models/distance metrics/crazy shapes

        Projections would be useful - and BKD is the right structure to handle them. We need to be careful where we put a lot of this logic though. There are other Apache projects that already do this kind of stuff very well (and maintain the logic along with the EPSG database). I'm not sure we want to necessarily duplicate this work. Support for some of this more "expert capability" makes more sense in spatial-extras where we can bring in third-party support (like SIS, ESRI).

        Show
        nknize Nicholas Knize added a comment - e.g. custom planet models/distance metrics/crazy shapes Projections would be useful - and BKD is the right structure to handle them. We need to be careful where we put a lot of this logic though. There are other Apache projects that already do this kind of stuff very well (and maintain the logic along with the EPSG database). I'm not sure we want to necessarily duplicate this work. Support for some of this more "expert capability" makes more sense in spatial-extras where we can bring in third-party support (like SIS, ESRI).
        Hide
        mikemccand Michael McCandless added a comment -

        Here's a patch to remove query-time quantizing, threads, and the "small" boolean. Tests seem to survive beasting so far ...

        Show
        mikemccand Michael McCandless added a comment - Here's a patch to remove query-time quantizing, threads, and the "small" boolean. Tests seem to survive beasting so far ...
        Hide
        mikemccand Michael McCandless added a comment -

        Another patch, just improving the failure message, and beasting uncovered a failure!

           [junit4] Suite: org.apache.lucene.spatial3d.TestGeo3DPoint
           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPoint -Dtests.method=testRandomMedium -Dtests.seed=1B44C9C6ED816CF9 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=pl-PL -Dtests.timezone=Asia/Damascus -Dtests.asserts=true -Dtests.file.encoding=UTF-8
           [junit4] FAILURE 0.68s | TestGeo3DPoint.testRandomMedium <<<
           [junit4]    > Throwable #1: java.lang.AssertionError: FAIL: id=147 should have matched but did not
           [junit4]    >   shape=GeoPath: {planetmodel=PlanetModel.WGS84, width=0.9250245035569946(53.0), points={[[lat=-0.31202581053256323, lon=-2.3769151553645593], [lat=-0.04559091474230697, lon=0.3560334442429531], [lat=-0.6470509953978505, lon=1.0469362861181948], [lat=-0.685120355322366, lon=1.6043555643894918], [lat=-1.568917110871099, lon=1.008031992795193]]}}
           [junit4]    >   lat=2.3507843431772453 lon=127.59305012508766
           [junit4]    >   point=[lat=0.04102892679277523, lon=2.2269188273449423]
           [junit4]    >   docID=133 deleted?=false
           [junit4]    >   query=PointInGeo3DShapeQuery: field=point: Shape: GeoPath: {planetmodel=PlanetModel.WGS84, width=0.9250245035569946(53.0), points={[[lat=-0.31202581053256323, lon=-2.3769151553645593], [lat=-0.04559091474230697, lon=0.3560334442429531], [lat=-0.6470509953978505, lon=1.0469362861181948], [lat=-0.685120355322366, lon=1.6043555643894918], [lat=-1.568917110871099, lon=1.008031992795193]]}}
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([1B44C9C6ED816CF9:A69AFE6EACE40F9F]:0)
           [junit4]    > 	at org.apache.lucene.spatial3d.TestGeo3DPoint.verify(TestGeo3DPoint.java:741)
           [junit4]    > 	at org.apache.lucene.spatial3d.TestGeo3DPoint.doTestRandom(TestGeo3DPoint.java:529)
           [junit4]    > 	at org.apache.lucene.spatial3d.TestGeo3DPoint.testRandomMedium(TestGeo3DPoint.java:456)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
           [junit4]   2> NOTE: test params are: codec=Asserting(Lucene60): {id=PostingsFormat(name=Asserting)}, docValues:{id=DocValuesFormat(name=Lucene54)}, maxPointsInLeafNode=1169, maxMBSortInHeap=6.9707915551209005, sim=ClassicSimilarity, locale=pl-PL, timezone=Asia/Damascus
           [junit4]   2> NOTE: Linux 3.13.0-71-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=456488064,total=504889344
           [junit4]   2> NOTE: All tests run in this JVM: [TestGeo3DPoint]
           [junit4] Completed [1/1 (1!)] in 0.84s, 1 test, 1 failure <<< FAILURES!
        
        Show
        mikemccand Michael McCandless added a comment - Another patch, just improving the failure message, and beasting uncovered a failure! [junit4] Suite: org.apache.lucene.spatial3d.TestGeo3DPoint [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeo3DPoint -Dtests.method=testRandomMedium -Dtests.seed=1B44C9C6ED816CF9 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=pl-PL -Dtests.timezone=Asia/Damascus -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 0.68s | TestGeo3DPoint.testRandomMedium <<< [junit4] > Throwable #1: java.lang.AssertionError: FAIL: id=147 should have matched but did not [junit4] > shape=GeoPath: {planetmodel=PlanetModel.WGS84, width=0.9250245035569946(53.0), points={[[lat=-0.31202581053256323, lon=-2.3769151553645593], [lat=-0.04559091474230697, lon=0.3560334442429531], [lat=-0.6470509953978505, lon=1.0469362861181948], [lat=-0.685120355322366, lon=1.6043555643894918], [lat=-1.568917110871099, lon=1.008031992795193]]}} [junit4] > lat=2.3507843431772453 lon=127.59305012508766 [junit4] > point=[lat=0.04102892679277523, lon=2.2269188273449423] [junit4] > docID=133 deleted?=false [junit4] > query=PointInGeo3DShapeQuery: field=point: Shape: GeoPath: {planetmodel=PlanetModel.WGS84, width=0.9250245035569946(53.0), points={[[lat=-0.31202581053256323, lon=-2.3769151553645593], [lat=-0.04559091474230697, lon=0.3560334442429531], [lat=-0.6470509953978505, lon=1.0469362861181948], [lat=-0.685120355322366, lon=1.6043555643894918], [lat=-1.568917110871099, lon=1.008031992795193]]}} [junit4] > at __randomizedtesting.SeedInfo.seed([1B44C9C6ED816CF9:A69AFE6EACE40F9F]:0) [junit4] > at org.apache.lucene.spatial3d.TestGeo3DPoint.verify(TestGeo3DPoint.java:741) [junit4] > at org.apache.lucene.spatial3d.TestGeo3DPoint.doTestRandom(TestGeo3DPoint.java:529) [junit4] > at org.apache.lucene.spatial3d.TestGeo3DPoint.testRandomMedium(TestGeo3DPoint.java:456) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=Asserting(Lucene60): {id=PostingsFormat(name=Asserting)}, docValues:{id=DocValuesFormat(name=Lucene54)}, maxPointsInLeafNode=1169, maxMBSortInHeap=6.9707915551209005, sim=ClassicSimilarity, locale=pl-PL, timezone=Asia/Damascus [junit4] 2> NOTE: Linux 3.13.0-71-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=456488064,total=504889344 [junit4] 2> NOTE: All tests run in this JVM: [TestGeo3DPoint] [junit4] Completed [1/1 (1!)] in 0.84s, 1 test, 1 failure <<< FAILURES!
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Michael McCandless: What is the x,y,z of the quantized point?

        Show
        kwright@metacarta.com Karl Wright added a comment - Michael McCandless : What is the x,y,z of the quantized point?
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Michael McCandless: Looking at the change in the test code, all you did is remove the quantization from the check. I think what you need to do is the following:

        • Quantize the points up front
        • Check ONLY the quantized points, not the raw lat/lon points, when verifying the correctness of the search

        The test as you have it now is guaranteed to find spurious failures that are solely due to quantization effects.

        Show
        kwright@metacarta.com Karl Wright added a comment - Michael McCandless : Looking at the change in the test code, all you did is remove the quantization from the check. I think what you need to do is the following: Quantize the points up front Check ONLY the quantized points, not the raw lat/lon points, when verifying the correctness of the search The test as you have it now is guaranteed to find spurious failures that are solely due to quantization effects.
        Hide
        mikemccand Michael McCandless added a comment -

        Looking at the change in the test code, all you did is remove the quantization from the check.

        Wait: I also made a change to pre-quantize all points up front, at the top of the verify method, so that we only index quantized points.

        At search time, I do not quantize the random lat/lon used to generate shapes ...

        Show
        mikemccand Michael McCandless added a comment - Looking at the change in the test code, all you did is remove the quantization from the check. Wait: I also made a change to pre-quantize all points up front, at the top of the verify method, so that we only index quantized points. At search time, I do not quantize the random lat/lon used to generate shapes ...
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Right, but when you create the GeoPoint to test with, you use a lat/lon. You really should be using the (x,y,z) values from the pre-quantization.

        Show
        kwright@metacarta.com Karl Wright added a comment - Right, but when you create the GeoPoint to test with, you use a lat/lon. You really should be using the (x,y,z) values from the pre-quantization.
        Hide
        mikemccand Michael McCandless added a comment -

        OK I'll fixup the patch to do the quantizing in x/y/z space instead of lat/lon space.

        Show
        mikemccand Michael McCandless added a comment - OK I'll fixup the patch to do the quantizing in x/y/z space instead of lat/lon space.
        Hide
        rcmuir Robert Muir added a comment -

        we can start by just testing that the query's logic within x/y/z space is correct (e.g. consistent with what would happen if it just returned CELL_CROSSES_QUERY from compare() for every call)?

        This at least tells us that compare() is consistent with visit().

        Then separately, we test that 2D -> 3D conversion works as we expect (these can be unit tests for GeoPoint.java).
        And separately test that visit() for a single point really does what we think it should (maybe tests against specific 3D shapes, etc)

        Unfortunately this means we can't reuse all our 2D testing infra for Geo3D, but I think it would be complicated if we tried?

        Show
        rcmuir Robert Muir added a comment - we can start by just testing that the query's logic within x/y/z space is correct (e.g. consistent with what would happen if it just returned CELL_CROSSES_QUERY from compare() for every call)? This at least tells us that compare() is consistent with visit(). Then separately, we test that 2D -> 3D conversion works as we expect (these can be unit tests for GeoPoint.java). And separately test that visit() for a single point really does what we think it should (maybe tests against specific 3D shapes, etc) Unfortunately this means we can't reuse all our 2D testing infra for Geo3D, but I think it would be complicated if we tried?
        Hide
        mikemccand Michael McCandless added a comment -

        Work-in-progress, dirty patch, fixing (I think) the test quantization.

        I also attempted to fix geo3d's encode/decode to look like LatLonPoint, and added some encode/decode tests, however TestGeo3DPoint.testEncodeDecodeIsStable is failing!

        I also moved some "called only by tests" methods into the test cases, and "called only by the one query" methods into the query.

        Show
        mikemccand Michael McCandless added a comment - Work-in-progress, dirty patch, fixing (I think) the test quantization. I also attempted to fix geo3d's encode/decode to look like LatLonPoint , and added some encode/decode tests, however TestGeo3DPoint.testEncodeDecodeIsStable is failing! I also moved some "called only by tests" methods into the test cases, and "called only by the one query" methods into the query.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        What are the values that fail?
        It should be straightforward to debug why they fail if we know some examples.

        Show
        kwright@metacarta.com Karl Wright added a comment - What are the values that fail? It should be straightforward to debug why they fail if we know some examples.
        Hide
        mikemccand Michael McCandless added a comment -

        OK, some progress (new patch).

        I created a silly method to find a "good" double value for "quantizing to int" purposes:

          /** Returns a double value >= x such that if you multiply that value by an int, and then
           *  divide it by that int again, you get precisely the same value back */
        

        If we use such a value, then the encode/decode is stable. E.g. for WGS84, the planetMax is 1.0011188539924791, and the nextSafeDouble after that value is 1.0011191368103027.

        But there is still one problem that I'm trying to work out, which is that if you encode the min value (-planetMax), and then decode that, you get back a value LESS than -planetMax, because of the flooring we do, and it's not guaranteed the planetMax would "be" at a floor boundary. Somehow, magically, LatLonPoint's min values seem to work

        Other tests are passing, except with this patch I hit this failure on beasting:

           [junit4] Started J0 PID(5288@localhost).
           [junit4] Suite: org.apache.lucene.spatial3d.TestGeo3DPoint
           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPoint -Dtests.method=testRandomBig -Dtests.seed=3AD1F04EDC75F5CC -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=ru-RU -Dtests.timezone=America/Grand_Turk -Dtests.asserts=true -Dtests.file.encoding=UTF-8
           [junit4] FAILURE 5.05s | TestGeo3DPoint.testRandomBig <<<
           [junit4]    > Throwable #1: java.lang.AssertionError: FAIL: id=541300 should have matched but did not
           [junit4]    >   shape=GeoStandardPath: {planetmodel=PlanetModel.WGS84, width=0.3665191429188092(21.0), points={[[lat=-0.9764909432864418, lon=1.5923866750547349], [lat=-0.044905310521497856, lon=0.543041341355748], [lat=-1.50783500438853, lon=-0.8877147957518042]]}}
           [junit4]    >   point=[X=0.8653002868649471, Y=0.50134342478497, Z=0.046203414829601996]
           [junit4]    >   docID=539356 deleted?=false
           [junit4]    >   query=PointInGeo3DShapeQuery: field=point: Shape: GeoStandardPath: {planetmodel=PlanetModel.WGS84, width=0.3665191429188092(21.0), points={[[lat=-0.9764909432864418, lon=1.5923866750547349], [lat=-0.044905310521497856, lon=0.543041341355748], [lat=-1.50783500438853, lon=-0.8877147957518042]]}}
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([3AD1F04EDC75F5CC:BD868DC14D2C894C]:0)
           [junit4]    > 	at org.apache.lucene.spatial3d.TestGeo3DPoint.verify(TestGeo3DPoint.java:730)
           [junit4]    > 	at org.apache.lucene.spatial3d.TestGeo3DPoint.doTestRandom(TestGeo3DPoint.java:530)
           [junit4]    > 	at org.apache.lucene.spatial3d.TestGeo3DPoint.testRandomBig(TestGeo3DPoint.java:462)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
           [junit4]   2> NOTE: leaving temporary files on disk at: /l/geo3dquant/lucene/build/spatial3d/test/J0/temp/lucene.spatial3d.TestGeo3DPoint_3AD1F04EDC75F5CC-001
           [junit4]   2> NOTE: test params are: codec=Asserting(Lucene60): {id=TestBloomFilteredLucenePostings(BloomFilteringPostingsFormat(Lucene50(blocksize=128)))}, docValues:{id=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=357, maxMBSortInHeap=5.860921451607786, sim=ClassicSimilarity, locale=ru-RU, timezone=America/Grand_Turk
           [junit4]   2> NOTE: Linux 3.13.0-71-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=407941736,total=492306432
           [junit4]   2> NOTE: All tests run in this JVM: [TestGeo3DPoint]
           [junit4] Completed [1/1 (1!)] in 5.23s, 1 test, 1 failure <<< FAILURES!
        

        I haven't dug into it yet ...

        Show
        mikemccand Michael McCandless added a comment - OK, some progress (new patch). I created a silly method to find a "good" double value for "quantizing to int" purposes: /** Returns a double value >= x such that if you multiply that value by an int, and then * divide it by that int again, you get precisely the same value back */ If we use such a value, then the encode/decode is stable. E.g. for WGS84, the planetMax is 1.0011188539924791, and the nextSafeDouble after that value is 1.0011191368103027. But there is still one problem that I'm trying to work out, which is that if you encode the min value ( -planetMax ), and then decode that, you get back a value LESS than -planetMax , because of the flooring we do, and it's not guaranteed the planetMax would "be" at a floor boundary. Somehow, magically, LatLonPoint 's min values seem to work Other tests are passing, except with this patch I hit this failure on beasting: [junit4] Started J0 PID(5288@localhost). [junit4] Suite: org.apache.lucene.spatial3d.TestGeo3DPoint [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeo3DPoint -Dtests.method=testRandomBig -Dtests.seed=3AD1F04EDC75F5CC -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=ru-RU -Dtests.timezone=America/Grand_Turk -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 5.05s | TestGeo3DPoint.testRandomBig <<< [junit4] > Throwable #1: java.lang.AssertionError: FAIL: id=541300 should have matched but did not [junit4] > shape=GeoStandardPath: {planetmodel=PlanetModel.WGS84, width=0.3665191429188092(21.0), points={[[lat=-0.9764909432864418, lon=1.5923866750547349], [lat=-0.044905310521497856, lon=0.543041341355748], [lat=-1.50783500438853, lon=-0.8877147957518042]]}} [junit4] > point=[X=0.8653002868649471, Y=0.50134342478497, Z=0.046203414829601996] [junit4] > docID=539356 deleted?=false [junit4] > query=PointInGeo3DShapeQuery: field=point: Shape: GeoStandardPath: {planetmodel=PlanetModel.WGS84, width=0.3665191429188092(21.0), points={[[lat=-0.9764909432864418, lon=1.5923866750547349], [lat=-0.044905310521497856, lon=0.543041341355748], [lat=-1.50783500438853, lon=-0.8877147957518042]]}} [junit4] > at __randomizedtesting.SeedInfo.seed([3AD1F04EDC75F5CC:BD868DC14D2C894C]:0) [junit4] > at org.apache.lucene.spatial3d.TestGeo3DPoint.verify(TestGeo3DPoint.java:730) [junit4] > at org.apache.lucene.spatial3d.TestGeo3DPoint.doTestRandom(TestGeo3DPoint.java:530) [junit4] > at org.apache.lucene.spatial3d.TestGeo3DPoint.testRandomBig(TestGeo3DPoint.java:462) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: leaving temporary files on disk at: /l/geo3dquant/lucene/build/spatial3d/test/J0/temp/lucene.spatial3d.TestGeo3DPoint_3AD1F04EDC75F5CC-001 [junit4] 2> NOTE: test params are: codec=Asserting(Lucene60): {id=TestBloomFilteredLucenePostings(BloomFilteringPostingsFormat(Lucene50(blocksize=128)))}, docValues:{id=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=357, maxMBSortInHeap=5.860921451607786, sim=ClassicSimilarity, locale=ru-RU, timezone=America/Grand_Turk [junit4] 2> NOTE: Linux 3.13.0-71-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=407941736,total=492306432 [junit4] 2> NOTE: All tests run in this JVM: [TestGeo3DPoint] [junit4] Completed [1/1 (1!)] in 5.23s, 1 test, 1 failure <<< FAILURES! I haven't dug into it yet ...
        Hide
        mikemccand Michael McCandless added a comment -

        OK, another patch, addressing all nocommits. I think it's nearly
        ready... I had to an "if" to decode to never go below -planetMax.

        However the above failure still happens ...

        So I've improved this test so that on failure, it runs a crazy (test
        only) IntersectVisitor that "explains" what happened to this one
        hit. For this failure, it now produces output like this:

           [junit4] Started J0 PID(11250@localhost).
           [junit4] Suite: org.apache.lucene.spatial3d.TestGeo3DPoint
           [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPoint -Dtests.method=testRandomBig -Dtests.seed=3AD1F04EDC75F5CC -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=ru-RU -Dtests.timezone=America/Grand_Turk -Dtests.asserts=true -Dtests.file.encoding=UTF-8
           [junit4] FAILURE 5.47s | TestGeo3DPoint.testRandomBig <<<
           [junit4]    > Throwable #1: java.lang.AssertionError: FAIL: id=541300 should have matched but did not
           [junit4]    >   shape=GeoStandardCircle: {planetmodel=PlanetModel.WGS84, center=[lat=-0.8971654677124566, lon=-0.3398482030102755], radius=1.4775317506492547(84.65633340877824)}
           [junit4]    >   point=[X=0.8653002868649471, Y=0.50134342478497, Z=0.046203414829601996]
           [junit4]    >   docID=538760 deleted?=false
           [junit4]    >   query=PointInGeo3DShapeQuery: field=point: Shape: GeoStandardCircle: {planetmodel=PlanetModel.WGS84, center=[lat=-0.8971654677124566, lon=-0.3398482030102755], radius=1.4775317506492547(84.65633340877824)}
           [junit4]    >   explanation:
           [junit4]    >     target is in leaf _2u(7.0.0):c1 of full reader StandardDirectoryReader(segments:210:nrt _12(7.0.0):c198337/1995:delGen=1 _24(7.0.0):c199685/601:delGen=1 _23(7.0.0):c5413/17:delGen=2 _25(7.0.0):c5413/12:delGen=1 _26(7.0.0):c5413/10:delGen=1 _27(7.0.0):c5413/10:delGen=1 _28(7.0.0):c5413/17:delGen=2 _29(7.0.0):c5413/8:delGen=2 _2a(7.0.0):c5413/12:delGen=1 _2b(7.0.0):c5413/14:delGen=1 _2c(7.0.0):c5413/4:delGen=1 _2d(7.0.0):c5413/13:delGen=2 _2e(7.0.0):c5413/12:delGen=2 _2f(7.0.0):c5413/15:delGen=2 _2g(7.0.0):c5413/10:delGen=2 _2h(7.0.0):c5413/9:delGen=2 _2i(7.0.0):c5413/6:delGen=1 _2j(7.0.0):c5413/9:delGen=1 _2k(7.0.0):c5413/8:delGen=1 _2l(7.0.0):c5413/5:delGen=1 _2m(7.0.0):c5413/3:delGen=1 _2n(7.0.0):c5413/4:delGen=1 _2o(7.0.0):c5413 _2p(7.0.0):c5413/5:delGen=1 _2q(7.0.0):c5413/1:delGen=1 _2r(7.0.0):c5413 _2s(7.0.0):c5413 _2t(7.0.0):c5413 _2u(7.0.0):c1)
           [junit4]    >     full BKD path to target doc:
           [junit4]    >       Cell(x=0.8653002866318559 TO 0.8653002870980383 y=0.5013434245518787 TO 0.5013434250180612 z=0.04620341459651078 TO 0.04620341506269321)
           [junit4]    >     on cell Cell(x=0.8653002866318559 TO 0.8653002870980383 y=0.5013434245518787 TO 0.5013434250180612 z=0.04620341459651078 TO 0.04620341506269321), wrapped visitor returned CELL_OUTSIDE_QUERY
           [junit4]    > 	at __randomizedtesting.SeedInfo.seed([3AD1F04EDC75F5CC:BD868DC14D2C894C]:0)
           [junit4]    > 	at org.apache.lucene.spatial3d.TestGeo3DPoint.verify(TestGeo3DPoint.java:741)
           [junit4]    > 	at org.apache.lucene.spatial3d.TestGeo3DPoint.doTestRandom(TestGeo3DPoint.java:537)
           [junit4]    > 	at org.apache.lucene.spatial3d.TestGeo3DPoint.testRandomBig(TestGeo3DPoint.java:469)
           [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
           [junit4]   2> NOTE: leaving temporary files on disk at: /l/geo3dquant/lucene/build/spatial3d/test/J0/temp/lucene.spatial3d.TestGeo3DPoint_3AD1F04EDC75F5CC-026
           [junit4]   2> NOTE: test params are: codec=Asserting(Lucene60): {id=TestBloomFilteredLucenePostings(BloomFilteringPostingsFormat(Lucene50(blocksize=128)))}, docValues:{id=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=357, maxMBSortInHeap=5.860921451607786, sim=ClassicSimilarity, locale=ru-RU, timezone=America/Grand_Turk
           [junit4]   2> NOTE: Linux 3.13.0-71-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=410208728,total=495452160
           [junit4]   2> NOTE: All tests run in this JVM: [TestGeo3DPoint]
           [junit4] Completed [1/1 (1!)] in 5.63s, 1 test, 1 failure <<< FAILURES!
        

        See the new "explanation" part. Anyway, I was then able to turn those
        BKD traversal details into a standalone test, that does fail:

          public void testCuriousFailure() throws Exception {
            GeoShape shape = GeoCircleFactory.makeGeoCircle(PlanetModel.WGS84, -0.8971654677124566, -0.3398482030102755, 1.4775317506492547);
            GeoPoint point = new GeoPoint(0.8653002868649471, 0.50134342478497, 0.046203414829601996);
        
            // point is inside our circle shape:
            assertTrue(shape.isWithin(point));
        
            double xMin = 0.8653002866318559;
            double xMax = 0.8653002870980383;
            double yMin = 0.5013434245518787;
            double yMax = 0.5013434250180612;
            double zMin = 0.04620341459651078;
            double zMax = 0.04620341506269321;
            GeoArea xyzSolid = GeoAreaFactory.makeGeoArea(PlanetModel.WGS84, xMin, xMax, yMin, yMax, zMin, zMax);
        
            // point is also inside our wee tiny box:
            assertTrue(xyzSolid.isWithin(point));
        
            assertTrue(xyzSolid.getRelationship(shape) != GeoArea.DISJOINT);
          }
        

        It's an interesting failure ... the segment has only one document, so
        the BKD tree has a single leaf block with just that one value, and so
        its bbox is really really tiny. Maybe the tiny bbox and the gigantic
        circle tickle a geo3d bug?

        Show
        mikemccand Michael McCandless added a comment - OK, another patch, addressing all nocommits. I think it's nearly ready... I had to an "if" to decode to never go below -planetMax . However the above failure still happens ... So I've improved this test so that on failure, it runs a crazy (test only) IntersectVisitor that "explains" what happened to this one hit. For this failure, it now produces output like this: [junit4] Started J0 PID(11250@localhost). [junit4] Suite: org.apache.lucene.spatial3d.TestGeo3DPoint [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeo3DPoint -Dtests.method=testRandomBig -Dtests.seed=3AD1F04EDC75F5CC -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=ru-RU -Dtests.timezone=America/Grand_Turk -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 5.47s | TestGeo3DPoint.testRandomBig <<< [junit4] > Throwable #1: java.lang.AssertionError: FAIL: id=541300 should have matched but did not [junit4] > shape=GeoStandardCircle: {planetmodel=PlanetModel.WGS84, center=[lat=-0.8971654677124566, lon=-0.3398482030102755], radius=1.4775317506492547(84.65633340877824)} [junit4] > point=[X=0.8653002868649471, Y=0.50134342478497, Z=0.046203414829601996] [junit4] > docID=538760 deleted?=false [junit4] > query=PointInGeo3DShapeQuery: field=point: Shape: GeoStandardCircle: {planetmodel=PlanetModel.WGS84, center=[lat=-0.8971654677124566, lon=-0.3398482030102755], radius=1.4775317506492547(84.65633340877824)} [junit4] > explanation: [junit4] > target is in leaf _2u(7.0.0):c1 of full reader StandardDirectoryReader(segments:210:nrt _12(7.0.0):c198337/1995:delGen=1 _24(7.0.0):c199685/601:delGen=1 _23(7.0.0):c5413/17:delGen=2 _25(7.0.0):c5413/12:delGen=1 _26(7.0.0):c5413/10:delGen=1 _27(7.0.0):c5413/10:delGen=1 _28(7.0.0):c5413/17:delGen=2 _29(7.0.0):c5413/8:delGen=2 _2a(7.0.0):c5413/12:delGen=1 _2b(7.0.0):c5413/14:delGen=1 _2c(7.0.0):c5413/4:delGen=1 _2d(7.0.0):c5413/13:delGen=2 _2e(7.0.0):c5413/12:delGen=2 _2f(7.0.0):c5413/15:delGen=2 _2g(7.0.0):c5413/10:delGen=2 _2h(7.0.0):c5413/9:delGen=2 _2i(7.0.0):c5413/6:delGen=1 _2j(7.0.0):c5413/9:delGen=1 _2k(7.0.0):c5413/8:delGen=1 _2l(7.0.0):c5413/5:delGen=1 _2m(7.0.0):c5413/3:delGen=1 _2n(7.0.0):c5413/4:delGen=1 _2o(7.0.0):c5413 _2p(7.0.0):c5413/5:delGen=1 _2q(7.0.0):c5413/1:delGen=1 _2r(7.0.0):c5413 _2s(7.0.0):c5413 _2t(7.0.0):c5413 _2u(7.0.0):c1) [junit4] > full BKD path to target doc: [junit4] > Cell(x=0.8653002866318559 TO 0.8653002870980383 y=0.5013434245518787 TO 0.5013434250180612 z=0.04620341459651078 TO 0.04620341506269321) [junit4] > on cell Cell(x=0.8653002866318559 TO 0.8653002870980383 y=0.5013434245518787 TO 0.5013434250180612 z=0.04620341459651078 TO 0.04620341506269321), wrapped visitor returned CELL_OUTSIDE_QUERY [junit4] > at __randomizedtesting.SeedInfo.seed([3AD1F04EDC75F5CC:BD868DC14D2C894C]:0) [junit4] > at org.apache.lucene.spatial3d.TestGeo3DPoint.verify(TestGeo3DPoint.java:741) [junit4] > at org.apache.lucene.spatial3d.TestGeo3DPoint.doTestRandom(TestGeo3DPoint.java:537) [junit4] > at org.apache.lucene.spatial3d.TestGeo3DPoint.testRandomBig(TestGeo3DPoint.java:469) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: leaving temporary files on disk at: /l/geo3dquant/lucene/build/spatial3d/test/J0/temp/lucene.spatial3d.TestGeo3DPoint_3AD1F04EDC75F5CC-026 [junit4] 2> NOTE: test params are: codec=Asserting(Lucene60): {id=TestBloomFilteredLucenePostings(BloomFilteringPostingsFormat(Lucene50(blocksize=128)))}, docValues:{id=DocValuesFormat(name=Asserting)}, maxPointsInLeafNode=357, maxMBSortInHeap=5.860921451607786, sim=ClassicSimilarity, locale=ru-RU, timezone=America/Grand_Turk [junit4] 2> NOTE: Linux 3.13.0-71-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=410208728,total=495452160 [junit4] 2> NOTE: All tests run in this JVM: [TestGeo3DPoint] [junit4] Completed [1/1 (1!)] in 5.63s, 1 test, 1 failure <<< FAILURES! See the new "explanation" part. Anyway, I was then able to turn those BKD traversal details into a standalone test, that does fail: public void testCuriousFailure() throws Exception { GeoShape shape = GeoCircleFactory.makeGeoCircle(PlanetModel.WGS84, -0.8971654677124566, -0.3398482030102755, 1.4775317506492547); GeoPoint point = new GeoPoint(0.8653002868649471, 0.50134342478497, 0.046203414829601996); // point is inside our circle shape: assertTrue(shape.isWithin(point)); double xMin = 0.8653002866318559; double xMax = 0.8653002870980383; double yMin = 0.5013434245518787; double yMax = 0.5013434250180612; double zMin = 0.04620341459651078; double zMax = 0.04620341506269321; GeoArea xyzSolid = GeoAreaFactory.makeGeoArea(PlanetModel.WGS84, xMin, xMax, yMin, yMax, zMin, zMax); // point is also inside our wee tiny box: assertTrue(xyzSolid.isWithin(point)); assertTrue(xyzSolid.getRelationship(shape) != GeoArea.DISJOINT); } It's an interesting failure ... the segment has only one document, so the BKD tree has a single leaf block with just that one value, and so its bbox is really really tiny. Maybe the tiny bbox and the gigantic circle tickle a geo3d bug?
        Hide
        mikemccand Michael McCandless added a comment -

        OK that test also fails on clean trunk checkout, so I don't think it should block pushing this change ... I'll @Ignore it for now.

        Show
        mikemccand Michael McCandless added a comment - OK that test also fails on clean trunk checkout, so I don't think it should block pushing this change ... I'll @Ignore it for now.
        Hide
        kwright@metacarta.com Karl Wright added a comment - - edited

        Michael McCandless Yeah, please go ahead and push it, and I'll have a more detailed look.

        I like the forensics; this will be very helpful.

        I see your logic for the little test and yes it does look to me like there must be a bug of some sort. I'll chase it down.

        Show
        kwright@metacarta.com Karl Wright added a comment - - edited Michael McCandless Yeah, please go ahead and push it, and I'll have a more detailed look. I like the forensics; this will be very helpful. I see your logic for the little test and yes it does look to me like there must be a bug of some sort. I'll chase it down.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Michael McCandless: The problem is that the "wee little box" doesn't actually seem to intersect with the surface of the world at all. So the point can be in the box, and it can be within the shape (because that's described by planes), but since it's off the surface of the world we don't detect these as intersecting anywhere on the surface. We need the surface points of intersection in order to evaluate the kind of intersection that is present.

        So if my analysis is correct, this isn't a bug, per se, since there's really no intersection. But it's a problem, no question.

        Do you have a concept of a minimum-sized box? That may be the way to go. The minimum would want to be twice the size of the largest possible numerical rounding delta, or something like that. If a box gets to be that size and still overlaps then you'd have to check the remaining individual items against the shape. Let me think it through, though, to be sure that's the right approach.

        Show
        kwright@metacarta.com Karl Wright added a comment - Michael McCandless : The problem is that the "wee little box" doesn't actually seem to intersect with the surface of the world at all. So the point can be in the box, and it can be within the shape (because that's described by planes), but since it's off the surface of the world we don't detect these as intersecting anywhere on the surface. We need the surface points of intersection in order to evaluate the kind of intersection that is present. So if my analysis is correct, this isn't a bug, per se, since there's really no intersection. But it's a problem, no question. Do you have a concept of a minimum-sized box? That may be the way to go. The minimum would want to be twice the size of the largest possible numerical rounding delta, or something like that. If a box gets to be that size and still overlaps then you'd have to check the remaining individual items against the shape. Let me think it through, though, to be sure that's the right approach.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit c2289de3c7100619b9476e4ec92ad6d4e89becc7 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c2289de ]

        LUCENE-7168: improve encode and quantization testing for geo3d

        Show
        jira-bot ASF subversion and git services added a comment - Commit c2289de3c7100619b9476e4ec92ad6d4e89becc7 in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c2289de ] LUCENE-7168 : improve encode and quantization testing for geo3d
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 6868a8cd74d2d21dd679dfbb867d0823253ac2a5 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6868a8c ]

        LUCENE-7168: improve encode and quantization testing for geo3d

        Show
        jira-bot ASF subversion and git services added a comment - Commit 6868a8cd74d2d21dd679dfbb867d0823253ac2a5 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6868a8c ] LUCENE-7168 : improve encode and quantization testing for geo3d
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Michael McCandless Thinking it through further, here are some more ways we could proceed.

        (1) Since the problem is that the numerically approximated point is not actually on the surface, I can provide a "projection" method, which will generate a point on the surface given an (x,y,z) that is not quite on the surface. This has some cost because it will involve a sqrt operation per point so projected, but it is probably acceptable. The question is, does it help? What would we do differently in BKD?

        (2) I can, maybe in conjunction with (1) above, make geo3d shapes no longer find any point that is not on the surface to be "within" any surface shape. This is of trivial cost (but requires changes to a number of classes). That will make the relationships not be confusing since points not on the surface will be excluded from the search, but it might mean that nothing gets returned at all from BKD by design. :-P

        (3) I could add a method that would allow you to determine if a BKD cell intersected the surface or not. This, however, is maybe not a good solution, because cells can intersect on two sides and I can foresee issues when one side just doesn't make it etc. So maybe this won't help.

        (4) When you stop to think about it, a quantized BKD point actually represents a little cell; it represents a range of possible values in x, y, and z. If we knew what that range was, maybe we could compute intersection with each BKD cell. This is a more robust way of looking at the "minimum BKD cell size" idea. I could provide a construct, e.g. GeoFuzzyPoint, which would express this. But we'd still need to figure out what operations you would need to be able to do with a fuzzy point in order to not have problems arise. Would we need new "isWithin" methods? Or just "getRelationship()" support between such fuzzy points and XYZSolids? Or what, exactly? Depending on the answer, the fuzzy point might be very similar in logic to the XYZSolid, so the cost of creating one would be similar (i.e. not trivially cheap).

        Let's discuss possible solutions using these tools – if not this morning then maybe at lunch.

        Show
        kwright@metacarta.com Karl Wright added a comment - Michael McCandless Thinking it through further, here are some more ways we could proceed. (1) Since the problem is that the numerically approximated point is not actually on the surface, I can provide a "projection" method, which will generate a point on the surface given an (x,y,z) that is not quite on the surface. This has some cost because it will involve a sqrt operation per point so projected, but it is probably acceptable. The question is, does it help? What would we do differently in BKD? (2) I can, maybe in conjunction with (1) above, make geo3d shapes no longer find any point that is not on the surface to be "within" any surface shape. This is of trivial cost (but requires changes to a number of classes). That will make the relationships not be confusing since points not on the surface will be excluded from the search, but it might mean that nothing gets returned at all from BKD by design . :-P (3) I could add a method that would allow you to determine if a BKD cell intersected the surface or not. This, however, is maybe not a good solution, because cells can intersect on two sides and I can foresee issues when one side just doesn't make it etc. So maybe this won't help. (4) When you stop to think about it, a quantized BKD point actually represents a little cell; it represents a range of possible values in x, y, and z. If we knew what that range was, maybe we could compute intersection with each BKD cell. This is a more robust way of looking at the "minimum BKD cell size" idea. I could provide a construct, e.g. GeoFuzzyPoint, which would express this. But we'd still need to figure out what operations you would need to be able to do with a fuzzy point in order to not have problems arise. Would we need new "isWithin" methods? Or just "getRelationship()" support between such fuzzy points and XYZSolids? Or what, exactly? Depending on the answer, the fuzzy point might be very similar in logic to the XYZSolid, so the cost of creating one would be similar (i.e. not trivially cheap). Let's discuss possible solutions using these tools – if not this morning then maybe at lunch.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Michael McCandless: I realized suddenly how to do this properly.

        The trick is to line up the BKD tree cells that it descends through with the "fuzzy" cells from the point resolution. So you never would generate an XYZSolid that did not correspond exactly to an integral number of "fuzzy" point cells. This implies that you also could not have a BKD cell that was smaller than a fuzzy point cell, and thus the situation we're trying to address could never arise.

        To clarify, imagine the space that you are searching as consisting of a large number of small cubes. Each cube has an integer address, e.g. (xi, yi, zi). You can put an actual GeoPoint on the world surface into an individual cube via quantization, so you then get to describe it by (xi, yi, zi). When we do the BKD descent, the way we must describe the solid areas we are searching is not by ranges of (x,y,z), but by ranges of (xi,yi,zi). Then you can convert those ranges trivially into (x,y,z) ranges and create the XYZSolid objects you use to relate with the shape. This guarantees you will not ever create an XYZSolid that contains a point which does not intersect with the world, since the original point had to have been on the world in the first place.

        Sound like it might work?

        Show
        kwright@metacarta.com Karl Wright added a comment - Michael McCandless : I realized suddenly how to do this properly. The trick is to line up the BKD tree cells that it descends through with the "fuzzy" cells from the point resolution. So you never would generate an XYZSolid that did not correspond exactly to an integral number of "fuzzy" point cells. This implies that you also could not have a BKD cell that was smaller than a fuzzy point cell, and thus the situation we're trying to address could never arise. To clarify, imagine the space that you are searching as consisting of a large number of small cubes. Each cube has an integer address, e.g. (xi, yi, zi). You can put an actual GeoPoint on the world surface into an individual cube via quantization, so you then get to describe it by (xi, yi, zi). When we do the BKD descent, the way we must describe the solid areas we are searching is not by ranges of (x,y,z), but by ranges of (xi,yi,zi). Then you can convert those ranges trivially into (x,y,z) ranges and create the XYZSolid objects you use to relate with the shape. This guarantees you will not ever create an XYZSolid that contains a point which does not intersect with the world, since the original point had to have been on the world in the first place. Sound like it might work?
        Hide
        mikemccand Michael McCandless added a comment -

        Michael McCandless: I realized suddenly how to do this properly.

        I like this approach, but, I thought this is what we are doing already!

        I.e., BKD fundamentally already operates in this quantized (small cube) space, since it's only ever given quantized points, and it splits at the quantized points.

        Then, in PointInShapeIntersectVisitor.compare we do this:

            double xMin = decodeValueMin(NumericUtils.sortableBytesToInt(minPackedValue, 0));
            double xMax = decodeValueMax(NumericUtils.sortableBytesToInt(maxPackedValue, 0));
            double yMin = decodeValueMin(NumericUtils.sortableBytesToInt(minPackedValue, 1 * Integer.BYTES));
            double yMax = decodeValueMax(NumericUtils.sortableBytesToInt(maxPackedValue, 1 * Integer.BYTES));
            double zMin = decodeValueMin(NumericUtils.sortableBytesToInt(minPackedValue, 2 * Integer.BYTES));
            double zMax = decodeValueMax(NumericUtils.sortableBytesToInt(maxPackedValue, 2 * Integer.BYTES));
        
            GeoArea xyzSolid = GeoAreaFactory.makeGeoArea(PlanetModel.WGS84, xMin, xMax, yMin, yMax, zMin, zMax);
        

        I.e, that xyzSolid should in fact already be spanning the full little-cube space that this BKD cell covers? We then go on and relate that xyzSolid to the query shape.

        Show
        mikemccand Michael McCandless added a comment - Michael McCandless: I realized suddenly how to do this properly. I like this approach, but, I thought this is what we are doing already! I.e., BKD fundamentally already operates in this quantized (small cube) space, since it's only ever given quantized points, and it splits at the quantized points. Then, in PointInShapeIntersectVisitor.compare we do this: double xMin = decodeValueMin(NumericUtils.sortableBytesToInt(minPackedValue, 0)); double xMax = decodeValueMax(NumericUtils.sortableBytesToInt(maxPackedValue, 0)); double yMin = decodeValueMin(NumericUtils.sortableBytesToInt(minPackedValue, 1 * Integer.BYTES)); double yMax = decodeValueMax(NumericUtils.sortableBytesToInt(maxPackedValue, 1 * Integer.BYTES)); double zMin = decodeValueMin(NumericUtils.sortableBytesToInt(minPackedValue, 2 * Integer.BYTES)); double zMax = decodeValueMax(NumericUtils.sortableBytesToInt(maxPackedValue, 2 * Integer.BYTES)); GeoArea xyzSolid = GeoAreaFactory.makeGeoArea(PlanetModel.WGS84, xMin, xMax, yMin, yMax, zMin, zMax); I.e, that xyzSolid should in fact already be spanning the full little-cube space that this BKD cell covers? We then go on and relate that xyzSolid to the query shape.
        Hide
        mikemccand Michael McCandless added a comment -

        Ooooh ... I think I see the bug! I need to fix decodeValueMin/Max to match floor not round that we used to do! I'll test that change, and re-beast.

        Show
        mikemccand Michael McCandless added a comment - Ooooh ... I think I see the bug! I need to fix decodeValueMin/Max to match floor not round that we used to do! I'll test that change, and re-beast.
        Hide
        mikemccand Michael McCandless added a comment -

        OK how about this? I moved the min/max decode back to Geo3DUtil (package private!), renamed them to floor/ceil.

        I'm beasting now...

        Sorry for the noise Karl Wright! Now we can have a relaxed lunch

        Show
        mikemccand Michael McCandless added a comment - OK how about this? I moved the min/max decode back to Geo3DUtil (package private!), renamed them to floor/ceil. I'm beasting now... Sorry for the noise Karl Wright ! Now we can have a relaxed lunch
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Phew.

        Yes, I now vaguely recall that we had this discussion early. I'm glad you found the issue before I went nuts repeating last year's logic.

        Show
        kwright@metacarta.com Karl Wright added a comment - Phew. Yes, I now vaguely recall that we had this discussion early. I'm glad you found the issue before I went nuts repeating last year's logic.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Patch looks good! +1 for the latest

        Show
        kwright@metacarta.com Karl Wright added a comment - Patch looks good! +1 for the latest
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit df07e0c30ff0d90f9052dd411f027c0dfcc3fb88 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=df07e0c ]

        LUCENE-7168: fix ceil/floor decode to match encode

        Show
        jira-bot ASF subversion and git services added a comment - Commit df07e0c30ff0d90f9052dd411f027c0dfcc3fb88 in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=df07e0c ] LUCENE-7168 : fix ceil/floor decode to match encode
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 16c6dd9a2d67d88e18a43f24ff636a4ddca0f123 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=16c6dd9 ]

        LUCENE-7168: fix ceil/floor decode to match encode

        Show
        jira-bot ASF subversion and git services added a comment - Commit 16c6dd9a2d67d88e18a43f24ff636a4ddca0f123 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=16c6dd9 ] LUCENE-7168 : fix ceil/floor decode to match encode
        Hide
        mikemccand Michael McCandless added a comment -

        Here's a new patch, simplifying the encode/decode, based on a good
        idea that Karl Wright suggested (over lunch!) to change decode to decode
        to the center value. This gets us our stability without having to
        pick special double values, and it means we can use the full integer
        space again, but I did need to add the same if that LatLonPoint
        has about not being able to encode precisely the max value.

        Show
        mikemccand Michael McCandless added a comment - Here's a new patch, simplifying the encode/decode, based on a good idea that Karl Wright suggested (over lunch!) to change decode to decode to the center value. This gets us our stability without having to pick special double values, and it means we can use the full integer space again, but I did need to add the same if that LatLonPoint has about not being able to encode precisely the max value.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        +1 for the new patch.

        Show
        kwright@metacarta.com Karl Wright added a comment - +1 for the new patch.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 1848477bd83ca3f45ffda0d15a2eee901adb90b6 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1848477 ]

        LUCENE-7168: use center value on decode

        Show
        jira-bot ASF subversion and git services added a comment - Commit 1848477bd83ca3f45ffda0d15a2eee901adb90b6 in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1848477 ] LUCENE-7168 : use center value on decode
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit dd9d5d06d2d5aef1f9367ff6459b325a44d0de4c in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dd9d5d0 ]

        LUCENE-7168: use center value on decode

        Show
        jira-bot ASF subversion and git services added a comment - Commit dd9d5d06d2d5aef1f9367ff6459b325a44d0de4c in lucene-solr's branch refs/heads/branch_6x from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dd9d5d0 ] LUCENE-7168 : use center value on decode
        Hide
        hossman Hoss Man added a comment -

        Manually correcting fixVersion per Step #S5 of LUCENE-7271

        Show
        hossman Hoss Man added a comment - Manually correcting fixVersion per Step #S5 of LUCENE-7271

          People

          • Assignee:
            Unassigned
            Reporter:
            mikemccand Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development