Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I'm opening this for discussion, because I'm not yet sure how to do
      this integration, because of my ignorance about spatial in general and
      spatial3d in particular

      Our BKD tree impl is very fast at doing lat/lon shape intersection
      (bbox, polygon, soon distance: LUCENE-6698) against previously indexed
      points.

      I think to integrate with spatial3d, we would first need to record
      lat/lon/z into doc values. Somewhere I saw discussion about how we
      could stuff all 3 into a single long value with acceptable precision
      loss? Or, we could use BinaryDocValues? We need all 3 dims available
      to do the fast per-hit query time filtering.

      But, second: what do we index into the BKD tree? Can we "just" index
      earth surface lat/lon, and then at query time is spatial3d able to
      give me an enclosing "surface lat/lon" bbox for a 3d shape? Or
      ... must we index all 3 dimensions into the BKD tree (seems like this
      could be somewhat wasteful)?

      1. Geo3DPacking.java
        4 kB
        Nicholas Knize
      2. LUCENE-6699.patch
        5 kB
        Michael McCandless
      3. LUCENE-6699.patch
        3 kB
        Karl Wright
      4. LUCENE-6699.patch
        2 kB
        Karl Wright
      5. LUCENE-6699.patch
        3 kB
        Karl Wright
      6. LUCENE-6699.patch
        10 kB
        Karl Wright
      7. LUCENE-6699.patch
        11 kB
        Karl Wright
      8. LUCENE-6699.patch
        11 kB
        Karl Wright
      9. LUCENE-6699.patch
        6 kB
        Karl Wright
      10. LUCENE-6699.patch
        4 kB
        Karl Wright
      11. LUCENE-6699.patch
        2 kB
        Michael McCandless
      12. LUCENE-6699.patch
        1 kB
        Michael McCandless
      13. LUCENE-6699.patch
        1 kB
        Michael McCandless
      14. LUCENE-6699.patch
        25 kB
        Karl Wright
      15. LUCENE-6699.patch
        19 kB
        Karl Wright
      16. LUCENE-6699.patch
        9 kB
        Karl Wright
      17. LUCENE-6699.patch
        3 kB
        Karl Wright
      18. LUCENE-6699.patch
        120 kB
        Karl Wright
      19. LUCENE-6699.patch
        13 kB
        Karl Wright
      20. LUCENE-6699.patch
        18 kB
        Karl Wright
      21. LUCENE-6699.patch
        35 kB
        Karl Wright
      22. LUCENE-6699.patch
        14 kB
        Karl Wright
      23. LUCENE-6699.patch
        14 kB
        Karl Wright
      24. LUCENE-6699.patch
        5 kB
        Karl Wright
      25. LUCENE-6699.patch
        117 kB
        Michael McCandless
      26. LUCENE-6699.patch
        27 kB
        Karl Wright
      27. LUCENE-6699.patch
        14 kB
        Karl Wright

        Issue Links

          Activity

          Hide
          Karl Wright added a comment -

          Hi Mike,

          The coordinates you'd need to store for geospatial3d are: (x,y,z), rather than (lat,lon,z). The values are floating-point numbers that range between -1.0 and 1.0 for a sphere, and slightly more than that in abs value for a WGS84 ellipsoid.

          There was a ticket where I attached code for a packing scheme that would ram all three values into a 64-bit long; I'll see if I can find it. That packing scheme basically gave you resolution of about a meter. However, as you have pointed out, it's not strictly necessary to stick to 64-bit longs either, so you're free to propose anything that makes sense.

          Show
          Karl Wright added a comment - Hi Mike, The coordinates you'd need to store for geospatial3d are: (x,y,z), rather than (lat,lon,z). The values are floating-point numbers that range between -1.0 and 1.0 for a sphere, and slightly more than that in abs value for a WGS84 ellipsoid. There was a ticket where I attached code for a packing scheme that would ram all three values into a 64-bit long; I'll see if I can find it. That packing scheme basically gave you resolution of about a meter. However, as you have pointed out, it's not strictly necessary to stick to 64-bit longs either, so you're free to propose anything that makes sense.
          Hide
          Karl Wright added a comment - - edited

          Can we "just" index earth surface lat/lon, and then at query time is spatial3d able to give me an enclosing "surface lat/lon" bbox for a 3d shape?

          Michael McCandless Ok, so now you have me confused a bit as to what your requirements are for BKD. If you want to split the globe up into lat/lon rectangles, and use BKD that way to descend, then obviously you'd need points to be stored in lat/lon. But that would make less sense for geospatial3d, because what you're really trying to do is assess membership in a shape, or distance also in regards to a shape, both of which require (x,y,z) not lat/lon. Yes, you can convert to (x,y,z) from lat/lon, but the conversion is relatively expensive.

          Instead, I could imagine just staying natively in (x,y,z), and doing your splits in that space, e.g. split in x, then in y, then in z. So you'd have a GeoPoint3D which would pack (x,y,z) in a format you could rapidly extract, and a splitting algorithm that would use the known ranges for these values. Does that make sense to you? Would that work with BKD?

          Show
          Karl Wright added a comment - - edited Can we "just" index earth surface lat/lon, and then at query time is spatial3d able to give me an enclosing "surface lat/lon" bbox for a 3d shape? Michael McCandless Ok, so now you have me confused a bit as to what your requirements are for BKD. If you want to split the globe up into lat/lon rectangles, and use BKD that way to descend, then obviously you'd need points to be stored in lat/lon. But that would make less sense for geospatial3d, because what you're really trying to do is assess membership in a shape, or distance also in regards to a shape, both of which require (x,y,z) not lat/lon. Yes, you can convert to (x,y,z) from lat/lon, but the conversion is relatively expensive. Instead, I could imagine just staying natively in (x,y,z), and doing your splits in that space, e.g. split in x, then in y, then in z. So you'd have a GeoPoint3D which would pack (x,y,z) in a format you could rapidly extract, and a splitting algorithm that would use the known ranges for these values. Does that make sense to you? Would that work with BKD?
          Hide
          Michael McCandless added a comment -

          OK, thanks Karl Wright, I think I understand.

          Yes BKD tree can easily split on the 3 (x,y,z) dimensions ... I just need to generalize it (it's currently "hardwired" for just 2 dimensions).

          So for this to work, I need to be able to efficiently ask the query shape whether it separately overlaps with a range in each of the 3 dimensions. E.g. "do you intersect 0.3 <= x < 0.7", and same for y and z. I assume this is fast/simple to do with the spatial3d APIs?

          And then of course with each indexed x,y,z point I also need to test whether the shape includes it.

          Hmm, one source of efficiency for the BKD tree is it can recognize at search time when a given indexed cell is fully contained by the query shape, and then it doesn't need to test every point against the shape (it just blindly collects all docIDs in that cell). But in the 3D case, I think that optimization would never apply? I.e, the leaf cells would all be little cubes on the earth's surface, which include a volume (above and below earth's surface) that the shape does not accept?

          Show
          Michael McCandless added a comment - OK, thanks Karl Wright , I think I understand. Yes BKD tree can easily split on the 3 (x,y,z) dimensions ... I just need to generalize it (it's currently "hardwired" for just 2 dimensions). So for this to work, I need to be able to efficiently ask the query shape whether it separately overlaps with a range in each of the 3 dimensions. E.g. "do you intersect 0.3 <= x < 0.7", and same for y and z. I assume this is fast/simple to do with the spatial3d APIs? And then of course with each indexed x,y,z point I also need to test whether the shape includes it. Hmm, one source of efficiency for the BKD tree is it can recognize at search time when a given indexed cell is fully contained by the query shape, and then it doesn't need to test every point against the shape (it just blindly collects all docIDs in that cell). But in the 3D case, I think that optimization would never apply? I.e, the leaf cells would all be little cubes on the earth's surface, which include a volume (above and below earth's surface) that the shape does not accept?
          Hide
          Karl Wright added a comment - - edited

          Michael McCandless

          So for this to work, I need to be able to efficiently ask the query shape whether it separately overlaps with a range in each of the 3 dimensions. E.g. "do you intersect 0.3 <= x < 0.7", and same for y and z. I assume this is fast/simple to do with the spatial3d APIs?

          Hmm. There's a "bounds()" method which obtains the lat/lon bounds of a shape. But for 3d we'd not be able to use that. However:

          Hmm, one source of efficiency for the BKD tree is it can recognize at search time when a given indexed cell is fully contained by the query shape, and then it doesn't need to test every point against the shape (it just blindly collects all docIDs in that cell). But in the 3D case, I think that optimization would never apply?

          Well, IF we presume that the records all lie on the surface of the world, then you can ask the question, "do these x, y, z bounds, when intersected with the world surface, all lie within the shape, or overlap the shape?" I'll have to think about how efficient that can be made to work, but you ARE after all describing a second surface shape by your x-y-z ranges, so in theory this should be doable. In particular, the GeoBBox construct allows you to find the relationship between a specific kind of surface shape (currently only lat/lon rectangles and their degenerate 3d equivalents) and any other GeoShape. We could try to introduce x-y-z 3d rectangles as shapes; they'd be completely describable by planes, so really all the same logic would apply. I would need to create a new GeoAreaFactory method, and a family of x-y-z bounding shapes, e.g. GeoXYZArea, and it should all work.

          To be sure, have a look at the functionality that GeoArea gives you. If that looks like what you need, and you can convince yourself that it would be reasonably efficient as far as BKD is concerned, then I'll go ahead and create a patch that does what you need.

          Thinking in more detail about BKD efficiency, I realize that GeoXYZArea could describe a huge number of 3D rectangles that have NO intersection with the world surface. That's worrisome because it implies that BKD over the (x,y,z) space may not be a very good way of organizing records? Or does it?

          Show
          Karl Wright added a comment - - edited Michael McCandless So for this to work, I need to be able to efficiently ask the query shape whether it separately overlaps with a range in each of the 3 dimensions. E.g. "do you intersect 0.3 <= x < 0.7", and same for y and z. I assume this is fast/simple to do with the spatial3d APIs? Hmm. There's a "bounds()" method which obtains the lat/lon bounds of a shape. But for 3d we'd not be able to use that. However: Hmm, one source of efficiency for the BKD tree is it can recognize at search time when a given indexed cell is fully contained by the query shape, and then it doesn't need to test every point against the shape (it just blindly collects all docIDs in that cell). But in the 3D case, I think that optimization would never apply? Well, IF we presume that the records all lie on the surface of the world, then you can ask the question, "do these x, y, z bounds, when intersected with the world surface, all lie within the shape, or overlap the shape?" I'll have to think about how efficient that can be made to work, but you ARE after all describing a second surface shape by your x-y-z ranges, so in theory this should be doable. In particular, the GeoBBox construct allows you to find the relationship between a specific kind of surface shape (currently only lat/lon rectangles and their degenerate 3d equivalents) and any other GeoShape. We could try to introduce x-y-z 3d rectangles as shapes; they'd be completely describable by planes, so really all the same logic would apply. I would need to create a new GeoAreaFactory method, and a family of x-y-z bounding shapes, e.g. GeoXYZArea, and it should all work. To be sure, have a look at the functionality that GeoArea gives you. If that looks like what you need, and you can convince yourself that it would be reasonably efficient as far as BKD is concerned, then I'll go ahead and create a patch that does what you need. Thinking in more detail about BKD efficiency, I realize that GeoXYZArea could describe a huge number of 3D rectangles that have NO intersection with the world surface. That's worrisome because it implies that BKD over the (x,y,z) space may not be a very good way of organizing records? Or does it?
          Hide
          Karl Wright added a comment -

          More analysis:

          (1) Your BKD search would need to construct a GeoArea object each time it descends a level.
          (2) Construction of an XYZ area will definitely involve construction of up to six additional objects (all planes), but should otherwise be computationally cheap.
          (3) To minimize work on both construction and comparison, I'd need to create the following new classes, where Dg indicates "degenerate" in that dimension, e.g. "DgX" means "degenerate in X":

          XYZArea
          DgXYZArea
          XDgYZArea
          XYDgZArea
          DgXDgYArea
          DgXYDgZArea
          XDgYDgZArea
          DgXDgYDgZArea

          Note: these are not named "Geo" objects, because they do not have anything to do with the surface of the world.

          Show
          Karl Wright added a comment - More analysis: (1) Your BKD search would need to construct a GeoArea object each time it descends a level. (2) Construction of an XYZ area will definitely involve construction of up to six additional objects (all planes), but should otherwise be computationally cheap. (3) To minimize work on both construction and comparison, I'd need to create the following new classes, where Dg indicates "degenerate" in that dimension, e.g. "DgX" means "degenerate in X": XYZArea DgXYZArea XDgYZArea XYDgZArea DgXDgYArea DgXYDgZArea XDgYDgZArea DgXDgYDgZArea Note: these are not named "Geo" objects, because they do not have anything to do with the surface of the world.
          Hide
          Michael McCandless added a comment -

          you ARE after all describing a second surface shape by your x-y-z ranges

          That's exactly it. Essentially the BKD tree needs a way to recursively chop up the earth surface into smaller and smaller curved-rectangle-like (I think?) shapes on the earth's surface, both at indexing time and at search time.

          At search time, for a given cell, it needs to "relate" to the query shape to know if the query shape full contains the cell, does not overlap with the cell, or partially overlaps.

          But I don't understand the "degenerate in X/Y" Area objects... it seems like GeoArea would be used for the lat/lon "approximation" (outer bounding box?) to a GeoShape? Seems like it's better if we can do all functions using proper earth-surface shapes? Sorry this is still all very new to me

          Show
          Michael McCandless added a comment - you ARE after all describing a second surface shape by your x-y-z ranges That's exactly it. Essentially the BKD tree needs a way to recursively chop up the earth surface into smaller and smaller curved-rectangle-like (I think?) shapes on the earth's surface, both at indexing time and at search time. At search time, for a given cell, it needs to "relate" to the query shape to know if the query shape full contains the cell, does not overlap with the cell, or partially overlaps. But I don't understand the "degenerate in X/Y" Area objects... it seems like GeoArea would be used for the lat/lon "approximation" (outer bounding box?) to a GeoShape? Seems like it's better if we can do all functions using proper earth-surface shapes? Sorry this is still all very new to me
          Hide
          Karl Wright added a comment - - edited

          Essentially the BKD tree needs a way to recursively chop up the earth surface into smaller and smaller curved-rectangle-like (I think?) shapes on the earth's surface, both at indexing time and at search time.

          With x/y/z splitting, actually you're not quite doing that. Instead, you are chopping up the space that the entire world lives in (not just its surface). One these 3d rectangles may or may not actually intersect the surface (which is where all the geo shapes actually lie). If it does intersect, it might intersect on only one side of the world, or it might intersect on two sides of the world. A long, thin 3d rectangle could well encompass a little bit of territory in (say) the UK as well as Alaska, for instance. But as you describe the algorithm, I'm not sure that this is important at all to know.

          At search time, for a given cell, it needs to "relate" to the query shape to know if the query shape full contains the cell, does not overlap with the cell, or partially overlaps.

          The GeoArea interface gives you all of that, which is why I wanted to implement objects that aren't limited to the surface but do implement GeoArea.

          But I don't understand the "degenerate in X/Y" Area objects... it seems like GeoArea would be used for the lat/lon "approximation" (outer bounding box?) to a GeoShape? Seems like it's better if we can do all functions using proper earth-surface shapes?

          GeoArea objects are not constrained to be surface objects. Right now the only implementers of GeoArea are bounding boxes, bounded in latitude and longitude, but that's merely due to lack of need for anything else. The relationship types GeoArea objects can determine are against general GeoShape objects (which are surface objects), so the semantics are perfect for what you are trying to do. You can determine whether a lat/lon rectangle overlaps, contains, is within, or doesn't intersect at all with, any arbitrary spatial3d surface shape.

          But let's be clear: for BKD in the x,y,z space, GeoArea objects bounding in lat/lon are useless. So we need to invent GeoArea objects that represent 3d rectangles. If we try to talk about a general XYZ-bounded area, then there will be up to two bounds for X (min and max), two bounds for Y (min and max), and two bounds for Z (min and max). The degenerate cases come into play when min-X == max-X, or min-Y == max-Y, etc, or when there is no min-X bound but just a max-X one, for instance. This can be conditional logic but the whole thing is faster and more efficient if there's an object for each funky case.

          Hope this helps.

          Michael McCandless If this all is clear now, and you want to proceed, let me know and I can start working on it tonight.

          Show
          Karl Wright added a comment - - edited Essentially the BKD tree needs a way to recursively chop up the earth surface into smaller and smaller curved-rectangle-like (I think?) shapes on the earth's surface, both at indexing time and at search time. With x/y/z splitting, actually you're not quite doing that. Instead, you are chopping up the space that the entire world lives in (not just its surface). One these 3d rectangles may or may not actually intersect the surface (which is where all the geo shapes actually lie). If it does intersect, it might intersect on only one side of the world, or it might intersect on two sides of the world. A long, thin 3d rectangle could well encompass a little bit of territory in (say) the UK as well as Alaska, for instance. But as you describe the algorithm, I'm not sure that this is important at all to know. At search time, for a given cell, it needs to "relate" to the query shape to know if the query shape full contains the cell, does not overlap with the cell, or partially overlaps. The GeoArea interface gives you all of that, which is why I wanted to implement objects that aren't limited to the surface but do implement GeoArea. But I don't understand the "degenerate in X/Y" Area objects... it seems like GeoArea would be used for the lat/lon "approximation" (outer bounding box?) to a GeoShape? Seems like it's better if we can do all functions using proper earth-surface shapes? GeoArea objects are not constrained to be surface objects. Right now the only implementers of GeoArea are bounding boxes, bounded in latitude and longitude, but that's merely due to lack of need for anything else. The relationship types GeoArea objects can determine are against general GeoShape objects (which are surface objects), so the semantics are perfect for what you are trying to do. You can determine whether a lat/lon rectangle overlaps, contains, is within, or doesn't intersect at all with, any arbitrary spatial3d surface shape. But let's be clear: for BKD in the x,y,z space, GeoArea objects bounding in lat/lon are useless. So we need to invent GeoArea objects that represent 3d rectangles. If we try to talk about a general XYZ-bounded area, then there will be up to two bounds for X (min and max), two bounds for Y (min and max), and two bounds for Z (min and max). The degenerate cases come into play when min-X == max-X, or min-Y == max-Y, etc, or when there is no min-X bound but just a max-X one, for instance. This can be conditional logic but the whole thing is faster and more efficient if there's an object for each funky case. Hope this helps. Michael McCandless If this all is clear now, and you want to proceed, let me know and I can start working on it tonight.
          Hide
          Nicholas Knize added a comment - - edited

          I had thrown this code together a while ago before I put the GeoPointField / Geo3d integration work on hold. This rough draft converts 3D LLA coordinates into ECEF cartesian coordinates using the WGS84 based non-iterative approach in GeoProjectionUtils (derived from the conversion approach illustrated in the Manual of Photogrammetry using 2 sin/cos and 1 sqrt). The ECEF cartesian coordinate is scaled to a unit spheroid (since this is presumably what Geo3D requires) and each of the 32 bits are interleaved (XYZ) akin to MortonEncoding. Decoding procedures are also provided. While this encoding is not nicely represented as a long (not yet convinced 21 bits per will preserve "acceptable" precision, though we could allocate bits differently since larger altitudes degrade w/ conversion) it is BinaryDocValue friendly and enables a space partitioning/prefix coded approach similar to the way GeoPointField currently works. The Most-Significant 3 bits represent an Oct-Cell at the first level, next 3 for level 2, etc.

          Show
          Nicholas Knize added a comment - - edited I had thrown this code together a while ago before I put the GeoPointField / Geo3d integration work on hold. This rough draft converts 3D LLA coordinates into ECEF cartesian coordinates using the WGS84 based non-iterative approach in GeoProjectionUtils (derived from the conversion approach illustrated in the Manual of Photogrammetry using 2 sin/cos and 1 sqrt). The ECEF cartesian coordinate is scaled to a unit spheroid (since this is presumably what Geo3D requires) and each of the 32 bits are interleaved (XYZ) akin to MortonEncoding. Decoding procedures are also provided. While this encoding is not nicely represented as a long (not yet convinced 21 bits per will preserve "acceptable" precision, though we could allocate bits differently since larger altitudes degrade w/ conversion) it is BinaryDocValue friendly and enables a space partitioning/prefix coded approach similar to the way GeoPointField currently works. The Most-Significant 3 bits represent an Oct-Cell at the first level, next 3 for level 2, etc.
          Hide
          Michael McCandless added a comment -

          Instead, you are chopping up the space that the entire world lives in (not just its surface).

          Right, but, that approach seems wasteful? I mean it should work correctly, but as you said some cells will span strange parts of the surface. It should work fine but seems inefficient ...

          Like, if there is a way instead to make smaller and smaller surface shapes in the BKD tree that would be best? I.e. fix BKD to operate entirely on the surface ...

          Michael McCandless If this all is clear now, and you want to proceed, let me know and I can start working on it tonight.

          Thank you for being so eager and volunteering here but maybe wait until we figure out the likely best approach here! I am still a newbie/lost in the details... I lack intuition.

          Show
          Michael McCandless added a comment - Instead, you are chopping up the space that the entire world lives in (not just its surface). Right, but, that approach seems wasteful? I mean it should work correctly, but as you said some cells will span strange parts of the surface. It should work fine but seems inefficient ... Like, if there is a way instead to make smaller and smaller surface shapes in the BKD tree that would be best? I.e. fix BKD to operate entirely on the surface ... Michael McCandless If this all is clear now, and you want to proceed, let me know and I can start working on it tonight. Thank you for being so eager and volunteering here but maybe wait until we figure out the likely best approach here! I am still a newbie/lost in the details... I lack intuition.
          Hide
          Karl Wright added a comment -

          Right, but, that approach seems wasteful? I mean it should work correctly, but as you said some cells will span strange parts of the surface. It should work fine but seems inefficient ... Like, if there is a way instead to make smaller and smaller surface shapes in the BKD tree that would be best? I.e. fix BKD to operate entirely on the surface ...

          So, if you turn the BKD descent into something other than (x,y,z), it would imply that you store points for records in something other than (x,y,z) too, no? I am unsure whether the additional cost of representing records with, say, lat/lon, and doing the conversion to (x,y,z) at search time for every records encountered would be more or less expensive than having a somewhat wasteful representation for the tree itself. I expect the conversion for every record would turn out to be more expensive, but maybe this is something we'd just have to try. The obvious alternative would be to store just lat/lon, exactly like is done for Nicholas's code, and use the same BKD tree. The only difference would be: (a) you'd need to construct a GeoBBox, using GeoBBoxFactory, representing your area, and then use getRelationship() to determine intersection, contains, within, or disjoint for the shape, and (b) you'd have to create a GeoPoint at search time for every record encountered.

          Show
          Karl Wright added a comment - Right, but, that approach seems wasteful? I mean it should work correctly, but as you said some cells will span strange parts of the surface. It should work fine but seems inefficient ... Like, if there is a way instead to make smaller and smaller surface shapes in the BKD tree that would be best? I.e. fix BKD to operate entirely on the surface ... So, if you turn the BKD descent into something other than (x,y,z), it would imply that you store points for records in something other than (x,y,z) too, no? I am unsure whether the additional cost of representing records with, say, lat/lon, and doing the conversion to (x,y,z) at search time for every records encountered would be more or less expensive than having a somewhat wasteful representation for the tree itself. I expect the conversion for every record would turn out to be more expensive, but maybe this is something we'd just have to try. The obvious alternative would be to store just lat/lon, exactly like is done for Nicholas's code, and use the same BKD tree. The only difference would be: (a) you'd need to construct a GeoBBox, using GeoBBoxFactory, representing your area, and then use getRelationship() to determine intersection, contains, within, or disjoint for the shape, and (b) you'd have to create a GeoPoint at search time for every record encountered.
          Hide
          Nicholas Knize added a comment - - edited

          The obvious alternative would be to store just lat/lon, exactly like is done for Nicholas's code

          Just to clarify, so there's no confusion, the attached code converts from lat/lon to ECEF (either full range in meters, or scaled to the unit spheroid) and stores the result in a 96 bit BitSet. GeoPointField stores encoded lat/lon (if that's what you were referring to). The attached was intended to be used for a Geo3d integration w/ GeoPointField.

          I expect the conversion for every record would turn out to be more expensive.

          Just to note again, the conversion from lat/lon to ECEF XYZ is a non-iterative conversion (2 sines, 2 cosines, 1 sqrt). Conversion cost vs. wateful representation is the interesting question, so I threw together a quick encoding/decoding benchmark (on my i7 16GB System76) and ran it on 100M points (converting from lla to ECEF and back). The law of large numbers took over at around 35M. For interest the results are provided:

          Avg computation: 620.4319757666667 ns  Trials: 30000000  Total time: 18612.959273 ms
          Avg computation: 621.3008362285715 ns  Trials: 35000000  Total time: 21745.529268 ms
          Avg computation: 621.582647925 ns  Trials: 40000000  Total time: 24863.305917 ms
          Avg computation: 621.3724589555555 ns  Trials: 45000000  Total time: 27961.760653 ms
          Avg computation: 621.16271364 ns  Trials: 50000000  Total time: 31058.135682 ms
          Avg computation: 621.3857686909091 ns  Trials: 55000000  Total time: 34176.217278 ms
          Avg computation: 621.8110524 ns  Trials: 60000000  Total time: 37308.663144 ms
          Avg computation: 621.6768083230769 ns  Trials: 65000000  Total time: 40408.992541 ms
          Avg computation: 621.5324306714285 ns  Trials: 70000000  Total time: 43507.270147 ms
          Avg computation: 621.4440536933333 ns  Trials: 75000000  Total time: 46608.304027 ms
          Avg computation: 621.7594845875 ns  Trials: 80000000  Total time: 49740.758767 ms
          Avg computation: 621.9327540705882 ns  Trials: 85000000  Total time: 52864.284096 ms
          Avg computation: 621.9429434555556 ns  Trials: 90000000  Total time: 55974.864911 ms
          Avg computation: 621.8868688947368 ns  Trials: 95000000  Total time: 59079.252545 ms
          Avg computation: 621.98037608 ns  Trials: 100000000  Total time: 62198.037608 ms
          

          Again, those are both conversions to ECEF and back to LLA. So roughly 1 minute for 100M points. Halve those numbers and you have the cost for converting either direction.

          Next we could benchmark tree access and compare? I suspect traversal cost should be a function of the on-disk representation and chosen split method (that's where RTrees tend to become costly). Since BKD splits a sorted space, and it can exploit file system cache? I suspect there aren't many random seeks? Seems benchmarking might be relatively straightforward?

          Show
          Nicholas Knize added a comment - - edited The obvious alternative would be to store just lat/lon, exactly like is done for Nicholas's code Just to clarify, so there's no confusion, the attached code converts from lat/lon to ECEF (either full range in meters, or scaled to the unit spheroid) and stores the result in a 96 bit BitSet. GeoPointField stores encoded lat/lon (if that's what you were referring to). The attached was intended to be used for a Geo3d integration w/ GeoPointField. I expect the conversion for every record would turn out to be more expensive. Just to note again, the conversion from lat/lon to ECEF XYZ is a non-iterative conversion (2 sines, 2 cosines, 1 sqrt). Conversion cost vs. wateful representation is the interesting question, so I threw together a quick encoding/decoding benchmark (on my i7 16GB System76) and ran it on 100M points (converting from lla to ECEF and back). The law of large numbers took over at around 35M. For interest the results are provided: Avg computation: 620.4319757666667 ns Trials: 30000000 Total time: 18612.959273 ms Avg computation: 621.3008362285715 ns Trials: 35000000 Total time: 21745.529268 ms Avg computation: 621.582647925 ns Trials: 40000000 Total time: 24863.305917 ms Avg computation: 621.3724589555555 ns Trials: 45000000 Total time: 27961.760653 ms Avg computation: 621.16271364 ns Trials: 50000000 Total time: 31058.135682 ms Avg computation: 621.3857686909091 ns Trials: 55000000 Total time: 34176.217278 ms Avg computation: 621.8110524 ns Trials: 60000000 Total time: 37308.663144 ms Avg computation: 621.6768083230769 ns Trials: 65000000 Total time: 40408.992541 ms Avg computation: 621.5324306714285 ns Trials: 70000000 Total time: 43507.270147 ms Avg computation: 621.4440536933333 ns Trials: 75000000 Total time: 46608.304027 ms Avg computation: 621.7594845875 ns Trials: 80000000 Total time: 49740.758767 ms Avg computation: 621.9327540705882 ns Trials: 85000000 Total time: 52864.284096 ms Avg computation: 621.9429434555556 ns Trials: 90000000 Total time: 55974.864911 ms Avg computation: 621.8868688947368 ns Trials: 95000000 Total time: 59079.252545 ms Avg computation: 621.98037608 ns Trials: 100000000 Total time: 62198.037608 ms Again, those are both conversions to ECEF and back to LLA. So roughly 1 minute for 100M points. Halve those numbers and you have the cost for converting either direction. Next we could benchmark tree access and compare? I suspect traversal cost should be a function of the on-disk representation and chosen split method (that's where RTrees tend to become costly). Since BKD splits a sorted space, and it can exploit file system cache? I suspect there aren't many random seeks? Seems benchmarking might be relatively straightforward?
          Hide
          Karl Wright added a comment -

          Just to note again, the conversion from lat/lon to ECEF XYZ is a non-iterative conversion (2 sines, 2 cosines, 1 sqrt).

          Right, but you'd be comparing 2 sines, 2 cosines, and 1 sqrt against only the cost of unpacking, which for 1 billion records would be a noticeable amount of time.

          Since BKD splits a sorted space, and it can exploit file system cache? I suspect there aren't many random seeks? Seems benchmarking might be relatively straightforward?

          Many Lucene systems (including ours) store the index in memory, using memory mapping, and SSDs would be expected too even when not, so I'd think a benchmarking should not overemphasize seeks as being costly. But Mike is the expert as far as that is concerned.

          Show
          Karl Wright added a comment - Just to note again, the conversion from lat/lon to ECEF XYZ is a non-iterative conversion (2 sines, 2 cosines, 1 sqrt). Right, but you'd be comparing 2 sines, 2 cosines, and 1 sqrt against only the cost of unpacking, which for 1 billion records would be a noticeable amount of time. Since BKD splits a sorted space, and it can exploit file system cache? I suspect there aren't many random seeks? Seems benchmarking might be relatively straightforward? Many Lucene systems (including ours) store the index in memory, using memory mapping, and SSDs would be expected too even when not, so I'd think a benchmarking should not overemphasize seeks as being costly. But Mike is the expert as far as that is concerned.
          Hide
          Nicholas Knize added a comment - - edited

          Right, but you'd be comparing 2 sines, 2 cosines, and 1 sqrt against only the cost of unpacking

          Encoding/Decoding ECEF into 96 Bits:

          Avg computation: 664.6969666857143 ns  Trials: 35000000  Total time: 23264.393834 ms
          Avg computation: 664.829008375 ns  Trials: 40000000  Total time: 26593.160335 ms
          Avg computation: 667.3625471333334 ns  Trials: 45000000  Total time: 30031.314621 ms
          Avg computation: 668.46880436 ns  Trials: 50000000  Total time: 33423.440218 ms
          Avg computation: 667.8703028909091 ns  Trials: 55000000  Total time: 36732.866659 ms
          Avg computation: 669.3753888666666 ns  Trials: 60000000  Total time: 40162.523332 ms
          Avg computation: 668.4362739230769 ns  Trials: 65000000  Total time: 43448.357805 ms
          Avg computation: 667.9539851 ns  Trials: 70000000  Total time: 46756.778957 ms
          Avg computation: 667.3638297333333 ns  Trials: 75000000  Total time: 50052.28723 ms
          Avg computation: 675.024778375 ns  Trials: 80000000  Total time: 54001.98227 ms
          Avg computation: 674.1673578352941 ns  Trials: 85000000  Total time: 57304.225416 ms
          Avg computation: 673.4723439777778 ns  Trials: 90000000  Total time: 60612.510958 ms
          Avg computation: 673.0372402842105 ns  Trials: 95000000  Total time: 63938.537827 ms
          Avg computation: 672.55224382 ns  Trials: 100000000  Total time: 67255.224382 ms
          

          Compared to packing/unpacking lat/lon into 64 bits using using GeoPointField morton bit twiddling:

          Avg computation: 60.397136 ns  Trials: 35000000  Total time: 2113.89976 ms
          Avg computation: 61.6391708 ns  Trials: 40000000  Total time: 2465.566832 ms
          Avg computation: 62.744074222222224 ns  Trials: 45000000  Total time: 2823.48334 ms
          Avg computation: 63.51111108 ns  Trials: 50000000  Total time: 3175.555554 ms
          Avg computation: 64.18207294545455 ns  Trials: 55000000  Total time: 3530.014012 ms
          Avg computation: 64.73684656666667 ns  Trials: 60000000  Total time: 3884.210794 ms
          Avg computation: 65.18073341538461 ns  Trials: 65000000  Total time: 4236.747672 ms
          Avg computation: 65.5902512 ns  Trials: 70000000  Total time: 4591.317584 ms
          Avg computation: 65.02902253333333 ns  Trials: 75000000  Total time: 4877.17669 ms
          Avg computation: 63.6249806 ns  Trials: 80000000  Total time: 5089.998448 ms
          Avg computation: 62.4193206 ns  Trials: 85000000  Total time: 5305.642251 ms
          Avg computation: 61.344433977777776 ns  Trials: 90000000  Total time: 5520.999058 ms
          Avg computation: 61.402236642105265 ns  Trials: 95000000  Total time: 5833.212481 ms
          Avg computation: 61.10019762 ns  Trials: 100000000  Total time: 6110.019762 ms
          

          So using the 3D BitSet approach is 10 times longer, with the obvious culprit being the for loop for each bit. This can be optimized, though, using a 3-way bit twiddle and 2 longs if a 64 bit 3D packing yields unacceptable loss of precision.

          so I'd think a benchmarking should not overemphasize seeks as being costly

          Maybe not. But it does depend on the on-disk representation of the tree (and I typically don't use SSDs as an excuse for not paying attention to good file layout). I was mentioning this in the context of index size as a function of a "wasteful" vs. efficient encoding.

          Show
          Nicholas Knize added a comment - - edited Right, but you'd be comparing 2 sines, 2 cosines, and 1 sqrt against only the cost of unpacking Encoding/Decoding ECEF into 96 Bits: Avg computation: 664.6969666857143 ns Trials: 35000000 Total time: 23264.393834 ms Avg computation: 664.829008375 ns Trials: 40000000 Total time: 26593.160335 ms Avg computation: 667.3625471333334 ns Trials: 45000000 Total time: 30031.314621 ms Avg computation: 668.46880436 ns Trials: 50000000 Total time: 33423.440218 ms Avg computation: 667.8703028909091 ns Trials: 55000000 Total time: 36732.866659 ms Avg computation: 669.3753888666666 ns Trials: 60000000 Total time: 40162.523332 ms Avg computation: 668.4362739230769 ns Trials: 65000000 Total time: 43448.357805 ms Avg computation: 667.9539851 ns Trials: 70000000 Total time: 46756.778957 ms Avg computation: 667.3638297333333 ns Trials: 75000000 Total time: 50052.28723 ms Avg computation: 675.024778375 ns Trials: 80000000 Total time: 54001.98227 ms Avg computation: 674.1673578352941 ns Trials: 85000000 Total time: 57304.225416 ms Avg computation: 673.4723439777778 ns Trials: 90000000 Total time: 60612.510958 ms Avg computation: 673.0372402842105 ns Trials: 95000000 Total time: 63938.537827 ms Avg computation: 672.55224382 ns Trials: 100000000 Total time: 67255.224382 ms Compared to packing/unpacking lat/lon into 64 bits using using GeoPointField morton bit twiddling: Avg computation: 60.397136 ns Trials: 35000000 Total time: 2113.89976 ms Avg computation: 61.6391708 ns Trials: 40000000 Total time: 2465.566832 ms Avg computation: 62.744074222222224 ns Trials: 45000000 Total time: 2823.48334 ms Avg computation: 63.51111108 ns Trials: 50000000 Total time: 3175.555554 ms Avg computation: 64.18207294545455 ns Trials: 55000000 Total time: 3530.014012 ms Avg computation: 64.73684656666667 ns Trials: 60000000 Total time: 3884.210794 ms Avg computation: 65.18073341538461 ns Trials: 65000000 Total time: 4236.747672 ms Avg computation: 65.5902512 ns Trials: 70000000 Total time: 4591.317584 ms Avg computation: 65.02902253333333 ns Trials: 75000000 Total time: 4877.17669 ms Avg computation: 63.6249806 ns Trials: 80000000 Total time: 5089.998448 ms Avg computation: 62.4193206 ns Trials: 85000000 Total time: 5305.642251 ms Avg computation: 61.344433977777776 ns Trials: 90000000 Total time: 5520.999058 ms Avg computation: 61.402236642105265 ns Trials: 95000000 Total time: 5833.212481 ms Avg computation: 61.10019762 ns Trials: 100000000 Total time: 6110.019762 ms So using the 3D BitSet approach is 10 times longer, with the obvious culprit being the for loop for each bit. This can be optimized, though, using a 3-way bit twiddle and 2 longs if a 64 bit 3D packing yields unacceptable loss of precision. so I'd think a benchmarking should not overemphasize seeks as being costly Maybe not. But it does depend on the on-disk representation of the tree (and I typically don't use SSDs as an excuse for not paying attention to good file layout). I was mentioning this in the context of index size as a function of a "wasteful" vs. efficient encoding.
          Hide
          Karl Wright added a comment -

          If the decision is made to use (x,y,z) encoding rather than (lat,lon), then FWIW I think the right way forward with spatial3d would be to extend the current Bounds infrastructure to allow the finding of x,y,z bounds for a shape, rather than just lat/lon. I've thought about this for a day and have concluded it would be far and away the fastest solution, since computing the Bounds for any shape would need to be done only once.

          Once x, y, and z bounds are known for a shape, the BKD code itself can make most of its decisions itself, without much computation at all, for most of the recursive steps. Only when descending within the computed bounds would it be necessary to construct a GeoArea XYZArea object to determine the exact relationship between a 3d rectangle and the specified GeoShape.

          Show
          Karl Wright added a comment - If the decision is made to use (x,y,z) encoding rather than (lat,lon), then FWIW I think the right way forward with spatial3d would be to extend the current Bounds infrastructure to allow the finding of x,y,z bounds for a shape, rather than just lat/lon. I've thought about this for a day and have concluded it would be far and away the fastest solution, since computing the Bounds for any shape would need to be done only once. Once x, y, and z bounds are known for a shape, the BKD code itself can make most of its decisions itself, without much computation at all, for most of the recursive steps. Only when descending within the computed bounds would it be necessary to construct a GeoArea XYZArea object to determine the exact relationship between a 3d rectangle and the specified GeoShape.
          Hide
          Michael McCandless added a comment -

          I can't follow all the spatial-heavy discussion here, but:

          Since BKD splits a sorted space, and it can exploit file system cache?

          BKD tree's "tree" (the internal nodes) is already forced into heap, at least in the current impl. So walking that tree should always be fast, and then there's only seeking once we hit the leaf nodes. But that seeking should be sequential walk through the file...

          If the decision is made to use (x,y,z) encoding rather than (lat,lon),

          So, if you turn the BKD descent into something other than (x,y,z), it would imply that you store points for records in something other than (x,y,z) too, no?

          I think the encoding for each point's value can be different from what the BKD tree uses?

          I suspect the points shoudl use (x,y,z) encoding when stored in doc-values per document, because when we need to filter hits for the BKD leaf cells overlapping the shape's boundary, we want to take advantage of the fast math in (x,y,z) space that spatial3d gives us?

          But then, for the inner-nodes (split values) for the BKD tree, we could use either (lat,lon) or (x,y,z), whichever gives us fastest shape relation computations? At each step in the recursion, BKD tree must check a split in one dimension against the query shape.

          Show
          Michael McCandless added a comment - I can't follow all the spatial-heavy discussion here, but: Since BKD splits a sorted space, and it can exploit file system cache? BKD tree's "tree" (the internal nodes) is already forced into heap, at least in the current impl. So walking that tree should always be fast, and then there's only seeking once we hit the leaf nodes. But that seeking should be sequential walk through the file... If the decision is made to use (x,y,z) encoding rather than (lat,lon), So, if you turn the BKD descent into something other than (x,y,z), it would imply that you store points for records in something other than (x,y,z) too, no? I think the encoding for each point's value can be different from what the BKD tree uses? I suspect the points shoudl use (x,y,z) encoding when stored in doc-values per document, because when we need to filter hits for the BKD leaf cells overlapping the shape's boundary, we want to take advantage of the fast math in (x,y,z) space that spatial3d gives us? But then, for the inner-nodes (split values) for the BKD tree, we could use either (lat,lon) or (x,y,z), whichever gives us fastest shape relation computations? At each step in the recursion, BKD tree must check a split in one dimension against the query shape.
          Hide
          Karl Wright added a comment -

          Patch to geo3d creating an XYZArea object, which can be used to determine intersection with any GeoShape. This is a preliminary patch; I will be including unit tests as well as a more general class of XYZArea degenerate objects if that should prove necessary.

          Show
          Karl Wright added a comment - Patch to geo3d creating an XYZArea object, which can be used to determine intersection with any GeoShape. This is a preliminary patch; I will be including unit tests as well as a more general class of XYZArea degenerate objects if that should prove necessary.
          Hide
          Michael McCandless added a comment -

          Thanks Karl Wright, I'll try to generalize BKD to 3 dimensions soon ...

          Show
          Michael McCandless added a comment - Thanks Karl Wright , I'll try to generalize BKD to 3 dimensions soon ...
          Hide
          Karl Wright added a comment -

          Michael McCandless: Turns out that my class is having problems properly relating to GeoWorld. I'm going to have to make some modifications to make things play nicely. Stay tuned.

          Show
          Karl Wright added a comment - Michael McCandless : Turns out that my class is having problems properly relating to GeoWorld. I'm going to have to make some modifications to make things play nicely. Stay tuned.
          Hide
          Karl Wright added a comment -

          Revised patch that fixes some of the corner cases, especially interactions with the GeoWorld shape.

          Show
          Karl Wright added a comment - Revised patch that fixes some of the corner cases, especially interactions with the GeoWorld shape.
          Hide
          Michael McCandless added a comment -

          Initial dirty patch, but its testBasic and testRandom seem to be passing! I just made a 3D version of BKD, and the random test tests across the full cube of +/- 1.022.

          I did a naive double -> int encoding for each dimension, storing in 12 byte binary DV per doc. Multi-valued docs aren't supported yet but I think it shouldn't be too hard.

          There are tons of nocommits because I don't know how to tie into the geo3d APIs...

          I think we should make a branch here, commit the two patches, and then iterate?

          Show
          Michael McCandless added a comment - Initial dirty patch, but its testBasic and testRandom seem to be passing! I just made a 3D version of BKD, and the random test tests across the full cube of +/- 1.022. I did a naive double -> int encoding for each dimension, storing in 12 byte binary DV per doc. Multi-valued docs aren't supported yet but I think it shouldn't be too hard. There are tons of nocommits because I don't know how to tie into the geo3d APIs... I think we should make a branch here, commit the two patches, and then iterate?
          Hide
          Karl Wright added a comment -

          If you create a branch I can generate patches against it.

          Show
          Karl Wright added a comment - If you create a branch I can generate patches against it.
          Hide
          ASF subversion and git services added a comment -

          Commit 1695368 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1695368 ]

          LUCENE-6699: make branch

          Show
          ASF subversion and git services added a comment - Commit 1695368 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1695368 ] LUCENE-6699 : make branch
          Hide
          ASF subversion and git services added a comment -

          Commit 1695369 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1695369 ]

          LUCENE-6699: Karl's patch to extend geo3d apis to 3d rectangles

          Show
          ASF subversion and git services added a comment - Commit 1695369 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1695369 ] LUCENE-6699 : Karl's patch to extend geo3d apis to 3d rectangles
          Hide
          ASF subversion and git services added a comment -

          Commit 1695370 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1695370 ]

          LUCENE-6699: initial 3D BKD implementation

          Show
          ASF subversion and git services added a comment - Commit 1695370 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1695370 ] LUCENE-6699 : initial 3D BKD implementation
          Hide
          Karl Wright added a comment -

          Ok, for a start – the way you get X, Y, and Z ranges for a given planet model is via PlanetModel.getMinimumX(), getMaximumX(), getMinimumY(), getMaximumY(), getMinimumZ(), getMaximumZ(). The GeoShape interface does not provide a means of obtaining the PlanetModel, so you will need to pass this in to your constructor in addition to what you currently have.

          Second, the following code:

          +                                             double x = BKD3DTreeDocValuesFormat.decodeValue(BKD3DTreeDocValuesFormat.readInt(bytes.bytes, bytes.offset));
          +                                             double y = BKD3DTreeDocValuesFormat.decodeValue(BKD3DTreeDocValuesFormat.readInt(bytes.bytes, bytes.offset+4));
          +                                             double z = BKD3DTreeDocValuesFormat.decodeValue(BKD3DTreeDocValuesFormat.readInt(bytes.bytes, bytes.offset+8));
          +                                             //return GeoUtils.pointInPolygon(polyLons, polyLats, lat, lon);
          +                                             // nocommit fixme!
          +                                             return true;
          

          ... should call GeoShape.isWithin(x,y,z) to determine membership within the shape.

          Finally,

          +                                           public BKD3DTreeReader.Relation compare(int xMin, int xMax, int yMin, int yMax, int zMin, int zMax) {
          +                                             // nocommit fixme!
          +                                             return BKD3DTreeReader.Relation.INSIDE;
          +                                           }
          

          ... should do the following:

          GeoArea xyzSolid = new XYZSolid(planetModel, xMin, xMax, yMin, yMax, zMin, zMax);
          return xyzSolid.getRelationship(geoShape) == GeoArea.<WHATEVER>?xxx:yyy
          
          Show
          Karl Wright added a comment - Ok, for a start – the way you get X, Y, and Z ranges for a given planet model is via PlanetModel.getMinimumX(), getMaximumX(), getMinimumY(), getMaximumY(), getMinimumZ(), getMaximumZ(). The GeoShape interface does not provide a means of obtaining the PlanetModel, so you will need to pass this in to your constructor in addition to what you currently have. Second, the following code: + double x = BKD3DTreeDocValuesFormat.decodeValue(BKD3DTreeDocValuesFormat.readInt(bytes.bytes, bytes.offset)); + double y = BKD3DTreeDocValuesFormat.decodeValue(BKD3DTreeDocValuesFormat.readInt(bytes.bytes, bytes.offset+4)); + double z = BKD3DTreeDocValuesFormat.decodeValue(BKD3DTreeDocValuesFormat.readInt(bytes.bytes, bytes.offset+8)); + // return GeoUtils.pointInPolygon(polyLons, polyLats, lat, lon); + // nocommit fixme! + return true ; ... should call GeoShape.isWithin(x,y,z) to determine membership within the shape. Finally, + public BKD3DTreeReader.Relation compare( int xMin, int xMax, int yMin, int yMax, int zMin, int zMax) { + // nocommit fixme! + return BKD3DTreeReader.Relation.INSIDE; + } ... should do the following: GeoArea xyzSolid = new XYZSolid(planetModel, xMin, xMax, yMin, yMax, zMin, zMax); return xyzSolid.getRelationship(geoShape) == GeoArea.<WHATEVER>?xxx:yyy
          Hide
          Michael McCandless added a comment -

          OK I created the branch and committed our two latest patches!

          https://svn.apache.org/repos/asf/lucene/dev/branches/lucene6699

          Show
          Michael McCandless added a comment - OK I created the branch and committed our two latest patches! https://svn.apache.org/repos/asf/lucene/dev/branches/lucene6699
          Hide
          Karl Wright added a comment -

          ok, I'll try to work my proposed changes into them. The biggest thing, though, is that we need access to a PlanetModel instance inside the compare() method. So some plumbing will be necessary to set that up. Stay tuned.

          Show
          Karl Wright added a comment - ok, I'll try to work my proposed changes into them. The biggest thing, though, is that we need access to a PlanetModel instance inside the compare() method. So some plumbing will be necessary to set that up. Stay tuned.
          Hide
          Michael McCandless added a comment -

          Thanks Karl Wright, I started on one part, lemme go commit so we don't stomp on each other!

          Show
          Michael McCandless added a comment - Thanks Karl Wright , I started on one part, lemme go commit so we don't stomp on each other!
          Hide
          ASF subversion and git services added a comment -

          Commit 1695377 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1695377 ]

          LUCENE-6699: fold in some feedback

          Show
          ASF subversion and git services added a comment - Commit 1695377 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1695377 ] LUCENE-6699 : fold in some feedback
          Hide
          Michael McCandless added a comment -

          OK I committed my change, hack away!

          Show
          Michael McCandless added a comment - OK I committed my change, hack away!
          Hide
          Michael McCandless added a comment -

          So some plumbing will be necessary to set that up.

          Ahh yes, now they are static methods. So I guess this means you must pass PlanetModel to the doc values format, and to the query.

          Show
          Michael McCandless added a comment - So some plumbing will be necessary to set that up. Ahh yes, now they are static methods. So I guess this means you must pass PlanetModel to the doc values format, and to the query.
          Hide
          Karl Wright added a comment -

          One other question: what should I return for the four cases listed below?

                                                     @Override
                                                     public BKD3DTreeReader.Relation compare(int xMin, int xMax, int yMin, int yMax, int zMin, int zMax) {
                                                       final GeoArea xyzSolid = new XYZSolid(planetModel, xMin, xMax, yMin, yMax, zMin, zMax);
                                                       final int relationship = xyzSolid.getRelationship(shape);
                                                       switch (relationship) {
                                                       case GeoArea.WITHIN:
                                                         // nocommit: shape is within xyzsolid
                                                         return BKD3DTreeReader.Relation.INSIDE;
                                                       case GeoArea.CONTAINS:
                                                         // nocommit: shape contains xyzsolid
                                                         return BKD3DTreeReader.Relation.INSIDE;
                                                       case GeoArea.OVERLAPS:
                                                         // nocommit: shape overlaps xyzsolid
                                                         return BKD3DTreeReader.Relation.INSIDE;
                                                       case GeoArea.DISJOINT:
                                                         // nocommit: shape has nothing to do with xyzsolid
                                                         return BKD3DTreeReader.Relation.INSIDE;
                                                       default:
                                                         throw new RuntimeException("Unexpected result value from getRelationship(): "+relationship);
                                                       }
                                                     }
                                                   });
          
          Show
          Karl Wright added a comment - One other question: what should I return for the four cases listed below? @Override public BKD3DTreeReader.Relation compare( int xMin, int xMax, int yMin, int yMax, int zMin, int zMax) { final GeoArea xyzSolid = new XYZSolid(planetModel, xMin, xMax, yMin, yMax, zMin, zMax); final int relationship = xyzSolid.getRelationship(shape); switch (relationship) { case GeoArea.WITHIN: // nocommit: shape is within xyzsolid return BKD3DTreeReader.Relation.INSIDE; case GeoArea.CONTAINS: // nocommit: shape contains xyzsolid return BKD3DTreeReader.Relation.INSIDE; case GeoArea.OVERLAPS: // nocommit: shape overlaps xyzsolid return BKD3DTreeReader.Relation.INSIDE; case GeoArea.DISJOINT: // nocommit: shape has nothing to do with xyzsolid return BKD3DTreeReader.Relation.INSIDE; default : throw new RuntimeException( "Unexpected result value from getRelationship(): " +relationship); } } });
          Hide
          Karl Wright added a comment -

          And Michael McCandless, I just ran into something else. tree.intersect() accepts only int values at the moment, but x,y,z are doubles in the range of roughly -1.0 to 1.0, and need to be treated as such. It looks like the integer stuff goes fairly deep into the BKD3DTreeReader code. Question: If I embark on turning these all into doubles, what kinds of problems will I have?

          Show
          Karl Wright added a comment - And Michael McCandless , I just ran into something else. tree.intersect() accepts only int values at the moment, but x,y,z are doubles in the range of roughly -1.0 to 1.0, and need to be treated as such. It looks like the integer stuff goes fairly deep into the BKD3DTreeReader code. Question: If I embark on turning these all into doubles, what kinds of problems will I have?
          Hide
          Nicholas Knize added a comment -

          Wait... PlanetModel just gives WGS84 parameters along with the same Vincenty distance formula provided by GeoDistanceUtils in sandbox (which can be quite an expensive convergence, btw). So why not just use the static calls in sandbox and save passing around a PlanetModel object? You can also use the attached patch I provided which provides static calls to convert to/from unit spheroid and lat/lon.

          Show
          Nicholas Knize added a comment - Wait... PlanetModel just gives WGS84 parameters along with the same Vincenty distance formula provided by GeoDistanceUtils in sandbox (which can be quite an expensive convergence, btw). So why not just use the static calls in sandbox and save passing around a PlanetModel object? You can also use the attached patch I provided which provides static calls to convert to/from unit spheroid and lat/lon.
          Hide
          Karl Wright added a comment -

          Hi Nicholas,

          There's no cost to instantiating PlanetModel.SPHERE or PlanetModel.WGS84; they're already instantiated. I don't understand why you think the cost would be high to pass this object in to the query or the writer? It seems to me to be dirt cheap, and just as efficient as passing in something else to distinguish between various planet models.

          Show
          Karl Wright added a comment - Hi Nicholas, There's no cost to instantiating PlanetModel.SPHERE or PlanetModel.WGS84; they're already instantiated. I don't understand why you think the cost would be high to pass this object in to the query or the writer? It seems to me to be dirt cheap, and just as efficient as passing in something else to distinguish between various planet models.
          Hide
          Nicholas Knize added a comment -

          The biggest thing, though, is that we need access to a PlanetModel instance inside the compare()

          I didn't say anything about cost. It was a matter of saving work, making maintenance less of a nightmare, de-duping code and using what's already available.

          Show
          Nicholas Knize added a comment - The biggest thing, though, is that we need access to a PlanetModel instance inside the compare() I didn't say anything about cost. It was a matter of saving work, making maintenance less of a nightmare, de-duping code and using what's already available.
          Hide
          Michael McCandless added a comment -

          tree.intersect() accepts only int values at the moment, but x,y,z are doubles

          Wait, this was by design (having BKD operate only on int): the encoding of double -> int should happen outside BKD.

          I'm assuming 32 bits precision for each dimension is enough?

          Show
          Michael McCandless added a comment - tree.intersect() accepts only int values at the moment, but x,y,z are doubles Wait, this was by design (having BKD operate only on int): the encoding of double -> int should happen outside BKD. I'm assuming 32 bits precision for each dimension is enough?
          Hide
          Michael McCandless added a comment -

          One other question: what should I return for the four cases listed below?

          Oh this was the part I committed already ... if you svn up do you see conflicts?

          Show
          Michael McCandless added a comment - One other question: what should I return for the four cases listed below? Oh this was the part I committed already ... if you svn up do you see conflicts?
          Hide
          Karl Wright added a comment -

          Thanks, that clarifies. Stay tuned.

          Show
          Karl Wright added a comment - Thanks, that clarifies. Stay tuned.
          Hide
          Michael McCandless added a comment -

          It seems like we need PlanetModel at query time, for the XYZSolid ctor, and also at indexing time, to know the full range for x, y, z during the encode of double -> int (hmm and also at query time for the decode from int -> double, so we can do the per-hit filtering on the boundary cells).

          Show
          Michael McCandless added a comment - It seems like we need PlanetModel at query time, for the XYZSolid ctor, and also at indexing time, to know the full range for x, y, z during the encode of double -> int (hmm and also at query time for the decode from int -> double, so we can do the per-hit filtering on the boundary cells).
          Hide
          Karl Wright added a comment -

          New patch which finishes some of the remaining "fix mes"

          Show
          Karl Wright added a comment - New patch which finishes some of the remaining "fix mes"
          Hide
          Karl Wright added a comment -

          Michael McCandless see the new patch

          Show
          Karl Wright added a comment - Michael McCandless see the new patch
          Hide
          ASF subversion and git services added a comment -

          Commit 1695400 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1695400 ]

          LUCENE-6699: iterate

          Show
          ASF subversion and git services added a comment - Commit 1695400 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1695400 ] LUCENE-6699 : iterate
          Hide
          Michael McCandless added a comment -

          Thanks Karl Wright, I committed.

          Hmm the minX/maxX etc. in the query was supposed to be for the query shape, not for the planet (i.e., the 3D bbox for the query). But if this is problematic I think we could simply remove it, and let BKD recurse from the entire world down.... I put a nocommit about this.

          Show
          Michael McCandless added a comment - Thanks Karl Wright , I committed. Hmm the minX/maxX etc. in the query was supposed to be for the query shape, not for the planet (i.e., the 3D bbox for the query). But if this is problematic I think we could simply remove it, and let BKD recurse from the entire world down.... I put a nocommit about this.
          Hide
          Karl Wright added a comment -

          not to worry; I fixed it in my new patch.

          Show
          Karl Wright added a comment - not to worry; I fixed it in my new patch.
          Hide
          Karl Wright added a comment - - edited

          Ok, did not understand that. We don't yet have the ability to get a Bounds result for a shape that is x,y,z instead of lat/lon. (That was the optimization we discussed but decided to do as a second step). But I presume you do want the ability to know, for a given planet model, the actual bounds of the planet. That's gotta go somewhere.

          Show
          Karl Wright added a comment - - edited Ok, did not understand that. We don't yet have the ability to get a Bounds result for a shape that is x,y,z instead of lat/lon. (That was the optimization we discussed but decided to do as a second step). But I presume you do want the ability to know, for a given planet model, the actual bounds of the planet. That's gotta go somewhere.
          Hide
          Karl Wright added a comment -

          Introducing a stray dependency outside of spatial3d seems like it would make maintenance more of a nightmare rather than less. Objects representing abstractions don't necessarily make things more messy either. Static methods are fine but when you have lack of any significant data abstraction you have a very messy picture indeed. And I don't recall you complaining over the last seven months about geo3d's architecture and organization, so I presume that it was acceptable to you.

          As for duping code, I guess it's all part of the bigger question is whether geo3d should remain separable from Lucene. For me, Geo3d at the moment needs to remain as a working whole without dependencies on the rest of Lucene, because otherwise it will not be usable in my employer's environment. That may change in the long run, after there's a public release of this stuff and we upgrade to it, but for now it would just make things much more complicated for me.

          Show
          Karl Wright added a comment - Introducing a stray dependency outside of spatial3d seems like it would make maintenance more of a nightmare rather than less. Objects representing abstractions don't necessarily make things more messy either. Static methods are fine but when you have lack of any significant data abstraction you have a very messy picture indeed. And I don't recall you complaining over the last seven months about geo3d's architecture and organization, so I presume that it was acceptable to you. As for duping code, I guess it's all part of the bigger question is whether geo3d should remain separable from Lucene. For me, Geo3d at the moment needs to remain as a working whole without dependencies on the rest of Lucene, because otherwise it will not be usable in my employer's environment. That may change in the long run, after there's a public release of this stuff and we upgrade to it, but for now it would just make things much more complicated for me.
          Hide
          Michael McCandless added a comment -

          We don't yet have the ability to get a Bounds result for a shape that is x,y,z instead of lat/lon.

          OK, no problem, this is just an optimization (to reduce the number of compare calls), so it's optional.

          I'll fix BKD tree to have another .intersect that just goes from the global min/max down.

          But I presume you do want the ability to know, for a given planet model, the actual bounds of the planet.

          YES, this is important: the encode/decodeValue methods in BKD3DTreeDocValuesFormat need to be fixed to refer to the PlanetModel instead I think? This is a bit spooky, since it means you could index with one PlanetModel and then query with another ... maybe it'd be better to leave the encoding as is? And have an assert somewhere that the PlanetModel min/max never exceeds what our encoding is using?

          Show
          Michael McCandless added a comment - We don't yet have the ability to get a Bounds result for a shape that is x,y,z instead of lat/lon. OK, no problem, this is just an optimization (to reduce the number of compare calls), so it's optional. I'll fix BKD tree to have another .intersect that just goes from the global min/max down. But I presume you do want the ability to know, for a given planet model, the actual bounds of the planet. YES, this is important: the encode/decodeValue methods in BKD3DTreeDocValuesFormat need to be fixed to refer to the PlanetModel instead I think? This is a bit spooky, since it means you could index with one PlanetModel and then query with another ... maybe it'd be better to leave the encoding as is? And have an assert somewhere that the PlanetModel min/max never exceeds what our encoding is using?
          Hide
          Karl Wright added a comment -

          I think changing the encoding would be scary; let's just assert the proper range somewhere that's not expensive.

          Show
          Karl Wright added a comment - I think changing the encoding would be scary; let's just assert the proper range somewhere that's not expensive.
          Hide
          ASF subversion and git services added a comment -

          Commit 1695409 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1695409 ]

          LUCENE-6699: remove 3D bbox opto

          Show
          ASF subversion and git services added a comment - Commit 1695409 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1695409 ] LUCENE-6699 : remove 3D bbox opto
          Hide
          Michael McCandless added a comment -

          OK, actually, at indexing time you'll hit an exc already if the x/y/z is ever out of bounds (encodeValue checks this) ... but, are there any known PlanetModels with a larger max than the 1.002 I randomly picked!

          I committed the "remove 3D query bbox opto".

          Show
          Michael McCandless added a comment - OK, actually, at indexing time you'll hit an exc already if the x/y/z is ever out of bounds (encodeValue checks this) ... but, are there any known PlanetModels with a larger max than the 1.002 I randomly picked! I committed the "remove 3D query bbox opto".
          Hide
          Karl Wright added a comment -

          are there any known PlanetModels with a larger max than the 1.002 I randomly picked!

          Well, you can construct your own planet model object. Presumably if you were modeling something pretty oblate, like Saturn, 1.002 would not do. I provided the min/max methods precisely so you could determine this for real. But at the moment we can go ahead and get things working before crossing that bridge, if you like.

          Show
          Karl Wright added a comment - are there any known PlanetModels with a larger max than the 1.002 I randomly picked! Well, you can construct your own planet model object. Presumably if you were modeling something pretty oblate, like Saturn, 1.002 would not do. I provided the min/max methods precisely so you could determine this for real. But at the moment we can go ahead and get things working before crossing that bridge, if you like.
          Hide
          Nicholas Knize added a comment -

          Introducing a stray dependency outside of spatial3d seems like it would make maintenance more of a nightmare rather than less.

          I don't recall module dependency for Geo3D being an issue when it was part of the spatial package, now we have spatial code in 3 separate modules and suddenly you oppose module dependency? It makes no sense and this disarray of spatial functionality only introduces more confusion, disorganization, and a wider gap for getting more spatial contributors involved. Nevertheless, I assume (hope) it resolves itself in time, otherwise lucene spatial capabilities will continue to fragment.

          I don't recall you complaining over the last seven months about geo3d's architecture and organization...

          Rather combative. Don't confuse my suggestions for keeping things light, approachable, and organized, as holding any enmity towards geo3d.

          I guess it's all part of the bigger question is whether geo3d should remain separable from Lucene.

          I still prefer it be a part of core/util so that (once again) the 90% geo use case can be accomplished with no dependencies other than core. Having it in a 3d specific package seems no better than simply moving it to Apache SIS (where all EPSG ellipsoids, OGC compliance, etc. are already provided). But that's not my call.

          For me, Geo3d at the moment needs to remain as a working whole without dependencies on the rest of Lucene

          This messaging seems to change based on the agenda. Not that it matters except for keeping in mind whats best for the lucene project as a whole.

          Show
          Nicholas Knize added a comment - Introducing a stray dependency outside of spatial3d seems like it would make maintenance more of a nightmare rather than less. I don't recall module dependency for Geo3D being an issue when it was part of the spatial package, now we have spatial code in 3 separate modules and suddenly you oppose module dependency? It makes no sense and this disarray of spatial functionality only introduces more confusion, disorganization, and a wider gap for getting more spatial contributors involved. Nevertheless, I assume (hope) it resolves itself in time, otherwise lucene spatial capabilities will continue to fragment. I don't recall you complaining over the last seven months about geo3d's architecture and organization... Rather combative. Don't confuse my suggestions for keeping things light, approachable, and organized, as holding any enmity towards geo3d. I guess it's all part of the bigger question is whether geo3d should remain separable from Lucene. I still prefer it be a part of core/util so that (once again) the 90% geo use case can be accomplished with no dependencies other than core. Having it in a 3d specific package seems no better than simply moving it to Apache SIS (where all EPSG ellipsoids, OGC compliance, etc. are already provided). But that's not my call. For me, Geo3d at the moment needs to remain as a working whole without dependencies on the rest of Lucene This messaging seems to change based on the agenda. Not that it matters except for keeping in mind whats best for the lucene project as a whole.
          Hide
          Karl Wright added a comment -

          Rather combative. Don't confuse my suggestions for keeping things light, approachable, and organized, as holding any enmity towards geo3d.

          I am not trying to be combative; I'm interested in the same things you are, although I seem to look at the world a bit differently perhaps. My major concern is that since January there's been quite a bit of back-and-forth with David Smiley and others about various aspects of geo3d organization, structure, abstractions, etc., and it seems like we'd be undoing quite a bit of what was agreed on then to meet your concerns now?

          I still prefer it be a part of core/util so that (once again) the 90% geo use case can be accomplished with no dependencies other than core. Having it in a 3d specific package seems no better than simply moving it to Apache SIS (where all EPSG ellipsoids, OGC compliance, etc. are already provided). But that's not my call.

          Unfortunately, I have constraints, in addition to Lucene. I cannot at the moment contribute to Apache SIS, without going through a laborious and time consuming company process. So if/when geo3d leaves Lucene, I won't immediately be able to leave with it.

          Also, as we've discussed before at some length, geo3d was developed and optimized specifically for the search problem. While that seems like a minor thing at first glance, it's actually quite a big deal. My impression was that this was pretty far from the Apache SIS core mission.

          This messaging seems to change based on the agenda. Not that it matters except for keeping in mind whats best for the lucene project as a whole.

          I've got two masters here. First, it's essential that my company continues to be able to use geo3d, even before it is released via lucene. Remember that development is taking place all the time on both sides. Right now, geo3d is reasonably separable, and we've deliberately built the dependency structure to maintain that. That was one of the reasons behind having a separate module.

          If/when geo3d is actually pulled into core (which I still don't know will definitely happen or not), then it's a different ballgame, and integration with other core code will likely take place. But that hasn't happened yet and may never happen.

          Show
          Karl Wright added a comment - Rather combative. Don't confuse my suggestions for keeping things light, approachable, and organized, as holding any enmity towards geo3d. I am not trying to be combative; I'm interested in the same things you are, although I seem to look at the world a bit differently perhaps. My major concern is that since January there's been quite a bit of back-and-forth with David Smiley and others about various aspects of geo3d organization, structure, abstractions, etc., and it seems like we'd be undoing quite a bit of what was agreed on then to meet your concerns now ? I still prefer it be a part of core/util so that (once again) the 90% geo use case can be accomplished with no dependencies other than core. Having it in a 3d specific package seems no better than simply moving it to Apache SIS (where all EPSG ellipsoids, OGC compliance, etc. are already provided). But that's not my call. Unfortunately, I have constraints, in addition to Lucene. I cannot at the moment contribute to Apache SIS, without going through a laborious and time consuming company process. So if/when geo3d leaves Lucene, I won't immediately be able to leave with it. Also, as we've discussed before at some length, geo3d was developed and optimized specifically for the search problem. While that seems like a minor thing at first glance, it's actually quite a big deal. My impression was that this was pretty far from the Apache SIS core mission. This messaging seems to change based on the agenda. Not that it matters except for keeping in mind whats best for the lucene project as a whole. I've got two masters here. First, it's essential that my company continues to be able to use geo3d, even before it is released via lucene. Remember that development is taking place all the time on both sides. Right now, geo3d is reasonably separable, and we've deliberately built the dependency structure to maintain that. That was one of the reasons behind having a separate module. If/when geo3d is actually pulled into core (which I still don't know will definitely happen or not), then it's a different ballgame, and integration with other core code will likely take place. But that hasn't happened yet and may never happen.
          Hide
          Michael McCandless added a comment -

          I'll address the "rename" nocommits shortly...

          Show
          Michael McCandless added a comment - I'll address the "rename" nocommits shortly...
          Hide
          ASF subversion and git services added a comment -

          Commit 1695494 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1695494 ]

          LUCENE-6699: fix some nocommits

          Show
          ASF subversion and git services added a comment - Commit 1695494 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1695494 ] LUCENE-6699 : fix some nocommits
          Hide
          ASF subversion and git services added a comment -

          Commit 1695510 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1695510 ]

          LUCENE-6699: add basic test for the point field / query

          Show
          ASF subversion and git services added a comment - Commit 1695510 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1695510 ] LUCENE-6699 : add basic test for the point field / query
          Hide
          Michael McCandless added a comment -

          I committed a basic test, but it fails in a fun way:

          Time: 0.202
          There was 1 failure:
          1) testBasic(org.apache.lucene.bkdtree3d.TestGeo3DPointField)
          java.lang.IllegalArgumentException: X values in wrong order or identical
          	at __randomizedtesting.SeedInfo.seed([71BA4E421B49E771:DA405357C495615F]:0)
          	at org.apache.lucene.geo3d.XYZSolid.<init>(XYZSolid.java:94)
          	at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1$1.compare(PointInGeo3DShapeQuery.java:124)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:190)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:129)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:113)
          	at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1.scorer(PointInGeo3DShapeQuery.java:99)
          	at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135)
          	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
          	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
          	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
          	at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:92)
          	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:425)
          	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:544)
          	at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:402)
          	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:413)
          	at org.apache.lucene.bkdtree3d.TestGeo3DPointField.testBasic(TestGeo3DPointField.java:58)
          

          I think the X values are in fact identical ... I indexed a single point into the BKD tree, and so minX == maxX and the recursion uses these global min/max when recursing ... I think I might just remove the global min/max and instead recurse from the full int space. This was a change I had "tried" vs the 2D BKD tree and I think I don't like it

          Show
          Michael McCandless added a comment - I committed a basic test, but it fails in a fun way: Time: 0.202 There was 1 failure: 1) testBasic(org.apache.lucene.bkdtree3d.TestGeo3DPointField) java.lang.IllegalArgumentException: X values in wrong order or identical at __randomizedtesting.SeedInfo.seed([71BA4E421B49E771:DA405357C495615F]:0) at org.apache.lucene.geo3d.XYZSolid.<init>(XYZSolid.java:94) at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1$1.compare(PointInGeo3DShapeQuery.java:124) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:190) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:129) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:113) at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1.scorer(PointInGeo3DShapeQuery.java:99) at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135) at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69) at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618) at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:92) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:425) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:544) at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:402) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:413) at org.apache.lucene.bkdtree3d.TestGeo3DPointField.testBasic(TestGeo3DPointField.java:58) I think the X values are in fact identical ... I indexed a single point into the BKD tree, and so minX == maxX and the recursion uses these global min/max when recursing ... I think I might just remove the global min/max and instead recurse from the full int space. This was a change I had "tried" vs the 2D BKD tree and I think I don't like it
          Hide
          ASF subversion and git services added a comment -

          Commit 1695513 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1695513 ]

          LUCENE-6699: don't use the global min/max

          Show
          ASF subversion and git services added a comment - Commit 1695513 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1695513 ] LUCENE-6699 : don't use the global min/max
          Hide
          Michael McCandless added a comment -

          I think I don't like it

          OK I removed it and now testBasic passes!

          Show
          Michael McCandless added a comment - I think I don't like it OK I removed it and now testBasic passes!
          Hide
          Michael McCandless added a comment -

          On the module dependencies, I think it's fine if we have some small code dup for now across modules. We will sort this out over time: maybe sandbox depends on spatial3d, or vice versa, or we "graduate" the postings-based and 2D BKD implementations from sandbox into spatial3d (and rename it), or move them into core/util, or ... something. I think we shouldn't fret about it at this point: things are moving quickly and it's a little too early to figure out where things will eventually land.

          I've got two masters here.

          This is fine.

          We all (necessarily: capitalism) have our own sometimes conflicting motives for improving Lucene (and other open-source projects), but it works out that when you sum up all those motives across all players what emerges is something that benefits many, many people.

          moving it to Apache SIS

          I think the Apache SIS project should feel free to poach geo3d at any time, but ...

          Selfishly (for Lucene) I think we should also keep it here as we iterate on the "unique" requirements we have for efficient searching. E.g. here in this issue we already see that we need new APIs in geo3d for the BKD integration, maybe future issues require more such iterating.

          Show
          Michael McCandless added a comment - On the module dependencies, I think it's fine if we have some small code dup for now across modules. We will sort this out over time: maybe sandbox depends on spatial3d, or vice versa, or we "graduate" the postings-based and 2D BKD implementations from sandbox into spatial3d (and rename it), or move them into core/util, or ... something. I think we shouldn't fret about it at this point: things are moving quickly and it's a little too early to figure out where things will eventually land. I've got two masters here. This is fine. We all (necessarily: capitalism) have our own sometimes conflicting motives for improving Lucene (and other open-source projects), but it works out that when you sum up all those motives across all players what emerges is something that benefits many, many people. moving it to Apache SIS I think the Apache SIS project should feel free to poach geo3d at any time, but ... Selfishly (for Lucene) I think we should also keep it here as we iterate on the "unique" requirements we have for efficient searching. E.g. here in this issue we already see that we need new APIs in geo3d for the BKD integration, maybe future issues require more such iterating.
          Hide
          Karl Wright added a comment -

          Well, as we discussed a while back, I'm going to need to implement some "degenerate" solids in any case, if this all works, and a factory method that picks among them. That's a better fix methinks than changing stuff around. I'll get going on that tonight.

          Show
          Karl Wright added a comment - Well, as we discussed a while back, I'm going to need to implement some "degenerate" solids in any case, if this all works, and a factory method that picks among them. That's a better fix methinks than changing stuff around. I'll get going on that tonight.
          Hide
          Karl Wright added a comment -

          Ok – let me synch up and see what new feature should have what priority...

          Show
          Karl Wright added a comment - Ok – let me synch up and see what new feature should have what priority...
          Hide
          ASF subversion and git services added a comment -

          Commit 1695616 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1695616 ]

          LUCENE-6699: fix more names

          Show
          ASF subversion and git services added a comment - Commit 1695616 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1695616 ] LUCENE-6699 : fix more names
          Hide
          ASF subversion and git services added a comment -

          Commit 1695619 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1695619 ]

          LUCENE-6699: fix more nocommits

          Show
          ASF subversion and git services added a comment - Commit 1695619 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1695619 ] LUCENE-6699 : fix more nocommits
          Hide
          Karl Wright added a comment -

          This patch adds the basic architecture for handling degenerate cases of XYZSolids. Still need to implement the specific cases though.

          Show
          Karl Wright added a comment - This patch adds the basic architecture for handling degenerate cases of XYZSolids. Still need to implement the specific cases though.
          Hide
          Karl Wright added a comment -

          Updated patch, including degenerate (but untested) classes for single-dimension degeneracy. Still four additional degeneracy classes to implement.

          Show
          Karl Wright added a comment - Updated patch, including degenerate (but untested) classes for single-dimension degeneracy. Still four additional degeneracy classes to implement.
          Hide
          Michael McCandless added a comment -

          Thanks Karl Wright, I committed the last patch.

          Show
          Michael McCandless added a comment - Thanks Karl Wright , I committed the last patch.
          Hide
          Karl Wright added a comment - - edited

          Hmm, either there was an svn hiccup, or you got the wrong patch.

          Actually, it appears that I uploaded the same patch twice, so it's my fault. But in any case, attaching a new one based on the current branch status.

          Show
          Karl Wright added a comment - - edited Hmm, either there was an svn hiccup, or you got the wrong patch. Actually, it appears that I uploaded the same patch twice, so it's my fault. But in any case, attaching a new one based on the current branch status.
          Hide
          ASF subversion and git services added a comment -

          Commit 1695700 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1695700 ]

          LUCENE-6699: another iteration on degenerate solids

          Show
          ASF subversion and git services added a comment - Commit 1695700 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1695700 ] LUCENE-6699 : another iteration on degenerate solids
          Hide
          Michael McCandless added a comment -

          Thanks Karl Wright, I committed the last patch

          Show
          Michael McCandless added a comment - Thanks Karl Wright , I committed the last patch
          Hide
          Karl Wright added a comment -

          Patch to complete the degenerate solid classes and construction logic.
          Michael McCandless: I will be adding some unit tests for this code, later today or tomorrow.

          Show
          Karl Wright added a comment - Patch to complete the degenerate solid classes and construction logic. Michael McCandless : I will be adding some unit tests for this code, later today or tomorrow.
          Hide
          ASF subversion and git services added a comment -

          Commit 1695950 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1695950 ]

          LUCENE-6699: more degenerate solids

          Show
          ASF subversion and git services added a comment - Commit 1695950 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1695950 ] LUCENE-6699 : more degenerate solids
          Hide
          ASF subversion and git services added a comment -

          Commit 1695951 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1695951 ]

          LUCENE-6699: more degenerate solids

          Show
          ASF subversion and git services added a comment - Commit 1695951 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1695951 ] LUCENE-6699 : more degenerate solids
          Hide
          Michael McCandless added a comment -

          Thanks Karl Wright, I committed these to the branch... I'll work on fixing the test case to test the indexing and searching, not just the lower level direct unit tests for the BKD3DTree.

          Show
          Michael McCandless added a comment - Thanks Karl Wright , I committed these to the branch... I'll work on fixing the test case to test the indexing and searching, not just the lower level direct unit tests for the BKD3DTree.
          Hide
          Karl Wright added a comment -

          Add basic tests for degenerate shapes, as well as a fix for degenerate single point. Michael McCandless, hopefully you can remove all constraints now about generating identical min/max values in your random tests.

          Show
          Karl Wright added a comment - Add basic tests for degenerate shapes, as well as a fix for degenerate single point. Michael McCandless , hopefully you can remove all constraints now about generating identical min/max values in your random tests.
          Hide
          ASF subversion and git services added a comment -

          Commit 1696089 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696089 ]

          LUCENE-6699: add tests, fix bugs

          Show
          ASF subversion and git services added a comment - Commit 1696089 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696089 ] LUCENE-6699 : add tests, fix bugs
          Hide
          ASF subversion and git services added a comment -

          Commit 1696090 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696090 ]

          LUCENE-6699: more degenerate solids

          Show
          ASF subversion and git services added a comment - Commit 1696090 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696090 ] LUCENE-6699 : more degenerate solids
          Hide
          Michael McCandless added a comment -

          Thanks Karl Wright, I committed your last patch to the branch. I think I'll leave the "global min/max" opto out, but, handling degenerate solids is still necessary because an adversary could have a bunch of identical points as a subset of non-identical points, which can cause degenerate cells even if the global min/max are not degenerate.

          I also got testRandom working, it uncovered some fun bugs, but seems to be passing now!

          I think we're getting close...

          Show
          Michael McCandless added a comment - Thanks Karl Wright , I committed your last patch to the branch. I think I'll leave the "global min/max" opto out, but, handling degenerate solids is still necessary because an adversary could have a bunch of identical points as a subset of non-identical points, which can cause degenerate cells even if the global min/max are not degenerate. I also got testRandom working, it uncovered some fun bugs, but seems to be passing now! I think we're getting close...
          Hide
          Karl Wright added a comment -

          I've been considering the question of how to get x,y,z bounds for the purposes of optimization. It's a bit more complex than I originally thought; maybe I'll have an idea by tomorrow. The complexity revolves around the fact that x,y,z bounds are always explicit, but lat/lon bounds have concepts like "no longitude bound" and "no top latitude bound".

          Show
          Karl Wright added a comment - I've been considering the question of how to get x,y,z bounds for the purposes of optimization. It's a bit more complex than I originally thought; maybe I'll have an idea by tomorrow. The complexity revolves around the fact that x,y,z bounds are always explicit, but lat/lon bounds have concepts like "no longitude bound" and "no top latitude bound".
          Hide
          Karl Wright added a comment -

          I think I worked it out; I should have a usable patch sometime later this morning.
          With that, I think the geo3d additions are complete, so yeah, we're getting closer.

          Show
          Karl Wright added a comment - I think I worked it out; I should have a usable patch sometime later this morning. With that, I think the geo3d additions are complete, so yeah, we're getting closer.
          Hide
          Karl Wright added a comment -

          Hmm, ran into another snag.
          I think I had better think this through over the next day or so.

          Show
          Karl Wright added a comment - Hmm, ran into another snag. I think I had better think this through over the next day or so.
          Hide
          Karl Wright added a comment -

          Revamp how we do bounds, so we can compute (x,y,z) bounds cheaply too.

          This patch has a couple of known issues, and needs tests too, both of which I'll be addressing shortly.

          Show
          Karl Wright added a comment - Revamp how we do bounds, so we can compute (x,y,z) bounds cheaply too. This patch has a couple of known issues, and needs tests too, both of which I'll be addressing shortly.
          Hide
          Michael McCandless added a comment -

          Thanks Karl Wright, I'll commit shortly.

          I ran the same "225 bboxes around London on the 61M OpenStreetMap
          lat/lon points" perf tests with the current branch, plus the 3 other
          existing options:

          • Geo3D + BKD (PointInGeo3DShapeQuery, this issue/branch): index is
            936 MB, took 423 + 96 (close, waiting for merges) sec to build.
            7.01 sec to run 225 queries, total hits 221,118,860.
          • 2D BKD (BKDPointInBBoxQuery): index is 704 MB, took 163 + 97 sec
            to build. 2.32 sec to run 225 queries, total hits 221,118,844.
            It's OK (I think) that number of hits is a wee bit smaller than
            geo3d because the geo3d bbox when projected to 2D space is a bit
            bowed out (I think?) and can therefore include a few more
            points.
          • Postings + DV (GeoPointInBBoxQuery): index is 3.4 GB, took 349 +
            11 sec to build. 3.60 sec to run 225 queries, total 221,120,393
            hits (hmm: I'm not sure why this is not identical to 2D BKD?)
          • PackedQuadPrefixTree (RecursivePrefixTreeStrategy), with maxLevels=25:
            990 + 11 seconds to build, 7.7 GB, 3.82 seconds to run 225 queries, total
            221,123,206 hits.

          All these benchmarks are in luceneutil.

          Show
          Michael McCandless added a comment - Thanks Karl Wright , I'll commit shortly. I ran the same "225 bboxes around London on the 61M OpenStreetMap lat/lon points" perf tests with the current branch, plus the 3 other existing options: Geo3D + BKD (PointInGeo3DShapeQuery, this issue/branch): index is 936 MB, took 423 + 96 (close, waiting for merges) sec to build. 7.01 sec to run 225 queries, total hits 221,118,860. 2D BKD (BKDPointInBBoxQuery): index is 704 MB, took 163 + 97 sec to build. 2.32 sec to run 225 queries, total hits 221,118,844. It's OK (I think) that number of hits is a wee bit smaller than geo3d because the geo3d bbox when projected to 2D space is a bit bowed out (I think?) and can therefore include a few more points. Postings + DV (GeoPointInBBoxQuery): index is 3.4 GB, took 349 + 11 sec to build. 3.60 sec to run 225 queries, total 221,120,393 hits (hmm: I'm not sure why this is not identical to 2D BKD?) PackedQuadPrefixTree (RecursivePrefixTreeStrategy), with maxLevels=25: 990 + 11 seconds to build, 7.7 GB, 3.82 seconds to run 225 queries, total 221,123,206 hits. All these benchmarks are in luceneutil.
          Hide
          ASF subversion and git services added a comment -

          Commit 1696275 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696275 ]

          LUCENE-6699: first cut to support x,y,z bounds

          Show
          ASF subversion and git services added a comment - Commit 1696275 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696275 ] LUCENE-6699 : first cut to support x,y,z bounds
          Hide
          Michael McCandless added a comment -

          Karl Wright I committed the patch but svn got a bit furious:

          mike@haswell:/l/3dbkd$ svn patch patch.x
          U         lucene/spatial/src/java/org/apache/lucene/spatial/spatial4j/Geo3dShape.java
          C         lucene/spatial/src/test/org/apache/lucene/spatial/spatial4j/Geo3dShapeRectRelationTestCase.java
          >         rejected hunk @@ -58,7 +58,8 @@
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/Bounds.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoBaseShape.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoCircle.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoCompositeMembershipShape.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoConvexPolygon.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoDegenerateHorizontalLine.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoDegenerateLatitudeZone.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoDegenerateLongitudeSlice.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoDegeneratePoint.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoDegenerateVerticalLine.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoLatitudeZone.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoLongitudeSlice.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoNorthLatitudeZone.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoNorthRectangle.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoPath.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoRectangle.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoShape.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoSouthLatitudeZone.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoSouthRectangle.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoWideDegenerateHorizontalLine.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoWideLongitudeSlice.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoWideNorthRectangle.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoWideRectangle.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoWideSouthRectangle.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoWorld.java
          A         lucene/spatial3d/src/java/org/apache/lucene/geo3d/LatLonBounds.java
          U         lucene/spatial3d/src/java/org/apache/lucene/geo3d/Plane.java
          A         lucene/spatial3d/src/java/org/apache/lucene/geo3d/XYZBounds.java
          >         hunk @@ -138,7 +138,7 @@ already applied
          >         hunk @@ -145,6 +145,7 @@ already applied
          >         hunk @@ -154,6 +155,7 @@ already applied
          >         hunk @@ -160,12 +162,12 @@ already applied
          >         hunk @@ -81,7 +81,7 @@ already applied
          >         hunk @@ -88,6 +88,7 @@ already applied
          >         hunk @@ -96,18 +97,13 @@ already applied
          U         lucene/spatial3d/src/test/org/apache/lucene/geo3d/GeoBBoxTest.java
          U         lucene/spatial3d/src/test/org/apache/lucene/geo3d/GeoCircleTest.java
          U         lucene/spatial3d/src/test/org/apache/lucene/geo3d/GeoConvexPolygonTest.java
          U         lucene/spatial3d/src/test/org/apache/lucene/geo3d/GeoModelTest.java
          U         lucene/spatial3d/src/test/org/apache/lucene/geo3d/GeoPathTest.java
          U         lucene/spatial3d/src/test/org/apache/lucene/geo3d/GeoPolygonTest.java
          >         hunk @@ -27,16 +27,15 @@ already applied
          >         hunk @@ -67,5 +66,160 @@ already applied
          Summary of conflicts:
            Text conflicts: 1
          

          I resolved the one conflict by hand, but can you svn up and confirm things look OK? Tests did pass after applying the patch...

          Show
          Michael McCandless added a comment - Karl Wright I committed the patch but svn got a bit furious: mike@haswell:/l/3dbkd$ svn patch patch.x U lucene/spatial/src/java/org/apache/lucene/spatial/spatial4j/Geo3dShape.java C lucene/spatial/src/test/org/apache/lucene/spatial/spatial4j/Geo3dShapeRectRelationTestCase.java > rejected hunk @@ -58,7 +58,8 @@ U lucene/spatial3d/src/java/org/apache/lucene/geo3d/Bounds.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoBaseShape.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoCircle.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoCompositeMembershipShape.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoConvexPolygon.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoDegenerateHorizontalLine.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoDegenerateLatitudeZone.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoDegenerateLongitudeSlice.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoDegeneratePoint.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoDegenerateVerticalLine.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoLatitudeZone.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoLongitudeSlice.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoNorthLatitudeZone.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoNorthRectangle.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoPath.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoRectangle.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoShape.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoSouthLatitudeZone.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoSouthRectangle.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoWideDegenerateHorizontalLine.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoWideLongitudeSlice.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoWideNorthRectangle.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoWideRectangle.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoWideSouthRectangle.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/GeoWorld.java A lucene/spatial3d/src/java/org/apache/lucene/geo3d/LatLonBounds.java U lucene/spatial3d/src/java/org/apache/lucene/geo3d/Plane.java A lucene/spatial3d/src/java/org/apache/lucene/geo3d/XYZBounds.java > hunk @@ -138,7 +138,7 @@ already applied > hunk @@ -145,6 +145,7 @@ already applied > hunk @@ -154,6 +155,7 @@ already applied > hunk @@ -160,12 +162,12 @@ already applied > hunk @@ -81,7 +81,7 @@ already applied > hunk @@ -88,6 +88,7 @@ already applied > hunk @@ -96,18 +97,13 @@ already applied U lucene/spatial3d/src/test/org/apache/lucene/geo3d/GeoBBoxTest.java U lucene/spatial3d/src/test/org/apache/lucene/geo3d/GeoCircleTest.java U lucene/spatial3d/src/test/org/apache/lucene/geo3d/GeoConvexPolygonTest.java U lucene/spatial3d/src/test/org/apache/lucene/geo3d/GeoModelTest.java U lucene/spatial3d/src/test/org/apache/lucene/geo3d/GeoPathTest.java U lucene/spatial3d/src/test/org/apache/lucene/geo3d/GeoPolygonTest.java > hunk @@ -27,16 +27,15 @@ already applied > hunk @@ -67,5 +66,160 @@ already applied Summary of conflicts: Text conflicts: 1 I resolved the one conflict by hand, but can you svn up and confirm things look OK? Tests did pass after applying the patch...
          Hide
          ASF subversion and git services added a comment -

          Commit 1696276 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696276 ]

          LUCENE-6699: round up when dividing by 2 to find size of BKD tree

          Show
          ASF subversion and git services added a comment - Commit 1696276 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696276 ] LUCENE-6699 : round up when dividing by 2 to find size of BKD tree
          Hide
          Karl Wright added a comment -

          It all looks OK. I don't know why svn was unhappy. Maybe I should rebuild my workarea...

          Show
          Karl Wright added a comment - It all looks OK. I don't know why svn was unhappy. Maybe I should rebuild my workarea...
          Hide
          Karl Wright added a comment -

          FWIW, Michael McCandless, I convinced myself that the one case I was worried about should, in fact, not be an issue. So you should be able to start using XYZBounds to obtain bounds information for shapes.

          Specifically, you want to do this:

          XYZBounds bounds = new XYZBounds();
          myshape.getBounds(bounds);
          double minX = bounds.getMinimumX();
          ...
          

          I will work on the tests shortly, but that's basically all there should be to it. Please let me know if you find any weird behavior.

          Show
          Karl Wright added a comment - FWIW, Michael McCandless , I convinced myself that the one case I was worried about should, in fact, not be an issue. So you should be able to start using XYZBounds to obtain bounds information for shapes. Specifically, you want to do this: XYZBounds bounds = new XYZBounds(); myshape.getBounds(bounds); double minX = bounds.getMinimumX(); ... I will work on the tests shortly, but that's basically all there should be to it. Please let me know if you find any weird behavior.
          Hide
          Karl Wright added a comment -

          patch to remove unused code

          Show
          Karl Wright added a comment - patch to remove unused code
          Hide
          ASF subversion and git services added a comment -

          Commit 1696299 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696299 ]

          LUCENE-6699: remove unused code

          Show
          ASF subversion and git services added a comment - Commit 1696299 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696299 ] LUCENE-6699 : remove unused code
          Hide
          Michael McCandless added a comment -

          Thanks Karl Wright, I'll try using the bounds in the BKD tree ... and I committed your last patch!

          Show
          Michael McCandless added a comment - Thanks Karl Wright , I'll try using the bounds in the BKD tree ... and I committed your last patch!
          Hide
          ASF subversion and git services added a comment -

          Commit 1696305 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696305 ]

          LUCENE-6699: use bbox, but test fails

          Show
          ASF subversion and git services added a comment - Commit 1696305 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696305 ] LUCENE-6699 : use bbox, but test fails
          Hide
          Michael McCandless added a comment -

          OK I fixed the query to use the new XYZBounds API, but I hit NPE:

             [junit4] Suite: org.apache.lucene.bkdtree3d.TestGeo3DPointField
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPointField -Dtests.method=testBasic -Dtests.seed=1600287909381DEB -Dtests.locale=el_CY -Dtests.timezone=America/Mexico_City -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
             [junit4] ERROR   0.22s J0 | TestGeo3DPointField.testBasic <<<
             [junit4]    > Throwable #1: java.lang.NullPointerException
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([1600287909381DEB:BDFA356CD6E49BC5]:0)
             [junit4]    > 	at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1.scorer(PointInGeo3DShapeQuery.java:102)
             [junit4]    > 	at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135)
             [junit4]    > 	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
             [junit4]    > 	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
             [junit4]    > 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
             [junit4]    > 	at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:92)
             [junit4]    > 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:425)
             [junit4]    > 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:544)
             [junit4]    > 	at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:402)
             [junit4]    > 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:413)
             [junit4]    > 	at org.apache.lucene.bkdtree3d.TestGeo3DPointField.testBasic(TestGeo3DPointField.java:90)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
          

          Is it expected that sometimes the bounds would be null? If so, I could just swap in Integer.MIN/MAX_VALUE for that dimension... (seems like in this failure it's only the z dimension that's null).

          Show
          Michael McCandless added a comment - OK I fixed the query to use the new XYZBounds API, but I hit NPE: [junit4] Suite: org.apache.lucene.bkdtree3d.TestGeo3DPointField [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeo3DPointField -Dtests.method=testBasic -Dtests.seed=1600287909381DEB -Dtests.locale=el_CY -Dtests.timezone=America/Mexico_City -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 [junit4] ERROR 0.22s J0 | TestGeo3DPointField.testBasic <<< [junit4] > Throwable #1: java.lang.NullPointerException [junit4] > at __randomizedtesting.SeedInfo.seed([1600287909381DEB:BDFA356CD6E49BC5]:0) [junit4] > at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1.scorer(PointInGeo3DShapeQuery.java:102) [junit4] > at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135) [junit4] > at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69) [junit4] > at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69) [junit4] > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618) [junit4] > at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:92) [junit4] > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:425) [junit4] > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:544) [junit4] > at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:402) [junit4] > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:413) [junit4] > at org.apache.lucene.bkdtree3d.TestGeo3DPointField.testBasic(TestGeo3DPointField.java:90) [junit4] > at java.lang.Thread.run(Thread.java:745) Is it expected that sometimes the bounds would be null? If so, I could just swap in Integer.MIN/MAX_VALUE for that dimension... (seems like in this failure it's only the z dimension that's null).
          Hide
          Karl Wright added a comment -

          They are not expected to be null. I have a fix for the problem (happened to hit it myself just moments ago).

          Show
          Karl Wright added a comment - They are not expected to be null. I have a fix for the problem (happened to hit it myself just moments ago).
          Hide
          Karl Wright added a comment -

          Some tests and some fixes

          Show
          Karl Wright added a comment - Some tests and some fixes
          Hide
          Karl Wright added a comment -

          Michael McCandless: Here's an updated patch that has a fix for the problem I discovered.

          Show
          Karl Wright added a comment - Michael McCandless : Here's an updated patch that has a fix for the problem I discovered.
          Hide
          Karl Wright added a comment -

          And, yet another patch which cleans up stuff

          Show
          Karl Wright added a comment - And, yet another patch which cleans up stuff
          Hide
          ASF subversion and git services added a comment -

          Commit 1696334 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696334 ]

          LUCENE-6699: cleanups

          Show
          ASF subversion and git services added a comment - Commit 1696334 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696334 ] LUCENE-6699 : cleanups
          Hide
          Michael McCandless added a comment -

          Thanks Karl Wright, I committed your last patch, but I'm still seeing (different) test failures ...

          Show
          Michael McCandless added a comment - Thanks Karl Wright , I committed your last patch, but I'm still seeing (different) test failures ...
          Hide
          Michael McCandless added a comment -

          Hmm, with the attached patch, which fixes the query to not use the new XYZBounds, the test passes ... so either something is wrong in the bbox computation, or something is wrong in how the BKD tree / the query optimization uses the bbox ...

          Show
          Michael McCandless added a comment - Hmm, with the attached patch, which fixes the query to not use the new XYZBounds, the test passes ... so either something is wrong in the bbox computation, or something is wrong in how the BKD tree / the query optimization uses the bbox ...
          Hide
          Michael McCandless added a comment -

          Hmm, I added an assertion that the XYZBounds contains the shape, but the assertion fails...

          Or could this just be a boundary issue?

          Show
          Michael McCandless added a comment - Hmm, I added an assertion that the XYZBounds contains the shape, but the assertion fails... Or could this just be a boundary issue?
          Hide
          Karl Wright added a comment - - edited

          So what is the code? Are you constructing an XYZSolid from the XYZBounds for the shape?

          The shape should definitely be contained by an XYZ solid constructed from the XYZBounds for the shape. Round off should not be a concern. It's possible, though, that you might be misinterpreting the result from getRelationship(). Either that, or some specific shape has code problems I am unaware of and need to debug. So code would help.

          Show
          Karl Wright added a comment - - edited So what is the code? Are you constructing an XYZSolid from the XYZBounds for the shape? The shape should definitely be contained by an XYZ solid constructed from the XYZBounds for the shape. Round off should not be a concern. It's possible, though, that you might be misinterpreting the result from getRelationship(). Either that, or some specific shape has code problems I am unaware of and need to debug. So code would help.
          Hide
          Michael McCandless added a comment -

          Karl Wright Here's the patch that should trip the assert for you? And it's entirely possible I messed up the order of the relation!

          Show
          Michael McCandless added a comment - Karl Wright Here's the patch that should trip the assert for you? And it's entirely possible I messed up the order of the relation!
          Hide
          Karl Wright added a comment -

          Thanks – I was able to reproduce at least a variant of this in one of my test cases.

          What is happening for me is that I'm getting back OVERLAPS instead of WITHIN. This is tricky because it's happening due to edge intersection between the XYZSolid I created and the GeoRectangle I'm looking at. But the edges really do intersect in this case – indeed, they are on top of each other – so I will need to think hard about how to structure the logic so that it is correct given that.

          In the meantime, you can probably just treat OVERLAPS and WITHIN as being comparable.

          Show
          Karl Wright added a comment - Thanks – I was able to reproduce at least a variant of this in one of my test cases. What is happening for me is that I'm getting back OVERLAPS instead of WITHIN. This is tricky because it's happening due to edge intersection between the XYZSolid I created and the GeoRectangle I'm looking at. But the edges really do intersect in this case – indeed, they are on top of each other – so I will need to think hard about how to structure the logic so that it is correct given that. In the meantime, you can probably just treat OVERLAPS and WITHIN as being comparable.
          Hide
          Karl Wright added a comment -

          I confirmed this basic picture with the following code:

              xyzb = new XYZBounds();
              c.getBounds(xyzb);
          
              GeoArea area = GeoAreaFactory.makeGeoArea(PlanetModel.SPHERE,
                xyzb.getMinimumX() - 2.0 * Vector.MINIMUM_RESOLUTION,
                xyzb.getMaximumX() + 2.0 * Vector.MINIMUM_RESOLUTION,
                xyzb.getMinimumY() - 2.0 * Vector.MINIMUM_RESOLUTION,
                xyzb.getMaximumY() + 2.0 * Vector.MINIMUM_RESOLUTION,
                xyzb.getMinimumZ() - 2.0 * Vector.MINIMUM_RESOLUTION,
                xyzb.getMaximumZ() + 2.0 * Vector.MINIMUM_RESOLUTION);
              assertEquals(GeoArea.WITHIN, area.getRelationship(c));
          

          This always seems to pass for me.
          I believe that this problem is pervasive throughout geo3d, not just for xyz solids. As such, it will be difficult to easily fix. Let me think on it overnight to see if I can come up with something that makes everyone happy.

          Show
          Karl Wright added a comment - I confirmed this basic picture with the following code: xyzb = new XYZBounds(); c.getBounds(xyzb); GeoArea area = GeoAreaFactory.makeGeoArea(PlanetModel.SPHERE, xyzb.getMinimumX() - 2.0 * Vector.MINIMUM_RESOLUTION, xyzb.getMaximumX() + 2.0 * Vector.MINIMUM_RESOLUTION, xyzb.getMinimumY() - 2.0 * Vector.MINIMUM_RESOLUTION, xyzb.getMaximumY() + 2.0 * Vector.MINIMUM_RESOLUTION, xyzb.getMinimumZ() - 2.0 * Vector.MINIMUM_RESOLUTION, xyzb.getMaximumZ() + 2.0 * Vector.MINIMUM_RESOLUTION); assertEquals(GeoArea.WITHIN, area.getRelationship(c)); This always seems to pass for me. I believe that this problem is pervasive throughout geo3d, not just for xyz solids. As such, it will be difficult to easily fix. Let me think on it overnight to see if I can come up with something that makes everyone happy.
          Hide
          Karl Wright added a comment -

          Michael McCandless: In order to reliably distinguish between the case of an edge just touching a shape and actually crossing into it, which is what I need here, I need to count the number of crossings per edge. This is fundamentally easy to do, for a single edge. However, there are quite a number of downstream complications, because shapes include a number of edges, not just one, and we have to put it all together. For example, if one shape happens to go through an endpoint where two edges intersect, it should really count as one intersection, not two. Making matters more complicated, composite shapes may consist of simple shapes that overlap and thus the algorithm will be even more fraught and need correspondingly more thought.

          The upshot of this is that the infrastructure I currently have in place needs a substantial upgrade in order to be able to reliably distinguish WITHIN from OVERLAP in the corner case where the shape just abuts the solid. This will take me a couple of days, most likely. Unfortunately, there's also a corner case with composite shapes where OVERLAP may be returned instead of CONTAINS, so this clearly has to be fixed for BKD to work correctly for all shapes. So I apologize for the delay.

          Stay tuned.

          Show
          Karl Wright added a comment - Michael McCandless : In order to reliably distinguish between the case of an edge just touching a shape and actually crossing into it, which is what I need here, I need to count the number of crossings per edge. This is fundamentally easy to do, for a single edge. However, there are quite a number of downstream complications, because shapes include a number of edges, not just one, and we have to put it all together. For example, if one shape happens to go through an endpoint where two edges intersect, it should really count as one intersection, not two. Making matters more complicated, composite shapes may consist of simple shapes that overlap and thus the algorithm will be even more fraught and need correspondingly more thought. The upshot of this is that the infrastructure I currently have in place needs a substantial upgrade in order to be able to reliably distinguish WITHIN from OVERLAP in the corner case where the shape just abuts the solid. This will take me a couple of days, most likely. Unfortunately, there's also a corner case with composite shapes where OVERLAP may be returned instead of CONTAINS, so this clearly has to be fixed for BKD to work correctly for all shapes. So I apologize for the delay. Stay tuned.
          Hide
          Karl Wright added a comment -

          Michael McCandless: I take part of this back.

          The Bkd query code looks in part like this:

                                                       case GeoArea.OVERLAPS:
                                                         // They do overlap but neither contains the other:
                                                         //System.out.println("    crosses1");
                                                         return BKD3DTreeReader.Relation.CROSSES;
                                                       case GeoArea.WITHIN:
                                                         // Cell fully contains the shape:
                                                         //System.out.println("    crosses2");
                                                         return BKD3DTreeReader.Relation.CROSSES;
          
          

          So, you are already treating WITHIN and OVERLAPS the same.
          As for the case where, instead of CONTAINS, an OVERLAPS result may occur, I believe that will simply mean that the algorithm needs to work harder than it strictly should, in that case. Please tell me if I am wrong! (FWIW, there aren't any current examples of this behavior that I know of, other than to use GeoCompositeShape to glue together other shapes.)

          So I guess what I'm saying is that we should be able to get bkd working without changing geo3d's code further, unless there are other bugs.

          FWIW, this is why the problem hasn't been noted until now; the spatial4j definitions are loose enough to allow the current degree of behavior. Given the difficulty of changing that behavior, I may simply change the comments, if we can get bkd working.

          Show
          Karl Wright added a comment - Michael McCandless : I take part of this back. The Bkd query code looks in part like this: case GeoArea.OVERLAPS: // They do overlap but neither contains the other: // System .out.println( " crosses1" ); return BKD3DTreeReader.Relation.CROSSES; case GeoArea.WITHIN: // Cell fully contains the shape: // System .out.println( " crosses2" ); return BKD3DTreeReader.Relation.CROSSES; So, you are already treating WITHIN and OVERLAPS the same. As for the case where, instead of CONTAINS, an OVERLAPS result may occur, I believe that will simply mean that the algorithm needs to work harder than it strictly should, in that case. Please tell me if I am wrong! (FWIW, there aren't any current examples of this behavior that I know of, other than to use GeoCompositeShape to glue together other shapes.) So I guess what I'm saying is that we should be able to get bkd working without changing geo3d's code further, unless there are other bugs. FWIW, this is why the problem hasn't been noted until now; the spatial4j definitions are loose enough to allow the current degree of behavior. Given the difficulty of changing that behavior, I may simply change the comments, if we can get bkd working.
          Hide
          Karl Wright added a comment -

          Here's the proposed comment change to GeoArea:

            /**
             * Find the spatial relationship between a shape and the current geo area.
             * Note: return value is how the GeoShape relates to the GeoArea, not the
             * other way around. For example, if this GeoArea is entirely within the
             * shape, then CONTAINS should be returned.  If the shape is entirely enclosed
             * by this GeoArea, then WITHIN should be returned.
             *
             * It is permissible to return OVERLAPS instead of WITHIN if the shape
             * intersects with the area at even a single point.  So, a circle inscribed in
             * a rectangle could return either OVERLAPS or WITHIN, depending on
             * implementation.  It is not permissible to return CONTAINS or DISJOINT
             * in this circumstance, however.
             *
             * Similarly, it is permissible to return OVERLAPS instead of CONTAINS
             * under conditions where the shape consists of multiple independent overlapping
             * subshapes, and the area overlaps one of the subshapes.  It is not permissible
             * to return WITHIN or DISJOINT in this circumstance, however.
             *
             * @param shape is the shape to consider.
             * @return the relationship, from the perspective of the shape.
             */
            public int getRelationship(GeoShape shape);
          
          Show
          Karl Wright added a comment - Here's the proposed comment change to GeoArea: /** * Find the spatial relationship between a shape and the current geo area. * Note: return value is how the GeoShape relates to the GeoArea, not the * other way around. For example, if this GeoArea is entirely within the * shape, then CONTAINS should be returned. If the shape is entirely enclosed * by this GeoArea, then WITHIN should be returned. * * It is permissible to return OVERLAPS instead of WITHIN if the shape * intersects with the area at even a single point. So, a circle inscribed in * a rectangle could return either OVERLAPS or WITHIN, depending on * implementation. It is not permissible to return CONTAINS or DISJOINT * in this circumstance, however. * * Similarly, it is permissible to return OVERLAPS instead of CONTAINS * under conditions where the shape consists of multiple independent overlapping * subshapes, and the area overlaps one of the subshapes. It is not permissible * to return WITHIN or DISJOINT in this circumstance, however. * * @param shape is the shape to consider. * @ return the relationship, from the perspective of the shape. */ public int getRelationship(GeoShape shape);
          Hide
          Karl Wright added a comment -

          Same as last patch, except the description for GeoArea has been updated.
          Michael McCandless

          Show
          Karl Wright added a comment - Same as last patch, except the description for GeoArea has been updated. Michael McCandless
          Hide
          Michael McCandless added a comment -

          OK +1 to just improve the docs, I'll commit that.

          Which means my added assert is invalid.

          Which means the bug lies elsewhere: somehow, when using the XYZ bounding box, the test fails.

          Show
          Michael McCandless added a comment - OK +1 to just improve the docs, I'll commit that. Which means my added assert is invalid. Which means the bug lies elsewhere: somehow, when using the XYZ bounding box, the test fails.
          Hide
          ASF subversion and git services added a comment -

          Commit 1696419 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696419 ]

          LUCENE-6699: improve GeoArea.getRelationship javadocs

          Show
          ASF subversion and git services added a comment - Commit 1696419 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696419 ] LUCENE-6699 : improve GeoArea.getRelationship javadocs
          Hide
          Karl Wright added a comment -

          Patch which fixes Mike's tests; it seems like the query implementation is unhappy when the bounds does not start outside the area that needs to be searched.

          Show
          Karl Wright added a comment - Patch which fixes Mike's tests; it seems like the query implementation is unhappy when the bounds does not start outside the area that needs to be searched.
          Hide
          Michael McCandless added a comment -

          Hmm why do we need this fudge factor Does +1 not work?

          Show
          Michael McCandless added a comment - Hmm why do we need this fudge factor Does +1 not work?
          Hide
          Karl Wright added a comment -

          +/- 1 would work, but would remove all benefit, and then some.

          Bear in mind that I still don't know why your code needs this, but within Geo3d, everything +/-Vector.MINIMUM_RESOLUTION (which is 1e-12 at the moment) is considered the same.

          Show
          Karl Wright added a comment - +/- 1 would work, but would remove all benefit, and then some. Bear in mind that I still don't know why your code needs this, but within Geo3d, everything +/-Vector.MINIMUM_RESOLUTION (which is 1e-12 at the moment) is considered the same.
          Hide
          Michael McCandless added a comment -

          Hmm I'm still seeing test failures w/ the patch, e.g.:

          ago 18, 2015 11:28:37 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
          ADVERTENCIA: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeo3DPointField]
          java.lang.AssertionError: T0: iter=71 id=5758 docID=1056 lat=-0.7503052868590764 lon=0.03113194606309409 expected true but got: false deleted?=false
            point1=[X=0.731126293194162, Y=0.022768740606807995, Z=-0.6818621032520755], iswithin=true
            point2=[X=0.731126292820613, Y=0.0227687404615659, Z=-0.6818621031259476], iswithin=true
            query=PointInGeo3DShapeQuery: field=point:PlanetModel: PlanetModel.SPHERE Shape: GeoCircle: {planetmodel=PlanetModel.SPHERE, center=[X=0.22316407597143118, Y=0.8640521337091807, Z=-0.45123353757054624], radius=1.0586809798482488(60.65795199607921)}
          	at __randomizedtesting.SeedInfo.seed([BE582B0DCB72C9AE]:0)
          	at org.junit.Assert.fail(Assert.java:93)
          	at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._run(TestGeo3DPointField.java:617)
          	at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run(TestGeo3DPointField.java:514)
          
          EEEENOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPointField -Dtests.method=testRandomMedium -Dtests.seed=BE582B0DCB72C9AE -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es_VE -Dtests.timezone=Africa/Algiers -Dtests.asserts=true -Dtests.file.encoding=UTF-8
          .I....ENOTE: test params are: codec=FastDecompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST_DECOMPRESSION, chunkSize=13655, maxDocsPerChunk=627, blockSize=8), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST_DECOMPRESSION, chunkSize=13655, blockSize=8)), sim=DefaultSimilarity, locale=es_VE, timezone=Africa/Algiers
          NOTE: Linux 3.13.0-46-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=428823904,total=514850816
          NOTE: All tests run in this JVM: [TestGeo3DPointField]
          NOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPointField -Dtests.seed=BE582B0DCB72C9AE -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es_VE -Dtests.timezone=Africa/Algiers -Dtests.asserts=true -Dtests.file.encoding=UTF-8
          
          Show
          Michael McCandless added a comment - Hmm I'm still seeing test failures w/ the patch, e.g.: ago 18, 2015 11:28:37 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException ADVERTENCIA: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeo3DPointField] java.lang.AssertionError: T0: iter=71 id=5758 docID=1056 lat=-0.7503052868590764 lon=0.03113194606309409 expected true but got: false deleted?=false point1=[X=0.731126293194162, Y=0.022768740606807995, Z=-0.6818621032520755], iswithin=true point2=[X=0.731126292820613, Y=0.0227687404615659, Z=-0.6818621031259476], iswithin=true query=PointInGeo3DShapeQuery: field=point:PlanetModel: PlanetModel.SPHERE Shape: GeoCircle: {planetmodel=PlanetModel.SPHERE, center=[X=0.22316407597143118, Y=0.8640521337091807, Z=-0.45123353757054624], radius=1.0586809798482488(60.65795199607921)} at __randomizedtesting.SeedInfo.seed([BE582B0DCB72C9AE]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._run(TestGeo3DPointField.java:617) at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run(TestGeo3DPointField.java:514) EEEENOTE: reproduce with: ant test -Dtestcase=TestGeo3DPointField -Dtests.method=testRandomMedium -Dtests.seed=BE582B0DCB72C9AE -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es_VE -Dtests.timezone=Africa/Algiers -Dtests.asserts=true -Dtests.file.encoding=UTF-8 .I....ENOTE: test params are: codec=FastDecompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST_DECOMPRESSION, chunkSize=13655, maxDocsPerChunk=627, blockSize=8), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST_DECOMPRESSION, chunkSize=13655, blockSize=8)), sim=DefaultSimilarity, locale=es_VE, timezone=Africa/Algiers NOTE: Linux 3.13.0-46-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=428823904,total=514850816 NOTE: All tests run in this JVM: [TestGeo3DPointField] NOTE: reproduce with: ant test -Dtestcase=TestGeo3DPointField -Dtests.seed=BE582B0DCB72C9AE -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es_VE -Dtests.timezone=Africa/Algiers -Dtests.asserts=true -Dtests.file.encoding=UTF-8
          Hide
          Karl Wright added a comment -

          I was able to repro this with that seed as well. Looking deeper.

          Show
          Karl Wright added a comment - I was able to repro this with that seed as well. Looking deeper.
          Hide
          ASF subversion and git services added a comment -

          Commit 1696511 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696511 ]

          LUCENE-6699: rename relations

          Show
          ASF subversion and git services added a comment - Commit 1696511 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696511 ] LUCENE-6699 : rename relations
          Hide
          Michael McCandless added a comment -

          Karl Wright I committed a renaming of the relations: I was driving myself insane trying to reason about them.

          I also added a new relation, for when the cell fully contains the shape, and a new assertion which now trips.

          I didn't commit the fudge factors yet ... if you find they are needed I can ... but I'm still not sure where the bug is

          Show
          Michael McCandless added a comment - Karl Wright I committed a renaming of the relations: I was driving myself insane trying to reason about them. I also added a new relation, for when the cell fully contains the shape, and a new assertion which now trips. I didn't commit the fudge factors yet ... if you find they are needed I can ... but I'm still not sure where the bug is
          Hide
          Karl Wright added a comment -

          Michael McCandless: This has GOT to be wrong:

              if (cellXMin >= state.xMin ||
                  cellXMax <= state.xMax ||
                  cellYMin >= state.yMin ||
                  cellYMin >= state.yMin ||
                  cellZMin <= state.zMin ||
                  cellZMax <= state.zMax) {
          

          ... if for no other reason than you have two identical clauses...

          Show
          Karl Wright added a comment - Michael McCandless : This has GOT to be wrong: if (cellXMin >= state.xMin || cellXMax <= state.xMax || cellYMin >= state.yMin || cellYMin >= state.yMin || cellZMin <= state.zMin || cellZMax <= state.zMax) { ... if for no other reason than you have two identical clauses...
          Hide
          ASF subversion and git services added a comment -

          Commit 1696512 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696512 ]

          LUCENE-6699: fix wrong cell/bbox check

          Show
          ASF subversion and git services added a comment - Commit 1696512 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696512 ] LUCENE-6699 : fix wrong cell/bbox check
          Hide
          Michael McCandless added a comment -

          You're right! Two silly bugs in that multi-line if statement I committed a fix ... but test is still angry.

          Show
          Michael McCandless added a comment - You're right! Two silly bugs in that multi-line if statement I committed a fix ... but test is still angry.
          Hide
          Karl Wright added a comment -

          The failing test doesn't even seem to be exercising geo3d? At least, it never creates a query. It seems to just be looking for proper bkd decomposition?

          Show
          Karl Wright added a comment - The failing test doesn't even seem to be exercising geo3d? At least, it never creates a query. It seems to just be looking for proper bkd decomposition?
          Hide
          Michael McCandless added a comment -

          I'm seeing failures in both testBKDRandom (proper BKD decomposition), from the new assert I added, and in testRandomMedium, where hits were expected but not found ... I'll dig on the first one, seems maybe easier

          Show
          Michael McCandless added a comment - I'm seeing failures in both testBKDRandom (proper BKD decomposition), from the new assert I added, and in testRandomMedium, where hits were expected but not found ... I'll dig on the first one, seems maybe easier
          Hide
          ASF subversion and git services added a comment -

          Commit 1696591 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696591 ]

          LUCENE-6699: also detect when SHAPE_INSIDE_CELL in test case

          Show
          ASF subversion and git services added a comment - Commit 1696591 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696591 ] LUCENE-6699 : also detect when SHAPE_INSIDE_CELL in test case
          Hide
          Michael McCandless added a comment -

          OK I committed a fix for the testBKDRandom ... just a test bug: it failed to detect the SHAPE_INSIDE_CELL case.

          Just need to understand the assert trips for testRandomXXX ...

          Show
          Michael McCandless added a comment - OK I committed a fix for the testBKDRandom ... just a test bug: it failed to detect the SHAPE_INSIDE_CELL case. Just need to understand the assert trips for testRandomXXX ...
          Hide
          Karl Wright added a comment - - edited

          Ok, I'm swamped at the moment, so anything you can do to describe the sequence of interactions with Geo3D that demonstrate a problem or inconsistency would be very useful. I will have time Thursday evening and Friday morning to look at those in detail I think.

          Show
          Karl Wright added a comment - - edited Ok, I'm swamped at the moment, so anything you can do to describe the sequence of interactions with Geo3D that demonstrate a problem or inconsistency would be very useful. I will have time Thursday evening and Friday morning to look at those in detail I think.
          Hide
          Michael McCandless added a comment -

          OK I'll see if I can contain the failure somehow ...

          Or: we could maybe disable this buggy opto for now, and wrap up / land this branch? I.e. open a follow-on issue for the opto.

          The tests seem to pass w/o the opto.

          The opto should be a sizable win for smallish query shapes, because it means BKD tree can recurse with very fast bbox overlap checking (a few if statements, no new objects) instead of building a GeoArea for each BKD cell (3d rect) and then relating that to the shape, as it recurses.

          Show
          Michael McCandless added a comment - OK I'll see if I can contain the failure somehow ... Or: we could maybe disable this buggy opto for now, and wrap up / land this branch? I.e. open a follow-on issue for the opto. The tests seem to pass w/o the opto. The opto should be a sizable win for smallish query shapes, because it means BKD tree can recurse with very fast bbox overlap checking (a few if statements, no new objects) instead of building a GeoArea for each BKD cell (3d rect) and then relating that to the shape, as it recurses.
          Hide
          Karl Wright added a comment -

          Hi Michael McCandless,

          I'm not comfortable with landing the branch until we at least understand the problem. If the tests always pass without the optimization, then the problem must be that the bounds are simply incorrect for some particular shape. It should be trivial to determine which shape leads to a bad bounds, and I can chase it from there. Can we confirm that picture?

          Show
          Karl Wright added a comment - Hi Michael McCandless , I'm not comfortable with landing the branch until we at least understand the problem. If the tests always pass without the optimization, then the problem must be that the bounds are simply incorrect for some particular shape. It should be trivial to determine which shape leads to a bad bounds, and I can chase it from there. Can we confirm that picture?
          Hide
          Karl Wright added a comment -

          Ok, I was able to isolate one case of failure and incorporate it into a simple test:

              // Test case from BKD
              c = new GeoCircle(PlanetModel.SPHERE, -0.765816119338, 0.991848766844, 0.8153163226330487);
              GeoPoint p1 = new GeoPoint(0.7692262265236023, -0.055089298115534646, -0.6365973465711254);
              assertTrue(c.isWithin(p1));
              xyzb = new XYZBounds();
              c.getBounds(xyzb);
              assertTrue(p1.x >= xyzb.getMinimumX() && p1.x <= xyzb.getMaximumX());
              assertTrue(p1.y >= xyzb.getMinimumY() && p1.y <= xyzb.getMaximumY());
              assertTrue(p1.z >= xyzb.getMinimumZ() && p1.z <= xyzb.getMaximumZ());
          
          

          Now I can look at it.

          Show
          Karl Wright added a comment - Ok, I was able to isolate one case of failure and incorporate it into a simple test: // Test case from BKD c = new GeoCircle(PlanetModel.SPHERE, -0.765816119338, 0.991848766844, 0.8153163226330487); GeoPoint p1 = new GeoPoint(0.7692262265236023, -0.055089298115534646, -0.6365973465711254); assertTrue(c.isWithin(p1)); xyzb = new XYZBounds(); c.getBounds(xyzb); assertTrue(p1.x >= xyzb.getMinimumX() && p1.x <= xyzb.getMaximumX()); assertTrue(p1.y >= xyzb.getMinimumY() && p1.y <= xyzb.getMaximumY()); assertTrue(p1.z >= xyzb.getMinimumZ() && p1.z <= xyzb.getMaximumZ()); Now I can look at it.
          Hide
          Karl Wright added a comment -

          Michael McCandless: Found it. Patch attached.

          Show
          Karl Wright added a comment - Michael McCandless : Found it. Patch attached.
          Hide
          Karl Wright added a comment -

          Investigating another failure:

             [junit4] Suite: org.apache.lucene.bkdtree3d.TestGeo3DPointField
             [junit4]   2> VIII 19, 2015 11:13:35 PM com.carrotsearch.randomizedtesting.Ra
          ndomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
             [junit4]   2> WARNING: Uncaught exception in thread: Thread[T3,5,TGRP-TestGeo
          3DPointField]
             [junit4]   2> java.lang.AssertionError
             [junit4]   2>        at __randomizedtesting.SeedInfo.seed([D03EF31A709F9117]:
          0)
             [junit4]   2>        at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1.
          scorer(PointInGeo3DShapeQuery.java:105)
             [junit4]   2>        at org.apache.lucene.search.LRUQueryCache$CachingWrapper
          Weight.scorer(LRUQueryCache.java:581)
             [junit4]   2>        at org.apache.lucene.search.Weight.bulkScorer(Weight.jav
          a:135)
             [junit4]   2>        at org.apache.lucene.search.AssertingWeight.bulkScorer(A
          ssertingWeight.java:69)
             [junit4]   2>        at org.apache.lucene.search.AssertingWeight.bulkScorer(A
          ssertingWeight.java:69)
             [junit4]   2>        at org.apache.lucene.search.IndexSearcher.search(IndexSe
          archer.java:618)
             [junit4]   2>        at org.apache.lucene.search.AssertingIndexSearcher.searc
          h(AssertingIndexSearcher.java:92)
             [junit4]   2>        at org.apache.lucene.search.IndexSearcher.search(IndexSe
          archer.java:425)
             [junit4]   2>        at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._ru
          n(TestGeo3DPointField.java:586)
             [junit4]   2>        at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run
          (TestGeo3DPointField.java:520)
             [junit4]   2>
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPointField
          -Dtests.method=testRandomTiny -Dtests.seed=D03EF31A709F9117 -Dtests.slow=true -D
          tests.locale=bg -Dtests.timezone=Indian/Kerguelen -Dtests.asserts=true -Dtests.f
          ile.encoding=Cp1252
             [junit4] ERROR   0.62s J0 | TestGeo3DPointField.testRandomTiny <<<
             [junit4]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExcept
          ionError: Captured an uncaught exception in thread: Thread[id=17, name=T3, state
          =RUNNABLE, group=TGRP-TestGeo3DPointField]
             [junit4]    >        at __randomizedtesting.SeedInfo.seed([D03EF31A709F9117:9
          9792D5C2EBEA9BB]:0)
             [junit4]    > Caused by: java.lang.AssertionError
             [junit4]    >        at __randomizedtesting.SeedInfo.seed([D03EF31A709F9117]:
          0)
             [junit4]    >        at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1.
          scorer(PointInGeo3DShapeQuery.java:105)
             [junit4]    >        at org.apache.lucene.search.LRUQueryCache$CachingWrapper
          Weight.scorer(LRUQueryCache.java:581)
             [junit4]    >        at org.apache.lucene.search.Weight.bulkScorer(Weight.jav
          a:135)
             [junit4]    >        at org.apache.lucene.search.AssertingWeight.bulkScorer(A
          ssertingWeight.java:69)
             [junit4]    >        at org.apache.lucene.search.AssertingWeight.bulkScorer(A
          ssertingWeight.java:69)
             [junit4]    >        at org.apache.lucene.search.IndexSearcher.search(IndexSe
          archer.java:618)
             [junit4]    >        at org.apache.lucene.search.AssertingIndexSearcher.searc
          h(AssertingIndexSearcher.java:92)
             [junit4]    >        at org.apache.lucene.search.IndexSearcher.search(IndexSe
          archer.java:425)
             [junit4]    >        at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._ru
          n(TestGeo3DPointField.java:586)
             [junit4]    >        at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run
          (TestGeo3DPointField.java:520)
             [junit4] IGNOR/A 0.02s J0 | TestGeo3DPointField.testRandomBig
             [junit4]    > Assumption #1: 'nightly' test group is disabled (@Nightly())
             [junit4]   2> NOTE: test params are: codec=Asserting(Lucene53): {}, docValues
          :{}, sim=DefaultSimilarity, locale=bg, timezone=Indian/Kerguelen
             [junit4]   2> NOTE: Windows 7 6.1 amd64/Oracle Corporation 1.8.0_05 (64-bit)/
          cpus=4,threads=1,free=171382464,total=245366784
             [junit4]   2> NOTE: All tests run in this JVM: [TestGeo3DPointField]
          

          Stay tuned...

          Show
          Karl Wright added a comment - Investigating another failure: [junit4] Suite: org.apache.lucene.bkdtree3d.TestGeo3DPointField [junit4] 2> VIII 19, 2015 11:13:35 PM com.carrotsearch.randomizedtesting.Ra ndomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread [T3,5,TGRP-TestGeo 3DPointField] [junit4] 2> java.lang.AssertionError [junit4] 2> at __randomizedtesting.SeedInfo.seed([D03EF31A709F9117]: 0) [junit4] 2> at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1. scorer(PointInGeo3DShapeQuery.java:105) [junit4] 2> at org.apache.lucene.search.LRUQueryCache$CachingWrapper Weight.scorer(LRUQueryCache.java:581) [junit4] 2> at org.apache.lucene.search.Weight.bulkScorer(Weight.jav a:135) [junit4] 2> at org.apache.lucene.search.AssertingWeight.bulkScorer(A ssertingWeight.java:69) [junit4] 2> at org.apache.lucene.search.AssertingWeight.bulkScorer(A ssertingWeight.java:69) [junit4] 2> at org.apache.lucene.search.IndexSearcher.search(IndexSe archer.java:618) [junit4] 2> at org.apache.lucene.search.AssertingIndexSearcher.searc h(AssertingIndexSearcher.java:92) [junit4] 2> at org.apache.lucene.search.IndexSearcher.search(IndexSe archer.java:425) [junit4] 2> at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._ru n(TestGeo3DPointField.java:586) [junit4] 2> at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run (TestGeo3DPointField.java:520) [junit4] 2> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeo3DPointField -Dtests.method=testRandomTiny -Dtests.seed=D03EF31A709F9117 -Dtests.slow= true -D tests.locale=bg -Dtests.timezone=Indian/Kerguelen -Dtests.asserts= true -Dtests.f ile.encoding=Cp1252 [junit4] ERROR 0.62s J0 | TestGeo3DPointField.testRandomTiny <<< [junit4] > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExcept ionError: Captured an uncaught exception in thread: Thread [id=17, name=T3, state =RUNNABLE, group=TGRP-TestGeo3DPointField] [junit4] > at __randomizedtesting.SeedInfo.seed([D03EF31A709F9117:9 9792D5C2EBEA9BB]:0) [junit4] > Caused by: java.lang.AssertionError [junit4] > at __randomizedtesting.SeedInfo.seed([D03EF31A709F9117]: 0) [junit4] > at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1. scorer(PointInGeo3DShapeQuery.java:105) [junit4] > at org.apache.lucene.search.LRUQueryCache$CachingWrapper Weight.scorer(LRUQueryCache.java:581) [junit4] > at org.apache.lucene.search.Weight.bulkScorer(Weight.jav a:135) [junit4] > at org.apache.lucene.search.AssertingWeight.bulkScorer(A ssertingWeight.java:69) [junit4] > at org.apache.lucene.search.AssertingWeight.bulkScorer(A ssertingWeight.java:69) [junit4] > at org.apache.lucene.search.IndexSearcher.search(IndexSe archer.java:618) [junit4] > at org.apache.lucene.search.AssertingIndexSearcher.searc h(AssertingIndexSearcher.java:92) [junit4] > at org.apache.lucene.search.IndexSearcher.search(IndexSe archer.java:425) [junit4] > at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._ru n(TestGeo3DPointField.java:586) [junit4] > at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run (TestGeo3DPointField.java:520) [junit4] IGNOR/A 0.02s J0 | TestGeo3DPointField.testRandomBig [junit4] > Assumption #1: 'nightly' test group is disabled (@Nightly()) [junit4] 2> NOTE: test params are: codec=Asserting(Lucene53): {}, docValues :{}, sim=DefaultSimilarity, locale=bg, timezone=Indian/Kerguelen [junit4] 2> NOTE: Windows 7 6.1 amd64/Oracle Corporation 1.8.0_05 (64-bit)/ cpus=4,threads=1,free=171382464,total=245366784 [junit4] 2> NOTE: All tests run in this JVM: [TestGeo3DPointField] Stay tuned...
          Hide
          Karl Wright added a comment -

          Michael McCandless: This second failure looks like it may be due to rounding error. The shape is a really tiny geocircle (radius 1.6e-5 radians). I'll confirm this picture, but I have to ask: what's the minimum resolution that BKD expects to descend to? because this is pretty small...

          Show
          Karl Wright added a comment - Michael McCandless : This second failure looks like it may be due to rounding error. The shape is a really tiny geocircle (radius 1.6e-5 radians). I'll confirm this picture, but I have to ask: what's the minimum resolution that BKD expects to descend to? because this is pretty small...
          Hide
          Karl Wright added a comment - - edited

          Hmm, I couldn't reproduce this with a simple test.
          Here's the failure detail:

             [junit4]   2> java.lang.AssertionError:
             Solid=XYZSolid: {planetmodel=PlanetModel.SPHERE, isWholeWorld=false, minXplane=[A=1.0, B=0.0, C=0.0, D=-0.9999778774751769, side=1.0], maxXplane=[A=1.0, B=0.0, C=0.0, D=-0.9999780900134368, side=-1.0], minYplane=[A=0.0, B=1.0, C=0.0, D=0.002943435994670142, side=1.0], maxYplane=[A=0.0, B=1.0, C=0.0, D=0.0029114063562165494, side=-1.0], minZplane=[A=0.0, B=0.0, C=1.0, D=0.005971010432932473, side=1.0], maxZplane=[A=0.0, B=0.0, C=1.0, D=0.005938981247250581, side=-1.0]};
             Shape=GeoCircle: {planetmodel=PlanetModel.SPHERE, center=[X=0.9999779838725235, Y=-0.0029274211758186968, Z=-0.0059549958440800015], radius=1.601488279374338E-5(9.175851934781766E-4)}
          

          Here's the test code I created that passes:

              c = new GeoCircle(PlanetModel.SPHERE, -0.00595503104063, -0.00292747726474, 1.601488279374338E-5);
              xyzb = new XYZBounds();
              c.getBounds(xyzb);
              GeoArea area = GeoAreaFactory.makeGeoArea(PlanetModel.SPHERE,
                xyzb.getMinimumX(), xyzb.getMaximumX(), xyzb.getMinimumY(), xyzb.getMaximumY(), xyzb.getMinimumZ(), xyzb.getMaximumZ());
              
              int relationship = area.getRelationship(c);
              assertTrue(relationship == GeoArea.WITHIN || relationship == GeoArea.OVERLAPS);
          

          Here's the math I did to get there:

          >>> Z=-0.0059549958440800015
          >>> Y=-0.0029274211758186968
          >>> X=0.9999779838725235
          >>> print math.asin(Z)
          -0.00595503104063
          >>> print math.atan2(Y,X)
          -0.00292747726474
          >>>
          
          Show
          Karl Wright added a comment - - edited Hmm, I couldn't reproduce this with a simple test. Here's the failure detail: [junit4] 2> java.lang.AssertionError: Solid=XYZSolid: {planetmodel=PlanetModel.SPHERE, isWholeWorld= false , minXplane=[A=1.0, B=0.0, C=0.0, D=-0.9999778774751769, side=1.0], maxXplane=[A=1.0, B=0.0, C=0.0, D=-0.9999780900134368, side=-1.0], minYplane=[A=0.0, B=1.0, C=0.0, D=0.002943435994670142, side=1.0], maxYplane=[A=0.0, B=1.0, C=0.0, D=0.0029114063562165494, side=-1.0], minZplane=[A=0.0, B=0.0, C=1.0, D=0.005971010432932473, side=1.0], maxZplane=[A=0.0, B=0.0, C=1.0, D=0.005938981247250581, side=-1.0]}; Shape=GeoCircle: {planetmodel=PlanetModel.SPHERE, center=[X=0.9999779838725235, Y=-0.0029274211758186968, Z=-0.0059549958440800015], radius=1.601488279374338E-5(9.175851934781766E-4)} Here's the test code I created that passes: c = new GeoCircle(PlanetModel.SPHERE, -0.00595503104063, -0.00292747726474, 1.601488279374338E-5); xyzb = new XYZBounds(); c.getBounds(xyzb); GeoArea area = GeoAreaFactory.makeGeoArea(PlanetModel.SPHERE, xyzb.getMinimumX(), xyzb.getMaximumX(), xyzb.getMinimumY(), xyzb.getMaximumY(), xyzb.getMinimumZ(), xyzb.getMaximumZ()); int relationship = area.getRelationship(c); assertTrue(relationship == GeoArea.WITHIN || relationship == GeoArea.OVERLAPS); Here's the math I did to get there: >>> Z=-0.0059549958440800015 >>> Y=-0.0029274211758186968 >>> X=0.9999779838725235 >>> print math.asin(Z) -0.00595503104063 >>> print math.atan2(Y,X) -0.00292747726474 >>>
          Hide
          Karl Wright added a comment -

          Michael McCandless After some debugging, I increased the value of MINIMUM_RESOLUTION to 5e-12. This made all tests pass.

          It appears that BKD really drills into potential precision issues in geo3d.

          Show
          Karl Wright added a comment - Michael McCandless After some debugging, I increased the value of MINIMUM_RESOLUTION to 5e-12. This made all tests pass. It appears that BKD really drills into potential precision issues in geo3d.
          Hide
          Michael McCandless added a comment -

          what's the minimum resolution that BKD expects to descend to?

          BKD itself has no resolution limits: it descends to a region that has < N points, at which point it does a linear scan of those points checking if they match the shape.

          But, the encoding we use is limited precision, using 32 bits for each of x, y, z (96 bits total), with range -1.002 to 1.002. This means a point that goes in, using 3 doubles, will be quantized (pixelated). However, the test takes this into account: when it's computing the expected value, it does the same pixelation that the doc values encoding did.

          So, even miniscule circles should still work correctly?

          Show
          Michael McCandless added a comment - what's the minimum resolution that BKD expects to descend to? BKD itself has no resolution limits: it descends to a region that has < N points, at which point it does a linear scan of those points checking if they match the shape. But, the encoding we use is limited precision, using 32 bits for each of x, y, z (96 bits total), with range -1.002 to 1.002. This means a point that goes in, using 3 doubles, will be quantized (pixelated). However, the test takes this into account: when it's computing the expected value, it does the same pixelation that the doc values encoding did. So, even miniscule circles should still work correctly?
          Hide
          ASF subversion and git services added a comment -

          Commit 1696657 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696657 ]

          LUCENE-6699: fix one bug, add fudge factors, add nocommits

          Show
          ASF subversion and git services added a comment - Commit 1696657 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696657 ] LUCENE-6699 : fix one bug, add fudge factors, add nocommits
          Hide
          Michael McCandless added a comment -

          Karl Wright, woops, I committed the patch before your most recent one...

          Show
          Michael McCandless added a comment - Karl Wright , woops, I committed the patch before your most recent one...
          Hide
          ASF subversion and git services added a comment -

          Commit 1696659 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696659 ]

          LUCENE-6699: increase MINIMUM_RESOLUTION

          Show
          ASF subversion and git services added a comment - Commit 1696659 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696659 ] LUCENE-6699 : increase MINIMUM_RESOLUTION
          Hide
          Michael McCandless added a comment -

          Karl Wright OK I tried to apply your last patch, skipping dup parts from the previous already committed patch, and committed it. Can you svn up and make sure it's right? Thanks.

          Show
          Michael McCandless added a comment - Karl Wright OK I tried to apply your last patch, skipping dup parts from the previous already committed patch, and committed it. Can you svn up and make sure it's right? Thanks.
          Hide
          ASF subversion and git services added a comment -

          Commit 1696661 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696661 ]

          LUCENE-6699: another nocommit, show details on assert trip

          Show
          ASF subversion and git services added a comment - Commit 1696661 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696661 ] LUCENE-6699 : another nocommit, show details on assert trip
          Hide
          Michael McCandless added a comment -

          I'm not a fan of the +/- 2.0 fudge factors

          Because: without the opto, the shape relation checks of smaller and smaller XYZ solids as the BKD tree recurses, were working correctly?

          Show
          Michael McCandless added a comment - I'm not a fan of the +/- 2.0 fudge factors Because: without the opto, the shape relation checks of smaller and smaller XYZ solids as the BKD tree recurses, were working correctly?
          Hide
          Michael McCandless added a comment -

          Hmm, I'm seeing this test failure:

          [junit4:pickseed] Seed property 'tests.seed' already defined: 27BE02D86F469AAF
             [junit4] <JUnit4> says שלום! Master seed: 27BE02D86F469AAF
             [junit4] Executing 1 suite with 1 JVM.
             [junit4] 
             [junit4] Started J0 PID(14081@localhost).
             [junit4] Suite: org.apache.lucene.bkdtree3d.TestGeo3DPointField
             [junit4]   2> ?? 19, 2015 5:13:35 ?? com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
             [junit4]   2> WARNING: Uncaught exception in thread: Thread[T1,5,TGRP-TestGeo3DPointField]
             [junit4]   2> java.lang.AssertionError: got 0
             [junit4]   2> 	at __randomizedtesting.SeedInfo.seed([27BE02D86F469AAF]:0)
             [junit4]   2> 	at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1.scorer(PointInGeo3DShapeQuery.java:105)
             [junit4]   2> 	at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorer(LRUQueryCache.java:589)
             [junit4]   2> 	at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135)
             [junit4]   2> 	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
             [junit4]   2> 	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
             [junit4]   2> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
             [junit4]   2> 	at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:92)
             [junit4]   2> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:425)
             [junit4]   2> 	at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._run(TestGeo3DPointField.java:586)
             [junit4]   2> 	at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run(TestGeo3DPointField.java:520)
             [junit4]   2> 
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPointField -Dtests.method=testRandomMedium -Dtests.seed=27BE02D86F469AAF -Dtests.locale=zh -Dtests.timezone=Atlantic/Stanley -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
          

          It's the assert that the XZYBounds relationship to the shape is either WITHIN or OVERLAPS, but in this case it's returning CONTAINS (shape CONTAINS the bbox).

          Maybe the assert is too demanding?

          Show
          Michael McCandless added a comment - Hmm, I'm seeing this test failure: [junit4:pickseed] Seed property 'tests.seed' already defined: 27BE02D86F469AAF [junit4] <JUnit4> says שלום! Master seed: 27BE02D86F469AAF [junit4] Executing 1 suite with 1 JVM. [junit4] [junit4] Started J0 PID(14081@localhost). [junit4] Suite: org.apache.lucene.bkdtree3d.TestGeo3DPointField [junit4] 2> ?? 19, 2015 5:13:35 ?? com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[T1,5,TGRP-TestGeo3DPointField] [junit4] 2> java.lang.AssertionError: got 0 [junit4] 2> at __randomizedtesting.SeedInfo.seed([27BE02D86F469AAF]:0) [junit4] 2> at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1.scorer(PointInGeo3DShapeQuery.java:105) [junit4] 2> at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorer(LRUQueryCache.java:589) [junit4] 2> at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135) [junit4] 2> at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69) [junit4] 2> at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69) [junit4] 2> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618) [junit4] 2> at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:92) [junit4] 2> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:425) [junit4] 2> at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._run(TestGeo3DPointField.java:586) [junit4] 2> at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run(TestGeo3DPointField.java:520) [junit4] 2> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeo3DPointField -Dtests.method=testRandomMedium -Dtests.seed=27BE02D86F469AAF -Dtests.locale=zh -Dtests.timezone=Atlantic/Stanley -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 It's the assert that the XZYBounds relationship to the shape is either WITHIN or OVERLAPS, but in this case it's returning CONTAINS (shape CONTAINS the bbox). Maybe the assert is too demanding?
          Hide
          ASF subversion and git services added a comment -

          Commit 1696665 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696665 ]

          LUCENE-6699: comment out assert (nocommit) remove fudge factors

          Show
          ASF subversion and git services added a comment - Commit 1696665 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696665 ] LUCENE-6699 : comment out assert (nocommit) remove fudge factors
          Hide
          Michael McCandless added a comment -

          OK I commented out the problematic assert and put a nocommit.

          I also tested w/o the fudge factors and the test seems to be happy...

          Show
          Michael McCandless added a comment - OK I commented out the problematic assert and put a nocommit. I also tested w/o the fudge factors and the test seems to be happy...
          Hide
          Karl Wright added a comment -

          Merge looks fine.

          The reason for the "fudge factor" in the test was simply to confirm that you would get a WITHIN rather than an OVERLAPS if you expand the bounding box by an amount sufficient to insure there are no overlaps between the shape and the box. Otherwise according to the (revised) definition of getRelationship(), you could technically get either one.

          Show
          Karl Wright added a comment - Merge looks fine. The reason for the "fudge factor" in the test was simply to confirm that you would get a WITHIN rather than an OVERLAPS if you expand the bounding box by an amount sufficient to insure there are no overlaps between the shape and the box. Otherwise according to the (revised) definition of getRelationship(), you could technically get either one.
          Hide
          Michael McCandless added a comment -

          OK I will put the fudge factor back and put that explanation on top!

          Show
          Michael McCandless added a comment - OK I will put the fudge factor back and put that explanation on top!
          Hide
          ASF subversion and git services added a comment -

          Commit 1696669 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696669 ]

          LUCENE-6699: put fudge factor back

          Show
          ASF subversion and git services added a comment - Commit 1696669 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696669 ] LUCENE-6699 : put fudge factor back
          Hide
          Michael McCandless added a comment -

          Another failure:

             [junit4] Suite: org.apache.lucene.bkdtree3d.TestGeo3DPointField
             [junit4]   2> серп. 20, 2015 1:05:40 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
             [junit4]   2> WARNING: Uncaught exception in thread: Thread[T2,5,TGRP-TestGeo3DPointField]
             [junit4]   2> java.lang.AssertionError: T2: iter=164 id=15842 docID=16827 lat=0.006224927111830945 lon=0.005597367237251763 expected true but got: false deleted?=false
             [junit4]   2>   point1=[lat=0.006224927111830945, lon=0.005597367237251763], iswithin=true
             [junit4]   2>   point2=[X=1.0010836083810235, Y=0.005603490759433942, Z=0.006231850560862502], iswithin=true
             [junit4]   2>   query=PointInGeo3DShapeQuery: field=point:PlanetModel: PlanetModel.WGS84 Shape: GeoCircle: {planetmodel=PlanetModel.WGS84, center=[lat=0.006229478708446979, lon=0.005570196723795424], radius=3.840276763694387E-5(0.0022003165072184694)}
             [junit4]   2> 	at __randomizedtesting.SeedInfo.seed([7F0E3F7C95B65717]:0)
             [junit4]   2> 	at org.junit.Assert.fail(Assert.java:93)
             [junit4]   2> 	at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._run(TestGeo3DPointField.java:624)
             [junit4]   2> 	at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run(TestGeo3DPointField.java:520)
             [junit4]   2> 
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPointField -Dtests.method=testRandomMedium -Dtests.seed=7F0E3F7C95B65717 -Dtests.multiplier=4 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=uk -Dtests.timezone=Asia/Dubai -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] ERROR   6.18s | TestGeo3DPointField.testRandomMedium <<<
          

          This may be the "tiny circle" case?

          Show
          Michael McCandless added a comment - Another failure: [junit4] Suite: org.apache.lucene.bkdtree3d.TestGeo3DPointField [junit4] 2> серп. 20, 2015 1:05:40 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[T2,5,TGRP-TestGeo3DPointField] [junit4] 2> java.lang.AssertionError: T2: iter=164 id=15842 docID=16827 lat=0.006224927111830945 lon=0.005597367237251763 expected true but got: false deleted?=false [junit4] 2> point1=[lat=0.006224927111830945, lon=0.005597367237251763], iswithin=true [junit4] 2> point2=[X=1.0010836083810235, Y=0.005603490759433942, Z=0.006231850560862502], iswithin=true [junit4] 2> query=PointInGeo3DShapeQuery: field=point:PlanetModel: PlanetModel.WGS84 Shape: GeoCircle: {planetmodel=PlanetModel.WGS84, center=[lat=0.006229478708446979, lon=0.005570196723795424], radius=3.840276763694387E-5(0.0022003165072184694)} [junit4] 2> at __randomizedtesting.SeedInfo.seed([7F0E3F7C95B65717]:0) [junit4] 2> at org.junit.Assert.fail(Assert.java:93) [junit4] 2> at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._run(TestGeo3DPointField.java:624) [junit4] 2> at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run(TestGeo3DPointField.java:520) [junit4] 2> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeo3DPointField -Dtests.method=testRandomMedium -Dtests.seed=7F0E3F7C95B65717 -Dtests.multiplier=4 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=uk -Dtests.timezone=Asia/Dubai -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 6.18s | TestGeo3DPointField.testRandomMedium <<< This may be the "tiny circle" case?
          Hide
          Karl Wright added a comment -

          No, it's not that. It may indicate we need an even higher MINIMUM_RESOLUTION value. I'm looking at it now.

          Show
          Karl Wright added a comment - No, it's not that. It may indicate we need an even higher MINIMUM_RESOLUTION value. I'm looking at it now.
          Hide
          Karl Wright added a comment -

          Michael McCandless: This is actually a test problem, although I think you should be the one to fix it.

          The test indexes points using PlanetModel.SPHERE, but then searches for them using PlanetModel.WGS84. That doesn't work. You can see the results here:

              area = GeoAreaFactory.makeGeoArea(PlanetModel.WGS84,
                xyzb.getMinimumX(), xyzb.getMaximumX(), xyzb.getMinimumY(), xyzb.getMaximumY(), xyzb.getMinimumZ(), xyzb.getMaximumZ());
              GeoPoint p2 = new GeoPoint(1.0010836083810235, 0.005603490759433942, 0.006231850560862502);
              assertTrue(PlanetModel.WGS84.pointOnSurface(p2)); // Fails, because the GeoPoint is not actually on the WGS84 planet surface
          
          Show
          Karl Wright added a comment - Michael McCandless : This is actually a test problem, although I think you should be the one to fix it. The test indexes points using PlanetModel.SPHERE, but then searches for them using PlanetModel.WGS84. That doesn't work. You can see the results here: area = GeoAreaFactory.makeGeoArea(PlanetModel.WGS84, xyzb.getMinimumX(), xyzb.getMaximumX(), xyzb.getMinimumY(), xyzb.getMaximumY(), xyzb.getMinimumZ(), xyzb.getMaximumZ()); GeoPoint p2 = new GeoPoint(1.0010836083810235, 0.005603490759433942, 0.006231850560862502); assertTrue(PlanetModel.WGS84.pointOnSurface(p2)); // Fails, because the GeoPoint is not actually on the WGS84 planet surface
          Hide
          Michael McCandless added a comment -

          Wait, the test up front (top of verify method) randomly picks a planet model and consistently uses that one for indexing and searching?

          Show
          Michael McCandless added a comment - Wait, the test up front (top of verify method) randomly picks a planet model and consistently uses that one for indexing and searching?
          Hide
          Karl Wright added a comment -

          So, there are two problems. One I just fixed, which was that pointOnSurface() was being too strict about accuracy. But:

              p1 = new GeoPoint(PlanetModel.WGS84, 0.006224927111830945, 0.005597367237251763);
              p2 = new GeoPoint(1.0010836083810235, 0.005603490759433942, 0.006231850560862502);
              assertTrue(PlanetModel.WGS84.pointOnSurface(p1));
              assertTrue(PlanetModel.WGS84.pointOnSurface(p2)); // this fails
          

          ... so the packing accuracy is not high enough to guarantee that an XYZSolid constructed from bounds on your shape when applied to an unpacked point will be guaranteed to contain it. That's problem 1, and it argues for adding some fudge factor to the bounds to account for the lack of full accuracy in the points. More about that later.

          Problem 2 is that even p1 above is not within the XYZSolid object computed from the bounds; in fact it's pretty far out (1e-7 or more). I have to drill further into why that is; it's likely related to the WGS84 model, but I don't know how yet. Once again, stay tuned.

          Show
          Karl Wright added a comment - So, there are two problems. One I just fixed, which was that pointOnSurface() was being too strict about accuracy. But: p1 = new GeoPoint(PlanetModel.WGS84, 0.006224927111830945, 0.005597367237251763); p2 = new GeoPoint(1.0010836083810235, 0.005603490759433942, 0.006231850560862502); assertTrue(PlanetModel.WGS84.pointOnSurface(p1)); assertTrue(PlanetModel.WGS84.pointOnSurface(p2)); // this fails ... so the packing accuracy is not high enough to guarantee that an XYZSolid constructed from bounds on your shape when applied to an unpacked point will be guaranteed to contain it. That's problem 1, and it argues for adding some fudge factor to the bounds to account for the lack of full accuracy in the points. More about that later. Problem 2 is that even p1 above is not within the XYZSolid object computed from the bounds; in fact it's pretty far out (1e-7 or more). I have to drill further into why that is; it's likely related to the WGS84 model, but I don't know how yet. Once again, stay tuned.
          Hide
          Karl Wright added a comment - - edited

          Ok, I know what is going on, and it is indeed related to the WGS84 model. But I have to think this through carefully. The strategy used to compute the X and Y bounds in XYZBound is subtlely flawed. Working on this now.

          Show
          Karl Wright added a comment - - edited Ok, I know what is going on, and it is indeed related to the WGS84 model. But I have to think this through carefully. The strategy used to compute the X and Y bounds in XYZBound is subtlely flawed. Working on this now.
          Hide
          Karl Wright added a comment -

          In hopes of clarifying my own thinking, I'm going to lay down what the problem is.

          For the Z bound, pretend that you are looking from the north (or south) pole onto the earth. There's a plane whose intersection with the earth you are trying to compute the Z bounds for. From the pole, the earth, no matter how oblate, is a circle in cross-section. The plane intersects part of that circle. And here's the important point: if you construct another plane that is perpendicular to the original plane, which also includes the Z axis, that plane must pass directly through the points that have the greatest Z extent for the intersection of the original plane and the earth. Got that?

          Anyway, for the X and Y bounds, I basically just copied that code and instead pretended I was looking up the X axis and up the Y axis, instead of the Z axis. But if you have an oblate earth, then the cross section from either the X or the Y axis is not a circle, but rather an ellipse. So a plane that is perpendicular to the original plane passing through (say) the X axis will NOT go through the points that have the greatest X extent for the intersection of the original plane and the earth. The plane we want to use instead is parallel to that one, but offset by some amount, which I don't yet know how to compute. And that, in a nutshell, is the problem.

          Show
          Karl Wright added a comment - In hopes of clarifying my own thinking, I'm going to lay down what the problem is. For the Z bound, pretend that you are looking from the north (or south) pole onto the earth. There's a plane whose intersection with the earth you are trying to compute the Z bounds for. From the pole, the earth, no matter how oblate, is a circle in cross-section. The plane intersects part of that circle. And here's the important point: if you construct another plane that is perpendicular to the original plane, which also includes the Z axis, that plane must pass directly through the points that have the greatest Z extent for the intersection of the original plane and the earth. Got that? Anyway, for the X and Y bounds, I basically just copied that code and instead pretended I was looking up the X axis and up the Y axis, instead of the Z axis. But if you have an oblate earth, then the cross section from either the X or the Y axis is not a circle, but rather an ellipse. So a plane that is perpendicular to the original plane passing through (say) the X axis will NOT go through the points that have the greatest X extent for the intersection of the original plane and the earth. The plane we want to use instead is parallel to that one, but offset by some amount, which I don't yet know how to compute. And that, in a nutshell, is the problem.
          Hide
          Karl Wright added a comment -

          In hopes of clarifying my own thinking, I'm going to lay down what the problem is.

          For the Z bound, pretend that you are looking from the north (or south) pole onto the earth. There's a plane whose intersection with the earth you are trying to compute the Z bounds for. From the pole, the earth, no matter how oblate, is a circle in cross-section. The plane intersects part of that circle. And here's the important point: if you construct another plane that is perpendicular to the original plane, which also includes the Z axis, that plane must pass directly through the points that have the greatest Z extent for the intersection of the original plane and the earth. Got that?

          Anyway, for the X and Y bounds, I basically just copied that code and instead pretended I was looking up the X axis and up the Y axis, instead of the Z axis. But if you have an oblate earth, then the cross section from either the X or the Y axis is not a circle, but rather an ellipse. So a plane that is perpendicular to the original plane passing through (say) the X axis will NOT go through the points that have the greatest X extent for the intersection of the original plane and the earth. The plane we want to use instead is parallel to that one, but offset by some amount, which I don't yet know how to compute. And that, in a nutshell, is the problem.

          Show
          Karl Wright added a comment - In hopes of clarifying my own thinking, I'm going to lay down what the problem is. For the Z bound, pretend that you are looking from the north (or south) pole onto the earth. There's a plane whose intersection with the earth you are trying to compute the Z bounds for. From the pole, the earth, no matter how oblate, is a circle in cross-section. The plane intersects part of that circle. And here's the important point: if you construct another plane that is perpendicular to the original plane, which also includes the Z axis, that plane must pass directly through the points that have the greatest Z extent for the intersection of the original plane and the earth. Got that? Anyway, for the X and Y bounds, I basically just copied that code and instead pretended I was looking up the X axis and up the Y axis, instead of the Z axis. But if you have an oblate earth, then the cross section from either the X or the Y axis is not a circle, but rather an ellipse. So a plane that is perpendicular to the original plane passing through (say) the X axis will NOT go through the points that have the greatest X extent for the intersection of the original plane and the earth. The plane we want to use instead is parallel to that one, but offset by some amount, which I don't yet know how to compute. And that, in a nutshell, is the problem.
          Hide
          Karl Wright added a comment - - edited

          I know how to do it, PROVIDED that it is true that for any plane and any ellipsoid, the intersection of the plane and the ellipsoid is a simple ellipse. I don't yet know whether this is true, however.

          HA. Yes, it is true: http://www.scirp.org/journal/PaperInformation.aspx?PaperID=24506

          Show
          Karl Wright added a comment - - edited I know how to do it, PROVIDED that it is true that for any plane and any ellipsoid, the intersection of the plane and the ellipsoid is a simple ellipse. I don't yet know whether this is true, however. HA. Yes, it is true: http://www.scirp.org/journal/PaperInformation.aspx?PaperID=24506
          Hide
          Karl Wright added a comment -

          Patch to address WGS84 bounds issue.

          Show
          Karl Wright added a comment - Patch to address WGS84 bounds issue.
          Hide
          ASF subversion and git services added a comment -

          Commit 1696797 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696797 ]

          LUCENE-6699: fix math for WGS84 PlanetModel

          Show
          ASF subversion and git services added a comment - Commit 1696797 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696797 ] LUCENE-6699 : fix math for WGS84 PlanetModel
          Hide
          ASF subversion and git services added a comment -

          Commit 1696864 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696864 ]

          LUCENE-6699: address nocommits

          Show
          ASF subversion and git services added a comment - Commit 1696864 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696864 ] LUCENE-6699 : address nocommits
          Hide
          Michael McCandless added a comment -

          I removed the remaining nocommits, but after some beasting I hit this failure:

             [junit4] Suite: org.apache.lucene.bkdtree3d.TestGeo3DPointField
             [junit4]   2> Rgp 21, 2015 3:28:32 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
             [junit4]   2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeo3DPointField]
             [junit4]   2> java.lang.AssertionError: expected WITHIN (1) or OVERLAPS (2) but got 0
             [junit4]   2> 	at __randomizedtesting.SeedInfo.seed([6700D50161C38330]:0)
             [junit4]   2> 	at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1.scorer(PointInGeo3DShapeQuery.java:105)
             [junit4]   2> 	at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorer(LRUQueryCache.java:589)
             [junit4]   2> 	at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135)
             [junit4]   2> 	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
             [junit4]   2> 	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
             [junit4]   2> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
             [junit4]   2> 	at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:92)
             [junit4]   2> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:425)
             [junit4]   2> 	at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._run(TestGeo3DPointField.java:586)
             [junit4]   2> 	at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run(TestGeo3DPointField.java:520)
             [junit4]   2> 
          

          This is an assert that just re-enabled ... it's a little spooky because it's asserting that the bbox opto is in fact valid, checking that the query shape is either WITHIN or OVERLAPS the bbox cell (which in turn is fully within the fudge-factor'd x,y,z bbox).

          Show
          Michael McCandless added a comment - I removed the remaining nocommits, but after some beasting I hit this failure: [junit4] Suite: org.apache.lucene.bkdtree3d.TestGeo3DPointField [junit4] 2> Rgp 21, 2015 3:28:32 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeo3DPointField] [junit4] 2> java.lang.AssertionError: expected WITHIN (1) or OVERLAPS (2) but got 0 [junit4] 2> at __randomizedtesting.SeedInfo.seed([6700D50161C38330]:0) [junit4] 2> at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1.scorer(PointInGeo3DShapeQuery.java:105) [junit4] 2> at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorer(LRUQueryCache.java:589) [junit4] 2> at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135) [junit4] 2> at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69) [junit4] 2> at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69) [junit4] 2> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618) [junit4] 2> at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:92) [junit4] 2> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:425) [junit4] 2> at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._run(TestGeo3DPointField.java:586) [junit4] 2> at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run(TestGeo3DPointField.java:520) [junit4] 2> This is an assert that just re-enabled ... it's a little spooky because it's asserting that the bbox opto is in fact valid, checking that the query shape is either WITHIN or OVERLAPS the bbox cell (which in turn is fully within the fudge-factor'd x,y,z bbox).
          Hide
          Karl Wright added a comment -

          Do you happen to have the how-to-reproduce line? This produces no failure: "ant -Dtests.seed=6700D50161C38330 -Dtestcase=TestGeo3dPointField test"

          Show
          Karl Wright added a comment - Do you happen to have the how-to-reproduce line? This produces no failure: "ant -Dtests.seed=6700D50161C38330 -Dtestcase=TestGeo3dPointField test"
          Hide
          Michael McCandless added a comment -

          Ugh sorry I meant to include it in the copy/paste:

             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPointField -Dtests.method=testRandomTiny -Dtests.seed=6700D50161C38330 -Dtests.multiplier=5 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=lt_LT -Dtests.timezone=Asia/Calcutta -Dtests.asserts=true -Dtests.file.encoding=UTF-8
          

          It's interesting it's testRandomTiny: it should be easier to debug!

          Show
          Michael McCandless added a comment - Ugh sorry I meant to include it in the copy/paste: [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeo3DPointField -Dtests.method=testRandomTiny -Dtests.seed=6700D50161C38330 -Dtests.multiplier=5 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=lt_LT -Dtests.timezone=Asia/Calcutta -Dtests.asserts=true -Dtests.file.encoding=UTF-8 It's interesting it's testRandomTiny: it should be easier to debug!
          Hide
          Karl Wright added a comment -

          It looks like a precision error again. The circle is quite small (radius about 1e-6), and the computed bounds lie just inside the circle's provided edge point (distance, 2e-11). I'll try increasing MINIMUM_RESOLUTION just a bit more to see if that corrects the problem.

          Show
          Karl Wright added a comment - It looks like a precision error again. The circle is quite small (radius about 1e-6), and the computed bounds lie just inside the circle's provided edge point (distance, 2e-11). I'll try increasing MINIMUM_RESOLUTION just a bit more to see if that corrects the problem.
          Hide
          Karl Wright added a comment -

          It did correct the problem. Attached a new patch.

          Show
          Karl Wright added a comment - It did correct the problem. Attached a new patch.
          Hide
          ASF subversion and git services added a comment -

          Commit 1696880 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1696880 ]

          LUCENE-6699: bump up MINIMUM_RESOLUTION some more

          Show
          ASF subversion and git services added a comment - Commit 1696880 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1696880 ] LUCENE-6699 : bump up MINIMUM_RESOLUTION some more
          Hide
          Michael McCandless added a comment -

          OK I committed that patch, thanks Karl Wright, but unfortunately I hit another failure:

             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPointField -Dtests.method=testRandomMedium -Dtests.seed=71C652F660067AD3 -Dtests.multiplier=5 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=zh_HK -Dtests.timezone=America/Mazatlan -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] ERROR   9.72s | TestGeo3DPointField.testRandomMedium <<<
             [junit4]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=14, name=T0, state=RUNNABLE, group=TGRP-TestGeo3DPointField]
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([71C652F660067AD3:CC18655E216319B5]:0)
             [junit4]    > Caused by: java.lang.AssertionError: expected WITHIN (1) or OVERLAPS (2) but got 0; shape=GeoCircle: {planetmodel=PlanetModel.SPHERE, center=[lat=-0.004282454525970269, lon=-1.6739831367422277E-4], radius=1.959639723134033E-6(1.1227908550176523E-4)}; XYZSolid=XYZSolid: {planetmodel=PlanetModel.SPHERE, isWholeWorld=false, minXplane=[A=1.0, B=0.0, C=0.0, D=-0.999990807894643, side=1.0], maxXplane=[A=1.0, B=0.0, C=0.0, D=-0.9999908246908629, side=-1.0], minYplane=[A=0.0, B=1.0, C=0.0, D=1.693563105447845E-4, side=1.0], maxYplane=[A=0.0, B=1.0, C=0.0, D=1.6543724525666504E-4, side=-1.0], minZplane=[A=0.0, B=0.0, C=1.0, D=0.004284400993353207, side=1.0], maxZplane=[A=0.0, B=0.0, C=1.0, D=0.004280481873941856, side=-1.0]}
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([71C652F660067AD3]:0)
             [junit4]    > 	at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1.scorer(PointInGeo3DShapeQuery.java:105)
             [junit4]    > 	at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorer(LRUQueryCache.java:589)
             [junit4]    > 	at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135)
             [junit4]    > 	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
             [junit4]    > 	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
             [junit4]    > 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
             [junit4]    > 	at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:92)
             [junit4]    > 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:425)
             [junit4]    > 	at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._run(TestGeo3DPointField.java:586)
             [junit4]    > 	at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run(TestGeo3DPointField.java:520)
             [junit4]   2> NOTE: test params are: codec=FastDecompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST_DECOMPRESSION, chunkSize=26391, maxDocsPerChunk=1, blockSize=992), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST_DECOMPRESSION, chunkSize=26391, blockSize=992)), sim=RandomSimilarityProvider(queryNorm=false,coord=yes): {}, locale=zh_HK, timezone=America/Mazatlan
             [junit4]   2> NOTE: Linux 3.13.0-46-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=333320104,total=449314816
             [junit4]   2> NOTE: All tests run in this JVM: [TestGeo3DPointField]
             [junit4] Completed [1/1] in 9.85s, 1 test, 1 error <<< FAILURES!
          
          Show
          Michael McCandless added a comment - OK I committed that patch, thanks Karl Wright , but unfortunately I hit another failure: [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeo3DPointField -Dtests.method=testRandomMedium -Dtests.seed=71C652F660067AD3 -Dtests.multiplier=5 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=zh_HK -Dtests.timezone=America/Mazatlan -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 9.72s | TestGeo3DPointField.testRandomMedium <<< [junit4] > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=14, name=T0, state=RUNNABLE, group=TGRP-TestGeo3DPointField] [junit4] > at __randomizedtesting.SeedInfo.seed([71C652F660067AD3:CC18655E216319B5]:0) [junit4] > Caused by: java.lang.AssertionError: expected WITHIN (1) or OVERLAPS (2) but got 0; shape=GeoCircle: {planetmodel=PlanetModel.SPHERE, center=[lat=-0.004282454525970269, lon=-1.6739831367422277E-4], radius=1.959639723134033E-6(1.1227908550176523E-4)}; XYZSolid=XYZSolid: {planetmodel=PlanetModel.SPHERE, isWholeWorld=false, minXplane=[A=1.0, B=0.0, C=0.0, D=-0.999990807894643, side=1.0], maxXplane=[A=1.0, B=0.0, C=0.0, D=-0.9999908246908629, side=-1.0], minYplane=[A=0.0, B=1.0, C=0.0, D=1.693563105447845E-4, side=1.0], maxYplane=[A=0.0, B=1.0, C=0.0, D=1.6543724525666504E-4, side=-1.0], minZplane=[A=0.0, B=0.0, C=1.0, D=0.004284400993353207, side=1.0], maxZplane=[A=0.0, B=0.0, C=1.0, D=0.004280481873941856, side=-1.0]} [junit4] > at __randomizedtesting.SeedInfo.seed([71C652F660067AD3]:0) [junit4] > at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1.scorer(PointInGeo3DShapeQuery.java:105) [junit4] > at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorer(LRUQueryCache.java:589) [junit4] > at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135) [junit4] > at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69) [junit4] > at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69) [junit4] > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618) [junit4] > at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:92) [junit4] > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:425) [junit4] > at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._run(TestGeo3DPointField.java:586) [junit4] > at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run(TestGeo3DPointField.java:520) [junit4] 2> NOTE: test params are: codec=FastDecompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST_DECOMPRESSION, chunkSize=26391, maxDocsPerChunk=1, blockSize=992), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST_DECOMPRESSION, chunkSize=26391, blockSize=992)), sim=RandomSimilarityProvider(queryNorm=false,coord=yes): {}, locale=zh_HK, timezone=America/Mazatlan [junit4] 2> NOTE: Linux 3.13.0-46-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=333320104,total=449314816 [junit4] 2> NOTE: All tests run in this JVM: [TestGeo3DPointField] [junit4] Completed [1/1] in 9.85s, 1 test, 1 error <<< FAILURES!
          Hide
          Karl Wright added a comment -

          Same cause, same fix.

          The circle radius was about 1/6 the previous small value (1e-6), and the corresponding required increase in MINIMUM_RESOLUTION wound up being about 3x (to 6e-11). There is still 4.5 orders of magnitude between these values. If the pattern continues, we seem to be converging on a value for MINIMUM_RESOLUTION that is somewhere around 1e-9.

          Unfortunately, I've tried MINIMUM_RESOLUTION values around 1e-10 before and other stuff started breaking as a result. So I really hope this pattern doesn't continue.

          Show
          Karl Wright added a comment - Same cause, same fix. The circle radius was about 1/6 the previous small value (1e-6), and the corresponding required increase in MINIMUM_RESOLUTION wound up being about 3x (to 6e-11). There is still 4.5 orders of magnitude between these values. If the pattern continues, we seem to be converging on a value for MINIMUM_RESOLUTION that is somewhere around 1e-9. Unfortunately, I've tried MINIMUM_RESOLUTION values around 1e-10 before and other stuff started breaking as a result. So I really hope this pattern doesn't continue.
          Hide
          Karl Wright added a comment -

          Michael McCandless Looking at the source of the actual error, here's how it works.

          The GeoCircle shape constructs its plane using the following code:

                // Construct normal plane
                final Plane normalPlane = Plane.constructNormalizedZPlane(upperPoint, lowerPoint, center);
                // Construct a sided plane that goes through the two points and whose normal is in the normalPlane.
                this.circlePlane = SidedPlane.constructNormalizedPerpendicularSidedPlane(center, normalPlane, upperPoint, lowerPoint);
                if (circlePlane == null)
                  throw new RuntimeException("Couldn't construct circle plane.  Cutoff angle = "+cutoffAngle+"; upperPoint = "+upperPoint+"; lowerPoint = "+lowerPoint);
                this.edgePoints = new GeoPoint[]{upperPoint};
          

          So, it constructs a plane including the Z axis first, then constructs a plane perpendicular to it. The edge point of that plane is one of the points it used to construct both the Z plane and the perpendicular plane. Tests show that the edge point is not quite perfectly on the circle edge, but is close enough to pass isWithin(). (This is expected given the math that has to take place to construct the circle plane. The error gets larger the smaller the circle is.)

          When the bounds are computed, another plane intersection is computed. This adds more error, of a comparable amount. So we are dealing here with twice the amount of error as usual.

          Solutions: I could compute the edge point directly from the vertical plane and the circle plane. This adds an additional square root and a couple of allocations to the cost of constructing a circle. Or, I could add in the edge point to the GeoCircle bounds computation. That's not quite right but would effectively solve the problem forever at little cost. Finally, we could keep fiddling with MINIMUM_RESOLUTION which technically would also be able to solve the problem at some point. Which would you prefer?

          Show
          Karl Wright added a comment - Michael McCandless Looking at the source of the actual error, here's how it works. The GeoCircle shape constructs its plane using the following code: // Construct normal plane final Plane normalPlane = Plane.constructNormalizedZPlane(upperPoint, lowerPoint, center); // Construct a sided plane that goes through the two points and whose normal is in the normalPlane. this .circlePlane = SidedPlane.constructNormalizedPerpendicularSidedPlane(center, normalPlane, upperPoint, lowerPoint); if (circlePlane == null ) throw new RuntimeException( "Couldn't construct circle plane. Cutoff angle = " +cutoffAngle+ "; upperPoint = " +upperPoint+ "; lowerPoint = " +lowerPoint); this .edgePoints = new GeoPoint[]{upperPoint}; So, it constructs a plane including the Z axis first, then constructs a plane perpendicular to it. The edge point of that plane is one of the points it used to construct both the Z plane and the perpendicular plane. Tests show that the edge point is not quite perfectly on the circle edge, but is close enough to pass isWithin(). (This is expected given the math that has to take place to construct the circle plane. The error gets larger the smaller the circle is.) When the bounds are computed, another plane intersection is computed. This adds more error, of a comparable amount. So we are dealing here with twice the amount of error as usual. Solutions: I could compute the edge point directly from the vertical plane and the circle plane. This adds an additional square root and a couple of allocations to the cost of constructing a circle. Or, I could add in the edge point to the GeoCircle bounds computation. That's not quite right but would effectively solve the problem forever at little cost. Finally, we could keep fiddling with MINIMUM_RESOLUTION which technically would also be able to solve the problem at some point. Which would you prefer?
          Hide
          Karl Wright added a comment -

          Updated patch. This includes an IllegalArgumentException for a circle which is too small by MINIMUM_RESOLUTION standards. Also includes a slight hack to insure that GeoCircle bounds includes the edge point for the GeoCircle,

          Show
          Karl Wright added a comment - Updated patch. This includes an IllegalArgumentException for a circle which is too small by MINIMUM_RESOLUTION standards. Also includes a slight hack to insure that GeoCircle bounds includes the edge point for the GeoCircle,
          Hide
          ASF subversion and git services added a comment -

          Commit 1697048 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1697048 ]

          LUCENE-6699: throw IllegalArgExc for too-small GeoCircles; increase MINIMUM_RESOLUTION again

          Show
          ASF subversion and git services added a comment - Commit 1697048 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1697048 ] LUCENE-6699 : throw IllegalArgExc for too-small GeoCircles; increase MINIMUM_RESOLUTION again
          Hide
          Michael McCandless added a comment -

          Another failure:

             [junit4] Suite: org.apache.lucene.bkdtree3d.TestGeo3DPointField
             [junit4]   2> Ogos 21, 2015 8:29:29 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
             [junit4]   2> WARNING: Uncaught exception in thread: Thread[T3,5,TGRP-TestGeo3DPointField]
             [junit4]   2> java.lang.AssertionError: T3: iter=341 id=70337 docID=417 lat=-0.002164069780096702 lon=0.007505617500830066 expected false but got: true deleted?=false
             [junit4]   2>   point1=[lat=-0.002164069780096702, lon=0.007505617500830066], iswithin=false
             [junit4]   2>   point2=[X=1.0010882593761607, Y=0.007513926205930265, Z=-0.0021664888729185277], iswithin=false
             [junit4]   2>   query=PointInGeo3DShapeQuery: field=point:PlanetModel: PlanetModel.WGS84 Shape: GeoCircle: {planetmodel=PlanetModel.WGS84, center=[lat=-0.006450320645814321, lon=0.004660694205115142], radius=0.00489710732634323(0.28058358162206176)}
             [junit4]   2> 	at __randomizedtesting.SeedInfo.seed([9CF59027DCD28E6D]:0)
             [junit4]   2> 	at org.junit.Assert.fail(Assert.java:93)
             [junit4]   2> 	at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._run(TestGeo3DPointField.java:624)
             [junit4]   2> 	at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run(TestGeo3DPointField.java:520)
             [junit4]   2> 
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeo3DPointField -Dtests.method=testRandomMedium -Dtests.seed=9CF59027DCD28E6D -Dtests.multiplier=5 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=ms -Dtests.timezone=Africa/Nouakchott -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] ERROR   9.55s | TestGeo3DPointField.testRandomMedium <<<
             [junit4]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=50, name=T3, state=RUNNABLE, group=TGRP-Test
          
          Show
          Michael McCandless added a comment - Another failure: [junit4] Suite: org.apache.lucene.bkdtree3d.TestGeo3DPointField [junit4] 2> Ogos 21, 2015 8:29:29 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[T3,5,TGRP-TestGeo3DPointField] [junit4] 2> java.lang.AssertionError: T3: iter=341 id=70337 docID=417 lat=-0.002164069780096702 lon=0.007505617500830066 expected false but got: true deleted?=false [junit4] 2> point1=[lat=-0.002164069780096702, lon=0.007505617500830066], iswithin=false [junit4] 2> point2=[X=1.0010882593761607, Y=0.007513926205930265, Z=-0.0021664888729185277], iswithin=false [junit4] 2> query=PointInGeo3DShapeQuery: field=point:PlanetModel: PlanetModel.WGS84 Shape: GeoCircle: {planetmodel=PlanetModel.WGS84, center=[lat=-0.006450320645814321, lon=0.004660694205115142], radius=0.00489710732634323(0.28058358162206176)} [junit4] 2> at __randomizedtesting.SeedInfo.seed([9CF59027DCD28E6D]:0) [junit4] 2> at org.junit.Assert.fail(Assert.java:93) [junit4] 2> at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._run(TestGeo3DPointField.java:624) [junit4] 2> at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run(TestGeo3DPointField.java:520) [junit4] 2> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeo3DPointField -Dtests.method=testRandomMedium -Dtests.seed=9CF59027DCD28E6D -Dtests.multiplier=5 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=ms -Dtests.timezone=Africa/Nouakchott -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 9.55s | TestGeo3DPointField.testRandomMedium <<< [junit4] > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=50, name=T3, state=RUNNABLE, group=TGRP-Test
          Hide
          Karl Wright added a comment -

          Michael McCandless: It seems like you drilled down to a point which BKD thought should be in the resultset but which geo3d says should not. How can this happen? Specifically, what information from geo3d must be incorrect for this situation to occur?

          Show
          Karl Wright added a comment - Michael McCandless : It seems like you drilled down to a point which BKD thought should be in the resultset but which geo3d says should not. How can this happen? Specifically, what information from geo3d must be incorrect for this situation to occur?
          Hide
          ASF subversion and git services added a comment -

          Commit 1697064 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1697064 ]

          LUCENE-6699: add assert

          Show
          ASF subversion and git services added a comment - Commit 1697064 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1697064 ] LUCENE-6699 : add assert
          Hide
          Michael McCandless added a comment -

          Oh I see, the failure is opposite from before (expected=false but actual=true).

          This can happen if we incorrectly got CELL_INSIDE_SHAPE (GeoArea.CONTAINS from geo3d) when we asked the shape to compare itself to an x,y,z rect.

          I added a new assert to confirm this:

          java.lang.AssertionError
          	at __randomizedtesting.SeedInfo.seed([9CF59027DCD28E6D]:0)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.addAll(BKD3DTreeReader.java:153)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:191)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:317)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:317)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:293)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:307)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:283)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:317)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:293)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:268)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:258)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:293)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:307)
          	at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:115)
          	at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1.scorer(PointInGeo3DShapeQuery.java:114)
          	at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorer(LRUQueryCache.java:581)
          	at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135)
          	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
          	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69)
          	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
          	at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:92)
          	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:425)
          	at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._run(TestGeo3DPointField.java:586)
          	at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run(TestGeo3DPointField.java:520)
          
          Show
          Michael McCandless added a comment - Oh I see, the failure is opposite from before (expected=false but actual=true). This can happen if we incorrectly got CELL_INSIDE_SHAPE (GeoArea.CONTAINS from geo3d) when we asked the shape to compare itself to an x,y,z rect. I added a new assert to confirm this: java.lang.AssertionError at __randomizedtesting.SeedInfo.seed([9CF59027DCD28E6D]:0) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.addAll(BKD3DTreeReader.java:153) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:191) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:317) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:317) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:293) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:307) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:283) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:317) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:293) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:268) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:258) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:293) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:307) at org.apache.lucene.bkdtree3d.BKD3DTreeReader.intersect(BKD3DTreeReader.java:115) at org.apache.lucene.bkdtree3d.PointInGeo3DShapeQuery$1.scorer(PointInGeo3DShapeQuery.java:114) at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorer(LRUQueryCache.java:581) at org.apache.lucene.search.Weight.bulkScorer(Weight.java:135) at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69) at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:69) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618) at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:92) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:425) at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4._run(TestGeo3DPointField.java:586) at org.apache.lucene.bkdtree3d.TestGeo3DPointField$4.run(TestGeo3DPointField.java:520)
          Hide
          Karl Wright added a comment -

          This is the "accept" that is confirming that the shape contains the point:

                                                     @Override
                                                     public boolean accept(int docID) {
                                                       //System.out.println("  accept? docID=" + docID);
                                                       BytesRef bytes = treeDV.get(docID);
                                                       if (bytes == null) {
                                                         //System.out.println("    false (null)");
                                                         return false;
                                                       }
          
                                                       assert bytes.length == 12;
                                                       double x = Geo3DDocValuesFormat.decodeValue(Geo3DDocValuesFormat.readInt(bytes.bytes, bytes.offset));
                                                       double y = Geo3DDocValuesFormat.decodeValue(Geo3DDocValuesFormat.readInt(bytes.bytes, bytes.offset+4));
                                                       double z = Geo3DDocValuesFormat.decodeValue(Geo3DDocValuesFormat.readInt(bytes.bytes, bytes.offset+8));
                                                       // True if x,y,z is within shape
                                                       //System.out.println("    x=" + x + " y=" + y + " z=" + z);
                                                       //System.out.println("    ret: " + shape.isWithin(x, y, z));
          
                                                       return shape.isWithin(x, y, z);
                                                     }
          
          

          The "accept" is apparently returning false when you evaluated the point earlier to be within the XYZ bounds of ... well, what exactly? What was the actual bit of information in this case from geo3d that is wrong? Also remember that the point is not exact, because there is imprecision at a level greater than MINIMUM_RESOLUTION, so that it may well be inside the GeoArea but not inside the actual shape, due to rounding issues. We should figure out a way to determine if that is in fact what is happening.

          Show
          Karl Wright added a comment - This is the "accept" that is confirming that the shape contains the point: @Override public boolean accept( int docID) { // System .out.println( " accept? docID=" + docID); BytesRef bytes = treeDV.get(docID); if (bytes == null ) { // System .out.println( " false ( null )" ); return false ; } assert bytes.length == 12; double x = Geo3DDocValuesFormat.decodeValue(Geo3DDocValuesFormat.readInt(bytes.bytes, bytes.offset)); double y = Geo3DDocValuesFormat.decodeValue(Geo3DDocValuesFormat.readInt(bytes.bytes, bytes.offset+4)); double z = Geo3DDocValuesFormat.decodeValue(Geo3DDocValuesFormat.readInt(bytes.bytes, bytes.offset+8)); // True if x,y,z is within shape // System .out.println( " x=" + x + " y=" + y + " z=" + z); // System .out.println( " ret: " + shape.isWithin(x, y, z)); return shape.isWithin(x, y, z); } The "accept" is apparently returning false when you evaluated the point earlier to be within the XYZ bounds of ... well, what exactly? What was the actual bit of information in this case from geo3d that is wrong? Also remember that the point is not exact, because there is imprecision at a level greater than MINIMUM_RESOLUTION, so that it may well be inside the GeoArea but not inside the actual shape, due to rounding issues. We should figure out a way to determine if that is in fact what is happening.
          Hide
          Michael McCandless added a comment -

          OK I extracted some info from the failure (see the attached patch, but it makes tons of output!):

          This is the query it's running:

          Thread[T2,5,TGRP-TestGeo3DPointField]: TEST: iter=342 shape=GeoCircle: {planetmodel=PlanetModel.WGS84, center=[lat=-0.006450320645814321, lon=0.004660694205115142], radius=0.00489710732634323(0.28058358162206176)}
            using query: PointInGeo3DShapeQuery: field=point:PlanetModel: PlanetModel.WGS84 Shape: GeoCircle: {planetmodel=PlanetModel.WGS84, center=[lat=-0.006450320645814321, lon=0.004660694205115142], radius=0.00489710732634323(0.28058358162206176)}
          

          Then, while BKD is recursing, it hits a point where this cell is
          supposedly (incorrectly) fully contained in the query shape:

          Thread[T2,5,TGRP-TestGeo3DPointField]: switch to addAll at cell x=1.0010822580620098 to 1.0010945779732867 y=0.007079167343247293 to 0.007541006774427837 z=-0.0021855011220022575 to -0.001896122718181518
          

          But then this docID fails the new assert (is not within the query shape):

          T2: FAILED: docID=1123
            accept docID=1123 point: x=1.0010893045436076 y=0.007380935180644008 z=-0.002140671370616495
          
          Show
          Michael McCandless added a comment - OK I extracted some info from the failure (see the attached patch, but it makes tons of output!): This is the query it's running: Thread[T2,5,TGRP-TestGeo3DPointField]: TEST: iter=342 shape=GeoCircle: {planetmodel=PlanetModel.WGS84, center=[lat=-0.006450320645814321, lon=0.004660694205115142], radius=0.00489710732634323(0.28058358162206176)} using query: PointInGeo3DShapeQuery: field=point:PlanetModel: PlanetModel.WGS84 Shape: GeoCircle: {planetmodel=PlanetModel.WGS84, center=[lat=-0.006450320645814321, lon=0.004660694205115142], radius=0.00489710732634323(0.28058358162206176)} Then, while BKD is recursing, it hits a point where this cell is supposedly (incorrectly) fully contained in the query shape: Thread[T2,5,TGRP-TestGeo3DPointField]: switch to addAll at cell x=1.0010822580620098 to 1.0010945779732867 y=0.007079167343247293 to 0.007541006774427837 z=-0.0021855011220022575 to -0.001896122718181518 But then this docID fails the new assert (is not within the query shape): T2: FAILED: docID=1123 accept docID=1123 point: x=1.0010893045436076 y=0.007380935180644008 z=-0.002140671370616495
          Hide
          Karl Wright added a comment -

          Thanks, looking now to see what's happening.

          Show
          Karl Wright added a comment - Thanks, looking now to see what's happening.
          Hide
          Karl Wright added a comment -

          Ok, I wrote this test code to reproduce the problem simply:

              // Sixth BKD discovered failure
              p1 = new GeoPoint(1.0010893045436076,0.007380935180644008,-0.002140671370616495);
              c = new GeoCircle(PlanetModel.WGS84,-0.006450320645814321,0.004660694205115142,0.00489710732634323);
              //xyzb = new XYZBounds();
              //c.getBounds(xyzb);
              //System.err.println("xmin="+xyzb.getMinimumX()+", xmax="+xyzb.getMaximumX()+",ymin="+xyzb.getMinimumY()+", ymax="+xyzb.getMaximumY()+",zmin="+xyzb.getMinimumZ()+", zmax="+xyzb.getMaximumZ());
              //xmin=1.0010356621420726, xmax=1.0011141249179447,ymin=-2.5326643901354566E-4, ymax=0.009584741915757169,zmin=-0.011359874956269283, zmax=-0.0015549504447452225
              area = GeoAreaFactory.makeGeoArea(PlanetModel.WGS84,1.0010822580620098,1.0010945779732867,0.007079167343247293,0.007541006774427837,-0.0021855011220022575,-0.001896122718181518);
              assertTrue(!c.isWithin(p1));
              assertTrue(GeoArea.CONTAINS == area.getRelationship(c));
              assertTrue(PlanetModel.WGS84.pointOnSurface(p1)); // This fails
              assertTrue(!area.isWithin(p1)); // This fails
          

          Note that the bounds for the GeoCircle are in fact outside the XYZsolid that you are searching. So CONTAINS is not unreasonable. (I'll make certain of this shortly). Also note that the point being considered is not technically on the surface, as we discussed. This has definite effects on the math. Any GeoArea is defined as being the intersection of the shape with the planet surface, so the behavior of points off that surface are outside the formal geo3d contract. Since xyzsolid has a lot of space that isn't on the surface, it's pretty likely that the point in question is off the surface in a way that allows it to fit inside the xyzsolid but is too far inside to be on the correct side of the circle plane. When I confirm that the XYZSolid really is within the GeoCircle, by the terms of the GeoArea contract, then we'll know that that is what is going on.

          Off-surface (x,y,z) values are not fatal but it is something we have to consider carefully and explicitly compensate for. The simplest way is to return them, even though they technically fail shape membership, which I think is what your code does with the asserts.

          Stay tuned.

          Show
          Karl Wright added a comment - Ok, I wrote this test code to reproduce the problem simply: // Sixth BKD discovered failure p1 = new GeoPoint(1.0010893045436076,0.007380935180644008,-0.002140671370616495); c = new GeoCircle(PlanetModel.WGS84,-0.006450320645814321,0.004660694205115142,0.00489710732634323); //xyzb = new XYZBounds(); //c.getBounds(xyzb); // System .err.println( "xmin=" +xyzb.getMinimumX()+ ", xmax=" +xyzb.getMaximumX()+ ",ymin=" +xyzb.getMinimumY()+ ", ymax=" +xyzb.getMaximumY()+ ",zmin=" +xyzb.getMinimumZ()+ ", zmax=" +xyzb.getMaximumZ()); //xmin=1.0010356621420726, xmax=1.0011141249179447,ymin=-2.5326643901354566E-4, ymax=0.009584741915757169,zmin=-0.011359874956269283, zmax=-0.0015549504447452225 area = GeoAreaFactory.makeGeoArea(PlanetModel.WGS84,1.0010822580620098,1.0010945779732867,0.007079167343247293,0.007541006774427837,-0.0021855011220022575,-0.001896122718181518); assertTrue(!c.isWithin(p1)); assertTrue(GeoArea.CONTAINS == area.getRelationship(c)); assertTrue(PlanetModel.WGS84.pointOnSurface(p1)); // This fails assertTrue(!area.isWithin(p1)); // This fails Note that the bounds for the GeoCircle are in fact outside the XYZsolid that you are searching. So CONTAINS is not unreasonable. (I'll make certain of this shortly). Also note that the point being considered is not technically on the surface, as we discussed. This has definite effects on the math. Any GeoArea is defined as being the intersection of the shape with the planet surface, so the behavior of points off that surface are outside the formal geo3d contract. Since xyzsolid has a lot of space that isn't on the surface, it's pretty likely that the point in question is off the surface in a way that allows it to fit inside the xyzsolid but is too far inside to be on the correct side of the circle plane. When I confirm that the XYZSolid really is within the GeoCircle, by the terms of the GeoArea contract, then we'll know that that is what is going on. Off-surface (x,y,z) values are not fatal but it is something we have to consider carefully and explicitly compensate for. The simplest way is to return them, even though they technically fail shape membership, which I think is what your code does with the asserts. Stay tuned.
          Hide
          Karl Wright added a comment -

          Ok, Michael McCandless, I've confirmed this picture.

          The XYZ solid's edges of intersection with the planet have no interaction with the circle edge, and are all considered to be within the GeoCircle's area. So the relationship seems to be correct.

          I'll think on this overnight and make a final recommendation in the morning. Or maybe I better ask you this: if the system behaved exactly as it does now, what do you think the right result would be to return from the GeoPoint query?

          Show
          Karl Wright added a comment - Ok, Michael McCandless , I've confirmed this picture. The XYZ solid's edges of intersection with the planet have no interaction with the circle edge, and are all considered to be within the GeoCircle's area. So the relationship seems to be correct. I'll think on this overnight and make a final recommendation in the morning. Or maybe I better ask you this: if the system behaved exactly as it does now, what do you think the right result would be to return from the GeoPoint query?
          Hide
          Karl Wright added a comment - - edited

          Hmm, as a final check, I took the original point from the original failure, which is not adjusted and is therefore on the WGS84 surface. Unfortunately, that too fails in the same way:

              c = new GeoCircle(PlanetModel.WGS84,-0.006450320645814321,0.004660694205115142,0.00489710732634323);
              //xyzb = new XYZBounds();
              //c.getBounds(xyzb);
              //System.err.println("xmin="+xyzb.getMinimumX()+", xmax="+xyzb.getMaximumX()+",ymin="+xyzb.getMinimumY()+", ymax="+xyzb.getMaximumY()+",zmin="+xyzb.getMinimumZ()+", zmax="+xyzb.getMaximumZ());
              //xmin=1.0010356621420726, xmax=1.0011141249179447,ymin=-2.5326643901354566E-4, ymax=0.009584741915757169,zmin=-0.011359874956269283, zmax=-0.0015549504447452225
              area = GeoAreaFactory.makeGeoArea(PlanetModel.WGS84,1.0010822580620098,1.0010945779732867,0.007079167343247293,0.007541006774427837,-0.0021855011220022575,-0.001896122718181518);
              assertTrue(GeoArea.CONTAINS == area.getRelationship(c));
              p2 = new GeoPoint(PlanetModel.WGS84,-0.002164069780096702, 0.007505617500830066);
              assertTrue(PlanetModel.WGS84.pointOnSurface(p2));
              assertTrue(!c.isWithin(p2));
              assertTrue(!area.isWithin(p2)); // fails
          

          So there is something more subtle going on than I originally thought. Looking into it now.

          Show
          Karl Wright added a comment - - edited Hmm, as a final check, I took the original point from the original failure, which is not adjusted and is therefore on the WGS84 surface. Unfortunately, that too fails in the same way: c = new GeoCircle(PlanetModel.WGS84,-0.006450320645814321,0.004660694205115142,0.00489710732634323); //xyzb = new XYZBounds(); //c.getBounds(xyzb); // System .err.println( "xmin=" +xyzb.getMinimumX()+ ", xmax=" +xyzb.getMaximumX()+ ",ymin=" +xyzb.getMinimumY()+ ", ymax=" +xyzb.getMaximumY()+ ",zmin=" +xyzb.getMinimumZ()+ ", zmax=" +xyzb.getMaximumZ()); //xmin=1.0010356621420726, xmax=1.0011141249179447,ymin=-2.5326643901354566E-4, ymax=0.009584741915757169,zmin=-0.011359874956269283, zmax=-0.0015549504447452225 area = GeoAreaFactory.makeGeoArea(PlanetModel.WGS84,1.0010822580620098,1.0010945779732867,0.007079167343247293,0.007541006774427837,-0.0021855011220022575,-0.001896122718181518); assertTrue(GeoArea.CONTAINS == area.getRelationship(c)); p2 = new GeoPoint(PlanetModel.WGS84,-0.002164069780096702, 0.007505617500830066); assertTrue(PlanetModel.WGS84.pointOnSurface(p2)); assertTrue(!c.isWithin(p2)); assertTrue(!area.isWithin(p2)); // fails So there is something more subtle going on than I originally thought. Looking into it now.
          Hide
          Michael McCandless added a comment -

          I'm going to open a "continuation issue" here! It takes my browser too long to scroll through the many comments here ...

          Show
          Michael McCandless added a comment - I'm going to open a "continuation issue" here! It takes my browser too long to scroll through the many comments here ...
          Hide
          Michael McCandless added a comment -

          OK I opened LUCENE-6759, let's continue discussion there?

          Show
          Michael McCandless added a comment - OK I opened LUCENE-6759 , let's continue discussion there?
          Hide
          Karl Wright added a comment -

          Michael McCandless Here's the analysis, so far.

          I first enabled evaluation of all four points where the XYZSolid intersected the planet surface. As you can see, only one of them comes back as being inside the GeoCircle:

             [junit4]   2>  Point 1.0010913867774043 0.007079167343247293 -0.0021855011220022575: shape.isWithin? true; minx=9.128715394490783E-6, maxx=-3.191195882434883E-6, miny=0.0, maxy=-4.618394311805439E-4, minz=0.0, maxz=-2.893784038207395E-4
             [junit4]   2>  Point 1.0010919806760743 0.007079167343247293 -0.001896122718181518: shape.isWithin? false; minx=9.722614064511248E-6, maxx=-2.597297212414418E-6, miny=0.0, maxy=-4.618394311805439E-4, minz=2.893784038207395E-4, maxz=0.0
             [junit4]   2>  Point 1.001088014365874 0.007541006774427837 -0.0021855011220022575: shape.isWithin? false; minx=5.7563038642349795E-6, maxx=-6.563607412690686E-6, miny=4.618394311805439E-4, maxy=0.0, minz=0.0, maxz=-2.893784038207395E-4
             [junit4]   2>  Point 1.0010886082665449 0.007541006774427837 -0.001896122718181518: shape.isWithin? false; minx=6.35020453509938E-6, maxx=-5.969706741826286E-6, miny=4.618394311805439E-4, maxy=0.0, minz=2.893784038207395E-4, maxz=0.0
          

          If the above is an accurate picture, then there should be intersections between the GeoCircle and two of the edge planes.

          miny should intersect:

             [junit4]   2>  Point 1.0010913867774043 0.007079167343247293 -0.0021855011220022575: shape.isWithin? true; minx=9.128715394490783E-6, maxx=-3.191195882434883E-6, miny=0.0, maxy=-4.618394311805439E-4, minz=0.0, maxz=-2.893784038207395E-4
             [junit4]   2>  Point 1.0010919806760743 0.007079167343247293 -0.001896122718181518: shape.isWithin? false; minx=9.722614064511248E-6, maxx=-2.597297212414418E-6, miny=0.0, maxy=-4.618394311805439E-4, minz=2.893784038207395E-4, maxz=0.0
          

          And, minz should intersect:

             [junit4]   2>  Point 1.0010913867774043 0.007079167343247293 -0.0021855011220022575: shape.isWithin? true; minx=9.128715394490783E-6, maxx=-3.191195882434883E-6, miny=0.0, maxy=-4.618394311805439E-4, minz=0.0, maxz=-2.893784038207395E-4
             [junit4]   2>  Point 1.001088014365874 0.007541006774427837 -0.0021855011220022575: shape.isWithin? false; minx=5.7563038642349795E-6, maxx=-6.563607412690686E-6, miny=4.618394311805439E-4, maxy=0.0, minz=0.0, maxz=-2.893784038207395E-4
          

          These two intersections are not being detected, and after much careful analysis, I concluded that the reason that they are not being detected is because no intersection actually happens. Looking at the miny plane:

             [junit4]   2> Checking for intersections that should be found...
             [junit4]   2>  Not identical plane
             [junit4]   2> Looking for intersection between plane [A=-0.9999680546313309, B=-0.0046605790633783275, C=0.006493744653569968, D=1.0011065916522879, side=-1.0] and plane [A=0.0, B=1.0, C=0.0, D=-0.007079167343247293, side=1.0] within bounds
             [junit4]   2>  Two points of intersection
             [junit4]   2>   [X=1.0010359045488204, Y=0.0070791673432472925, Z=-0.010729178478687706] this=(0.0) q=(-8.673617379884035E-19), and [X=1.0010913867758835, Y=0.0070791673432472925, Z=-0.0021855018140558226] this=(0.0) q=(-8.673617379884035E-19)
          

          Two points of intersection are detected, but both are outside the X or Z bounds of the XYZSolid, so they do not represent intersection.

          So, how can this be? Well, the reason for the discrepancy is because the first point of the four mentioned at the top is, in fact, not really inside the GeoCircle. It is coming up as being inside the GeoCircle only because of the fact that we've increased MINIMUM_RESOLUTION from its original value of 1e-12:

             [junit4]   2> circlePlane eval = 2.9731772599461692E-12
          

          So the problem is that ONE measure of error (point within GeoCircle) disagrees with another measure of error (intersection points in or out of XYZSolid), leading to an incorrect assessment.

          This is obviously going to be challenging to address. I may need to introduce two distinct error bounds in order for this logic to be robust. But I have to think it through carefully.

          Show
          Karl Wright added a comment - Michael McCandless Here's the analysis, so far. I first enabled evaluation of all four points where the XYZSolid intersected the planet surface. As you can see, only one of them comes back as being inside the GeoCircle: [junit4] 2> Point 1.0010913867774043 0.007079167343247293 -0.0021855011220022575: shape.isWithin? true ; minx=9.128715394490783E-6, maxx=-3.191195882434883E-6, miny=0.0, maxy=-4.618394311805439E-4, minz=0.0, maxz=-2.893784038207395E-4 [junit4] 2> Point 1.0010919806760743 0.007079167343247293 -0.001896122718181518: shape.isWithin? false ; minx=9.722614064511248E-6, maxx=-2.597297212414418E-6, miny=0.0, maxy=-4.618394311805439E-4, minz=2.893784038207395E-4, maxz=0.0 [junit4] 2> Point 1.001088014365874 0.007541006774427837 -0.0021855011220022575: shape.isWithin? false ; minx=5.7563038642349795E-6, maxx=-6.563607412690686E-6, miny=4.618394311805439E-4, maxy=0.0, minz=0.0, maxz=-2.893784038207395E-4 [junit4] 2> Point 1.0010886082665449 0.007541006774427837 -0.001896122718181518: shape.isWithin? false ; minx=6.35020453509938E-6, maxx=-5.969706741826286E-6, miny=4.618394311805439E-4, maxy=0.0, minz=2.893784038207395E-4, maxz=0.0 If the above is an accurate picture, then there should be intersections between the GeoCircle and two of the edge planes. miny should intersect: [junit4] 2> Point 1.0010913867774043 0.007079167343247293 -0.0021855011220022575: shape.isWithin? true ; minx=9.128715394490783E-6, maxx=-3.191195882434883E-6, miny=0.0, maxy=-4.618394311805439E-4, minz=0.0, maxz=-2.893784038207395E-4 [junit4] 2> Point 1.0010919806760743 0.007079167343247293 -0.001896122718181518: shape.isWithin? false ; minx=9.722614064511248E-6, maxx=-2.597297212414418E-6, miny=0.0, maxy=-4.618394311805439E-4, minz=2.893784038207395E-4, maxz=0.0 And, minz should intersect: [junit4] 2> Point 1.0010913867774043 0.007079167343247293 -0.0021855011220022575: shape.isWithin? true ; minx=9.128715394490783E-6, maxx=-3.191195882434883E-6, miny=0.0, maxy=-4.618394311805439E-4, minz=0.0, maxz=-2.893784038207395E-4 [junit4] 2> Point 1.001088014365874 0.007541006774427837 -0.0021855011220022575: shape.isWithin? false ; minx=5.7563038642349795E-6, maxx=-6.563607412690686E-6, miny=4.618394311805439E-4, maxy=0.0, minz=0.0, maxz=-2.893784038207395E-4 These two intersections are not being detected, and after much careful analysis, I concluded that the reason that they are not being detected is because no intersection actually happens. Looking at the miny plane: [junit4] 2> Checking for intersections that should be found... [junit4] 2> Not identical plane [junit4] 2> Looking for intersection between plane [A=-0.9999680546313309, B=-0.0046605790633783275, C=0.006493744653569968, D=1.0011065916522879, side=-1.0] and plane [A=0.0, B=1.0, C=0.0, D=-0.007079167343247293, side=1.0] within bounds [junit4] 2> Two points of intersection [junit4] 2> [X=1.0010359045488204, Y=0.0070791673432472925, Z=-0.010729178478687706] this =(0.0) q=(-8.673617379884035E-19), and [X=1.0010913867758835, Y=0.0070791673432472925, Z=-0.0021855018140558226] this =(0.0) q=(-8.673617379884035E-19) Two points of intersection are detected, but both are outside the X or Z bounds of the XYZSolid, so they do not represent intersection. So, how can this be? Well, the reason for the discrepancy is because the first point of the four mentioned at the top is, in fact, not really inside the GeoCircle. It is coming up as being inside the GeoCircle only because of the fact that we've increased MINIMUM_RESOLUTION from its original value of 1e-12: [junit4] 2> circlePlane eval = 2.9731772599461692E-12 So the problem is that ONE measure of error (point within GeoCircle) disagrees with another measure of error (intersection points in or out of XYZSolid), leading to an incorrect assessment. This is obviously going to be challenging to address. I may need to introduce two distinct error bounds in order for this logic to be robust. But I have to think it through carefully.
          Hide
          Karl Wright added a comment -

          Ok, I'll move my latest comment over.

          Show
          Karl Wright added a comment - Ok, I'll move my latest comment over.
          Hide
          ASF subversion and git services added a comment -

          Commit 1697386 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1697386 ]

          LUCENE-6699: MINIMUM_RESOLUTION back to 1e-12

          Show
          ASF subversion and git services added a comment - Commit 1697386 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1697386 ] LUCENE-6699 : MINIMUM_RESOLUTION back to 1e-12
          Hide
          ASF subversion and git services added a comment -

          Commit 1697388 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1697388 ]

          LUCENE-6699: woops

          Show
          ASF subversion and git services added a comment - Commit 1697388 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1697388 ] LUCENE-6699 : woops
          Hide
          ASF subversion and git services added a comment -

          Commit 1697469 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1697469 ]

          LUCENE-6699: iterate

          Show
          ASF subversion and git services added a comment - Commit 1697469 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1697469 ] LUCENE-6699 : iterate
          Hide
          ASF subversion and git services added a comment -

          Commit 1697517 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1697517 ]

          LUCENE-6699: more fudge

          Show
          ASF subversion and git services added a comment - Commit 1697517 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1697517 ] LUCENE-6699 : more fudge
          Hide
          ASF subversion and git services added a comment -

          Commit 1697865 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1697865 ]

          LUCENE-6699: comment out assert

          Show
          ASF subversion and git services added a comment - Commit 1697865 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1697865 ] LUCENE-6699 : comment out assert
          Hide
          ASF subversion and git services added a comment -

          Commit 1698004 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1698004 ]

          LUCENE-6699: increase fudge factor; don't check hits if quantization changed the expected result

          Show
          ASF subversion and git services added a comment - Commit 1698004 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1698004 ] LUCENE-6699 : increase fudge factor; don't check hits if quantization changed the expected result
          Hide
          ASF subversion and git services added a comment -

          Commit 1698157 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1698157 ]

          LUCENE-6699: too-tiny GeoCircles now throw IllegalArgumentException

          Show
          ASF subversion and git services added a comment - Commit 1698157 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1698157 ] LUCENE-6699 : too-tiny GeoCircles now throw IllegalArgumentException
          Hide
          ASF subversion and git services added a comment -

          Commit 1698186 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1698186 ]

          LUCENE-6699: ensure the BKD cell is rounded up/down to correctly handle quantization

          Show
          ASF subversion and git services added a comment - Commit 1698186 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1698186 ] LUCENE-6699 : ensure the BKD cell is rounded up/down to correctly handle quantization
          Hide
          ASF subversion and git services added a comment -

          Commit 1698202 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1698202 ]

          LUCENE-6699: clip the min/max in the global bbox to fall within the planet model; add nocommit

          Show
          ASF subversion and git services added a comment - Commit 1698202 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1698202 ] LUCENE-6699 : clip the min/max in the global bbox to fall within the planet model; add nocommit
          Hide
          ASF subversion and git services added a comment -

          Commit 1698232 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1698232 ]

          LUCENE-6699: store planet's max in the index, and validate at search time

          Show
          ASF subversion and git services added a comment - Commit 1698232 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1698232 ] LUCENE-6699 : store planet's max in the index, and validate at search time
          Hide
          ASF subversion and git services added a comment -

          Commit 1698321 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1698321 ]

          LUCENE-6699: fix encode to round instead of truncate to reduce quantization error; fix quantixed bbox decode

          Show
          ASF subversion and git services added a comment - Commit 1698321 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1698321 ] LUCENE-6699 : fix encode to round instead of truncate to reduce quantization error; fix quantixed bbox decode
          Hide
          ASF subversion and git services added a comment -

          Commit 1698454 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1698454 ]

          LUCENE-6699: add test case

          Show
          ASF subversion and git services added a comment - Commit 1698454 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1698454 ] LUCENE-6699 : add test case
          Hide
          ASF subversion and git services added a comment -

          Commit 1698455 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1698455 ]

          LUCENE-6699: fix rename

          Show
          ASF subversion and git services added a comment - Commit 1698455 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1698455 ] LUCENE-6699 : fix rename
          Hide
          ASF subversion and git services added a comment -

          Commit 1700049 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1700049 ]

          LUCENE-6699: iterate

          Show
          ASF subversion and git services added a comment - Commit 1700049 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1700049 ] LUCENE-6699 : iterate
          Hide
          ASF subversion and git services added a comment -

          Commit 1700102 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1700102 ]

          LUCENE-6699: iterate

          Show
          ASF subversion and git services added a comment - Commit 1700102 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1700102 ] LUCENE-6699 : iterate
          Hide
          ASF subversion and git services added a comment -

          Commit 1700165 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1700165 ]

          LUCENE-6699: randomize other shapes too; fix a nocommit

          Show
          ASF subversion and git services added a comment - Commit 1700165 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1700165 ] LUCENE-6699 : randomize other shapes too; fix a nocommit
          Hide
          ASF subversion and git services added a comment -

          Commit 1700313 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1700313 ]

          LUCENE-6699: make encode/decodeValue symmetric

          Show
          ASF subversion and git services added a comment - Commit 1700313 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1700313 ] LUCENE-6699 : make encode/decodeValue symmetric
          Hide
          ASF subversion and git services added a comment -

          Commit 1700450 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1700450 ]

          LUCENE-6699: increase bounds fudge factor

          Show
          ASF subversion and git services added a comment - Commit 1700450 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1700450 ] LUCENE-6699 : increase bounds fudge factor
          Hide
          ASF subversion and git services added a comment -

          Commit 1700457 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1700457 ]

          LUCENE-6699: only print verbose details of the failing iter

          Show
          ASF subversion and git services added a comment - Commit 1700457 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1700457 ] LUCENE-6699 : only print verbose details of the failing iter
          Hide
          ASF subversion and git services added a comment -

          Commit 1700459 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1700459 ]

          LUCENE-6699: include the shape in the failure message

          Show
          ASF subversion and git services added a comment - Commit 1700459 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1700459 ] LUCENE-6699 : include the shape in the failure message
          Hide
          ASF subversion and git services added a comment -

          Commit 1700587 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1700587 ]

          LUCENE-6699: increase fudge factor further

          Show
          ASF subversion and git services added a comment - Commit 1700587 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1700587 ] LUCENE-6699 : increase fudge factor further
          Hide
          ASF subversion and git services added a comment -

          Commit 1700788 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1700788 ]

          LUCENE-6699: remove nocommits

          Show
          ASF subversion and git services added a comment - Commit 1700788 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1700788 ] LUCENE-6699 : remove nocommits
          Hide
          ASF subversion and git services added a comment -

          Commit 1700800 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1700800 ]

          LUCENE-6699: merge trunk

          Show
          ASF subversion and git services added a comment - Commit 1700800 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1700800 ] LUCENE-6699 : merge trunk
          Hide
          ASF subversion and git services added a comment -

          Commit 1700820 from Michael McCandless in branch 'dev/branches/lucene6699'
          [ https://svn.apache.org/r1700820 ]

          LUCENE-6699: fix precommit

          Show
          ASF subversion and git services added a comment - Commit 1700820 from Michael McCandless in branch 'dev/branches/lucene6699' [ https://svn.apache.org/r1700820 ] LUCENE-6699 : fix precommit
          Hide
          ASF subversion and git services added a comment -

          Commit 1700883 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1700883 ]

          LUCENE-6699: add geo3d + BKD for fast, accurate earth-surface point-in-shape queries

          Show
          ASF subversion and git services added a comment - Commit 1700883 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1700883 ] LUCENE-6699 : add geo3d + BKD for fast, accurate earth-surface point-in-shape queries
          Hide
          ASF subversion and git services added a comment -

          Commit 1700887 from Michael McCandless in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1700887 ]

          LUCENE-6699: add geo3d + BKD for fast, accurate earth-surface point-in-shape queries

          Show
          ASF subversion and git services added a comment - Commit 1700887 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1700887 ] LUCENE-6699 : add geo3d + BKD for fast, accurate earth-surface point-in-shape queries
          Hide
          Michael McCandless added a comment -

          Thanks Karl Wright!

          Show
          Michael McCandless added a comment - Thanks Karl Wright !
          Hide
          ASF subversion and git services added a comment -

          Commit 1703537 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1703537 ]

          LUCENE-6699: fix silly test bug

          Show
          ASF subversion and git services added a comment - Commit 1703537 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1703537 ] LUCENE-6699 : fix silly test bug
          Hide
          ASF subversion and git services added a comment -

          Commit 1703538 from Michael McCandless in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1703538 ]

          LUCENE-6699: fix silly test bug

          Show
          ASF subversion and git services added a comment - Commit 1703538 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1703538 ] LUCENE-6699 : fix silly test bug
          Hide
          Terry Smith added a comment -

          Karl, were you able to find that packing scheme? I'm interested in poking the x,y,z values into a SortedNumericDocValuesField to see how well it would perform.

          Show
          Terry Smith added a comment - Karl, were you able to find that packing scheme? I'm interested in poking the x,y,z values into a SortedNumericDocValuesField to see how well it would perform.
          Hide
          Karl Wright added a comment -

          no time, I'm afraid...

          Show
          Karl Wright added a comment - no time, I'm afraid...
          Show
          Michael McCandless added a comment - Terry Smith maybe it's this comment? https://issues.apache.org/jira/browse/LUCENE-6480?focusedCommentId=14543396&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14543396
          Hide
          Nicholas Knize added a comment -

          Terry Smith I updated the Geo3DPacking code some time ago to avoid the overhead of BitSet and use raw morton bit twiddling. The intent is to use it in LUCENE-6480. Since that issue has stalled a bit I went ahead and attached the standalone class (with benchmarks) to the LUCENE-6480 issue if you're interested in tinkering.

          Show
          Nicholas Knize added a comment - Terry Smith I updated the Geo3DPacking code some time ago to avoid the overhead of BitSet and use raw morton bit twiddling. The intent is to use it in LUCENE-6480 . Since that issue has stalled a bit I went ahead and attached the standalone class (with benchmarks) to the LUCENE-6480 issue if you're interested in tinkering.
          Hide
          Terry Smith added a comment -

          Thanks guys. I was hoping to squeeze those x,y,z values into a 64 bits instead of 96. I'm not a bit twiddler but I'll take a look at Nicholas' patch and see if I can adapt it.

          Show
          Terry Smith added a comment - Thanks guys. I was hoping to squeeze those x,y,z values into a 64 bits instead of 96. I'm not a bit twiddler but I'll take a look at Nicholas' patch and see if I can adapt it.
          Hide
          Nicholas Knize added a comment -

          I started by packing all 3 values into a 64 bit long - in fact those MAGIC numbers are still there (MAGIC[7:12]). The problem with this is precision loss from compressing each value into 21 bits. The decoded values will give, at best, 3 decimal precision accuracy - which is ~110m at the equator. If you're fine with course grain accuracy then copy the BitUtil.

          {interleave | deinterleave}

          methods and use MAGIC[7:12]. The code is much simpler with the accuracy trade-off.

          Show
          Nicholas Knize added a comment - I started by packing all 3 values into a 64 bit long - in fact those MAGIC numbers are still there (MAGIC [7:12] ). The problem with this is precision loss from compressing each value into 21 bits. The decoded values will give, at best, 3 decimal precision accuracy - which is ~110m at the equator. If you're fine with course grain accuracy then copy the BitUtil. {interleave | deinterleave} methods and use MAGIC [7:12] . The code is much simpler with the accuracy trade-off.

            People

            • Assignee:
              Michael McCandless
              Reporter:
              Michael McCandless
            • Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development