Lucene - Core
  1. Lucene - Core
  2. LUCENE-2359

CartesianPolyFilterBuilder doesn't handle edge case around the 180 meridian

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.9, 2.9.1, 2.9.2, 3.0, 3.0.1
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: modules/spatial
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Test case:
      Points all around the globe, plus two points at 0, 179.9 and 0,-179.9 (on each side of the meridian). Then, do a Cartesian Tier filter on a point right near those two. It will return all the points when it should just return those two.

      The flawed logic is in the else clause below:

      if (longX2 != 0.0) {
      		//We are around the prime meridian
      		if (longX == 0.0) {
      			longX = longX2;
      			longY = 0.0;
              	shape = getShapeLoop(shape,ctp,latX,longX,latY,longY);
      		} else {//we are around the 180th longitude
      			longX = longX2;
      			longY = -180.0;
      			shape = getShapeLoop(shape,ctp,latY,longY,latX,longX);
      	}
      

      Basically, the Y and X values are transposed. This currently says go from longY (-180) all the way around to longX which is the lower left longitude of the box formed. Instead, it should go from the lower left long to -180.

      1. LUCENE-2359.patch
        12 kB
        Nicolas Helleringer
      2. LUCENE-2359.patch
        12 kB
        Nicolas Helleringer
      3. LUCENE-2359.patch
        19 kB
        Nicolas Helleringer
      4. LUCENE-2359.patch
        2 kB
        Grant Ingersoll
      5. TEST-2359.patch
        1 kB
        Nicolas Helleringer

        Issue Links

          Activity

          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1
          Hide
          Uwe Schindler added a comment -

          Resolving again as this issue will not be backported to 2.9/3.0 branches.

          Show
          Uwe Schindler added a comment - Resolving again as this issue will not be backported to 2.9/3.0 branches.
          Hide
          Nicolas Helleringer added a comment -

          Robert,

          Here is the patch updated to work on rev 1035291

          Validating Unit Tests are included

          Show
          Nicolas Helleringer added a comment - Robert, Here is the patch updated to work on rev 1035291 Validating Unit Tests are included
          Hide
          Robert Muir added a comment -

          reopening for possible 2.9.4/3.0.3 backport.

          Show
          Robert Muir added a comment - reopening for possible 2.9.4/3.0.3 backport.
          Hide
          Grant Ingersoll added a comment -

          Reverted the last patch and the other related ones. Let's have a discussion on the mailing list about coordinating all of this. I'd like to see the patches be focused on solving the specific issues and then we can open up a new issue for refactoring this to make for pluggable best fit, etc.

          Show
          Grant Ingersoll added a comment - Reverted the last patch and the other related ones. Let's have a discussion on the mailing list about coordinating all of this. I'd like to see the patches be focused on solving the specific issues and then we can open up a new issue for refactoring this to make for pluggable best fit, etc.
          Hide
          Nicolas Helleringer added a comment -

          Edit done.

          I am currently browsing to find a good reference.

          Show
          Nicolas Helleringer added a comment - Edit done. I am currently browsing to find a good reference.
          Hide
          Grant Ingersoll added a comment -

          So, you're saying then that your approach only ever has to retrieve 4 boxes no matter the radius? Do you have a reference URL to where we can read more about it?

          Also, please edit your table to reflect the error

          Show
          Grant Ingersoll added a comment - So, you're saying then that your approach only ever has to retrieve 4 boxes no matter the radius? Do you have a reference URL to where we can read more about it? Also, please edit your table to reflect the error
          Hide
          Nicolas Helleringer added a comment -

          Yonik,

          It is the case, but the points left out are for sure not in the search area.

          Grant,

          You were right ! It was a c&p error ! As you can see in the above table 'TileLength' for Tier 9 is 48,63671875 not 24,31835938 and then the 'new bestFit number of Box to fetch' becomes ... 4 ! =)

          Show
          Nicolas Helleringer added a comment - Yonik, It is the case, but the points left out are for sure not in the search area. Grant, You were right ! It was a c&p error ! As you can see in the above table 'TileLength' for Tier 9 is 48,63671875 not 24,31835938 and then the 'new bestFit number of Box to fetch' becomes ... 4 ! =)
          Hide
          Yonik Seeley added a comment -

          Perhaps I'm misreading the table? I had assumed that your new algorithm was often less selective (allowed more points through the filter) than the old. Is this not the case?

          Show
          Yonik Seeley added a comment - Perhaps I'm misreading the table? I had assumed that your new algorithm was often less selective (allowed more points through the filter) than the old. Is this not the case?
          Hide
          Nicolas Helleringer added a comment -

          hi Yonik,

          I do not aggre : as the 4 tiles requested go cover all the search area there is no gain in beeing less accurate.

          Show
          Nicolas Helleringer added a comment - hi Yonik, I do not aggre : as the 4 tiles requested go cover all the search area there is no gain in beeing less accurate.
          Hide
          Yonik Seeley added a comment -

          Hi Nicolas, I like the idea of reducing the number of tiles that need to be queried, but it does look like the current reduction might be a little aggressive for the default. Perhaps we could have some sort of filtering accuracy parameter that could give more precise control over the trade-off?

          Show
          Yonik Seeley added a comment - Hi Nicolas, I like the idea of reducing the number of tiles that need to be queried, but it does look like the current reduction might be a little aggressive for the default. Perhaps we could have some sort of filtering accuracy parameter that could give more precise control over the trade-off?
          Hide
          Nicolas Helleringer added a comment -

          I do agree, it is odd.

          I shall go through the process again and watch where it comes from.

          Show
          Nicolas Helleringer added a comment - I do agree, it is odd. I shall go through the process again and watch where it comes from.
          Hide
          Grant Ingersoll added a comment -

          Thanks, Nicolas. To me, based on these values, the answer is to revert and then refactor.

          Also, is the 9 in the last column of the second table (radius 25) an outlier or a c & p error? That seems really odd.

          Show
          Grant Ingersoll added a comment - Thanks, Nicolas. To me, based on these values, the answer is to revert and then refactor. Also, is the 9 in the last column of the second table (radius 25) an outlier or a c & p error? That seems really odd.
          Hide
          Nicolas Helleringer added a comment - - edited

          Summary tables :

          Tile Level TierLegnth TierBoxes TileXLength (miles)
          0 1 1 24902
          1 2 4 12451
          2 4 16 6225,5
          3 8 64 3112,75
          4 16 256 1556,375
          5 32 1024 778,1875
          6 64 4096 389,09375
          7 128 16384 194,546875
          8 256 65536 97,2734375
          9 512 262144 48,63671875
          10 1024 1048576 24,31835938
          11 2048 4194304 12,15917969
          12 4096 16777216 6,079589844
          13 8192 67108864 3,039794922
          14 16384 268435456 1,519897461
          15 32768 1073741824 0,75994873
          Radius (miles) legacy bestFit legacy bestFit TileLength legacy bestFit max number of Box to fetch new bestFit new bestFit TileLength new bestFit number of Box to fetch
          1 18 0,75994873 9 14 1,519897461 4
          5 16 0,75994873 64 12 6,079589844 4
          10 15 0,75994873 225 11 12,15917969 4
          25 13 3,039794922 100 9 48,63671875 4
          50 12 6,079589844 100 8 97,2734375 4
          100 11 12,15917969 100 7 194,546875 4
          250 10 24,31835938 144 6 389,09375 4
          500 9 48,63671875 144 5 778,1875 4
          1000 8 97,2734375 144 4 1556,375 4
          2500 7 194,546875 196 3 3112,75 4
          5000 6 389,09375 196 2 6225,5 4
          10000 5 778,1875 196 1 12451 4
          Show
          Nicolas Helleringer added a comment - - edited Summary tables : Tile Level TierLegnth TierBoxes TileXLength (miles) 0 1 1 24902 1 2 4 12451 2 4 16 6225,5 3 8 64 3112,75 4 16 256 1556,375 5 32 1024 778,1875 6 64 4096 389,09375 7 128 16384 194,546875 8 256 65536 97,2734375 9 512 262144 48,63671875 10 1024 1048576 24,31835938 11 2048 4194304 12,15917969 12 4096 16777216 6,079589844 13 8192 67108864 3,039794922 14 16384 268435456 1,519897461 15 32768 1073741824 0,75994873 Radius (miles) legacy bestFit legacy bestFit TileLength legacy bestFit max number of Box to fetch new bestFit new bestFit TileLength new bestFit number of Box to fetch 1 18 0,75994873 9 14 1,519897461 4 5 16 0,75994873 64 12 6,079589844 4 10 15 0,75994873 225 11 12,15917969 4 25 13 3,039794922 100 9 48,63671875 4 50 12 6,079589844 100 8 97,2734375 4 100 11 12,15917969 100 7 194,546875 4 250 10 24,31835938 144 6 389,09375 4 500 9 48,63671875 144 5 778,1875 4 1000 8 97,2734375 144 4 1556,375 4 2500 7 194,546875 196 3 3112,75 4 5000 6 389,09375 196 2 6225,5 4 10000 5 778,1875 196 1 12451 4
          Hide
          Nicolas Helleringer added a comment -

          What my code do :

          It looks how many times you can fit the search diameter (2.0d * range) into the distance that will be split into longitudes range (I-e distanceUnit.earthCircumference()).

          And then it takes the first biggest level of Tier that will have a range just above the search diameter (int bestFit = (int) Math.ceil(log2(times))

          This way you ll have the better comprise betwenn fetching the less number of boxes and not fetching too big boxes with too many documents in them.

          Show
          Nicolas Helleringer added a comment - What my code do : It looks how many times you can fit the search diameter (2.0d * range) into the distance that will be split into longitudes range (I-e distanceUnit.earthCircumference()). And then it takes the first biggest level of Tier that will have a range just above the search diameter (int bestFit = (int) Math.ceil(log2(times)) This way you ll have the better comprise betwenn fetching the less number of boxes and not fetching too big boxes with too many documents in them.
          Hide
          Grant Ingersoll added a comment -

          Nicolas,

          Why the change in the best fit algorithm? Do you have a reference to calculation of this?

          Show
          Grant Ingersoll added a comment - Nicolas, Why the change in the best fit algorithm? Do you have a reference to calculation of this?
          Hide
          Grant Ingersoll added a comment -

          Committed Nicolas' updates to trunk

          Show
          Grant Ingersoll added a comment - Committed Nicolas' updates to trunk
          Hide
          Nicolas Helleringer added a comment -

          Worked again around the code in CartesianPolyFilterBuilder and finally set up this test (attached).

          Here lies the difference between the two patchs : as I reworked all the logic I did change the code in getShapeLoop that you did not touch.

          Maybe it shall be adressed in another issue.

          Show
          Nicolas Helleringer added a comment - Worked again around the code in CartesianPolyFilterBuilder and finally set up this test (attached). Here lies the difference between the two patchs : as I reworked all the logic I did change the code in getShapeLoop that you did not touch. Maybe it shall be adressed in another issue.
          Hide
          Nicolas Helleringer added a comment - - edited

          Re checked and you are right.

          I read too fast your patch without the global code context.
          With your fix the code is doing the right thing concerning the prime meridian.

          So, mine should only be considered as a cosmetic one.
          If you do no find that the logic is clearer with it, you can ignore it.

          Show
          Nicolas Helleringer added a comment - - edited Re checked and you are right. I read too fast your patch without the global code context. With your fix the code is doing the right thing concerning the prime meridian. So, mine should only be considered as a cosmetic one. If you do no find that the logic is clearer with it, you can ignore it.
          Hide
          Grant Ingersoll added a comment -

          AFAICT, that is covered by the current code.

          Show
          Grant Ingersoll added a comment - AFAICT, that is covered by the current code.
          Hide
          Nicolas Helleringer added a comment -
              if (longUpperRight < longLowerLeft) { // Box cross the 180 meridian
                addBoxes(shape, ctp, latLowerLeft, longLowerLeft, latUpperRight, LatLng.LONGITUDE_DEGREE_RANGE / 2);
                addBoxes(shape, ctp, latLowerLeft, -LatLng.LONGITUDE_DEGREE_RANGE / 2, latUpperRight, longUpperRight);
              } else {
                addBoxes(shape, ctp, latLowerLeft, longLowerLeft, latUpperRight, longUpperRight);
          

          If the search area cross the prime meridian we spilt the addBoxes call in two : one half East of the meridian, one half West to handle the jump from -180 degree to 180
          If not we do a normal addBoxes call

          Show
          Nicolas Helleringer added a comment - if (longUpperRight < longLowerLeft) { // Box cross the 180 meridian addBoxes(shape, ctp, latLowerLeft, longLowerLeft, latUpperRight, LatLng.LONGITUDE_DEGREE_RANGE / 2); addBoxes(shape, ctp, latLowerLeft, -LatLng.LONGITUDE_DEGREE_RANGE / 2, latUpperRight, longUpperRight); } else { addBoxes(shape, ctp, latLowerLeft, longLowerLeft, latUpperRight, longUpperRight); If the search area cross the prime meridian we spilt the addBoxes call in two : one half East of the meridian, one half West to handle the jump from -180 degree to 180 If not we do a normal addBoxes call
          Hide
          Grant Ingersoll added a comment -

          handles East AND West tiles when close to the prime meridian (either by 0 or by 180 values) where yours only catch East ones

          How so?

          Show
          Grant Ingersoll added a comment - handles East AND West tiles when close to the prime meridian (either by 0 or by 180 values) where yours only catch East ones How so?
          Hide
          Nicolas Helleringer added a comment -

          Here is the re worked patch, sorry for the formatting :s

          This patch :

          • handles East AND West tiles when close to the prime meridian (either by 0 or by 180 values) where yours only catch East ones
          • reworks the logic to have algorithm in the code logic instead of in variables values
          • gives variavbles full names to better understand the code logic
          • separates the logic of handling the prime meridian which stays in getBoxShape from the logic of add boxes to the shape in addBoxes : addBoxes is called one time in normal case and two times for search area crossing the prime meridian (half by half)

          Hope it better fits your need now

          Show
          Nicolas Helleringer added a comment - Here is the re worked patch, sorry for the formatting :s This patch : handles East AND West tiles when close to the prime meridian (either by 0 or by 180 values) where yours only catch East ones reworks the logic to have algorithm in the code logic instead of in variables values gives variavbles full names to better understand the code logic separates the logic of handling the prime meridian which stays in getBoxShape from the logic of add boxes to the shape in addBoxes : addBoxes is called one time in normal case and two times for search area crossing the prime meridian (half by half) Hope it better fits your need now
          Hide
          Grant Ingersoll added a comment -

          Nicolas,

          How does this compare to what I already committed? Also, can you put up a patch that doesn't have all the formatting changes and just has the bug fixes?

          Show
          Grant Ingersoll added a comment - Nicolas, How does this compare to what I already committed? Also, can you put up a patch that doesn't have all the formatting changes and just has the bug fixes?
          Hide
          Nicolas Helleringer added a comment -

          Here is a patch to import the CartesianPolyFilterBuilder:getBoxShape part of the code from the worked we have done with Chris @ GoogleCode
          I tried to keep it as minimal as possible

          Next step is to patch LatLngRectangle in the same way

          Show
          Nicolas Helleringer added a comment - Here is a patch to import the CartesianPolyFilterBuilder:getBoxShape part of the code from the worked we have done with Chris @ GoogleCode I tried to keep it as minimal as possible Next step is to patch LatLngRectangle in the same way
          Hide
          Grant Ingersoll added a comment -

          Committed revision 929956.

          Show
          Grant Ingersoll added a comment - Committed revision 929956.
          Hide
          Chris Male added a comment -

          Hi Nicolas,

          Sorry I hadn't seen that you had been working on this. I will create a patch based on your work so we can get this fixed up in the next day or two.

          Show
          Chris Male added a comment - Hi Nicolas, Sorry I hadn't seen that you had been working on this. I will create a patch based on your work so we can get this fixed up in the next day or two.
          Hide
          Nicolas Helleringer added a comment -

          If you want to check / compare code while waiting for the commit from Chris :

          http://code.google.com/p/spatial-search-lucene/source/browse/trunk/src/main/java/org/apache/lucene/spatial/tier/CartesianPolyFilterBuilder.java

          Regards,

          Nicolas

          Show
          Nicolas Helleringer added a comment - If you want to check / compare code while waiting for the commit from Chris : http://code.google.com/p/spatial-search-lucene/source/browse/trunk/src/main/java/org/apache/lucene/spatial/tier/CartesianPolyFilterBuilder.java Regards, Nicolas
          Hide
          Nicolas Helleringer added a comment -

          This is fixed in the pendig re work done by Chris Male

          Should be comitted in a whole soon

          Show
          Nicolas Helleringer added a comment - This is fixed in the pendig re work done by Chris Male Should be comitted in a whole soon
          Hide
          Grant Ingersoll added a comment -

          Fix. Will commit today or tomorrow. Going to double check logic again.

          Show
          Grant Ingersoll added a comment - Fix. Will commit today or tomorrow. Going to double check logic again.

            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              Grant Ingersoll
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development