Lucene - Core / LUCENE-9154

Remove encodeCeil() to encode bounding box queries

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Lucene Fields: New

    Description

      We currently have the following logic in LatLonPoint#newBoxQuery():

      // exact double values of lat=90.0D and lon=180.0D must be treated special as they are not represented in the encoding
      // and should not drag in extra bogus junk! TODO: should encodeCeil just throw ArithmeticException to be less trappy here?
      if (minLatitude == 90.0) {
        // range cannot match as 90.0 can never exist
        return new MatchNoDocsQuery("LatLonPoint.newBoxQuery with minLatitude=90.0");
      }
      if (minLongitude == 180.0) {
        if (maxLongitude == 180.0) {
          // range cannot match as 180.0 can never exist
          return new MatchNoDocsQuery("LatLonPoint.newBoxQuery with minLongitude=maxLongitude=180.0");
        } else if (maxLongitude < minLongitude) {
          // encodeCeil() with dateline wrapping!
          minLongitude = -180.0;
        }
      }
      byte[] lower = encodeCeil(minLatitude, minLongitude);
      byte[] upper = encode(maxLatitude, maxLongitude);
      

       

      IMO this is confusing and can lead to strange results. For example, a query with minLatitude = maxLatitude = 90 does not match points with latitude = 90. On the other hand, a query with minLatitude = maxLatitude = 89.99999996 will match points at latitude = 90.

      I don't really understand the statement "90.0 can never exist", as the same is true for all values > 89.99999995809048, which is the maximum quantized value. By this argument, it is true for every value that lies between quantized coordinates, since none of them exist in the index. Why is 90.0 so special? I guess because it cannot be rounded up (ceil) without overflowing the encoding.
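
The quantization argument above can be sketched in a few lines. This is a simplified stand-in, not the Lucene code: the class name, the step size of 180/2^32 degrees, and the floor/ceil rules are assumptions made for illustration.

```java
// Simplified stand-in for latitude quantization. The grid and the rounding
// rules below are assumptions for illustration, not the library code.
class LatitudeGridSketch {
    // Assumed grid: 2^32 cells spanning the 180 degrees of latitude.
    static final double STEP = 180.0 / (1L << 32);

    // Floor-based encoding, assumed to be what indexed points go through.
    static int encode(double lat) {
        if (lat == 90.0) {
            lat = Math.nextDown(lat); // 90.0 itself would overflow the int range
        }
        return (int) Math.floor(lat / STEP);
    }

    // Ceil-based encoding, assumed to be what the lower query bound goes through.
    static int encodeCeil(double lat) {
        if (lat == 90.0) {
            lat = Math.nextDown(lat);
        }
        // Java's double-to-int cast saturates at Integer.MAX_VALUE.
        return (int) Math.ceil(lat / STEP);
    }

    // Maps an encoded cell back to the latitude of its lower edge.
    static double decode(int encoded) {
        return encoded * STEP;
    }

    public static void main(String[] args) {
        System.out.println("largest quantized latitude: " + decode(Integer.MAX_VALUE));
        System.out.println("encode(90.0) is the last cell: "
            + (encode(90.0) == Integer.MAX_VALUE));
        System.out.println("encodeCeil(89.99999996) is the last cell: "
            + (encodeCeil(89.99999996) == Integer.MAX_VALUE));
    }
}
```

Under these assumptions, a point at 90.0 and a lower bound of 89.99999996 both land in the last cell, so 90.0 is no more absent from the index than any other value between grid points.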

      Another argument for removing this function is that it opens the door to false negatives in the query results. If a query has minLat = 89.999999957, it won't match points with latitude = 89.999999957, because the query bound is rounded up to 89.99999995809048 while the indexed point is rounded down.
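
A minimal sketch of this false negative, using a latitude of 89.999999957 on the latitude grid. The class name, the step size of 180/2^32 degrees, and the one-dimensional `matches` helper are hypothetical and stand in for the real query logic.

```java
// Illustrates how a ceil-rounded lower bound can exclude a point with the
// same raw value. All names and the grid definition are assumptions.
class BoxLowerBoundSketch {
    // Assumed grid: 2^32 cells spanning the 180 degrees of latitude.
    static final double STEP = 180.0 / (1L << 32);

    // Floor-based encoding, assumed for indexed points and upper bounds.
    static int encode(double lat) {
        if (lat == 90.0) {
            lat = Math.nextDown(lat); // avoid overflowing the int range
        }
        return (int) Math.floor(lat / STEP);
    }

    // Ceil-based encoding, assumed for the lower bound of a box query.
    static int encodeCeil(double lat) {
        if (lat == 90.0) {
            lat = Math.nextDown(lat);
        }
        return (int) Math.ceil(lat / STEP);
    }

    // One-dimensional stand-in for the box test: compares encoded values only.
    static boolean matches(int lower, int upper, double pointLat) {
        int point = encode(pointLat);
        return lower <= point && point <= upper;
    }

    public static void main(String[] args) {
        double lat = 89.999999957;
        int upper = encode(90.0);
        boolean withCeil = matches(encodeCeil(lat), upper, lat);
        boolean withFloor = matches(encode(lat), upper, lat);
        System.out.println("ceil lower bound matches the point:  " + withCeil);
        System.out.println("floor lower bound matches the point: " + withFloor);
    }
}
```

Under these assumptions the ceil-based bound lands one cell above the indexed point and misses it, while encoding the lower bound with the same floor rule as the point matches it.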

      The only merit I can see in the current approach is that if you only index points that are already quantized, then all queries are exact. But does it make sense for someone to index only quantized values and then query with non-quantized bounding boxes?

       

      I hope I am missing something, but my proposal is to remove encodeCeil altogether, along with all the special handling at the positive pole and the positive dateline.

       

            People

              Assignee: Unassigned
              Reporter: Ignacio Vera (ivera)

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 20m