Lucene - Core
LUCENE-5779

Improve BBox AreaSimilarity algorithm to consider lines and points

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10, 6.0
    • Component/s: modules/spatial
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      GeoPortal's area-overlap algorithm didn't consider lines and points; they end up turning the score to 0. I've thought about this for a bit and have come up with an alternative scoring algorithm (already coded, tested, and documented):
      New Javadocs:

      /**
       * The algorithm is implemented as envelope on envelope overlays rather than
       * complex polygon on complex polygon overlays.
       * <p/>
       * Spatial relevance scoring algorithm:
       * <DL>
       *   <DT>queryArea</DT> <DD>the area of the input query envelope</DD>
       *   <DT>targetArea</DT> <DD>the area of the target envelope (per Lucene document)</DD>
       *   <DT>intersectionArea</DT> <DD>the area of the intersection between the query and target envelopes</DD>
       *   <DT>queryTargetProportion</DT> <DD>A 0-1 factor that divides the score proportion between query and target.
       *   0.5 splits it evenly.</DD>
       *
       *   <DT>queryRatio</DT> <DD>intersectionArea / queryArea; (see note)</DD>
       *   <DT>targetRatio</DT> <DD>intersectionArea / targetArea; (see note)</DD>
       *   <DT>queryFactor</DT> <DD>queryRatio * queryTargetProportion;</DD>
       *   <DT>targetFactor</DT> <DD>targetRatio * (1 - queryTargetProportion);</DD>
       *   <DT>score</DT> <DD>queryFactor + targetFactor;</DD>
       * </DL>
       * Note: The actual computation of queryRatio and targetRatio is more complicated so that it considers
       * points and lines. Lines have the ratio of overlap, and points are either 1.0 or 0.0 depending on whether
       * they intersect or not.
       * <p />
       * Based on Geoportal's
       * <a href="http://geoportal.svn.sourceforge.net/svnroot/geoportal/Geoportal/trunk/src/com/esri/gpt/catalog/lucene/SpatialRankingValueSource.java">
       *   SpatialRankingValueSource</a> but modified. GeoPortal's algorithm will yield a score of 0
       * if either a line or point is compared, and it doesn't output a 0-1 normalized score (it multiplies the factors).
       *
       * @lucene.experimental
       */
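
The scoring rules in the Javadoc above can be sketched in plain Java. This is a minimal illustration, not Lucene's actual AreaSimilarity or BBoxStrategy code: the class name `OverlapScoreSketch`, the `Rect` record and the `ratio`/`score` helpers are hypothetical, and the sketch assumes non-geodetic rectangles that do not cross the dateline. Lines and points are modeled as rectangles with zero width and/or height.

```java
// Hypothetical sketch of the overlap-ratio score described in the Javadoc.
// Assumes axis-aligned rectangles that do NOT cross the dateline.
public class OverlapScoreSketch {

    /** Minimal rectangle; a line or point is a rectangle with zero width and/or height. */
    record Rect(double minX, double maxX, double minY, double maxY) {
        double width()  { return maxX - minX; }
        double height() { return maxY - minY; }
        double area()   { return width() * height(); }
    }

    /**
     * Overlap relative to the shape's own extent: area ratio for rectangles,
     * length ratio for lines, and 1.0 for a point (the caller has already
     * established that the shapes intersect).
     */
    static double ratio(Rect shape, double interWidth, double interHeight) {
        if (shape.area() > 0) {
            return (interWidth * interHeight) / shape.area();
        } else if (shape.height() > 0) {   // vertical line
            return interHeight / shape.height();
        } else if (shape.width() > 0) {    // horizontal line
            return interWidth / shape.width();
        } else {                           // point that intersects
            return 1.0;
        }
    }

    /** score = queryRatio * proportion + targetRatio * (1 - proportion). */
    static double score(Rect query, Rect target, double queryTargetProportion) {
        double interWidth  = Math.min(query.maxX(), target.maxX()) - Math.max(query.minX(), target.minX());
        double interHeight = Math.min(query.maxY(), target.maxY()) - Math.max(query.minY(), target.minY());
        if (interWidth < 0 || interHeight < 0) {
            return 0.0;                    // the shapes do not intersect at all
        }
        double queryFactor  = ratio(query, interWidth, interHeight) * queryTargetProportion;
        double targetFactor = ratio(target, interWidth, interHeight) * (1 - queryTargetProportion);
        return queryFactor + targetFactor;
    }

    public static void main(String[] args) {
        Rect query = new Rect(0, 10, 0, 10);
        System.out.println(score(query, new Rect(0, 10, 0, 10), 0.5)); // identical boxes
        System.out.println(score(query, new Rect(5, 15, 0, 10), 0.5)); // half overlap
        System.out.println(score(query, new Rect(5, 20, 5, 5), 0.5));  // horizontal line target
        System.out.println(score(query, new Rect(3, 3, 4, 4), 0.5));   // point target inside query
    }
}
```

Because the two weighted ratios are added rather than multiplied, a point target inside the query still scores 1 - queryTargetProportion instead of 0, and since each ratio lies in 0-1, so does the score.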
      

          Activity

          David Smiley added a comment -

          The attached patch is a partial patch from LUCENE-5714, including just the AreaSimilarity class and the new test for BBoxStrategy, which includes a test for this new similarity showing example scores. Developing it surfaced a variety of dateline-related bugs when computing intersection width & height.
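
          The dateline bugs mentioned above come from longitude ranges that wrap around ±180°. A naive min(maxX) - max(minX) for two boxes spanning 170..-170 and 175..-175 gives -175 - 175 = -350, i.e. "no intersection", although they overlap by 10°. The following is one simple way to handle this, by splitting wrapped ranges into non-wrapping pieces. It is not the patch's code: `DatelineWidthSketch` and its methods are hypothetical, and it assumes longitudes in [-180, 180] where minX > maxX means the range crosses the dateline.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: intersection width (degrees) of two longitude ranges.
// Assumes longitudes in [-180, 180]; minX > maxX means the range crosses the dateline.
public class DatelineWidthSketch {

    /** Splits a possibly dateline-crossing range into one or two non-crossing [min, max] pieces. */
    static List<double[]> split(double minX, double maxX) {
        List<double[]> parts = new ArrayList<>();
        if (minX <= maxX) {
            parts.add(new double[] {minX, maxX});
        } else {
            parts.add(new double[] {minX, 180});  // from minX east up to the dateline
            parts.add(new double[] {-180, maxX}); // from the dateline east to maxX
        }
        return parts;
    }

    /** Sums the overlaps of all piece pairs; returns -1 when no pair overlaps. */
    static double intersectionWidth(double aMinX, double aMaxX, double bMinX, double bMaxX) {
        double width = 0;
        boolean intersects = false;
        for (double[] a : split(aMinX, aMaxX)) {
            for (double[] b : split(bMinX, bMaxX)) {
                double overlap = Math.min(a[1], b[1]) - Math.max(a[0], b[0]);
                if (overlap >= 0) {
                    width += overlap;
                    intersects = true;
                }
            }
        }
        return intersects ? width : -1;
    }

    public static void main(String[] args) {
        System.out.println(intersectionWidth(-10, 10, 0, 20));       // ordinary overlap
        System.out.println(intersectionWidth(170, -170, 175, -175)); // both cross the dateline
        System.out.println(intersectionWidth(170, -170, -175, 0));   // only one crosses
    }
}
```

          Edge cases remain even here, which shows why this area is bug-prone: boxes that only touch at +180 and -180 are reported as disjoint by this sketch, because it treats those two values as distinct.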

          Ryan McKinley added a comment -

          +1 thanks for looking at this

          ASF subversion and git services added a comment -

          Commit 1606905 from David Smiley in branch 'dev/trunk'
          [ https://svn.apache.org/r1606905 ]

          LUCENE-5771: Remove BBoxStrategy's support for Overlaps because it never actually did work.

          This is a partial commit for this issue – just the BBox portion so as not to interfere with LUCENE-5779. Trunk only (bbox isn't in 4x yet).

          David Smiley added a comment -

          LUCENE-5714's latest patch addresses this issue. It includes a new minSideLength option for this algorithm, plus a new ShapeAreaValueSource, which is probably a better choice when your query is a point and you have indexed rects.
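
          Why a point query suits the overlap ratio poorly follows from the Javadoc's note: the query point's ratio is 1.0 whenever it intersects, and the target rectangle's ratio is intersectionArea / targetArea = 0, so every rectangle containing the point scores exactly queryTargetProportion. The sketch below illustrates this under the assumption that minSideLength is not in effect (how that option changes the result isn't described here). `PointQuerySketch` and its methods are hypothetical and are not the Lucene implementation. A per-document value such as each indexed shape's area can still tell such rectangles apart, for example to prefer smaller, more specific ones.

```java
// Hypothetical illustration (not Lucene code), ignoring any minSideLength adjustment:
// with a point query, the overlap-ratio score cannot tell containing rectangles apart,
// while their areas can.
public class PointQuerySketch {

    /** Overlap-ratio score of a point query against a positive-area rectangle target. */
    static double overlapScore(double px, double py,
                               double minX, double maxX, double minY, double maxY,
                               double queryTargetProportion) {
        boolean intersects = px >= minX && px <= maxX && py >= minY && py <= maxY;
        double queryRatio = intersects ? 1.0 : 0.0; // point: 1.0 or 0.0
        double targetRatio = 0.0;                   // zero-area intersection / positive target area
        return queryRatio * queryTargetProportion + targetRatio * (1 - queryTargetProportion);
    }

    /** Area of the target rectangle, usable as a per-document tie-breaker. */
    static double area(double minX, double maxX, double minY, double maxY) {
        return (maxX - minX) * (maxY - minY);
    }

    public static void main(String[] args) {
        // Both rectangles contain the point (5, 5): one small, one large.
        System.out.println(overlapScore(5, 5, 4, 6, 4, 6, 0.5));      // small rectangle
        System.out.println(overlapScore(5, 5, 0, 100, 0, 100, 0.5)); // large rectangle, same score
        System.out.println(area(4, 6, 4, 6));
        System.out.println(area(0, 100, 0, 100));
    }
}
```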

          ASF subversion and git services added a comment -

          Commit 1608793 from David Smiley in branch 'dev/trunk'
          [ https://svn.apache.org/r1608793 ]

          LUCENE-5714, LUCENE-5779: Enhance BBoxStrategy & Overlap similarity. Configurable docValues / index usage.
          Add new ShapeAreaValueSource.

          ASF subversion and git services added a comment -

          Commit 1608987 from David Smiley in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1608987 ]

          LUCENE-5714, LUCENE-5779: Enhance BBoxStrategy & Overlap similarity. Configurable docValues / index usage.
          Add new ShapeAreaValueSource.


            People

            • Assignee:
              David Smiley
              Reporter:
              David Smiley
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:
