Solr
  1. Solr
  2. SOLR-2268

Add support for Point in Polygon searches

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0
    • Component/s: None
    • Labels:
      None

      Description

      In spatial applications, it is common to ask whether a point is inside of a polygon. Solr could support two forms of this:

      1. A field contains a polygon and the user supplies a point. If it does, the doc is returned.
      2. A document contains a point and the user supplies a polygon. If the point is in the polygon, return the document

      With both of these case, it would be good to support the negative assertion, too.

      1. SOLR-2268.patch
        42 kB
        Grant Ingersoll

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          Closed after release.

          Show
          Uwe Schindler added a comment - Closed after release.
          Hide
          David Smiley added a comment -

          SOLR-3304 is committed, and consequently there is now polygon support; I'm closing this issue.

          Polygons and various other shapes are implemented by Spatial4j (ASL licensed lib) used by the new Lucene 4 spatial module. Points, rectangles, and circles are implemented directly by Spatial4j whereas JTS is used for polygons and other shapes supported in the WKT format. Spatial4j wraps JTS to add geospatial awareness – notably dateline wrap. Pole wrap is not yet supported. The new SpatialRecursivePrefixTreeFieldType added to Solr can index any Spatial4j shape and query by them as well.

          I've got a preliminary wiki page on this but I need to update it and see how to integrate this content with the rest of the wiki. http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4

          Show
          David Smiley added a comment - SOLR-3304 is committed, and consequently there is now polygon support; I'm closing this issue. Polygons and various other shapes are implemented by Spatial4j (ASL licensed lib) used by the new Lucene 4 spatial module. Points, rectangles, and circles are implemented directly by Spatial4j whereas JTS is used for polygons and other shapes supported in the WKT format. Spatial4j wraps JTS to add geospatial awareness – notably dateline wrap. Pole wrap is not yet supported. The new SpatialRecursivePrefixTreeFieldType added to Solr can index any Spatial4j shape and query by them as well. I've got a preliminary wiki page on this but I need to update it and see how to integrate this content with the rest of the wiki. http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
          Hide
          David Smiley added a comment -

          In a day or two you'll see a patch to SOLR-3304 (Add Solr support for the new Lucene Spatial Module). Then you'll be able to use polygons (note: triggers need for JTS jar to be on classpath). It's still got the shortcomings I mentioned in my previous post, which alas, I didn't get to yet.

          Show
          David Smiley added a comment - In a day or two you'll see a patch to SOLR-3304 (Add Solr support for the new Lucene Spatial Module). Then you'll be able to use polygons (note: triggers need for JTS jar to be on classpath). It's still got the shortcomings I mentioned in my previous post, which alas, I didn't get to yet.
          Hide
          Kiran Sugana added a comment -

          Any update on this ?

          Show
          Kiran Sugana added a comment - Any update on this ?
          Hide
          David Smiley added a comment -

          FWIW, this has been made top of my priority list for addition to LSP (which is Solr trunk only right now). It does polygons already but they are not geodetic – no dateline crossing, no pole wrapping, and Mercator projection is assumed (flat earth). My initial goal is simply adding dateline wrap which will make it useful for a majority of contexts which use the Mercator projection and can't scroll/pan beyond a pole any way (such as Google Maps, etc.). I expect this to be done in a couple weeks. Adding pole wrap and great-circle-distance lines (not mercator), will probably follow thereafter.

          Show
          David Smiley added a comment - FWIW, this has been made top of my priority list for addition to LSP (which is Solr trunk only right now). It does polygons already but they are not geodetic – no dateline crossing, no pole wrapping, and Mercator projection is assumed (flat earth). My initial goal is simply adding dateline wrap which will make it useful for a majority of contexts which use the Mercator projection and can't scroll/pan beyond a pole any way (such as Google Maps, etc.). I expect this to be done in a couple weeks. Adding pole wrap and great-circle-distance lines (not mercator), will probably follow thereafter.
          Hide
          Randy Jones added a comment -

          Another vote for this functionality. Any updates?

          Show
          Randy Jones added a comment - Another vote for this functionality. Any updates?
          Hide
          David Smiley added a comment -

          Thanks for the pointers Teun; I will consider them when I improve the polygon support in LSP. LSP supports WKT, in case you didn't know. And since June it supports indexing shapes (i.e. not just points), which AFAIK is the only Lucene/Solr implementation I know to do this. There should be a lot more progress soon.

          What do you mean by faceting on a polygon? Or did you mean faceting on "all places", but still, what do you mean by that?

          Show
          David Smiley added a comment - Thanks for the pointers Teun; I will consider them when I improve the polygon support in LSP. LSP supports WKT, in case you didn't know. And since June it supports indexing shapes (i.e. not just points), which AFAIK is the only Lucene/Solr implementation I know to do this. There should be a lot more progress soon. What do you mean by faceting on a polygon? Or did you mean faceting on "all places", but still, what do you mean by that?
          Hide
          Teun Duynstee added a comment -

          I don't want to interfere with the process on this issue, but I can give my perspective on this issue as a heavy user of both Solr and geospatial data (not coming from the GIS field though).

          In my opinion, the WKT format is a bit of a must if you want users of geospatial data to take your solution seriously. Yes, it is verbose, and I'm not sure how you should implement the actual storage, but you should be able to regenerate the WKT data. Defining polygons is a lot more complex than you'd think from the start. If you check out public datasets like the OpenStreetMap data, you'll see that it is common for a geo-shape to have several parts, holes and little islands inside the holes. All vendors of GIS tooling use this format (actually, they use many, but WKT is the one they all use). If you're interested in more concise ways of storing polygons, it could be worth looking into the work Google did (http://code.google.com/apis/maps/documentation/utilities/polylinealgorithm.html, although be aware that these are lossy algorithms). WKB is meant to be smaller and non-lossy.

          Think of al the scenarios that open up when you can combine Solr search with the output from other GIS-like systems. You could imagine calculating a polygon of all places that you can drive to within an hour and facet on that (in PostGIS on PostgresSQL, you can do routing on OpenStreetmap data using the PGRouting module). I am very excited about the work that is done on the Lucene Spatial Playground, wish I where capable enough to help.

          Show
          Teun Duynstee added a comment - I don't want to interfere with the process on this issue, but I can give my perspective on this issue as a heavy user of both Solr and geospatial data (not coming from the GIS field though). In my opinion, the WKT format is a bit of a must if you want users of geospatial data to take your solution seriously. Yes, it is verbose, and I'm not sure how you should implement the actual storage, but you should be able to regenerate the WKT data. Defining polygons is a lot more complex than you'd think from the start. If you check out public datasets like the OpenStreetMap data, you'll see that it is common for a geo-shape to have several parts, holes and little islands inside the holes. All vendors of GIS tooling use this format (actually, they use many, but WKT is the one they all use). If you're interested in more concise ways of storing polygons, it could be worth looking into the work Google did ( http://code.google.com/apis/maps/documentation/utilities/polylinealgorithm.html , although be aware that these are lossy algorithms). WKB is meant to be smaller and non-lossy. Think of al the scenarios that open up when you can combine Solr search with the output from other GIS-like systems. You could imagine calculating a polygon of all places that you can drive to within an hour and facet on that (in PostGIS on PostgresSQL, you can do routing on OpenStreetmap data using the PGRouting module). I am very excited about the work that is done on the Lucene Spatial Playground, wish I where capable enough to help.
          Hide
          Alexander Kanarsky added a comment -

          Robert,

          in addition to David's work (SOLR-2155) you can also try my polygon extension for the JTeam's (Chris Male et al) Spatial Solr Plugin (SSP). The SSP is a standalone Solr module that works with Solr 1.4.x, you can get it here:

          http://www.jteam.nl/products/spatialsolrplugin.html

          It allows Radius search for the lat,lon documents.

          The polygon/polyline extension for SSP 2.0 in addition to this allows polygon search. It is located here:

          http://sourceforge.net/projects/ssplex/files/

          There has some limitations (for example, plane geometry is used) but it may work just well for you, depending on your situation.

          Show
          Alexander Kanarsky added a comment - Robert, in addition to David's work ( SOLR-2155 ) you can also try my polygon extension for the JTeam's (Chris Male et al) Spatial Solr Plugin (SSP). The SSP is a standalone Solr module that works with Solr 1.4.x, you can get it here: http://www.jteam.nl/products/spatialsolrplugin.html It allows Radius search for the lat,lon documents. The polygon/polyline extension for SSP 2.0 in addition to this allows polygon search. It is located here: http://sourceforge.net/projects/ssplex/files/ There has some limitations (for example, plane geometry is used) but it may work just well for you, depending on your situation.
          Hide
          David Smiley added a comment -

          For a polygon query shape searching indexed point data, I have this working on SOLR-2155. The search won't wrap around the date line though.

          For a query searching indexed polygon data, there is work in progress that I expect will see the light of day in a month or two.

          Ryan McKinley, Chris Male, and I have joined forces to work on a joint effort for a Lucene/Solr geospatial module. It includes both polygon query shapes and indexed polygons. Presently the work is in progress at http://code.google.com/p/lucene-spatial-playground/ but it's only there temporarily. I'll try to remember to post a comment on this issue so that you and others are informed.

          Show
          David Smiley added a comment - For a polygon query shape searching indexed point data, I have this working on SOLR-2155 . The search won't wrap around the date line though. For a query searching indexed polygon data, there is work in progress that I expect will see the light of day in a month or two. Ryan McKinley, Chris Male, and I have joined forces to work on a joint effort for a Lucene/Solr geospatial module. It includes both polygon query shapes and indexed polygons. Presently the work is in progress at http://code.google.com/p/lucene-spatial-playground/ but it's only there temporarily. I'll try to remember to post a comment on this issue so that you and others are informed.
          Hide
          Robert Baillie added a comment -

          Is there any progress on this? We would very much be interested in this functionality...

          Show
          Robert Baillie added a comment - Is there any progress on this? We would very much be interested in this functionality...
          Hide
          Grant Ingersoll added a comment -

          This patch should in no way shape or form be seen as useful at this point. I'm mostly putting it up to document my work so far.

          Show
          Grant Ingersoll added a comment - This patch should in no way shape or form be seen as useful at this point. I'm mostly putting it up to document my work so far.
          Hide
          Grant Ingersoll added a comment -

          Useful link on the algorithms: http://www.linuxjournal.com/article/2029

          Mark: Mmm doughnuts! Interesting idea!

          Show
          Grant Ingersoll added a comment - Useful link on the algorithms: http://www.linuxjournal.com/article/2029 Mark: Mmm doughnuts! Interesting idea!
          Hide
          Mark Harwood added a comment -

          The extra parens can be used to let you add holes to the polygons.
          An example application is to offer users the option to "search further afield" having read through initial batch of search results. The subsequent search polygon is then effectively a doughnut shape with a hole that avoids returning results already seen from the first search.

          I know you can achieve the same effect with "NOT" clauses but it illustrates the flexibility on offer.

          Show
          Mark Harwood added a comment - The extra parens can be used to let you add holes to the polygons. An example application is to offer users the option to "search further afield" having read through initial batch of search results. The subsequent search polygon is then effectively a doughnut shape with a hole that avoids returning results already seen from the first search. I know you can achieve the same effect with "NOT" clauses but it illustrates the flexibility on offer.
          Hide
          Chris A. Mattmann added a comment -

          JTS is good, but it is LGPL.

          That's why we started Apache SIS [1].

          [1] http://incubator.apache.org/sis/

          Show
          Chris A. Mattmann added a comment - JTS is good, but it is LGPL. That's why we started Apache SIS [1] . [1] http://incubator.apache.org/sis/
          Hide
          Grant Ingersoll added a comment -

          The spatially enabled databases follow the "Simple Features Specification for SQL" spec which defines the ironically named "Well Known Text" format (WKT) for representing points polygons etc

          http://en.wikipedia.org/wiki/Well-known_text

          Hmm, seems a bit verbose. For instance, we will already know it is a poly by it's field type and the extra parens seem like they could easily be replaced. Then again, if it is a standard...

          Using any AWT class can introduce issues related to servers and headless mode. I'd recommend instead the JTS Java Topology Suite which has a lot of useful stuff - including WKT readers and writers

          http://goo.gl/IyNeD

          AWT won't work anyway, as it only takes ints. I've found a simple ASL Polygon class that I think does what we need it. JTS is good, but it is LGPL.

          Show
          Grant Ingersoll added a comment - The spatially enabled databases follow the "Simple Features Specification for SQL" spec which defines the ironically named "Well Known Text" format (WKT) for representing points polygons etc http://en.wikipedia.org/wiki/Well-known_text Hmm, seems a bit verbose. For instance, we will already know it is a poly by it's field type and the extra parens seem like they could easily be replaced. Then again, if it is a standard... Using any AWT class can introduce issues related to servers and headless mode. I'd recommend instead the JTS Java Topology Suite which has a lot of useful stuff - including WKT readers and writers http://goo.gl/IyNeD AWT won't work anyway, as it only takes ints. I've found a simple ASL Polygon class that I think does what we need it. JTS is good, but it is LGPL.
          Hide
          Mark Harwood added a comment -

          I would propose the following format: vertices are separated by semi-colons, points are separated by commas

          The spatially enabled databases follow the "Simple Features Specification for SQL" spec which defines the ironically named "Well Known Text" format (WKT) for representing points polygons etc
          http://en.wikipedia.org/wiki/Well-known_text

          For PointType, we can just use capabilities of java.awt.Polygon

          Using any AWT class can introduce issues related to servers and headless mode. I'd recommend instead the JTS Java Topology Suite which has a lot of useful stuff - including WKT readers and writers
          http://goo.gl/IyNeD

          Show
          Mark Harwood added a comment - I would propose the following format: vertices are separated by semi-colons, points are separated by commas The spatially enabled databases follow the "Simple Features Specification for SQL" spec which defines the ironically named "Well Known Text" format (WKT) for representing points polygons etc http://en.wikipedia.org/wiki/Well-known_text For PointType, we can just use capabilities of java.awt.Polygon Using any AWT class can introduce issues related to servers and headless mode. I'd recommend instead the JTS Java Topology Suite which has a lot of useful stuff - including WKT readers and writers http://goo.gl/IyNeD
          Hide
          Lance Norskog added a comment -

          A second variation: a multiValued field of vertex pairs which contain a polygon. The incoming point searches for vertex point. This is faster than the bitwise filter, but uses more space for larger polygons. The bitwise filter uses constant memory for each document.

          Show
          Lance Norskog added a comment - A second variation: a multiValued field of vertex pairs which contain a polygon. The incoming point searches for vertex point. This is faster than the bitwise filter, but uses more space for larger polygons. The bitwise filter uses constant memory for each document.
          Hide
          Lance Norskog added a comment - - edited

          1 trick for speeding up "document holds polygons", using vertex-based hashing of lat/long values. (It's a variation on a kind of bitwise filtering whose name I cannot remember: if the bit is off, there is no match, but if the bit is on there may be a match.)

          Master data: A field with one or more polygon descriptions.
          Bitwise data: Two bit fields, latitude&longitude, with a string of bits for each vertex. For example, given a Level Of Detail (LOD) of 1 degree, there would be 360 bits in either bitfield. The document would have one of each bitfield. Each degree's bit is true if any polygon has area within that bit's degree.

          The first phase of searching for point in all polygons is to check the latitude and longitude bitfields for that point.

          Show
          Lance Norskog added a comment - - edited 1 trick for speeding up "document holds polygons", using vertex-based hashing of lat/long values. (It's a variation on a kind of bitwise filtering whose name I cannot remember: if the bit is off, there is no match, but if the bit is on there may be a match.) Master data: A field with one or more polygon descriptions. Bitwise data: Two bit fields, latitude&longitude, with a string of bits for each vertex. For example, given a Level Of Detail (LOD) of 1 degree, there would be 360 bits in either bitfield. The document would have one of each bitfield. Each degree's bit is true if any polygon has area within that bit's degree. The first phase of searching for point in all polygons is to check the latitude and longitude bitfields for that point.
          Hide
          Grant Ingersoll added a comment -

          This is a work in progress. Here are a few ideas:
          I think this can all be accomplished via a few things:

          For the case where the field is a polygon and the user supplies a point, we need a new FieldType, PolygonType.

          I would propose the following format: vertices are separated by semi-colons, points are separated by commas just as they are for the other capabilities, i.e.: 1.0,1.0;0.0,0.0;3.0,3.0 gives the vertices <1.0,1.0> <0,0>, <3, 3>. Lines are assumed between each point. See the java.awt.Polygon class

          Next, I think we can cover everything else through some function queries:
          For case one above

          pip(pt, dimension, boost) -- pt can be a PointType or a Vector.  Boost says how much score to give if a point is in a polygon
          
          pipll(latlonPt, boost) -- Use spherical calculations to determine if the lat lon point is in the polygon, as it is laid on a sphere 
          //Note, we may just fold this into the one above, but I think the calculations could be different enough that we would want to avoid instanceof checks.  Plus the parsing is simpler
          

          For case two above, the user would pass in a polygon as defined above for the PolygonType. In this case, we still need a function query:

          pip(poly, boost) -- poly is the passed in polygon, boost is the value to give if the point is in a polygon
          

          For PointType, we can just use capabilities of java.awt.Polygon, for lat lon, I'm still investigating. It could be we still use Polygon, but maybe we can just scale it a little bit bigger and live with some error. Otherwise, there seems to be some decent algorithms for doing it w/ lat/lon (http://msdn.microsoft.com/en-us/library/cc451895.aspx for one). Not sure that one is practical at scale, but it could be a start.

          While we are at it, it shouldn't be that hard to do the same for lines, i.e. is the point on a line.

          Show
          Grant Ingersoll added a comment - This is a work in progress. Here are a few ideas: I think this can all be accomplished via a few things: For the case where the field is a polygon and the user supplies a point, we need a new FieldType, PolygonType. I would propose the following format: vertices are separated by semi-colons, points are separated by commas just as they are for the other capabilities, i.e.: 1.0,1.0;0.0,0.0;3.0,3.0 gives the vertices <1.0,1.0> <0,0>, <3, 3>. Lines are assumed between each point. See the java.awt.Polygon class Next, I think we can cover everything else through some function queries: For case one above pip(pt, dimension, boost) -- pt can be a PointType or a Vector. Boost says how much score to give if a point is in a polygon pipll(latlonPt, boost) -- Use spherical calculations to determine if the lat lon point is in the polygon, as it is laid on a sphere //Note, we may just fold this into the one above, but I think the calculations could be different enough that we would want to avoid instanceof checks. Plus the parsing is simpler For case two above, the user would pass in a polygon as defined above for the PolygonType. In this case, we still need a function query: pip(poly, boost) -- poly is the passed in polygon, boost is the value to give if the point is in a polygon For PointType, we can just use capabilities of java.awt.Polygon, for lat lon, I'm still investigating. It could be we still use Polygon, but maybe we can just scale it a little bit bigger and live with some error. Otherwise, there seems to be some decent algorithms for doing it w/ lat/lon ( http://msdn.microsoft.com/en-us/library/cc451895.aspx for one). Not sure that one is practical at scale, but it could be a start. While we are at it, it shouldn't be that hard to do the same for lines, i.e. is the point on a line.

            People

            • Assignee:
              David Smiley
              Reporter:
              Grant Ingersoll
            • Votes:
              7 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development