Add support for Point in Polygon searches

Details

• Type: New Feature
• Status: Closed
• Priority: Major
• Resolution: Fixed
• Affects Version/s: None
• Fix Version/s:
• Component/s: None
• Labels:
None

Description

In spatial applications, it is common to ask whether a point is inside of a polygon. Solr could support two forms of this:

1. A field contains a polygon and the user supplies a point. If it does, the doc is returned.
2. A document contains a point and the user supplies a polygon. If the point is in the polygon, return the document

With both of these case, it would be good to support the negative assertion, too.

Attachments

1. SOLR-2268.patch
42 kB
Grant Ingersoll

Activity

Hide
Grant Ingersoll added a comment -

This is a work in progress. Here are a few ideas:
I think this can all be accomplished via a few things:

For the case where the field is a polygon and the user supplies a point, we need a new FieldType, PolygonType.

I would propose the following format: vertices are separated by semi-colons, points are separated by commas just as they are for the other capabilities, i.e.: 1.0,1.0;0.0,0.0;3.0,3.0 gives the vertices <1.0,1.0> <0,0>, <3, 3>. Lines are assumed between each point. See the java.awt.Polygon class

Next, I think we can cover everything else through some function queries:
For case one above

```pip(pt, dimension, boost) -- pt can be a PointType or a Vector.  Boost says how much score to give if a point is in a polygon

pipll(latlonPt, boost) -- Use spherical calculations to determine if the lat lon point is in the polygon, as it is laid on a sphere
//Note, we may just fold this into the one above, but I think the calculations could be different enough that we would want to avoid instanceof checks.  Plus the parsing is simpler
```

For case two above, the user would pass in a polygon as defined above for the PolygonType. In this case, we still need a function query:

```pip(poly, boost) -- poly is the passed in polygon, boost is the value to give if the point is in a polygon
```

For PointType, we can just use capabilities of java.awt.Polygon, for lat lon, I'm still investigating. It could be we still use Polygon, but maybe we can just scale it a little bit bigger and live with some error. Otherwise, there seems to be some decent algorithms for doing it w/ lat/lon (http://msdn.microsoft.com/en-us/library/cc451895.aspx for one). Not sure that one is practical at scale, but it could be a start.

While we are at it, it shouldn't be that hard to do the same for lines, i.e. is the point on a line.

Show
Grant Ingersoll added a comment - This is a work in progress. Here are a few ideas: I think this can all be accomplished via a few things: For the case where the field is a polygon and the user supplies a point, we need a new FieldType, PolygonType. I would propose the following format: vertices are separated by semi-colons, points are separated by commas just as they are for the other capabilities, i.e.: 1.0,1.0;0.0,0.0;3.0,3.0 gives the vertices <1.0,1.0> <0,0>, <3, 3>. Lines are assumed between each point. See the java.awt.Polygon class Next, I think we can cover everything else through some function queries: For case one above pip(pt, dimension, boost) -- pt can be a PointType or a Vector. Boost says how much score to give if a point is in a polygon pipll(latlonPt, boost) -- Use spherical calculations to determine if the lat lon point is in the polygon, as it is laid on a sphere //Note, we may just fold this into the one above, but I think the calculations could be different enough that we would want to avoid instanceof checks. Plus the parsing is simpler For case two above, the user would pass in a polygon as defined above for the PolygonType. In this case, we still need a function query: pip(poly, boost) -- poly is the passed in polygon, boost is the value to give if the point is in a polygon For PointType, we can just use capabilities of java.awt.Polygon, for lat lon, I'm still investigating. It could be we still use Polygon, but maybe we can just scale it a little bit bigger and live with some error. Otherwise, there seems to be some decent algorithms for doing it w/ lat/lon ( http://msdn.microsoft.com/en-us/library/cc451895.aspx for one). Not sure that one is practical at scale, but it could be a start. While we are at it, it shouldn't be that hard to do the same for lines, i.e. is the point on a line.
Hide
Lance Norskog added a comment - - edited

1 trick for speeding up "document holds polygons", using vertex-based hashing of lat/long values. (It's a variation on a kind of bitwise filtering whose name I cannot remember: if the bit is off, there is no match, but if the bit is on there may be a match.)

Master data: A field with one or more polygon descriptions.
Bitwise data: Two bit fields, latitude&longitude, with a string of bits for each vertex. For example, given a Level Of Detail (LOD) of 1 degree, there would be 360 bits in either bitfield. The document would have one of each bitfield. Each degree's bit is true if any polygon has area within that bit's degree.

The first phase of searching for point in all polygons is to check the latitude and longitude bitfields for that point.

Show
Lance Norskog added a comment - - edited 1 trick for speeding up "document holds polygons", using vertex-based hashing of lat/long values. (It's a variation on a kind of bitwise filtering whose name I cannot remember: if the bit is off, there is no match, but if the bit is on there may be a match.) Master data: A field with one or more polygon descriptions. Bitwise data: Two bit fields, latitude&longitude, with a string of bits for each vertex. For example, given a Level Of Detail (LOD) of 1 degree, there would be 360 bits in either bitfield. The document would have one of each bitfield. Each degree's bit is true if any polygon has area within that bit's degree. The first phase of searching for point in all polygons is to check the latitude and longitude bitfields for that point.
Hide
Lance Norskog added a comment -

A second variation: a multiValued field of vertex pairs which contain a polygon. The incoming point searches for vertex point. This is faster than the bitwise filter, but uses more space for larger polygons. The bitwise filter uses constant memory for each document.

Show
Lance Norskog added a comment - A second variation: a multiValued field of vertex pairs which contain a polygon. The incoming point searches for vertex point. This is faster than the bitwise filter, but uses more space for larger polygons. The bitwise filter uses constant memory for each document.
Hide
Mark Harwood added a comment -

I would propose the following format: vertices are separated by semi-colons, points are separated by commas

The spatially enabled databases follow the "Simple Features Specification for SQL" spec which defines the ironically named "Well Known Text" format (WKT) for representing points polygons etc
http://en.wikipedia.org/wiki/Well-known_text

For PointType, we can just use capabilities of java.awt.Polygon

Using any AWT class can introduce issues related to servers and headless mode. I'd recommend instead the JTS Java Topology Suite which has a lot of useful stuff - including WKT readers and writers
http://goo.gl/IyNeD

Show
Mark Harwood added a comment - I would propose the following format: vertices are separated by semi-colons, points are separated by commas The spatially enabled databases follow the "Simple Features Specification for SQL" spec which defines the ironically named "Well Known Text" format (WKT) for representing points polygons etc http://en.wikipedia.org/wiki/Well-known_text For PointType, we can just use capabilities of java.awt.Polygon Using any AWT class can introduce issues related to servers and headless mode. I'd recommend instead the JTS Java Topology Suite which has a lot of useful stuff - including WKT readers and writers http://goo.gl/IyNeD
Hide
Grant Ingersoll added a comment -

The spatially enabled databases follow the "Simple Features Specification for SQL" spec which defines the ironically named "Well Known Text" format (WKT) for representing points polygons etc

http://en.wikipedia.org/wiki/Well-known_text

Hmm, seems a bit verbose. For instance, we will already know it is a poly by it's field type and the extra parens seem like they could easily be replaced. Then again, if it is a standard...

Using any AWT class can introduce issues related to servers and headless mode. I'd recommend instead the JTS Java Topology Suite which has a lot of useful stuff - including WKT readers and writers

http://goo.gl/IyNeD

AWT won't work anyway, as it only takes ints. I've found a simple ASL Polygon class that I think does what we need it. JTS is good, but it is LGPL.

Show
Grant Ingersoll added a comment - The spatially enabled databases follow the "Simple Features Specification for SQL" spec which defines the ironically named "Well Known Text" format (WKT) for representing points polygons etc http://en.wikipedia.org/wiki/Well-known_text Hmm, seems a bit verbose. For instance, we will already know it is a poly by it's field type and the extra parens seem like they could easily be replaced. Then again, if it is a standard... Using any AWT class can introduce issues related to servers and headless mode. I'd recommend instead the JTS Java Topology Suite which has a lot of useful stuff - including WKT readers and writers http://goo.gl/IyNeD AWT won't work anyway, as it only takes ints. I've found a simple ASL Polygon class that I think does what we need it. JTS is good, but it is LGPL.
Hide
Chris A. Mattmann added a comment -

JTS is good, but it is LGPL.

That's why we started Apache SIS [1].

Show
Chris A. Mattmann added a comment - JTS is good, but it is LGPL. That's why we started Apache SIS [1] . [1] http://incubator.apache.org/sis/
Hide
Mark Harwood added a comment -

The extra parens can be used to let you add holes to the polygons.
An example application is to offer users the option to "search further afield" having read through initial batch of search results. The subsequent search polygon is then effectively a doughnut shape with a hole that avoids returning results already seen from the first search.

I know you can achieve the same effect with "NOT" clauses but it illustrates the flexibility on offer.

Show
Mark Harwood added a comment - The extra parens can be used to let you add holes to the polygons. An example application is to offer users the option to "search further afield" having read through initial batch of search results. The subsequent search polygon is then effectively a doughnut shape with a hole that avoids returning results already seen from the first search. I know you can achieve the same effect with "NOT" clauses but it illustrates the flexibility on offer.
Hide
Grant Ingersoll added a comment -

Useful link on the algorithms: http://www.linuxjournal.com/article/2029

Mark: Mmm doughnuts! Interesting idea!

Show
Grant Ingersoll added a comment - Useful link on the algorithms: http://www.linuxjournal.com/article/2029 Mark: Mmm doughnuts! Interesting idea!
Hide
Grant Ingersoll added a comment -

This patch should in no way shape or form be seen as useful at this point. I'm mostly putting it up to document my work so far.

Show
Grant Ingersoll added a comment - This patch should in no way shape or form be seen as useful at this point. I'm mostly putting it up to document my work so far.
Hide
Robert Baillie added a comment -

Is there any progress on this? We would very much be interested in this functionality...

Show
Robert Baillie added a comment - Is there any progress on this? We would very much be interested in this functionality...
Hide
David Smiley added a comment -

For a polygon query shape searching indexed point data, I have this working on SOLR-2155. The search won't wrap around the date line though.

For a query searching indexed polygon data, there is work in progress that I expect will see the light of day in a month or two.

Ryan McKinley, Chris Male, and I have joined forces to work on a joint effort for a Lucene/Solr geospatial module. It includes both polygon query shapes and indexed polygons. Presently the work is in progress at http://code.google.com/p/lucene-spatial-playground/ but it's only there temporarily. I'll try to remember to post a comment on this issue so that you and others are informed.

Show
David Smiley added a comment - For a polygon query shape searching indexed point data, I have this working on SOLR-2155 . The search won't wrap around the date line though. For a query searching indexed polygon data, there is work in progress that I expect will see the light of day in a month or two. Ryan McKinley, Chris Male, and I have joined forces to work on a joint effort for a Lucene/Solr geospatial module. It includes both polygon query shapes and indexed polygons. Presently the work is in progress at http://code.google.com/p/lucene-spatial-playground/ but it's only there temporarily. I'll try to remember to post a comment on this issue so that you and others are informed.
Hide
Alexander Kanarsky added a comment -

Robert,

in addition to David's work (SOLR-2155) you can also try my polygon extension for the JTeam's (Chris Male et al) Spatial Solr Plugin (SSP). The SSP is a standalone Solr module that works with Solr 1.4.x, you can get it here:

It allows Radius search for the lat,lon documents.

The polygon/polyline extension for SSP 2.0 in addition to this allows polygon search. It is located here:

http://sourceforge.net/projects/ssplex/files/

There has some limitations (for example, plane geometry is used) but it may work just well for you, depending on your situation.

Show
Alexander Kanarsky added a comment - Robert, in addition to David's work ( SOLR-2155 ) you can also try my polygon extension for the JTeam's (Chris Male et al) Spatial Solr Plugin (SSP). The SSP is a standalone Solr module that works with Solr 1.4.x, you can get it here: http://www.jteam.nl/products/spatialsolrplugin.html It allows Radius search for the lat,lon documents. The polygon/polyline extension for SSP 2.0 in addition to this allows polygon search. It is located here: http://sourceforge.net/projects/ssplex/files/ There has some limitations (for example, plane geometry is used) but it may work just well for you, depending on your situation.
Hide
Teun Duynstee added a comment -

I don't want to interfere with the process on this issue, but I can give my perspective on this issue as a heavy user of both Solr and geospatial data (not coming from the GIS field though).

In my opinion, the WKT format is a bit of a must if you want users of geospatial data to take your solution seriously. Yes, it is verbose, and I'm not sure how you should implement the actual storage, but you should be able to regenerate the WKT data. Defining polygons is a lot more complex than you'd think from the start. If you check out public datasets like the OpenStreetMap data, you'll see that it is common for a geo-shape to have several parts, holes and little islands inside the holes. All vendors of GIS tooling use this format (actually, they use many, but WKT is the one they all use). If you're interested in more concise ways of storing polygons, it could be worth looking into the work Google did (http://code.google.com/apis/maps/documentation/utilities/polylinealgorithm.html, although be aware that these are lossy algorithms). WKB is meant to be smaller and non-lossy.

Think of al the scenarios that open up when you can combine Solr search with the output from other GIS-like systems. You could imagine calculating a polygon of all places that you can drive to within an hour and facet on that (in PostGIS on PostgresSQL, you can do routing on OpenStreetmap data using the PGRouting module). I am very excited about the work that is done on the Lucene Spatial Playground, wish I where capable enough to help.

Show
Teun Duynstee added a comment - I don't want to interfere with the process on this issue, but I can give my perspective on this issue as a heavy user of both Solr and geospatial data (not coming from the GIS field though). In my opinion, the WKT format is a bit of a must if you want users of geospatial data to take your solution seriously. Yes, it is verbose, and I'm not sure how you should implement the actual storage, but you should be able to regenerate the WKT data. Defining polygons is a lot more complex than you'd think from the start. If you check out public datasets like the OpenStreetMap data, you'll see that it is common for a geo-shape to have several parts, holes and little islands inside the holes. All vendors of GIS tooling use this format (actually, they use many, but WKT is the one they all use). If you're interested in more concise ways of storing polygons, it could be worth looking into the work Google did ( http://code.google.com/apis/maps/documentation/utilities/polylinealgorithm.html , although be aware that these are lossy algorithms). WKB is meant to be smaller and non-lossy. Think of al the scenarios that open up when you can combine Solr search with the output from other GIS-like systems. You could imagine calculating a polygon of all places that you can drive to within an hour and facet on that (in PostGIS on PostgresSQL, you can do routing on OpenStreetmap data using the PGRouting module). I am very excited about the work that is done on the Lucene Spatial Playground, wish I where capable enough to help.
Hide
David Smiley added a comment -

Thanks for the pointers Teun; I will consider them when I improve the polygon support in LSP. LSP supports WKT, in case you didn't know. And since June it supports indexing shapes (i.e. not just points), which AFAIK is the only Lucene/Solr implementation I know to do this. There should be a lot more progress soon.

What do you mean by faceting on a polygon? Or did you mean faceting on "all places", but still, what do you mean by that?

Show
David Smiley added a comment - Thanks for the pointers Teun; I will consider them when I improve the polygon support in LSP. LSP supports WKT, in case you didn't know. And since June it supports indexing shapes (i.e. not just points), which AFAIK is the only Lucene/Solr implementation I know to do this. There should be a lot more progress soon. What do you mean by faceting on a polygon? Or did you mean faceting on "all places", but still, what do you mean by that?
Hide
Randy Jones added a comment -

Another vote for this functionality. Any updates?

Show
Randy Jones added a comment - Another vote for this functionality. Any updates?
Hide
David Smiley added a comment -

FWIW, this has been made top of my priority list for addition to LSP (which is Solr trunk only right now). It does polygons already but they are not geodetic – no dateline crossing, no pole wrapping, and Mercator projection is assumed (flat earth). My initial goal is simply adding dateline wrap which will make it useful for a majority of contexts which use the Mercator projection and can't scroll/pan beyond a pole any way (such as Google Maps, etc.). I expect this to be done in a couple weeks. Adding pole wrap and great-circle-distance lines (not mercator), will probably follow thereafter.

Show
David Smiley added a comment - FWIW, this has been made top of my priority list for addition to LSP (which is Solr trunk only right now). It does polygons already but they are not geodetic – no dateline crossing, no pole wrapping, and Mercator projection is assumed (flat earth). My initial goal is simply adding dateline wrap which will make it useful for a majority of contexts which use the Mercator projection and can't scroll/pan beyond a pole any way (such as Google Maps, etc.). I expect this to be done in a couple weeks. Adding pole wrap and great-circle-distance lines (not mercator), will probably follow thereafter.
Hide
Kiran Sugana added a comment -

Any update on this ?

Show
Kiran Sugana added a comment - Any update on this ?
Hide
David Smiley added a comment -

In a day or two you'll see a patch to SOLR-3304 (Add Solr support for the new Lucene Spatial Module). Then you'll be able to use polygons (note: triggers need for JTS jar to be on classpath). It's still got the shortcomings I mentioned in my previous post, which alas, I didn't get to yet.

Show
David Smiley added a comment - In a day or two you'll see a patch to SOLR-3304 (Add Solr support for the new Lucene Spatial Module). Then you'll be able to use polygons (note: triggers need for JTS jar to be on classpath). It's still got the shortcomings I mentioned in my previous post, which alas, I didn't get to yet.
Hide
David Smiley added a comment -

SOLR-3304 is committed, and consequently there is now polygon support; I'm closing this issue.

Polygons and various other shapes are implemented by Spatial4j (ASL licensed lib) used by the new Lucene 4 spatial module. Points, rectangles, and circles are implemented directly by Spatial4j whereas JTS is used for polygons and other shapes supported in the WKT format. Spatial4j wraps JTS to add geospatial awareness – notably dateline wrap. Pole wrap is not yet supported. The new SpatialRecursivePrefixTreeFieldType added to Solr can index any Spatial4j shape and query by them as well.

I've got a preliminary wiki page on this but I need to update it and see how to integrate this content with the rest of the wiki. http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4

Show
David Smiley added a comment - SOLR-3304 is committed, and consequently there is now polygon support; I'm closing this issue. Polygons and various other shapes are implemented by Spatial4j (ASL licensed lib) used by the new Lucene 4 spatial module. Points, rectangles, and circles are implemented directly by Spatial4j whereas JTS is used for polygons and other shapes supported in the WKT format. Spatial4j wraps JTS to add geospatial awareness – notably dateline wrap. Pole wrap is not yet supported. The new SpatialRecursivePrefixTreeFieldType added to Solr can index any Spatial4j shape and query by them as well. I've got a preliminary wiki page on this but I need to update it and see how to integrate this content with the rest of the wiki. http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
Hide
Uwe Schindler added a comment -

Closed after release.

Show
Uwe Schindler added a comment - Closed after release.

People

• Assignee:
David Smiley
Reporter:
Grant Ingersoll