I completely re-did this with a summer intern, Liviy Ambrose. It's similar to the first approach but simpler, and it isn't based on it. Unlike the first patch, it does not modify any of the existing benchmark code (aside from build.xml, of course). I intend to enhance the benchmark code under separate issues so that this patch can focus on just spatial benchmarking.
The build.xml grabs a tab-separated values file from geonames.org, which contains millions of latitude/longitude points. I want to take a snapshot (for reproducible tests), randomize the line order, and put it on http://people.apache.org/~dsmiley/. Additionally, Spatial4j's tests have a file containing a WKT-formatted polygon for many countries. I want to host that as well, in a format readable by LineDocSource.
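As a rough illustration of the "randomize the line order" step, here is a minimal sketch of a reproducible shuffle. It is not part of the patch: the class name LineShuffler and the seed value are hypothetical, and it assumes the whole file fits in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not in the patch): shuffles lines reproducibly. */
class LineShuffler {

  /** Returns a shuffled copy; the same input and seed always give the same order. */
  static List<String> shuffle(List<String> lines, long seed) {
    List<String> copy = new ArrayList<>(lines);
    // A fixed seed makes the "random" order repeatable across runs.
    Collections.shuffle(copy, new Random(seed));
    return copy;
  }

  public static void main(String[] args) {
    // Made-up stand-ins for lines of the tab-separated file.
    List<String> lines = Arrays.asList("a\t1", "b\t2", "c\t3", "d\t4");
    System.out.println(shuffle(lines, 42L));
  }
}
```

Shuffling once and hosting the result means every benchmark run sees the same line order, which is the point of taking a snapshot.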
Source files (only 3):
- GeonamesLineParser.java: This is designed for use with LineDocSource. Geonames.org data comes in a tab-separated value file.
- SpatialDocMaker.java: This class is key.
  - It holds a reference to the Lucene SpatialStrategy, which it configures from the algorithm file, mostly via factories. It's possible to test quite a variety of spatial configurations, although it does assume RecursivePrefixTree.
  - This DocMaker is specialized to convert the shape-formatted string in the body field into a Shape object to be indexed. It also has a configurable ShapeConverter to optionally convert a point to a circle or bounding box.
- SpatialFileQueryMaker.java: Instead of hard-coded queries (as seen in other non-spatial tests), it configures a private LineDocSource instance and reads shapes from it to use as spatial queries. For now you'd use it with GeonamesLineParser. Furthermore, it re-uses SpatialDocMaker's ShapeConverter so that the points can then become circle or rectangle queries.
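To make the data flow above concrete, here is a minimal standalone sketch of pulling a point out of a Geonames line and optionally widening it to a bounding box. It does not use the patch's classes or the Lucene/Spatial4j APIs: the class GeonamesSketch and its methods are hypothetical, the sample line is made up, and the only fact relied on is that the Geonames dump puts latitude and longitude in the fifth and sixth tab-separated columns.

```java
/** Hypothetical illustration only; not the patch's GeonamesLineParser or ShapeConverter. */
class GeonamesSketch {

  /** Parses one Geonames line and returns {latitude, longitude}. */
  static double[] parseLatLon(String line) {
    // limit -1 keeps empty columns (e.g. a blank alternatenames field).
    String[] cols = line.split("\t", -1);
    double lat = Double.parseDouble(cols[4]);
    double lon = Double.parseDouble(cols[5]);
    return new double[] {lat, lon};
  }

  /** Formats a point as WKT; WKT order is x (longitude) then y (latitude). */
  static String toWktPoint(double lat, double lon) {
    return "POINT(" + lon + " " + lat + ")";
  }

  /**
   * Widens a point to a box of the given half-side in degrees.
   * Returns {minLon, minLat, maxLon, maxLat}. Latitude is clamped to
   * [-90, 90]; dateline wrap-around is deliberately not handled here.
   */
  static double[] toBoundingBox(double lat, double lon, double halfSideDeg) {
    double minLat = Math.max(-90.0, lat - halfSideDeg);
    double maxLat = Math.min(90.0, lat + halfSideDeg);
    return new double[] {lon - halfSideDeg, minLat, lon + halfSideDeg, maxLat};
  }

  public static void main(String[] args) {
    // Made-up sample line in the Geonames column layout.
    String line = "1\tExample\tExample\t\t42.5\t1.5\tP\tPPL\tXX";
    double[] latLon = parseLatLon(line);
    System.out.println(toWktPoint(latLon[0], latLon[1]));
    System.out.println(java.util.Arrays.toString(toBoundingBox(latLon[0], latLon[1], 1.0)));
  }
}
```

The same conversion step can serve both indexing and querying, which mirrors why the query maker re-uses the doc maker's converter.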
The provided spatial.alg shows how to use these classes.
- The spatial data is placed as a string into the "body" field of the standard benchmark DocData class. I originally experimented with a custom SpatialDocData, but decided it was unnecessary since the existing class works. After all, if you're testing spatial, you don't need to test text at the same time. I didn't put it in DocData's attached Properties instance because that seems kinda heavyweight, or at least medium-weight.
The patch is not ready – I need to add documentation, pending input on this approach.