Details
- Type: Improvement
- Status: Resolved
- Priority: High
- Resolution: Fixed
None
Description
Recently we looked into the performance of geometry serdes, since it greatly affects the performance of Spatial SQL. After benchmarking and assessing the geometry serializers currently in Apache Sedona (ShapeSerde, WKB-based GeometrySerializer, etc.), we came up with a high-performance geometry serde implementation that outperforms the existing serdes in both benchmarks and Spatial SQL end-to-end tests. It speeds up simple range queries like the following by 2x:
SELECT COUNT(1) FROM traj_points WHERE ST_Within(geom, ST_GeomFromText('POLYGON((120.40586018622339 31.429636201527515,120.84256672919214 31.429636201527515,120.84256672919214 31.089198624963103,120.40586018622339 31.089198624963103,120.40586018622339 31.429636201527515))'))
Here are the benchmark code and results for the geometry serdes. The benchmark was run on an ECS instance with 4 Intel(R) Xeon(R) Platinum 8269CY CPUs @ 2.50GHz, using OpenJDK 1.8.0_352.
Besides the performance improvements, the proposed serde also supports SRID and 3D/4D geometries (Z/M dimensions). I'll write detailed documentation for the proposed geometry serde in the next few days. There is still a lot to do to integrate it into Apache Sedona. We'll implement a Python version of the proposed serde as a C extension, and also implement a pure Python version using the struct module as a fallback.
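To give a rough idea of what a pure Python fallback built on the struct module could look like, here is a minimal sketch for a single point with optional Z/M values and an SRID. The byte layout, the names serialize_point and deserialize_point, and the flag values are all hypothetical and chosen only for illustration; this is not the format of the proposed serde, which will be described in the upcoming documentation.

```python
import struct

# Hypothetical layout for illustration only (NOT the proposed serde's format):
#   uint8    geometry type (1 = point)
#   uint8    dimension flags: bit 0 = has Z, bit 1 = has M
#   int32    SRID
#   float64  coordinates in the order x, y, [z], [m]
# "<" selects little-endian byte order with no padding, so the header is 6 bytes.
_HEADER = struct.Struct("<BBi")
_POINT_TYPE = 1
_HAS_Z = 0x01
_HAS_M = 0x02


def serialize_point(x, y, z=None, m=None, srid=0):
    """Pack a point into bytes using the hypothetical layout above."""
    flags = 0
    coords = [x, y]
    if z is not None:
        flags |= _HAS_Z
        coords.append(z)
    if m is not None:
        flags |= _HAS_M
        coords.append(m)
    header = _HEADER.pack(_POINT_TYPE, flags, srid)
    return header + struct.pack("<%dd" % len(coords), *coords)


def deserialize_point(buf):
    """Unpack bytes produced by serialize_point into a dict."""
    geom_type, flags, srid = _HEADER.unpack_from(buf, 0)
    if geom_type != _POINT_TYPE:
        raise ValueError("unsupported geometry type: %d" % geom_type)
    has_z = bool(flags & _HAS_Z)
    has_m = bool(flags & _HAS_M)
    count = 2 + int(has_z) + int(has_m)
    coords = struct.unpack_from("<%dd" % count, buf, _HEADER.size)
    result = {"srid": srid, "x": coords[0], "y": coords[1], "z": None, "m": None}
    idx = 2
    if has_z:
        result["z"] = coords[idx]
        idx += 1
    if has_m:
        result["m"] = coords[idx]
    return result
```

For example, deserialize_point(serialize_point(120.5, 31.25, z=10.0, srid=4326)) round-trips the coordinates and the SRID. A C extension would follow the same pack/unpack logic but avoid the per-call interpreter overhead, which is why the struct version is only intended as a fallback.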
Attachments
Issue Links
- fixes
  - SEDONA-227 Python SerDe Performance Degradation (Resolved)
  - SEDONA-28 Add WKB serializer in RDD and SQL API and let the user choose the SerDe (Closed)
- relates to
  - SEDONA-222 GeoParquet reader does not work in non-local mode (Resolved)
  - SEDONA-226 Support reading and writing GeoParquet file metadata (Resolved)
- links to