Apache Sedona / SEDONA-207

Faster serialization/deserialization of geometry objects

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: High
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0

    Description

      Recently we've looked into the performance of geometry serialization/deserialization (serde), since it greatly impacts the performance of Spatial SQL. After benchmarking and assessing the geometry serializers currently in Apache Sedona (ShapeSerde, the WKB-based GeometrySerializer, etc.), we came up with a high-performance geometry serde implementation that outperforms the existing serdes in both benchmarks and end-to-end Spatial SQL tests. It speeds up simple range queries such as the following by 2x:
       

      SELECT COUNT(1) FROM traj_points WHERE ST_Within(geom, ST_GeomFromText('POLYGON((120.40586018622339 31.429636201527515,120.84256672919214 31.429636201527515,120.84256672919214 31.089198624963103,120.40586018622339 31.089198624963103,120.40586018622339 31.429636201527515))'))
      

      The benchmark code and results for the geometry serdes are shown in the attachments. The benchmark was performed on an ECS instance with 4 Intel(R) Xeon(R) Platinum 8269CY CPUs @ 2.50GHz, using OpenJDK 1.8.0_352.


      Besides the performance improvements, the proposed serde also supports SRID and 3D/4D geometries (Z/M dimensions). I'll write detailed documentation for the proposed geometry serde in the next few days. There is still a lot of work to do to integrate it into Apache Sedona. We'll implement a Python version of the proposed serde as a C extension, along with a pure-Python fallback built on the struct module.
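      To illustrate how a pure-Python fallback could use the struct module to carry SRID and Z/M information, here is a minimal sketch for a single point. The byte layout, the type code, the flag values, and the serialize_point/deserialize_point names are hypothetical assumptions made for this example only; they are not the layout of the proposed serde.

```python
import struct
from typing import Optional, Tuple

# Hypothetical layout for illustration only (NOT the proposed serde's format):
#   byte 0    : geometry type code (1 = Point in this sketch)
#   byte 1    : flag bits: 0x1 = has Z, 0x2 = has M, 0x4 = has SRID
#   bytes 2-3 : padding, so the following fields start 4-byte aligned
#   int32     : SRID, present only when the SRID flag is set
#   doubles   : X, Y, then Z and/or M when their flags are set
POINT = 1
HAS_Z, HAS_M, HAS_SRID = 0x1, 0x2, 0x4


def serialize_point(x: float, y: float, z: Optional[float] = None,
                    m: Optional[float] = None,
                    srid: Optional[int] = None) -> bytes:
    flags = ((HAS_Z if z is not None else 0)
             | (HAS_M if m is not None else 0)
             | (HAS_SRID if srid is not None else 0))
    # "<BBxx": little-endian, two unsigned bytes followed by two pad bytes
    out = struct.pack("<BBxx", POINT, flags)
    if srid is not None:
        out += struct.pack("<i", srid)
    coords = [x, y] + [v for v in (z, m) if v is not None]
    out += struct.pack("<%dd" % len(coords), *coords)
    return out


def deserialize_point(buf: bytes) -> Tuple[Tuple[float, ...], Optional[int], bool, bool]:
    geom_type, flags = struct.unpack_from("<BBxx", buf, 0)
    if geom_type != POINT:
        raise ValueError("unsupported geometry type: %d" % geom_type)
    offset = 4
    srid = None
    if flags & HAS_SRID:
        (srid,) = struct.unpack_from("<i", buf, offset)
        offset += 4
    has_z, has_m = bool(flags & HAS_Z), bool(flags & HAS_M)
    # Number of doubles to read: X and Y, plus one each for Z and M
    n = 2 + int(has_z) + int(has_m)
    coords = struct.unpack_from("<%dd" % n, buf, offset)
    return coords, srid, has_z, has_m


buf = serialize_point(120.4, 31.4, z=10.0, srid=4326)
print(len(buf), deserialize_point(buf))
```

      Putting the dimension and SRID flags in a fixed-size header lets a reader compute the coordinate count and offsets up front, without parsing variable-length fields first.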

      Attachments

        1. image-2022-12-02-20-19-36-449.png (37 kB, Kristin Cowalcijk)
        2. image-2022-12-02-20-19-15-597.png (40 kB, Kristin Cowalcijk)

            People

              Assignee: Unassigned
              Reporter: Kristin Cowalcijk (kontinuation)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h