  1. Apache Sedona
  2. SEDONA-189

Prepare geometries in broadcast join

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0

    Description

      With complex polygons, using prepared geometries can improve query performance by an order of magnitude.

      In a test with 1M points and 5k polygons, a simple broadcast join and count with ST_Contains sped up from 40s to 10s (a 4x improvement):

      points.join(broadcast(polygons), expr("ST_Contains(polygon, point)")).count()
      

      The speedup grows as the ratio of points to polygons increases. For

      points.union(points).join(broadcast(polygons), expr("ST_Contains(polygon, point)")).count()
      

      the speedup is about 6x (70s -> 12s).
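
The reason preparation pays off when one polygon is tested against many points can be illustrated with a small, library-free sketch. It is not Sedona's or JTS's implementation; `PreparedPolygon`, `contains_naive`, and the `bands` parameter are hypothetical names chosen for illustration, and the banded edge index is just one simple way to pay an indexing cost once per polygon and reuse it for every point.

```python
def contains_naive(ring, x, y):
    """Ray-casting point-in-polygon test that scans every edge of the ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count edges that cross the horizontal ray going right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class PreparedPolygon:
    """Hypothetical helper: indexes a ring once so repeated tests are cheaper."""

    def __init__(self, ring, bands=16):
        xs = [p[0] for p in ring]
        ys = [p[1] for p in ring]
        # Cache the bounding box for cheap rejection.
        self.min_x, self.max_x = min(xs), max(xs)
        self.min_y, self.max_y = min(ys), max(ys)
        self.bands = bands
        # Fall back to 1.0 for a degenerate (zero-height) ring.
        self.band_height = (self.max_y - self.min_y) / bands or 1.0
        # Bucket each edge into every horizontal band that it overlaps.
        self.buckets = [[] for _ in range(bands)]
        n = len(ring)
        for i in range(n):
            (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
            lo = self._band(min(y1, y2))
            hi = self._band(max(y1, y2))
            for b in range(lo, hi + 1):
                self.buckets[b].append((x1, y1, x2, y2))

    def _band(self, y):
        # Map a y coordinate to a band index, clamped to the valid range.
        b = int((y - self.min_y) / self.band_height)
        return min(max(b, 0), self.bands - 1)

    def contains(self, x, y):
        # Points outside the cached bounding box cannot be inside the ring.
        if not (self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y):
            return False
        inside = False
        # Only the edges in the point's band can cross its horizontal ray.
        for x1, y1, x2, y2 in self.buckets[self._band(y)]:
            if (y1 > y) != (y2 > y):
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside


# A concave ring with a notch at the top; build the index once, reuse it.
ring = [(0, 0), (10, 0), (10, 10), (5, 4), (0, 10)]
prepared = PreparedPolygon(ring)
print(prepared.contains(5, 2))  # point below the notch
print(prepared.contains(5, 8))  # point inside the notch
```

The construction cost is paid once per polygon, so the benefit grows with the number of points tested against each polygon, which matches the trend reported above. In a broadcast join the same reasoning would apply to each broadcast polygon, assuming the prepared form is built once and reused across rows.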

      Attachments

        1. points.csv.gz
          3.61 MB
          Tanel Kiis
        2. polygons.csv.gz
          3.69 MB
          Tanel Kiis

            People

              Assignee: Unassigned
              Reporter: Tanel Kiis (tanelk)
              Votes: 0
              Watchers: 1

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m