Apache Sedona / SEDONA-189

Prepare geometries in broadcast join


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0

    Description

      With complex polygons, using prepared geometries can improve query performance by an order of magnitude.
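      The issue does not spell out what "preparing" a geometry involves. Sedona is built on JTS. JTS's PreparedGeometry classes precompute and cache structures such as an edge index, so that repeated predicate calls against the same polygon do not rescan every edge. The sketch below is a pure-Python illustration of that general idea under simplified assumptions. It is not Sedona's or JTS's code. `PreparedPolygon`, `contains_unprepared`, the horizontal-stripe edge index, and the `star` test polygon are all hypothetical names chosen for this example.

```python
import math
import random

# A polygon ring is a list of (x, y) vertices; the closing edge back to the
# first vertex is implied. (Illustrative type alias, not a Sedona/JTS type.)
Ring = list[tuple[float, float]]


def contains_unprepared(ring: Ring, x: float, y: float) -> bool:
    """Even-odd ray casting that scans every edge of the ring on each call."""
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        # The edge straddles the horizontal line through y (half-open rule).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class PreparedPolygon:
    """Hypothetical 'prepared' polygon: the envelope and a stripe index over
    the edges are built once, so each contains() call scans only nearby edges."""

    def __init__(self, ring: Ring, n_stripes: int = 64) -> None:
        xs = [p[0] for p in ring]
        ys = [p[1] for p in ring]
        self.min_x, self.max_x = min(xs), max(xs)
        self.min_y, self.max_y = min(ys), max(ys)
        self.n_stripes = n_stripes
        self.stripe_height = (self.max_y - self.min_y) / n_stripes or 1.0
        # Bucket every edge into each horizontal stripe its y-range overlaps.
        self.stripes: list[list] = [[] for _ in range(n_stripes)]
        n = len(ring)
        for i in range(n):
            a, b = ring[i], ring[(i + 1) % n]
            lo, hi = sorted((a[1], b[1]))
            for s in range(self._stripe(lo), self._stripe(hi) + 1):
                self.stripes[s].append((a, b))

    def _stripe(self, y: float) -> int:
        # Map a y-coordinate to its stripe, clamped to the valid range.
        s = int((y - self.min_y) / self.stripe_height)
        return min(self.n_stripes - 1, max(0, s))

    def contains(self, x: float, y: float) -> bool:
        # Cheap envelope rejection before any edge is examined.
        if not (self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y):
            return False
        inside = False
        # Same ray-casting test, but only over edges in the point's stripe.
        for (x1, y1), (x2, y2) in self.stripes[self._stripe(y)]:
            if (y1 > y) != (y2 > y):
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside


def star(n_tips: int = 500, r_outer: float = 1.0, r_inner: float = 0.5) -> Ring:
    """A spiky star with 2 * n_tips vertices, standing in for a complex polygon."""
    return [
        ((r_outer if k % 2 == 0 else r_inner) * math.cos(math.pi * k / n_tips),
         (r_outer if k % 2 == 0 else r_inner) * math.sin(math.pi * k / n_tips))
        for k in range(2 * n_tips)
    ]


if __name__ == "__main__":
    ring = star()
    prepared = PreparedPolygon(ring)  # preparation cost is paid once ...
    rng = random.Random(0)
    points = [(rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5)) for _ in range(2000)]
    # ... and reused for every point; results match the unprepared scan.
    hits = sum(prepared.contains(x, y) for x, y in points)
    assert hits == sum(contains_unprepared(ring, x, y) for x, y in points)
    print(f"{hits} of {len(points)} points fall inside the star")
```

      Building the index costs one pass over the polygon's edges. That cost pays off only when the same polygon is tested against many points, which is the situation in a join where a small set of broadcast polygons meets a large stream of points.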

      In a test with 1M points and 5k polygons, a simple broadcast join followed by a count with ST_Contains went from 40s down to 10s (a 4x improvement):

      points.join(broadcast(polygons), expr("ST_Contains(polygon, point)")).count()
      

      The speedup grows as the ratio of points to polygons increases. For

      points.union(points).join(broadcast(polygons), expr("ST_Contains(polygon, point)")).count()
      

      the speedup is roughly 6x (70s -> 12s).


          People

            Assignee: Unassigned
            Reporter: Tanel Kiis (tanelk)
            Votes: 0
            Watchers: 1


              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 50m
