Uploaded image for project: 'Apache Sedona'
  1. Apache Sedona
  2. SEDONA-453

Performance degrade when indexing points using Quadtree

Rank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.5.1

    Description

      We found that the index efficiency of Quadtree drastically degrades when indexing datasets made up of points. The index returns way more candidates than expected when querying the Quadtree using envelopes. The reason is that JTS Quadtree automatically expands indexed envelopes by 0.5 if the envelope has zero width and height (see Quadtree.java#L61-L96), this makes the indexed envelopes of points are way larger than necessary, especially when indexed points are WGS84 coordinates.

      Suppose that we are indexing the following dataset using Quadtree:

      The envelopes indexed by Quadtree happens to be something like this:

      One possible workaround for this problem is to manually extend envelopes with 0 width or height by 1e-3. This will prevent JTS Quadtree from extending the envelopes by 0.5, and 1e-3 is small enough to cope with the most common use cases while not increasing the size of the Quadtree too much.

      Maybe we should change the default index type sedona.global.index from quadtree to rtree, since STRtree has better performance in most cases and does not have such pitfalls.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            kontinuation Kristin Cowalcijk
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 0.5h
                0.5h

                Slack

                  Issue deployment