  1. Apache Sedona
  2. SEDONA-221

Outer join throws NPE for null geometries


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 1.4.0

    Description

      The following query throws a NullPointerException.

      select /*+ BROADCAST(t2) */ * from t1 left join t2 on st_intersects(t1.geom, t2.geom)
      
      java.lang.NullPointerException
      	at org.locationtech.jts.io.WKBReader.read(WKBReader.java:159)
      	at org.apache.sedona.sql.utils.GeometrySerializer$.deserialize(GeometrySerializer.scala:50)
      	at org.apache.spark.sql.sedona_sql.strategy.join.TraitJoinQueryBase.$anonfun$toSpatialRDD$1(TraitJoinQueryBase.scala:45)
      	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
      	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
      	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
      	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
      	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
      

      The failure happens when the streaming side is mapped to a SpatialRDD. The NPE does not occur for an inner join with null geometries. I suspect Spark pushes down a not-null predicate in that case, since rows with null geometries would be excluded from an inner join anyway.
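The difference between the two join types can be illustrated with a minimal sketch in plain Scala. It does not use Sedona or Spark: `Row`, `deserialize`, `innerSide` and `outerSide` are hypothetical stand-ins, and the null filter in `innerSide` only models the suspected not-null predicate, which has not been confirmed.

```scala
// Hypothetical stand-ins; none of these names come from Sedona or Spark.
object NullGeometryRepro {
  // A row carrying an id and a serialized geometry that may be null.
  final case class Row(id: Int, geomBytes: Array[Byte])

  // Stand-in for a deserializer that dereferences its input without a null check.
  def deserialize(bytes: Array[Byte]): Int = bytes.length

  val rows: Seq[Row] = Seq(Row(1, Array[Byte](1, 2, 3)), Row(2, null))

  // Inner-join-like path: rows with a null geometry are filtered out first
  // (assumption: this is what a pushed-down not-null predicate would do).
  def innerSide: Seq[Int] =
    rows.filter(_.geomBytes != null).map(r => deserialize(r.geomBytes))

  // Outer-join-like path: every row has to be kept, so the null value
  // reaches the deserializer and a NullPointerException is thrown.
  def outerSide: Seq[Int] =
    rows.map(r => deserialize(r.geomBytes))
}
```

Calling `NullGeometryRepro.innerSide` succeeds, while `NullGeometryRepro.outerSide` throws, which matches the behaviour described above.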

      Looking at the code, I suspect there are more errors in the new broadcast join types. The InternalRow is encoded in the user data field of the geometry, which doesn't work if the geometry is null. For a left join, the InternalRow on the left side has to be emitted even if the geometry is null. Instead of using a SpatialRDD, it might be better to map the RDD[InternalRow] to an RDD[Pair[Geometry, InternalRow]] where the Geometry might be null.
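A rough sketch of the pairing idea, using plain Scala collections instead of RDDs. Everything here is an assumption made for illustration: `Box` stands in for a real geometry, `Row` for InternalRow, `Option` for a nullable geometry, and `leftJoin` is a naive nested loop rather than Sedona's join implementation.

```scala
object NullSafeLeftJoin {
  // Hypothetical axis-aligned box standing in for a real geometry type.
  final case class Box(minX: Double, minY: Double, maxX: Double, maxY: Double) {
    def intersects(o: Box): Boolean =
      minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY
  }

  // Hypothetical row type; the geometry is optional instead of nullable.
  final case class Row(id: Int, geom: Option[Box])

  // Pair each row with its (possibly absent) geometry. The row travels
  // next to the geometry instead of inside it, so a missing geometry
  // does not lose the row.
  def toPairs(rows: Seq[Row]): Seq[(Option[Box], Row)] =
    rows.map(r => (r.geom, r))

  // Naive left join: a left row without a geometry, or without any
  // intersecting right row, is still emitted with None on the right.
  def leftJoin(left: Seq[Row], right: Seq[Row]): Seq[(Row, Option[Row])] = {
    val rightPairs = toPairs(right).collect { case (Some(g), r) => (g, r) }
    toPairs(left).flatMap { case (geomOpt, l) =>
      val matches = for {
        g <- geomOpt.toSeq
        (rg, r) <- rightPairs
        if g.intersects(rg)
      } yield r
      if (matches.isEmpty) Seq((l, Option.empty[Row]))
      else matches.map(r => (l, Option(r)))
    }
  }
}
```

With this shape, a left row whose geometry is missing never reaches any geometry code and is simply emitted as unmatched.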

          People

            Assignee: Unassigned
            Reporter: Martin Andersson (umartin)


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 20m
