Apache Sedona / SEDONA-221

Outer join throws NPE for null geometries

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0

    Description

      The following query throws a NullPointerException.

      select /*+ BROADCAST(t2) */ * from t1 left join t2 on st_intersects(t1.geom, t2.geom)
      
      java.lang.NullPointerException
      	at org.locationtech.jts.io.WKBReader.read(WKBReader.java:159)
      	at org.apache.sedona.sql.utils.GeometrySerializer$.deserialize(GeometrySerializer.scala:50)
      	at org.apache.spark.sql.sedona_sql.strategy.join.TraitJoinQueryBase.$anonfun$toSpatialRDD$1(TraitJoinQueryBase.scala:45)
      	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
      	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
      	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
      	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
      	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
      

      The failure happens when the streaming side is mapped to a SpatialRDD. The NPE does not occur for an inner join with null geometries. I suspect Spark pushes down an IS NOT NULL predicate in that case, since rows with null geometries would be excluded from an inner join anyway.

      Looking at the code, I suspect there are more errors in the new broadcast join types. The InternalRow is encoded in the user data field of the geometry, which doesn't work if the geometry is null. For a left join, the InternalRow on the left side has to be emitted even if the geometry is null. Instead of using a SpatialRDD, it might be better to map the RDD[InternalRow] to an RDD[Pair[Geometry, InternalRow]] where the Geometry may be null.
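
A minimal sketch of the pairing idea, using plain Scala collections instead of Spark RDDs so it runs without Spark or JTS. All names here are hypothetical stand-ins, not Sedona code: `Row` plays the role of InternalRow, `Box` plays the role of a JTS Geometry, and `leftJoin` is only an illustration of the left outer join semantics described in this issue, not the actual fix.

```scala
object NullSafeJoinSketch {
  // Hypothetical stand-in for a JTS Geometry: an axis-aligned bounding box.
  final case class Box(minX: Double, minY: Double, maxX: Double, maxY: Double) {
    def intersects(o: Box): Boolean =
      minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY
  }

  // Hypothetical stand-in for InternalRow; geom is null when the column is NULL.
  final case class Row(id: Int, geom: Box)

  // Pair each row with its geometry instead of storing the row in the
  // geometry's user data. Option(...) turns a null geometry into None,
  // so the row survives even when there is no geometry to attach it to.
  def toPairs(rows: Seq[Row]): Seq[(Option[Box], Row)] =
    rows.map(r => (Option(r.geom), r))

  // Left outer join: every left row is emitted at least once. A left row
  // with a null geometry, or without an intersecting right row, gets None.
  def leftJoin(left: Seq[Row], right: Seq[Row]): Seq[(Row, Option[Row])] = {
    // Right rows with null geometries can never match, so drop them up front.
    val rightPairs = toPairs(right).collect { case (Some(g), r) => (g, r) }
    toPairs(left).flatMap { case (geom, l) =>
      val matches = geom.toList.flatMap { g =>
        rightPairs.collect { case (rg, r) if g.intersects(rg) => r }
      }
      if (matches.isEmpty) Seq((l, Option.empty[Row]))
      else matches.map(r => (l, Option(r)))
    }
  }
}
```

With a left input containing one row with a geometry and one row with a null geometry, `leftJoin` returns both left rows; the null-geometry row is paired with `None` instead of triggering an NPE during deserialization.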

            People

              Assignee: Unassigned
              Reporter: Martin Andersson (umartin)
              Votes: 0
              Watchers: 1

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 20m