Apache Sedona / SEDONA-605

RS_AsRaster(useGeometryExtent=false) does not work with reference rasters with scaleX/Y < 1


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.1

    Description

      This problem was reported by users on Discord. They found that RS_ZonalStats does not work with a raster tile in EPSG:4326. With the attached data, most of the computed zonal stats are NaN:

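      from sedona.core.formatMapper.shapefileParser import ShapefileReader
      from sedona.utils.adapter import Adapter

      # `spark` and `sedona` are assumed to be an existing Sedona-enabled session.
      rawDf = spark.read.format("binaryFile").option("pathGlobFilter", "*.tiff").load("zonal_stats_issue/data_andalusia")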
      rawDf.createOrReplaceTempView("rawdf")
      rasterDf = sedona.sql("""
      SELECT
        RS_FromGeoTiff(content) as tile,
        path
      FROM rawdf
      """)
      rasterDf.createOrReplaceTempView("l8imgs")
      
      parcels = ShapefileReader.readToGeometryRDD(sedona, "zonal_stats_issue/parcelas")
      parcels_df = Adapter.toDf(parcels, sedona)
      parcels_df.createOrReplaceTempView("parcels")
      
      features_df = sedona.sql("""
      WITH matched_tile AS (
          SELECT path, tile, geometry, idPanel
          FROM l8imgs, parcels
          WHERE ST_Intersects(RS_Envelope(tile), parcels.geometry) OR ST_Within(RS_Envelope(tile), parcels.geometry)
      )
      SELECT path, idPanel, RS_ZonalStats(tile, geometry, 1, 'mean') as stats_mean FROM matched_tile
      """)
      features_df.show(1000, False)  # <-- Lots of NaN here.
      

      Output:

      +----------------------------------------------------+--------------------+------------------+
      |path                                                |idPanel             |stats_mean        |
      +----------------------------------------------------+--------------------+------------------+
      |zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:9002:2 |NaN               |
      |zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:32:4   |NaN               |
      |zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:32:3   |NaN               |
      |zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:30:2   |NaN               |
      |zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:26:3   |NaN               |
      |zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:27:4   |NaN               |
      ...
      

      This problem is caused by incorrect rasterization of the parcel geometries when the reference raster has scaleX/scaleY smaller than 1. The extent of the rasterization result is computed with a bad double->int cast, which is:

      1. Unnecessary when we're using the extent of the reference raster
      2. Problematic when handling rasters with non-integral scaleX or scaleY values
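
The effect of such a cast can be illustrated with a minimal sketch. This is not the Sedona implementation: the function names, the coordinates, and the 1/1024 scale below are hypothetical, chosen only to show how truncating world coordinates to integers can collapse an extent when each pixel covers less than one unit.

```python
import math

def extent_truncated(min_x, max_x, scale_x):
    """Hypothetical buggy variant: world coordinates are cast to int first."""
    lo, hi = int(min_x), int(max_x)
    # With sub-unit pixels, both bounds can truncate to the same integer.
    return lo, hi, int((hi - lo) / scale_x)

def extent_exact(min_x, max_x, scale_x):
    """Keeps the bounds as doubles; only the pixel count becomes an int."""
    return min_x, max_x, math.ceil((max_x - min_x) / scale_x)

# Assumed values: a quarter-degree-wide geometry, 1/1024 degree per pixel.
MIN_X, MAX_X, SCALE_X = -4.75, -4.5, 1 / 1024

truncated = extent_truncated(MIN_X, MAX_X, SCALE_X)
exact = extent_exact(MIN_X, MAX_X, SCALE_X)

print("truncated:", truncated)  # the width collapses to 0 pixels
print("exact:    ", exact)      # the width is 256 pixels
```

With these assumed inputs, the truncated variant yields a zero-width extent, while keeping the bounds as doubles preserves all 256 pixel columns.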

      This bug affects the following RS functions:

      1. RS_AsRaster
      2. RS_ZonalStats
      3. RS_ZonalStatsAll

      Attachments

        1. zonal_stats_issue.zip
          5.11 MB
          Kristin Cowalcijk


            People

              Assignee: Unassigned
              Reporter: Kristin Cowalcijk (kontinuation)


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m