  1. Apache Sedona
  2. SEDONA-646

Shapefile data source for DataFrame API

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.0

    Description

      The current shapefile reader returns a SpatialRDD. Users who want a DataFrame must call Adapter.toDf to convert the SpatialRDD to a DataFrame. A better approach is to support loading shapefiles directly as DataFrames through the DataFrame API:

      df = sedona.read.format("shapefile").load("/path/to/shapefile")
      

      This is more intuitive than the current two-step approach:

      rdd = ShapefileReader.readToGeometryRDD(spark.sparkContext, "/path/to/shapefile")
      df = Adapter.toDf(rdd, spark)
      

      We'll also make several more improvements:

      1. Give non-spatial attributes appropriate data types. Adapter.toDf converts all non-spatial fields to strings, which loses their original data types.
      2. Handle input paths better. Both directory paths and paths to individual .shp files should be supported.
      3. Infer the code page from the .cpg file, so that users don't have to change the Java system property sedona.global.charset to work around encoding problems.
      4. Infer the SRID of geometries from the .prj file.
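
      To make items 2 and 3 concrete, here is a minimal Python sketch of one way the path handling and code page inference could work. It is not Sedona's implementation: the helper names (find_shapefiles, sidecar, infer_charset) are hypothetical, and the ISO-8859-1 fallback is an assumption chosen only for illustration.

```python
import codecs
from pathlib import Path
from typing import List, Optional

# Hypothetical helpers for illustration only; these are not Sedona APIs.


def find_shapefiles(path: str) -> List[Path]:
    """Accept a directory or a single .shp file and return the .shp files to read."""
    p = Path(path)
    if p.is_dir():
        # A directory may hold several shapefiles; pick up every .shp in it.
        return sorted(
            f for f in p.iterdir() if f.is_file() and f.suffix.lower() == ".shp"
        )
    if p.is_file() and p.suffix.lower() == ".shp":
        return [p]
    raise ValueError(f"not a directory or .shp file: {path}")


def sidecar(shp: Path, ext: str) -> Optional[Path]:
    """Return the companion file sharing the base name (.dbf, .cpg, .prj), if any."""
    for candidate in shp.parent.iterdir():
        if candidate.stem == shp.stem and candidate.suffix.lower() == ext:
            return candidate
    return None


def infer_charset(shp: Path, default: str = "ISO-8859-1") -> str:
    """Read the encoding name from the .cpg file; fall back to an assumed default."""
    cpg = sidecar(shp, ".cpg")
    if cpg is None:
        return default
    name = cpg.read_text(encoding="ascii", errors="ignore").strip()
    if not name:
        return default
    try:
        codecs.lookup(name)  # Only accept names that Python can resolve.
        return name
    except LookupError:
        return default
```

      The same sidecar lookup would locate the .prj file for item 4. Mapping its contents to an SRID needs a CRS library and is left out of this sketch.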

          People

            Assignee: Unassigned
            Reporter: Kristin Cowalcijk (kontinuation)
              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 0.5h