Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The current shapefile reader returns a SpatialRDD. Users who want a DataFrame must call Adapter.toDf to convert the SpatialRDD to a DataFrame. A better approach is to support loading shapefiles directly as DataFrames using the DataFrame API:
df = sedona.read.format("shapefile").load("/path/to/shapefile")
This is more intuitive than the current two-step approach:
rdd = ShapefileReader.readToGeometryRDD(spark.sparkContext, "/path/to/shapefile")
df = Adapter.toDf(rdd, spark)
We'll also make several more improvements:
1. Give non-spatial attributes appropriate data types. Adapter.toDf converts all non-spatial fields to string fields, which loses their original data types.
2. Better handling of input paths. We should support both directory paths and paths to individual .shp files.
3. Infer the code page from the .cpg file, so that users don't have to change the Java system property sedona.global.charset to work around encoding problems.
4. Infer the SRID of geometries from .prj file.
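To make items 2 and 3 concrete, here is a minimal pure-Python sketch of the intended behavior. It is not Sedona's implementation: the helper names resolve_shp_files and infer_charset are hypothetical, and the ISO-8859-1 fallback is an assumption made for illustration.

```python
import codecs
from pathlib import Path
from typing import List


def resolve_shp_files(path: str) -> List[Path]:
    """Return the .shp files denoted by `path` (a directory or a single .shp file).

    Hypothetical helper illustrating item 2; not part of Sedona.
    """
    p = Path(path)
    if p.is_dir():
        # Directory: collect every .shp file directly inside it (case-insensitive).
        return sorted(
            f for f in p.iterdir() if f.is_file() and f.suffix.lower() == ".shp"
        )
    if p.is_file() and p.suffix.lower() == ".shp":
        return [p]
    raise ValueError(f"not a directory or .shp file: {path}")


def infer_charset(shp_file: Path, default: str = "ISO-8859-1") -> str:
    """Guess the attribute encoding from a sibling .cpg file.

    Hypothetical helper illustrating item 3. The default is an assumption;
    only a lower-case ".cpg" sibling is checked in this sketch.
    """
    cpg = shp_file.with_suffix(".cpg")
    if not cpg.is_file():
        return default
    name = cpg.read_text(encoding="ascii", errors="ignore").strip()
    if not name:
        return default
    try:
        # Return Python's canonical codec name when the value is recognized.
        return codecs.lookup(name).name
    except LookupError:
        # Unrecognized code page value: fall back rather than fail.
        return default
```

With helpers like these, a reader could accept either "/path/to/dir" or "/path/to/file.shp" and pick the encoding for each shapefile without the user setting a global property. Code page values that Python does not recognize simply fall back to the default in this sketch.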