[SEDONA-646] Shapefile data source for DataFrame API - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.7.0
Labels:
- pull-request-available

Description

The current shapefile reader returns a SpatialRDD. Users who want a DataFrame must call Adapter.toDf to convert the SpatialRDD to a DataFrame. A better approach is to support loading shapefiles as DataFrames directly through the DataFrame API:

df = sedona.read.format("shapefile").load("/path/to/shapefile")

This is more intuitive than:

rdd = ShapefileReader.readToGeometryRDD(spark.sparkContext, "/path/to/shapefile")
df = Adapter.toDf(rdd, spark)

We'll also make several more improvements:

1. Give non-spatial attributes appropriate data types. Adapter.toDf converts all non-spatial fields to string fields, which loses their original data types.
2. Handle input paths better. Both directory paths and paths to .shp files should be supported.
3. Infer the code page from the .cpg file, so that users don't have to change the Java system property sedona.global.charset to work around encoding problems.
4. Infer the SRID of geometries from the .prj file.
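
As a rough illustration of items 2 and 3, the following standard-library sketch accepts either a directory or a single .shp file and reads the sibling .cpg file to pick a character encoding. It is not Sedona's implementation: the helper names find_shp_files and infer_charset are hypothetical, and the ISO-8859-1 fallback is an assumption made for this example.

```python
import codecs
from pathlib import Path


def find_shp_files(path):
    """Hypothetical helper: accept a directory or a single .shp file
    and return the list of .shp paths to read."""
    p = Path(path)
    if p.is_dir():
        # Directory input: pick up every .shp file directly inside it.
        return sorted(f for f in p.iterdir() if f.suffix.lower() == ".shp")
    if p.is_file() and p.suffix.lower() == ".shp":
        return [p]
    raise ValueError(f"not a directory or .shp file: {path}")


def infer_charset(shp_path, default="ISO-8859-1"):
    """Hypothetical helper: read the sibling .cpg file, if present, and
    return a Python codec name. The default is an assumption of this sketch."""
    # Only the lowercase ".cpg" name is checked; an uppercase sidecar
    # would be missed on a case-sensitive file system.
    cpg = Path(shp_path).with_suffix(".cpg")
    if not cpg.is_file():
        return default
    name = cpg.read_text(encoding="ascii", errors="ignore").strip()
    try:
        # codecs.lookup normalizes aliases such as "UTF-8" to "utf-8".
        return codecs.lookup(name).name
    except LookupError:
        # Unknown or vendor-specific code page label: fall back.
        return default
```

With this kind of inference, the encoding travels with the data set instead of being set globally per JVM, which is the motivation given in item 3.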

Issue Links

links to

GitHub Pull Request #1553

GitHub Pull Request #1573

People

Assignee: Unassigned

Reporter: Kristin Cowalcijk

Votes: 0

Watchers: 1

Dates

Created: 22/Aug/24 16:54

Updated: 24/Nov/24 08:01

Resolved: 24/Nov/24 08:01

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

0.5h