Apache Sedona / SEDONA-455

Add a new data source named geoparquet.metadata

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.1

    Description

      Can we add a new data source that reads only the file-level metadata of a Parquet file? This is crucial for entry-level users exploring an unknown Parquet file, including GeoParquet. In the GeoParquet case, it lets users see the PROJJSON value, since we cannot properly parse it into a known EPSG code.

      I understand that a Spark DataFrame carries only its schema as metadata, and the schema cannot hold this kind of file-level information.

      So I suggest that we add a new data source named geoparquet.metadata, which loads this metadata using ParquetFileReader (https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java). DuckDB offers a good example: duckdb.org/docs/data/parquet/metadata.html
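
      To illustrate what such a data source would surface, here is a minimal Python sketch. It is not the proposed implementation and does not use ParquetFileReader. It assumes the footer key/value metadata has already been read (the optional read_footer_metadata helper uses pyarrow as a stand-in reader) and that the file follows the GeoParquet convention of storing a JSON document under the "geo" key. The function names are hypothetical.

```python
import json


def summarize_geo_metadata(key_value_metadata):
    """Summarize the GeoParquet "geo" entry of a Parquet footer.

    key_value_metadata: dict of footer key/value pairs. Keys and values
    may be bytes (as pyarrow returns them) or str. May also be None.
    Returns None when there is no "geo" entry (a plain Parquet file).
    """
    # Normalize bytes keys and values to str so lookups are uniform.
    decoded = {
        (k.decode("utf-8") if isinstance(k, bytes) else k):
        (v.decode("utf-8") if isinstance(v, bytes) else v)
        for k, v in (key_value_metadata or {}).items()
    }
    raw = decoded.get("geo")
    if raw is None:
        return None

    geo = json.loads(raw)
    columns = {}
    for name, col in geo.get("columns", {}).items():
        columns[name] = {
            "encoding": col.get("encoding"),
            "geometry_types": col.get("geometry_types", []),
            # The PROJJSON object is returned untouched, with no attempt
            # to map it to an EPSG code. None here means the key was
            # absent or explicitly null.
            "crs": col.get("crs"),
        }
    return {
        "version": geo.get("version"),
        "primary_column": geo.get("primary_column"),
        "columns": columns,
    }


def read_footer_metadata(path):
    """Read footer key/value metadata without loading any row data.

    Assumption: pyarrow is installed. It is imported lazily so that
    summarize_geo_metadata works without it.
    """
    import pyarrow.parquet as pq

    return pq.read_metadata(path).metadata
```

      A caller would run summarize_geo_metadata(read_footer_metadata(path)) on a local file and print the "crs" entry of each geometry column to inspect the raw PROJJSON.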

            People

              Assignee: Unassigned
              Reporter: Jia Yu (jiayu)

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h