Uploaded image for project: 'Apache Sedona'
  1. Apache Sedona
  2. SEDONA-455

Add a new data source namely geoparquet.metadata

Log workAgile BoardRank to TopRank to BottomAttach filesAttach ScreenshotBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.5.1

    Description

      Can we add a new data source to only read the file level metadata of a parquet file? This is crucial for entry-level users to explore an unknown parquet file including geoparquet. In our geoparquet case, this will help user know the projjson value since we are not able to properly parse it to a known epsg code.

      I understand that a Spark DataFrame only allows the schema to be the metadata, which cannot be used to hold such information.

      So I suggest that we add a new data source namely geoparquet.metadata, which loads these metadata using ParquetFileReader (https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java). One good example is from DuckDB: duckdb.org/docs/data/parquet/metadata.html

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned Assign to me
            jiayu Jia Yu
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - Not Specified
              Not Specified
              Remaining:
              Remaining Estimate - 0h
              0h
              Logged:
              Time Spent - 1h
              1h

              Slack

                Issue deployment