[SEDONA-465] Support reading legacy parquet files written by Apache Sedona <= 1.3.1-incubating

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.1

    Description

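      Apache Sedona 1.4.0 introduced two breaking changes: the SQL type of GeometryUDT changed (SEDONA-205), and so did the serialization format of geometry values (SEDONA-207). As a result, Parquet files containing geometry columns written by Apache Sedona 1.3.1 or earlier cannot be read by Apache Sedona 1.4.0 or later. The legacy files store each geometry as an array of bytes (a Parquet LIST of 8-bit integers), whereas Sedona 1.4.0 and later expect a binary column, so schema conversion fails.
      Here is an example of the exception raised when reading such files: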

      24/01/08 12:52:56 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 11)
      org.apache.spark.sql.AnalysisException: Invalid Spark read type: expected required group geom (LIST) {
      repeated group list {
      required int32 element (INTEGER(8,true));
      }
      } to be list type but found Some(BinaryType)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkConversionRequirement(ParquetSchemaConverter.scala:745)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertGroupField$3(ParquetSchemaConverter.scala:343)
      at scala.Option.fold(Option.scala:251)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertGroupField(ParquetSchemaConverter.scala:324)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:188)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertInternal$3(ParquetSchemaConverter.scala:147)
      at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertInternal$3$adapted(ParquetSchemaConverter.scala:117)
      at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
      at scala.collection.immutable.Range.foreach(Range.scala:158)
      at scala.collection.TraversableLike.map(TraversableLike.scala:286)
      at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
      at scala.collection.AbstractTraversable.map(Traversable.scala:108)
      ...
      

      We'll extend the GeoParquet reader to support such legacy parquet files. To read their geometry columns correctly, users specify .option("legacyMode", "true") on the reader:

      val df = sedona.read.format("geoparquet").option("legacyMode", "true").load("path/to/legacy-parquet-files")
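
      A natural follow-up is to rewrite legacy files once so that later reads no longer need the option. The following is a minimal sketch of such a one-off migration, not part of the original proposal. It assumes Sedona 1.5.1 or later on the classpath; the object name, application name, local master, and both paths are placeholders chosen for illustration.

```scala
import org.apache.sedona.spark.SedonaContext

// Hypothetical one-off job: read legacy Sedona parquet files and
// rewrite them in the current GeoParquet format.
object MigrateLegacyParquet {
  def main(args: Array[String]): Unit = {
    // Build a Spark session and register Sedona types and functions on it.
    val config = SedonaContext.builder()
      .master("local[*]")                    // assumption: local run for illustration
      .appName("legacy-parquet-migration")   // placeholder name
      .getOrCreate()
    val sedona = SedonaContext.create(config)

    // legacyMode tells the GeoParquet reader to decode geometry columns
    // written by Sedona <= 1.3.1-incubating.
    val legacyDf = sedona.read
      .format("geoparquet")
      .option("legacyMode", "true")
      .load("path/to/legacy-parquet-files")

    // The geometry columns should now appear with the geometry type.
    legacyDf.printSchema()

    // Write the data back out with the current writer; the output path is a placeholder.
    legacyDf.write
      .format("geoparquet")
      .mode("overwrite")
      .save("path/to/migrated-geoparquet")

    sedona.stop()
  }
}
```

      Files written this way should be readable by Sedona 1.4.0 and later without legacyMode. Verify the result on a copy of the data before replacing the originals.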
      

            People

              Assignee: Unassigned
              Reporter: Kristin Cowalcijk (kontinuation)