Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.3.0, 1.3.1
-
None
Description
spark.read.format("geoparquet").load("/path/to/geoparquet").count() raises a java.lang.ClassCastException exception since Spark expects to load the dataframe in batch mode, while vectorized read was not implemented by GeoParquetFormat:
spark.read.format("geoparquet").load("/path/to/example1.parquet").count() 22/12/29 15:58:12 WARN GeoParquetFileFormat: GeoParquet currently does not support vectorized reader. Falling back to parquet-mr 22/12/29 15:58:12 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1) java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.UnsafeRow cannot be cast to org.apache.spark.sql.vectorized.ColumnarBatch at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.next(DataSourceScanExec.scala:560) at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.next(DataSourceScanExec.scala:549) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithoutKey_0$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760) at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140) at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at org.apache.spark.scheduler.Task.run(Task.scala:136) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)
Attachments
Issue Links
- is related to
-
SEDONA-222 GeoParquet reader does not work in non-local mode
- Resolved