Details
Type: Task
Status: Closed
Priority: Blocker
Resolution: Fixed
0.7.0
None
Description
Starting with Hudi 0.7, HoodieInputFormat carries the UseRecordReaderFromInputFormat annotation. As a result, Presto skips all the optimizations in the Parquet PageSource and falls back to the basic GenericHiveRecordCursor, which has several limitations:
1) No support for timestamp
2) No support for synthesized columns
3) No support for vectorized reading?
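The fallback described above can be sketched as an annotation check that routes a split to one of two reader paths. This is a minimal illustration, not Presto's actual code: the annotation is redefined locally so the example compiles, the two format classes and the `ReaderPath` enum are hypothetical stand-ins, and matching the annotation by its simple name is an assumption of this sketch.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

public class ReaderPathSketch {
    // Local stand-in for the marker annotation named in this ticket.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface UseRecordReaderFromInputFormat {}

    // Hypothetical input format without the annotation.
    static class PlainParquetFormat {}

    // Hypothetical input format carrying the annotation, as HoodieInputFormat does from 0.7.
    @UseRecordReaderFromInputFormat
    static class AnnotatedHudiFormat {}

    // Hypothetical labels for the two reader paths discussed in the ticket.
    enum ReaderPath { OPTIMIZED_PAGE_SOURCE, GENERIC_RECORD_CURSOR }

    // Picks the generic record cursor whenever the input format class is annotated.
    static ReaderPath choosePath(Class<?> inputFormatClass) {
        boolean annotated = Arrays.stream(inputFormatClass.getAnnotations())
                .map(Annotation::annotationType)
                .map(Class::getSimpleName)
                .anyMatch("UseRecordReaderFromInputFormat"::equals);
        return annotated ? ReaderPath.GENERIC_RECORD_CURSOR : ReaderPath.OPTIMIZED_PAGE_SOURCE;
    }

    public static void main(String[] args) {
        System.out.println(choosePath(PlainParquetFormat.class));  // OPTIMIZED_PAGE_SOURCE
        System.out.println(choosePath(AnnotatedHudiFormat.class)); // GENERIC_RECORD_CURSOR
    }
}
```

Under this model, any input format that carries the annotation loses the optimized path for every split, which matches the behavior reported here.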
Example errors we saw:
Error #1
java.lang.IllegalStateException: column type must be regular
    at com.google.common.base.Preconditions.checkState(Preconditions.java:507)
    at com.facebook.presto.hive.GenericHiveRecordCursor.<init>(GenericHiveRecordCursor.java:167)
    at com.facebook.presto.hive.GenericHiveRecordCursorProvider.createRecordCursor(GenericHiveRecordCursorProvider.java:79)
    at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:449)
    at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:177)
    at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:63)
    at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:80)
    at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:231)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:418)
    at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
    at com.facebook.presto.$gen.Presto_0_247_17f857e____20210506_210241_1.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Error #2
java.lang.ClassCastException: class org.apache.hadoop.io.LongWritable cannot be cast to class org.apache.hadoop.hive.serde2.io.TimestampWritable (org.apache.hadoop.io.LongWritable and org.apache.hadoop.hive.serde2.io.TimestampWritable are in unnamed module of loader com.facebook.presto.server.PluginClassLoader @5c4e86e7)
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.getPrimitiveJavaObject(WritableTimestampObjectInspector.java:39)
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.getPrimitiveJavaObject(WritableTimestampObjectInspector.java:25)
    at com.facebook.presto.hive.GenericHiveRecordCursor.parseLongColumn(GenericHiveRecordCursor.java:286)
    at com.facebook.presto.hive.GenericHiveRecordCursor.parseColumn(GenericHiveRecordCursor.java:550)
    at com.facebook.presto.hive.GenericHiveRecordCursor.isNull(GenericHiveRecordCursor.java:508)
    at com.facebook.presto.hive.HiveRecordCursor.isNull(HiveRecordCursor.java:233)
    at com.facebook.presto.spi.RecordPageSource.getNextPage(RecordPageSource.java:112)
    at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:251)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:418)
    at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
    at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
    at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
    at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
    at com.facebook.presto.$gen.Presto_0_247_17f857e____20210506_210241_1.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
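The ClassCastException in the second trace comes down to a reader that assumes every timestamp field arrives as a TimestampWritable and receives a LongWritable instead. The sketch below reproduces only that failure mode. The two writable classes are local stand-ins, not the real Hadoop/Hive types, and `readTimestampMillis` is a hypothetical helper rather than the inspector's actual method.

```java
public class TimestampCastSketch {
    // Local stand-in for org.apache.hadoop.io.LongWritable (not the real class).
    static class LongWritable {
        final long value;
        LongWritable(long value) { this.value = value; }
    }

    // Local stand-in for org.apache.hadoop.hive.serde2.io.TimestampWritable (not the real class).
    static class TimestampWritable {
        final long millis;
        TimestampWritable(long millis) { this.millis = millis; }
    }

    // Hypothetical reader that unconditionally casts the field to TimestampWritable.
    static long readTimestampMillis(Object fieldValue) {
        return ((TimestampWritable) fieldValue).millis;
    }

    // Reports whether reading the field fails with a ClassCastException.
    static boolean failsWithClassCast(Object fieldValue) {
        try {
            readTimestampMillis(fieldValue);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsWithClassCast(new TimestampWritable(0L))); // false
        System.out.println(failsWithClassCast(new LongWritable(0L)));      // true
    }
}
```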
In addition to the errors above, query performance also appears to have degraded substantially.