Details
Bug
Status: Resolved
Critical
Resolution: Fixed
1.18.0
None
Drill 1.18
Ambari 2.7.4
Spark 3.0.2
Description
I created a dataset using Spark ML. When I query the dataset folder with Drill 1.18, it reports the following error:
[Error Id: 92d3f331-ffca-46b5-a64c-87453b88a108 on xxx.xxx.xxx:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
    at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:788)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.checkCommonStates(QueryStateProcessor.java:322)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.planning(QueryStateProcessor.java:216)
    at org.apache.drill.exec.work.foreman.QueryStateProcessor.moveToState(QueryStateProcessor.java:76)
    at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:300)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Error while applying rule DrillPushProjectIntoScanRule:enumerable, args [rel#478:LogicalProject.NONE.ANY([]).[](input=RelSubset#477,label=$1), rel#452:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[hdfs_dataset.default, /home/spark/dataset/default/test2/*.parquet])]
    at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:301)
    ... 3 common frames omitted
Caused by: java.lang.RuntimeException: Error while applying rule DrillPushProjectIntoScanRule:enumerable, args [rel#478:LogicalProject.NONE.ANY([]).[](input=RelSubset#477,label=$1), rel#452:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[hdfs_dataset.default, /home/spark/dataset/default/test2/*.parquet])]
    at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:235)
    at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:633)
    at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:327)
    at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:405)
    at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:351)
    at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel(DefaultSqlHandler.java:245)
    at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:308)
    at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:173)
    at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283)
    at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163)
    at org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:140)
    at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93)
    at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:593)
    at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
    ... 3 common frames omitted
Caused by: java.lang.NullPointerException: null
    at org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.checkForPartitionColumn(ParquetGroupScanStatistics.java:186)
    at org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.collect(ParquetGroupScanStatistics.java:119)
    at org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.<init>(ParquetGroupScanStatistics.java:59)
    at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.getParquetGroupScanStatistics(BaseParquetMetadataProvider.java:293)
    at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.getTableMetadata(BaseParquetMetadataProvider.java:249)
    at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.initializeMetadata(BaseParquetMetadataProvider.java:203)
    at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.init(BaseParquetMetadataProvider.java:170)
    at org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>(ParquetTableMetadataProviderImpl.java:95)
    at org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>(ParquetTableMetadataProviderImpl.java:48)
    at org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl$Builder.build(ParquetTableMetadataProviderImpl.java:415)
    at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:150)
    at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:120)
    at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:202)
    at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:79)
    at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:226)
    at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:209)
    at org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:119)
    at org.apache.drill.exec.planner.logical.DrillPushProjectIntoScanRule.canPushProjectIntoScan(DrillPushProjectIntoScanRule.java:190)
    at org.apache.drill.exec.planner.logical.DrillPushProjectIntoScanRule.onMatch(DrillPushProjectIntoScanRule.java:107)
    at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:208)
    ... 16 common frames omitted
This looks the same as issue https://issues.apache.org/jira/browse/DRILL-7769.
I added some logging and found this:
TRACE o.a.d.e.s.p.ParquetGroupScanStatistics - check schema path `features`.`values`.`list`.`element` with major type null current partitionColTypeMap = {`features`.`indices`.`list`.`element`=null, `features`.`type`=minor_type: TINYINT mode: REQUIRED , `features`.`size`=minor_type: INT mode: OPTIONAL }
2021-05-25 15:39:21,066 [1f535658-f840-9f0e-1a7b-21080514bb7b:foreman] TRACE o.a.d.e.s.p.ParquetGroupScanStatistics - check schema path `label` with major type minor_type: FLOAT8 mode: REQUIRED current partitionColTypeMap = {`features`.`indices`.`list`.`element`=null, `features`.`type`=minor_type: TINYINT mode: REQUIRED , `features`.`size`=minor_type: INT mode: OPTIONAL }
2021-05-25 15:39:21,066 [1f535658-f840-9f0e-1a7b-21080514bb7b:foreman] TRACE o.a.d.e.s.p.ParquetGroupScanStatistics - check schema path `features`.`size` with major type minor_type: INT mode: OPTIONAL current partitionColTypeMap = {`features`.`indices`.`list`.`element`=null, `features`.`type`=minor_type: TINYINT mode: REQUIRED , `features`.`size`=minor_type: INT mode: OPTIONAL }
So under some conditions the major type is null, and partitionColTypeMap ends up holding a null value for that schema path. When Drill then runs the following code, it throws a NullPointerException:
TypeProtos.MajorType majorType = columnMetadata != null ? columnMetadata.majorType() : null; // line 121
!partitionColTypeMap.get(schemaPath).equals(type) // line 189
We need to replace null with org.apache.drill.common.types.Types.NULL to avoid the NullPointerException.
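The failure pattern and the proposed fix can be illustrated outside Drill with a minimal, self-contained sketch. Everything here is a hypothetical stand-in and not Drill's actual code: plain strings replace TypeProtos.MajorType, the constant NULL_TYPE plays the role of Types.NULL, and the class and method names are invented for illustration. It only shows why a null map value breaks the equals call and how a non-null sentinel avoids it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class NullTypeSketch {
    // Hypothetical stand-in for org.apache.drill.common.types.Types.NULL.
    static final String NULL_TYPE = "NULL";

    // Mirrors the reported pattern: the map holds a null value for a known
    // path, so calling equals on the looked-up value throws.
    static boolean differsUnsafe(Map<String, String> typeByPath, String path, String type) {
        return !typeByPath.get(path).equals(type);
    }

    // Replaces a missing type with the non-null sentinel.
    static String orNullType(String type) {
        return type != null ? type : NULL_TYPE;
    }

    // Same comparison, but both sides are normalised to the sentinel first.
    static boolean differsSafe(Map<String, String> typeByPath, String path, String type) {
        return !Objects.equals(orNullType(typeByPath.get(path)), orNullType(type));
    }

    public static void main(String[] args) {
        Map<String, String> typeByPath = new HashMap<>();
        typeByPath.put("features.indices.list.element", null); // type could not be resolved
        typeByPath.put("features.size", "INT");

        try {
            differsUnsafe(typeByPath, "features.indices.list.element", "INT");
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");
        }

        // With the sentinel, the same lookups compare cleanly.
        System.out.println(differsSafe(typeByPath, "features.indices.list.element", null));
        System.out.println(differsSafe(typeByPath, "features.size", "INT"));
    }
}
```

In this sketch the sentinel is applied at comparison time; substituting it where the type is first computed (the line 121 assignment above) would have the same effect, since the map would then never store a null value.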