Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.15.0
Description
Prerequisites:
- Copy the attached JSON file to DFS:
hadoop fs -put -f ./lineitem.json /tmp/
- Import it into MapRDB:
mapr importJSON -idField "l_orderkey" -src /tmp/lineitem.json -dst /tmp/lineitem
- Create a Hive external table:
CREATE EXTERNAL TABLE lineitem (
  l_orderkey string,
  l_comment string,
  l_commitdate string,
  l_discount string,
  l_extendedprice string,
  l_linenumber string,
  l_linestatus string,
  l_partkey string,
  l_quantity string,
  l_receiptdate string,
  l_returnflag string,
  l_shipdate string,
  l_shipinstruct string,
  l_shipmode string,
  l_suppkey string,
  l_tax int
)
STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'
TBLPROPERTIES("maprdb.table.name" = "/tmp/lineitem","maprdb.column.id" = "l_orderkey");
- In Drill, enable the native reader option:
set store.hive.maprdb_json.optimize_scan_with_native_reader = true;
Query:
select * from hive.`lineitem` where l_orderkey < 100
Expected result:
The query should return a result set without errors.
Actual result:
The query fails with an exception:
SYSTEM ERROR: IllegalArgumentException: A INT value can not be used for '_id' field.

(org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: Error while applying rule MapRDBPushFilterIntoScan:Filter_On_Scan, args [rel#1751:FilterPrel.PHYSICAL.SINGLETON([]).[](input=rel#1746:Subset#3.PHYSICAL.SINGLETON([]).[],condition=<($0, 100)), rel#1745:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=/tmp/lineitem, condition=null], columns=[`_id`, `l_comment`, `l_commitdate`, `l_discount`, `l_extendedprice`, `l_linenumber`, `l_linestatus`, `l_partkey`, `l_quantity`, `l_receiptdate`, `l_returnflag`, `l_shipdate`, `l_shipinstruct`, `l_shipmode`, `l_suppkey`, `l_tax`, `**`]])]
    org.apache.drill.exec.work.foreman.Foreman.run():294
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():748
Caused By (java.lang.RuntimeException) Error while applying rule MapRDBPushFilterIntoScan:Filter_On_Scan, args [rel#1751:FilterPrel.PHYSICAL.SINGLETON([]).[](input=rel#1746:Subset#3.PHYSICAL.SINGLETON([]).[],condition=<($0, 100)), rel#1745:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=/tmp/lineitem, condition=null], columns=[`_id`, `l_comment`, `l_commitdate`, `l_discount`, `l_extendedprice`, `l_linenumber`, `l_linestatus`, `l_partkey`, `l_quantity`, `l_receiptdate`, `l_returnflag`, `l_shipdate`, `l_shipinstruct`, `l_shipmode`, `l_suppkey`, `l_tax`, `**`]])]
    org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():236
    org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():652
    org.apache.calcite.tools.Programs$RuleSetProgram.run():368
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():430
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel():460
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():182
    org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():145
    org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():83
    org.apache.drill.exec.work.foreman.Foreman.runSQL():567
    org.apache.drill.exec.work.foreman.Foreman.run():266
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():748
Caused By (java.lang.IllegalArgumentException) A INT value can not be used for '_id' field.
    com.mapr.db.impl.ConditionLeaf.checkArgs():308
    com.mapr.db.impl.ConditionLeaf.<init>():100
    com.mapr.db.impl.ConditionLeaf.<init>():86
    com.mapr.db.impl.ConditionLeaf.<init>():82
    com.mapr.db.impl.ConditionImpl.is():407
    com.mapr.db.impl.ConditionImpl.is():402
    com.mapr.db.impl.ConditionImpl.is():43
    org.apache.drill.exec.store.mapr.db.json.JsonConditionBuilder.setIsCondition():127
    org.apache.drill.exec.store.mapr.db.json.JsonConditionBuilder.createJsonScanSpec():181
    org.apache.drill.exec.store.mapr.db.json.JsonConditionBuilder.visitFunctionCall():80
    org.apache.drill.exec.store.mapr.db.json.JsonConditionBuilder.visitFunctionCall():33
    org.apache.drill.common.expression.FunctionCall.accept():60
    org.apache.drill.exec.store.mapr.db.json.JsonConditionBuilder.parseTree():48
    org.apache.drill.exec.store.mapr.db.MapRDBPushFilterIntoScan.doPushFilterIntoJsonGroupScan():135
    org.apache.drill.exec.store.mapr.db.MapRDBPushFilterIntoScan$1.onMatch():64
    org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():212
    org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():652
    org.apache.calcite.tools.Programs$RuleSetProgram.run():368
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():430
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel():460
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():182
    org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():145
    org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():83
    org.apache.drill.exec.work.foreman.Foreman.runSQL():567
    org.apache.drill.exec.work.foreman.Foreman.run():266
    java.util.concurrent.ThreadPoolExecutor.runWorker():1149
    java.util.concurrent.ThreadPoolExecutor$Worker.run():624
    java.lang.Thread.run():748
Notes:
- The same query works if store.hive.maprdb_json.optimize_scan_with_native_reader = false.
- The same exception occurs when the table is queried directly through dfs:
select * from dfs.tmp.`lineitem` where _id < 100
- The dfs query works if filter pushdown is disabled in the maprdb format plugin:
"maprdb": { "type": "maprdb", "enablePushdown": false }
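
The stack trace and the notes above suggest that the failure comes from the pushdown rule handing the integer literal 100 to a condition on '_id', which is rejected. The following is only a rough model of that behaviour in Python, not Drill or MapR-DB code: the names build_condition and plan_filter are hypothetical, and the rule that '_id' accepts only string or binary values is stated here as an assumption.

```python
# Hypothetical model of the failure; none of these names are Drill or MapR-DB APIs.
ID_FIELD = "_id"
ALLOWED_ID_TYPES = (str, bytes)  # assumption: '_id' values are strings or binary


def build_condition(field, op, value):
    """Mimic a condition builder that rejects a non-string/binary '_id' literal."""
    if field == ID_FIELD and not isinstance(value, ALLOWED_ID_TYPES):
        raise ValueError(
            f"A {type(value).__name__.upper()} value can not be used for '{field}' field."
        )
    return (field, op, value)


def plan_filter(field, op, value, enable_pushdown=True):
    """Return ('pushed', condition) or ('engine', condition)."""
    if not enable_pushdown:
        # With pushdown disabled the storage-layer builder is never called,
        # so the filter stays in the query engine and no type check fires.
        return ("engine", (field, op, value))
    return ("pushed", build_condition(field, op, value))


if __name__ == "__main__":
    # Pushdown disabled: the filter is kept in the engine.
    print(plan_filter("_id", "<", 100, enable_pushdown=False))
    # Pushdown enabled with an integer literal: the builder raises.
    try:
        plan_filter("_id", "<", 100)
    except ValueError as err:
        print(err)
```

In this model the query succeeds whenever the condition builder is bypassed, which matches the two workarounds in the notes. It does not say how the planner should handle the literal when pushdown stays enabled.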