Apache Drill / DRILL-6686

Exception when filtering by id on a MapR-DB JSON table


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.15.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      Prerequisites:

      • Copy the attached JSON file to DFS:
        hadoop fs -put -f ./lineitem.json /tmp/
        
      • Import it into MapR-DB:
        mapr importJSON -idField "l_orderkey" -src /tmp/lineitem.json -dst /tmp/lineitem
        
      • Create a Hive external table:
        CREATE EXTERNAL TABLE lineitem ( 
        l_orderkey string, 
        l_comment string, 
        l_commitdate string,
        l_discount string,
        l_extendedprice string,
        l_linenumber string,
        l_linestatus string,
        l_partkey string,
        l_quantity string,
        l_receiptdate string,
        l_returnflag string,
        l_shipdate string,
        l_shipinstruct string,
        l_shipmode string,
        l_suppkey string,
        l_tax int
        ) 
        STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' 
        TBLPROPERTIES("maprdb.table.name" = "/tmp/lineitem","maprdb.column.id" = "l_orderkey");
        
      • In Drill, enable the native reader:
        set store.hive.maprdb_json.optimize_scan_with_native_reader = true;
        

      Query:

      select * from hive.`lineitem` where l_orderkey < 100
      

      Expected results:
      The query should return the rows whose l_orderkey is less than 100.

      Actual result:
      The query fails with an exception:

      SYSTEM ERROR: IllegalArgumentException: A INT value can not be used for '_id' field.
      
      
      
        (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: Error while applying rule MapRDBPushFilterIntoScan:Filter_On_Scan, args [rel#1751:FilterPrel.PHYSICAL.SINGLETON([]).[](input=rel#1746:Subset#3.PHYSICAL.SINGLETON([]).[],condition=<($0, 100)), rel#1745:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=/tmp/lineitem, condition=null], columns=[`_id`, `l_comment`, `l_commitdate`, `l_discount`, `l_extendedprice`, `l_linenumber`, `l_linestatus`, `l_partkey`, `l_quantity`, `l_receiptdate`, `l_returnflag`, `l_shipdate`, `l_shipinstruct`, `l_shipmode`, `l_suppkey`, `l_tax`, `**`]])]
          org.apache.drill.exec.work.foreman.Foreman.run():294
          java.util.concurrent.ThreadPoolExecutor.runWorker():1149
          java.util.concurrent.ThreadPoolExecutor$Worker.run():624
          java.lang.Thread.run():748
        Caused By (java.lang.RuntimeException) Error while applying rule MapRDBPushFilterIntoScan:Filter_On_Scan, args [rel#1751:FilterPrel.PHYSICAL.SINGLETON([]).[](input=rel#1746:Subset#3.PHYSICAL.SINGLETON([]).[],condition=<($0, 100)), rel#1745:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=/tmp/lineitem, condition=null], columns=[`_id`, `l_comment`, `l_commitdate`, `l_discount`, `l_extendedprice`, `l_linenumber`, `l_linestatus`, `l_partkey`, `l_quantity`, `l_receiptdate`, `l_returnflag`, `l_shipdate`, `l_shipinstruct`, `l_shipmode`, `l_suppkey`, `l_tax`, `**`]])]
          org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():236
          org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():652
          org.apache.calcite.tools.Programs$RuleSetProgram.run():368
          org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():430
          org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel():460
          org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():182
          org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():145
          org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():83
          org.apache.drill.exec.work.foreman.Foreman.runSQL():567
          org.apache.drill.exec.work.foreman.Foreman.run():266
          java.util.concurrent.ThreadPoolExecutor.runWorker():1149
          java.util.concurrent.ThreadPoolExecutor$Worker.run():624
          java.lang.Thread.run():748
        Caused By (java.lang.IllegalArgumentException) A INT value can not be used for '_id' field.
          com.mapr.db.impl.ConditionLeaf.checkArgs():308
          com.mapr.db.impl.ConditionLeaf.<init>():100
          com.mapr.db.impl.ConditionLeaf.<init>():86
          com.mapr.db.impl.ConditionLeaf.<init>():82
          com.mapr.db.impl.ConditionImpl.is():407
          com.mapr.db.impl.ConditionImpl.is():402
          com.mapr.db.impl.ConditionImpl.is():43
          org.apache.drill.exec.store.mapr.db.json.JsonConditionBuilder.setIsCondition():127
          org.apache.drill.exec.store.mapr.db.json.JsonConditionBuilder.createJsonScanSpec():181
          org.apache.drill.exec.store.mapr.db.json.JsonConditionBuilder.visitFunctionCall():80
          org.apache.drill.exec.store.mapr.db.json.JsonConditionBuilder.visitFunctionCall():33
          org.apache.drill.common.expression.FunctionCall.accept():60
          org.apache.drill.exec.store.mapr.db.json.JsonConditionBuilder.parseTree():48
          org.apache.drill.exec.store.mapr.db.MapRDBPushFilterIntoScan.doPushFilterIntoJsonGroupScan():135
          org.apache.drill.exec.store.mapr.db.MapRDBPushFilterIntoScan$1.onMatch():64
          org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():212
          org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():652
          org.apache.calcite.tools.Programs$RuleSetProgram.run():368
          org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():430
          org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel():460
          org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():182
          org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():145
          org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():83
          org.apache.drill.exec.work.foreman.Foreman.runSQL():567
          org.apache.drill.exec.work.foreman.Foreman.run():266
          java.util.concurrent.ThreadPoolExecutor.runWorker():1149
          java.util.concurrent.ThreadPoolExecutor$Worker.run():624
          java.lang.Thread.run():748
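
The innermost cause in the trace is a type check on the '_id' field that runs while the filter is being pushed into the scan. As a rough illustration only, the Python sketch below models that check. `check_args` and `push_filter` are hypothetical stand-ins, not the MapR-DB or Drill API, and the rule that '_id' accepts only string or binary values is an assumption about MapR-DB row keys, not something stated in this report.

```python
class IllegalArgument(ValueError):
    """Stand-in for java.lang.IllegalArgumentException."""


# Assumption: only string and binary values are acceptable for '_id'.
ALLOWED_ID_TYPES = (str, bytes)

# Illustrative mapping from Python types to the type names seen in the error text.
TYPE_NAMES = {int: "INT", float: "DOUBLE", str: "STRING", bytes: "BINARY"}


def check_args(field, value):
    """Hypothetical model of the check that rejects the INT literal."""
    if field == "_id" and not isinstance(value, ALLOWED_ID_TYPES):
        name = TYPE_NAMES.get(type(value), type(value).__name__.upper())
        raise IllegalArgument(
            f"A {name} value can not be used for '{field}' field."
        )


def push_filter(field, op, value):
    """Hypothetical model of pushing `field op value` into the table scan."""
    check_args(field, value)  # raises before any condition is built
    return {"field": field, "op": op, "value": value}
```

In this model, `push_filter("_id", "<", 100)` raises with the same message as the report, while the same comparison on a non-key column such as `l_tax` is accepted. This matches the notes in the report: the failure appears only on paths where the filter is pushed down to the MapR-DB scan.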
      

      Notes:

      • The same query works fine if store.hive.maprdb_json.optimize_scan_with_native_reader=false
      • The same exception occurs when the table is queried directly through dfs:
        select * from dfs.tmp.`lineitem` where _id < 100
        
      • The dfs query works if filter pushdown is disabled in the maprdb format plugin:
            "maprdb": {
              "type": "maprdb",
              "enablePushdown": false
            }
        

      Attachments

        1. lineitem.json
          161 kB
          Anton Gozhiy

          People

            Assignee: Unassigned
            Reporter: Anton Gozhiy (angozhiy)