  1. Apache Drill
  2. DRILL-5913

DrillReduceAggregatesRule conflates identical functions on the same inputRef that have different data types

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.9.0, 1.11.0
    • None
    • None

    Description

      Sample query:

      select stddev_samp(cast(employee_id as int)) as col1, sum(cast(employee_id as int)) as col2 from cp.`employee.json`
      

      Error output:

      
      org.apache.drill.exec.rpc.RpcException: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: AssertionError: Type mismatch:
      rel rowtype:
      RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, INTEGER $f3) NOT NULL
      equivRel rowtype:
      RecordType(INTEGER $f0, INTEGER $f1, BIGINT NOT NULL $f2, BIGINT $f3) NOT NULL
      [Error Id: f5114e62-a57b-46b1-afe8-ae652f390896 on localhost:31010]
        (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: Internal error: Error while applying rule DrillReduceAggregatesRule, args [rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
          org.apache.drill.exec.work.foreman.Foreman.run():294
          java.util.concurrent.ThreadPoolExecutor.runWorker():1142
          java.util.concurrent.ThreadPoolExecutor$Worker.run():617
          java.lang.Thread.run():745
        Caused By (java.lang.AssertionError) Internal error: Error while applying rule DrillReduceAggregatesRule, args [rel#29:LogicalAggregate.NONE.ANY([]).[](input=rel#28:Subset#3.NONE.ANY([]).[],group={},agg#0=SUM($1),agg#1=SUM($0),agg#2=COUNT($0),agg#3=$SUM0($0))]
          org.apache.calcite.util.Util.newInternal():792
          org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():251
          org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():811
      

      The cause is as follows. On its first match, DrillReduceAggregatesRule reduces stddev_samp(cast(employee_id as int)) to sum($0), sum($1), and count($0), and reduces sum(cast(employee_id as int)) to sum0($0). On its second match, the rule also reduces the sum($0) that came from stddev_samp to sum0($0). This second sum0($0) has a different data type from the first one: one is INTEGER, the other is BIGINT. Calcite's addAggCall method ignores the data type and treats the two calls as the same, so the BIGINT sum0($0) is replaced by the INTEGER sum0($0).
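
The deduplication problem described above can be illustrated with a small stand-alone model. This is a sketch only, not Calcite or Drill code: `AggDedupSketch`, `AggCall`, `addIgnoringType`, and `addIncludingType` are hypothetical names, and the assumption that the INTEGER call is registered before the BIGINT call is made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the behaviour described in this issue.
// None of these names are Calcite or Drill APIs.
final class AggDedupSketch {

    /** Simplified aggregate call: function name, input ordinal, result type. */
    static final class AggCall {
        final String function;
        final int inputRef;
        final String type;

        AggCall(String function, int inputRef, String type) {
            this.function = function;
            this.inputRef = inputRef;
            this.type = type;
        }

        boolean sameIgnoringType(AggCall other) {
            return function.equals(other.function) && inputRef == other.inputRef;
        }

        boolean sameIncludingType(AggCall other) {
            return sameIgnoringType(other) && type.equals(other.type);
        }
    }

    /** Returns the ordinal for the call, reusing any entry that matches when the type is ignored. */
    static int addIgnoringType(List<AggCall> calls, AggCall call) {
        for (int i = 0; i < calls.size(); i++) {
            if (calls.get(i).sameIgnoringType(call)) {
                return i; // reused even if the result types differ
            }
        }
        calls.add(call);
        return calls.size() - 1;
    }

    /** Returns the ordinal for the call, reusing an entry only when the type also matches. */
    static int addIncludingType(List<AggCall> calls, AggCall call) {
        for (int i = 0; i < calls.size(); i++) {
            if (calls.get(i).sameIncludingType(call)) {
                return i;
            }
        }
        calls.add(call);
        return calls.size() - 1;
    }

    public static void main(String[] args) {
        // Assumption: the INTEGER $SUM0($0) is already registered.
        List<AggCall> lossy = new ArrayList<>();
        lossy.add(new AggCall("$SUM0", 0, "INTEGER"));
        int reused = addIgnoringType(lossy, new AggCall("$SUM0", 0, "BIGINT"));
        // The BIGINT request is mapped onto the existing INTEGER entry.
        System.out.println("type ignored:  slot " + reused + " has type " + lossy.get(reused).type);

        List<AggCall> strict = new ArrayList<>();
        strict.add(new AggCall("$SUM0", 0, "INTEGER"));
        int added = addIncludingType(strict, new AggCall("$SUM0", 0, "BIGINT"));
        // The BIGINT request gets its own entry.
        System.out.println("type compared: slot " + added + " has type " + strict.get(added).type);
    }
}
```

In this model, comparing the result type as part of the lookup keeps the two calls distinct, which is one possible direction for a fix; whether the change belongs in Drill's rule or in Calcite is not settled by this issue.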

          People

            Unassigned
            Weijie Tong
            Arina Ielchiieva