Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-8403

Generated aggregate function calls are missing required filters when used with PIVOT

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.21.0
    • 1.21.1
    • None
    • None

    Description

      The following query should generate aggregates grouped by education_level and containing filters on marital_status but the requisite filters are lost during function rewriting.

      apache drill> SELECT
      2..semicolon> *
      3..semicolon> FROM
      4..semicolon> (SELECT
      5..........)> education_level,
      6..........)> salary,
      7..........)> marital_status,
      8..........)> extract(year from age(birth_date)) age
      9..........)> FROM
      10.........)> cp.`employee.json`)
      11.semicolon> PIVOT (
      12.........)> avg(salary) avg_salary, avg(age) avg_age FOR marital_status IN ('M' married, 'S' single)
      13.........)> );
      -------------------{}----------------{}+{}----------------{}{}----------------{}+{}------------------

      education_level married_avg_salary married_avg_age single_avg_salary single_avg_age

      -------------------{}----------------{}+{}----------------{}{}----------------{}+{}------------------

      Graduate Degree 4392.823529411765 100.32352941176471 4392.823529411765 100.32352941176471
      Bachelors Degree 4492.404181184669 102.22996515679442 4492.404181184669 102.22996515679442
      Partial College 4047.1180555555557 100.10069444444444 4047.1180555555557 100.10069444444444
      High School Degree 3516.1565836298932 103.12811387900356 3516.1565836298932 103.12811387900356
      Partial High School 3511.0852713178297 102.30232558139535 3511.0852713178297 102.30232558139535

      -------------------{}----------------{}+{}----------------{}{}----------------{}+{}------------------
      5 rows selected (0.285 seconds)
       
      00-00 Screen : rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 46.3, cumulative cost = {1486.23 rows, 35748.229999999996 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 812
      00-01 Project(education_level=[$0], married_min_salary=[$1], married_avg_age=[$2], single_min_salary=[$3], single_avg_age=[$4]) : rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 46.3, cumulative cost = {1481.6 rows, 35743.6 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 811
      00-02 Project(education_level=[$0], married_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], married_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)], single_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], single_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)]) : rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 46.3, cumulative cost = {1435.3 rows, 35512.1 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 808
      00-03 HashAgg(group=[

      {0}

      ], agg#0=[$SUM0($2)], agg#1=[COUNT($2)], agg#2=[$SUM0($3)], agg#3=[COUNT($3)]) : rowType = RecordType(ANY education_level, ANY $f1, BIGINT $f2, BIGINT $f3, BIGINT $f4): rowcount = 46.3, cumulative cost = {1389.0 rows, 34725.0 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 807
      00-04 Project(education_level=[$0], marital_status=[$1], salary=[$2], age=[EXTRACT(FLAG(YEAR), AGE($3))], $f4=[IS TRUE(=($1, 'M'))], $f5=[IS TRUE(=($1, 'S'))]) : rowType = RecordType(ANY education_level, ANY marital_status, ANY salary, BIGINT age, BOOLEAN $f4, BOOLEAN $f5): rowcount = 463.0, cumulative cost = {926.0 rows, 8797.0 cpu, 474630.0 io, 0.0 network, 0.0 memory}, id = 806
      00-05 Scan(table=[[cp, employee.json]], groupscan=[EasyGroupScan [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`education_level`, `marital_status`, `salary`, `birth_date`], files=[classpath:/employee.json], usedMetastore=false, limit=-1, formatConfig=JSONFormatConfig [extensions=[json]]]]) : rowType = RecordType(ANY education_level, ANY marital_status, ANY salary, ANY birth_date): rowcount = 463.0, cumulative cost = {463.0 rows, 1852.0 cpu, 474630.0 io, 0.0 network, 0.0 memory}, id = 805

      Attachments

        Activity

          People

            volodymyr Vova Vysotskyi
            dzamo James Turton
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: