[DRILL-8403] Generated aggregate function calls are missing required filters when used with PIVOT - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.21.0
Fix Version/s: 1.21.1
Component/s: None
Labels:
None

Description

The following query should generate aggregates grouped by education_level and containing filters on marital_status but the requisite filters are lost during function rewriting.

apache drill> SELECT
2..semicolon> *
3..semicolon> FROM
4..semicolon> (SELECT
5..........)> education_level,
6..........)> salary,
7..........)> marital_status,
8..........)> extract(year from age(birth_date)) age
9..........)> FROM
10.........)> cp.`employee.json`)
11.semicolon> PIVOT (
12.........)> avg(salary) avg_salary, avg(age) avg_age FOR marital_status IN ('M' married, 'S' single)
13.........)> );
-------------------{}----------------{}+{}----------------{}{}----------------{}+{}------------------

education_level

married_avg_salary

married_avg_age

single_avg_salary

single_avg_age

-------------------{}----------------{}+{}----------------{}{}----------------{}+{}------------------

Graduate Degree	4392.823529411765	100.32352941176471	4392.823529411765	100.32352941176471
Bachelors Degree	4492.404181184669	102.22996515679442	4492.404181184669	102.22996515679442
Partial College	4047.1180555555557	100.10069444444444	4047.1180555555557	100.10069444444444
High School Degree	3516.1565836298932	103.12811387900356	3516.1565836298932	103.12811387900356
Partial High School	3511.0852713178297	102.30232558139535	3511.0852713178297	102.30232558139535

-------------------{}----------------{}+{}----------------{}{}----------------{}+{}------------------
5 rows selected (0.285 seconds)

00-00 Screen : rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 46.3, cumulative cost = {1486.23 rows, 35748.229999999996 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 812
00-01 Project(education_level=[$0], married_min_salary=[$1], married_avg_age=[$2], single_min_salary=[$3], single_avg_age=[$4]) : rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 46.3, cumulative cost = {1481.6 rows, 35743.6 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 811
00-02 Project(education_level=[$0], married_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], married_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)], single_min_salary=[divide(CastHigh(CASE(=($2, 0), null:NULL, $1)), $2)], single_avg_age=[divide(CastHigh(CASE(=($4, 0), null:NULL, $3)), $4)]) : rowType = RecordType(ANY education_level, ANY married_min_salary, DOUBLE married_avg_age, ANY single_min_salary, DOUBLE single_avg_age): rowcount = 46.3, cumulative cost = {1435.3 rows, 35512.1 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 808
00-03 HashAgg(group=[

{0}

], agg#0=[$SUM0($2)], agg#1=[COUNT($2)], agg#2=[$SUM0($3)], agg#3=[COUNT($3)]) : rowType = RecordType(ANY education_level, ANY $f1, BIGINT $f2, BIGINT $f3, BIGINT $f4): rowcount = 46.3, cumulative cost = {1389.0 rows, 34725.0 cpu, 474630.0 io, 0.0 network, 8148.800000000001 memory}, id = 807
00-04 Project(education_level=[$0], marital_status=[$1], salary=[$2], age=[EXTRACT(FLAG(YEAR), AGE($3))], $f4=[IS TRUE(=($1, 'M'))], $f5=[IS TRUE(=($1, 'S'))]) : rowType = RecordType(ANY education_level, ANY marital_status, ANY salary, BIGINT age, BOOLEAN $f4, BOOLEAN $f5): rowcount = 463.0, cumulative cost = {926.0 rows, 8797.0 cpu, 474630.0 io, 0.0 network, 0.0 memory}, id = 806
00-05 Scan(table=[[cp, employee.json]], groupscan=[EasyGroupScan [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`education_level`, `marital_status`, `salary`, `birth_date`], files=[classpath:/employee.json], usedMetastore=false, limit=-1, formatConfig=JSONFormatConfig [extensions=[json]]]]) : rowType = RecordType(ANY education_level, ANY marital_status, ANY salary, ANY birth_date): rowcount = 463.0, cumulative cost = {463.0 rows, 1852.0 cpu, 474630.0 io, 0.0 network, 0.0 memory}, id = 805

Attachments

Activity

People

Assignee:: Vova Vysotskyi

Reporter:: James Turton

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 22/Feb/23 13:09

Updated:: 23/Feb/23 08:55

Resolved:: 23/Feb/23 08:55