Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
I have a two-level partitioning structure (/archive/psn/<server>/<dayId>/). Drill is not pruning partitions: it reads every file under every directory, including directories that the filter on dir1 should exclude. My source files are tab-delimited.
My query:
select dir0 server, dir1 dayId, max(LENGTH(columns[2])) maxSize from dfs.`/archive/psn` where dir1 >= 20151001 group by dir0,dir1 order by maxSize
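For comparison, the pruning expected here can be sketched in a few lines of Python. This is a minimal illustration, not Drill's planner code. The helper `prune` is hypothetical. It assumes the layout `/archive/psn/<dir0>/<dir1>/<file>` visible in the plan's file list, and it assumes every dir1 name is an 8-digit yyyymmdd date, so comparing it numerically (as the query does with 20151001) selects the same directories as comparing it as a string. The second sample path is made up for illustration.

```python
from pathlib import PurePosixPath


def prune(paths, root, min_day):
    """Keep only files whose second-level directory (dir1) is >= min_day.

    Hypothetical helper that mimics directory-based partition pruning for a
    <root>/<dir0>/<dir1>/<file> layout with yyyymmdd dir1 names.
    """
    root = PurePosixPath(root)
    kept = []
    for p in paths:
        parts = PurePosixPath(p).relative_to(root).parts
        if len(parts) < 3:
            # Not two directories deep: dir1 would be NULL, and in SQL
            # NULL >= 20151001 is not true, so the file is excluded.
            continue
        dir1 = parts[1]  # directory names are strings, e.g. "20150130"
        if int(dir1) >= min_day:
            kept.append(p)
    return kept


files = [
    "/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.15.sink",  # from the plan
    "/archive/psn/PAWCHSCCOMPA2/20151001/psns-2015.10.01-00.00.sink",  # hypothetical
]
print(prune(files, "/archive/psn", 20151001))
```

Only the second path survives. A correctly pruned scan would list only directories like that one, never 20150130.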
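Plan snippet showing Drill reading unneeded files. Note that the Scan (05-07) reports numFiles=116213 and lists files under 20150130, which dir1 >= 20151001 should have pruned; the filter is only applied afterward, at 05-05: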
00-00 Screen : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.898214127009591E11 rows, 3.373451719812133E12 cpu, 0.0 io, 9.966863946928127E13 network, 1.51590434033232E12 memory}, id = 44973
00-01 Project(server=[$0], dayId=[$1], maxSize=[$2]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {…}
00-02 SingleMergeExchange(sort0=[2 ASC]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {4.897696400117401E11 rows, 3.3733999471229136E12 cpu, 0.0 io, 9.966863946928127E13 network, 1.51590434033232E12 memory}, id = 44971
01-01 SelectionVectorRemover : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {…}, id = 44970
01-02 Sort(sort0=[$2], dir0=[ASC]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {…}, id = 44969
01-03 Project(server=[$0], dayId=[$1], maxSize=[$2]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689218999994E8, cumulative cost = {…}
01-04 HashToRandomExchange(dist0=[[$2]]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689218999994E8, cumulative cost = {4.882164593351701E11 rows, 3.2984380301424897E12 cpu, 0.0 io, 9.330681141805055E13 network, 1.50347889491976E12 memory}, id = 44967
02-01 UnorderedMuxExchange : rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689218999994E8, cumulative cost = {…}, id = 44966
03-01 Project(server=[$0], dayId=[$1], maxSize=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($2))]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689218999994E8, cumulative cost = {…}, id = 44965
03-02 HashAgg(group=[
03-03 Project(server=[$0], dayId=[$1], maxSize=[$2]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689219E9, cumulative cost = {4.814860097367001E11 rows, 3.1426022355933E12 cpu, 0.0 io, 8.48243740164096E13 network, 1.3667989953816E12 memory}, id = 44963
03-04 HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689219E9, cumulative cost = {4.814860097367001E11 rows, 3.1426022355933E12 cpu, 0.0 io, 8.48243740164096E13 network, 1.3667989953816E12 memory}, id = 44962
04-01 UnorderedMuxExchange : rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689219E9, cumulative cost = {4.7630874081480005E11 rows, 3.0804750085305E12 cpu, 0.0 io, 0.0 network, 1.3667989953816E12 memory}, id = 44961
05-01 Project(server=[$0], dayId=[$1], maxSize=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, hash64AsDouble($0)))]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 5.1772689219E9, cumulative cost = {4.711314718929E11 rows, 3.0752977396086E12 cpu, 0.0 io, 0.0 network, 1.3667989953816E12 memory}, id = 44960
05-02 HashAgg(group=[{0, 1}], maxSize=[MAX($2)]) : rowType = RecordType(ANY server, ANY dayId, ANY maxSize): rowcount = 5.1772689219E9, cumulative cost = {4.65954202971E11 rows, 3.054588663921E12 cpu, 0.0 io, 0.0 network, 1.3667989953816E12 memory}, id = 44959
05-03 Project(server=[$0], dayId=[$1], $f2=[LENGTH($2)]) : rowType = RecordType(ANY server, ANY dayId, ANY $f2): rowcount = 5.1772689219E10, cumulative cost = {…}, id = 44958
05-04 SelectionVectorRemover : rowType = RecordType(ANY dir0, ANY dir1, ANY ITEM): rowcount = 5.1772689219E10, cumulative cost = {…}, id = 44957
05-05 Filter(condition=[>=($1, 20151001)]) : rowType = RecordType(ANY dir0, ANY dir1, ANY ITEM): rowcount = 5.1772689219E10, cumulative cost = {…}, id = 44956
05-06 Project(dir0=[$0], dir1=[$2], ITEM=[ITEM($1, 2)]) : rowType = RecordType(ANY dir0, ANY dir1, ANY ITEM): rowcount = 1.03545378438E11, cumulative cost = {…}, id = 44955
05-07 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/archive/psn, numFiles=116213, columns=[`dir0`, `dir1`, `columns`[2]], files=[maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.15.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-04.45.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-18.30.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-17.15.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-23.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-09.15.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-15.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-14.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-20.45.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-09.45.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-13.45.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-13.30.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-10.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-01.45.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-07.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-13.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-18.15.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-21.15.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-11.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-08.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-15.45.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-05.45.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-16.00.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-19.30.sink, maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-07.15.sink, ...
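To check how many of the files listed in the Scan fall outside the filter, the plan line can be inspected programmatically. The Python sketch below is a hypothetical diagnostic, not a Drill tool: `scan_summary` and `PATH_RE` are names made up here, the regex assumes paths of the form `maprfs:/archive/psn/<dir0>/<yyyymmdd>/<file>`, and `line` is a shortened copy of the 05-07 operator above (only its first two files).

```python
import re

# Matches maprfs:/archive/psn/<dir0>/<dir1>/<file> entries in files=[...].
# The root is hard-coded to the selectionRoot shown in the plan.
PATH_RE = re.compile(r"maprfs:/archive/psn/([^/,\s\]]+)/(\d{8})/([^,\s\]]+)")


def scan_summary(scan_line, min_day):
    """Return (numFiles, files listed, listed files with dir1 < min_day)."""
    m = re.search(r"numFiles=(\d+)", scan_line)
    num_files = int(m.group(1)) if m else None
    listed = PATH_RE.findall(scan_line)
    stale = [f"{d0}/{d1}/{f}" for d0, d1, f in listed if int(d1) < min_day]
    return num_files, len(listed), stale


# Shortened copy of the 05-07 Scan line from the plan.
line = (
    "05-07 Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/archive/psn, "
    "numFiles=116213, columns=[`dir0`, `dir1`, `columns`[2]], files=["
    "maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-02.15.sink, "
    "maprfs:/archive/psn/PAWCHSCCOMPA2/20150130/psns-2015.01.30-04.45.sink, ..."
)
num_files, listed, stale = scan_summary(line, 20151001)
print(num_files, listed, len(stale))
```

Because the plan truncates the file list, this only counts files actually printed; numFiles is the total reported for the group scan.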