Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-7245

TPCDS queries 1, 45, 65, 97 are 3x slower when Statistics is enabled at sf 100

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.16.0
    • 1.17.0
    • None
    • None

    Description

      Here is query 65:

      SELECT s_store_name, 
                     i_item_desc, 
                     sc.revenue, 
                     i_current_price, 
                     i_wholesale_cost, 
                     i_brand 
      FROM   store, 
             item, 
             (SELECT ss_store_sk, 
                     Avg(revenue) AS ave 
              FROM   (SELECT ss_store_sk, 
                             ss_item_sk, 
                             Sum(ss_sales_price) AS revenue 
                      FROM   store_sales, 
                             date_dim 
                      WHERE  ss_sold_date_sk = d_date_sk 
                             AND d_month_seq BETWEEN 1199 AND 1199 + 11 
                      GROUP  BY ss_store_sk, 
                                ss_item_sk) sa 
              GROUP  BY ss_store_sk) sb, 
             (SELECT ss_store_sk, 
                     ss_item_sk, 
                     Sum(ss_sales_price) AS revenue 
              FROM   store_sales, 
                     date_dim 
              WHERE  ss_sold_date_sk = d_date_sk 
                     AND d_month_seq BETWEEN 1199 AND 1199 + 11 
              GROUP  BY ss_store_sk, 
                        ss_item_sk) sc 
      WHERE  sb.ss_store_sk = sc.ss_store_sk 
             AND sc.revenue <= 0.1 * sb.ave 
             AND s_store_sk = sc.ss_store_sk 
             AND i_item_sk = sc.ss_item_sk 
      ORDER  BY s_store_name, 
                i_item_desc
      LIMIT 100; 
      

      Here is the new plan.

      00-00    Screen : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402776556728E9 rows, 9.49229008144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45433
      00-01      Project(s_store_name=[$0], i_item_desc=[$1], revenue=[$2], i_current_price=[$3], i_wholesale_cost=[$4], i_brand=[$5]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402766556728E9 rows, 9.49229007144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45432
      00-02        SelectionVectorRemover : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402666556728E9 rows, 9.49228947144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45431
      00-03          Limit(fetch=[100]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402566556728E9 rows, 9.49228937144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45430
      00-04            SingleMergeExchange(sort0=[0], sort1=[1]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402466556728E9 rows, 9.49228897144672E9 cpu, 2.305289192E9 io, 5.9855814764732E10 network, 1.3320421632282749E8 memory}, id = 45429
      01-01              OrderedMuxExchange(sort0=[0], sort1=[1]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402366556728E9 rows, 9.492286842675482E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45428
      02-01                SelectionVectorRemover : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402266556728E9 rows, 9.492286742675482E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45427
      02-02                  Limit(fetch=[100]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 100.0, cumulative cost = {1.170402166556728E9 rows, 9.492286642675482E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45426
      02-03                    SelectionVectorRemover : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 416478.4359081372, cumulative cost = {1.170402066556728E9 rows, 9.492286242675482E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45425
      02-04                      TopN(limit=[100]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 416478.4359081372, cumulative cost = {1.1699855881208198E9 rows, 9.491869764239574E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45424
      02-05                        HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 416478.4359081372, cumulative cost = {1.1695691096849117E9 rows, 9.469733581565046E9 cpu, 2.305289192E9 io, 5.9853357164732E10 network, 1.3320421632282749E8 memory}, id = 45423
      03-01                          Project(s_store_name=[$1], i_item_desc=[$3], revenue=[$11], i_current_price=[$4], i_wholesale_cost=[$5], i_brand=[$6]) : rowType = RecordType(ANY s_store_name, ANY i_item_desc, ANY revenue, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 416478.4359081372, cumulative cost = {1.1691526312490036E9 rows, 9.464735840334148E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network, 1.3320421632282749E8 memory}, id = 45422
      03-02                            SelectionVectorRemover : rowType = RecordType(ANY s_store_sk, ANY s_store_name, ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand, ANY ss_store_sk, ANY ave, ANY ss_store_sk0, ANY ss_item_sk, ANY revenue): rowcount = 416478.4359081372, cumulative cost = {1.1687361528130956E9 rows, 9.4622369697187E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network, 1.3320421632282749E8 memory}, id = 45421
      03-03                              Filter(condition=[<=($11, *(0.1, $8))]) : rowType = RecordType(ANY s_store_sk, ANY s_store_name, ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand, ANY ss_store_sk, ANY ave, ANY ss_store_sk0, ANY ss_item_sk, ANY revenue): rowcount = 416478.4359081372, cumulative cost = {1.1683196743771875E9 rows, 9.461820491282793E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network, 1.3320421632282749E8 memory}, id = 45420
      03-04                                Project(s_store_sk=[$5], s_store_name=[$6], i_item_sk=[$7], i_item_desc=[$8], i_current_price=[$9], i_wholesale_cost=[$10], i_brand=[$11], ss_store_sk=[$3], ave=[$4], ss_store_sk0=[$0], ss_item_sk=[$1], revenue=[$2]) : rowType = RecordType(ANY s_store_sk, ANY s_store_name, ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand, ANY ss_store_sk, ANY ave, ANY ss_store_sk0, ANY ss_item_sk, ANY revenue): rowcount = 832956.8718162744, cumulative cost = {1.1674867175053713E9 rows, 9.456822750051895E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network, 1.3320421632282749E8 memory}, id = 45419
      03-05                                  HashJoin(condition=[=($7, $1)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue, ANY ss_store_sk0, ANY ave, ANY s_store_sk, ANY s_store_name, ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 832956.8718162744, cumulative cost = {1.1666537606335552E9 rows, 9.4468272675901E9 cpu, 2.305289192E9 io, 4.961798312385362E10 network, 1.3320421632282749E8 memory}, id = 45418
      03-07                                    HashToRandomExchange(dist0=[[$1]]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue, ANY ss_store_sk0, ANY ave, ANY s_store_sk, ANY s_store_name): rowcount = 832377.0685035395, cumulative cost = {1.1652093835650516E9 rows, 9.430922742768057E9 cpu, 2.304269192E9 io, 4.544006312385362E10 network, 1.2961381632282749E8 memory}, id = 45415
      04-01                                      HashJoin(condition=[=($5, $0)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue, ANY ss_store_sk0, ANY ave, ANY s_store_sk, ANY s_store_name): rowcount = 832377.0685035395, cumulative cost = {1.164377006496548E9 rows, 9.417604709672E9 cpu, 2.304269192E9 io, 2.157414781572014E10 network, 1.2961381632282749E8 memory}, id = 45414
      04-03                                        HashJoin(condition=[=($3, $0)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue, ANY ss_store_sk0, ANY ave): rowcount = 832377.0685035397, cumulative cost = {1.1635434234280443E9 rows, 9.407608948849958E9 cpu, 2.304268388E9 io, 2.1544179841320137E10 network, 1.2960674112282749E8 memory}, id = 45411
      04-05                                          HashAgg(group=[{0, 1}], revenue=[SUM($2)]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397, cumulative cost = {5.804395044044166E8 rows, 4.68265977388499E9 cpu, 1.152134194E9 io, 1.0253148182571495E10 network, 5.674418293829948E7 memory}, id = 45393
      04-07                                            HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397, cumulative cost = {5.796071273359131E8 rows, 4.65935321596689E9 cpu, 1.152134194E9 io, 1.0253148182571495E10 network, 3.476942832980602E7 memory}, id = 45392
      06-01                                              HashAgg(group=[{0, 1}], revenue=[SUM($2)]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397, cumulative cost = {5.787747502674096E8 rows, 4.649364691144848E9 cpu, 1.152134194E9 io, 2.48987648E7 network, 3.476942832980602E7 memory}, id = 45391
      06-02                                                Project(ss_store_sk=[$2], ss_item_sk=[$1], ss_sales_price=[$3]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY ss_sales_price): rowcount = 1316801.1337047734, cumulative cost = {5.774579491337048E8 rows, 4.612494259401114E9 cpu, 1.152134194E9 io, 2.48987648E7 network, 5878.400000000001 memory}, id = 45390
      06-03                                                  HashJoin(condition=[=($0, $4)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY ss_sales_price, ANY d_date_sk, ANY d_month_seq): rowcount = 1316801.1337047734, cumulative cost = {5.76141148E8 rows, 4.608543856E9 cpu, 1.152134194E9 io, 2.48987648E7 network, 5878.400000000001 memory}, id = 45389
      06-05                                                    Scan(table=[[dfs, /tpcdsParquet10/SF100/store_sales]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/store_sales]], selectionRoot=maprfs:/tpcdsParquet10/SF100/store_sales, numFiles=1, numRowGroups=166, usedMetadataFile=false, columns=[`ss_sold_date_sk`, `ss_item_sk`, `ss_store_sk`, `ss_sales_price`]]]) : rowType = RecordType(ANY ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY ss_sales_price): rowcount = 2.87997024E8, cumulative cost = {2.87997024E8 rows, 1.151988096E9 cpu, 1.151988096E9 io, 0.0 network, 0.0 memory}, id = 45384
      06-04                                                    BroadcastExchange : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146766.0 rows, 588800.0 cpu, 146098.0 io, 2.48987648E7 network, 0.0 memory}, id = 45388
      09-01                                                      SelectionVectorRemover : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146432.0 rows, 586128.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45387
      09-02                                                        Filter(condition=[AND(>=($1, 1199), <=($1, 1210))]) : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146098.0 rows, 585794.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45386
      09-03                                                          Scan(table=[[dfs, /tpcdsParquet10/SF100/date_dim]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/date_dim]], selectionRoot=maprfs:/tpcdsParquet10/SF100/date_dim, numFiles=1, numRowGroups=1, usedMetadataFile=false, filter=booleanAnd(greater_than_or_equal_to(`d_month_seq`, 1199) , less_than_or_equal_to(`d_month_seq`, 1210) ) , columns=[`d_date_sk`, `d_month_seq`]]]) : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 73049.0, cumulative cost = {73049.0 rows, 146098.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45385
      04-04                                          Project(ss_store_sk0=[$0], ave=[$1]) : rowType = RecordType(ANY ss_store_sk0, ANY ave): rowcount = 202.0, cumulative cost = {5.822713399551244E8 rows, 4.714959034142926E9 cpu, 1.152134194E9 io, 1.1291031658748644E10 network, 7.2859002984528E7 memory}, id = 45410
      04-06                                            BroadcastExchange : rowType = RecordType(ANY ss_store_sk, ANY ave): rowcount = 202.0, cumulative cost = {5.822711379551244E8 rows, 4.714958630142926E9 cpu, 1.152134194E9 io, 1.1291031658748644E10 network, 7.2859002984528E7 memory}, id = 45409
      07-01                                              Project(ss_store_sk=[$0], ave=[divide(CastHigh(CASE(=($2, 0), null, $1)), $2)]) : rowType = RecordType(ANY ss_store_sk, ANY ave): rowcount = 202.0, cumulative cost = {5.822709359551244E8 rows, 4.714957014142926E9 cpu, 1.152134194E9 io, 1.1275973124348644E10 network, 7.2859002984528E7 memory}, id = 45408
      07-02                                                HashAgg(group=[{0}], agg#0=[$SUM0($1)], agg#1=[$SUM0($2)]) : rowType = RecordType(ANY ss_store_sk, ANY $f1, BIGINT $f2): rowcount = 202.0, cumulative cost = {5.822707339551244E8 rows, 4.714956004142926E9 cpu, 1.152134194E9 io, 1.1275973124348644E10 network, 7.2859002984528E7 memory}, id = 45407
      07-03                                                  HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY ss_store_sk, ANY $f1, BIGINT $f2): rowcount = 83237.70685035396, cumulative cost = {5.82187496248274E8 rows, 4.712292397523715E9 cpu, 1.152134194E9 io, 1.1275973124348644E10 network, 7.139401934396178E7 memory}, id = 45406
      10-01                                                    HashAgg(group=[{0}], agg#0=[$SUM0($1)], agg#1=[COUNT($1)]) : rowType = RecordType(ANY ss_store_sk, ANY $f1, BIGINT $f2): rowcount = 83237.70685035396, cumulative cost = {5.821042585414236E8 rows, 4.710960594214109E9 cpu, 1.152134194E9 io, 1.0253148182571495E10 network, 7.139401934396178E7 memory}, id = 45405
      10-02                                                      Project(ss_store_sk=[$0], revenue=[$2]) : rowType = RecordType(ANY ss_store_sk, ANY revenue): rowcount = 832377.0685035397, cumulative cost = {5.812718814729201E8 rows, 4.6843245280219965E9 cpu, 1.152134194E9 io, 1.0253148182571495E10 network, 5.674418293829948E7 memory}, id = 45404
      10-03                                                        HashAgg(group=[{0, 1}], revenue=[SUM($2)]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397, cumulative cost = {5.804395044044166E8 rows, 4.68265977388499E9 cpu, 1.152134194E9 io, 1.0253148182571495E10 network, 5.674418293829948E7 memory}, id = 45403
      10-04                                                          HashToRandomExchange(dist0=[[$0]], dist1=[[$1]]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397, cumulative cost = {5.796071273359131E8 rows, 4.65935321596689E9 cpu, 1.152134194E9 io, 1.0253148182571495E10 network, 3.476942832980602E7 memory}, id = 45402
      11-01                                                            HashAgg(group=[{0, 1}], revenue=[SUM($2)]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY revenue): rowcount = 832377.0685035397, cumulative cost = {5.787747502674096E8 rows, 4.649364691144848E9 cpu, 1.152134194E9 io, 2.48987648E7 network, 3.476942832980602E7 memory}, id = 45401
      11-02                                                              Project(ss_store_sk=[$2], ss_item_sk=[$1], ss_sales_price=[$3]) : rowType = RecordType(ANY ss_store_sk, ANY ss_item_sk, ANY ss_sales_price): rowcount = 1316801.1337047734, cumulative cost = {5.774579491337048E8 rows, 4.612494259401114E9 cpu, 1.152134194E9 io, 2.48987648E7 network, 5878.400000000001 memory}, id = 45400
      11-03                                                                HashJoin(condition=[=($0, $4)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(ANY ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY ss_sales_price, ANY d_date_sk, ANY d_month_seq): rowcount = 1316801.1337047734, cumulative cost = {5.76141148E8 rows, 4.608543856E9 cpu, 1.152134194E9 io, 2.48987648E7 network, 5878.400000000001 memory}, id = 45399
      11-05                                                                  Scan(table=[[dfs, /tpcdsParquet10/SF100/store_sales]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/store_sales]], selectionRoot=maprfs:/tpcdsParquet10/SF100/store_sales, numFiles=1, numRowGroups=166, usedMetadataFile=false, columns=[`ss_sold_date_sk`, `ss_item_sk`, `ss_store_sk`, `ss_sales_price`]]]) : rowType = RecordType(ANY ss_sold_date_sk, ANY ss_item_sk, ANY ss_store_sk, ANY ss_sales_price): rowcount = 2.87997024E8, cumulative cost = {2.87997024E8 rows, 1.151988096E9 cpu, 1.151988096E9 io, 0.0 network, 0.0 memory}, id = 45394
      11-04                                                                  BroadcastExchange : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146766.0 rows, 588800.0 cpu, 146098.0 io, 2.48987648E7 network, 0.0 memory}, id = 45398
      12-01                                                                    SelectionVectorRemover : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146432.0 rows, 586128.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45397
      12-02                                                                      Filter(condition=[AND(>=($1, 1199), <=($1, 1210))]) : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 334.0, cumulative cost = {146098.0 rows, 585794.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45396
      12-03                                                                        Scan(table=[[dfs, /tpcdsParquet10/SF100/date_dim]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/date_dim]], selectionRoot=maprfs:/tpcdsParquet10/SF100/date_dim, numFiles=1, numRowGroups=1, usedMetadataFile=false, filter=booleanAnd(greater_than_or_equal_to(`d_month_seq`, 1199) , less_than_or_equal_to(`d_month_seq`, 1210) ) , columns=[`d_date_sk`, `d_month_seq`]]]) : rowType = RecordType(ANY d_date_sk, ANY d_month_seq): rowcount = 73049.0, cumulative cost = {73049.0 rows, 146098.0 cpu, 146098.0 io, 0.0 network, 0.0 memory}, id = 45395
      04-02                                        BroadcastExchange : rowType = RecordType(ANY s_store_sk, ANY s_store_name): rowcount = 402.0, cumulative cost = {804.0 rows, 4020.0 cpu, 804.0 io, 2.99679744E7 network, 0.0 memory}, id = 45413
      08-01                                          Scan(table=[[dfs, /tpcdsParquet10/SF100/store]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/store]], selectionRoot=maprfs:/tpcdsParquet10/SF100/store, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`s_store_sk`, `s_store_name`]]]) : rowType = RecordType(ANY s_store_sk, ANY s_store_name): rowcount = 402.0, cumulative cost = {402.0 rows, 804.0 cpu, 804.0 io, 0.0 network, 0.0 memory}, id = 45412
      03-06                                    HashToRandomExchange(dist0=[[$0]]) : rowType = RecordType(ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 204000.0, cumulative cost = {408000.0 rows, 4284000.0 cpu, 1020000.0 io, 4.17792E9 network, 0.0 memory}, id = 45417
      05-01                                      Scan(table=[[dfs, /tpcdsParquet10/SF100/item]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tpcdsParquet10/SF100/item]], selectionRoot=maprfs:/tpcdsParquet10/SF100/item, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`i_item_sk`, `i_item_desc`, `i_current_price`, `i_wholesale_cost`, `i_brand`]]]) : rowType = RecordType(ANY i_item_sk, ANY i_item_desc, ANY i_current_price, ANY i_wholesale_cost, ANY i_brand): rowcount = 204000.0, cumulative cost = {204000.0 rows, 1020000.0 cpu, 1020000.0 io, 0.0 network, 0.0 memory}, id = 45416
      

      For query 65, there are two HashAggs (operator 04-05 and 10-03) that are taking a long time. Their rowcounts are incorrect. Operator 04-05 is estimated to be 800K and is actually 55M. Operator 10-03 is estimated to be 800K and is also 55M. As a result, these two operators only have 9 minor fragments in the new plan. This may be caused by the HashJoin rowcount for operator (06-03). The estimated rowcount for the HashJoin is 1.3M rows but the actual rowcount is 55M rows.

      For query 1, hash join operator 03-04 may have a similar problem.

      For query 45, hash join operator 16-02 may have a similar problem.

      Attachments

        Issue Links

          Activity

            People

              gparai Gautam Parai
              gparai Gautam Parai
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - 24h
                  24h
                  Remaining:
                  Remaining Estimate - 24h
                  24h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified