Apache Drill / DRILL-5138

TopN operator on top of ~110 GB data set is very slow

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      git.commit.id.abbrev=cf2b7c7

      Number of cores: 23
      Number of disks: 5
      DRILL_MAX_DIRECT_MEMORY="24G"
      DRILL_MAX_HEAP="12G"

      The query below ran for more than 4 hours and did not complete. The table is ~110 GB.

      select * from catalog_sales order by cs_quantity, cs_wholesale_cost limit 1;
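
      For context, a query of this shape only needs the N smallest rows by (cs_quantity, cs_wholesale_cost), so in principle it can be answered in a single pass that keeps at most N candidate rows in memory. The sketch below illustrates that idea in Python; it is not Drill's TopN implementation. The `Sale` type and the sample rows are hypothetical, and NULL ordering is ignored.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Sale(NamedTuple):
    # Hypothetical stand-in for a catalog_sales row; only the sort keys matter.
    cs_order_number: int
    cs_quantity: int
    cs_wholesale_cost: float


def top_n(rows: Iterable[Sale], n: int) -> List[Sale]:
    """Return the n smallest rows ordered by (cs_quantity, cs_wholesale_cost).

    heapq.nsmallest streams the input once and keeps at most n candidates,
    so the full input is never sorted or held in memory.
    """
    return heapq.nsmallest(
        n, rows, key=lambda r: (r.cs_quantity, r.cs_wholesale_cost)
    )


# Made-up sample data, for illustration only.
rows = [
    Sale(1, 5, 20.0),
    Sale(2, 1, 9.5),
    Sale(3, 1, 3.25),
    Sale(4, 7, 1.0),
]

# Equivalent in spirit to: ORDER BY cs_quantity, cs_wholesale_cost LIMIT 1
result = top_n(iter(rows), 1)
print(result)
```

      With N = 1 this reduces to a single minimum scan, which is why a multi-hour runtime for this query is reported here as a bug.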
      

      Physical Plan :

      00-00    Screen : rowType = RecordType(ANY *): rowcount = 1.0, cumulative cost = {1.00798629141E10 rows, 4.17594320691E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 352
      00-01      Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 1.0, cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 351
      00-02        Project(T0¦¦*=[$0]) : rowType = RecordType(ANY T0¦¦*): rowcount = 1.0, cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 350
      00-03          SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.0, cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 349
      00-04            Limit(fetch=[1]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.0, cumulative cost = {1.0079862913E10 rows, 4.1759432068E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 348
      00-05              SingleMergeExchange(sort0=[1 ASC], sort1=[2 ASC]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost = {1.0079862912E10 rows, 4.1759432064E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 347
      01-01                SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost = {8.639882496E9 rows, 3.0239588736E10 cpu, 0.0 io, 2.3592639135744E13 network, 0.0 memory}, id = 346
      01-02                  TopN(limit=[1]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost = {7.19990208E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network, 0.0 memory}, id = 345
      01-03                    Project(T0¦¦*=[$0], cs_quantity=[$1], cs_wholesale_cost=[$2]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost = {5.759921664E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network, 0.0 memory}, id = 344
      01-04                      HashToRandomExchange(dist0=[[$1]], dist1=[[$2]]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, cumulative cost = {5.759921664E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network, 0.0 memory}, id = 343
      02-01                        UnorderedMuxExchange : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, cumulative cost = {4.319941248E9 rows, 1.1519843328E10 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 342
      03-01                          Project(T0¦¦*=[$0], cs_quantity=[$1], cs_wholesale_cost=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2, hash32AsDouble($1))]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, cumulative cost = {2.879960832E9 rows, 1.0079862912E10 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 341
      03-02                            Project(T0¦¦*=[$0], cs_quantity=[$1], cs_wholesale_cost=[$2]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost = {1.439980416E9 rows, 4.319941248E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 340
      03-03                              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/tpcds/parquet/sf1000/catalog_sales]], selectionRoot=maprfs:/drill/testdata/tpcds/parquet/sf1000/catalog_sales, numFiles=1, usedMetadataFile=false, columns=[`*`]]]) : rowType = (DrillRecordRow[*, cs_quantity, cs_wholesale_cost]): rowcount = 1.439980416E9, cumulative cost = {1.439980416E9 rows, 4.319941248E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 339
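
      To make the plan above easier to scan, the following sketch extracts the operator id, operator name, and estimated rowcount from each plan line. The regular expression and the `operator_rowcounts` helper are assumptions written for this text format, not part of Drill, and the sample lines are abridged copies of the plan above (rowType and cumulative cost replaced by `...`). The rowcount values are planner estimates, not measured row counts.

```python
import re

# One plan line looks like:
#   "01-02    TopN(limit=[1]) : rowType = ...: rowcount = 1.439980416E9, cumulative cost = ..."
PLAN_LINE = re.compile(
    r"^\s*(?P<fragment>\d+)-(?P<op>\d+)\s+(?P<name>\w+)"
    r".*?rowcount = (?P<rows>[0-9.E+-]+),"
)


def operator_rowcounts(plan_text):
    """Return (operator id, operator name, estimated rowcount) per plan line."""
    out = []
    for line in plan_text.splitlines():
        m = PLAN_LINE.match(line)
        if m:
            op_id = f"{m['fragment']}-{m['op']}"
            out.append((op_id, m["name"], float(m["rows"])))
    return out


# Abridged lines from the plan in this report.
PLAN = """\
00-04            Limit(fetch=[1]) : rowType = ...: rowcount = 1.0, cumulative cost = {...}, id = 348
00-05              SingleMergeExchange(sort0=[1 ASC], sort1=[2 ASC]) : rowType = ...: rowcount = 1.439980416E9, cumulative cost = {...}, id = 347
01-02                  TopN(limit=[1]) : rowType = ...: rowcount = 1.439980416E9, cumulative cost = {...}, id = 345
01-04                      HashToRandomExchange(dist0=[[$1]], dist1=[[$2]]) : rowType = ...: rowcount = 1.439980416E9, cumulative cost = {...}, id = 343
"""

parsed = operator_rowcounts(PLAN)
for op_id, name, rows in parsed:
    print(f"{op_id:6} {name:22} {rows:.0f}")
```

      Read this way, the plan places TopN (01-02) above a HashToRandomExchange (01-04) whose estimated rowcount is the full ~1.44 billion rows of the scan.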
      

            People

              Assignee: Timothy Farkas (timothyfarkas)
              Reporter: Rahul Kumar Challapalli (rkins)