  1. Apache Drill
  2. DRILL-602

Query with join and group-by on join column hangs


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels: None

    Description

      The following query hangs on the latest master branch:
      select ps.ps_partkey, count(*) from cp.`tpch/lineitem.parquet` l, cp.`tpch/partsupp.parquet` ps where l.l_partkey = ps.ps_partkey and ps.ps_partkey = 30 group by ps.ps_partkey;

      The plan looks OK:

      ScreenPrel: rowcount = 22.5, cumulative cost = {1398.3592534906718 rows, 307.0 cpu, 0.0 io}, id = 400
      UnionExchangePrel: rowcount = 22.5, cumulative cost = {1396.1092534906718 rows, 304.75 cpu, 0.0 io}, id = 399
      StreamAggPrel(group=[{0}], EXPR$1=[COUNT()]): rowcount = 22.5, cumulative cost = {1393.8592534906718 rows, 302.5 cpu, 0.0 io}, id = 398
      SortPrel(sort0=[$0], dir0=[ASC]): rowcount = 225.0, cumulative cost = {1371.3592534906718 rows, 302.5 cpu, 0.0 io}, id = 397
      HashToRandomExchangePrel(dist0=[[$0]]): rowcount = 225.0, cumulative cost = {883.910217292274 rows, 280.0 cpu, 0.0 io}, id = 396
      ProjectPrel(ps_partkey=[$3]): rowcount = 225.0, cumulative cost = {861.410217292274 rows, 257.5 cpu, 0.0 io}, id = 395
      MergeJoinPrel(condition=[=($1, $3)], joinType=[inner]): rowcount = 225.0, cumulative cost = {838.910217292274 rows, 235.0 cpu, 0.0 io}, id = 394
      SortPrel(sort0=[$1], dir0=[ASC]): rowcount = 100.0, cumulative cost = {478.4136148790474 rows, 121.0 cpu, 0.0 io}, id = 390
      HashToRandomExchangePrel(dist0=[[$1]]): rowcount = 100.0, cumulative cost = {110.0 rows, 111.0 cpu, 0.0 io}, id = 389
      ScanPrel(table=[[cp, tpch/lineitem.parquet]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 247
      SortPrel(sort0=[$1], dir0=[ASC]): rowcount = 15.0, cumulative cost = {135.49660241322653 rows, 114.0 cpu, 0.0 io}, id = 393
      HashToRandomExchangePrel(dist0=[[$1]]): rowcount = 15.0, cumulative cost = {103.0 rows, 112.5 cpu, 0.0 io}, id = 392
      FilterPrel(condition=[=(CAST($1):INTEGER NOT NULL, 30)]): rowcount = 15.0, cumulative cost = {101.5 rows, 111.0 cpu, 0.0 io}, id = 391
      ScanPrel(table=[[cp, tpch/partsupp.parquet]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 191
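
      The plan listing above carries no indentation, so the tree shape has to be inferred. The following is a minimal Python sketch of one way to recover it, not part of Drill itself. It assumes the operators are listed in pre-order, that MergeJoinPrel takes two inputs, that ScanPrel takes none, and that every other operator takes exactly one. The names PLAN_OPERATORS, ASSUMED_INPUTS and assign_depths are hypothetical helpers introduced only for this illustration.

```python
# Operators in the order they appear in the plan above (assumed pre-order).
PLAN_OPERATORS = [
    "ScreenPrel",
    "UnionExchangePrel",
    "StreamAggPrel",
    "SortPrel",
    "HashToRandomExchangePrel",
    "ProjectPrel",
    "MergeJoinPrel",
    "SortPrel",
    "HashToRandomExchangePrel",
    "ScanPrel",  # tpch/lineitem.parquet
    "SortPrel",
    "HashToRandomExchangePrel",
    "FilterPrel",
    "ScanPrel",  # tpch/partsupp.parquet
]

# Assumed number of inputs per operator; anything not listed is treated as unary.
ASSUMED_INPUTS = {"MergeJoinPrel": 2, "ScanPrel": 0}


def assign_depths(operators):
    """Return (depth, name) pairs for a pre-order operator listing."""
    result = []
    pending = []  # stack of [depth, inputs still expected] for open operators
    for name in operators:
        # Discard ancestors that have already received all of their inputs.
        while pending and pending[-1][1] == 0:
            pending.pop()
        if pending:
            depth = pending[-1][0] + 1
            pending[-1][1] -= 1  # this operator fills one input of its parent
        else:
            depth = 0
        result.append((depth, name))
        pending.append([depth, ASSUMED_INPUTS.get(name, 1)])
    return result


if __name__ == "__main__":
    for depth, name in assign_depths(PLAN_OPERATORS):
        print("  " * depth + name)
```

      Under these assumptions both SortPrel branches land at the same depth directly beneath MergeJoinPrel, with the lineitem scan on one side and the filtered partsupp scan on the other.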

    People

      Assignee: Unassigned
      Reporter: Aman Sinha (amansinha100)