  1. Apache Drill
  2. DRILL-1337

TPCH Q13 may return incorrect rows : Drill may incorrectly pull up a local right filter in a left outer join condition.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: None
    • Labels: None

    Description

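      For TPCH Q13, Drill may discard some qualifying rows. Drill pulls up the local right-hand-side (RHS) filter from the left outer join condition and places it on top of the join. The local RHS filter is then treated as a post-join condition, which incorrectly discards left-side rows that the left outer join should preserve (customers with no orders, or with only orders that fail the filter).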

      select
        c_count,
        count(*) as custdist
      from
        (
          select
            c.c_custkey,
            count(o.o_orderkey)
          from
            cp.`tpch/customer.parquet` c
            left outer join cp.`tpch/orders.parquet` o
              on c.c_custkey = o.o_custkey
              and o.o_comment not like '%special%requests%'
          group by
            c.c_custkey
        ) as orders (c_custkey, c_count)
      group by
        c_count
      order by
        custdist desc,
        c_count desc;
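The difference between keeping the RHS filter in the join condition and applying it after the join can be reproduced on a tiny dataset. The following is a minimal sketch using Python's built-in sqlite3 module rather than Drill; the table contents are made up for illustration, and the `WHERE` variant is only an assumed stand-in for what the pulled-up filter effectively computes.

```python
import sqlite3

# Hypothetical miniature of the TPC-H customer/orders tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (c_custkey INTEGER);
    CREATE TABLE orders (o_orderkey INTEGER, o_custkey INTEGER, o_comment TEXT);
    INSERT INTO customer VALUES (1), (2), (3);
    INSERT INTO orders VALUES
        (10, 1, 'regular order'),
        (11, 1, 'special packages requests'),
        (12, 2, 'special deposits requests');
""")

# Filter kept in the ON clause, as written in the inner query of Q13.
filter_in_on = """
    SELECT c.c_custkey, COUNT(o.o_orderkey)
    FROM customer c
    LEFT OUTER JOIN orders o
      ON c.c_custkey = o.o_custkey
     AND o.o_comment NOT LIKE '%special%requests%'
    GROUP BY c.c_custkey
    ORDER BY c.c_custkey
"""

# Same filter applied after the join, which mimics the pulled-up filter.
filter_after_join = """
    SELECT c.c_custkey, COUNT(o.o_orderkey)
    FROM customer c
    LEFT OUTER JOIN orders o
      ON c.c_custkey = o.o_custkey
    WHERE o.o_comment NOT LIKE '%special%requests%'
    GROUP BY c.c_custkey
    ORDER BY c.c_custkey
"""

expected_rows = conn.execute(filter_in_on).fetchall()
pulled_up_rows = conn.execute(filter_after_join).fetchall()

print("filter in ON clause:", expected_rows)
print("filter after join:  ", pulled_up_rows)
```

With the filter in the `ON` clause every customer survives, and customers without a qualifying order get a count of 0. With the filter applied after the join, customers 2 and 3 disappear, because their null-padded or non-qualifying rows fail the predicate.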

      Drill Physical:
      .......................
      02-06 Filter(condition=[$3]): rowcount = 3750.0, cumulative cost = {79500.0 rows, 568512.0 cpu, 0.0 io, 1.90464E8 network, 264000.0 memory}, id = 2649
      02-07   HashJoin(condition=[=($0, $1)], joinType=[left]): rowcount = 15000.0, cumulative cost = {64500.0 rows, 508512.0 cpu, 0.0 io, 1.90464E8 network, 264000.0 memory}, id = 2648
      02-09     HashToRandomExchange(dist0=[[$0]]): rowcount = 1500.0, cumulative cost = {3000.0 rows, 25500.0 cpu, 0.0 io, 6144000.0 network, 0.0 memory}, id = 2644
      03-01       Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/customer.parquet]], selectionRoot=/tpch/customer.parquet, columns=[SchemaPath [`c_custkey`]]]]): rowcount = 1500.0, cumulative cost = {1500.0 rows, 1500.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2643
      02-08     HashToRandomExchange(dist0=[[$0]]): rowcount = 15000.0, cumulative cost = {45000.0 rows, 285012.0 cpu, 0.0 io, 1.8432E8 network, 0.0 memory}, id = 2647
      04-01       Project(o_custkey=[$1], o_orderkey=[$0], $f4=[NOT(LIKE($2, '%special%requests%'))]): rowcount = 15000.0, cumulative cost = {30000.0 rows, 45012.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2646
      04-02         Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/orders.parquet]], selectionRoot=/tpch/orders.parquet, columns=[SchemaPath [`o_custkey`], SchemaPath [`o_orderkey`], SchemaPath [`o_comment`]]]]): rowcount = 15000.0, cumulative cost = {15000.0 rows, 45000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2645
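The operator placement in the plan above can be emulated with a small sketch. This is plain Python, not Drill code: the helper names (`not_special`, `left_hash_join`, `count_orders`) and the sample rows are hypothetical, and applying the filter below the join is shown only as one placement that preserves left outer join semantics, not necessarily the fix that was committed.

```python
import re

customers = [1, 2, 3]  # c_custkey values
orders = [             # (o_orderkey, o_custkey, o_comment)
    (10, 1, "regular order"),
    (11, 1, "special packages requests"),
    (12, 2, "special deposits requests"),
]

def not_special(comment):
    # Emulates NOT LIKE '%special%requests%'.
    return re.search(r"special.*requests", comment, re.DOTALL) is None

def left_hash_join(left_keys, right_rows):
    # right_rows are (o_custkey, o_orderkey, flag). The output columns are
    # $0=c_custkey, $1=o_custkey, $2=o_orderkey, $3=flag, null-padded on a miss.
    table = {}
    for row in right_rows:
        table.setdefault(row[0], []).append(row)
    joined = []
    for key in left_keys:
        matches = table.get(key)
        if matches:
            joined.extend((key,) + match for match in matches)
        else:
            joined.append((key, None, None, None))
    return joined

def count_orders(rows):
    # count(o_orderkey) grouped by c_custkey; nulls are not counted.
    counts = {}
    for c_custkey, _, o_orderkey, _ in rows:
        counts[c_custkey] = counts.get(c_custkey, 0) + int(o_orderkey is not None)
    return counts

# Project step (04-01): compute the predicate as an extra column.
projected = [(custkey, orderkey, not_special(comment))
             for orderkey, custkey, comment in orders]

# Plan as shown: join first (02-07), then Filter on $3 (02-06).
# A null flag is not true, so null-padded rows are dropped as well.
buggy_counts = count_orders(
    [row for row in left_hash_join(customers, projected) if row[3]])

# Alternative placement: filter the right input before the join.
correct_counts = count_orders(
    left_hash_join(customers, [row for row in projected if row[2]]))

print("filter above join:", buggy_counts)
print("filter below join:", correct_counts)
```

In this sketch the filter above the join keeps only customer 1, while filtering the right input first keeps all three customers, with a count of 0 for customers 2 and 3.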

          People

            DrillCommitter DrillCommitter
            jni Jinfeng Ni