Apache Drill / DRILL-1091

Planner generating invalid plan for tpc-h 18

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels: None

      Description

      The planner is currently including a streaming aggregate on l_orderkey without first sorting its input. This is causing invalid results.
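
To illustrate why a missing sort produces wrong results, here is a minimal sketch in Python (not Drill code). It assumes a streaming aggregate behaves like `itertools.groupby`, which merges only consecutive rows with equal keys; the key values and the `streaming_count` helper are hypothetical.

```python
from itertools import groupby


def streaming_count(keys):
    """Count rows per key, assuming rows with equal keys are adjacent."""
    # groupby starts a new group every time the key changes, so it is only
    # correct when the input is already sorted (or clustered) on the key.
    return [(key, sum(1 for _ in run)) for key, run in groupby(keys)]


# Hypothetical l_orderkey values arriving in unsorted order.
unsorted_keys = [3, 1, 3, 2, 1]

# Without a sort, keys 3 and 1 each show up as two separate groups.
wrong = streaming_count(unsorted_keys)

# With a sort in front, every key yields exactly one group.
right = streaming_count(sorted(unsorted_keys))
```

On the unsorted input, the same key is reported more than once with partial counts, which is the kind of invalid result described above.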

        Activity

        Jacques Nadeau added a comment -

        Fixed by 810a204 or earlier.

        Aman Sinha added a comment -

        Uploaded a new patch. It includes the filter trait propagation fix (for TPCH 18) plus a change to the pattern matching in StreamingAggPrule and HashAggPrule so that it considers the child RelNode (this is needed for a modified version of TPCH 19). Also added a modified version of TPCH 19 to the JUnit test suite.

        Aman Sinha added a comment -

        Uploaded an updated patch for this.

        Aman Sinha added a comment -

        Uploaded a patch for this. The fix is to propagate the collation trait from the Filter's child. Tested by manually inspecting explain plans for various queries that have a HAVING predicate in the aggregation followed by an ordering requirement. With this patch, the TPCH 18 query does a hash aggregate in both places instead of a streaming aggregate.
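
The idea of propagating the collation trait from the Filter's child can be sketched with a toy plan model in Python. This is an illustration under stated assumptions, not the actual patch or the planner's API: `Node`, `make_filter`, `enforce_collation`, and `plan_names` are hypothetical names, and collation is reduced to a tuple of column names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    child: Optional["Node"] = None
    # Columns the operator's output is sorted on; () means unsorted.
    collation: Tuple[str, ...] = ()


def make_filter(child):
    # A filter only drops rows, so its output ordering is exactly the
    # ordering (or lack of ordering) of its child.
    return Node("Filter", child, child.collation)


def enforce_collation(node, required):
    # Add a Sort only if the input does not already satisfy the requirement.
    if node.collation[:len(required)] == required:
        return node
    return Node("Sort", node, required)


def plan_names(node):
    # Operator names from the root of the plan down to the leaf.
    names = []
    while node is not None:
        names.append(node.name)
        node = node.child
    return names


scan = Node("Scan")
hash_agg = Node("HashAgg", scan)  # hash aggregation output is unordered
filtered = make_filter(hash_agg)  # inherits "unsorted" from the HashAgg
plan = enforce_collation(filtered, ("n_nationkey",))
```

In this model the filter over the hash aggregate reports no collation, so an ordering requirement above it is no longer treated as already satisfied. A real planner may respond by adding a sort or by choosing a different operator, such as a hash aggregate.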

        Aman Sinha added a comment -

        This is actually an issue with trait propagation for Filters. TPCH 18 is doing 2 grouped aggregations. The first grouped aggregation (a hash aggregate) has a HAVING predicate and the filter operator seems to incorrectly produce an output collation trait, so there's no sort enforcer added before the second grouped aggregation (a streaming aggregate).

        Here's a simple example to reproduce the same issue. Note that no sort is getting added after the filter even though the input is not sorted and there's an order-by requirement.

        explain plan for select n_nationkey from cp.`tpch/nation.parquet` group by n_nationkey having n_nationkey < 5 order by n_nationkey;

        00-00 Screen
        00-01 SelectionVectorRemover
        00-02 Filter(condition=[<($0, 5)])
        00-03 HashAgg(group=[{0}])
        00-04 ProducerConsumer
        00-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, columns=[SchemaPath [`n_nationkey`]]]])

        I am in the process of testing a fix for this.


          People

          • Assignee: DrillCommitter
          • Reporter: Steven Phillips
          • Votes: 0
          • Watchers: 3
