Calcite / CALCITE-1876

Push projections through Aggregate to CsvTableScan

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.15.0
    • Component/s: csv-adapter
    • Labels:
      None

      Description

      Create a rule that pushes the columns used by aggregate functions down into the table scan. From Julian Hyde:

      Calcite should realize that Aggregate has an implied Project (because it only uses a few columns) and push that projection into the CsvTableScan, but it doesn’t.

      A query scans only the projected column when no aggregation is used:

      explain plan for select name from emps;
      
      CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])
      

      But it scans every column when an aggregation is used:

      explain plan for select max(name) from emps;
      
      EnumerableAggregate(group=[{}], EXPR$0=[MAX($1)])
        CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
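The "implied Project" in the plan above is the set of input fields that the Aggregate actually reads: its group keys plus the arguments of its aggregate calls. The following is a minimal sketch of that computation in plain Java. The class and method names are hypothetical and are not part of Calcite; field references such as $1 are modeled as plain integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not a Calcite class. */
class ImpliedProjection {
  /**
   * Returns the sorted input field ordinals that an aggregate reads:
   * its group keys plus the arguments of every aggregate call.
   */
  static List<Integer> usedFields(List<Integer> groupKeys,
      List<List<Integer>> aggCallArgs) {
    SortedSet<Integer> used = new TreeSet<>(groupKeys);
    for (List<Integer> args : aggCallArgs) {
      used.addAll(args);
    }
    return new ArrayList<>(used);
  }
}
```

For group=[{}] and MAX($1), the group keys are empty and the only call argument is field 1, so the result is the single ordinal 1, which matches the fields=[[1]] scan that the non-aggregate query already gets.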
      

        Activity

        julianhyde Julian Hyde added a comment -

        Resolved in release 1.15.0 (2017-12-11).

        julianhyde Julian Hyde added a comment -

        Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/77a35491; thanks for the PR, Luis Fernando Kauer!

        julianhyde Julian Hyde added a comment -

        Reviewing and testing https://github.com/apache/calcite/pull/562/commits/969b3e192711984138ae414f87d23428f4d9a5ff now.

        lfkauer Luis Fernando Kauer added a comment -

        Pull Request:
        https://github.com/apache/calcite/pull/562

        I realized that the correct plan was being generated, but it was not being selected.
        I solved the problem by making the cost of CsvTableScan depend on the number of fields it projects.
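
A cost that depends on the number of projected fields can be sketched as below. This is an illustration under stated assumptions rather than the code from the pull request: the class name, the method name, and the smoothing constant of 2 are all hypothetical, and the base cost stands in for whatever the planner would otherwise assign to the scan.

```java
/** Hypothetical cost model for illustration; not the code from the PR. */
class ScanCost {
  /**
   * Scales a base scan cost by the share of fields actually read.
   * Adding 2 to numerator and denominator (an assumed constant) keeps
   * the factor from collapsing toward zero on very narrow projections.
   */
  static double scaledCost(double baseCost, int projectedFields,
      int totalFields) {
    if (projectedFields < 0 || projectedFields > totalFields) {
      throw new IllegalArgumentException("projectedFields out of range");
    }
    double factor = (projectedFields + 2.0) / (totalFields + 2.0);
    return baseCost * factor;
  }
}
```

With the 10-field EMPS table, a scan of one field gets a factor of (1 + 2) / (10 + 2) = 0.25, while a scan of all ten fields keeps a factor of 1. A cost-based planner that compares the two alternatives would then prefer the narrower scan.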

        julianhyde Julian Hyde added a comment -

        The CSV adapter was written as an example (hence its path example/csv). Let's keep it simple (and flawed, because the flaws are informative to people learning how to write an adapter). Let's modify the file adapter instead, which has (I believe) all of the capabilities of the CSV adapter and more.

        lfkauer Luis Fernando Kauer added a comment -

        The CSV adapter uses TranslatableTable, and Calcite's built-in rules don't simplify this kind of aggregate query.
        However, I noticed that you recommend using ProjectableFilterableTable instead of TranslatableTable.
        After I implemented ProjectableFilterableTable in the CSV adapter, this aggregate query was simplified as expected by Calcite's built-in rules, with no need to create a new rule.
        So I wonder whether a new rule should be created for the CSV adapter, or whether the CSV adapter should just implement ProjectableFilterableTable and take advantage of all the rules already implemented.
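
The core of a projectable scan is small: the table receives the ordinals of the fields the planner wants and returns only those. The sketch below models that in plain Java. It is not the Calcite interface itself; it omits the data context and filter arguments, the class and method names are hypothetical, and it assumes the convention that a null array of ordinals means "all fields".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a projectable table; not a Calcite class. */
class ProjectingScan {
  /**
   * Returns each row restricted to the requested field ordinals.
   * A null projects array is treated as "return every field".
   */
  static List<Object[]> scan(List<Object[]> rows, int[] projects) {
    List<Object[]> out = new ArrayList<>();
    for (Object[] row : rows) {
      if (projects == null) {
        out.add(row.clone());
        continue;
      }
      Object[] projected = new Object[projects.length];
      for (int i = 0; i < projects.length; i++) {
        projected[i] = row[projects[i]];
      }
      out.add(projected);
    }
    return out;
  }
}
```

Because the projection is an argument of the scan rather than a separate planner rule, the adapter needs no rule of its own for this case; the planner only has to decide which ordinals to pass.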

        julianhyde Julian Hyde added a comment -

        I see two options:

        1. A rule, called say AggregateProjectableTableScanRule, that matches an Aggregate on top of a TableScan for a ProjectableFilterableTable, and pushes the implicit projects into the table
        2. A rule, called say AggregateInduceProjectRule, that converts an Aggregate into an Aggregate on top of a Project.

        The second of these, AggregateInduceProjectRule, is more powerful because it can handle intermediate relational expressions. For example, for select max(name) from Emp where deptno = 10, it would create a Project that could be pushed through the Filter and then into the TableScan. The first rule could not do this.

        But we'd have to be careful that AggregateInduceProjectRule didn't fire too often. It should not fire if an Aggregate uses all of its input fields (this would be a sign that the Aggregate had been created by the rule). Also note that it is the converse of AggregateProjectMergeRule, so obviously those rules need to be kept a safe distance from each other.
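
The firing condition described above can be sketched as follows. This is a plain-Java illustration, not a Calcite rule: the class and method names are hypothetical, and fields are modeled as integer ordinals. The method declines to fire when the aggregate already uses every input field; otherwise it returns the mapping from old ordinals to ordinals in the induced Project.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.SortedSet;

/** Hypothetical sketch of the rule's guard; not a Calcite class. */
class InduceProject {
  /**
   * Returns a mapping from each used input ordinal to its position in
   * the induced Project, or empty if every input field is already used
   * (in which case the rule should not fire).
   */
  static Optional<Map<Integer, Integer>> induce(int inputFieldCount,
      SortedSet<Integer> usedFields) {
    if (usedFields.size() >= inputFieldCount) {
      return Optional.empty();
    }
    Map<Integer, Integer> mapping = new LinkedHashMap<>();
    int next = 0;
    for (int oldOrdinal : usedFields) {
      mapping.put(oldOrdinal, next++);
    }
    return Optional.of(mapping);
  }
}
```

For MAX($1) over a 10-field input, the result maps field 1 to position 0, so the aggregate becomes MAX($0) over a one-field Project. Applied again to that new Aggregate, the method returns empty, because one field is used out of one available; this is the property that stops the rule from firing repeatedly on its own output.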


          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            lfkauer Luis Fernando Kauer
          • Votes:
            0
            Watchers:
            3