Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-2044

Tweak cost of BindableTableScan to make sure Project is pushed through Aggregate

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.15.0
    • Component/s: core
    • Labels:
      None

      Description

      Similar to CALCITE-1876.
      Projects are not pushed to BindableTableScan when using ProjectableFilterableTable with aggregate functions.
      The reason is that the cost of BindableTableScan does not use projects (and filters), so the planner chooses a plan with Project node removed by ProjectRemoveRule.
      By tweaking the cost to use the number of used projects solved the problem.
      Any suggestion on the cost formula to take both projects and filters into account?

        Activity

        Hide
        julianhyde Julian Hyde added a comment -

        Resolved in release 1.15.0 (2017-12-11).

        Show
        julianhyde Julian Hyde added a comment - Resolved in release 1.15.0 (2017-12-11).
        Hide
        julianhyde Julian Hyde added a comment -

        Fixed in a8505d2b. Thanks for the PR, Luis Fernando Kauer!

        Show
        julianhyde Julian Hyde added a comment - Fixed in a8505d2b . Thanks for the PR, Luis Fernando Kauer !
        Hide
        julianhyde Julian Hyde added a comment -

        RelDataTypeFactory.builder() isn't actually deprecated. It just has the annotation @SuppressWarnings("deprecation") to stop the compiler complaining that we are using FieldInfoBuilder, which was deprecated as of CALCITE-1929. I wanted to change the return type to RelDataTypeFactory.Builder, which is FieldInfoBuilder's base class and is not deprecated, but I could not do so due to binary compatibility.

        So yes, use RelDataTypeFactory.builder(). I will make that change for you.

        Show
        julianhyde Julian Hyde added a comment - RelDataTypeFactory.builder() isn't actually deprecated. It just has the annotation @SuppressWarnings("deprecation") to stop the compiler complaining that we are using FieldInfoBuilder , which was deprecated as of CALCITE-1929 . I wanted to change the return type to RelDataTypeFactory.Builder , which is FieldInfoBuilder 's base class and is not deprecated, but I could not do so due to binary compatibility. So yes, use RelDataTypeFactory.builder() . I will make that change for you.
        Hide
        lfkauer Luis Fernando Kauer added a comment -

        Thanks, Julian.
        Looks great.
        I changed typeFactory.builder() because it is deprecated and javadoc recomends using Builder, not because I don't like it. I thought I was updating to the recommended way to do it.
        Should I continue using typeFactory.builder() for now?

        Show
        lfkauer Luis Fernando Kauer added a comment - Thanks, Julian. Looks great. I changed typeFactory.builder() because it is deprecated and javadoc recomends using Builder, not because I don't like it. I thought I was updating to the recommended way to do it. Should I continue using typeFactory.builder() for now?
        Hide
        julianhyde Julian Hyde added a comment -

        Reviewing https://github.com/apache/calcite/pull/565/commits/ddd2a65f69617f84432d91e72783c6c4b0f3ad25:

        • Thanks for modernizing ScannableTableTest - much needed!
        • I intend to go a bit further and change returns to returnsUnordered in a few places.
        • I don't understand why you changed typeFactory.builder() in a couple of places - could I revert? (Yes I know FieldInfoBuilder is not nice but it is deprecated so it won't be with us forever.)
        Show
        julianhyde Julian Hyde added a comment - Reviewing https://github.com/apache/calcite/pull/565/commits/ddd2a65f69617f84432d91e72783c6c4b0f3ad25: Thanks for modernizing ScannableTableTest - much needed! I intend to go a bit further and change returns to returnsUnordered in a few places. I don't understand why you changed typeFactory.builder() in a couple of places - could I revert? (Yes I know FieldInfoBuilder is not nice but it is deprecated so it won't be with us forever.)
        Show
        lfkauer Luis Fernando Kauer added a comment - https://github.com/apache/calcite/pull/565
        Hide
        lfkauer Luis Fernando Kauer added a comment -

        Alexey, I'd like to understand your real need.
        You see, the computed cost does not have to be exact and the overall cost depends on the costs of all other nodes in the plan.
        It is used to choose between different plans, but if you don't adjust the cost of all other nodes of the plan, like joins, filters and projects not pushed, aggregates, then you might not get the best plan anyway.
        Here we are talking about BindableTableScan cost only and how the filters and projects change the cost.
        Even though you may have expensive columns, I assume that if the query requires expensive columns you'll have to return them anyway, so usually pushing the used projects is desirable.
        With filters is the same thing, I think.
        Julian, I would like to know some counter examples to take it into account, if you may.

        Show
        lfkauer Luis Fernando Kauer added a comment - Alexey, I'd like to understand your real need. You see, the computed cost does not have to be exact and the overall cost depends on the costs of all other nodes in the plan. It is used to choose between different plans, but if you don't adjust the cost of all other nodes of the plan, like joins, filters and projects not pushed, aggregates, then you might not get the best plan anyway. Here we are talking about BindableTableScan cost only and how the filters and projects change the cost. Even though you may have expensive columns, I assume that if the query requires expensive columns you'll have to return them anyway, so usually pushing the used projects is desirable. With filters is the same thing, I think. Julian, I would like to know some counter examples to take it into account, if you may.
        Hide
        alexeyroytman Alexey Roytman added a comment -

        Julian, once you talked about 80%, now they're raised to 90%. So the ProjectableFilterableTable is improving!

        ...For my project I moved to TranslatableTable (implementing only Filter and Project rules there), but it's a complicated thing, and I'm in 3rd or 4th iteration of re-learning. Thus I compare my implementation of TranslatableTable with the one I already have for ProjectableFilterableTable, comparing results, performance etc. For sure, I'd like to use ProjectableFilterableTable if it allows me the flexibility of cost evaluation having in hand the arguments of ProjectableFilterableTable.scan(), but alas.

        Show
        alexeyroytman Alexey Roytman added a comment - Julian, once you talked about 80%, now they're raised to 90%. So the ProjectableFilterableTable is improving! ...For my project I moved to TranslatableTable (implementing only Filter and Project rules there), but it's a complicated thing, and I'm in 3rd or 4th iteration of re-learning. Thus I compare my implementation of TranslatableTable with the one I already have for ProjectableFilterableTable , comparing results, performance etc. For sure, I'd like to use ProjectableFilterableTable if it allows me the flexibility of cost evaluation having in hand the arguments of ProjectableFilterableTable.scan() , but alas.
        Hide
        julianhyde Julian Hyde added a comment -

        Agreed; it's usually better to project and to filter, but there are counter-examples and they are crucial. Alexey Roytman, I feel that you are more comfortable with the "simplified" APIs such as ProjectableFilterableTable rather than the more powerful model based on RelOptRule and cost models. No question that the simplified APIs are good for 90% of cases; but they are very wrong for the difficult 10%. Until you grok the volcano model I fear we are going to find ourselves at odds over and over again.

        Show
        julianhyde Julian Hyde added a comment - Agreed; it's usually better to project and to filter, but there are counter-examples and they are crucial. Alexey Roytman , I feel that you are more comfortable with the "simplified" APIs such as ProjectableFilterableTable rather than the more powerful model based on RelOptRule and cost models. No question that the simplified APIs are good for 90% of cases; but they are very wrong for the difficult 10%. Until you grok the volcano model I fear we are going to find ourselves at odds over and over again.
        Hide
        lfkauer Luis Fernando Kauer added a comment -

        ProjectableFilterableTable is meant for simple uses.
        I opened this Jira to solve the common problem of projects not being pushed to the table scan with aggregate.
        I propose a simple solution, without the need to implement any other interface, just by reducing the cost when projects and filters are applied.
        You are proposing a new feature that would allow more control of the cost, but Calcite already has many pluggable ways to control costs and provide statistics, and you should try them first. See Statistics, RelMetadataQuery, BuiltInMetadata ...

        You can also use TranslatableTable and have full control. I know, the csv example does not include filter push down. Maybe we could improve the example to include it to show how it can be done.

        Show
        lfkauer Luis Fernando Kauer added a comment - ProjectableFilterableTable is meant for simple uses. I opened this Jira to solve the common problem of projects not being pushed to the table scan with aggregate. I propose a simple solution, without the need to implement any other interface, just by reducing the cost when projects and filters are applied. You are proposing a new feature that would allow more control of the cost, but Calcite already has many pluggable ways to control costs and provide statistics, and you should try them first. See Statistics, RelMetadataQuery, BuiltInMetadata ... You can also use TranslatableTable and have full control. I know, the csv example does not include filter push down. Maybe we could improve the example to include it to show how it can be done.
        Hide
        alexeyroytman Alexey Roytman added a comment - - edited

        The cost formula shall be an implementable policy.
        I.e. if I implement an "XTable implements ProjectableFilterableTable" and also "implements ProjectableFilterableCostEstimator" then XTable has a function to estimate this behavior.
        Or, we may wish to have an AbstractProjectableFilterable with such cost function (parameters like in ProjectableFilterableTable.scan()).

        Anyway, project presence is better than project absence, and filter presence is better than filter absence... This seems to be more universal mantra.

        Show
        alexeyroytman Alexey Roytman added a comment - - edited The cost formula shall be an implementable policy. I.e. if I implement an " XTable implements ProjectableFilterableTable " and also " implements ProjectableFilterableCostEstimator " then XTable has a function to estimate this behavior. Or, we may wish to have an AbstractProjectableFilterable with such cost function (parameters like in ProjectableFilterableTable.scan() ). Anyway, project presence is better than project absence, and filter presence is better than filter absence... This seems to be more universal mantra.

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            lfkauer Luis Fernando Kauer
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development