Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: Table API & SQL
    • Labels:
      None

      Description

      The estimateRowCount method of DataSetCalc does not take effect in the following situation.
      If I run the following code,

      Table table = tableEnv.sql("select a, avg(a), sum(b), count(c) from t1 where a = 1 group by a");
      

      the cost of every node in the optimized plan is:

      DataSetAggregate(groupBy=[a], select=[a, AVG(a) AS TMP_0, SUM(b) AS TMP_1, COUNT(c) AS TMP_2]): rowcount = 1000.0, cumulative cost = {3000.0 rows, 5000.0 cpu, 28000.0 io}
        DataSetCalc(select=[a, b, c], where=[=(a, 1)]): rowcount = 1000.0, cumulative cost = {2000.0 rows, 2000.0 cpu, 0.0 io}
            DataSetScan(table=[[_DataSetTable_0]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 1000.0 cpu, 0.0 io}
      

      We expect the input row count of DataSetAggregate to be less than 1000; however, the actual input row count is still 1000 because the estimateRowCount method of DataSetCalc does not take effect.

      The problem is similar to https://issues.apache.org/jira/browse/FLINK-5394, which has already been solved.
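      For context on the expected numbers: a cost-based planner derives a Calc/Filter node's row count by multiplying its input row count by the predicate's selectivity, and Calcite's customary default selectivity for an equality predicate such as a = 1 is 0.15. A minimal sketch of that arithmetic (plain Python, not Flink/Calcite code; the 0.15 default is an assumption about Calcite's defaults):

```python
# Toy model of cost-based row count estimation (not Flink/Calcite code).
# Assumes Calcite's customary default selectivity of 0.15 for an
# equality predicate such as `a = 1`.

EQUALITY_SELECTIVITY = 0.15

def estimate_calc_row_count(input_rows: float, selectivity: float) -> float:
    """Row count a working DataSetCalc.estimateRowCount should report."""
    return input_rows * selectivity

scan_rows = 1000.0  # DataSetScan output in the plan above
calc_rows = estimate_calc_row_count(scan_rows, EQUALITY_SELECTIVITY)
print(calc_rows)  # 150.0 -- the aggregate should see ~150 rows, not 1000
```

      With a working estimateRowCount, the input of DataSetAggregate in the plan above would be estimated at roughly 150 rows rather than 1000.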

      I found that although we set the metadata provider to FlinkDefaultRelMetadataProvider in FlinkRelBuilder, after running

      planner.rel(...) 

      to translate the SqlNode into a RelNode, the metadata provider is overridden from FlinkDefaultRelMetadataProvider back to DefaultRelMetadataProvider because of the following code:

            val cluster: RelOptCluster = RelOptCluster.create(planner, rexBuilder)
            val config = SqlToRelConverter.configBuilder()
              .withTrimUnusedFields(false).withConvertTableAccess(false).build()
            val sqlToRelConverter: SqlToRelConverter = new SqlToRelConverter(
              new ViewExpanderImpl, validator, createCatalogReader, cluster, convertletTable, config)
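      Calcite's RelOptCluster.create(...) installs the default metadata provider on the newly created cluster, which is why the Flink provider configured earlier disappears; the natural fix is to set the Flink provider back on the cluster right after it is created (Calcite exposes a setMetadataProvider method on RelOptCluster for this). A toy model of that override-and-restore pattern (plain Python with hypothetical stand-in names, not the actual Calcite API):

```python
# Toy model (plain Python, hypothetical names) of the override-and-restore
# pattern: creating the cluster silently resets the metadata provider to
# the default, so the caller must explicitly set the desired one back.

DEFAULT_PROVIDER = "DefaultRelMetadataProvider"
FLINK_PROVIDER = "FlinkDefaultRelMetadataProvider"

class Cluster:
    """Stand-in for Calcite's RelOptCluster."""
    def __init__(self) -> None:
        # Cluster creation always starts with the default provider.
        self.metadata_provider = DEFAULT_PROVIDER

    def set_metadata_provider(self, provider: str) -> None:
        self.metadata_provider = provider

cluster = Cluster()
# Bug: here the Flink provider configured earlier in FlinkRelBuilder is gone.
assert cluster.metadata_provider == DEFAULT_PROVIDER
# Fix: restore it immediately after creating the cluster.
cluster.set_metadata_provider(FLINK_PROVIDER)
print(cluster.metadata_provider)  # FlinkDefaultRelMetadataProvider
```

      In the actual Scala code this would correspond to restoring FlinkDefaultRelMetadataProvider on the cluster immediately after RelOptCluster.create.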
      

        Issue Links

          Activity

          twalthr Timo Walther added a comment -

          Fixed in 1.3.0: 86d32ac847d3f5663f350210cecc70205b9fa0b9.

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3559

          githubbot ASF GitHub Bot added a comment -

          GitHub user beyond1920 opened a pull request:

          https://github.com/apache/flink/pull/3559

          [FLINK-6037] [Table API & SQL] hotfix: metadata provider didn't work in SQL

          This PR aims to fix the bug reported in https://issues.apache.org/jira/browse/FLINK-6037.
          After switching to the right MetadataProvider, the org.apache.flink.table.ExpressionReductionTest.testReduceCalcExpressionForBatchSQL test no longer passes because the optimized plan changes (that problem is tracked in https://issues.apache.org/jira/browse/FLINK-6067 and will be fixed in another PR). I simply changed the test SQL to make it pass in this PR.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/alibaba/flink hotfix

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3559.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3559


          commit bd83041507b3f4fdea737b538fb39af4a249e6d2
          Author: jingzhang <beyond1920@126.com>
          Date: 2017-03-17T09:47:35Z

          fix the bug: metadata provider didn't work in SQL


          fhueske Fabian Hueske added a comment -

          Ah, I see. Thanks for the clarification jingzhang!

          jinyu.zj jingzhang added a comment -

          Fabian Hueske, this issue is different from https://issues.apache.org/jira/browse/FLINK-5394; this issue only happens in SQL.
          I agree there is no difference between the Table API and SQL, since both are represented the same way at the optimization layer. However, when SqlToRelConverter converts the SqlNode to a RelNode, the metadata provider is overridden from FlinkDefaultRelMetadataProvider back to DefaultRelMetadataProvider because of the following code:

                val cluster: RelOptCluster = RelOptCluster.create(planner, rexBuilder)
                val config = SqlToRelConverter.configBuilder()
                  .withTrimUnusedFields(false).withConvertTableAccess(false).build()
                val sqlToRelConverter: SqlToRelConverter = new SqlToRelConverter(
                  new ViewExpanderImpl, validator, createCatalogReader, cluster, convertletTable, config)
          

          So in the optimization phase, the Table API uses FlinkDefaultRelMetadataProvider, but SQL uses DefaultRelMetadataProvider.

          fhueske Fabian Hueske added a comment -

          Hi jingzhang, can you add a description and explain how this issue is different from FLINK-5394?
          At the optimization layer, there should not be a difference between the Table API and SQL since both are represented the same way.

          Thanks, Fabian


            People

            • Assignee:
              jinyu.zj jingzhang
              Reporter:
              jinyu.zj jingzhang
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development