Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: Table API & SQL
    • Labels:

      Description

      The flink-table component consists of several APIs:

      • Scala-embedded Table API
      • String-based Table API (for Java)
      • SQL

      and compiles to two execution backends:

      • DataStream API
      • DataSet API

      Many different translation paths are involved before a query is executed:

      1. Table API String -> Table API logical plan
      2. Table API Scala-expressions -> Table API logical plan
      3. Table API logical plan -> Calcite RelNode plans
      4. SQL -> Calcite RelNode plans (done exclusively via Calcite)
      5. Calcite RelNodes -> DataSet RelNodes
      6. DataSet RelNodes -> DataSet program
      7. Calcite RelNodes -> DataStream RelNodes
      8. DataStream RelNodes -> DataStream program
      9. Calcite RexNode expressions -> generated code

      All of these paths need to be thoroughly tested.

      Initially, many tests were done as end-to-end integration tests with high overhead.
      However, due to the many combinations of APIs and execution backends, this approach leads to redundant tests and long build times.

      Therefore, I propose the following testing scheme:

      1. Table API String -> Table API logical plan:
      The String-based Table API is tested by comparing the resulting logical plan (Table.logicalPlan) to the logical plan of an equivalent Table program that uses the Scala-embedded syntax. The logical plan is the Table API's internal representation, which is later converted into a Calcite RelNode plan.
      All existing integration tests that check the "Java" Table API should be ported to unit tests. This also removes duplicated tests: the Java Table API is currently tested for both batch and streaming, which is no longer necessary.
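      For illustration, such an equivalence check could look like the following. This is a minimal sketch that reuses the helper names of the TableTestBase proposed further below (batchTestUtil, addTable); the class name and the query are illustrative, and the usual Table API / test-base imports are assumed.

      ```scala
      class StringExpressionTest extends TableTestBase {

        @Test
        def testSelectEquivalence(): Unit = {
          val util = batchTestUtil()
          val t = util.addTable[(Int, Long, String)]("MyTable", 'a, 'b, 'c)

          // the same query, once with the String-based and once with the Scala-embedded syntax
          val stringApi = t.filter("a > 10").select("a, c")
          val scalaApi = t.filter('a > 10).select('a, 'c)

          // both variants must produce the identical Table API logical plan
          assertEquals(scalaApi.logicalPlan, stringApi.logicalPlan)
        }
      }
      ```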

      2. Table API Scala-expressions -> Table API logical plan -> Calcite RelNodes -> DataSet RelNodes / DataStream RelNodes
      These tests cover the translation and optimization of Table API queries and verify the Calcite-optimized plan. We need distinct tests for the DataSet and DataStream environments since features and translation rules vary. These tests will also reveal whether added or modified rules or cost functions result in different plans. They should be the main tests for the Table API and be very extensive.
      These tests should be implemented by extending TableTestBase, which is a base class for unit tests and hence very lightweight.
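      For illustration, a plan verification test on the batch side could look like the following minimal sketch. The helpers (batchTestUtil, addTable, verifyTable, unaryNode, batchTableNode, term) are those of the proposed TableTestBase; the concrete expected plan string is an assumption, not a verified plan.

      ```scala
      class GroupAggregateTest extends TableTestBase {

        @Test
        def testGroupedAggregate(): Unit = {
          val util = batchTestUtil()
          val table = util.addTable[(Int, Long, String)]("MyTable", 'a, 'b, 'c)

          val result = table.groupBy('c).select('c, 'a.sum)

          // expected optimized DataSet plan, expressed with the plan-term helpers of the test base
          val expected = unaryNode(
            "DataSetAggregate",
            batchTableNode(0),
            term("groupBy", "c"),
            term("select", "c", "SUM(a) AS TMP_0")
          )

          util.verifyTable(result, expected)
        }
      }
      ```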

      3. SQL -> Calcite RelNodes -> DataSet RelNodes / DataStream RelNodes
      These are the same tests as described for 2. (Table API Scala-expressions -> DataSet / DataStream RelNodes) but just for SQL.
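      The SQL counterpart of such a test could look as follows. This is a minimal sketch: verifySql is a hypothetical SQL entry point analogous to verifyTable, and the expected plan string is again an assumption.

      ```scala
      class SqlAggregateTest extends TableTestBase {

        @Test
        def testGroupedAggregate(): Unit = {
          val util = batchTestUtil()
          util.addTable[(Int, Long, String)]("MyTable", 'a, 'b, 'c)

          val sqlQuery = "SELECT c, SUM(a) FROM MyTable GROUP BY c"

          val expected = unaryNode(
            "DataSetAggregate",
            batchTableNode(0),
            term("groupBy", "c"),
            term("select", "c", "SUM(a) AS EXPR$1")
          )

          // hypothetical helper: parse and optimize the SQL query, then compare the plan
          util.verifySql(sqlQuery, expected)
        }
      }
      ```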

      4. DataSet RelNode -> DataSet program
      Unfortunately, the DataSet API lacks a good mechanism to test generated programs, i.e., a way to obtain a traversable plan of all operators with access to all user-defined functions. Until such a testing utility is available, I propose to test the translation to DataSet programs as end-to-end integration tests. However, I think we can run most tests on a Collection ExecutionEnvironment, which does not start a Flink cluster but runs all code on Java collections. This makes these tests much more lightweight than cluster-based ITCases. The goal of these tests should be to cover all translation paths from DataSetRel to DataSet program, i.e., all DataSetRel nodes and their translation logic. These tests should be implemented by extending the TableProgramsCollectionTestBase (see FLINK-5268).
      Moreover, we should have very few cluster-based ITCases in place that check the execution path with the actual operators, serializers, and comparators. However, we should limit these tests to the minimum to keep build time low. These tests should be implemented by extending the TableProgramsClusterTestBase (FLINK-5268) and all be located in the same class to avoid repeated instantiation of the Flink MiniCluster.
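      To illustrate the collection execution idea independently of the test base, the following minimal sketch runs a small Table program on a collection environment. The environment and TableEnvironment factory calls correspond to the 1.3/1.4-era API and should be treated as assumptions.

      ```scala
      import org.apache.flink.api.scala._
      import org.apache.flink.table.api.TableEnvironment
      import org.apache.flink.table.api.scala._
      import org.apache.flink.types.Row
      import org.junit.Assert.assertEquals
      import org.junit.Test

      class CalcCollectionITCase {

        @Test
        def testFilter(): Unit = {
          // collection execution: the program runs on Java collections, no cluster is started
          val env = ExecutionEnvironment.createCollectionsEnvironment
          val tEnv = TableEnvironment.getTableEnvironment(env)

          val table = env.fromElements((1, "Hello"), (2, "World")).toTable(tEnv, 'a, 'b)
          val result = table.filter('a > 1).select('b)

          val actual = result.toDataSet[Row].collect().map(_.toString)
          assertEquals(Seq("World"), actual)
        }
      }
      ```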

      5. DataStream RelNode -> DataStream program
      Basically, the same applies here as for the DataSet programs. I'm not aware of a good way to test generated DataStream programs without executing them. A testing utility would be great for all libraries that are built on top of the API. Until then, I propose to use end-to-end integration tests. Unfortunately, the DataStream API does not feature a collection execution mode, so all tests need to be run on a MiniCluster. Therefore, we should again keep these tests to a minimum. These tests should be implemented by extending the StreamingMultipleProgramsTestBase and be located in a few classes to avoid repeated instantiations of the Flink MiniCluster.
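      As a point of reference, a cluster-based stream ITCase typically follows the pattern below: extend StreamingMultipleProgramsTestBase, collect the results in a test sink, and assert after execution. This is a minimal sketch with a plain DataStream pipeline for brevity; in the actual tests, the Table-to-DataStream translation would produce the stream that feeds the sink, and the sink/buffer names are illustrative.

      ```scala
      import org.apache.flink.streaming.api.functions.sink.SinkFunction
      import org.apache.flink.streaming.api.scala._
      import org.apache.flink.streaming.util.StreamingMultipleProgramsTestBase
      import org.junit.Assert.assertEquals
      import org.junit.Test

      import scala.collection.mutable

      class FilterStreamITCase extends StreamingMultipleProgramsTestBase {

        @Test
        def testFilter(): Unit = {
          FilterStreamITCase.results.clear()

          // runs against the MiniCluster started once by the test base
          val env = StreamExecutionEnvironment.getExecutionEnvironment
          env.setParallelism(1)

          env.fromElements(1, 2, 3, 4)
            .filter(_ % 2 == 0)
            .addSink(new SinkFunction[Int] {
              override def invoke(value: Int): Unit = FilterStreamITCase.results += value
            })

          env.execute()
          assertEquals(List(2, 4), FilterStreamITCase.results.toList)
        }
      }

      object FilterStreamITCase {
        // collected in a companion object because sink instances are serialized and shipped to the tasks
        val results: mutable.ArrayBuffer[Int] = mutable.ArrayBuffer[Int]()
      }
      ```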

      6. (Scala expressions | String-parsed expressions | SQL expressions) -> RexNode expressions -> Generated Code
      In order to avoid extensive optimization tests for each supported expression or built-in function, we have the ExpressionTestBase, which compiles expressions into generated code and checks the correctness of the results. All supported expressions and built-in functions should be tested by extending the ExpressionTestBase instead of running a full integration test.
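      For example, a built-in string function could be covered as follows. The sketch follows my reading of the ExpressionTestBase; the abstract members (testData, typeInfo) and the testAllApis signature are assumptions and may need to be adapted to the actual base class.

      ```scala
      class MyExpressionsTest extends ExpressionTestBase {

        @Test
        def testSubstring(): Unit = {
          // one call checks the Scala-embedded, the String-based, and the SQL syntax
          // against the result produced by the generated code
          testAllApis(
            'f0.substring(2),    // Scala-embedded Table API
            "f0.substring(2)",   // String-based Table API
            "SUBSTRING(f0, 2)",  // SQL
            "ello")
        }

        // the row of test data against which the generated code is evaluated
        override def testData: Row = {
          val row = new Row(1)
          row.setField(0, "Hello")
          row
        }

        override def typeInfo: RowTypeInfo = new RowTypeInfo(Types.STRING)
      }
      ```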

      I will add a few JIRAs to migrate existing tests to the new testing scheme.

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          twalthr Timo Walther added a comment -

          Many tests can be converted in tests extending ExpressionTestBase.

          githubbot ASF GitHub Bot added a comment -

          GitHub user twalthr opened a pull request:

          https://github.com/apache/flink/pull/2563

          FLINK-3656 [table] Reduce number of ITCases

          This is the first step of reducing the number of ITCases of the Table API. This PR modifies the TableProgramsTestBase to test only the default configuration in the collection execution environment. Other configurations will be enabled individually per test.

          It reduces the number of tests from 955 to 403.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/twalthr/flink FLINK-3656

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/2563.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #2563


          commit e13354a03fd6a1207a53894f2f58739326d804e6
          Author: twalthr <twalthr@apache.org>
          Date: 2016-09-28T15:23:18Z

          FLINK-3656 [table] Reduce number of ITCases


          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/2563

          githubbot ASF GitHub Bot added a comment -

          GitHub user twalthr opened a pull request:

          https://github.com/apache/flink/pull/2566

          FLINK-3656 [table] Consolidate ITCases

          This PR merges ITCases that belong together logically. It also ensures that all tests extend from TableProgramsTestBase.

          It reduces the build time by 20 seconds.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/twalthr/flink FLINK-3656_STEP_2

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/2566.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #2566


          commit afe77695e852171bdff7f238e7bb5951eda5a8ac
          Author: twalthr <twalthr@apache.org>
          Date: 2016-09-29T08:20:35Z

          Merge FilterIT/SelectIT to CalcITCases

          commit 0f7f080dd058cf0fe2298d34510c6cf2a6bca09b
          Author: twalthr <twalthr@apache.org>
          Date: 2016-09-29T09:04:34Z

          Merge FromDataSet/ToTable to TableEnvironmentITCases

          commit 97b553a76f444db17dcbf734cbc043d46b1d9f9c
          Author: twalthr <twalthr@apache.org>
          Date: 2016-09-29T09:22:16Z

          Merge aggregating ITCases

          commit ed6a67058aa32d621266952f61383d61446464ed
          Author: twalthr <twalthr@apache.org>
          Date: 2016-09-29T09:29:49Z

          All batch ITCases use TableProgramsTestBase


          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/2566

          githubbot ASF GitHub Bot added a comment -

          GitHub user twalthr opened a pull request:

          https://github.com/apache/flink/pull/2567

          FLINK-3656 [table] Convert expression tests to unit tests

          This PR replaces 6 ITCases with unit tests in `ScalarOperatorsTest`.

          It reduces the build time by about 10 seconds.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/twalthr/flink FLINK-3656_STEP_3

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/2567.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #2567


          commit 851ae1189252bf359bdf4e0177c3f8968543a87a
          Author: twalthr <twalthr@apache.org>
          Date: 2016-09-29T12:54:36Z

          FLINK-3656 [table] Convert expression tests to unit tests


          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/2567

          githubbot ASF GitHub Bot added a comment -

          GitHub user twalthr opened a pull request:

          https://github.com/apache/flink/pull/2595

          FLINK-3656 [table] Test base for logical unit testing

          This PR introduces a test base for logical unit testing. It could replace most of the current ITCases, as it checks whether the API is correctly translated into `DataSet`/`DataStream` RelNodes. The translation to DataSet/DataStream operators should be tested separately by unit tests for those (e.g. `DataSetCalcTest`, `DataSetJoinTest`, etc.).

          Here is an example of how a JoinITCase could be converted:
          ```scala
          @Test
          def testJoin(): Unit = {
            val util = batchTestUtil()
            val in1 = util.addTable[(Int, Long, String)]("Left", 'a, 'b, 'c)
            val in2 = util.addTable[(Int, Long, Int, String, Long)]("Right", 'd, 'e, 'f, 'g, 'h)

            val table = in1.join(in2).where("b === e").select("c, g")

            val expected = unaryNode(
              "DataSetCalc",
              binaryNode(
                "DataSetJoin",
                batchTableNode(0),
                batchTableNode(1),
                term("where", "=(b, e)"),
                term("join", "a", "b", "c", "d", "e", "f", "g", "h")
              ),
              term("joinType", "Join")
            )

            util.verifyTable(table, expected)
          }
          ```

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/twalthr/flink FLINK-3656_UNIT_TEST_UTILS

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/2595.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #2595


          commit c384628cd1f48f993973bb2e831720a98d55ca16
          Author: twalthr <twalthr@apache.org>
          Date: 2016-10-04T16:32:52Z

          FLINK-3656 [table] Test base for logical unit testing


          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/2595

          Looks good @twalthr.
          +1 to merge

          githubbot ASF GitHub Bot added a comment -

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/2595

          Merging

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/2595

          twalthr Timo Walther added a comment - edited

          Fabian Hueske I think we can close this issue. What do you think? Most of the problems that we had in the past have been improved in FLINK-6617.

          fhueske Fabian Hueske added a comment -

          There's still a bit of duplication, but I agree. We can close this parent issue for now.

          twalthr Timo Walther added a comment -

          This issue has mostly been fixed as part of FLINK-6617. Improving the tests is a continuous process. Therefore, we will close this issue for now.


            People

            • Assignee: Unassigned
            • Reporter: vkalavri Vasia Kalavri
            • Votes: 0
            • Watchers: 7
