Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-288

Add tests for windowed aggregation based on Postgres reference queries

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Postgres is a good reference implementation for standard SQL. We have some queries for windowed aggregation and their output against Postgres. The output is below. We need a mechanism to add tests in this format. Either the usual sql script + reference log, or an interpreter that can execute the output somehow.

      ```

      foodmart=# select *, first_value(deptno) over () from emp;
      ename | deptno | gender | first_value
      ------------------+------------
      Jane | 10 | F | 10
      Bob | 10 | M | 10
      Eric | 20 | M | 10
      Susan | 30 | F | 10
      Alice | 30 | F | 10
      Adam | 50 | M | 10
      Eve | 50 | F | 10
      Grace | 60 | F | 10
      (8 rows)

      foodmart=# select *, first_value(ename) over () from emp;
      ename | deptno | gender | first_value
      ------------------+------------
      Jane | 10 | F | Jane
      Bob | 10 | M | Jane
      Eric | 20 | M | Jane
      Susan | 30 | F | Jane
      Alice | 30 | F | Jane
      Adam | 50 | M | Jane
      Eve | 50 | F | Jane
      Grace | 60 | F | Jane
      (8 rows)

      foodmart=# select *, first_value(ename) over (partition by deptno) from emp;
      ename | deptno | gender | first_value
      ------------------+------------
      Jane | 10 | F | Jane
      Bob | 10 | M | Jane
      Eric | 20 | M | Eric
      Susan | 30 | F | Susan
      Alice | 30 | F | Susan
      Adam | 50 | M | Adam
      Eve | 50 | F | Adam
      Grace | 60 | F | Grace
      (8 rows)

      foodmart=# select *, first_value(ename) over (partition by deptno range current row) from emp;
      ename | deptno | gender | first_value
      ------------------+------------
      Jane | 10 | F | Jane
      Bob | 10 | M | Jane
      Eric | 20 | M | Eric
      Susan | 30 | F | Susan
      Alice | 30 | F | Susan
      Adam | 50 | M | Adam
      Eve | 50 | F | Adam
      Grace | 60 | F | Grace
      (8 rows)

      foodmart=# select *, first_value(ename) over (partition by deptno range unbounded preceding) from emp;
      ename | deptno | gender | first_value
      ------------------+------------
      Jane | 10 | F | Jane
      Bob | 10 | M | Jane
      Eric | 20 | M | Eric
      Susan | 30 | F | Susan
      Alice | 30 | F | Susan
      Adam | 50 | M | Adam
      Eve | 50 | F | Adam
      Grace | 60 | F | Grace
      (8 rows)

      foodmart=# select *, first_value(ename) over (partition by deptno order by gender range unbounded preceding) from emp;
      ename | deptno | gender | first_value
      ------------------+------------
      Jane | 10 | F | Jane
      Bob | 10 | M | Jane
      Eric | 20 | M | Eric
      Alice | 30 | F | Alice
      Susan | 30 | F | Alice
      Eve | 50 | F | Eve
      Adam | 50 | M | Eve
      Grace | 60 | F | Grace
      (8 rows)

      foodmart=# select *, count over (order by deptno) from emp;
      ename | deptno | gender | count
      ------------------+------
      Jane | 10 | F | 2
      Bob | 10 | M | 2
      Eric | 20 | M | 3
      Susan | 30 | F | 5
      Alice | 30 | F | 5
      Adam | 50 | M | 7
      Eve | 50 | F | 7
      Grace | 60 | F | 8
      (8 rows)

      foodmart=# select *, count over (order by deptno), first_value(ename) over (order by deptno rows 2 following) from emp;
      ERROR: frame starting from following row cannot end with current row
      LINE 1: ...o), first_value(ename) over (order by deptno rows 2 followin...
      ```

      ---------------- Imported from GitHub ----------------
      Url: https://github.com/julianhyde/optiq/issues/288
      Created by: julianhyde
      Labels:
      Created at: Thu May 22 09:19:22 CEST 2014
      State: closed

        Activity

        Hide
        github-import GitHub Import added a comment -

        [Date: Mon May 26 07:04:12 CEST 2014, Author: julianhyde]

        The script engine is done as (#290 | FLINK-290). This feature is to check in tests for windowed aggregation. Some are disabled.

        Show
        github-import GitHub Import added a comment - [Date: Mon May 26 07:04:12 CEST 2014, Author: julianhyde ] The script engine is done as ( #290 | FLINK-290 ). This feature is to check in tests for windowed aggregation. Some are disabled.
        Hide
        github-import GitHub Import added a comment -

        [Date: Thu May 29 22:08:16 CEST 2014, Author: vlsi]

        What is the way to run a specific test?

        What is the way to understand the result?

        When using Assert.XXX, IDEA highlights the difference and allows to open diff window.
        This is cool since it enables to figure out what column of the resultset contains invalid values and so on.

        With plain line by line diff it is hard to debug.

        Show
        github-import GitHub Import added a comment - [Date: Thu May 29 22:08:16 CEST 2014, Author: vlsi ] What is the way to run a specific test? What is the way to understand the result? When using Assert.XXX, IDEA highlights the difference and allows to open diff window. This is cool since it enables to figure out what column of the resultset contains invalid values and so on. With plain line by line diff it is hard to debug.
        Hide
        github-import GitHub Import added a comment -

        [Date: Fri May 30 00:54:52 CEST 2014, Author: julianhyde]

        > What is the way to run a specific test?

        Ultimately you will be able to run SqlRun from the command line. But that would need the test configuration to be packaged as a JDBC data source, which I haven't done yet.

        So, launch tests via a junit method. `JdbcTest.testRunWinAgg` is an example.

        > What is the way to understand the result?
        >
        > When using Assert.XXX, IDEA highlights the difference and allows to open diff window.
        This is cool since it enables to figure out what column of the resultset contains invalid values and so on.
        >
        > With plain line by line diff it is hard to debug.

        The test runs a script in one file and produces a result in another. The test is deemed to have succeeded if and only if the files are the same. If the actual output is correct, then you can copy the file. This is much simpler than changing embedded java strings using copy-paste.

        You can use any diff tool to compare the files. On Linux, I use <a href="https://wiki.gnome.org/Apps/Meld">meld</a>.

        I used a java-based diff to display the differences in the files. We shouldn't use `assertEquals(String, String)` in general because the files might be large, but if you can detect that the unit test is being run from Intellij you could generate a test failure in such a way that Intellij will do a string comparison.

        If you are debugging one statement repeatedly, you could create a small .oq file, use the !skip or !if commands to skip over other statements, or write a temporary unit test in java. But I strongly suggest that when you check in the query, you put it in a .oq file. This will make tests easier to write and maintain by people who do not spend their whole time in an IDE.

        Show
        github-import GitHub Import added a comment - [Date: Fri May 30 00:54:52 CEST 2014, Author: julianhyde ] > What is the way to run a specific test? Ultimately you will be able to run SqlRun from the command line. But that would need the test configuration to be packaged as a JDBC data source, which I haven't done yet. So, launch tests via a junit method. `JdbcTest.testRunWinAgg` is an example. > What is the way to understand the result? > > When using Assert.XXX, IDEA highlights the difference and allows to open diff window. This is cool since it enables to figure out what column of the resultset contains invalid values and so on. > > With plain line by line diff it is hard to debug. The test runs a script in one file and produces a result in another. The test is deemed to have succeeded if and only if the files are the same. If the actual output is correct, then you can copy the file. This is much simpler than changing embedded java strings using copy-paste. You can use any diff tool to compare the files. On Linux, I use <a href="https://wiki.gnome.org/Apps/Meld">meld</a>. I used a java-based diff to display the differences in the files. We shouldn't use `assertEquals(String, String)` in general because the files might be large, but if you can detect that the unit test is being run from Intellij you could generate a test failure in such a way that Intellij will do a string comparison. If you are debugging one statement repeatedly, you could create a small .oq file, use the !skip or !if commands to skip over other statements, or write a temporary unit test in java. But I strongly suggest that when you check in the query, you put it in a .oq file. This will make tests easier to write and maintain by people who do not spend their whole time in an IDE.
        Hide
        github-import GitHub Import added a comment -

        [Date: Fri May 30 07:11:55 CEST 2014, Author: vlsi]

        > Ultimately you will be able to run SqlRun from the command line. But that would need the test configuration to be packaged as a JDBC data source, which I haven't done yet.

        What is the case of running SqlRun from the command line?
        Isn't `mvn test` good enough?

        >So, launch tests via a junit method. JdbcTest.testRunWinAgg is an example.

        That is fine, however `winagg.oq` contains multiple queries, thus it is very hard to debug (step over, step into, breakpoints, such kind of things) if the very last fails.
        As you might notice, I split some of the tests into multiple junit methods for exactly this reason: for the ability to debug the specific failing method from the IDE.

        I am strongly against the case when multiple queries are run under the same test.

        Show
        github-import GitHub Import added a comment - [Date: Fri May 30 07:11:55 CEST 2014, Author: vlsi ] > Ultimately you will be able to run SqlRun from the command line. But that would need the test configuration to be packaged as a JDBC data source, which I haven't done yet. What is the case of running SqlRun from the command line? Isn't `mvn test` good enough? >So, launch tests via a junit method. JdbcTest.testRunWinAgg is an example. That is fine, however `winagg.oq` contains multiple queries, thus it is very hard to debug (step over, step into, breakpoints, such kind of things) if the very last fails. As you might notice, I split some of the tests into multiple junit methods for exactly this reason: for the ability to debug the specific failing method from the IDE. I am strongly against the case when multiple queries are run under the same test.
        Hide
        github-import GitHub Import added a comment -

        [Date: Fri May 30 10:22:05 CEST 2014, Author: julianhyde]

        I think I should explain the use case a bit better. All of the databases I've worked on have ended up with suites that contain thousands of queries. Most of those queries worked first time, but they are kept in the suite to provide coverage of combinations of features. They need to be easy to write, and easy to maintain, preferably by people who don't have strong java skills.

        By "maintain", think of the changes required if you add a column to one of your test tables.

        Optiq is heading down the same path. For windowed aggregation alone, we will need over 100 queries to cover all the combinations. You can crank out those 100 queries quite fast against Postgres. It doesn't make sense to paste each of those queries and its result into java code (I tried that in the first revision of `JdbcTest.testVariousOuter`, and it was laborious and error-prone.)

        To be clear. Developers should carry on writing junit tests. Test-driven development is good. If you have a bug to fix, the first thing you should do is write a java unit test. But we also need system tests, and for a lot of database functionality, SQL scripts are an efficient way of writing those.

        Show
        github-import GitHub Import added a comment - [Date: Fri May 30 10:22:05 CEST 2014, Author: julianhyde ] I think I should explain the use case a bit better. All of the databases I've worked on have ended up with suites that contain thousands of queries. Most of those queries worked first time, but they are kept in the suite to provide coverage of combinations of features. They need to be easy to write, and easy to maintain, preferably by people who don't have strong java skills. By "maintain", think of the changes required if you add a column to one of your test tables. Optiq is heading down the same path. For windowed aggregation alone, we will need over 100 queries to cover all the combinations. You can crank out those 100 queries quite fast against Postgres. It doesn't make sense to paste each of those queries and its result into java code (I tried that in the first revision of `JdbcTest.testVariousOuter`, and it was laborious and error-prone.) To be clear. Developers should carry on writing junit tests. Test-driven development is good. If you have a bug to fix, the first thing you should do is write a java unit test. But we also need system tests, and for a lot of database functionality, SQL scripts are an efficient way of writing those.
        Hide
        github-import GitHub Import added a comment -

        [Date: Fri May 30 10:26:38 CEST 2014, Author: vlsi]

        Ok, that is a valid case.

        What do you think if creating separate file for each query?
        I do not mind if there is "run all the queries" runner as long as it clearly reports which query fails and allows to easily run a particular query.

        Show
        github-import GitHub Import added a comment - [Date: Fri May 30 10:26:38 CEST 2014, Author: vlsi ] Ok, that is a valid case. What do you think if creating separate file for each query? I do not mind if there is "run all the queries" runner as long as it clearly reports which query fails and allows to easily run a particular query.
        Hide
        github-import GitHub Import added a comment -

        [Date: Fri May 30 10:30:35 CEST 2014, Author: julianhyde]

        If you diff the files, you can easily find out which query produced the wrong result.

        From the command line, use `diff -c` to show enough lines of context so you can see which query failed.

        You can use '!if (false)

        { ... }

        ' to comment out long sections of the file. They will trivially succeed.

        Show
        github-import GitHub Import added a comment - [Date: Fri May 30 10:30:35 CEST 2014, Author: julianhyde ] If you diff the files, you can easily find out which query produced the wrong result. From the command line, use `diff -c` to show enough lines of context so you can see which query failed. You can use '!if (false) { ... } ' to comment out long sections of the file. They will trivially succeed.
        Hide
        github-import GitHub Import added a comment -

        [Date: Fri May 30 10:43:28 CEST 2014, Author: vlsi]

        Can you please justify putting all the queries to a single file?

        Show
        github-import GitHub Import added a comment - [Date: Fri May 30 10:43:28 CEST 2014, Author: vlsi ] Can you please justify putting all the queries to a single file?
        Hide
        github-import GitHub Import added a comment -

        [Date: Fri May 30 10:53:59 CEST 2014, Author: julianhyde]

        If you've got thousands of queries, you don't want thousands of files. It is useful to group related queries into the same file, especially if they share the same setup. But developers can do whatever makes most sense.

        Show
        github-import GitHub Import added a comment - [Date: Fri May 30 10:53:59 CEST 2014, Author: julianhyde ] If you've got thousands of queries, you don't want thousands of files. It is useful to group related queries into the same file, especially if they share the same setup. But developers can do whatever makes most sense.

          People

          • Assignee:
            Unassigned
            Reporter:
            github-import GitHub Import
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development