Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-259

Using sub-queries in CASE statement against JDBC tables generates invalid Oracle SQL

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0-incubating
    • Component/s: None
    • Labels:

      Description

      1.
      select e.NAME,
      (CASE e.dept_ID WHEN (Select d.id from PV_ADMIN.dept d where d.id = e.dept_id)
      THEN (Select d.name from PV_ADMIN.dept d where d.id = e.dept_id)
      ELSE 'DepartmentNotFound' END ) AS DEPTNAME
      from PV_ADMIN.EMP e;

      2.
      select e.NAME,
      CASE WHEN e.dept_ID = (select d.ID from PV_ADMIN.dept d where d.NAME = 'SALES') then 'SALES'
      ELSE 'Not Matched.'
      END as department
      from PV_ADMIN.EMP e ;

      ---------------- Imported from GitHub ----------------
      Url: https://github.com/julianhyde/optiq/issues/259
      Created by: kunal-mahale
      Labels: duplicate,
      Created at: Fri Apr 25 06:41:04 CEST 2014
      State: open

      1. [CALCITE-259]-Description.patch
        7 kB
        Yuri Au Yong
      2. [CALCITE-259]-Description.patch
        3 kB
        Yuri Au Yong
      3. CALCITE-259-MYSQL-SINGLE_VALUE-0.1.patch
        7 kB
        YeongWei

        Activity

        Hide
        github-import GitHub Import added a comment -

        [Date: Sat Apr 26 23:25:10 CEST 2014, Author: julianhyde]

        As part of the fix for https://github.com/julianhyde/optiq/issues/252 I added test cases testScalarSubQueryInCase and testScalarSubQueryInCase2. So I believe this is a fixed. If not, provide a repro test case.

        Show
        github-import GitHub Import added a comment - [Date: Sat Apr 26 23:25:10 CEST 2014, Author: julianhyde ] As part of the fix for https://github.com/julianhyde/optiq/issues/252 I added test cases testScalarSubQueryInCase and testScalarSubQueryInCase2. So I believe this is a fixed. If not, provide a repro test case.
        Hide
        github-import GitHub Import added a comment -

        [Date: Tue Apr 29 15:36:16 CEST 2014, Author: kunal-mahale]

        Thanks for the quick fix. :+1: But there are still some errors on oracle side
        1. select e.NAME,
        (CASE e.dept_ID WHEN (Select d.id from PV_ADMIN.dept d where d.id = e.dept_id)
        THEN (Select d.name from PV_ADMIN.dept d where d.id = e.dept_id)
        ELSE 'DepartmentNotFound' END ) AS DEPTNAME
        from PV_ADMIN.EMP e;
        Error : java.sql.SQLException: exception while executing query
        at net.hydromatic.avatica.Helper.createException(Helper.java:40)
        at net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:406)
        at net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:350)
        at net.hydromatic.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:78)
        at com.mycompany.optiqtest.App.main(App.java:148)
        Caused by: java.lang.RuntimeException: while executing SQL [SELECT "$f0", SINGLE_VALUE("ID") "$f1"
        FROM (SELECT "t0"."$f0", "DEPT"."ID"
        FROM (SELECT "$f0"
        FROM (SELECT "DEPT_ID" "$f0"
        FROM "PV_ADMIN"."EMP") "t"
        GROUP BY "$f0") "t0"
        INNER JOIN "PV_ADMIN"."DEPT" ON "t0"."$f0" = "DEPT"."ID") "t1"
        GROUP BY "$f0"]
        at net.hydromatic.optiq.runtime.ResultSetEnumerable.enumerator(ResultSetEnumerable.java:144)
        at net.hydromatic.linq4j.EnumerableDefaults.toLookup_(EnumerableDefaults.java:1898)
        at net.hydromatic.linq4j.EnumerableDefaults.toLookup(EnumerableDefaults.java:1891)
        at net.hydromatic.linq4j.EnumerableDefaults.toLookup(EnumerableDefaults.java:1867)
        at net.hydromatic.linq4j.DefaultEnumerable.toLookup(DefaultEnumerable.java:676)
        at net.hydromatic.linq4j.EnumerableDefaults$4.enumerator(EnumerableDefaults.java:848)
        at Baz$12$1.<init>(Unknown Source)
        at Baz$12.enumerator(Unknown Source)
        at net.hydromatic.linq4j.EnumerableDefaults.distinct(EnumerableDefaults.java:363)
        at net.hydromatic.linq4j.DefaultEnumerable.distinct(DefaultEnumerable.java:189)
        at Baz.bind(Unknown Source)
        at net.hydromatic.optiq.jdbc.OptiqPrepare$PrepareResult.getEnumerable(OptiqPrepare.java:252)
        at net.hydromatic.optiq.jdbc.OptiqPrepare$PrepareResult.enumerator(OptiqPrepare.java:262)
        at net.hydromatic.optiq.jdbc.OptiqPrepare$PrepareResult.createCursor(OptiqPrepare.java:230)
        at net.hydromatic.optiq.jdbc.MetaImpl.createCursor(MetaImpl.java:613)
        at net.hydromatic.avatica.AvaticaResultSet.execute(AvaticaResultSet.java:162)
        at net.hydromatic.optiq.jdbc.OptiqResultSet.execute(OptiqResultSet.java:52)
        at net.hydromatic.optiq.jdbc.OptiqResultSet.execute(OptiqResultSet.java:29)
        at net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404)
        ... 3 more
        Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "SINGLE_VALUE": invalid identifier

        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:440)
        at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
        at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:837)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:445)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:523)
        at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193)
        at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:852)
        at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1153)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1267)
        at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1477)
        at oracle.jdbc.driver.OracleStatementWrapper.executeQuery(OracleStatementWrapper.java:392)
        ... 21 more
        2. In second query the NullPointerExcepetion Still persist.

        Show
        github-import GitHub Import added a comment - [Date: Tue Apr 29 15:36:16 CEST 2014, Author: kunal-mahale ] Thanks for the quick fix. :+1: But there are still some errors on oracle side 1. select e.NAME, (CASE e.dept_ID WHEN (Select d.id from PV_ADMIN.dept d where d.id = e.dept_id) THEN (Select d.name from PV_ADMIN.dept d where d.id = e.dept_id) ELSE 'DepartmentNotFound' END ) AS DEPTNAME from PV_ADMIN.EMP e; Error : java.sql.SQLException: exception while executing query at net.hydromatic.avatica.Helper.createException(Helper.java:40) at net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:406) at net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:350) at net.hydromatic.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:78) at com.mycompany.optiqtest.App.main(App.java:148) Caused by: java.lang.RuntimeException: while executing SQL [SELECT "$f0", SINGLE_VALUE("ID") "$f1" FROM (SELECT "t0"."$f0", "DEPT"."ID" FROM (SELECT "$f0" FROM (SELECT "DEPT_ID" "$f0" FROM "PV_ADMIN"."EMP") "t" GROUP BY "$f0") "t0" INNER JOIN "PV_ADMIN"."DEPT" ON "t0"."$f0" = "DEPT"."ID") "t1" GROUP BY "$f0"] at net.hydromatic.optiq.runtime.ResultSetEnumerable.enumerator(ResultSetEnumerable.java:144) at net.hydromatic.linq4j.EnumerableDefaults.toLookup_(EnumerableDefaults.java:1898) at net.hydromatic.linq4j.EnumerableDefaults.toLookup(EnumerableDefaults.java:1891) at net.hydromatic.linq4j.EnumerableDefaults.toLookup(EnumerableDefaults.java:1867) at net.hydromatic.linq4j.DefaultEnumerable.toLookup(DefaultEnumerable.java:676) at net.hydromatic.linq4j.EnumerableDefaults$4.enumerator(EnumerableDefaults.java:848) at Baz$12$1.<init>(Unknown Source) at Baz$12.enumerator(Unknown Source) at net.hydromatic.linq4j.EnumerableDefaults.distinct(EnumerableDefaults.java:363) at net.hydromatic.linq4j.DefaultEnumerable.distinct(DefaultEnumerable.java:189) at Baz.bind(Unknown Source) at net.hydromatic.optiq.jdbc.OptiqPrepare$PrepareResult.getEnumerable(OptiqPrepare.java:252) at net.hydromatic.optiq.jdbc.OptiqPrepare$PrepareResult.enumerator(OptiqPrepare.java:262) at net.hydromatic.optiq.jdbc.OptiqPrepare$PrepareResult.createCursor(OptiqPrepare.java:230) at net.hydromatic.optiq.jdbc.MetaImpl.createCursor(MetaImpl.java:613) at net.hydromatic.avatica.AvaticaResultSet.execute(AvaticaResultSet.java:162) at net.hydromatic.optiq.jdbc.OptiqResultSet.execute(OptiqResultSet.java:52) at net.hydromatic.optiq.jdbc.OptiqResultSet.execute(OptiqResultSet.java:29) at net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404) ... 3 more Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "SINGLE_VALUE": invalid identifier at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:440) at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396) at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:837) at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:445) at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191) at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:523) at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:193) at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:852) at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1153) at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1267) at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1477) at oracle.jdbc.driver.OracleStatementWrapper.executeQuery(OracleStatementWrapper.java:392) ... 21 more 2. In second query the NullPointerExcepetion Still persist.
        Hide
        github-import GitHub Import added a comment -

        [Date: Fri May 02 03:01:31 CEST 2014, Author: julianhyde]

        Ok, I see; there's some error in how the JDBC adapter decides what it can push down. SINGLE_VALUE is an Optiq-internal aggregate function that is used to implement scalar sub-queries. I should be able to reproduce on MySQL. Re-opening.

        Show
        github-import GitHub Import added a comment - [Date: Fri May 02 03:01:31 CEST 2014, Author: julianhyde ] Ok, I see; there's some error in how the JDBC adapter decides what it can push down. SINGLE_VALUE is an Optiq-internal aggregate function that is used to implement scalar sub-queries. I should be able to reproduce on MySQL. Re-opening.
        Hide
        YAY Yuri Au Yong added a comment -

        I having the same issue when executing query on hsqldb (e.g.: “select RNUM, C1, C2 from TJOIN2 where C1 = ( select C1 from TJOIN1 WHERE RNUM=0)”). Any pointers on where this SINGLE_VALUE is being resolved?

        Show
        YAY Yuri Au Yong added a comment - I having the same issue when executing query on hsqldb (e.g.: “select RNUM, C1, C2 from TJOIN2 where C1 = ( select C1 from TJOIN1 WHERE RNUM=0)”). Any pointers on where this SINGLE_VALUE is being resolved?
        Hide
        YAY Yuri Au Yong added a comment -

        Ignore Calcite internal SqlSingleValueAggFunction aggCalls in JdbcAggregate

        Show
        YAY Yuri Au Yong added a comment - Ignore Calcite internal SqlSingleValueAggFunction aggCalls in JdbcAggregate
        Hide
        xhoong Xavier FH Leong added a comment -

        Julian Hyde patch attach that checks for SingleValue function instance.

        Show
        xhoong Xavier FH Leong added a comment - Julian Hyde patch attach that checks for SingleValue function instance.
        Hide
        julianhyde Julian Hyde added a comment -

        You need to implement the other side of the functionality: give an error if the single-row query returns more than one row, and return null if the single-row query returns no rows. Stripping out the SINGLE_VALUE function doesn't achieve that. Please include test cases for this functionality in your patch.

        Show
        julianhyde Julian Hyde added a comment - You need to implement the other side of the functionality: give an error if the single-row query returns more than one row, and return null if the single-row query returns no rows. Stripping out the SINGLE_VALUE function doesn't achieve that. Please include test cases for this functionality in your patch.
        Hide
        xhoong Xavier FH Leong added a comment -

        Hi Julian Hyde, I'd discuss over with Yuri Au Yong, basically we had run some test for the SQL to be handle by JDBC adapter so that the query is pass on to the adaptee to run the SQL, some database like Oracle will fail and expect the sub-query to be with single value, some other database however will treat it as IN() statement and works for multiple value return from the sub-query.

        Hence, there's no consistency on the sub-query join handling, and will be an issue if the adaptee able to process the query using the adapter rule but fail when using Calcite EnumarableJoin rule, which handles SINGLE_VALUE. In this case, one would see failure on half the case depends which execution plan is chosen.

        Since we lack of idea (and standards) on how to enforce the SINGLE_VALUE check via SQL on the JDBCAdapter side, what do you think if we handle this case using Calcite default rule so that the results are consistent, even though the execution cost is higher compare on JDBCAdapter. And what will be the guiding design principle of Calcite for handling non-standards behavior, especially for adapter that takes SQL.

        Show
        xhoong Xavier FH Leong added a comment - Hi Julian Hyde , I'd discuss over with Yuri Au Yong , basically we had run some test for the SQL to be handle by JDBC adapter so that the query is pass on to the adaptee to run the SQL, some database like Oracle will fail and expect the sub-query to be with single value, some other database however will treat it as IN() statement and works for multiple value return from the sub-query. Hence, there's no consistency on the sub-query join handling, and will be an issue if the adaptee able to process the query using the adapter rule but fail when using Calcite EnumarableJoin rule, which handles SINGLE_VALUE. In this case, one would see failure on half the case depends which execution plan is chosen. Since we lack of idea (and standards) on how to enforce the SINGLE_VALUE check via SQL on the JDBCAdapter side, what do you think if we handle this case using Calcite default rule so that the results are consistent, even though the execution cost is higher compare on JDBCAdapter. And what will be the guiding design principle of Calcite for handling non-standards behavior, especially for adapter that takes SQL.
        Hide
        julianhyde Julian Hyde added a comment - - edited

        Which databases don't give an error if the sub-query returns more than one row?

        MySQL gives an error:

        mysql> select * from department;
        +---------------+---------------------------+
        | department_id | department_description    |
        +---------------+---------------------------+
        |             1 | HQ General Management     |
        |             2 | HQ Information Systems    |
        |             3 | HQ Marketing              |
        |             4 | HQ Human Resources        |
        |             5 | HQ Finance and Accounting |
        |            11 | Store Management          |
        |            14 | Store Information Systems |
        |            15 | Store Permanent Checkers  |
        |            16 | Store Temporary Checkers  |
        |            17 | Store Permanent Stockers  |
        |            18 | Store Temporary Stockers  |
        |            19 | Store Permanent Butchers  |
        +---------------+---------------------------+
        12 rows in set (0.00 sec)
        
        mysql> select * from employee where department_id = (select department_id from department where department_id > 17);
        ERROR 1242 (21000): Subquery returns more than 1 row

        and so does Oracle. And so should Calcite, regardless of which adapter it is running on.

        In MySQL, for SINGLE_VALUE( x ) you can generate

        CASE COUNT(*) WHEN 0 THEN NULL WHEN 1 THEN x ELSE (SELECT NULL UNION ALL SELECT NULL) END

        Maybe you can do something similar in hsqldb, Oracle etc.

        Show
        julianhyde Julian Hyde added a comment - - edited Which databases don't give an error if the sub-query returns more than one row? MySQL gives an error: mysql> select * from department; +---------------+---------------------------+ | department_id | department_description | +---------------+---------------------------+ | 1 | HQ General Management | | 2 | HQ Information Systems | | 3 | HQ Marketing | | 4 | HQ Human Resources | | 5 | HQ Finance and Accounting | | 11 | Store Management | | 14 | Store Information Systems | | 15 | Store Permanent Checkers | | 16 | Store Temporary Checkers | | 17 | Store Permanent Stockers | | 18 | Store Temporary Stockers | | 19 | Store Permanent Butchers | +---------------+---------------------------+ 12 rows in set (0.00 sec) mysql> select * from employee where department_id = (select department_id from department where department_id > 17); ERROR 1242 (21000): Subquery returns more than 1 row and so does Oracle. And so should Calcite, regardless of which adapter it is running on. In MySQL, for SINGLE_VALUE( x ) you can generate CASE COUNT(*) WHEN 0 THEN NULL WHEN 1 THEN x ELSE (SELECT NULL UNION ALL SELECT NULL) END Maybe you can do something similar in hsqldb, Oracle etc.
        Hide
        YAY Yuri Au Yong added a comment -

        Sorry, my bad, i was mistaken that a db don't give an error if the sub-query returns more than one row.
        It seems "UNION" is not supported in hsqldb. Anyhow, I have applied a fix(see latest attached patch) to replace the single value function with an sql query with a scalar subquery instead. I tried with hsqldb and it gives a cardinality violation(which is what we want) when the scalar subquery returns more than one row.
        Example:

        • before: SELECT SINGLE_VALUE(a) FROM table;
        • after: SELECT a FROM table WHERE a = (SELECT a FROM table) GROUP BY a
        Show
        YAY Yuri Au Yong added a comment - Sorry, my bad, i was mistaken that a db don't give an error if the sub-query returns more than one row. It seems "UNION" is not supported in hsqldb. Anyhow, I have applied a fix(see latest attached patch) to replace the single value function with an sql query with a scalar subquery instead. I tried with hsqldb and it gives a cardinality violation(which is what we want) when the scalar subquery returns more than one row. Example: before: SELECT SINGLE_VALUE(a) FROM table; after: SELECT a FROM table WHERE a = (SELECT a FROM table) GROUP BY a
        Hide
        vladimirsitnikov Vladimir Sitnikov added a comment - - edited

        SELECT a FROM table WHERE a = (SELECT a FROM table) GROUP BY a

        Doesn't that look like an overkill?
        Can you please generate (select a from table) instead?

        Can you please add plan assertions to the test case, so it is obvious from the test what is the expected plan for the given query?

        Show
        vladimirsitnikov Vladimir Sitnikov added a comment - - edited SELECT a FROM table WHERE a = (SELECT a FROM table) GROUP BY a Doesn't that look like an overkill? Can you please generate (select a from table) instead? Can you please add plan assertions to the test case, so it is obvious from the test what is the expected plan for the given query?
        Hide
        julianhyde Julian Hyde added a comment -

        The following expression works on hsqldb:

        CASE COUNT(*) WHEN 0 THEN NULL
            WHEN 1 THEN MIN(x)
            ELSE (SELECT * FROM (VALUES 1) UNION ALL SELECT * FROM (VALUES 1)) END

        (I tried just VALUES 1, 2 but hit https://sourceforge.net/p/hsqldb/bugs/1393/.)

        Show
        julianhyde Julian Hyde added a comment - The following expression works on hsqldb: CASE COUNT(*) WHEN 0 THEN NULL WHEN 1 THEN MIN(x) ELSE (SELECT * FROM (VALUES 1) UNION ALL SELECT * FROM (VALUES 1)) END (I tried just VALUES 1, 2 but hit https://sourceforge.net/p/hsqldb/bugs/1393/ .)
        Hide
        YAY Yuri Au Yong added a comment - - edited

        Vladimir, (select a from table) doesn't check for single value.

        Julian, "SELECT * FROM (VALUES 1) UNION ALL SELECT * FROM (VALUES 1)" doesn't work on mysql.

        I will do this instead:
        SELECT DEPTNO AS $f0,
        CASE (SELECT COUNT(DEPTNO) FROM EMP WHERE ENAME='ALLEN')
        WHEN 1 THEN DEPTNO
        WHEN 0 THEN NULL
        ELSE (SELECT COUNT(1) FROM EMP UNION ALL SELECT COUNT(1) FROM EMP) END
        FROM EMP as t WHERE ENAME='ALLEN'

        Show
        YAY Yuri Au Yong added a comment - - edited Vladimir, (select a from table) doesn't check for single value. Julian, "SELECT * FROM (VALUES 1) UNION ALL SELECT * FROM (VALUES 1)" doesn't work on mysql. I will do this instead: SELECT DEPTNO AS $f0, CASE (SELECT COUNT(DEPTNO) FROM EMP WHERE ENAME='ALLEN') WHEN 1 THEN DEPTNO WHEN 0 THEN NULL ELSE (SELECT COUNT(1) FROM EMP UNION ALL SELECT COUNT(1) FROM EMP) END FROM EMP as t WHERE ENAME='ALLEN'
        Hide
        vladimirsitnikov Vladimir Sitnikov added a comment -

        Vladimir, (select a from table) doesn't check for single value.

        Come again?

        The original SQL is just a scalar subquery

        SELECT * FROM DEPT
         WHERE DEPTNO = (SELECT DEPTNO FROM EMP WHERE ENAME='ALLEN')
        

        It does check for a single value, doesn't it?

        I will do this instead:

        It looks like you are trying to confuse the downstream database.
        I think lots of JDBC databases just support scalar subqueries, thus we could just use that feature.

        Julian Hyde, Do you think SINGLE_VALUE abstraction is good? What if we drop SINGLE_VALUE in Calcite? That would simplify translation for those who support scalar subqueries.

        Show
        vladimirsitnikov Vladimir Sitnikov added a comment - Vladimir, (select a from table) doesn't check for single value. Come again? The original SQL is just a scalar subquery SELECT * FROM DEPT WHERE DEPTNO = ( SELECT DEPTNO FROM EMP WHERE ENAME='ALLEN') It does check for a single value, doesn't it? I will do this instead: It looks like you are trying to confuse the downstream database. I think lots of JDBC databases just support scalar subqueries, thus we could just use that feature. Julian Hyde , Do you think SINGLE_VALUE abstraction is good? What if we drop SINGLE_VALUE in Calcite? That would simplify translation for those who support scalar subqueries.
        Hide
        julianhyde Julian Hyde added a comment -

        Yes, I think SINGLE_VALUE is a good abstraction. Scalar queries are not core relational algebra. They do not occur often. They can be modeled reasonably well as joins. The one thing missing – the fact that they are supposed to fail if they return more than one row – can be handled by one agg function. That is a lot less surface area to test than a new core relational operator.

        I concede that this makes it a bit more difficult to push the whole query, intact, into a back-end DBMS. But I think of that as a special case.

        Show
        julianhyde Julian Hyde added a comment - Yes, I think SINGLE_VALUE is a good abstraction. Scalar queries are not core relational algebra. They do not occur often. They can be modeled reasonably well as joins. The one thing missing – the fact that they are supposed to fail if they return more than one row – can be handled by one agg function. That is a lot less surface area to test than a new core relational operator. I concede that this makes it a bit more difficult to push the whole query, intact, into a back-end DBMS. But I think of that as a special case.
        Hide
        YAY Yuri Au Yong added a comment -

        Hi Julian Hyde,
        Initial Statement: SELECT * FROM DEPT WHERE DEPTNO=(SELECT DEPTNO FROM EMP WHERE ENAME='ALLEN')

        • Generated SQL by Calcite with single_value func:
          SELECT DEPT.DEPTNO, DEPT.DNAME, DEPT.LOC
          FROM (SELECT SINGLE_VALUE(DEPTNO) as $f0
          FROM EMP) as t1
          INNER JOIN DEPT ON t1.$f0 = DEPT.DEPTNO;

        In MySQL "SELECT NULL UNION ALL SELECT NULL" can be applied as the "CASE" expression's "ELSE" clause for return type TINYINT(DEPTNO) and VARCHAR(ENAME).

        In Hsqldb, SELECT statements require FROM clause and using COUNT requires GROUP BY clause, e.g.:

        • Generated SQL by Calcite after replacing single_value func(this works in mysql too):
          SELECT "DEPT"."DEPTNO", "DEPT"."DNAME", "DEPT"."LOC"
          FROM (SELECT CASE "t"."$f0"
          WHEN 1 THEN "t"."DEPTNO"
          WHEN 0 THEN NULL
          ELSE (SELECT 1 FROM "EMP" UNION ALL SELECT 1 FROM "EMP") END
          AS "$f0"
          FROM (SELECT "DEPTNO", COUNT("DEPTNO") AS "$f0"
          FROM "EMP" WHERE "ENAME"='ALLEN' GROUP BY "DEPTNO") AS "t") as "t1"
          INNER JOIN "DEPT" ON "t1"."$f0" = "DEPT"."DEPTNO";

        However, in Hsqldb, the "CASE" expression's "ELSE" clause is required to match all the "THEN" clause data type, e.g.:

        • for TINYINT(DEPTNO) return type: "SELECT 1 FROM EMP UNION ALL SELECT 1 FROM EMP"
        • for VARCHAR(ENAME) return type: "SELECT NULL FROM EMP UNION ALL SELECT NULL FROM EMP"

        It would be too tedious to try to cater for all different data types scenarios with CASE expression.

        Thus, it would be better to embed a scalar subquery statement as replacement for the SINGLE_VALUE function instead (as per the latest attached patch fix) as it covers the above mentioned scenarios.

        Show
        YAY Yuri Au Yong added a comment - Hi Julian Hyde , Initial Statement: SELECT * FROM DEPT WHERE DEPTNO=(SELECT DEPTNO FROM EMP WHERE ENAME='ALLEN') Generated SQL by Calcite with single_value func: SELECT DEPT.DEPTNO, DEPT.DNAME, DEPT.LOC FROM (SELECT SINGLE_VALUE(DEPTNO) as $f0 FROM EMP) as t1 INNER JOIN DEPT ON t1.$f0 = DEPT.DEPTNO; In MySQL "SELECT NULL UNION ALL SELECT NULL" can be applied as the "CASE" expression's "ELSE" clause for return type TINYINT(DEPTNO) and VARCHAR(ENAME). In Hsqldb, SELECT statements require FROM clause and using COUNT requires GROUP BY clause, e.g.: Generated SQL by Calcite after replacing single_value func(this works in mysql too): SELECT "DEPT"."DEPTNO", "DEPT"."DNAME", "DEPT"."LOC" FROM (SELECT CASE "t"."$f0" WHEN 1 THEN "t"."DEPTNO" WHEN 0 THEN NULL ELSE (SELECT 1 FROM "EMP" UNION ALL SELECT 1 FROM "EMP") END AS "$f0" FROM (SELECT "DEPTNO", COUNT("DEPTNO") AS "$f0" FROM "EMP" WHERE "ENAME"='ALLEN' GROUP BY "DEPTNO") AS "t") as "t1" INNER JOIN "DEPT" ON "t1"."$f0" = "DEPT"."DEPTNO"; However, in Hsqldb, the "CASE" expression's "ELSE" clause is required to match all the "THEN" clause data type, e.g.: for TINYINT(DEPTNO) return type: "SELECT 1 FROM EMP UNION ALL SELECT 1 FROM EMP" for VARCHAR(ENAME) return type: "SELECT NULL FROM EMP UNION ALL SELECT NULL FROM EMP" It would be too tedious to try to cater for all different data types scenarios with CASE expression. Thus, it would be better to embed a scalar subquery statement as replacement for the SINGLE_VALUE function instead (as per the latest attached patch fix) as it covers the above mentioned scenarios.
        Hide
        yeongwei YeongWei added a comment -

        Hi Julian Hyde, I can across the CALCITE-704. It appears that the SINGLE_VALUE will be handled by the EnumerableAggregate instead of JdbcAggregate that pushes down into the underlying Data Sources. Therefore the JdbcAggregate will not need to juggle with the SINGLE_VALUE aggregator anymore.

        If the above is true, are there anything that could be pending with regards to this JIRA?

        I have tested on my end too that the SINGLE_VALUE(s) are handled properly.

        Show
        yeongwei YeongWei added a comment - Hi Julian Hyde , I can across the CALCITE-704 . It appears that the SINGLE_VALUE will be handled by the EnumerableAggregate instead of JdbcAggregate that pushes down into the underlying Data Sources. Therefore the JdbcAggregate will not need to juggle with the SINGLE_VALUE aggregator anymore. If the above is true, are there anything that could be pending with regards to this JIRA? I have tested on my end too that the SINGLE_VALUE(s) are handled properly.
        Hide
        julianhyde Julian Hyde added a comment -

        I think you're referring to the JdbcAggregate.canImplement method I added when fixing CALCITE-704. Yeah, I fixed the correctness issue, but I introduced a performance issue. We still WANT to push down SINGLE_VALUE. Especially if the scalar sub-query is correlated, and we end up with a join. Now we evaluate SINGLE_VALUE in the client.

        Show
        julianhyde Julian Hyde added a comment - I think you're referring to the JdbcAggregate.canImplement method I added when fixing CALCITE-704 . Yeah, I fixed the correctness issue, but I introduced a performance issue. We still WANT to push down SINGLE_VALUE. Especially if the scalar sub-query is correlated, and we end up with a join. Now we evaluate SINGLE_VALUE in the client.
        Hide
        yeongwei YeongWei added a comment -

        Hi Julian Hyde,

        For the case with correlated scalar sub-query,

        1) Are we going to re-enable the SINGLE_VALUE in JdbcAggregate.canImplement to handle it? Probably this is not the case because once re-enabled the SINGLE_VALUE would appear for most scalar sub-queries?

        2) Do we explicitly check if a query contains correlated sub-queries then attempt to translate SINGLE_VALUE into something equivalent then push it down into datasource?

        On translating SINGLE_VALUE into something equivalent,

        Few points gathered from this thread,
        a) CASE Expression that has selection of multiple NULL rows to emulate the "More than 1 row scalar sub-query" error
        b) Different databases need different SQL syntax to construct the CASE Expression mentioned in above (#a).
        (E.g. MySQL does not support the syntax "values(1, 2 ..); HSQLDB needs the GroupBy clause when Aggregator is used along with CASE Expressions)

        How about putting concrete queries into the CASE Expression. Please refer below,

        Say,
        select SINGLE_VALUE(col1) from table1 where col2 = something

        Translated into,
        SELECT
        CASE (SELECT COUNT(col1) FROM table1 WHERE col2 = something)
        WHEN 1 THEN (SELECT col1 FROM table1 WHERE col2 = something)
        ELSE (SELECT col1 FROM table1 WHERE col2 = something)
        END
        FROM
        table1
        WHERE
        col2 = something

        But if the original query has multiple scalar sub-queries, this would make the translated query very bloated. However, the Select query syntax would work for most databases.

        On the other hand, the CASE Expression may be replaced by the by the Select statement excluding the COUNT aggregator

        Let me know your comments.

        Thanks!

        Show
        yeongwei YeongWei added a comment - Hi Julian Hyde , For the case with correlated scalar sub-query, 1) Are we going to re-enable the SINGLE_VALUE in JdbcAggregate.canImplement to handle it? Probably this is not the case because once re-enabled the SINGLE_VALUE would appear for most scalar sub-queries? 2) Do we explicitly check if a query contains correlated sub-queries then attempt to translate SINGLE_VALUE into something equivalent then push it down into datasource? On translating SINGLE_VALUE into something equivalent, Few points gathered from this thread, a) CASE Expression that has selection of multiple NULL rows to emulate the "More than 1 row scalar sub-query" error b) Different databases need different SQL syntax to construct the CASE Expression mentioned in above (#a). (E.g. MySQL does not support the syntax "values(1, 2 ..); HSQLDB needs the GroupBy clause when Aggregator is used along with CASE Expressions) How about putting concrete queries into the CASE Expression. Please refer below, Say, select SINGLE_VALUE(col1) from table1 where col2 = something Translated into, SELECT CASE (SELECT COUNT(col1) FROM table1 WHERE col2 = something) WHEN 1 THEN (SELECT col1 FROM table1 WHERE col2 = something) ELSE (SELECT col1 FROM table1 WHERE col2 = something) END FROM table1 WHERE col2 = something But if the original query has multiple scalar sub-queries, this would make the translated query very bloated. However, the Select query syntax would work for most databases. On the other hand, the CASE Expression may be replaced by the by the Select statement excluding the COUNT aggregator Let me know your comments. Thanks!
        Hide
        vladimirsitnikov Vladimir Sitnikov added a comment -

        Say,
        select SINGLE_VALUE(col1) from table1 where col2 = something
        Translated into,
        SELECT
        CASE (SELECT COUNT(col1) FROM table1 WHERE col2 = something)...
        WHEN 1 THEN (SELECT col1 FROM table1 WHERE col2 = something)

        It should be translated as follows:

        (select col1 from table1 where col2 = something)

        Case when then is absolutely not needed here.

        Show
        vladimirsitnikov Vladimir Sitnikov added a comment - Say, select SINGLE_VALUE(col1) from table1 where col2 = something Translated into, SELECT CASE (SELECT COUNT(col1) FROM table1 WHERE col2 = something)... WHEN 1 THEN (SELECT col1 FROM table1 WHERE col2 = something) It should be translated as follows: ( select col1 from table1 where col2 = something) Case when then is absolutely not needed here.
        Hide
        julianhyde Julian Hyde added a comment -

        Here are the facts as I see them (or rather, design constraints):

        • Relational algebra cannot represent scalar sub-queries. We have to convert them to joins.
        • If Calcite identifies that the most efficient way to execute the query is to push it down to a JDBC data source, it is not practical to convert these joins back to scalar sub-queries. So, the SQL we generate to send to the JDBC data source will not contain scalar sub-queries (except "little" scalar sub-queries that are purposely introduced just to throw an error).
        • The difference between a join and a scalar-sub-query is that the scalar sub-query must FAIL if the query returns more than one 1 row. (If it returns 0 rows, it will generate null, which is kind of similar to left-join.)

        So, we must generate SQL to send to the JDBC data source that will generate an error if the query returns more than one row. That SQL does not need to be same for each database. But the most obvious way of writing that SQL involves CASE and COUNT( * ).

        I think I've solved it for hsqldb (see above). I'll let others solve it for MySQL, Oracle, ... any other database you care about.

        Show
        julianhyde Julian Hyde added a comment - Here are the facts as I see them (or rather, design constraints): Relational algebra cannot represent scalar sub-queries. We have to convert them to joins. If Calcite identifies that the most efficient way to execute the query is to push it down to a JDBC data source, it is not practical to convert these joins back to scalar sub-queries. So, the SQL we generate to send to the JDBC data source will not contain scalar sub-queries (except "little" scalar sub-queries that are purposely introduced just to throw an error). The difference between a join and a scalar-sub-query is that the scalar sub-query must FAIL if the query returns more than one 1 row. (If it returns 0 rows, it will generate null, which is kind of similar to left-join.) So, we must generate SQL to send to the JDBC data source that will generate an error if the query returns more than one row. That SQL does not need to be same for each database . But the most obvious way of writing that SQL involves CASE and COUNT( * ). I think I've solved it for hsqldb (see above). I'll let others solve it for MySQL, Oracle, ... any other database you care about.
        Hide
        vladimirsitnikov Vladimir Sitnikov added a comment -

        Julian Hyde, Why do you think it is not practical to send scalar subqueries?

        PS. That might be a good item for a hangout

        Show
        vladimirsitnikov Vladimir Sitnikov added a comment - Julian Hyde , Why do you think it is not practical to send scalar subqueries? PS. That might be a good item for a hangout
        Hide
        julianhyde Julian Hyde added a comment -

        Because our relational algebra can't represent scalar sub-queries.

        Show
        julianhyde Julian Hyde added a comment - Because our relational algebra can't represent scalar sub-queries.
        Hide
        yeongwei YeongWei added a comment -

        Hi Julian Hyde,

        I have attempted the SINGLE_VALUE for MYSQL into the CASE + SUB-QUERY equivalent (JdbcRules.java). The strategy is to make use of SqlDialect to determine which database product is at stake.

        A test(JdbcAdapterTest.java) has been added to test if "SINGLE_VALUE" is within the PLAN and make use of JUnit Annotation to catch exception upon "run" if the DB is MYSQL.

        The test has to be ran with the VM Argument of "-Dcalcite.test.db=MYSQL" and it makes use of the Mondarian FoodMart data sets.

        Please refer to the attached, CALCITE-259-MYSQL-SINGLE_VALUE-0.1.patch.

        Let me know your comments.

        Thanks!

        Show
        yeongwei YeongWei added a comment - Hi Julian Hyde , I have attempted the SINGLE_VALUE for MYSQL into the CASE + SUB-QUERY equivalent (JdbcRules.java). The strategy is to make use of SqlDialect to determine which database product is at stake. A test(JdbcAdapterTest.java) has been added to test if "SINGLE_VALUE" is within the PLAN and make use of JUnit Annotation to catch exception upon "run" if the DB is MYSQL. The test has to be ran with the VM Argument of "-Dcalcite.test.db=MYSQL" and it makes use of the Mondarian FoodMart data sets. Please refer to the attached, CALCITE-259 -MYSQL-SINGLE_VALUE-0.1.patch. Let me know your comments. Thanks!
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/01b2d057 .
        Hide
        yeongwei YeongWei added a comment -

        Hi Julian Hyde

        Appreciate with the commit link shared!

        I have notice your implementation for the HSQLDB scenario. You have chosen to use the CASE Statement for the SINGLE_VALUE implementation.

        On a side note, I have opened a ticket with HSQLDB community, this can be seen from the link below,

        https://sourceforge.net/p/hsqldb/bugs/1396/

        It is related to the CASE Statement return value's data types. It appears that for the CASE Statement in HSQLDB the return type has to be consistent. It appears to me that embedding the Scalar-Subquery into a Simple Select would be easier.

        Let me know what do you think.

        Thanks!

        Show
        yeongwei YeongWei added a comment - Hi Julian Hyde Appreciate with the commit link shared! I have notice your implementation for the HSQLDB scenario. You have chosen to use the CASE Statement for the SINGLE_VALUE implementation. On a side note, I have opened a ticket with HSQLDB community, this can be seen from the link below, https://sourceforge.net/p/hsqldb/bugs/1396/ It is related to the CASE Statement return value's data types. It appears that for the CASE Statement in HSQLDB the return type has to be consistent. It appears to me that embedding the Scalar-Subquery into a Simple Select would be easier. Let me know what do you think. Thanks!
        Hide
        julianhyde Julian Hyde added a comment -

        YeongWei Regarding the commit, you're welcome! We always give credit in the commit message to the author of contributions. (It's the right thing to do. And, it helps us keep our IP straight.)

        Regarding hsqldb. I totally agree. I fought the same issues. Thanks for logging the bug. (hsqldb is a great project and, to be fair, we are pushing their boundaries with some pretty tricky SQL.)

        Show
        julianhyde Julian Hyde added a comment - YeongWei Regarding the commit, you're welcome! We always give credit in the commit message to the author of contributions. (It's the right thing to do. And, it helps us keep our IP straight.) Regarding hsqldb. I totally agree. I fought the same issues. Thanks for logging the bug. (hsqldb is a great project and, to be fair, we are pushing their boundaries with some pretty tricky SQL.)
        Hide
        julianhyde Julian Hyde added a comment -

        Resolved in release 1.3.0-incubating (2015-05-30).

        Show
        julianhyde Julian Hyde added a comment - Resolved in release 1.3.0-incubating (2015-05-30).

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            github-import GitHub Import
          • Votes:
            1 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development