Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.0
    • Component/s: druid
    • Labels:
      None

      Issue Links

        Activity

        Hide
        julianhyde Julian Hyde added a comment -

        I am extending the VM to include a Druid server and flattened Foodmart data set in https://github.com/julianhyde/calcite-test-dataset/tree/druid.

        Show
        julianhyde Julian Hyde added a comment - I am extending the VM to include a Druid server and flattened Foodmart data set in https://github.com/julianhyde/calcite-test-dataset/tree/druid .
        Hide
        julianhyde Julian Hyde added a comment -

        The fix at https://github.com/julianhyde/calcite/tree/1121-druid is ready. Can someone please review?

        You will need https://github.com/julianhyde/calcite-test-dataset/tree/druid in order to run it. We won't be able to commit to master until Vladimir Sitnikov has committed https://github.com/vlsi/calcite-test-dataset/pull/7.

        Show
        julianhyde Julian Hyde added a comment - The fix at https://github.com/julianhyde/calcite/tree/1121-druid is ready. Can someone please review? You will need https://github.com/julianhyde/calcite-test-dataset/tree/druid in order to run it. We won't be able to commit to master until Vladimir Sitnikov has committed https://github.com/vlsi/calcite-test-dataset/pull/7 .
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited

        Julian Hyde, I'm taking a look at it; I'm not done yet.

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited Julian Hyde , I'm taking a look at it; I'm not done yet.
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        This is really cool.

        I went through the code and I left a couple of comments there.

        I am trying to run it right now; I'll post again if I hit any issue.

        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - This is really cool. I went through the code and I left a couple of comments there. I am trying to run it right now; I'll post again if I hit any issue.
        Hide
        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        I was running some queries against Druid with the adapter.

        All seems to be running smoothly. However I found a problem for the following query (might be because of the aggregation column that is added in these cases?):

        select * from "foodmart" where "state_province" = 'CA';
        
        Show
        jcamachorodriguez Jesus Camacho Rodriguez added a comment - I was running some queries against Druid with the adapter. All seems to be running smoothly. However I found a problem for the following query (might be because of the aggregation column that is added in these cases ?): select * from "foodmart" where "state_province" = 'CA';
        Hide
        julianhyde Julian Hyde added a comment -

        I think you are running into a different issue. Since there is no GROUP BY we would run a "select"-type Druid query, and we with it with threshold: MAX_INTEGER, which basically means bring all rows back in one batch. This would blow memory of the processing node, and the node would die, so no subsequent queries would work. As of https://github.com/julianhyde/calcite/commit/fc438538b610553515fe2c85c4ed4313c3b89695, we now fetch 16K rows at a time.

        When we implement SELECT .. [ ORDER BY .. ] LIMIT we should push the limit down.

        Show
        julianhyde Julian Hyde added a comment - I think you are running into a different issue. Since there is no GROUP BY we would run a "select"-type Druid query, and we with it with threshold: MAX_INTEGER , which basically means bring all rows back in one batch. This would blow memory of the processing node, and the node would die, so no subsequent queries would work. As of https://github.com/julianhyde/calcite/commit/fc438538b610553515fe2c85c4ed4313c3b89695 , we now fetch 16K rows at a time. When we implement SELECT .. [ ORDER BY .. ] LIMIT we should push the limit down.
        Hide
        julianhyde Julian Hyde added a comment - - edited

        Responding to Jesus Camacho Rodriguez's review comments against https://github.com/julianhyde/calcite/commit/0a1273c25a01d9678277ae6157823db665e46cc9#commitcomment-17193625.

        Do we need the List as an input to the constructor? The tree is well-formed so we could include directly the root RelNode of the tree. I do not think this would add more complexity to the rest of the methods?

        In other adapters we created sub-classes of each kind of RelNode. For example, the MongoDB adapter has MongoFilter, MongoFilterRule, MongoProject, MongoProjectRule etc. But I found that it was a lot of work to create all of this operators, with little return. In this adapter I am trying a different strategy. I want to represent a table scan followed by a linear sequence of operators (Druid does not support union or join), and thus have DruidQuery as the only new RelNode. The RelNode sub-classes conveniently represent those operations, and are immutable.

        But we're not interested in the tree, or which particular sub-type of RelNode each operator is, just the linear sequence. If the sequence is Scan s, Filter f, Project p, Filter f2, we don't care whether the input to F2 is P or something else, just that f2's input row type matches p's output row type.

        The linear sequence is important, so that we can recognize whether we can use a Druid "select" query or a "groupBy" query, basically a regular expression.

        Back to your question. Since the RelNode s in the list are not necessarily wired up, we have to pass the whole list.

        Currently this case never gets exercised. Is it intentionally included to cover a follow-up case? Would it be better to include a Unsupported error?

        Yeah, I'll remove it. We can consider supporting "topN" in CALCITE-1206.

        Same idea as before. Though this is still WIP, would it be better to include a Unsupported error than adding an artificial aggregation?

        I see that Druid's "select" query now supports a "filter" attribute. So yes, we should not add an artificial aggregation.

        Show
        julianhyde Julian Hyde added a comment - - edited Responding to Jesus Camacho Rodriguez 's review comments against https://github.com/julianhyde/calcite/commit/0a1273c25a01d9678277ae6157823db665e46cc9#commitcomment-17193625 . Do we need the List as an input to the constructor? The tree is well-formed so we could include directly the root RelNode of the tree. I do not think this would add more complexity to the rest of the methods? In other adapters we created sub-classes of each kind of RelNode . For example, the MongoDB adapter has MongoFilter , MongoFilterRule , MongoProject , MongoProjectRule etc. But I found that it was a lot of work to create all of this operators, with little return. In this adapter I am trying a different strategy. I want to represent a table scan followed by a linear sequence of operators (Druid does not support union or join), and thus have DruidQuery as the only new RelNode . The RelNode sub-classes conveniently represent those operations, and are immutable. But we're not interested in the tree, or which particular sub-type of RelNode each operator is, just the linear sequence. If the sequence is Scan s, Filter f, Project p, Filter f2 , we don't care whether the input to F2 is P or something else, just that f2's input row type matches p's output row type. The linear sequence is important, so that we can recognize whether we can use a Druid "select" query or a "groupBy" query, basically a regular expression. Back to your question. Since the RelNode s in the list are not necessarily wired up, we have to pass the whole list. Currently this case never gets exercised. Is it intentionally included to cover a follow-up case? Would it be better to include a Unsupported error? Yeah, I'll remove it. We can consider supporting "topN" in CALCITE-1206 . Same idea as before. Though this is still WIP, would it be better to include a Unsupported error than adding an artificial aggregation? I see that Druid's "select" query now supports a "filter" attribute. So yes, we should not add an artificial aggregation.
        Hide
        julianhyde Julian Hyde added a comment -

        Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/dd8d3c58.

        Details:

        • Generates "select" and "groupBy" query types;
        • can push project and filter into DruidQuery;
        • filters can consist of AND, OR, NOT, =, !=, <, <=, >, >=;
        • splits projects, pushing down the parts that Druid can handle;
        • fetches large Druid "select" queries a page at a time;
        • does not yet push down HAVING, ORDER BY or LIMIT (see CALCITE-1206).
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/dd8d3c58 . Details: Generates "select" and "groupBy" query types; can push project and filter into DruidQuery; filters can consist of AND, OR, NOT, =, !=, <, <=, >, >=; splits projects, pushing down the parts that Druid can handle; fetches large Druid "select" queries a page at a time; does not yet push down HAVING, ORDER BY or LIMIT (see CALCITE-1206 ).
        Hide
        julianhyde Julian Hyde added a comment -

        Resolved in release 1.8.0 (2016-06-13).

        Show
        julianhyde Julian Hyde added a comment - Resolved in release 1.8.0 (2016-06-13).
        Hide
        julianhyde Julian Hyde added a comment -
        Show
        julianhyde Julian Hyde added a comment - There is online documentation .
        Hide
        maver1ck Maciej Bryński added a comment -

        Julian Hyde
        Is there any documentation how to create queries ?
        For example I'd like to set time granularity of query.

        Show
        maver1ck Maciej Bryński added a comment - Julian Hyde Is there any documentation how to create queries ? For example I'd like to set time granularity of query.
        Show
        julianhyde Julian Hyde added a comment - The best documentation we have is the test: https://github.com/apache/calcite/blob/master/druid/src/test/java/org/apache/calcite/test/DruidAdapterIT.java .
        Hide
        maver1ck Maciej Bryński added a comment -
        0: jdbc:calcite:schemaFactory=org.apache.calc> !describe "table"
        +-----------+-------------+------------+-------------+-----------+-----------+-------------+---------------+----------------+----------------+----------+---------+------------+---------------+------------------+
        | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | DATA_TYPE | TYPE_NAME | COLUMN_SIZE | BUFFER_LENGTH | DECIMAL_DIGITS | NUM_PREC_RADIX | NULLABLE | REMARKS | COLUMN_DEF | SQL_DATA_TYPE | SQL_DATETIME_SUB |
        +-----------+-------------+------------+-------------+-----------+-----------+-------------+---------------+----------------+----------------+----------+---------+------------+---------------+------------------+
        |           | adhoc       | flink_imp  | __time      | -5        | BIGINT    | -1          |               | null           | 10             | 1        |         |            |               |      
        
        
        0: jdbc:calcite:schemaFactory=org.apache.calc> select sum("count") from "table" group by floor("__time" to DAY);
        2016-09-27 21:33:15,476 [main] ERROR - org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply 'FLOOR' to arguments of type 'FLOOR(<BIGINT>, <INTERVAL DAY>)'. Supported form(s): 'FLOOR(<NUMERIC>)'
        'FLOOR(<DATETIME_INTERVAL>)'
        

        Any ideas ?

        Show
        maver1ck Maciej Bryński added a comment - 0: jdbc:calcite:schemaFactory=org.apache.calc> !describe "table" +-----------+-------------+------------+-------------+-----------+-----------+-------------+---------------+----------------+----------------+----------+---------+------------+---------------+------------------+ | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | DATA_TYPE | TYPE_NAME | COLUMN_SIZE | BUFFER_LENGTH | DECIMAL_DIGITS | NUM_PREC_RADIX | NULLABLE | REMARKS | COLUMN_DEF | SQL_DATA_TYPE | SQL_DATETIME_SUB | +-----------+-------------+------------+-------------+-----------+-----------+-------------+---------------+----------------+----------------+----------+---------+------------+---------------+------------------+ | | adhoc | flink_imp | __time | -5 | BIGINT | -1 | | null | 10 | 1 | | | | 0: jdbc:calcite:schemaFactory=org.apache.calc> select sum( "count" ) from "table" group by floor( "__time" to DAY); 2016-09-27 21:33:15,476 [main] ERROR - org.apache.calcite.sql.validate.SqlValidatorException: Cannot apply 'FLOOR' to arguments of type 'FLOOR(<BIGINT>, <INTERVAL DAY>)'. Supported form(s): 'FLOOR(<NUMERIC>)' 'FLOOR(<DATETIME_INTERVAL>)' Any ideas ?
        Hide
        maver1ck Maciej Bryński added a comment -

        PS. Druid version 0.9.1.1

        Show
        maver1ck Maciej Bryński added a comment - PS. Druid version 0.9.1.1
        Hide
        julianhyde Julian Hyde added a comment -

        Can you log a new issue please? Also, state which version of Calcite you are using.

        Show
        julianhyde Julian Hyde added a comment - Can you log a new issue please? Also, state which version of Calcite you are using.
        Show
        maver1ck Maciej Bryński added a comment - Calcite 1.9.0 https://issues.apache.org/jira/browse/CALCITE-1392

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            julianhyde Julian Hyde
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development