  Apache Drill / DRILL-3180

Apache Drill JDBC storage plugin to query RDBMS systems such as MySQL and Netezza from Apache Drill

    Details

      Description

      I have developed the base code for a JDBC storage plugin for Apache Drill. The code is primitive but constitutes a good starting point for further development. Today it provides basic support for SELECT against an RDBMS over JDBC.

      The goal is to provide complete SELECT support against RDBMS sources with push-down capabilities.

      Currently the code is using standard JDBC classes.

      1. patch.diff
        5 kB
        Magnus Pierre
      2. pom.xml
        2 kB
        Magnus Pierre
      3. storage-mpjdbc.zip
        38 kB
        Magnus Pierre

        Activity

        magnusp Magnus Pierre added a comment - edited

        The storage-mpjdbc archive should be unpacked into the contrib directory of Apache Drill. The attached pom.xml replaces the pom.xml in contrib.

        To build just this package, use a command similar to the following:
        /Applications/apache-maven-3.2.5/bin/mvn clean install -pl contrib/storage-mpjdbc -DskipTests

        Take the jar from contrib/storage-mpjdbc/target and move it to the jars folder of a Drill distribution.
        Add your JDBC driver to the same folder.

        Restart the drillbits, configure a new plug-in, and enter

        { "type": "jdbc" }

        and update; you will then get all configuration options.

        Fill them in and update again. Make sure not to use a database name as the name for the plug-in.
        Example:

        {
          "type": "jdbc",
          "driver": "com.mysql.jdbc.Driver",
          "uri": "jdbc:mysql://192.168.1.xx/employees",
          "username": "maprdemo",
          "password": "maprdemo",
          "enabled": true
        }

        Now you should be able to query the database using plug-in.table notation (a client-side sketch follows).
        Depending on the database, you might need more or less detail in the URL.
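        A minimal sketch of issuing such a query from client code through Drill's own JDBC driver; the plugin name myjdbc and the table employees are placeholders, and jdbc:drill:zk=local assumes an embedded Drillbit:

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.ResultSet;
        import java.sql.Statement;

        // Connects to an embedded Drillbit and queries a table exposed by the
        // JDBC storage plugin using plug-in.table notation.
        public class QueryThroughPlugin {
          public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT * FROM myjdbc.employees")) {
              while (rs.next()) {
                System.out.println(rs.getString(1)); // print first column of each row
              }
            }
          }
        }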

        magnusp Magnus Pierre added a comment -

        The latest version of the code can be found here:
        https://github.com/mapr-emea/apache-drill-jdbc-plugin (1.1.0 branch); it includes some filter push-down and better error handling for connections.

        jnadeau Jacques Nadeau added a comment -

        Magnus Pierre, do you want to propose as a pull request and then we can get it reviewed and merged?

        magnusp Magnus Pierre added a comment -

        Yes, please guide me how to initiate the process.

        jnadeau Jacques Nadeau added a comment -

        The first thing you should do is make sure that your current code is part of a personal fork of the Drill project on GitHub. Then make sure that a complete Drill build (with tests) completes successfully. Then go to GitHub and propose your branch as a pull request to the Apache project (referencing DRILL-3180).

        vgkkonda Venkat Gurukrishna added a comment - edited

        Hi Magnus, Richard,

        I want to create the MySQL plugin in Apache Drill 1.1.0 using the following configuration:

        { "type": "jdbc", "driver": "com.mysql.jdbc.Driver", "uri": "jdbc:mysql://ipaddress:3306/schemana", "username": "abc", "password": "xyz", "enabled": true }

        I also copied com.mysql.jdbc_5.1.5.jar into the "apache-drill-1.1.0\jars\3rdparty" folder.

        When I try to create the plugin, it gives the following error:

        Error: Invalid JSON Mapping

        Need your help on the same.

        Thanks & Regards,
        -Venkat Gurukrishna

        jnadeau Jacques Nadeau added a comment -

        Initial updates to make this work with the 1.2 SNAPSHOT, as well as to simplify the code. I am also moving to using as much of the Calcite JDBC adapter as possible. This is a WIP, but my work will proceed here:

        https://github.com/jacques-n/drill/tree/DRILL-3180

        jnadeau Jacques Nadeau added a comment -

        Note, my history got messed up since this was initially a separate module. I plan to update the changeset tree to include Magnus's initial commit to ensure recognition of his substantial contributions.

        jnadeau Jacques Nadeau added a comment -

        I've updated my branch with a working plugin. Magnus Pierre, if you're interested in how I switched to using Calcite's adapter code, you can take a look. It looks like pushdowns are not firing yet; I need to trace to determine what the issues are. I also added a simple unit test framework using Derby to help ease development and testing.

        magnusp Magnus Pierre added a comment - edited

        Thanks Jacques, this looks really good and is far simpler than anything I was able to produce. The biggest reasons I did not go the Calcite route were: 1) I couldn't figure out how to do it, and your code shows how it is done (which is great); 2) a statement in the Calcite docs (which I cannot find right now) that pushdown was not yet enabled/working, which made me hesitate. I hope that was just a misunderstanding on my part. I will mark my code as obsolete and close down any enhancements of the old code line.

        magnusp Magnus Pierre added a comment -

        Some comments / issues with the new code:
        - Building from the DRILL-3180 branch fails with an error in SchemaUtilities.java.
        - Trying the plugin code with main fails with "Schema not compatible with AbstractSchema" in getDefaultSchema in JdbcStoragePlugin. My solution to the very same problem was to create a new Schema class that extended AbstractSchema (a sketch of that workaround follows); you probably have a better solution.
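        A minimal sketch of such a wrapper, assuming Drill 1.x-era APIs; the class name is made up, and exact AbstractSchema signatures may differ between versions:

        import java.util.Collections;
        import java.util.List;
        import java.util.Set;

        import org.apache.calcite.schema.Table;
        import org.apache.drill.exec.store.AbstractSchema;

        // Illustrative only: wraps the JDBC catalog in a Drill-compatible schema
        // so getDefaultSchema() can return an AbstractSchema.
        public class JdbcDrillSchema extends AbstractSchema {

          public JdbcDrillSchema(List<String> parentSchemaPath, String name) {
            super(parentSchemaPath, name);
          }

          @Override
          public String getTypeName() {
            return "jdbc"; // matches the plugin's config type
          }

          @Override
          public Table getTable(String tableName) {
            // Would delegate to JDBC metadata; null means "table not found".
            return null;
          }

          @Override
          public Set<String> getTableNames() {
            // Would normally be filled from DatabaseMetaData.getTables(...).
            return Collections.emptySet();
          }
        }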

        magnusp Magnus Pierre added a comment - edited

        I added the changes I needed to get the new code line to run (not validated by anyone else) as a patch produced with git diff.
        You can only apply it to a DRILL-3180 branch clone, i.e. git clone -b DRILL-3180 https://github.com/jacques-n/drill.git

        jnadeau Jacques Nadeau added a comment -

        I've updated this with some of Magnus's changes plus my own additions. Join and filter pushdown work. Others probably do as well.

        magnusp Magnus Pierre added a comment -

        Some findings:
        The plugin is much improved, but I found some issues that might be looked at:

        Table metadata is only available if the URL does not contain a target database and contains only the path to the server (MySQL). This cannot be how it needs to work, since it excludes many databases, such as Netezza, that demand a target database as part of the URL (at least when I tried them).

        I personally do not think it is a good idea to open the connection to the database in the plugin's constructor. It is not possible to create a plugin definition in Drill for a database that is not online, even though the configuration is valid. One alternative is to check the enabled flag in the JSON before deciding to connect to the database.

        The constructor needs to throw the correct error message in the case of valid JSON but an invalid connection; otherwise it will be hard to understand where the real problem lies. A sketch of the deferred-connection idea follows.
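        A minimal sketch, assuming plain java.sql and invented class and field names (this is not the plugin's actual code):

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.SQLException;

        // Keeps the constructor side-effect free and only touches the database
        // on first use, separating "bad config" from "database unreachable".
        public class LazyJdbcSource {
          private final String driver;
          private final String url;
          private final String username;
          private final String password;

          public LazyJdbcSource(String driver, String url, String username, String password) {
            // No network I/O here, so the plugin can be registered while the
            // database is offline.
            this.driver = driver;
            this.url = url;
            this.username = username;
            this.password = password;
          }

          public Connection connect() throws SQLException {
            try {
              Class.forName(driver);
            } catch (ClassNotFoundException e) {
              throw new SQLException("JDBC driver class not found: " + driver, e);
            }
            try {
              return DriverManager.getConnection(url, username, password);
            } catch (SQLException e) {
              throw new SQLException("Configuration parsed OK but connection to "
                  + url + " failed: " + e.getMessage(), e);
            }
          }
        }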

        SHOW TABLES does not work for schemas returned from the JDBC plugin.

        Join pushdown works for simple constructs such as:
        select * from
        mp.employees.`employees` e
        INNER JOIN
        mp.employees.`salaries` s
        ON e.`EMP_NO` = s.`EMP_NO`
        WHERE s.`to_date` > CURRENT_DATE

        But it does not happen when the query is written as:
        select * from
        mp.employees.`employees` e
        INNER JOIN
        mp.employees.`salaries` s
        ON e.`EMP_NO` = s.`EMP_NO`
        AND s.`to_date` > CURRENT_DATE

        This form is quite commonplace.

        A more complex query:
        select * from
        mp.employees.`employees` e
        INNER JOIN
        mp.employees.`salaries` s
        ON e.`EMP_NO` = s.`EMP_NO`
        INNER JOIN
        mp.employees.`dept_emp` ed
        ON e.`EMP_NO` = ed.`EMP_NO`
        WHERE s.`to_date` > CURRENT_DATE and ed.`to_date` > CURRENT_DATE

        Fails with:
        org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalStateException: Already had POJO for id (java.lang.Integer) [com.fasterxml.jackson.annotation.ObjectIdGenerator$IdKey@3372bbe8] Fragment 1:0 [Error Id: b34092c2-9225-4e37-9955-3a28b6215d97 on administorsmbp2.lan:31010]

        I have not investigated it further.

        jnadeau Jacques Nadeau added a comment -

        Moving to 1.3. The plugin has an issue with pushing down certain types of expressions. People are welcome to use the current patch as it is very functional. However, we should hold off merging until the plugin correctly identifies expression trees that can be pushed down into JDBC versus those that cannot. An example failing query is

        SELECT CONVERT_FROM(a,'JSON') FROM jdbctable;
        
        jnadeau Jacques Nadeau added a comment -

        I'll post an updated patch shortly. It addresses your error issues. The example in the middle, however, includes a local filtering join condition. We won't initially support pushing this down. Did you mean to write:

        select * from 
        mp.employees.`employees` e
        INNER JOIN 
        mp.employees.`salaries` s
        ON e.`EMP_NO` = s.`EMP_NO`
        WHERE s.`to_date` > CURRENT_DATE
        

        It is much more common to express a separate filtering condition rather than putting the filter condition in the join.

        jnadeau Jacques Nadeau added a comment -

        Latest version updated on my branch: https://github.com/jacques-n/drill/tree/DRILL-3180

        sphillips Steven Phillips added a comment -

        +1

        jnadeau Jacques Nadeau added a comment -

        Merged in e12cd470e4ab57b025840fdfa200a051a01df029

        ajoabraham Ajo Abraham added a comment -

        Hi Jacques Nadeau - is it possible to push down a query directly, bypassing all the Drill optimization? Then we could leverage database-native functionality not supported by Drill. Use case: I have a really complex query I want to run on a Postgres DB and then use that result set inside Drill with other Drill sources. But I just want to pass the query on Postgres straight through. Thanks!

        magnusp Magnus Pierre added a comment -

        Hello Jacques,
        Sounds good.
        Re the question:
        I humbly disagree with the last sentence in your response. Having spent almost 15 years in enterprise data warehousing, among the most common queries I came across, or wrote myself, were queries that dealt with time, and filtering conditions as part of the join clause were quite common.

        Consider joining n tables, i.e. a much bigger query than the one expressed above, where you have history on most of the tables you join with. It is common to put the filter condition in the join since:

        1) It makes the query more clearly expressed and readable, since the filter sits together with the join condition (most often a left outer join) where it applies to the right-hand table.

        2) It makes the query easier to maintain for the simple reason that you can comment out a block of code without touching multiple places.

        3) Legacy SQL could be supported, provided we support filters as part of the join clause.
        For some DB engines (Teradata, to mention one), it is common to put the filter in the join since it is more likely that the optimizer will apply the filter before the join, at least on the ancient releases I worked with (even though it should not matter from a query-optimization perspective).
        By the same token, derived tables are commonly used in some databases (TD as an example again) to ensure that a certain condition is processed before the join:
        Example: SELECT * from customer c inner join (select s0.x, s0.y, s0.z from table_1 s0 where s0.z < 100) as t1 on c.cust_id = t1.x

        Basically, this tries to circumvent certain limitations of query rewrite by explicitly expressing the processing order, knowing that a large query is hard to untangle for most optimizers.

        It should not matter for a mature optimizer, but for some it does.
        So to conclude: it is important to support both cases, since for some engines it will make a difference in efficiency and processing order.

        magnusp Magnus Pierre added a comment -

        If I remember correctly, JDBC drivers are able to expose both internal and user-defined functions to the application using them. Does Drill have a function "store" to validate whether a function is valid and should/could be pushed down or not? The JDBC side of this is sketched below.
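        For reference, the driver-reported function lists come from the standard java.sql.DatabaseMetaData API; a small sketch (the class name is made up, and whether Drill consults these lists is exactly the open question):

        import java.sql.Connection;
        import java.sql.DatabaseMetaData;
        import java.sql.SQLException;
        import java.util.HashSet;
        import java.util.Set;

        // Collects the scalar functions a JDBC driver claims to support. Each
        // DatabaseMetaData getter returns a comma-separated list, e.g. "ABS,ACOS,...".
        public class SupportedFunctions {
          public static Set<String> fetch(Connection conn) throws SQLException {
            DatabaseMetaData md = conn.getMetaData();
            Set<String> names = new HashSet<>();
            for (String list : new String[] {
                md.getNumericFunctions(), md.getStringFunctions(),
                md.getSystemFunctions(), md.getTimeDateFunctions()}) {
              for (String fn : list.split(",")) {
                if (!fn.isEmpty()) {
                  names.add(fn.trim().toUpperCase());
                }
              }
            }
            return names;
          }
        }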

        jnadeau Jacques Nadeau added a comment -

        I think you may be depending on a behavior of particular DBs. The opinions of Julian Hyde, Jinfeng Ni and Aman Sinha would probably be helpful here. If my query has an INNER join with an additional single-table join-local filter condition, then all of these are logically equivalent:

        • filter condition applied as part of join evaluation
        • filter applied after join evaluation
        • filter applied before join evaluation

        As such, in Drill we should be able to rewrite to any of those and things should be OK. Additionally, a derived table expressed in the same query also does not force or guarantee the ordering of operations. The optimizer's purpose is to find all equivalent sets and then pick what it thinks is the best one. If you could force the optimizer to order operations implicitly, less skilled SQL writers would compose bad SQL and the optimizer couldn't do anything about it. The equivalence can be stated as a standard relational-algebra identity, shown below.
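        With join condition $c$ and a predicate $p$ that references only $S$ (a standard identity, not Drill-specific):

        % Inner join: all three placements of a single-table predicate p are equivalent.
        R \bowtie_{c \land p} S
          \;=\; \sigma_{p}\bigl(R \bowtie_{c} S\bigr)
          \;=\; R \bowtie_{c} \sigma_{p}(S)
        % For a LEFT OUTER join the identity breaks: p inside the ON clause keeps
        % unmatched rows of R (null-extended), while p in the WHERE clause removes them.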

        That being said, the conversation about logical equivalences is really separate from what you actually want: pushing down the join even if there is an additional filter condition within it. That seems reasonable and could be done in the Calcite project, specifically right here: https://github.com/apache/incubator-calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/jdbc/JdbcRules.java#L201

        Please note, that doesn't mean you should expect a particular SQL construction to be passed to the underlying JDBC system, because the expression goes through the Calcite optimizer. So while you may compose the query with a filter as part of the join condition, Drill may output the query to the JDBC source using a different but equivalent pattern. This should be expected, as Drill should produce a logically equivalent dataset.

        This all changes if the join with the condition is an OUTER join.

        jnadeau Jacques Nadeau added a comment -

        We made a modification to solve this. As it stands, if the capabilities are in Calcite proper, we assume that the underlying JDBC source supports the operation. Since CONVERT_FROM is a Drill-specific function, it isn't pushed down. Since Calcite is generally a common subset, this should be okay in most cases. You can see that code here: https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcExpressionCheck.java

        Note that we could enhance this so that very specific expressions are excluded/included for particular systems. Our assumption was that, initially, this would work for a lot of common use cases. A simplified sketch of the shape of that check follows.
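        A simplified sketch only, assuming Calcite's RexVisitorImpl and Drill's DrillSqlOperator marker class; the real JdbcExpressionCheck linked above is more complete:

        import org.apache.calcite.rex.RexCall;
        import org.apache.calcite.rex.RexInputRef;
        import org.apache.calcite.rex.RexLiteral;
        import org.apache.calcite.rex.RexNode;
        import org.apache.calcite.rex.RexVisitorImpl;
        import org.apache.drill.exec.planner.sql.DrillSqlOperator;

        // Walks an expression tree and answers "can every node be pushed to JDBC?".
        // Callers should treat a null result (unhandled node type) as "no".
        public class PushDownCheck extends RexVisitorImpl<Boolean> {

          public PushDownCheck() {
            super(true); // visit operands recursively
          }

          @Override
          public Boolean visitInputRef(RexInputRef inputRef) {
            return true; // plain column references are always safe
          }

          @Override
          public Boolean visitLiteral(RexLiteral literal) {
            return true; // literals are always safe
          }

          @Override
          public Boolean visitCall(RexCall call) {
            // Drill-specific operators (e.g. CONVERT_FROM) have no JDBC translation.
            if (call.getOperator() instanceof DrillSqlOperator) {
              return false;
            }
            for (RexNode operand : call.getOperands()) {
              Boolean ok = operand.accept(this);
              if (ok == null || !ok) {
                return false;
              }
            }
            return true;
          }
        }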

        julianhyde Julian Hyde added a comment -

        I don't often see non-join conditions in the ON clause, but I take Magnus Pierre's point. If that were an outer join, we would be able to push the date condition down to the salary table, whereas if it were in the WHERE clause we could not.

        Anyway, the JDBC adapter's goal is not to format the SQL to follow any "best practice" or to look nice for humans to read. It is to communicate with the target DB's query optimizer, ideally in a form that the optimizer is unlikely to screw up, and most importantly to preserve semantics.

        Sometimes there is a danger that Calcite will "over optimize" the query, e.g.

        select *
        FROM mp.employees.`employees` e
        INNER JOIN  (
          SELECT * FROM mp.employees.`salaries` s
          WHERE s.`to_date` > CURRENT_DATE) AS s
        ON e.`EMP_NO` = s.`EMP_NO`

        is valid and efficient but the new query block might confuse optimizers like MySQL's.

        OlavJ Olav Jordens added a comment -

        Hi Jacques,
        I have just downloaded Drill 1.2.0 and am battling to add Netezza as a storage plugin. I have followed the Drill documentation exactly, but after entering the following JSON in the web console:
        {
          "type": "jdbc",
          "driver": "org.netezza.Driver",
          "url": "jdbc:netezza://edw-vip-prod:5480",
          "username": "username",
          "password": "password",
          "enabled": true
        }
        and clicking Create, I get the message:
        unable to create / update storage plugin.
        I know that Netezza is running and I have added the driver jar to the 3rdparty directory. Any idea what could be going wrong?
        Thanks,
        OlavJ

        magnusp Magnus Pierre added a comment -

        I'm not Jacques, so he might have some more insights, but I distinctly remember that Netezza demands a database in the URL, i.e. jdbc:netezza://edw-vip-prod:5480/DBNAME

        Worth a try at least.

        Regards,
        Magnus

        olavjordens@gmail.com Olav Jordens added a comment -

        Thanks Magnus - I'll give it a try in the morning.

        Olav


        jnadeau Jacques Nadeau added a comment -

        You should also give the latest 1.3 release candidate a try. There were some issues with multi-schemas in 1.2 that should be resolved in 1.3. I haven't tested it yet with Netezza, but it addressed a bunch of issues with SQL Server and Oracle. You can grab the binary here until the release is finalized:

        http://people.apache.org/~jacques/apache-drill-1.3.0.rc2/

        OlavJ Olav Jordens added a comment - edited

        Hi Jacques,

        I followed Magnus' suggestion to create the storage plugin to Netezza using a database like so:
        {
          "type": "jdbc",
          "driver": "org.netezza.Driver",
          "url": "jdbc:netezza://edw-vip-prod:5480/SYSTEM",
          "username": "username",
          "password": "password",
          "enabled": true
        }
        and it gives "Success" in version 1.3.0. I can see the tables now - thanks!

        hyzandey Hyzandey Castro Alves added a comment - edited

        Hi, Jacques.

        Considering this plugin can theoretically connect to any RDBMS through JDBC, I tried to connect it to an Informix database with something like this (I named the plugin "informix"):

        {
          "type": "jdbc",
          "driver": "com.informix.jdbc.IfxDriver",
          "url": "jdbc:informix-sqli://IP:PORT/sample_database:INFORMIXSERVER=sample_server;user=sample_user;password=sample_password",
          "username": null,
          "password": null,
          "enabled": true
        }

        It successfully connects and lets me list "schemas" and tables, but I cannot get queries to work because the SQL sent to the Informix server looks like this:

        SELECT *
        FROM sample_database.sample_table

        for a Drill query like this:

        SELECT *
        FROM informix.sample_database.sample_table

        I think this error occurs because it is not necessary to qualify the table name with the database name. Worse, the appended database name causes the query sent to Informix to fail.

        How could I fix this? I tried to query the "informix.sample_table" table, but I get a "VALIDATION ERROR: informix.sample_table not found".


          People

          Assignee: jnadeau Jacques Nadeau
          Reporter: magnusp Magnus Pierre
          Votes: 6
          Watchers: 17


          Time Tracking

          Estimated: 864h 1m
          Remaining: 864h 1m
          Logged: Not Specified
