Hive

HIVE-48: Support JDBC connections for interoperability between Hive and RDBMS

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.3.0
    • Component/s: JDBC
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: JDBC Driver

      Description

      In many DW and BI systems, the data is currently stored in an RDBMS such as Oracle, MySQL, or PostgreSQL for reporting, charting, etc.
      It would be useful to be able to import data from and export data to an RDBMS using JDBC connections.
      If Hive supported JDBC connections, it would be much easier to use third-party DW/BI tools.

      Attachments

      1. hive-48.8.patch
        230 kB
        Raghotham Murthy
      2. hive-48.7.patch
        230 kB
        Raghotham Murthy
      3. hive-48.6.patch
        197 kB
        Raghotham Murthy
      4. hive-48.5.patch
        202 kB
        Raghotham Murthy
      5. hadoop-4101.4.patch
        200 kB
        Michi Mutsuzaki
      6. hadoop-4101.3.patch
        9 kB
        Michi Mutsuzaki
      7. hadoop-4101.2.patch
        508 kB
        Raghotham Murthy
      8. hadoop-4101.1.patch
        197 kB
        Michi Mutsuzaki

        Issue Links

          This issue relates to HIVE-1536
          This issue is blocked by HIVE-73

          Activity

           Transition                     Time In Source Status   Execution Times   Last Executer      Last Execution Date
           Patch Available → Open         70d 18h 41m             1                 dhruba borthakur   06/Jan/09 01:30
           Open → Patch Available         49d 15h 44m             2                 dhruba borthakur   06/Jan/09 02:06
           Patch Available → Resolved     35s                     1                 dhruba borthakur   06/Jan/09 02:07
           Resolved → Closed              1074d 22h 1m            1                 Carl Steinbach     17/Dec/11 00:09
           Carl Steinbach made changes -
           Status: Resolved → Closed
           Carl Steinbach made changes -
           Component/s: Drivers → JDBC
           Jeff Hammerbacher made changes -
           Link: This issue relates to HIVE-1536
           Carl Steinbach made changes -
           Fix Version/s: 0.6.0 → 0.3.0
           Carl Steinbach made changes -
           Component/s: Clients → Drivers
           Zheng Shao made changes -
           Fix Version/s: 0.2.0 → 0.6.0
           dhruba borthakur made changes -
           Fix Version/s: 0.2.0
           Status: Patch Available → Resolved
           Resolution: Fixed
           Hadoop Flags: [Reviewed]
          dhruba borthakur added a comment -

          I just committed this. Thanks Raghu and Michi.

           dhruba borthakur made changes -
           Status: Open → Patch Available
           Raghotham Murthy made changes -
           Attachment: hive-48.8.patch
          Raghotham Murthy added a comment -

           Oops. My method of generating diffs seems to be broken with git - this happened twice today, so I'm switching back to svn. Uploaded a fixed patch.

           dhruba borthakur made changes -
           Status: Patch Available → Open
          dhruba borthakur added a comment -

          I get compilation problems:

          core-compile:
          [javac] Compiling 10 source files to /mnt/vol/devrs004.snc1/dhruba/commithive/build/jdbc/classes
          [javac] /mnt/vol/devrs004.snc1/dhruba/commithive/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java:452: unreported exception java.sql.SQLException; must be caught or declared to be thrown
          [javac] throw new SQLException("Method not supported");
          [javac] ^
          [javac] /mnt/vol/devrs004.snc1/dhruba/commithive/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveConnection.java:462: unreported exception java.sql.SQLException; must be caught or declared to be thrown
          [javac] throw new SQLException("Method not supported");

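           For reference, the likely fix is to declare the checked exception on each stub method rather than only throwing it. A minimal sketch, assuming a method shape like the one javac points at (the method shown is illustrative, not necessarily the one at HiveConnection.java:452):

           import java.sql.SQLException;
           import java.util.Map;

           public class HiveConnectionSketch {
               // Declaring "throws SQLException" is what resolves the
               // "unreported exception ... must be caught or declared to be thrown" error.
               public void setTypeMap(Map<String, Class<?>> map) throws SQLException {
                   throw new SQLException("Method not supported");
               }
           }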
           Ashish Thusoo made changes -
           Assignee: Michi Mutsuzaki → Raghotham Murthy
          Ashish Thusoo added a comment -

          +1.

          Looks good to me.

           Raghotham Murthy made changes -
           Attachment: hive-48.7.patch
          Raghotham Murthy added a comment -

          Two changes:
          1. made JdbcSessionState a dummy class
          2. now throwing SQLException for unimplemented functions.

          Raghotham Murthy added a comment -

          > 1. What is the motivation for Driver.java:87 change?

          The reason was that there is currently no way to retrieve the result table name in the server. Also, the driver is always returning the result of queries, so, 'result' seems to be a reasonable name for the table. Ideally, we should have a function which converts a schema DDL to a Schema object and we should be able to query the schema object. Right now we just pass the schema string as is to the DynamicSerDe.

          > 2. In JdbcSessionState.java what are execString and fileName variables used for?

           Right now JdbcSessionState is a dummy class. It's not being used for anything; the plan is to use it later on. I just copied the class over from the CLI. I can make it an empty class.

          > 3. Shouldn't HiveResultSetMetadata.java be doing something?

          The plan was to stage the JDBC implementation. There are a bunch of auto-generated classes which will be used later on.

           > 4. In HiveResultSet.java:436 the getDouble function is missing a TODO comment? Same is true for getFloat and getInt.

           The TODO comment is inside the function body. Isn't that enough?

          > 5. For the non implemented functions should we throw a non implemented run time exception instead of just returning 0.

          return 0 was auto-generated. I'll change them to throw SQLException instead.

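           For illustration, the change described here would look roughly like this (a hedged sketch; the Before/After method-name suffixes exist only to show both versions side by side and are not real method names):

           import java.sql.SQLException;

           public class StubSketch {
               // Before (auto-generated): silently returns a default value.
               public int getFetchSizeBefore() throws SQLException {
                   return 0;
               }

               // After: unimplemented methods surface an error to the caller.
               public int getFetchSizeAfter() throws SQLException {
                   throw new SQLException("Method not supported");
               }
           }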
          Ashish Thusoo added a comment -

          A few questions:
          1. What is the motivation for Driver.java:87 change?
          2. In JdbcSessionState.java what are execString and fileName variables used for?
          3. Shouldn't HiveResultSetMetadata.java be doing something?
           4. In HiveResultSet.java:436 the getDouble function is missing a TODO comment? Same is true for getFloat and getInt.
           5. For the non implemented functions should we throw a non implemented run time exception instead of just returning 0.

           Raghotham Murthy made changes -
           Attachment: hive-48.6.patch
          Raghotham Murthy added a comment -

          I have added functionality to getByte, getBoolean, getDouble, getFloat, getInt, getLong, getObject, getShort, and changed getString to do a toString.

          Also added a single test to test getInt. Right now, anything other than select * uses MetaDataTypedColumnSetSerDe - so, the result schema has all columns as strings. Will add more tests to jdbc once we start using DynamicSerDe for all queries.

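           A plausible shape for these typed getters, given that the current SerDe hands every column back as a string, might be the following (a hedged sketch; the real HiveResultSet code may differ):

           import java.sql.SQLException;

           public class ColumnConversionSketch {
               // With MetaDataTypedColumnSetSerDe all columns arrive as strings,
               // so getInt and friends parse the string form of the value.
               public static int toInt(Object columnValue) throws SQLException {
                   if (columnValue == null) {
                       throw new SQLException("Cannot convert null to int");
                   }
                   try {
                       return Integer.parseInt(columnValue.toString());
                   } catch (NumberFormatException e) {
                       throw new SQLException("Cannot convert '" + columnValue + "' to int");
                   }
               }
           }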
           Raghotham Murthy made changes -
           Attachment: hive-48.5.patch
          Raghotham Murthy added a comment -

          Fixed url parsing. Also added standalone server option for testing.

          Edward Capriolo added a comment -

           A few people developing Thrift/JDBC servers need to modify the bin/hive script. See HIVE-107 and let me know what you think of the idea and whether it works for you.

           Michi Mutsuzaki made changes -
           Attachment: hadoop-4101.4.patch
          Michi Mutsuzaki added a comment -

          Previous patch was not working. Giving another try.

           Ashish Thusoo made changes -
           Assignee: Michi Mutsuzaki
          Raghotham Murthy added a comment -

           This patch doesn't seem to work. I guess you need to upload a patch which adds the jdbc directory. Also, can you generate the patch by running 'git diff --no-prefix'? That will allow us to apply it with patch -p0.

           Michi Mutsuzaki made changes -
           Attachment: hadoop-4101.3.patch
          Michi Mutsuzaki added a comment -

          [Attached hadoop-4101.3.patch]

          A temporary patch for jdbc support. You need to apply the patch from HIVE-73 before using this patch. See jdbc/src/test/org/apache/hadoop/hive/jdbc/TestHiveDriver.java for supported methods.

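           For a feel of what TestHiveDriver exercises, usage follows the standard JDBC pattern. A minimal sketch (the driver class name follows this patch's package; the URL, port, and table name are illustrative assumptions):

           import java.sql.Connection;
           import java.sql.DriverManager;
           import java.sql.ResultSet;
           import java.sql.Statement;

           public class HiveJdbcExample {
               public static void main(String[] args) throws Exception {
                   // Register the driver shipped in this patch.
                   Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
                   // Hypothetical connection URL; adjust host/port to your server.
                   Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default");
                   Statement stmt = con.createStatement();
                   ResultSet res = stmt.executeQuery("select * from src");
                   while (res.next()) {
                       System.out.println(res.getString(1));
                   }
                   con.close();
               }
           }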
           Ashish Thusoo made changes -
           Component/s: Clients
           Raghotham Murthy made changes -
           Link: This issue is blocked by HIVE-73
           Prasad Chakka made changes -
           Comment: [ +1. Reviewed manually; looks good for now, though we need to move to a directory structure similar to that of Hadoop (hive/trunk/src/ql, hive/trunk/src/metastore). ]
          Ashish Thusoo added a comment -

          yes just come over at 7pm.

           Also, the Hive mailing lists have changed to:

          hive-user@hadoop.apache.org
          hive-dev@hadoop.apache.org
          hive-commits@hadoop.apache.org

           So you may want to subscribe to those (this is because Hive is in the process of becoming a subproject under Hadoop).

           Owen O'Malley made changes -
           Issue Type: Improvement → Bug
           Component/s: contrib/hive
           Key: HADOOP-4101 → HIVE-48
           Project: Hadoop Core → Hadoop Hive
          Namit Jain added a comment -

          7pm is fine with me

          Michi Mutsuzaki added a comment -

          7pm?

          --Michi

          Ashish Thusoo added a comment -

          Type 4 should work I guess.

          I guess if you use that then you can sidestep the inheritance stuff that I was alluding to. Basically my concern was that if the server APIs mimicked the javax.sql APIs then inheritance would be a problem.

          Can you guys come over tomorrow sometime in the afternoon?

           Raghotham Murthy made changes -
           Attachment: hadoop-4101.2.patch
          Raghotham Murthy added a comment -

           I am not sure I understand what Ashish meant by 'inheritance in JDBC metadata calls'. The plan is to include metastore.thrift in hive_service.thrift, and then hive_service will just forward metadata calls to the metastore code. I guess with inheritance we wouldn't have to implement the forwarding functions. Is this what you mean, Ashish?

          And yes, we should have a single implementation of the thrift server container for HiveServer and Metastore. JDBC would then be a wrapper on top of the thrift hive client.

          update: step 4a above has been completed - can now issue queries via HiveClient and retrieve results. HiveServer - a thrift server - actually runs the queries via ql/Driver. I am attaching the patch with the code for the thrift server/client.

          We should meet up to figure out what the plan is for the JDBC client.

          Michi Mutsuzaki added a comment -

          I was thinking the JDBC driver will be of type 4:

          http://en.wikipedia.org/wiki/JDBC_driver#Type_4_Driver_-_Native-Protocol_Driver

           which means there is a server <--> client API that is independent of JDBC, and the JDBC driver uses that client API.

          We should meet up to make sure we are all on the same page. Ragho, can you set up a meeting?

          --Michi

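           Concretely, the Type 4 layering would make the JDBC classes a thin wrapper over a protocol-level client. A hedged sketch (every name here is illustrative, not the actual HiveClient API):

           import java.util.ArrayList;
           import java.util.List;

           public class LayeringSketch {
               // Hypothetical JDBC-independent client API, as described above.
               interface HiveClientApi {
                   void execute(String query) throws Exception;   // run a HiveQL statement
                   List<String> fetchRow() throws Exception;      // one row; empty when exhausted
               }

               // The JDBC layer would simply delegate to the client API.
               static List<List<String>> runQuery(HiveClientApi client, String sql) throws Exception {
                   client.execute(sql);
                   List<List<String>> rows = new ArrayList<List<String>>();
                   List<String> row = client.fetchRow();
                   while (row != null && !row.isEmpty()) {
                       rows.add(row);
                       row = client.fetchRow();
                   }
                   return rows;
               }
           }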
          Ashish Thusoo added a comment -

           How are you planning to implement the metadata calls? There is a lot of inheritance in the JDBC metadata calls and, from what I understand, thrift does not support inheritance.

           Also, if you do go the thrift route, it may be better to share the server container code between the metastore and the JDBC driver; the APIs, I think, should be independent and kept separate. While reorganizing the code, it may be worthwhile to put the server portion in common and then share it between the metastore and the service.

          Michi Mutsuzaki added a comment -

          Ragho: I confirm.

          • I should be able to finish implementing HiveServer.java/HiveClient.java by the end of this week (maybe by Sunday). As Ragho said, right now we have only 2 methods: void execute(String query) and list<String> fetch_row().
          • After that, I will modify the JDBC driver to use HiveClient.
          • Command line interface can use either HiveClient or JDBC driver.
          • I'm usually available after 7 on tue-fri.

          --Michi

          Prasad Chakka added a comment -

           Regarding unused files in the metastore, these are files carried over from the Hive prototype, which used a file-based metastore. We left them there in case someone wants to use a file-based metastore. So in a sense they are useful, and there are tests.

           I think we should combine the servers now. It will be difficult and time consuming to merge them later. Advanced users can still run two installations of the same server and direct metadata calls to one server and data calls to the other. But in the default case there will be only one server, which is easier to maintain.

           The only issue I see is that the metastore code is independent of the ql/cli code. So it might be better to build the JDBC server on top of the metastore server (i.e. extend the metastore server) and import the metastore thrift IDL into the service thrift IDL. The JDBC service would then be a superset of the metastore functionality.

          What do you guys think?

          Raghotham Murthy added a comment -

          Michi and I were discussing this over the weekend. Here's our current thinking about the design. Michi, pls confirm.

           1. implement a thrift client/server for hive. for now, the interface consists only of execute and fetch_row (a rough sketch of this shape appears after this comment). we were able to set up the framework with a thrift server and a java client which talks to the server. next step is to get the server to run the queries.
           notes: we looked at the metastore code and thought it might be simpler to first implement a separate thrift client/server before merging it with the metastore. some installations might want to have separate instances of the metastore and hive server. and, it's easier to test a smaller interface where we understand the code. also, the metastore code seems to have classes which aren't being used at all, and the scripts to start/stop the metastore don't really work in non-facebook installations (need to file separate jiras for those).

           2. build a jdbc interface which makes calls to the generated java thrift client. we could also have python and perl dbi interfaces which make calls to the generated thrift client code in those languages. so, the thrift interface is a generic interface which is not specific to any particular standard (jdbc/dbi etc).

          3. the directory structure in the code would be as follows in src/contrib/hive. it follows a similar model to metastore.

          service/if/hive_service.thrift
          service/include/<headers from thrift>
          service/fb303/<scripts for service_ctrl to manage server>
          service/src/gen-javabean/<generated java code>
          service/src/gen-php/<generated php>
          service/src/gen-py/<generated python>
          service/src/gen-perl/<generated perl>
          service/src/scripts/<ctrl scripts for server>
          service/src/java/org/apache/hadoop/hive/service/HiveServer.java
          service/src/java/org/apache/hadoop/hive/service/HiveClient.java
          jdbc/src/java/org/apache/hadoop/hive/jdbc/<whatever is in current jdbc patch>
          dbi/<perl dbi interface calling service/src/gen-perl>
          cli/<changed to use HiveClient or HiveJdbc>

          4. next steps
          a. get server to run queries and return results to client.
          b. move ql/Driver.java to service since the actual running of the query is not really part of the query language.
          c. change cli to use the service
           d. verify which parts of the metastore interface are needed by jdbc and move/copy over parts to hive_service - i don't think it makes sense to do it the other way around, i.e. put the hive service into the metastore, since the metastore is not the right abstraction to actually run queries.
           e. there is common thrift code in metastore and service. we should either move it to a separate thrift directory or make metastore use stuff from service.

          It will be good to meet up to discuss them in more detail. I'll let Michi provide a patch for the hive server/client and jdbc wrappers for the hive client.

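           A rough Java rendering of the two-method interface from step 1, with the server side delegating to a query driver. This is a sketch under stated assumptions: the names and the driver API here are illustrative, not the committed hive_service.thrift or the real ql/Driver signatures.

           import java.util.List;

           public class HiveServiceSketch {
               // The thrift IDL would expose just these two operations for now.
               interface HiveService {
                   void execute(String query) throws Exception;
                   List<String> fetchRow() throws Exception; // one result row per call
               }

               // Hypothetical server-side handler: runs the query via the
               // query driver and streams result rows back one call at a time.
               static class Handler implements HiveService {
                   private final QueryDriver driver = new QueryDriver();

                   public void execute(String query) throws Exception {
                       driver.run(query); // compile and launch the job(s)
                   }

                   public List<String> fetchRow() throws Exception {
                       return driver.nextRow(); // null when no more rows
                   }
               }

               // Stand-in for ql/Driver; the real API differs.
               static class QueryDriver {
                   void run(String query) { /* compile and execute */ }
                   List<String> nextRow() { return null; /* next row of results */ }
               }
           }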
          Namit Jain added a comment -

           Hi Michi, any updates on this? If you want to discuss in more detail, we can also meet.

          Michi Mutsuzaki added a comment -

          I talked about this with Ragho.

          • The next step is to separate client from the server.
          • I'll check if we can use thrift to implement JDBC server/client.

          --Michi

          Prasad Chakka added a comment -

           There is already a MetaStore server (HiveMetaStore.java). It is a thrift service, so I am not sure it would fit the requirements for a JDBC server. If it does, we should add the JDBC functionality to this server.

          Joydeep Sen Sarma added a comment -

          ok - Ashish just walked us through a couple of scenarios:

           • BI tool has a server side. in this case the approach in this patch might work - but there is a concern about setting up classpaths, and the suitability of running hadoop code (which sets up classloaders and such) in the same JVM as the BI server is suspect. At a minimum this has significant integration issues for each BI server.
           • BI tool does not have a server side - only a client. I think this is a very common scenario and something which we should try to cover (since the whole premise of hadoop/hive is to avoid spending a lot of money - which is what BI tools with a server side will require). In this case the approach in this patch will be hard to make work because of the firewalling issues I had mentioned in the previous post (even if all the technical issues, like hive's treatment of windows paths, are resolved).

          hopefully this captures the issues more accurately.

          Joydeep Sen Sarma added a comment -

           how are we planning on picking up the hadoop and hive configuration files? (the cli picks them up through the classpath). the same concern applies to jar files (there's configuration in the cli shell script to set it up to include jars in auxlib).

           we will need a client-server model. the cli does not, for example, run on cygwin/windows, and there are all manner of pathing issues that we would need to fix to make that work. within facebook - we won't even be able to access hdfs directly from windows agents that are outside the secure zone (only http ports are available, i believe). I verified with Dhruba that this is the case at yahoo as well. so - we just can't run queries directly from windows machines without a server side that is within the secure zone.

          Namit Jain added a comment -

           The Driver API has changed - it is now integrated with the serde and returns a vector<string> instead of a vector<vector<string>>.
           That needs to be changed here as well.

          Namit Jain added a comment -

           Michi, did you consider having a client-server approach for the JDBC server? There is nothing wrong with this approach - in fact, this way the server does not become a single point of failure.
           The client does become thicker, which may be acceptable. I just wanted to know whether you considered the pros and cons of that approach.

          Jeff Hammerbacher added a comment -

          Nice, Michi! Will poke at this tomorrow.

           Michi Mutsuzaki made changes -
           Release Note: JDBC Driver
           Status: Open → Patch Available
           Michi Mutsuzaki made changes -
           Attachment: hadoop-4101.1.patch
          Michi Mutsuzaki added a comment -

           Added a JDBC driver for Hive. Look at src/contrib/hive/ql/src/test/org/apache/hadoop/hive/ql/jdbc/TestHiveDriver.java for an example.

          Next steps:

          • provide a hive standalone server
          • integrate with hive metastore (e.g. support different types)
          Raghotham Murthy added a comment -

           I had a preliminary set of classes. I didn't get a chance to finish working on them, though. Michi has now taken those classes, and I believe he has something working now. I'll let him post a patch.

          Jeff Hammerbacher added a comment -

          Raghu, any progress on the JDBC driver?

          Raghotham Murthy added a comment -

          I had already added it to the roadmap. Regarding the simple jdbc driver, I will submit a patch next week.

          Ashish Thusoo added a comment -

           Also, I wanted to add that we have tried to structure the Driver code in such a way that we follow the execute/fetch paradigm used by JDBC drivers - though admittedly the metadata part of JDBC is harder than the data part.

           Also, Raghu was looking into creating a simple JDBC driver for Hive. We should add that to the Hive roadmap wiki.

          Ashish Thusoo added a comment -

           Completely agree on this. With a JDBC driver the front-end integration would be much easier.

          YoungWoo Kim created issue -

            People

             • Assignee: Raghotham Murthy
             • Reporter: YoungWoo Kim
             • Votes: 1
             • Watchers: 8
