Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: Clients, ODBC
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      HIVE-187. Preliminary ODBC Support. (Eric Hwang via rmurthy)
    • Tags:
      ODBC

      Description

      We need to provide a small number of functions to support basic query
      execution and retrieval of results. This is based on the tutorial provided
      here: http://www.easysoft.com/developer/languages/c/odbc_tutorial.html

      The minimum set of ODBC functions required is:
      SQLAllocHandle - for environment, connection, statement
      SQLSetEnvAttr
      SQLDriverConnect
      SQLExecDirect
      SQLNumResultCols
      SQLFetch
      SQLGetData
      SQLDisconnect
      SQLFreeHandle

      If required, the plan is to do the following:
      1. Generate C++ client stubs for the Thrift server.
      2. Implement the required functions in C++ by calling the C++ client.
      3. Make the C++ functions in (2) extern "C" and then use them in the ODBC
      SQL* functions.
      4. Provide a .so (on Linux) that ODBC clients can use.
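      Step 3 relies on the usual idiom for exposing C++ code to C callers. The declarations go in a header wrapped in extern "C" when it is compiled as C++, and an opaque handle type keeps C++ classes out of the C side. The sketch below uses hypothetical names (hive_sketch_*), not the functions in the patch. Its implementation is a plain-C stand-in for the C++ Thrift client, so it compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- hive_sketch.h: interface visible to both C and C++ (hypothetical) ---- */
#ifdef __cplusplus
extern "C" {
#endif

/* Opaque handle: C callers never see the C++ Thrift client behind it. */
typedef struct HiveConnSketch HiveConnSketch;

HiveConnSketch *hive_sketch_connect(const char *host, int port);
int hive_sketch_execute(HiveConnSketch *conn, const char *query);
void hive_sketch_disconnect(HiveConnSketch *conn);

#ifdef __cplusplus
}
#endif

/* ---- Stand-in implementation ----
 * In the plan this would be C++ calling the Thrift-generated client;
 * here it only records state so the sketch compiles as plain C. */
#define HIVE_SKETCH_HOST_MAX 256

struct HiveConnSketch {
    char host[HIVE_SKETCH_HOST_MAX];
    int port;
    int queries_run;
};

HiveConnSketch *hive_sketch_connect(const char *host, int port)
{
    HiveConnSketch *conn;

    assert(host != NULL);
    if (port <= 0 || port > 65535 || strlen(host) >= HIVE_SKETCH_HOST_MAX)
        return NULL;
    conn = calloc(1, sizeof *conn);
    if (conn == NULL)
        return NULL;
    strcpy(conn->host, host);   /* length checked above */
    conn->port = port;
    return conn;
}

/* Returns 0 on success, -1 on bad arguments. */
int hive_sketch_execute(HiveConnSketch *conn, const char *query)
{
    if (conn == NULL || query == NULL || query[0] == '\0')
        return -1;
    conn->queries_run++;   /* a real client would send the query over Thrift */
    return 0;
}

void hive_sketch_disconnect(HiveConnSketch *conn)
{
    free(conn);   /* free(NULL) is a no-op */
}
```

      In the real layout, the implementation file would be C++ and include the same header, so the compiler emits these functions with C linkage. The SQL* functions in the ODBC wrapper could then call them directly.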

      1. HIVE-187.1.patch
        966 kB
        Eric Hwang
      2. thrift_home_linux_32.tgz
        2.69 MB
        Eric Hwang
      3. thrift_home_linux_64.tgz
        2.85 MB
        Eric Hwang
      4. unixODBC-2.2.14.tgz
        2.14 MB
        Eric Hwang
      5. unixODBC-2.2.14-1.tgz
        2.14 MB
        Eric Hwang
      6. HIVE-187.2.patch
        1010 kB
        Eric Hwang
      7. unixODBC-2.2.14-2.tgz
        2.14 MB
        Eric Hwang
      8. unixODBC-2.2.14-3.tgz
        2.14 MB
        Eric Hwang
      9. HIVE-187.3.patch
        1001 kB
        Eric Hwang
      10. hive-187.4.patch
        1001 kB
        Raghotham Murthy
      11. unixodbc.patch
        1.11 MB
        Raghotham Murthy
      12. unixODBC-2.2.14-hive-patched.tar.gz
        2.15 MB
        Carl Steinbach
      13. thrift_64.r790732.tgz
        6.11 MB
        Ning Zhang


          Activity

          Jeff Hammerbacher added a comment -

          Hey,

          Has there been any progress on this issue or updates to the roadmap?

          Thanks,
          Jeff

          Raghotham Murthy added a comment -

          We are picking this up again. The plan is to get something out within the next couple of months.

          Ashish Thusoo added a comment -

          The plan is to modify psqlodbc to get this rolling. Information about this driver is available at

          http://pgfoundry.org/projects/psqlodbc/

          One issue is that this driver is LGPL. I am not sure that is a showstopper, but I feel there is enough work here that we should reuse it for our purposes.

          Ashish Thusoo added a comment -

          Eric is working on this full time. Sorry for all the ownership transfers.

          Jeff Hammerbacher added a comment -

          Hey Eric,

          Has there been any progress on the ODBC driver?

          Thanks,
          Jeff

          Eric Hwang added a comment -

          Hi all,

          The ODBC driver that I am now working on supports most of the basic ODBC functionality for connecting, executing queries, and fetching results. The build process is proving to be a little more complicated because the driver was developed using the unixODBC framework, which cannot be committed into the Apache repository, but I'll see if I can include it as an attachment to this JIRA. If all goes well, I should have a patch up by the end of this week. Thanks for waiting, guys.

          Regards,
          Eric

          Eric Hwang added a comment -

          Hi All,

          I have attached the first version of the patch for the Hive ODBC driver. Please keep in mind that this is still an initial version and is very rough around the edges. However, it provides basic ODBC 3.51 API support for connecting, executing queries, fetching results, etc. The driver has been tested successfully on 32-bit and 64-bit Linux machines with isql, and with partial success in applications such as MicroStrategy. The driver consists of two parts: the Hive client and the unixODBC API wrapper. The patch contains only the Hive client; the unixODBC portion will be uploaded as a separate attachment that will not be part of this repository (for licensing reasons).

          My development environment for this driver is as follows:
          gcc/g++ version: (GCC) 4.0.2 20051125 (Red Hat 4.0.2-8)
          thrift trunk version: r790732 (the latest revision should be fine)

          Instructions to build and use the Hive ODBC Driver:

          I. Building Thrift:
          1. Get the latest revision of Thrift from http://incubator.apache.org/thrift/download/.
          2. After extracting the tarball, build and install Thrift with the following commands (courtesy of Raghu):
             a. Configure and build the Thrift compiler and libraries:
               $ cd thrift-instant-r790732
               $ ./configure --without-csharp --without-ruby --prefix=<thrift_install_path> && make -j4
             b. Install Thrift:
               $ make install
             c. Configure, build, and install fb303:
               $ cd contrib/fb303
               $ ./bootstrap.sh
               $ ./configure --with-thriftpath=<thrift_install_path> --prefix=<thrift_install_path>
               $ make && make install
          2a. I have also attached precompiled 32-bit and 64-bit Linux builds of Thrift for your convenience.

          II. Building the Hive client (requires step I):
          1. From the Hive root directory, run:
          $ ant compile-cpp -Dthrift.home=<thrift_install_path>

          • <thrift_install_path> should be an absolute path (i.e., not a relative one) pointing to the root Thrift installation directory, and it should have the same value in parts I and II.
          • You may optionally add '-Dword.size' with a value of 32 or 64 to specify the architecture the driver should be compiled for; if it is unspecified, the build detects the architecture automatically.
          • If you get an "undefined reference to vtable" error, make sure that you specified the complete absolute path for thrift.home.
          2. Manually install the Hive client libraries by copying the contents of <hive_root>/build/odbc/lib and <hive_root>/build/odbc/include into the corresponding system folders. You may have to run ldconfig to update the dynamic linker's cache.
          NOTE: the compiled static library, libhiveclient.a, must be linked with stdc++ as well as thrift to function properly.
          3. When running the Hive test suite with 'ant test', specifying the argument '-Dthrift.home=<thrift_install_path>' enables the tests for the Hive client. Keep in mind that the Hive client tests require a Hive server running locally on port 10000. I apologize for the messiness of the test code, which I whipped up rather quickly; this should be remedied in a later revision.
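          Because the client tests need a Hive server listening locally on port 10000, a test harness could probe for it first and skip the tests, rather than fail, when nothing is listening. This is a minimal POSIX sketch. hive_server_reachable is a hypothetical helper, not part of the patch's test code. It sets no connect timeout, so probing an unreachable remote host can block for a while.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a TCP connection to host:port can be
 * opened (e.g. a Hive server on 127.0.0.1:10000), 0 otherwise. */
int hive_server_reachable(const char *host, int port)
{
    struct addrinfo hints, *res = NULL, *ai;
    char port_str[16];
    int ok = 0;

    assert(host != NULL);
    if (port <= 0 || port > 65535)
        return 0;
    snprintf(port_str, sizeof port_str, "%d", port);

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    if (getaddrinfo(host, port_str, &hints, &res) != 0)
        return 0;

    /* Try each resolved address until one accepts the connection. */
    for (ai = res; ai != NULL && !ok; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            ok = 1;
        close(fd);
    }
    freeaddrinfo(res);
    return ok;
}
```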

          III. Building unixODBC (requires step II):
          1. After extracting the unixODBC attachment, run:
          $ ./configure --enable-gui=no
          $ make
          $ make install

          • This will compile and install all of unixODBC, including the ODBC API wrapper for the Hive client. The ODBC API wrapper for the Hive client should be called libodbchive.so.

          IV. Installing the Hive driver into a Driver Manager (requires all prior steps):
          1. Find the odbc.ini file corresponding to the desired ODBC Driver Manager. If you are using unixODBC's Driver Manager, you should be able to run 'odbcinst -j' to list out the paths to important configuration files.
          2. Add the following entry to odbc.ini:
          [Hive]
          Driver = <path_to_libodbchive.so>
          Description = Hive Driver v1
          DATABASE = default
          HOST = <Hive_server_address>
          PORT = <Hive_server_port>
          FRAMED = 0

          Now you should be able to test the new Hive ODBC driver with applications that connect through this Driver Manager. If the Driver Manager reports that it cannot open the driver's shared library, make sure that all shared-library dependencies of libodbchive.so and libhiveclient.so resolve (for example, check with ldd). Also make sure that both libraries are compiled for the same architecture (32- or 64-bit) as the Driver Manager and the application. Let me know if you find the instructions confusing or incomplete.
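          To check which architecture a toolchain targets, compile a tiny program with the same compiler and flags as the ODBC application. Its pointer width must match the word size that libodbchive.so and libhiveclient.so were built for. The helper names below are hypothetical. Running the standard 'file' utility on each shared library reports its ELF class (32-bit or 64-bit) for comparison.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Pointer width, in bits, of the code this file is compiled into. */
int build_word_size(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Hypothetical check: returns 1 if a library built for `lib_bits` matches
 * this build's word size, otherwise prints a hint and returns 0. A 32-bit
 * .so cannot be loaded into a 64-bit process, or vice versa. */
int word_size_matches(int lib_bits)
{
    int mine = build_word_size();
    assert(lib_bits == 32 || lib_bits == 64);
    if (mine != lib_bits) {
        fprintf(stderr, "word size mismatch: this build is %d-bit, library is %d-bit\n",
                mine, lib_bits);
        return 0;
    }
    return 1;
}
```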

          Eric Hwang added a comment -

          I should probably also note that the Hive Server is currently not thread safe (see JIRA HIVE-80: https://issues.apache.org/jira/browse/HIVE-80). This will prevent the driver from safely making multiple connections to the same Hive Server. We need to resolve this issue to allow the driver to operate properly.
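          Until HIVE-80 is resolved, one client-side mitigation is to ensure that at most one thread in a process talks to the Hive Server at a time. The sketch below uses hypothetical names and serializes sessions with a POSIX mutex. It only coordinates threads within one process and does not make the server itself thread safe.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Process-wide gate: at most one thread may hold a session with the
 * (non-thread-safe) Hive Server at a time. Hypothetical names. */
static pthread_mutex_t hive_gate = PTHREAD_MUTEX_INITIALIZER;

/* Blocks until no other thread holds the gate. */
void hive_gate_acquire(void)
{
    int rc = pthread_mutex_lock(&hive_gate);
    assert(rc == 0);
    (void)rc;
}

/* Returns 1 if the gate was taken, 0 if a session already holds it. */
int hive_gate_try_acquire(void)
{
    int rc = pthread_mutex_trylock(&hive_gate);
    assert(rc == 0 || rc == EBUSY);
    return rc == 0;
}

void hive_gate_release(void)
{
    int rc = pthread_mutex_unlock(&hive_gate);
    assert(rc == 0);
    (void)rc;
}
```

          A caller would wrap each connect/execute/fetch/disconnect sequence between hive_gate_acquire() and hive_gate_release(). Alternatively, it could use hive_gate_try_acquire() to report "server busy" instead of blocking.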

          Eric Hwang added a comment -

          Fixed a compile bug.

          Prasad Chakka added a comment -

          Some comments on the build process:

          1) Thrift build process:
          Need to execute bootstrap.sh before configure.
          make install fails without sudo.

          2) unixODBC:
          There is a readline issue. Possibly set --enable-readline=no while configuring unixODBC to avoid this?
          Need to run ldconfig after unixODBC is installed.

          3) 'Select *' and 'explain select ...' crash isql.

          I haven't finished looking at the code...

          Raghotham Murthy added a comment -

          @prasad, it's better to build from the instant releases (which don't require bootstrap.sh). The instructions should say to download an instant release from: http://gitweb.thrift-rpc.org/?p=thrift.git;a=shortlog;h=refs/misc/instant. There are a couple of problems running bootstrap.sh on Macs.

          Ideally, specifying a prefix on which you have write permissions should be enough. However, the Java and Python modules get written to the same location where the Java and Python installations live, and for that we need sudo. We can get a full installation of Thrift in a local directory by running make install after setting the appropriate DESTDIR, PY_PREFIX and JAVA_PREFIX environment variables.

          Eric Hwang added a comment -

          I have added a wiki page documenting the Hive ODBC Driver. Please refer to the following page for driver build and usage instructions:

          http://wiki.apache.org/hadoop/Hive/HiveODBC

          Eric Hwang added a comment -

          I just attached a new patch with some minor changes:
          -Fixed some error handling cases in the unixODBC API wrapper
          -Completed driver support for metadata calls: SQLColumns and SQLTables
          -Slight performance tweak to string copying
          -Included support for new HiveMetaStore's get_schema method to correctly return the entire schema (fields + partition)

          Make sure that you use HIVE-187.2.patch with the newest unixODBC API wrapper: unixODBC-2.2.14-1.tgz

          Build instructions and notes can be found at the aforementioned Apache wiki link.

          -Eric

          Eric Hwang added a comment -

          unixODBC-2.2.14-2:
          Changed some deprecated functions to pass control to newer equivalents when called.
          Modified how some error cases are being detected.

          Eric Hwang added a comment -

          Added some minor modifications to the unixODBC API wrapper that allow proper handling of result sets without any rows.

          Eric Hwang added a comment -

          Uploaded a new patch with minor changes:
          Combined hiveclient.hpp with hiveclient.h to prevent duplication of header info.
          Fixed some formatting issues
          Did some clean up on test cases
          Changed the thrift generated namespaces to be more specific

          Raghotham Murthy added a comment -

          Added a package-cpp target that packages the Hive C++ libraries and includes them in the dist.

          Raghotham Murthy added a comment -

          Patch file for unixODBC. Download unixODBC from http://www.unixodbc.org/unixODBC-2.2.14.tar.gz, untar it, and then apply the patch by running 'patch -p0 < unixodbc.patch' from the top-level directory.

          Then follow the directions at http://wiki.apache.org/hadoop/Hive/HiveODBC

          Raghotham Murthy added a comment -

          Committed to both trunk and branch-0.4. Thanks Eric!!

          Carl Steinbach added a comment -

          Attaching a copy of unixODBC-2.2.14 with Eric's patch already applied.

          Ning Zhang added a comment -

          Uploading thrift_64.r790732.tgz, the complete 64-bit Thrift libraries (including libfb303.a) and binaries. These libraries were compiled under CentOS 5.2 (kernel 2.6.20, GCC 4.1.2).


            People

            • Assignee:
              Eric Hwang
              Reporter:
              Raghotham Murthy
            • Votes:
              0
              Watchers:
              11
