Hive / HIVE-818

Create a Hive CLI that connects to hive ThriftServer

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Labels:
      None

      Description

      We should have an alternate CLI that works by interacting with the HiveServer. That way it will be ready when/if we deprecate the current CLI.

      1. HIVE-818.5.patch (72 kB, Ning Zhang)
      2. HIVE-818.4.patch (74 kB, Ning Zhang)
      3. HIVE-818.3.patch (66 kB, Ning Zhang)
      4. HIVE-818.2.patch (64 kB, Ning Zhang)
      5. Hive-881_2.patch (52 kB, Ning Zhang)
      6. HIVE-818.patch (65 kB, Ning Zhang)

        Activity

        Min Zhou added a comment -

        This feature looks pretty good for us; we were looking for a CLI-mode client for the Hive server.

        Ning Zhang added a comment -

        Edward, are you working on this issue? If you are not working on this, do you mind if I assign it to myself?

        John Sichi added a comment -

        Ning, also take a look at HIVE-987; maybe we should move over to using JDBC+sqlline. I've added a second patch there which should work with latest Hive trunk.

        Edward Capriolo added a comment -

        No I am not working on this. It's all you.

        Ning Zhang added a comment -

        Thanks for the pointer, John. As you pointed out in HIVE-987, I think it would be good to have both the old CLI and sqlline supported. The CLI is for backward compatibility, and sqlline can test JDBC and give users a better experience.

        Ning Zhang added a comment -

        This patch does the following:

        • Add two options (-h, -p) to the CLI to specify the hostname and port of the Hive server.
        • Change the HiveServer to write the output of non-Hive (non-Driver) commands to a temp file, and change the fetchOne/fetchN/fetchAll functions to read results from that temp file.
        • Change the fetchOne function to throw a HiveServerException (error code 0) when it reaches the end of the results, rather than sending an empty string.
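
        The third point can be sketched roughly as follows. This is only an illustration, not the code in the patch: ServerException, ResultCursor, and drain are hypothetical names, and the only detail taken from the description above is that error code 0 marks the end of the results. One plausible benefit of the exception is that an empty row can no longer be mistaken for the end of the output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class EndOfResultsSketch {
    // Hypothetical stand-in for the server's exception type.
    static class ServerException extends Exception {
        final int errorCode;

        ServerException(String message, int errorCode) {
            super(message);
            this.errorCode = errorCode;
        }
    }

    // Hypothetical server-side cursor over buffered result rows.
    static class ResultCursor {
        private final Iterator<String> rows;

        ResultCursor(List<String> rows) {
            this.rows = rows.iterator();
        }

        // Returns the next row, or throws with error code 0 when exhausted.
        String fetchOne() throws ServerException {
            if (!rows.hasNext()) {
                throw new ServerException("end of results", 0);
            }
            return rows.next();
        }
    }

    // Client-side loop: read rows until the end-of-results signal arrives.
    static List<String> drain(ResultCursor cursor) {
        List<String> out = new ArrayList<>();
        while (true) {
            try {
                out.add(cursor.fetchOne());
            } catch (ServerException e) {
                if (e.errorCode == 0) {
                    return out; // normal end of results
                }
                throw new IllegalStateException(e); // a real failure
            }
        }
    }

    public static void main(String[] args) {
        // The first row is empty on purpose; it is still returned as a row.
        ResultCursor cursor = new ResultCursor(Arrays.asList("", "a\t1", "b\t2"));
        System.out.println(drain(cursor).size());
    }
}
```

        Any other error code is treated as a genuine failure and rethrown, so the client loop stops only on the explicit end-of-results signal.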

        Caveats:

        • session.err from the HiveServer is still not sent back to the client, so the progress of a Hadoop job is not shown on the client side in remote mode (I think a JIRA is already open for this; if not, I will file a follow-up JIRA).
        • There is no end-to-end unit test for remote mode. I manually tested HiveServer and the CLI in remote mode (set/dfs/SQL commands), including in combination with the -e/-f options. I will file a follow-up JIRA for creating a unit test suite for the remote-mode CLI.

        Ning Zhang added a comment -

        Review board: https://reviews.apache.org/r/407/

        Ning Zhang added a comment -

        Resolved some conflicts with the current trunk.

        He Yongqiang added a comment -

        I will take a look

        He Yongqiang added a comment -

        Ning, the patch on this JIRA does not seem to be correct (but the one on the review board is right).

        Can you upload a new patch?

        Ning Zhang added a comment -

        Sorry, HIVE-881_2.patch is the wrong patch. I'm uploading the correct one (the review board has the correct one).

        He Yongqiang added a comment -

        The change looks good to me.
        A few minor comments:
        1) Do you need to increase the VERSION number in HiveServer?
        2) Is it better to put the setupSessionIO() call in execute()? If it is already there, should we remove the one in the constructor? And clean up the Driver at the end of execute()?
        3) The len and pos local variables in cleanTmpFile are not used.
        4) Maybe not related to this JIRA: the SessionState in Hive is a thread-local object. Is it guaranteed that the HiveServerHandler is also thread-local (so there is a 1-1 match)?

        Ning Zhang added a comment -

        1) Do you need to increase the VERSION number in HiveServer?
        Good point. I've changed it in the next patch.

        2) Is it better to put the setupSessionIO() call in execute()? If it is already there, should we remove the one in the constructor? And clean up the Driver at the end of execute()?

        Session IO cannot be cleaned up at the end of execute(). The data is copied back to the client by the fetch* functions, so the client has to do the cleanup. Also, session IO is better set up in the constructors because out and err can be used by any function (not only execute()). The execute() function only does cleanup work.

        3) The len and pos local variables in cleanTmpFile are not used.
        Will do.

        4) Maybe not related to this JIRA: the SessionState in Hive is a thread-local object. Is it guaranteed that the HiveServerHandler is also thread-local (so there is a 1-1 match)?
        HiveServer constructs a new HiveServerHandler for each worker thread. So for each remote CLI connection there is a HiveServerHandler, which creates a thread-local SessionState. I manually tested 100 parallel runs of the remote CLI and they were fine.
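
        The 1-1 match described in the answer to point 4 can be illustrated with a small sketch. It assumes nothing about the real classes: Session and Handler below are hypothetical stand-ins for SessionState and HiveServerHandler, and the only idea taken from the comment is that each worker thread gets its own handler and therefore its own thread-local session.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ThreadLocalSessionSketch {
    // Hypothetical per-connection state, standing in for a session object.
    static class Session {
        final String owner = Thread.currentThread().getName();
    }

    // Each thread that calls get() lazily receives its own Session instance.
    private static final ThreadLocal<Session> SESSION =
            ThreadLocal.withInitial(Session::new);

    // Hypothetical handler; the server is assumed to create one per worker thread.
    static class Handler {
        Session session() {
            return SESSION.get();
        }
    }

    // Runs n worker threads, each with its own handler, and counts the
    // distinct Session objects they observed.
    static int distinctSessions(int n) {
        Set<Session> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Thread(() -> {
                Handler handler = new Handler();
                seen.add(handler.session());
                seen.add(handler.session()); // same thread, same Session
            }, "worker-" + i);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctSessions(100));
    }
}
```

        In this sketch the number of distinct sessions equals the number of worker threads, which is the property the manual test with 100 parallel runs was checking for.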

        He Yongqiang added a comment -

        Removing sessionIOcleanup from the constructor should be fine, because every method that needs to write to the tmpfile should do a cleanup first. Otherwise it will see results from other methods.
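
        A rough sketch of the "clean up before writing" rule, under stated assumptions: TempFileResultSketch, run, and the temp-file layout are hypothetical, and cleanTmpFile here only borrows the name mentioned earlier in this thread, not its actual implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

class TempFileResultSketch {
    private final Path tmp;

    TempFileResultSketch() {
        try {
            tmp = Files.createTempFile("session-out", ".txt");
            tmp.toFile().deleteOnExit();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Truncates the temp file so a new command never sees stale output.
    private void cleanTmpFile() {
        try {
            Files.write(tmp, new byte[0], StandardOpenOption.TRUNCATE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical command: cleans up first, then appends its output lines.
    void run(List<String> outputLines) {
        cleanTmpFile();
        try {
            Files.write(tmp, outputLines, StandardCharsets.UTF_8,
                    StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back whatever the most recent command wrote.
    List<String> fetchAll() {
        try {
            return Files.readAllLines(tmp, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TempFileResultSketch session = new TempFileResultSketch();
        session.run(Arrays.asList("a=1", "b=2"));
        session.run(Arrays.asList("c=3"));
        System.out.println(session.fetchAll());
    }
}
```

        Because run appends rather than overwrites, skipping the cleanTmpFile call would leave the first command's lines in front of the second command's output, which is the failure mode described above.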

        He Yongqiang added a comment -

        sessionIOcleanup should not be a big issue. Running tests.

        Ning Zhang added a comment -

        Uploading a patch that resolves some failures in TestHiveServer and TestJdbcCliDriver.

        Ning Zhang added a comment -

        Yongqiang, have you had a chance to look at the patch?

        He Yongqiang added a comment -

        Sorry, I just saw that you already uploaded a new patch.
        I will start running tests after 1517.

        He Yongqiang added a comment -

        Ning, can you upload a new patch?
        (Sorry, I just committed 1517, and it seems there are some conflicts with this one.)

        Ning Zhang added a comment -

        Updated to the current trunk and resolved a conflict in OptionsProcessor.java.

        He Yongqiang added a comment -

        Committed! Thanks Ning!


          People

          • Assignee: Ning Zhang
          • Reporter: Edward Capriolo
          • Votes: 1
          • Watchers: 4
