Hive / HIVE-617

Script to start classes with hadoop and hive environment

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: Clients
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Add 'jar' service to hive client shell. (Edward Capriolo via rmurthy)

      Description

      At times it may be necessary to write a process that uses both the Hadoop and Hive environments and APIs. For example, someone may write an application that uses the Hive API directly. This patch adds a more generic --jar extension that can start any class with the proper environment.

      RUNJAR=/opt/hive/lib/hive_hwi.jar RUNCLASS=test.TestHive hive --service jar

      1. hive-617.patch
        0.6 kB
        Edward Capriolo
      2. TestHive.java
        2 kB
        Edward Capriolo
      3. hive-617.2.patch
        1 kB
        Edward Capriolo
      4. hive-617.3.diff
        1 kB
        Edward Capriolo

        Activity

        Raghotham Murthy added a comment -

        Committed. Thanks Edward! Also created HIVE-744 for creating HiveShell.

        Raghotham Murthy added a comment -

        I am fine with it as well. I think we should create a separate jira for HiveShell though. Eventually, it would be good to move all tools to use the same code path for configuration management.

        Ashish Thusoo added a comment -

        I think I am fine with this.... Raghu?

        Edward Capriolo added a comment -

        I follow what you are saying.

        We may not need to add a helper class for the user. The user that would be using this is likely advanced and should/would use the OptionsProcessor and SessionState as they see fit.

        One reason I could see for creating HiveShell is that we would like to retrofit all the current tools (cli, lineage, hwi) to fit into some interface that ensures the SessionState, hive history, etc. are started up properly.

        My use case is to be able to launch a class that can start a map/reduce program with the hadoop API and then execute a query with the hive API. I am using GregorianCalendar and date processing to figure out what files/partition to operate on, building hive strings and executing them directly with the QueryProcessor.

        It seems some people are using a combination of bash|perl|python and hive -e|-f. Other than my reliance on cron to start off these jobs, I am 100% pure Java.

        This tiny shell script is my entry point. For me, it does not need more sophistication but I could be missing something.
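
        The date-driven use case described above can be illustrated with a minimal sketch. It is not taken from TestHive.java or from the patch: the class name PartitionQuerySketch, the table name page_views and the partition column ds are hypothetical placeholders. It only builds the query string; actually running it would need the Hive classes on the classpath, which is what the jar service provides.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

/** Hypothetical helper: works out yesterday's partition and builds a Hive query string. */
public class PartitionQuerySketch {

    /** Returns the day before the given date, formatted as yyyy-MM-dd. */
    static String previousDay(GregorianCalendar today) {
        GregorianCalendar c = (GregorianCalendar) today.clone();
        c.add(Calendar.DAY_OF_MONTH, -1);
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        return fmt.format(c.getTime());
    }

    /** Builds a query restricted to one partition; table and column are placeholders. */
    static String buildQuery(String table, String partitionColumn, String day) {
        return "SELECT COUNT(1) FROM " + table
                + " WHERE " + partitionColumn + " = '" + day + "'";
    }

    public static void main(String[] args) {
        String day = previousDay(new GregorianCalendar());
        // The resulting string would then be handed to the Hive API for execution.
        System.out.println(buildQuery("page_views", "ds", day));
    }
}
```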

        Raghotham Murthy added a comment -

        I think we will need a class, say HiveShell, which will read the -hiveconf parameters, remove them from the command line, and then invoke the user-specified class with the remaining command line arguments. Does this make sense?
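
        The proposal above could be sketched roughly as follows. This is an illustration only, not the HiveShell class (which was deferred to HIVE-744) and not code from the patch: the class name HiveShellSketch and the assumption that each -hiveconf flag is followed by a single key=value token are mine. It collects the overrides, strips them from the argument list, and invokes the target class's main method by reflection.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical launcher sketch: strips -hiveconf pairs, then runs the user's class. */
public class HiveShellSketch {

    /** Holds the extracted -hiveconf settings and the arguments left over. */
    static final class Parsed {
        final Map<String, String> hiveConf = new LinkedHashMap<>();
        final List<String> remaining = new ArrayList<>();
    }

    /** Pulls "-hiveconf key=value" pairs out of args; everything else keeps its order. */
    static Parsed parse(String[] args) {
        Parsed p = new Parsed();
        for (int i = 0; i < args.length; i++) {
            if ("-hiveconf".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                p.hiveConf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                p.remaining.add(args[i]);
            }
        }
        return p;
    }

    /** Invokes the public static main(String[]) of the named class via reflection. */
    static void launch(String className, List<String> args) throws Exception {
        Method main = Class.forName(className).getMethod("main", String[].class);
        main.invoke(null, (Object) args.toArray(new String[0]));
    }

    public static void main(String[] args) throws Exception {
        Parsed p = parse(args);
        if (p.remaining.isEmpty()) {
            System.err.println("usage: HiveShellSketch <class> [-hiveconf k=v]... [args]");
            return;
        }
        // A real launcher would apply these to the Hive configuration; this sketch only prints them.
        System.out.println("hiveconf overrides: " + p.hiveConf);
        launch(p.remaining.get(0), p.remaining.subList(1, p.remaining.size()));
    }
}
```

        Because the overrides are removed before the target class is started, the user's main method sees only its own arguments, which is the separation the comment asks for.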

        Edward Capriolo added a comment -

        This patch specifies the jarfile and class name as command line arguments like hadoop does. With this change the ordering of jarfile, classname and -hiveconf is now significant. Launching a jar from hadoop has similar constraints so this should not be an issue.

        Raghotham Murthy added a comment -

        Why not specify the jarfile and class name as command line arguments like hadoop does?

        Edward Capriolo added a comment -

        TestHive is really not intended for inclusion. The target of the jira is the jar.sh script. We really don't test any of the sh scripts directly, since they require the hadoop environment to work and TestCLIDriver is an emulated environment. We can include TestHive, but it is not actually a test of jar.sh.

        Zheng Shao added a comment -

        Looks great!
        Can you include TestHive.java in the patch (and put it in some package org.apache.hadoop.hive.examples, etc), and then invoke TestHive with "ant" in the "test" target (or maybe add a "test_scripts")?

        Edward Capriolo added a comment -

        Added License header. Added more verbose help comments. Fixed help instructions.

        Edward Capriolo added a comment -

        Included the patch, and a sample application (TestHive.java) that could be started in this manner.

        Edward Capriolo added a comment -

        patch adds bin/hive --jar


          People

          • Assignee:
            Edward Capriolo
            Reporter:
            Edward Capriolo
          • Votes:
            0
            Watchers:
            1
