Pig / PIG-2482

Integrate HCat DDL command into Pig

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: impl
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Pig supports the sql command in scripts, in Grunt, and in embedded (scripting) mode.

      Script/Grunt:
      sql create table ......;

      Note that the sql statement ends with a ";".

      Embedding:
      from org.apache.pig.scripting import Pig
      ret = Pig.sql("""drop table if exists table_1;""")
      if ret == 0:
          pass  # success

      Configuration:
      pig.sql.type=hcat (backend for sql; hcat is currently the only sql backend)
      hcat.bin=/usr/local/hcat/bin/hcat (location of the hcat binary)
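
The two configuration properties above could be turned into a command line roughly as follows. This is only a sketch under stated assumptions, not the code from the patch: the helper name `build_sql_command` is hypothetical, the properties are modeled as a plain dict, and passing the statement through the hcat CLI's `-e` option is an assumption about how the backend is invoked.

```python
def build_sql_command(props, statement):
    """Return the argv for running one sql statement via the configured backend.

    `props` is a plain dict standing in for Pig's properties (an assumption).
    """
    backend = props.get("pig.sql.type")
    if backend != "hcat":
        # hcat is the only backend named in the release note.
        raise ValueError("Unsupported pig.sql.type: %r (only 'hcat' is supported)" % backend)

    hcat_bin = props.get("hcat.bin")
    if not hcat_bin:
        raise ValueError("hcat.bin is not set; it must point to the hcat binary")

    # Assumption: the statement is handed to hcat with its -e option.
    return [hcat_bin, "-e", statement]


props = {"pig.sql.type": "hcat", "hcat.bin": "/usr/local/hcat/bin/hcat"}
argv = build_sql_command(props, "drop table if exists table_1;")
```

Keeping the backend check separate from the command construction leaves room for other values of pig.sql.type later, which is the reason given in the discussion for naming the command "sql" rather than "hcat".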

      Description

      We would like to run hcat DDL commands inside a Pig script or Grunt. We can use an approach similar to "fs" and "sh".

      Grunt> hcat create table .....

      As with "fs" and "sh", we don't plan to add a Java API to PigServer for it.

      1. PIG-2482-1.patch
        9 kB
        Daniel Dai
      2. PIG-2482-2.patch
        9 kB
        Daniel Dai
      3. PIG-2482-3.patch
        12 kB
        Daniel Dai

        Activity

        Alan Gates added a comment -

        I think it's fine not to add commands for it to PigServer. But what about o.a.p.scripting.Pig object? It has a fs() call. It seems like it should have an hcat() call as well.

        Also, should we say "hcat" or "sql"? The latter would leave us open to extend this to exporting full SQL statements later. It is also more generic and in keeping with our goal of keeping Hadoop specifics out of Pig Latin.

        Daniel Dai added a comment -

        > I think it's fine not to add commands for it to PigServer. But what about o.a.p.scripting.Pig object? It has a fs() call. It seems like it should have an hcat() call as well.

        Yes, I can include it in scripting.Pig.

        > Also, should we say "hcat" or "sql"? The latter would leave us open to extend this to exporting full SQL statements later. It is also more generic and in keeping with our goal of keeping Hadoop specifics out of Pig Latin.

        That's better for future extension. I will name it "sql".

        Alan Gates added a comment -

        In general looks good. A couple of comments/questions:

        In GruntParser.processSQLCommand(), the error message in the following code looks incomplete:

        
        if (new File("hcat.bin").exists()) {
            throw new IOException(hcatBin + " does not ");
        }
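
For comparison, a check with a complete error message might look like the sketch below. It is written in Python to match the embedding example in the release note rather than the Java of GruntParser. The function name `check_hcat_binary` and the message wording are assumptions, not the text adopted in the later patch. The sketch tests the configured path value and raises only when the file is missing.

```python
import os


def check_hcat_binary(hcat_bin):
    """Return hcat_bin if it is an existing file, else raise with a full message."""
    if not os.path.isfile(hcat_bin):
        # Hypothetical wording: say what is missing and which setting to check.
        raise IOError(hcat_bin + " does not exist. Please check the hcat.bin setting.")
    return hcat_bin
```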
        

        How does removing HADOOP_CLASSPATH from the environment solve the problem of antlr version clashes? Doesn't hcat need HADOOP_CLASSPATH set in order to work properly?

        Daniel Dai added a comment -

        Sure, I can improve the error message.

        As for HADOOP_CLASSPATH: when we run Pig, the bin/pig script puts pig-withouthadoop.jar, which contains antlr.jar, into HADOOP_CLASSPATH. The hcat script then picks up HADOOP_CLASSPATH and uses that antlr.jar instead of Hive's version. That's why we need to unset HADOOP_CLASSPATH before launching hcat.
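
The idea of unsetting HADOOP_CLASSPATH for the child process only can be sketched as below. This is a Python illustration, not the Java code in GruntParser, and the helper names `env_without` and `run_without_hadoop_classpath` are hypothetical. The parent's environment is left untouched; only the copy handed to the child lacks the variable.

```python
import os
import subprocess


def env_without(name, base_env=None):
    """Return a copy of the environment with one variable removed."""
    env = dict(os.environ if base_env is None else base_env)
    env.pop(name, None)  # no error if the variable was never set
    return env


def run_without_hadoop_classpath(argv):
    """Run argv with HADOOP_CLASSPATH unset in the child; return its exit code."""
    return subprocess.call(argv, env=env_without("HADOOP_CLASSPATH"))
```

A caller would pass the hcat command line as `argv` and treat a return value of 0 as success, mirroring the `ret == 0` convention in the release note.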

        Daniel Dai added a comment -

        PIG-2482-2.patch fixes the error message.

        Alan Gates added a comment -

        +1, looks good.

        Daniel Dai added a comment -

        Added the scripting.Pig part in PIG-2482-3.patch.

        Daniel Dai added a comment -

        Unit tests pass. test-patch output:
        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 10 new or modified tests.
        [exec]
        [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] -1 release audit. The applied patch generated 526 release audit warnings (more than the trunk's current 517 warnings).

        The javadoc warning seems unrelated. All new files have the proper header, so the release audit warning can be ignored.


          People

          • Assignee:
            Daniel Dai
            Reporter:
            Daniel Dai
          • Votes:
            0
            Watchers:
            2