PIG-2456

Pig should have a pigrc to specify default script cache

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Users can specify default Pig statements in ~/.pigbootup. These default statements are prepended to every Pig script and Grunt session. The default statements filename is configurable via the key "pig.load.default.statements".

      Description

      There should be a way to specify default statements in Pig. This is helpful when multiple users are using Pig in interactive mode.

      1. PIG-2456.patch
        5 kB
        Prashant Kommireddi
      2. PIG-2456_2.patch
        12 kB
        Prashant Kommireddi

        Activity

        Prashant Kommireddi added a comment -

        Aniket, can you provide an example of this use case?

        Harsh J added a comment -

        Hive has this, and it is very useful for loading/adding user libraries during startup or registering temporary functions/aliases. I agree Pig should have the very same thing.

        Harsh J added a comment -

        Though you can use a pig.properties file (passable as pig -P file), which has a "file=" property that can do the same thing.

        Prashant Kommireddi added a comment -

        Harsh,

        Pig loads properties from $HOME/.pigrc by default. Additional jars can be registered and UDFs can be qualified from here.
        Additionally, pig.properties and pig-default.properties are loaded if present in Pig's classpath.

        Are you looking for any other functionality?

        Aniket Mokashi added a comment -

        How about something like a = load '1.txt' using SomeLoader(); so that a is available every time I run Grunt?
        Btw, pigrc looks interesting! Can you point me to a JIRA so that I can find its features?
        Thanks,
        Aniket

        Aniket Mokashi added a comment -

        Apologies, I should upgrade my browser really soon. Sorry about that.

        Prashant Kommireddi added a comment -

        These property files contain key-value pairs; I believe declaring an alias is not supported.

        Here are a couple of JIRAs around Pig configs: https://issues.apache.org/jira/browse/PIG-111 and https://issues.apache.org/jira/browse/PIG-1381
        You could also take a look at PropertiesUtil.loadDefaultProperties(Properties)

        public static void loadDefaultProperties(Properties properties) {
            loadPropertiesFromFile(properties,
                    System.getProperty("user.home") + "/.pigrc");
            loadPropertiesFromClasspath(properties, DEFAULT_PROPERTIES_FILE);
            loadPropertiesFromClasspath(properties, PROPERTIES_FILE);
            setDefaultsIfUnset(properties);
            // ... (rest of the method omitted)
        }
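        The snippet above loads several sources into the same Properties object. Assuming each loader simply calls Properties.load on that object (the helper bodies are not shown here), a key set by a later source overwrites the value from an earlier one. A minimal, self-contained sketch of that layering, using a hypothetical LayeredPropertiesSketch class, illustrative keys, and in-memory strings standing in for the three files:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch of layered property loading; not Pig's PropertiesUtil. */
public class LayeredPropertiesSketch {

    // Loads each properties-format string into 'target' in order; later keys overwrite earlier ones.
    static void loadLayers(Properties target, String... layers) {
        for (String layer : layers) {
            try {
                target.load(new StringReader(layer));
            } catch (IOException e) {
                // StringReader does no real I/O, but load() declares IOException.
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        loadLayers(props,
                "exectype=local\npig.logfile=/tmp/from-pigrc.log\n", // stands in for ~/.pigrc
                "pig.logfile=/tmp/from-defaults.log\n",              // stands in for pig-default.properties
                "exectype=mapreduce\n");                             // stands in for pig.properties
        System.out.println(props.getProperty("exectype"));    // mapreduce
        System.out.println(props.getProperty("pig.logfile")); // /tmp/from-defaults.log
    }
}
```

        Under that assumption, a value in ~/.pigrc only takes effect if neither classpath file defines the same key.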
        
        Prashant Kommireddi added a comment -

        Btw, did you check out macros (v0.9)? Daniel has a nice post describing them: http://hortonworks.com/new-apache-pig-features-part-1-macro/
        Macros pretty much do what you are looking for. "pigrc" is a configuration file, and I would be inclined to keep programmatic parts out of config files.

        Joey Echeverria added a comment -

        The idea is to add a new file containing Pig statements that are always executed. This is more akin to .hiverc in Hive (https://issues.apache.org/jira/browse/HIVE-1414) than to the pig.properties or .pigrc files that already exist. My use case would be to pre-define a bunch of UDFs:

        REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;
        DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
        ...

        I could also see it being useful to put in common LOAD statements so that you can just start working with the data from grunt.

        I tried setting the file variable in .pigrc and pig.properties, but it didn't seem to cause the file to be loaded.

        This is somewhat related to macros where I want a default file that is always imported.

        Prashant Kommireddi added a comment -

        Makes sense. This is what I think we could do to support this:

        1. Create a static method in Main that takes one argument, an InputStream, and returns a SequenceInputStream: a composite stream made up of the default file (say, ".pigbootup") followed by that InputStream.
        2. Each component within Main calls this method to get a handle on the composite stream, passing in the InputStream it uses (console, file input).
        3. This composite stream is then passed to Grunt.

        Another approach would be to add a new constructor to class Grunt, Grunt(InputStream in, PigContext pig). This constructor would create a composite stream (.pigbootup + console/file input), wrap a BufferedReader around it, and invoke Grunt(BufferedReader in, PigContext pig).

        Not sure which of these approaches would be better.

        Any suggestions or other ideas?
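        Approach #1 can be sketched with java.io.SequenceInputStream, which reads its first stream to the end and then continues with the second. The class and method names below are hypothetical, and readAll merely stands in for Grunt reading statements through a BufferedReader:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: prepend bootup statements to the stream Grunt would read. */
public class CompositeStreamSketch {

    // Returns a stream that yields all of 'bootup' and then all of 'in'.
    static InputStream composite(InputStream bootup, InputStream in) {
        return new SequenceInputStream(bootup, in);
    }

    // Stand-in for Grunt: reads the input line by line through a BufferedReader.
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Wraps a string as an in-memory stream (stands in for a file or the console).
    static InputStream bytes(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        InputStream bootup = bytes("REGISTER piggybank.jar;\n");
        InputStream script = bytes("a = LOAD 'data';\nDUMP a;\n");
        // Prints the bootup statement first, then the two script lines.
        System.out.print(readAll(composite(bootup, script)));
    }
}
```

        One caveat of plain byte concatenation: if the bootup file does not end with a newline, its last line runs into the first line of the script, so a trailing -- comment in the bootup file would swallow that line.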

        Prashant Kommireddi added a comment -

        Changes made to Main.java

        Daniel Dai added a comment -

        Looks good. I tried dump/store/explain/describe/illustrate, and all work fine. I have some minor suggestions:
        1. It would be better to make pigunit.pig take .pigbootup as well. PigUnit is for users to write their Pig unit tests; if a user always uses .pigbootup, it makes sense that they can use it in unit tests as well.
        2. It would also be better to make the .pigbootup filename configurable; that would help users write unit tests using #1.
        3. Add a test case. With #2, this should be possible.

        Prashant Kommireddi added a comment -

        Thanks Daniel.

        1. Can you point me to the code/classes I need to look at to enable this?
        2. Makes sense. So have a property "pig.load.default.statements" that can be specified by the user? Is .pigbootup required to be loaded from the user's home directory (like .pigrc) in case "pig.load.default.statements" is not specified?

        Daniel Dai added a comment -

        1. Can you point me to the code/classes I need to look at to enable this?

        Check org.apache.pig.pigunit.pig.PigServer

        2. Makes sense. So have a property "pig.load.default.statements" that can be specified by the user? Is .pigbootup required to be loaded from the user's home directory (like .pigrc) in case "pig.load.default.statements" is not specified?

        The property name is fine. It has a default value of $HOME/.pigbootup in pig-default.properties.

        Prashant Kommireddi added a comment -

        Would Utils be a good place to define "getCompositeStream"? Declaring a public method in Main and calling it from outside does not seem like the best approach.

        Daniel Dai added a comment -

        Yes, Utils should be better.

        Prashant Kommireddi added a comment -

        I have modified org.apache.pig.pigunit.pig.PigServer and TestPigTest to add this functionality to PigUnit, and added a new test, "testDefaultBootup()", to TestPigTest.java.

        I also moved "getCompositeStream(InputStream in, Properties properties)" to Utils.java so it can be used across components.

        Prashant Kommireddi added a comment -

        Also, user can now specify the bootup file path through configuration property "pig.load.default.statements". If this property is not found, default location $HOME/.pigbootup will be loaded if present.

        public static InputStream getCompositeStream(InputStream in, Properties properties) {
            // Fall back to the default ~/.pigbootup if the user has not specified a file
            final String bootupFile = properties.getProperty("pig.load.default.statements",
                    System.getProperty("user.home") + "/.pigbootup");
            try {
                final InputStream inputStream = new FileInputStream(new File(bootupFile));
                // Bootup statements are read first, then the original input
                return new SequenceInputStream(inputStream, in);
            } catch (FileNotFoundException fe) {
                log.info("Default bootup file " + bootupFile + " not found");
                return in;
            }
        }
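        For a check in the spirit of testDefaultBootup(), the method above can be exercised against a temporary bootup file named through "pig.load.default.statements", so the check does not depend on the user's real ~/.pigbootup. This is a self-contained sketch: BootupSketch, run and drain are hypothetical names, and System.err stands in for Pig's logger; it is not the committed Utils or TestPigTest code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical, self-contained sketch of the bootup-file logic and a small check. */
public class BootupSketch {
    static final String KEY = "pig.load.default.statements";

    // Same logic as getCompositeStream above, with System.err in place of the logger.
    static InputStream getCompositeStream(InputStream in, Properties properties) {
        String bootupFile = properties.getProperty(KEY,
                System.getProperty("user.home") + "/.pigbootup");
        try {
            return new SequenceInputStream(new FileInputStream(bootupFile), in);
        } catch (FileNotFoundException e) {
            System.err.println("Default bootup file " + bootupFile + " not found");
            return in;
        }
    }

    // Reads a stream fully into a UTF-8 string and closes it.
    static String drain(InputStream in) {
        try (InputStream s = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = s.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a temporary bootup file, points the property at it, and returns what Grunt would read.
    static String run(String bootupContents, String script) {
        try {
            Path bootup = Files.createTempFile("pigbootup", ".pig");
            try {
                Files.write(bootup, bootupContents.getBytes(StandardCharsets.UTF_8));
                Properties props = new Properties();
                props.setProperty(KEY, bootup.toString());
                InputStream scriptIn = new ByteArrayInputStream(script.getBytes(StandardCharsets.UTF_8));
                return drain(getCompositeStream(scriptIn, props));
            } finally {
                Files.deleteIfExists(bootup);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(run("REGISTER piggybank.jar;\n", "a = LOAD 'data';\n"));
    }
}
```

        When the configured path does not exist, getCompositeStream logs a message and returns the original stream unchanged, so scripts still run without a bootup file.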
        
        Daniel Dai added a comment -

        Looks good to me. Will commit if tests pass.

        Show
        Daniel Dai added a comment - Looks good to me. Will commit if tests pass.
        Hide
        Daniel Dai added a comment -

        Unit tests pass. test-patch:
        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 4 new or modified tests.
        [exec]
        [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] -1 release audit. The applied patch generated 527 release audit warnings (more than the trunk's current 523 warnings).

        The javadoc warning doesn't seem related. No new files were added, so the release audit warning can be ignored.

        Patch committed to trunk.

        Thanks Prashant!

        Aniket Mokashi added a comment -

        Awesome! This is useful.

        Russell Jurney added a comment -

        What do we think about having the default .pigbootup configuration import piggybank and define all its functions with short, callable names?

        Aniket Mokashi added a comment -

        How about making UDF name matching case-insensitive? Wouldn't that be a better approach? We already have the definedFunctions map; we just have to make it case-insensitive. Not sure if that would break something else. Comments?
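        Case-insensitive matching could, for example, be done by backing the registry with a TreeMap ordered by String.CASE_INSENSITIVE_ORDER. The sketch below is an assumption-laden illustration: it uses String values in place of whatever definedFunctions actually stores, and com.example.OtherLoader is a made-up class name. It also shows one way such a change could break existing scripts: names that differ only by case now collide.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a case-insensitive function registry. */
public class CaseInsensitiveLookupSketch {

    // A TreeMap ordered by CASE_INSENSITIVE_ORDER treats "Lower" and "LOWER" as the same key.
    static Map<String, String> newRegistry() {
        return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        Map<String, String> definedFunctions = newRegistry();
        definedFunctions.put("SequenceFileLoader",
                "org.apache.pig.piggybank.storage.SequenceFileLoader");
        // Lookup succeeds regardless of the case used in the script.
        System.out.println(definedFunctions.get("sequencefileloader"));
        // Possible breakage: a second name differing only by case replaces the first entry's value.
        definedFunctions.put("SEQUENCEFILELOADER", "com.example.OtherLoader");
        System.out.println(definedFunctions.size());                    // 1
        System.out.println(definedFunctions.get("SequenceFileLoader")); // com.example.OtherLoader
    }
}
```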

        Russell Jurney added a comment -

        Making UDF names case-insensitive makes sense to me, but it is a bigger, potentially breaking change than lowercasing LOWER/UPPER.

        I say do it.

        Daniel Dai added a comment -

        @Russell
        Importing piggybank can be achieved with -Dpig.additional.jars (http://pig.apache.org/docs/r0.9.2/basic.html#register) and -Dudf.import.list (http://pig.apache.org/docs/r0.9.2/udf.html#eval-functions).
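        The idea behind udf.import.list is that a short UDF name is tried against a list of package prefixes. Below is a hypothetical sketch of that lookup, not Pig's resolution code. It takes the packages as a list so it makes no assumption about how the property value is split, and JDK packages stand in for piggybank so it runs without extra jars.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of resolving a short class name against a package import list. */
public class ImportListSketch {

    // Tries the name as given, then with each package prefix; returns the first class that loads.
    static Optional<String> resolve(String shortName, List<String> packages) {
        try {
            return Optional.of(Class.forName(shortName).getName());
        } catch (ClassNotFoundException ignored) {
            // Not a fully qualified name; fall through to the import list.
        }
        for (String pkg : packages) {
            try {
                return Optional.of(Class.forName(pkg + "." + shortName).getName());
            } catch (ClassNotFoundException ignored) {
                // Not in this package; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> imports = Arrays.asList("java.lang", "java.util");
        System.out.println(resolve("ArrayList", imports)); // Optional[java.util.ArrayList]
        System.out.println(resolve("NoSuchUdf", imports)); // Optional.empty
    }
}
```

        Because the first package that yields a loadable class wins, the order of the list matters when two packages contain a class with the same short name.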


          People

          • Assignee:
            Prashant Kommireddi
            Reporter:
            Aniket Mokashi
          • Votes:
            0
            Watchers:
            5
