Pig
  1. Pig
  2. PIG-1381

Need a way for Pig to take an alternative property file

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: impl
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently, Pig read the first ever pig.properties in CLASSPATH. Pig has a default pig.properties and if user have a different pig.properties, there will be a conflict since we can only read one. There are couple of ways to solve it:

      1. Give a command line option for user to pass an additional property file
      2. Change the name for default pig.properties to pig-default.properties, and user can give a pig.properties to override
      3. Further, can we consider to use pig-default.xml/pig-site.xml, which seems to be more natural for hadoop community. If so, we shall provide backward compatibility to also read pig.properties, pig-cluster-hadoop-site.xml.

      1. PIG-1381_cli_2.patch
        10 kB
        V.V.Chaitanya Krishna
      2. PIG-1381_cli_1.patch
        10 kB
        V.V.Chaitanya Krishna
      3. PIG-1381-5.patch
        7 kB
        Daniel Dai
      4. PIG-1381-4.patch
        6 kB
        Daniel Dai
      5. PIG-1381-3.patch
        6 kB
        Daniel Dai
      6. PIG-1381-2.patch
        5 kB
        Daniel Dai
      7. PIG-1381-1.patch
        3 kB
        Daniel Dai

        Issue Links

          Activity

          Hide
          Daniel Dai added a comment -

          Manual test pass. Patch committed. Thanks V.V.Chaitanya!

          Show
          Daniel Dai added a comment - Manual test pass. Patch committed. Thanks V.V.Chaitanya!
          Hide
          Daniel Dai added a comment -

          Patch looks good. Will commit it if test pass.

          Show
          Daniel Dai added a comment - Patch looks good. Will commit it if test pass.
          Hide
          V.V.Chaitanya Krishna added a comment -

          Submitting the new patch to Hudson.

          Show
          V.V.Chaitanya Krishna added a comment - Submitting the new patch to Hudson.
          Hide
          V.V.Chaitanya Krishna added a comment -

          Uploading patch with Daniel's review comments incorporated.

          Show
          V.V.Chaitanya Krishna added a comment - Uploading patch with Daniel's review comments incorporated.
          Hide
          V.V.Chaitanya Krishna added a comment -

          My bad. Got carried away for a while

          Yes. It makes sense to have it as REQUIRED.

          Show
          V.V.Chaitanya Krishna added a comment - My bad. Got carried away for a while Yes. It makes sense to have it as REQUIRED.
          Hide
          Daniel Dai added a comment -

          Hi, V.V. Chaitanya,
          It is "ValueExpected.REQUIRED". It does not require that the option appeared in command line. But if it does, then you you must give a value after "-propertyfile/-P".

          Show
          Daniel Dai added a comment - Hi, V.V. Chaitanya, It is "ValueExpected.REQUIRED". It does not require that the option appeared in command line. But if it does, then you you must give a value after "-propertyfile/-P".
          Hide
          V.V.Chaitanya Krishna added a comment -

          I think value of "perpertyFile" perperty is not OPTIONAL, should change it to REQUIRED

          If the option is made mandatory for the user, then the following scenarios might occur:

          1. User who want to run just with the default properties and does not need any properties to be set will still be forced to submit a blank properties file.
          2. If this is made mandatory, then the presence of pig.properties might not make much sense. I believe this option should be an alternative to pig.properties in user providing properties.

          Ideally, I think users should be able to work without submitting their own set of properties.

          Show
          V.V.Chaitanya Krishna added a comment - I think value of "perpertyFile" perperty is not OPTIONAL, should change it to REQUIRED If the option is made mandatory for the user, then the following scenarios might occur: User who want to run just with the default properties and does not need any properties to be set will still be forced to submit a blank properties file. If this is made mandatory, then the presence of pig.properties might not make much sense. I believe this option should be an alternative to pig.properties in user providing properties. Ideally, I think users should be able to work without submitting their own set of properties.
          Hide
          Daniel Dai added a comment -

          I reviewed the patch. Command line properties file will override default properties, and we can have multiple number of -propertyFile entry in command line. Command line switch is -P or -propertyFile. That's good.

          I have a comment for the line:
          opts.registerOpt('P', "propertyFile", CmdLineParser.ValueExpected.OPTIONAL);

          I think value of "perpertyFile" perperty is not OPTIONAL, should change it to REQUIRED.

          Show
          Daniel Dai added a comment - I reviewed the patch. Command line properties file will override default properties, and we can have multiple number of -propertyFile entry in command line. Command line switch is -P or -propertyFile. That's good. I have a comment for the line: opts.registerOpt('P', "propertyFile", CmdLineParser.ValueExpected.OPTIONAL); I think value of "perpertyFile" perperty is not OPTIONAL, should change it to REQUIRED.
          Hide
          V.V.Chaitanya Krishna added a comment -

          Running through hudson.

          Show
          V.V.Chaitanya Krishna added a comment - Running through hudson.
          Hide
          V.V.Chaitanya Krishna added a comment -

          Uploading patch that implements option 1 of command-line option of providing a properties' file by user.

          The patch has the following changes:

          • Renaming PropertiesUtil.loadPropertiesFromFile to PropertiesUtil.loadDefaultProperties
          • Refactoring the code implementing the loading of properties from pig-default.properties and pig.properties (to avoid code duplication).
          • Extracting the code that loads properties from the deprecated .pigrc file. This makes it easier to use the method again to load properties from the user-specified properties' file.
          • load the properties from deprecated .pigrc file before the other default files (i.e., pig-default.properties and pig.properties). This will make the code simpler as we dont need to check for the existence of property before loading it from .pigrc file, because it will later get overriden by the value in pid-default.properties or pig.properties.
          Show
          V.V.Chaitanya Krishna added a comment - Uploading patch that implements option 1 of command-line option of providing a properties' file by user. The patch has the following changes: Renaming PropertiesUtil.loadPropertiesFromFile to PropertiesUtil.loadDefaultProperties Refactoring the code implementing the loading of properties from pig-default.properties and pig.properties (to avoid code duplication). Extracting the code that loads properties from the deprecated .pigrc file. This makes it easier to use the method again to load properties from the user-specified properties' file. load the properties from deprecated .pigrc file before the other default files (i.e., pig-default.properties and pig.properties). This will make the code simpler as we dont need to check for the existence of property before loading it from .pigrc file, because it will later get overriden by the value in pid-default.properties or pig.properties.
          Hide
          Daniel Dai added a comment -

          Great. Thanks V.V.Chaitanya!

          Show
          Daniel Dai added a comment - Great. Thanks V.V.Chaitanya!
          Hide
          V.V.Chaitanya Krishna added a comment -

          Apologies for coming in so late. I was on vacation and didnt have access to internet.
          I have done the coding part for option 1. I need to write unit test cases for it. I should be submitting the patch positively by saturday.

          Show
          V.V.Chaitanya Krishna added a comment - Apologies for coming in so late. I was on vacation and didnt have access to internet. I have done the coding part for option 1. I need to write unit test cases for it. I should be submitting the patch positively by saturday.
          Hide
          Daniel Dai added a comment -

          Option 2 committed to both trunk and branch 0.7.

          Show
          Daniel Dai added a comment - Option 2 committed to both trunk and branch 0.7.
          Hide
          Daniel Dai added a comment -

          Reattach the patch to address Ashutosh's comment.

          Show
          Daniel Dai added a comment - Reattach the patch to address Ashutosh's comment.
          Hide
          Ashutosh Chauhan added a comment -

          +1 on the changes.
          For completeness, we can also check in an empty pig.properties in the conf dir and then add comments in both pig.properties and pig-default.properties that if user wants to pass some properties doing it through pig-default.properties will have no effect and instead they should add extra properties they want to add/override in pig.properties.

          Show
          Ashutosh Chauhan added a comment - +1 on the changes. For completeness, we can also check in an empty pig.properties in the conf dir and then add comments in both pig.properties and pig-default.properties that if user wants to pass some properties doing it through pig-default.properties will have no effect and instead they should add extra properties they want to add/override in pig.properties.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12444335/PIG-1381-4.patch
          against trunk revision 943578.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/328/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/328/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/328/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12444335/PIG-1381-4.patch against trunk revision 943578. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/328/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/328/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/328/console This message is automatically generated.
          Hide
          Daniel Dai added a comment -

          It is out of sync due to recent check in. Resync it.

          Show
          Daniel Dai added a comment - It is out of sync due to recent check in. Resync it.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12444330/PIG-1381-3.patch
          against trunk revision 943578.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/315/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12444330/PIG-1381-3.patch against trunk revision 943578. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/315/console This message is automatically generated.
          Hide
          Daniel Dai added a comment -

          Address findbug warnings.

          Show
          Daniel Dai added a comment - Address findbug warnings.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12444240/PIG-1381-1.patch
          against trunk revision 943003.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/314/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/314/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/314/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12444240/PIG-1381-1.patch against trunk revision 943003. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/314/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/314/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/314/console This message is automatically generated.
          Hide
          Daniel Dai added a comment -

          Add unit tests.

          Show
          Daniel Dai added a comment - Add unit tests.
          Hide
          Daniel Dai added a comment -

          Attach PIG-1381-1.patch for option 2.

          Show
          Daniel Dai added a comment - Attach PIG-1381 -1.patch for option 2.
          Hide
          Daniel Dai added a comment -

          Hi, V.V.Chaitanya,
          Are you still working on this issue? We want to do both option 1 & 2. That is:
          1. We provide command line option -propertyfile <name>
          2. We read both pig-default.properties and pig.properties. pig.properties will override pig-default.properties. When release, we will bundle pig-default.properties in pig.jar and user can give a pig.properties in classpath

          Show
          Daniel Dai added a comment - Hi, V.V.Chaitanya, Are you still working on this issue? We want to do both option 1 & 2. That is: 1. We provide command line option -propertyfile <name> 2. We read both pig-default.properties and pig.properties. pig.properties will override pig-default.properties. When release, we will bundle pig-default.properties in pig.jar and user can give a pig.properties in classpath
          Hide
          Olga Natkovich added a comment -

          I would also like to propose that we address issue raised in https://issues.apache.org/jira/browse/PIG-602 as part of this JIRA.

          I think what is asked in PIG-602 can be accomplished by making the properties passed in available to all UDFs vi UDFContext. The UDFs can decide what they pull out.

          Show
          Olga Natkovich added a comment - I would also like to propose that we address issue raised in https://issues.apache.org/jira/browse/PIG-602 as part of this JIRA. I think what is asked in PIG-602 can be accomplished by making the properties passed in available to all UDFs vi UDFContext. The UDFs can decide what they pull out.
          Hide
          Daniel Dai added a comment -

          We have an internal discussion and decide to take option 1. In the command line, we take -propertyfile <name> for the alternative property file name. Pig will first load pig.properties, and then load alternative property file in classpath. Later entry will overwrite the previous seen one, ie. alternative property file will take precedence over pig.properties.

          Show
          Daniel Dai added a comment - We have an internal discussion and decide to take option 1. In the command line, we take -propertyfile <name> for the alternative property file name. Pig will first load pig.properties, and then load alternative property file in classpath. Later entry will overwrite the previous seen one, ie. alternative property file will take precedence over pig.properties.
          Hide
          Daniel Dai added a comment -

          For option 3, yes, we need to move key value pair to xml. I checked current pig.properties, seems moving all properties is straightforward. Option 2 and 3 are functionally equivalent. The advantage for 2 is simplicity. The advantage for 3 is more hadoop compliant, one work with hadoop will be clear what pig-default.xml and pig-site-xml means.

          Show
          Daniel Dai added a comment - For option 3, yes, we need to move key value pair to xml. I checked current pig.properties, seems moving all properties is straightforward. Option 2 and 3 are functionally equivalent. The advantage for 2 is simplicity. The advantage for 3 is more hadoop compliant, one work with hadoop will be clear what pig-default.xml and pig-site-xml means.
          Hide
          V.V.Chaitanya Krishna added a comment -

          I think key=value is easier for people to understand.

          Yes. I think it is a valid way of putting up properties. But I think xml format gives more structured-ness to the properties file, no? But however, this would also mean a significant code change in terms of the approach for loading the properties (things like xml parsing etc. might have to be taken care of).

          For option 2 I think we'd need to provide an empty pig.properties file (just as hadoop provides an empty hadoop-site.xml).

          We can provide the current pig.properties file itself, with the default values. So it would be somewhat analogous to hadoop-default.xml. I would also propose that this be bundled in the jar itself and thus avoid it to be changed by users. (because they would have a different properties file already to put up their settings).

          Thoughts?

          Show
          V.V.Chaitanya Krishna added a comment - I think key=value is easier for people to understand. Yes. I think it is a valid way of putting up properties. But I think xml format gives more structured-ness to the properties file, no? But however, this would also mean a significant code change in terms of the approach for loading the properties (things like xml parsing etc. might have to be taken care of). For option 2 I think we'd need to provide an empty pig.properties file (just as hadoop provides an empty hadoop-site.xml). We can provide the current pig.properties file itself, with the default values. So it would be somewhat analogous to hadoop-default.xml. I would also propose that this be bundled in the jar itself and thus avoid it to be changed by users. (because they would have a different properties file already to put up their settings). Thoughts?
          Hide
          Alan Gates added a comment -

          Option 3 also entails moving from key=value format to xml format, correct? Is there any value to the xml format beyond conformity with the rest of Hadoop? I think key=value is easier for people to understand.

          For option 2 I think we'd need to provide an empty pig.properties file (just as hadoop provides an empty hadoop-site.xml).

          Show
          Alan Gates added a comment - Option 3 also entails moving from key=value format to xml format, correct? Is there any value to the xml format beyond conformity with the rest of Hadoop? I think key=value is easier for people to understand. For option 2 I think we'd need to provide an empty pig.properties file (just as hadoop provides an empty hadoop-site.xml).
          Hide
          V.V.Chaitanya Krishna added a comment -

          Do we need to have two different property files ? One possibility is to not package pig.properties in the pig.jar and then include it in the classpath while invoking Pig. (We can modify pig shell script to include it in the path by default). Then, user can add/delete/modify the pig.properties as he wish as well override default properties.

          This might lead to overriding of some properties which might actually be unacceptable. Also, in the long run, we might want to have a configuration file with properties that are not supposed to be changed (similar to what happened in case of Hadoop project)

          Disadvantage of two property files, is sometimes its confusing which property is getting picked up (one in default or one in user specified). If there is only one property file, there is only one way to specify the properties to Pig which I think is better way of doing it.

          Since the processing of properties' files is sequential (i.e., one file after another), we can be sure that the latest occuring value is taken for a give property. For example, we can load the default properties' file first and then followed by the one in which users give their own set of properties. This way, we could provide preference to users' settings as well.

          Thoughts?

          Show
          V.V.Chaitanya Krishna added a comment - Do we need to have two different property files ? One possibility is to not package pig.properties in the pig.jar and then include it in the classpath while invoking Pig. (We can modify pig shell script to include it in the path by default). Then, user can add/delete/modify the pig.properties as he wish as well override default properties. This might lead to overriding of some properties which might actually be unacceptable. Also, in the long run, we might want to have a configuration file with properties that are not supposed to be changed (similar to what happened in case of Hadoop project) Disadvantage of two property files, is sometimes its confusing which property is getting picked up (one in default or one in user specified). If there is only one property file, there is only one way to specify the properties to Pig which I think is better way of doing it. Since the processing of properties' files is sequential (i.e., one file after another), we can be sure that the latest occuring value is taken for a give property. For example, we can load the default properties' file first and then followed by the one in which users give their own set of properties. This way, we could provide preference to users' settings as well. Thoughts?
          Hide
          Ashutosh Chauhan added a comment -

          Do we need to have two different property files ? One possibility is to not package pig.properties in the pig.jar and then include it in the classpath while invoking Pig. (We can modify pig shell script to include it in the path by default). Then, user can add/delete/modify the pig.properties as he wish as well override default properties.

          Disadvantage of two property files, is sometimes its confusing which property is getting picked up (one in default or one in user specified). If there is only one property file, there is only one way to specify the properties to Pig which I think is better way of doing it.

          Show
          Ashutosh Chauhan added a comment - Do we need to have two different property files ? One possibility is to not package pig.properties in the pig.jar and then include it in the classpath while invoking Pig. (We can modify pig shell script to include it in the path by default). Then, user can add/delete/modify the pig.properties as he wish as well override default properties. Disadvantage of two property files, is sometimes its confusing which property is getting picked up (one in default or one in user specified). If there is only one property file, there is only one way to specify the properties to Pig which I think is better way of doing it.
          Hide
          Daniel Dai added a comment -

          My vote is 3>2>1 (from high rank to low rank), opinions?

          Show
          Daniel Dai added a comment - My vote is 3>2>1 (from high rank to low rank), opinions?
          Hide
          V.V.Chaitanya Krishna added a comment -

          2. Change the name for default pig.properties to pig-default.properties, and user can give a pig.properties to override

          3. Further, can we consider to use pig-default.xml/pig-site.xml, which seems to be more natural for hadoop community. If so, we shall provide backward compatibility to also read pig.properties, pig-cluster-hadoop-site.xml.

          +1 to both the above approaches. In fact, I see that both are almost similar in the sense that user has the option to provide a set of properties in a file.

          Since this is something that pig source-code provides, i think it makes sense to keep the filename (the one in which users can specify their own properties) fixed, rather than having it as a command line option.

          Thoughts?

          Show
          V.V.Chaitanya Krishna added a comment - 2. Change the name for default pig.properties to pig-default.properties, and user can give a pig.properties to override 3. Further, can we consider to use pig-default.xml/pig-site.xml, which seems to be more natural for hadoop community. If so, we shall provide backward compatibility to also read pig.properties, pig-cluster-hadoop-site.xml. +1 to both the above approaches. In fact, I see that both are almost similar in the sense that user has the option to provide a set of properties in a file. Since this is something that pig source-code provides, i think it makes sense to keep the filename (the one in which users can specify their own properties) fixed, rather than having it as a command line option. Thoughts?

            People

            • Assignee:
              V.V.Chaitanya Krishna
              Reporter:
              Daniel Dai
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development