Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-236

properties specified on the command line are ignored

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.0.0
    • Fix Version/s: 0.1.0
    • Component/s: None
    • Labels:
      None

      Description

      Looks like code in src/org/apache/pig/impl/util/PropertiesUtil.java does not take system properties into account.

      1. pig-236.patch
        4 kB
        Pradeep Kamath

        Activity

        Hide
        pkamath Pradeep Kamath added a comment - - edited

        Currently the code is not looking at the "System properties" which include the command line "-D" overrides and hence the issue reported in this bug. Here is the plan to address this:

        First the pig.properties (and currently .pigrc - from comments in the code, there seems to be a plan to not support .pigrc in the future) file(s) will be read into the "properties" object. Then the "System properties" (from System.getProperties()) will be added into the properties object overriding any existing key/values.

        A repercussion of this is that any default java or system properties (like java.io.tmpdir) that a user wants to override should be supplied in the commandline and NOT in pig.properties or .pigrc

        Show
        pkamath Pradeep Kamath added a comment - - edited Currently the code is not looking at the "System properties" which include the command line "-D" overrides and hence the issue reported in this bug. Here is the plan to address this: First the pig.properties (and currently .pigrc - from comments in the code, there seems to be a plan to not support .pigrc in the future) file(s) will be read into the "properties" object. Then the "System properties" (from System.getProperties()) will be added into the properties object overriding any existing key/values. A repercussion of this is that any default java or system properties (like java.io.tmpdir) that a user wants to override should be supplied in the commandline and NOT in pig.properties or .pigrc
        Hide
        pi_song Pi Song added a comment -

        +1
        This will increase usability

        Show
        pi_song Pi Song added a comment - +1 This will increase usability
        Hide
        pkamath Pradeep Kamath added a comment -

        In the code, there were several changes made:
        1) In Main the earlier code was setting JOBTRACKER_LOCATION prematurely to "local" in the properties - this has been removed since this should be determined from hadoop-site.xml and then overriden by pig.properties - see the 3) below

        2) Changed PropertiesUtil.java to also include properties from System including command line overrides

        3) In HEexecutionEngine.java, for the non HOD case, the earlier code was only reading "properties" object (built from pig.properties/.pigrc and System.getProperties() which includes command line overrides). Now it first reads hadoop-site.xml and then adds on (over-writing existing keys) properties from the "properties" object. Towards this also added a utility function in ConfigurationUtil.java

        Show
        pkamath Pradeep Kamath added a comment - In the code, there were several changes made: 1) In Main the earlier code was setting JOBTRACKER_LOCATION prematurely to "local" in the properties - this has been removed since this should be determined from hadoop-site.xml and then overriden by pig.properties - see the 3) below 2) Changed PropertiesUtil.java to also include properties from System including command line overrides 3) In HEexecutionEngine.java, for the non HOD case, the earlier code was only reading "properties" object (built from pig.properties/.pigrc and System.getProperties() which includes command line overrides). Now it first reads hadoop-site.xml and then adds on (over-writing existing keys) properties from the "properties" object. Towards this also added a utility function in ConfigurationUtil.java
        Hide
        pkamath Pradeep Kamath added a comment -

        Patch file - details in comments section

        Show
        pkamath Pradeep Kamath added a comment - Patch file - details in comments section
        Hide
        pi_song Pi Song added a comment -

        Overall patch looks good except that I don't understand why you look up hadoop-site.xml in this location.

        configuration.addResource("/hadoop-site.xml");
        
        Show
        pi_song Pi Song added a comment - Overall patch looks good except that I don't understand why you look up hadoop-site.xml in this location. configuration.addResource("/hadoop-site.xml");
        Hide
        pkamath Pradeep Kamath added a comment -

        Hadoop looks for the named file in configuration.AddResource() in the classpath. So by specifying "/hadoop-site.xml" as the argument, we are letting the configuration.AddResource() pick it up from the first location in the classpath where such a file is present.

        Show
        pkamath Pradeep Kamath added a comment - Hadoop looks for the named file in configuration.AddResource() in the classpath. So by specifying "/hadoop-site.xml" as the argument, we are letting the configuration.AddResource() pick it up from the first location in the classpath where such a file is present.
        Hide
        pkamath Pradeep Kamath added a comment -

        Using "hadoop-site.xml" in configuration.addResource(); in HExecutionEngine.init(Properties) instead of "/hadoop-site.xml"

        Show
        pkamath Pradeep Kamath added a comment - Using "hadoop-site.xml" in configuration.addResource(); in HExecutionEngine.init(Properties) instead of "/hadoop-site.xml"
        Hide
        pkamath Pradeep Kamath added a comment -

        "/hadoop-site.xml" was confusing - so I changed it to "hadoop-site.xml" as indicated in previous comment and uploaded a new patch with this change.

        Show
        pkamath Pradeep Kamath added a comment - "/hadoop-site.xml" was confusing - so I changed it to "hadoop-site.xml" as indicated in previous comment and uploaded a new patch with this change.
        Hide
        pi_song Pi Song added a comment -

        +1

        We have two semantics here.
        1) Some configuration files have to be under

        {pig.home}

        /conf
        2) Some configuration files have to be under classpath.

        Possibly we should also fix the shell script to add conf in the classpath as well so that we can say "put everything under conf"

        Show
        pi_song Pi Song added a comment - +1 We have two semantics here. 1) Some configuration files have to be under {pig.home} /conf 2) Some configuration files have to be under classpath. Possibly we should also fix the shell script to add conf in the classpath as well so that we can say "put everything under conf"
        Hide
        pkamath Pradeep Kamath added a comment -

        The Configuration constructor by default handles hadoop-default.xml and hadoop-site.xml.

          public Configuration() {
            if (LOG.isDebugEnabled()) {
              LOG.debug(StringUtils.stringifyException(new IOException("config()")));
            }
            resources.add("hadoop-default.xml");
            resources.add("hadoop-site.xml");
          }
        
        

        Attaching a new patch to make use of this. The logic is the same as before, only with fewer changes to use the above fact.

        Show
        pkamath Pradeep Kamath added a comment - The Configuration constructor by default handles hadoop-default.xml and hadoop-site.xml. public Configuration() { if (LOG.isDebugEnabled()) { LOG.debug(StringUtils.stringifyException( new IOException( "config()" ))); } resources.add( "hadoop- default .xml" ); resources.add( "hadoop-site.xml" ); } Attaching a new patch to make use of this. The logic is the same as before, only with fewer changes to use the above fact.
        Hide
        amirhyoussefi Amir Youssefi added a comment -

        I thinks it's time we have a Wiki page for different configuration files and how they override each other:

        http://wiki.apache.org/pig/

        or

        http://wiki.apache.org/pig/FAQ

        Show
        amirhyoussefi Amir Youssefi added a comment - I thinks it's time we have a Wiki page for different configuration files and how they override each other: http://wiki.apache.org/pig/ or http://wiki.apache.org/pig/FAQ
        Hide
        olgan Olga Natkovich added a comment -

        I don't think we need to pick up config from hadoop-default. That suppose to be used by the backend only.

        Show
        olgan Olga Natkovich added a comment - I don't think we need to pick up config from hadoop-default. That suppose to be used by the backend only.
        Hide
        pi_song Pi Song added a comment -

        Like Pradeep mentioned, trying to read both "hadoop-default.xml" and "hadoop-site.xml" from classpath is a default behavior of Hadoop Configuration so we just have to live with it.

        Show
        pi_song Pi Song added a comment - Like Pradeep mentioned, trying to read both "hadoop-default.xml" and "hadoop-site.xml" from classpath is a default behavior of Hadoop Configuration so we just have to live with it.
        Hide
        alangates Alan Gates added a comment -

        These changes should be tested against a static cluster, against a cluster that uses HOD, and against a cluster that has been allocated with HOD. Pradeep, have you tested all three of those scenarios?

        Show
        alangates Alan Gates added a comment - These changes should be tested against a static cluster, against a cluster that uses HOD, and against a cluster that has been allocated with HOD. Pradeep, have you tested all three of those scenarios?
        Hide
        olgan Olga Natkovich added a comment -

        yes, pradeep and I tested all 3 of these cases together.

        Show
        olgan Olga Natkovich added a comment - yes, pradeep and I tested all 3 of these cases together.
        Hide
        alangates Alan Gates added a comment -

        Testing found that the fix did not work in the case where HOD was being used to allocate clusters. Pradeep will take a look at fixing this.

        Show
        alangates Alan Gates added a comment - Testing found that the fix did not work in the case where HOD was being used to allocate clusters. Pradeep will take a look at fixing this.
        Hide
        pkamath Pradeep Kamath added a comment -

        The issue with the earlier patch was that the following code was moved up only into the non-HOD case in the "if-else"

        configuration = ConfigurationUtil.toConfiguration(properties);
        

        Now this call is made again after the if-else in the snippet below - the comment
        explains the code in the snippet:

                ds = new HDataStorage(properties);
                        
                // The above HDataStorage constructor sets DEFAULT_REPLICATION_FACTOR_KEY in properties.
                // So we need to reconstruct the configuration object for the non HOD case
                // In the HOD case, this is the first time the configuration object will be created
                configuration = ConfigurationUtil.toConfiguration(properties);	
        
        
        Show
        pkamath Pradeep Kamath added a comment - The issue with the earlier patch was that the following code was moved up only into the non-HOD case in the "if-else" configuration = ConfigurationUtil.toConfiguration(properties); Now this call is made again after the if-else in the snippet below - the comment explains the code in the snippet: ds = new HDataStorage(properties); // The above HDataStorage constructor sets DEFAULT_REPLICATION_FACTOR_KEY in properties. // So we need to reconstruct the configuration object for the non HOD case // In the HOD case , this is the first time the configuration object will be created configuration = ConfigurationUtil.toConfiguration(properties);
        Hide
        alangates Alan Gates added a comment -

        Fix checked in at revision 655940. Tested this patch against static cluster, cluster using hod, and cluster allocated via hod.

        Show
        alangates Alan Gates added a comment - Fix checked in at revision 655940. Tested this patch against static cluster, cluster using hod, and cluster allocated via hod.

          People

          • Assignee:
            pkamath Pradeep Kamath
            Reporter:
            olgan Olga Natkovich
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development