Hadoop Common / HADOOP-4631

Split the default configurations into 3 parts

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: conf
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Split hadoop-default.xml into core-default.xml, hdfs-default.xml and mapreduce-default.xml.

  Description

  We need to split hadoop-default.xml into core-default.xml, hdfs-default.xml and mapreduce-default.xml. That will enable us to split the project into 3 parts that have the defaults distributed with each component.

      1. 4631_v7.patch
        143 kB
        Sharad Agarwal
      2. 4631_v6.patch
        143 kB
        Sharad Agarwal
      3. 4631_v5.patch
        141 kB
        Sharad Agarwal
      4. 4631_v4.patch
        139 kB
        Sharad Agarwal
      5. 4631_v3.patch
        139 kB
        Sharad Agarwal
      6. 4631_v2.patch
        83 kB
        Sharad Agarwal
      7. 4631_v1.patch
        122 kB
        Sharad Agarwal

          Activity

          Doug Cutting added a comment -

          All default files should be disjoint, but must also be loaded before hadoop

          Here's a proposal:
          . add a static 'defaultResources' field to Configuration, whose default value is ["core-default.xml"];
          . add a new static method, Configuration#addDefaultResource();
          . in JobConf, add 'static { Configuration.addDefaultResource("mapred-default.xml"); }';
          . in DistributedFileSystem.java, NameNode.java and DataNode.java, add 'static { Configuration.addDefaultResource("hdfs-default.xml"); }'. We might factor this into a common base class or somesuch.

          When you first call FileSystem.get("hdfs:///", conf), your configuration won't yet have hdfs-specific default values, but, in the course of this call, they will be loaded, before any hdfs code references them. I think this is fine, since application code should not be directly referencing hdfs-specific values. If it needs to set them, it should do so through accessor methods, and these accessor methods should dereference an HDFS class and force the loading of the HDFS defaults. Does that make sense?
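
          A minimal sketch of the registration pattern described above, assuming Configuration gains the proposed static addDefaultResource() method; the class name and comments are illustrative, not the committed code:

              import org.apache.hadoop.conf.Configuration;

              // Illustrative stand-in for an HDFS entry-point class such as
              // DistributedFileSystem, NameNode or DataNode.
              public class HdfsEntryPoint {
                static {
                  // Runs once, when the class is first loaded, so hdfs-default.xml
                  // is registered before any HDFS code reads an hdfs-specific key.
                  Configuration.addDefaultResource("hdfs-default.xml");
                }
              }

              // JobConf would do the same for the MapReduce defaults:
              //   static { Configuration.addDefaultResource("mapred-default.xml"); }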

          Doug Cutting added a comment -

          To avoid confusion, default config files should come from the jar file, not from a conf directory. So we should never commit these new default files to conf/, but rather start with them in src/, copied to build/classes and from there into the jar. Thus this issue will also fix HADOOP-4617, no?

          Sharad Agarwal added a comment -

          When you first call FileSystem.get("hdfs:///", conf), your configuration won't yet have hdfs-specific default values, but, in the course of this call, they will be loaded, before any hdfs code references them. I think this is fine, since application code should not be directly referencing hdfs-specific values.

          For Configuration objects created prior to the loading of the HDFS defaults, the properties won't be reloaded. Currently, Configuration objects are created and the same objects are passed around. If those objects are later used to look up hdfs-specific values, they won't find them, no?
          For example: DistributedFileSystem uses a Configuration object created prior to the loading of DistributedFileSystem itself.

          Sharad Agarwal added a comment -

          I am thinking that instead of keeping the defaults in a static member, would it make sense to keep them in specific instances? This would separate things clearly, though it may require more changes.
          JobConf would do addResource("mapred-default.xml") in its constructor, and similarly we can create HDFSConf (extends Configuration) which would do addResource("hdfs-default.xml").
          The hdfs code would instantiate HDFSConf instead of Configuration. If a Configuration object is passed to it, for example to DistributedFileSystem, then it would wrap it in HDFSConf before looking for hdfs-specific values.
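
          A rough sketch of what this instance-based alternative might look like (HDFSConf is hypothetical and was never adopted; see the next comment):

              import org.apache.hadoop.conf.Configuration;

              // Hypothetical subclass that carries the HDFS defaults with each
              // instance instead of registering them globally.
              public class HDFSConf extends Configuration {
                public HDFSConf() {
                  addResource("hdfs-default.xml");
                }

                // Wrap a plain Configuration handed to HDFS code so that
                // hdfs-specific lookups also see the HDFS defaults.
                public HDFSConf(Configuration other) {
                  super(other);
                  addResource("hdfs-default.xml");
                }
              }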

          Doug Cutting added a comment -

          > For the Configuration objects created prior to loading of HDFS defaults, the properties won't be reloaded.

          Hmm. I guess you're right. We could keep a static WeakHashMap listing all existing Configuration instances, and cause them to be reloaded when the list of defaults changes?
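
          A minimal sketch of that registry idea, assuming a reloadConfiguration() hook that clears cached properties so they are re-read on the next access; class and method names here are illustrative only:

              import java.util.ArrayList;
              import java.util.Arrays;
              import java.util.List;
              import java.util.Map;
              import java.util.WeakHashMap;

              public class ConfRegistrySketch {
                // Weak keys: registering an instance here does not keep it alive.
                private static final Map<ConfRegistrySketch, Object> REGISTRY =
                    new WeakHashMap<ConfRegistrySketch, Object>();
                private static final List<String> defaultResources =
                    new ArrayList<String>(Arrays.asList("core-default.xml"));

                public ConfRegistrySketch() {
                  synchronized (ConfRegistrySketch.class) {
                    REGISTRY.put(this, null);
                  }
                }

                public static synchronized void addDefaultResource(String name) {
                  if (defaultResources.contains(name)) {
                    return;
                  }
                  defaultResources.add(name);
                  // Make every live instance re-read its resources, which now
                  // include the new default.
                  for (ConfRegistrySketch conf : REGISTRY.keySet()) {
                    conf.reloadConfiguration();
                  }
                }

                public void reloadConfiguration() {
                  // In the real Configuration this would clear the cached
                  // Properties so that defaults and user resources are loaded
                  // again, in order, on the next lookup.
                }
              }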

          > we can create HDFSConf (extends Configuration) [ ... ]

          That's a pattern we'd like to get away from. The problem is that applications need to be able to configure HDFS, MapReduce, Pig, etc., all with a single Configuration instance. Look, for example, at the next-generation MapReduce API in:

          http://svn.apache.org/repos/asf/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapreduce/Job.java

          Sharad Agarwal added a comment -

          This patch:
          . replaces hadoop-default|site.xml with core-default|site.xml, mapred-default|site.xml and hdfs-default|site.xml
          . keeps a registry of Configuration objects in Configuration; whenever a new default resource is added, all Configuration objects are reloaded

          TODO:
          . move the *-default.xml files to src
          . fix all the documentation referring to hadoop-default.xml/hadoop-site.xml
          . fix the code in contrib which uses hadoop-default.xml/hadoop-site.xml

          Doug Cutting added a comment -

          If we want to get this into 0.20, let's file a separate issue for the documentation, as documentation changes can be made after the feature freeze tomorrow. It would be good to get this into 0.20, no?

          Sharad Agarwal added a comment -

          It would be good to get this into 0.20, no?

          Apart from documentation, I think there are a few things which need to be ironed out before this goes in:

          • backward compatibility: should we keep hadoop-site.xml as well in conf and add it in the Configuration class as a default resource before core-site.xml? That should be sufficient for backward compatibility, right? Or will keeping it add to the confusion, so we should live with the incompatible change?
            People should not care about hadoop-default.xml anyway, so we don't need to keep it. Moving the defaults to the src folder should not have any impact on applications.
          • Applications can provide a conf file via the bin/hadoop -conf option. The provided conf file should override the defaults. In the current code, if hdfs or mapred code is linked in later, the default resources are added at the end of Configuration#resources. That would make the defaults override the user-provided config file, which is not correct. We can do:
            private void loadResources(Properties properties,
                                       ArrayList resources,
                                       boolean quiet) {
              if (loadDefaults) {
                // Load the default resources first ...
                for (String resource : defaultResources) {
                  loadResource(properties, resource, quiet);
                }
              }
              // ... so that user-supplied resources, loaded afterwards, override them.
              for (Object resource : resources) {
                loadResource(properties, resource, quiet);
              }
            }
            
          • currently contrib projects like hod generate hadoop-site.xml and use it in various scripts. I am not sure what impact this jira will have on such projects. Contrib test cases are very limited and not sufficient to catch problems, so every place where hadoop-site.xml is referred to in the code has to be analyzed closely.
          Sharad Agarwal added a comment -

          updated to trunk.
          moved *-default.xml files to the respective src folders
          fixed default resource loading as mentioned in the above comment
          made minor changes in failmon and chukwa

          Doug Cutting added a comment -

          Back-compatibility is a concern. I think we should still always look for and load hadoop-site.xml but not hadoop-default.xml. When we find and load hadoop-site.xml we should emit a warning.

          Sharad Agarwal added a comment -

          Attaching the patch for review.

          • supported backward compatibility. If hadoop-site.xml is found in the classpath, it is loaded, emitting a warning.
          • documentation updates will be done in a separate issue.
          • all the core tests are passing on my machine.
          Sharad Agarwal added a comment -

          updated to trunk and fixed a javadoc warning.

          Sharad Agarwal added a comment -

          updated to trunk

          Hemanth Yamijala added a comment -

          supported backward compatibility. If hadoop-site.xml is found in classpath it is loaded, emitting a warning.

          To give some time (read: past this release) to clients like HOD, which may be reasonably impacted by this change, can we keep this warning from becoming too verbose? That is, instead of printing it every time a configuration object is loaded, can we emit it only once?
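
          One simple way to emit the warning only once, sketched here with assumed names (not necessarily what the final patch does):

              import java.util.concurrent.atomic.AtomicBoolean;

              import org.apache.commons.logging.Log;
              import org.apache.commons.logging.LogFactory;

              class LegacySiteFileWarning {
                private static final Log LOG =
                    LogFactory.getLog(LegacySiteFileWarning.class);
                // Flipped on the first warning so later Configuration loads stay quiet.
                private static final AtomicBoolean warned = new AtomicBoolean(false);

                static void warnOnce() {
                  if (warned.compareAndSet(false, true)) {
                    LOG.warn("hadoop-site.xml found in the classpath. It is deprecated; "
                        + "use core-site.xml, mapred-site.xml and hdfs-site.xml instead.");
                  }
                }
              }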

          Devaraj Das added a comment -

          Some minor nits:
          1) Move the property dfs.block.replication from src/c++/libhdfs/tests/conf/core-site.xml to src/c++/libhdfs/tests/conf/hdfs-site.xml
          2) Have a static loading for the conf resources in JobTracker and TaskTracker as well

          Sharad Agarwal added a comment -

          Incorporated feedback by Hemanth and Devaraj.
          All tests and test-patch passed.

          Devaraj Das added a comment -

          I just committed this. Thanks, Sharad! (Please add a Release Note)

          Tsz Wo Nicholas Sze added a comment - - edited

          The forrest doc should be updated.

          Tsz Wo Nicholas Sze added a comment -

          These tests seem to be failing after the patch:

              [junit] Running org.apache.hadoop.fs.TestCopyFiles
              [junit] Tests run: 12, Failures: 2, Errors: 8, Time elapsed: 164.049 sec
              [junit] Running org.apache.hadoop.fs.TestDFSIO
              [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 10.468 sec
          ...
              [junit] Running org.apache.hadoop.fs.TestFileSystem
              [junit] Tests run: 5, Failures: 0, Errors: 2, Time elapsed: 20.726 sec
              [junit] Running org.apache.hadoop.fs.TestGetFileBlockLocations
              [junit] Tests run: 3, Failures: 0, Errors: 3, Time elapsed: 30.824 sec
          ...
              [junit] Running org.apache.hadoop.hdfs.TestDFSShell
              [junit] Tests run: 14, Failures: 1, Errors: 0, Time elapsed: 93.783 sec
           
          Chris Douglas added a comment -

          I reverted this. The test cases fail with the patch applied, and pass when it's reverted.

          Chris Douglas added a comment -

          The svn and git ignore lists should be updated as part of this patch, too.

          Sharad Agarwal added a comment -

          The tests are passing on my Linux machine with the patch. Also verified with the pre-revert revision 726075.

          The problem could be a user hadoop-site.xml overriding the test properties. With this patch, src/test/hadoop-site.xml is removed and the respective properties are moved to core-site.xml, hdfs-site.xml and mapred-site.xml.

          To overcome this problem, we can keep an empty src/test/hadoop-site.xml so that it overrides conf/hadoop-site.xml and user properties are not picked up while executing test cases.
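
          For reference, such an "empty" site file is just a configuration document with no properties, along these lines:

              <?xml version="1.0"?>
              <!-- Intentionally empty: present only so it shadows any
                   conf/hadoop-site.xml or user hadoop-site.xml on the
                   classpath while the tests run. -->
              <configuration>
              </configuration>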

          Sharad Agarwal added a comment -

          changes from previous patch:

          • Instead of removing them, keeps the following empty:
            src/test/hadoop-site.xml
            src/contrib/test/hadoop-site.xml
            src/c++/libhdfs/tests/conf/hadoop-site.xml
          • adds conf/core-site.xml, conf/hdfs-site.xml and conf/mapred-site.xml to .gitignore
          Sharad Agarwal added a comment -

          all tests and test-patch passed.

          Chris Douglas added a comment -

          I can no longer reproduce the test failures, not even after merging the original changes back in. Sharad's theory sounds like the most likely explanation.

          Nicholas, can you reproduce the failures you were seeing earlier?

          Tsz Wo Nicholas Sze added a comment -
               [exec] +1 overall.  
          
               [exec]     +1 @author.  The patch does not contain any @author tags.
          
               [exec]     +1 tests included.  The patch appears to include 51 new or modified tests.
          
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
          
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
          
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
          
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          

          Running unit tests locally.

          Tsz Wo Nicholas Sze added a comment -

          Only the following tests failed. I think they are not related to this patch.
          TestGlobalFilter: see HADOOP-4695
          TestJobTrackerRestart: see HADOOP-4220

          Tsz Wo Nicholas Sze added a comment -

          +1 for committing this.

          Chris Douglas added a comment -

          I just (re)committed this. Thanks, Sharad!

          Robert Chansler added a comment -

          Edit release note for publication.


            People

            • Assignee: Sharad Agarwal
            • Reporter: Owen O'Malley
            • Votes: 0
            • Watchers: 7
