Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-849

Renaming of configuration property names in mapreduce

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Hide
      Rename and categorize configuration keys into - cluster, jobtracker, tasktracker, job, client. Constants are defined for all keys in java and code is changed to use constants instead of direct strings. All old keys are deprecated except of examples and tests. The change is incompatible because support for old keys is not provided for config keys in examples.
      Show
      Rename and categorize configuration keys into - cluster, jobtracker, tasktracker, job, client. Constants are defined for all keys in java and code is changed to use constants instead of direct strings. All old keys are deprecated except of examples and tests. The change is incompatible because support for old keys is not provided for config keys in examples.

      Description

      In-line with HDFS-531, property names in configuration files should be standardized in MAPREDUCE.

      1. patch-849-3.txt
        641 kB
        Amareshwari Sriramadasu
      2. patch-849-2.txt
        654 kB
        Amareshwari Sriramadasu
      3. patch-849-1.txt
        654 kB
        Amareshwari Sriramadasu
      4. patch-849.txt
        658 kB
        Amareshwari Sriramadasu
      5. Config changes.xls
        45 kB
        Amareshwari Sriramadasu
      6. Config changes.xls
        44 kB
        Amareshwari Sriramadasu

        Issue Links

          Activity

          Hide
          Amareshwari Sriramadasu added a comment -

          Configuration properties in Mapreduce project are be catagorized into the following and suggested name for each catagory.

          Catagory Suggested Name
          Cluster config mapreduce.*
          JobTracker config mapreduce.jobtracker.*
          TaskTracker config mapreduce.tasktracker.*
          Job-level config mapreduce.job.*
          Task-level config mapreduce.task.*
          Map task config mapreduce.map.*
          Reduce task config mapreduce.reduce.*
          Job client config mapreduce.jobclient.*
          Pipes config mapreduce.pipes.*
          Lib config mapreduce.<libname>.*
          Example config mapreduce.<example-name>.*
          Test config mapreduce.test.*
          Streaming config mapreduce.streaming.* or streaming.*
          Contrib project config mapreduce.<contrib-project>.* or <contrib-project>.*

          Thoughts?

          Show
          Amareshwari Sriramadasu added a comment - Configuration properties in Mapreduce project are be catagorized into the following and suggested name for each catagory. Catagory Suggested Name Cluster config mapreduce.* JobTracker config mapreduce.jobtracker.* TaskTracker config mapreduce.tasktracker.* Job-level config mapreduce.job.* Task-level config mapreduce.task.* Map task config mapreduce.map.* Reduce task config mapreduce.reduce.* Job client config mapreduce.jobclient.* Pipes config mapreduce.pipes.* Lib config mapreduce.<libname>.* Example config mapreduce.<example-name>.* Test config mapreduce.test.* Streaming config mapreduce.streaming.* or streaming.* Contrib project config mapreduce.<contrib-project>.* or <contrib-project>.* Thoughts?
          Hide
          Vinod Kumar Vavilapalli added a comment -

          These names look a lot cleaner. +1 for the overall direction. But, we should also think of ways to continue doing this going forward even after this issue gets committed.

          While doing this, if we can create the corresponding java.lang.String property names, ala HADOOP-3583 , and use them everywhere, it will be real good. For e.g.,

          static final String MAPREDUCE_CLUSTER_EXAMPLE_CONFIG_PROPERTY = "mapreduce.cluster.example.config"
          

          Also, I think usage of strings like mapreduce.map.max.attempts and mapreduce.jobtracker.maxtasks.per.job should be discouraged in favour of mapreduce.map.max-attempts and mapreduce.jobtracker.maxtasks-per-job respectively. Thoughts about this?

          I am assuming that configuration related to sub-components should start with a prefix of the parent component. For e.g., mapred.healthChecker.script.args will be mapreduce.tasktracker.healthChecker.script-args . Right?

          Show
          Vinod Kumar Vavilapalli added a comment - These names look a lot cleaner. +1 for the overall direction. But, we should also think of ways to continue doing this going forward even after this issue gets committed. While doing this, if we can create the corresponding java.lang.String property names, ala HADOOP-3583 , and use them everywhere, it will be real good. For e.g., static final String MAPREDUCE_CLUSTER_EXAMPLE_CONFIG_PROPERTY = "mapreduce.cluster.example.config" Also, I think usage of strings like mapreduce.map.max.attempts and mapreduce.jobtracker.maxtasks.per.job should be discouraged in favour of mapreduce.map.max-attempts and mapreduce.jobtracker.maxtasks-per-job respectively. Thoughts about this? I am assuming that configuration related to sub-components should start with a prefix of the parent component. For e.g., mapred.healthChecker.script.args will be mapreduce.tasktracker.healthChecker.script-args . Right?
          Hide
          Amareshwari Sriramadasu added a comment -

          I am assuming that configuration related to sub-components should start with a prefix of the parent component. For e.g., mapred.healthChecker.script.args will be mapreduce.tasktracker.healthChecker.script-args . Right?

          Yes. I will post a document which contains complete change-list of old name to new name.

          Show
          Amareshwari Sriramadasu added a comment - I am assuming that configuration related to sub-components should start with a prefix of the parent component. For e.g., mapred.healthChecker.script.args will be mapreduce.tasktracker.healthChecker.script-args . Right? Yes. I will post a document which contains complete change-list of old name to new name.
          Hide
          Amareshwari Sriramadasu added a comment -

          Attaching the document listing oldname and newname with proposed naming convention.

          Show
          Amareshwari Sriramadasu added a comment - Attaching the document listing oldname and newname with proposed naming convention.
          Hide
          Amareshwari Sriramadasu added a comment -

          Updated the document after changing category of some keys. Also updated configuration names, which are added after the earlier document.

          Show
          Amareshwari Sriramadasu added a comment - Updated the document after changing category of some keys. Also updated configuration names, which are added after the earlier document.
          Hide
          Amareshwari Sriramadasu added a comment -

          I propose, we add following interfaces for cluster configuration keys

          public Interface org.apache.hadoop.mapreduce.Configs {
          // holding cluster level configs keys.
          }
          
          public Interface org.apache.hadoop.mapreduce.server.jobtracker.JTConfigs {
          // holding jobtracker config keys
          }
          
          public Interface org.apache.hadoop.mapreduce.server.tasktracker.TTConfigs {
          // holding tasktracker config keys
          }
          

          All job-level keys will be added to org.apache.hadoop.mapreduce.JobContext as public static String.
          These new names will be used everywhere in the framework.

          Thoughts?

          Show
          Amareshwari Sriramadasu added a comment - I propose, we add following interfaces for cluster configuration keys public Interface org.apache.hadoop.mapreduce.Configs { // holding cluster level configs keys. } public Interface org.apache.hadoop.mapreduce.server.jobtracker.JTConfigs { // holding jobtracker config keys } public Interface org.apache.hadoop.mapreduce.server.tasktracker.TTConfigs { // holding tasktracker config keys } All job-level keys will be added to org.apache.hadoop.mapreduce.JobContext as public static String. These new names will be used everywhere in the framework. Thoughts?
          Hide
          Amareshwari Sriramadasu added a comment -

          All job-level keys will be added to org.apache.hadoop.mapreduce.JobContext as public static String.

          These keys will be protected static String, to be in consistent with existing keys.

          Show
          Amareshwari Sriramadasu added a comment - All job-level keys will be added to org.apache.hadoop.mapreduce.JobContext as public static String. These keys will be protected static String, to be in consistent with existing keys.
          Hide
          Amareshwari Sriramadasu added a comment -

          All job-level keys will be added to org.apache.hadoop.mapreduce.JobContext as public static String.

          These keys will be protected static String, to be in consistent with existing keys.

          My bad. Job level keys should be declared as public as for now, to access the keys everywhere. They cannot be protected Strings

          Show
          Amareshwari Sriramadasu added a comment - All job-level keys will be added to org.apache.hadoop.mapreduce.JobContext as public static String. These keys will be protected static String, to be in consistent with existing keys. My bad. Job level keys should be declared as public as for now, to access the keys everywhere. They cannot be protected Strings
          Hide
          Sharad Agarwal added a comment -

          For cluster level properties, I would prefer mapreduce.cluster.* instead of mapreduce.* as it won't be clear and would conflict with other namespaces. We should reserve the namespace for framework properties - mapreduce.cluster., mapreduce.jobtracker., mapreduce.tasktracker., mapreduce.client., mapreduce.job.* etc.

          Show
          Sharad Agarwal added a comment - For cluster level properties, I would prefer mapreduce.cluster.* instead of mapreduce.* as it won't be clear and would conflict with other namespaces. We should reserve the namespace for framework properties - mapreduce.cluster. , mapreduce.jobtracker. , mapreduce.tasktracker. , mapreduce.client. , mapreduce.job.* etc.
          Hide
          Amareshwari Sriramadasu added a comment -

          Patch doing the suggested renaming

          Show
          Amareshwari Sriramadasu added a comment - Patch doing the suggested renaming
          Hide
          Amareshwari Sriramadasu added a comment -

          Patch incorporating offline comments from Sharad.

          Show
          Amareshwari Sriramadasu added a comment - Patch incorporating offline comments from Sharad.
          Hide
          Amareshwari Sriramadasu added a comment -

          Patch updated with trunk

          Show
          Amareshwari Sriramadasu added a comment - Patch updated with trunk
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419914/patch-849-2.txt
          against trunk revision 816240.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 396 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12419914/patch-849-2.txt against trunk revision 816240. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 396 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/console This message is automatically generated.
          Hide
          Sharad Agarwal added a comment -

          Glanced over the patch. Mostly looking fine. Some minor nits:
          To be consistent let all properties start with TT_ in TTConfig. Similarly for JTConfig, all should start from JT_
          JTConfig.JT could have a better name
          mapreduce.jobtracker.taskalloc.capacitypad is scheduler property. Should we rename it to mapreduce.jobtracker.scheduler.taskalloc.capacitypad to be more clear
          JTConfig.HOSTS and JTConfig.HOSTS_EXCLUDE takes file name. It will be better if we have that in the name. Something like mapreduce.jobtracker.hosts.exclude.filename
          In MRConfig, MAX_MAPMEMORY_MB and MAX_REDUCEMEMORY_MB are used only in jobtracker. so these should be moved to JTConfig
          I don't see any references of MRConfig.MAX_TRACKER_BLACKLISTS.

          Also currently mapred-default.xml doesn't document all the properties. We should file a documentation jira to add all missing properties in mapred-default.xml

          Show
          Sharad Agarwal added a comment - Glanced over the patch. Mostly looking fine. Some minor nits: To be consistent let all properties start with TT_ in TTConfig. Similarly for JTConfig, all should start from JT_ JTConfig.JT could have a better name mapreduce.jobtracker.taskalloc.capacitypad is scheduler property. Should we rename it to mapreduce.jobtracker.scheduler.taskalloc.capacitypad to be more clear JTConfig.HOSTS and JTConfig.HOSTS_EXCLUDE takes file name. It will be better if we have that in the name. Something like mapreduce.jobtracker.hosts.exclude.filename In MRConfig, MAX_MAPMEMORY_MB and MAX_REDUCEMEMORY_MB are used only in jobtracker. so these should be moved to JTConfig I don't see any references of MRConfig.MAX_TRACKER_BLACKLISTS. Also currently mapred-default.xml doesn't document all the properties. We should file a documentation jira to add all missing properties in mapred-default.xml
          Hide
          Amareshwari Sriramadasu added a comment -

          Patch incorporating the review comments.

          Show
          Amareshwari Sriramadasu added a comment - Patch incorporating the review comments.
          Hide
          Sharad Agarwal added a comment -

          +1. Patch looks fine to me. I will commit once hudson passes.

          Show
          Sharad Agarwal added a comment - +1. Patch looks fine to me. I will commit once hudson passes.
          Hide
          Sharad Agarwal added a comment -

          Retrying hudson

          Show
          Sharad Agarwal added a comment - Retrying hudson
          Hide
          Ravi Gummadi added a comment -

          Unit tests passed on my local machine.

          ant test-patch gave:

          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 396 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Show
          Ravi Gummadi added a comment - Unit tests passed on my local machine. ant test-patch gave: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 396 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          Hide
          Sharad Agarwal added a comment -

          I just committed this. Thanks Amareshwari!

          Show
          Sharad Agarwal added a comment - I just committed this. Thanks Amareshwari!
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #55 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/55/)
          . Rename configuration properties. Contributed by Amareshwari Sriramadasu.

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #55 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/55/ ) . Rename configuration properties. Contributed by Amareshwari Sriramadasu.
          Hide
          Nigel Daley added a comment -

          No release note for this massively incompatible change?

          Show
          Nigel Daley added a comment - No release note for this massively incompatible change?
          Hide
          Hemanth Yamijala added a comment -

          Umm. I am confused. Why is this incompatible. We are using the backwards compatible feature provided in HADOOP-6105, right ? So old keys work just the same, they are deprecated thats all, isn't it ? Or did I miss something ?

          Show
          Hemanth Yamijala added a comment - Umm. I am confused. Why is this incompatible. We are using the backwards compatible feature provided in HADOOP-6105 , right ? So old keys work just the same, they are deprecated thats all, isn't it ? Or did I miss something ?
          Hide
          Sharad Agarwal added a comment -

          Why is this incompatible.

          We deprecated all old keys except of examples and tests. I assume for tests we don't need to bother. We made incompatible due to keys in examples not being deprecated.

          Show
          Sharad Agarwal added a comment - Why is this incompatible. We deprecated all old keys except of examples and tests. I assume for tests we don't need to bother. We made incompatible due to keys in examples not being deprecated.
          Hide
          Arun C Murthy added a comment -

          Ok, this patch has a big problem.

          Well, not the patch.

          The crux is that we shouldn't have added the new config knobs to mapred-default.xml. Configuration is setup to look for new config knobs when old ones are used, which is fine. However, the new config knobs take precedence over old ones and by setting new ones in mapred-default.xml the old ones are never found...

          Show
          Arun C Murthy added a comment - Ok, this patch has a big problem. Well, not the patch. The crux is that we shouldn't have added the new config knobs to mapred-default.xml. Configuration is setup to look for new config knobs when old ones are used, which is fine. However, the new config knobs take precedence over old ones and by setting new ones in mapred-default.xml the old ones are never found...
          Hide
          Arun C Murthy added a comment -

          Never mind. Amarsri educated me on the arcane vagaries of the system and why this isn't a problem... I basically ran into HADOOP-7287.

          Show
          Arun C Murthy added a comment - Never mind. Amarsri educated me on the arcane vagaries of the system and why this isn't a problem... I basically ran into HADOOP-7287 .

            People

            • Assignee:
              Amareshwari Sriramadasu
              Reporter:
              Amareshwari Sriramadasu
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development