Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-849

Renaming of configuration property names in mapreduce

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Hide
      Rename and categorize configuration keys into - cluster, jobtracker, tasktracker, job, client. Constants are defined for all keys in java and code is changed to use constants instead of direct strings. All old keys are deprecated except of examples and tests. The change is incompatible because support for old keys is not provided for config keys in examples.
      Show
      Rename and categorize configuration keys into - cluster, jobtracker, tasktracker, job, client. Constants are defined for all keys in java and code is changed to use constants instead of direct strings. All old keys are deprecated except of examples and tests. The change is incompatible because support for old keys is not provided for config keys in examples.

      Description

      In-line with HDFS-531, property names in configuration files should be standardized in MAPREDUCE.

      1. patch-849-3.txt
        641 kB
        Amareshwari Sriramadasu
      2. patch-849-2.txt
        654 kB
        Amareshwari Sriramadasu
      3. patch-849-1.txt
        654 kB
        Amareshwari Sriramadasu
      4. patch-849.txt
        658 kB
        Amareshwari Sriramadasu
      5. Config changes.xls
        44 kB
        Amareshwari Sriramadasu
      6. Config changes.xls
        45 kB
        Amareshwari Sriramadasu

        Issue Links

          Activity

          Amareshwari Sriramadasu created issue -
          Amareshwari Sriramadasu made changes -
          Field Original Value New Value
          Link This issue is blocked by HADOOP-6105 [ HADOOP-6105 ]
          Amareshwari Sriramadasu made changes -
          Link This issue is related to HDFS-531 [ HDFS-531 ]
          Hide
          Amareshwari Sriramadasu added a comment -

          Configuration properties in Mapreduce project are be catagorized into the following and suggested name for each catagory.

          Catagory Suggested Name
          Cluster config mapreduce.*
          JobTracker config mapreduce.jobtracker.*
          TaskTracker config mapreduce.tasktracker.*
          Job-level config mapreduce.job.*
          Task-level config mapreduce.task.*
          Map task config mapreduce.map.*
          Reduce task config mapreduce.reduce.*
          Job client config mapreduce.jobclient.*
          Pipes config mapreduce.pipes.*
          Lib config mapreduce.<libname>.*
          Example config mapreduce.<example-name>.*
          Test config mapreduce.test.*
          Streaming config mapreduce.streaming.* or streaming.*
          Contrib project config mapreduce.<contrib-project>.* or <contrib-project>.*

          Thoughts?

          Show
          Amareshwari Sriramadasu added a comment - Configuration properties in Mapreduce project are be catagorized into the following and suggested name for each catagory. Catagory Suggested Name Cluster config mapreduce.* JobTracker config mapreduce.jobtracker.* TaskTracker config mapreduce.tasktracker.* Job-level config mapreduce.job.* Task-level config mapreduce.task.* Map task config mapreduce.map.* Reduce task config mapreduce.reduce.* Job client config mapreduce.jobclient.* Pipes config mapreduce.pipes.* Lib config mapreduce.<libname>.* Example config mapreduce.<example-name>.* Test config mapreduce.test.* Streaming config mapreduce.streaming.* or streaming.* Contrib project config mapreduce.<contrib-project>.* or <contrib-project>.* Thoughts?
          Hide
          Vinod Kumar Vavilapalli added a comment -

          These names look a lot cleaner. +1 for the overall direction. But, we should also think of ways to continue doing this going forward even after this issue gets committed.

          While doing this, if we can create the corresponding java.lang.String property names, ala HADOOP-3583 , and use them everywhere, it will be real good. For e.g.,

          static final String MAPREDUCE_CLUSTER_EXAMPLE_CONFIG_PROPERTY = "mapreduce.cluster.example.config"
          

          Also, I think usage of strings like mapreduce.map.max.attempts and mapreduce.jobtracker.maxtasks.per.job should be discouraged in favour of mapreduce.map.max-attempts and mapreduce.jobtracker.maxtasks-per-job respectively. Thoughts about this?

          I am assuming that configuration related to sub-components should start with a prefix of the parent component. For e.g., mapred.healthChecker.script.args will be mapreduce.tasktracker.healthChecker.script-args . Right?

          Show
          Vinod Kumar Vavilapalli added a comment - These names look a lot cleaner. +1 for the overall direction. But, we should also think of ways to continue doing this going forward even after this issue gets committed. While doing this, if we can create the corresponding java.lang.String property names, ala HADOOP-3583 , and use them everywhere, it will be real good. For e.g., static final String MAPREDUCE_CLUSTER_EXAMPLE_CONFIG_PROPERTY = "mapreduce.cluster.example.config" Also, I think usage of strings like mapreduce.map.max.attempts and mapreduce.jobtracker.maxtasks.per.job should be discouraged in favour of mapreduce.map.max-attempts and mapreduce.jobtracker.maxtasks-per-job respectively. Thoughts about this? I am assuming that configuration related to sub-components should start with a prefix of the parent component. For e.g., mapred.healthChecker.script.args will be mapreduce.tasktracker.healthChecker.script-args . Right?
          Amareshwari Sriramadasu made changes -
          Link This issue is blocked by HADOOP-6199 [ HADOOP-6199 ]
          Hide
          Amareshwari Sriramadasu added a comment -

          I am assuming that configuration related to sub-components should start with a prefix of the parent component. For e.g., mapred.healthChecker.script.args will be mapreduce.tasktracker.healthChecker.script-args . Right?

          Yes. I will post a document which contains complete change-list of old name to new name.

          Show
          Amareshwari Sriramadasu added a comment - I am assuming that configuration related to sub-components should start with a prefix of the parent component. For e.g., mapred.healthChecker.script.args will be mapreduce.tasktracker.healthChecker.script-args . Right? Yes. I will post a document which contains complete change-list of old name to new name.
          Hide
          Amareshwari Sriramadasu added a comment -

          Attaching the document listing oldname and newname with proposed naming convention.

          Show
          Amareshwari Sriramadasu added a comment - Attaching the document listing oldname and newname with proposed naming convention.
          Amareshwari Sriramadasu made changes -
          Attachment Config changes.xls [ 12417007 ]
          Hide
          Amareshwari Sriramadasu added a comment -

          Updated the document after changing category of some keys. Also updated configuration names, which are added after the earlier document.

          Show
          Amareshwari Sriramadasu added a comment - Updated the document after changing category of some keys. Also updated configuration names, which are added after the earlier document.
          Amareshwari Sriramadasu made changes -
          Attachment Config changes.xls [ 12417856 ]
          Hide
          Amareshwari Sriramadasu added a comment -

          I propose, we add following interfaces for cluster configuration keys

          public Interface org.apache.hadoop.mapreduce.Configs {
          // holding cluster level configs keys.
          }
          
          public Interface org.apache.hadoop.mapreduce.server.jobtracker.JTConfigs {
          // holding jobtracker config keys
          }
          
          public Interface org.apache.hadoop.mapreduce.server.tasktracker.TTConfigs {
          // holding tasktracker config keys
          }
          

          All job-level keys will be added to org.apache.hadoop.mapreduce.JobContext as public static String.
          These new names will be used everywhere in the framework.

          Thoughts?

          Show
          Amareshwari Sriramadasu added a comment - I propose, we add following interfaces for cluster configuration keys public Interface org.apache.hadoop.mapreduce.Configs { // holding cluster level configs keys. } public Interface org.apache.hadoop.mapreduce.server.jobtracker.JTConfigs { // holding jobtracker config keys } public Interface org.apache.hadoop.mapreduce.server.tasktracker.TTConfigs { // holding tasktracker config keys } All job-level keys will be added to org.apache.hadoop.mapreduce.JobContext as public static String. These new names will be used everywhere in the framework. Thoughts?
          Hide
          Amareshwari Sriramadasu added a comment -

          All job-level keys will be added to org.apache.hadoop.mapreduce.JobContext as public static String.

          These keys will be protected static String, to be in consistent with existing keys.

          Show
          Amareshwari Sriramadasu added a comment - All job-level keys will be added to org.apache.hadoop.mapreduce.JobContext as public static String. These keys will be protected static String, to be in consistent with existing keys.
          Hide
          Amareshwari Sriramadasu added a comment -

          All job-level keys will be added to org.apache.hadoop.mapreduce.JobContext as public static String.

          These keys will be protected static String, to be in consistent with existing keys.

          My bad. Job level keys should be declared as public as for now, to access the keys everywhere. They cannot be protected Strings

          Show
          Amareshwari Sriramadasu added a comment - All job-level keys will be added to org.apache.hadoop.mapreduce.JobContext as public static String. These keys will be protected static String, to be in consistent with existing keys. My bad. Job level keys should be declared as public as for now, to access the keys everywhere. They cannot be protected Strings
          Hide
          Sharad Agarwal added a comment -

          For cluster level properties, I would prefer mapreduce.cluster.* instead of mapreduce.* as it won't be clear and would conflict with other namespaces. We should reserve the namespace for framework properties - mapreduce.cluster., mapreduce.jobtracker., mapreduce.tasktracker., mapreduce.client., mapreduce.job.* etc.

          Show
          Sharad Agarwal added a comment - For cluster level properties, I would prefer mapreduce.cluster.* instead of mapreduce.* as it won't be clear and would conflict with other namespaces. We should reserve the namespace for framework properties - mapreduce.cluster. , mapreduce.jobtracker. , mapreduce.tasktracker. , mapreduce.client. , mapreduce.job.* etc.
          Hide
          Amareshwari Sriramadasu added a comment -

          Patch doing the suggested renaming

          Show
          Amareshwari Sriramadasu added a comment - Patch doing the suggested renaming
          Amareshwari Sriramadasu made changes -
          Attachment patch-849.txt [ 12419867 ]
          Hide
          Amareshwari Sriramadasu added a comment -

          Patch incorporating offline comments from Sharad.

          Show
          Amareshwari Sriramadasu added a comment - Patch incorporating offline comments from Sharad.
          Amareshwari Sriramadasu made changes -
          Attachment patch-849-1.txt [ 12419888 ]
          Amareshwari Sriramadasu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Amareshwari Sriramadasu made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          Amareshwari Sriramadasu added a comment -

          Patch updated with trunk

          Show
          Amareshwari Sriramadasu added a comment - Patch updated with trunk
          Amareshwari Sriramadasu made changes -
          Attachment patch-849-2.txt [ 12419914 ]
          Amareshwari Sriramadasu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12419914/patch-849-2.txt
          against trunk revision 816240.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 396 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12419914/patch-849-2.txt against trunk revision 816240. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 396 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/99/console This message is automatically generated.
          Hide
          Sharad Agarwal added a comment -

          Glanced over the patch. Mostly looking fine. Some minor nits:
          To be consistent let all properties start with TT_ in TTConfig. Similarly for JTConfig, all should start from JT_
          JTConfig.JT could have a better name
          mapreduce.jobtracker.taskalloc.capacitypad is scheduler property. Should we rename it to mapreduce.jobtracker.scheduler.taskalloc.capacitypad to be more clear
          JTConfig.HOSTS and JTConfig.HOSTS_EXCLUDE takes file name. It will be better if we have that in the name. Something like mapreduce.jobtracker.hosts.exclude.filename
          In MRConfig, MAX_MAPMEMORY_MB and MAX_REDUCEMEMORY_MB are used only in jobtracker. so these should be moved to JTConfig
          I don't see any references of MRConfig.MAX_TRACKER_BLACKLISTS.

          Also currently mapred-default.xml doesn't document all the properties. We should file a documentation jira to add all missing properties in mapred-default.xml

          Show
          Sharad Agarwal added a comment - Glanced over the patch. Mostly looking fine. Some minor nits: To be consistent let all properties start with TT_ in TTConfig. Similarly for JTConfig, all should start from JT_ JTConfig.JT could have a better name mapreduce.jobtracker.taskalloc.capacitypad is scheduler property. Should we rename it to mapreduce.jobtracker.scheduler.taskalloc.capacitypad to be more clear JTConfig.HOSTS and JTConfig.HOSTS_EXCLUDE takes file name. It will be better if we have that in the name. Something like mapreduce.jobtracker.hosts.exclude.filename In MRConfig, MAX_MAPMEMORY_MB and MAX_REDUCEMEMORY_MB are used only in jobtracker. so these should be moved to JTConfig I don't see any references of MRConfig.MAX_TRACKER_BLACKLISTS. Also currently mapred-default.xml doesn't document all the properties. We should file a documentation jira to add all missing properties in mapred-default.xml
          Amareshwari Sriramadasu made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Hide
          Amareshwari Sriramadasu added a comment -

          Patch incorporating the review comments.

          Show
          Amareshwari Sriramadasu added a comment - Patch incorporating the review comments.
          Amareshwari Sriramadasu made changes -
          Attachment patch-849-3.txt [ 12420000 ]
          Amareshwari Sriramadasu made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop Flags [Incompatible change]
          Hide
          Sharad Agarwal added a comment -

          +1. Patch looks fine to me. I will commit once hudson passes.

          Show
          Sharad Agarwal added a comment - +1. Patch looks fine to me. I will commit once hudson passes.
          Hide
          Sharad Agarwal added a comment -

          Retrying hudson

          Show
          Sharad Agarwal added a comment - Retrying hudson
          Sharad Agarwal made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Sharad Agarwal made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Ravi Gummadi added a comment -

          Unit tests passed on my local machine.

          ant test-patch gave:

          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 396 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          Show
          Ravi Gummadi added a comment - Unit tests passed on my local machine. ant test-patch gave: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 396 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          Hide
          Sharad Agarwal added a comment -

          I just committed this. Thanks Amareshwari!

          Show
          Sharad Agarwal added a comment - I just committed this. Thanks Amareshwari!
          Sharad Agarwal made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Incompatible change] [Incompatible change, Reviewed]
          Resolution Fixed [ 1 ]
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #55 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/55/)
          . Rename configuration properties. Contributed by Amareshwari Sriramadasu.

          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #55 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/55/ ) . Rename configuration properties. Contributed by Amareshwari Sriramadasu.
          Hide
          Nigel Daley added a comment -

          No release note for this massively incompatible change?

          Show
          Nigel Daley added a comment - No release note for this massively incompatible change?
          Hide
          Hemanth Yamijala added a comment -

          Umm. I am confused. Why is this incompatible. We are using the backwards compatible feature provided in HADOOP-6105, right ? So old keys work just the same, they are deprecated thats all, isn't it ? Or did I miss something ?

          Show
          Hemanth Yamijala added a comment - Umm. I am confused. Why is this incompatible. We are using the backwards compatible feature provided in HADOOP-6105 , right ? So old keys work just the same, they are deprecated thats all, isn't it ? Or did I miss something ?
          Hide
          Sharad Agarwal added a comment -

          Why is this incompatible.

          We deprecated all old keys except of examples and tests. I assume for tests we don't need to bother. We made incompatible due to keys in examples not being deprecated.

          Show
          Sharad Agarwal added a comment - Why is this incompatible. We deprecated all old keys except of examples and tests. I assume for tests we don't need to bother. We made incompatible due to keys in examples not being deprecated.
          Sharad Agarwal made changes -
          Release Note Rename and categorize configuration keys into - cluster, jobtracker, tasktracker, job, client. Constants are defined for all keys in java and code is changed to use constants instead of direct strings. All old keys are deprecated except of examples and tests. The change is incompatible because support for old keys is not provided for config keys in examples.
          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hide
          Arun C Murthy added a comment -

          Ok, this patch has a big problem.

          Well, not the patch.

          The crux is that we shouldn't have added the new config knobs to mapred-default.xml. Configuration is setup to look for new config knobs when old ones are used, which is fine. However, the new config knobs take precedence over old ones and by setting new ones in mapred-default.xml the old ones are never found...

          Show
          Arun C Murthy added a comment - Ok, this patch has a big problem. Well, not the patch. The crux is that we shouldn't have added the new config knobs to mapred-default.xml. Configuration is setup to look for new config knobs when old ones are used, which is fine. However, the new config knobs take precedence over old ones and by setting new ones in mapred-default.xml the old ones are never found...
          Hide
          Arun C Murthy added a comment -

          Never mind. Amarsri educated me on the arcane vagaries of the system and why this isn't a problem... I basically ran into HADOOP-7287.

          Show
          Arun C Murthy added a comment - Never mind. Amarsri educated me on the arcane vagaries of the system and why this isn't a problem... I basically ran into HADOOP-7287 .
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Patch Available Patch Available Open Open
          20h 57m 3 Sharad Agarwal 18/Sep/09 12:24
          Open Open Patch Available Patch Available
          36d 5h 32m 4 Sharad Agarwal 18/Sep/09 12:24
          Patch Available Patch Available Resolved Resolved
          3h 54m 1 Sharad Agarwal 18/Sep/09 16:19
          Resolved Resolved Closed Closed
          340d 5h 55m 1 Tom White 24/Aug/10 22:15

            People

            • Assignee:
              Amareshwari Sriramadasu
              Reporter:
              Amareshwari Sriramadasu
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development