Hadoop Map/Reduce / MAPREDUCE-768

Configuration information should generate dump in a standard format.

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Provides the ability to dump the jobtracker configuration in JSON format to standard output and then exit.
      To dump, use: hadoop jobtracker -dumpConfiguration
      The format of the dump is {"properties":[{"key":<key>,"value":<value>,"isFinal":<true/false>,"resource":<resource>}]}

      Description

      We need to generate the configuration dump in a standard format.

      1. MAPREDUCE-768-ydist-1.patch
        4 kB
        V.V.Chaitanya Krishna
      2. MAPREDUCE-768-ydist.patch
        6 kB
        V.V.Chaitanya Krishna
      3. MAPREDUCE-768-7.patch
        6 kB
        V.V.Chaitanya Krishna
      4. commands_manual.pdf
        45 kB
        V.V.Chaitanya Krishna
      5. MAPREDUCE-768-6.patch
        5 kB
        V.V.Chaitanya Krishna
      6. MAPREDUCE-768-5.patch
        3 kB
        V.V.Chaitanya Krishna
      7. jobtracker_configurationdump.txt
        16 kB
        V.V.Chaitanya Krishna
      8. MAPREDUCE-768-5.patch
        3 kB
        V.V.Chaitanya Krishna
      9. MAPREDUCE-768-4.patch
        3 kB
        V.V.Chaitanya Krishna
      10. MAPREDUCE-768-3.patch
        3 kB
        V.V.Chaitanya Krishna
      11. MAPREDUCE-768-2.patch
        3 kB
        V.V.Chaitanya Krishna
      12. MAPREDUCE-768-1.patch
        3 kB
        V.V.Chaitanya Krishna
      13. MAPREDUCE-768.patch
        1 kB
        V.V.Chaitanya Krishna

          Activity

          V.V.Chaitanya Krishna added a comment -

          The previous patch for Yahoo! internal distribution seems to be incompatible with the changes made in mapred-default.xml. Uploading patch for Yahoo! internal distribution with this issue resolved.

          Hemanth Yamijala added a comment -

          I committed this to trunk. Thanks, Chaitanya!

          Hemanth Yamijala added a comment -

          Manual tests have been run to verify that the command line works as expected. Regression testing has been done by starting the JT without the command-line option and verifying that it starts up as usual. Jobs submitted run fine.

          On the basis of these tests, I will commit the patch to trunk.

          V.V.Chaitanya Krishna added a comment -

          uploading patch for the internal Yahoo! distribution

          V.V.Chaitanya Krishna added a comment -

          tests and test-patch ran successfully.

          V.V.Chaitanya Krishna added a comment -

          Uploading the patch with changes needed for documentation done.

          V.V.Chaitanya Krishna added a comment -

          uploading the documentation in pdf format.

          V.V.Chaitanya Krishna added a comment -

          The mapred-default.xml contains two properties related to queues, which are also present in QueueManager's xml file. Uploading patch with these properties removed from mapred-default.xml to prevent duplication.

          V.V.Chaitanya Krishna added a comment -

          All the tests passed locally, i.e., for contrib and core.

          Hemanth Yamijala added a comment -

          +1 for code changes. Trying hudson.

          V.V.Chaitanya Krishna added a comment -

          re-uploading the patch for hudson to pick it up.

          V.V.Chaitanya Krishna added a comment -

          Uploading the file that contains the configuration dump produced when the -dumpConfiguration option is given.

          V.V.Chaitanya Krishna added a comment -

          Ran test-patch and the result is +1 for everything except test cases. Since this introduces a new startup parameter for the jobtracker, it has been verified manually.

          The result of test-patch is:

          +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
          [exec] Please justify why no new tests are needed for this patch.
          [exec] Also please list what manual steps were performed to verify this patch.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          [exec]

          V.V.Chaitanya Krishna added a comment -

          the JobTracker.dumpConfiguration has a System.out.println() which should actually be writer.write("\n"). Uploading patch with this correction.

          V.V.Chaitanya Krishna added a comment -

          uploading patch with javadoc corrected.

          Hemanth Yamijala added a comment -

          Javadocs out of sync in both the APIs JobTracker.dumpConfiguration and QueueManager.dumpConfiguration. Other than that, +1.

          V.V.Chaitanya Krishna added a comment -

          Also, the above points mentioned by Hemanth are taken care of in the new patch uploaded.

          V.V.Chaitanya Krishna added a comment -

          The patch is not compatible with the recent updates in mapreduce. Uploading patch with this issue resolved.

          Hemanth Yamijala added a comment -

          I think we need a new patch, because the one on the jira currently is not applying.

          But I briefly looked at the patch, and can think of a few minor comments:

          • I think JobTracker.dumpConfiguration should not take JobConf as a parameter. It should create one inside the call.
          • Similarly, QueueManager.dumpConfiguration should also not take a JobConf. Further, it should not load the default resources, because otherwise, the JobTracker's configuration would get dumped twice.
          Hemanth Yamijala added a comment -

          > Because Config can pull in JVM properties, you do need to do the expansion on the host that is using the configuration.

          The current scope of this JIRA is to do the dump on the host that is using the configuration. Hence, this is covered in HADOOP-6184.

          > It seems sensible to make this a general purpose Tools option, print my config to stdout, so that anyone using any tool can see the values.

          > It's also handy to be able to ask a remote service endpoint for their config - any node, master or slave, should be able to serve up the config to someone it trusts. Which introduces one small problem - only users with admin rights should be allowed to see the configurations, in case they contain passwords or other sensitive topics.

          These two are good points and I think we should do them as incremental work. I recommend we think about filing another JIRA for the same after this goes in.

          steve_l added a comment -

          In configuration management, getting a dump of resolved values is part of the "preflight" process: checking that all is well. The more validation you can get done before you go, the better.

          1. Because Config can pull in JVM properties, you do need to do the expansion on the host that is using the configuration.
          2. It seems sensible to make this a general purpose Tools option, print my config to stdout, so that anyone using any tool can see the values.
          3. It's also handy to be able to ask a remote service endpoint for their config - any node, master or slave, should be able to serve up the config to someone it trusts. Which introduces one small problem - only users with admin rights should be allowed to see the configurations, in case they contain passwords or other sensitive topics.
          Sreekanth Ramakrishnan added a comment -

          The changes in the patch look fine to me.
          +1 to patch.

          V.V.Chaitanya Krishna added a comment -

          uploading an updated patch with the above points considered. Since we are printing the usage only once, it is not being extracted into a different method.
          The other two suggestions are implemented.

          Sreekanth Ramakrishnan added a comment -

          Took a look at the patch:

          • Extract the printing of usage into a new method.
          • Change the usage string to "JobTracker [-dumpConfiguration]"
          • Change the current if else condition in JobTracker to do the following:
            if args.length == 0
              start jobtracker
            else
              if args[0] equals "-dumpConfiguration"
                 dump configuration
              else
                 print usage
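
The dispatch described in this comment could be sketched in Java roughly as follows. This is only an illustration under assumptions: the class JobTrackerArgsSketch, the Action enum and the parse method are hypothetical names and are not taken from the patch. The sketch reads the flag from the first argument and compares it with equals, since == on Java strings compares references rather than contents.

```java
public class JobTrackerArgsSketch {
    /** What the process should do for a given command line (hypothetical enum). */
    public enum Action { START, DUMP_CONFIGURATION, PRINT_USAGE }

    /** Usage string as suggested in the review comment. */
    public static final String USAGE = "JobTracker [-dumpConfiguration]";

    /** Maps the raw arguments to an action without performing it. */
    public static Action parse(String[] args) {
        if (args.length == 0) {
            return Action.START;                 // no options: start as usual
        }
        if (args.length == 1 && "-dumpConfiguration".equals(args[0])) {
            return Action.DUMP_CONFIGURATION;    // dump and exit
        }
        return Action.PRINT_USAGE;               // anything else is an error
    }

    public static void main(String[] args) {
        switch (parse(args)) {
            case START:
                System.out.println("would start the jobtracker");
                break;
            case DUMP_CONFIGURATION:
                System.out.println("would dump the configuration and exit");
                break;
            default:
                System.err.println("Usage: " + USAGE);
        }
    }
}
```

Keeping the decision in a side-effect-free parse method makes the three branches easy to check without starting a jobtracker.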
            
          V.V.Chaitanya Krishna added a comment -

          Uploading a patch with JobTracker.java and QueueManager.java modified to support printing the configuration properties in JSON format.

          V.V.Chaitanya Krishna added a comment -

          Example for the previous comment:
          in mapred-config.sh:

          export HADOOP_JOBTRACKER_OPTS="-Dmapred.jobtracker.dumpconfiguration=true"

          V.V.Chaitanya Krishna added a comment -

          Users can enable the output of configuration properties in JSON format by setting an environment variable. For example, in mapred-config.sh, one can set mapred.jobtracker.dumpconfiguration to true in order to get the JSON string of properties printed to the standard output stream.

          V.V.Chaitanya Krishna added a comment -

          Uploaded a patch which handles the output of various properties in JSON format to the standard output stream.
          It requires the patch from HADOOP-6184.

          rahul k singh added a comment -

          The current motivation is to allow administrators to look at the configuration, as errors in configuration have evaded detection for a long time. The description field would add more verbosity to the information.

          The idea here is that administrators would use this JSON dump format and write their own validators. Not sure if the description would be of much use in those scenarios.

          Sreekanth Ramakrishnan added a comment -

          The Resource field of a configuration property is the last resource from which the property's value was loaded. The motivation for this field is to let administrators know if they have accidentally overridden a property they didn't mean to.

          Doug Cutting added a comment -

          What values can the "Resource" field take? What fields of a property are required? Should we permit a "description" field?

          V.V.Chaitanya Krishna added a comment -

          The following is the proposal for the structure of the JSON object:

          { "Properties" :
            [
              {"Key" : <key>, "Value" : <value>, "isFinal" : <true/false>, "Resource" : <resource>}
            ]
          }

          JsonObject
            Property[] properties
          Property
            Key
            Value
            isFinal
            Resource

          The object contains a list of properties, with each element of the array being a property with the attributes Key, Value, isFinal and Resource.
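
A minimal Java sketch of how a dump with this structure could be produced is given here for illustration. It is not the code from the patch: the class ConfigDumpSketch, its nested Property class and the sample property in main are hypothetical, and the hand-written string escaping stands in for whatever JSON library the real implementation uses. The field names follow the lowercase form given in the release note ("properties", "key", "value", "isFinal", "resource") rather than the capitalized names proposed in this comment.

```java
import java.util.ArrayList;
import java.util.List;

public class ConfigDumpSketch {
    /** One entry of the dump (hypothetical holder class). */
    public static final class Property {
        final String key;
        final String value;
        final boolean isFinal;
        final String resource;

        public Property(String key, String value, boolean isFinal, String resource) {
            this.key = key;
            this.value = value;
            this.isFinal = isFinal;
            this.resource = resource;
        }
    }

    /** Wraps a string in double quotes, escaping characters JSON does not allow raw. */
    public static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // remaining control characters become 4-digit hex escapes
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes the properties as {"properties":[{...},{...}]}. */
    public static String toJson(List<Property> props) {
        StringBuilder sb = new StringBuilder("{\"properties\":[");
        for (int i = 0; i < props.size(); i++) {
            Property p = props.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"key\":").append(quote(p.key))
              .append(",\"value\":").append(quote(p.value))
              .append(",\"isFinal\":").append(p.isFinal)
              .append(",\"resource\":").append(quote(p.resource))
              .append('}');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        List<Property> props = new ArrayList<>();
        // sample entry, made up for the example
        props.add(new Property("mapred.job.tracker", "localhost:9001", false, "mapred-site.xml"));
        System.out.println(toJson(props));
    }
}
```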

          V.V.Chaitanya Krishna added a comment -

          Proposal:

          • Create a class which can dump the configuration parameters in the desired format (XML/JSON).
          • Store, in the Configuration object, a key-to-resource mapping that records which resource most recently set each key. This information is stored only when the user asks for a configuration dump.
          • Each entry in the dump would consist of the key, value, final flag and the resource that most recently set the key.
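
The key-to-resource mapping in the second point can be illustrated with a small Java sketch. It rests on assumptions: ResourceTrackingSketch and its methods are hypothetical and do not reflect how Configuration is actually implemented, resources are modelled as plain maps loaded in order, and handling of final properties is left out to keep the example short. Keys that were never loaded report "Unknown", in line with the suggestion elsewhere in this thread to mark the resource unknown when it is not available.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class ResourceTrackingSketch {
    // current value of each key
    private final Map<String, String> values = new LinkedHashMap<>();
    // name of the resource that most recently set each key
    private final Map<String, String> updatingResource = new LinkedHashMap<>();

    /** Loads one resource; later loads override earlier ones for the same key. */
    public void load(String resourceName, Map<String, String> entries) {
        for (Map.Entry<String, String> e : entries.entrySet()) {
            values.put(e.getKey(), e.getValue());
            updatingResource.put(e.getKey(), resourceName);
        }
    }

    public String get(String key) {
        return values.get(key);
    }

    /** Returns the resource that last set the key, or "Unknown" if none did. */
    public String resourceOf(String key) {
        return updatingResource.getOrDefault(key, "Unknown");
    }

    public static void main(String[] args) {
        ResourceTrackingSketch conf = new ResourceTrackingSketch();
        conf.load("mapred-default.xml", Collections.singletonMap("a.b", "1"));
        conf.load("mapred-site.xml", Collections.singletonMap("a.b", "2"));
        System.out.println(conf.get("a.b") + " from " + conf.resourceOf("a.b"));
    }
}
```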
          rahul k singh added a comment -

          Proposal:

          • Generate a configuration dump in JSON format.
          • The dump consists of key, value and final flag.

          Good to have:

          • Information regarding the resource or filename that a given value came from, or mark it unknown.
          rahul k singh added a comment -

          In addition to the comment above:

          The new configuration dump would help users know which values they have set wrongly and which values are actually being used.

          rahul k singh added a comment -

          If users give an improper key, they run into issues and it takes a long time to understand the behaviour.
          For example: if "a.p" is used as the key instead of "a.b", "a.b" silently keeps its default value, and it takes significant time to find out what's happening.

          If there is a need for automation, the dump should be in a standard format.
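
As an illustration of the kind of automated check a standard dump format would allow, here is a small Java sketch that flags keys in a dump that are not in a list of known keys, which would catch the "a.p" versus "a.b" mistake above. The class UnknownKeyCheckSketch, the unknownKeys method and the sample keys are all hypothetical; a real validator would first parse the keys out of the JSON dump and would need its own source of known keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnknownKeyCheckSketch {
    /** Returns, in sorted order, the dumped keys that are not in the known set. */
    public static List<String> unknownKeys(Collection<String> dumpedKeys, Set<String> knownKeys) {
        List<String> unknown = new ArrayList<>();
        for (String key : dumpedKeys) {
            if (!knownKeys.contains(key)) {
                unknown.add(key);
            }
        }
        Collections.sort(unknown);
        return unknown;
    }

    public static void main(String[] args) {
        // keys as they might appear in a dump; "a.p" is the mistyped one
        List<String> dumped = Arrays.asList("a.p", "c.d");
        Set<String> known = new HashSet<>(Arrays.asList("a.b", "c.d"));
        System.out.println(unknownKeys(dumped, known)); // prints [a.p]
    }
}
```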

          Arun C Murthy added a comment -

          Use case?

          rahul k singh added a comment -

          Currently, configuration keys fall into the following categories, based on the state of the values being used:

          1. final (configuration keys which can't be changed)
          2. overridden (configuration keys which are overridden at runtime)
          3. default (keys whose default values are being used)
          4. all

          So there is a requirement to generate the configuration dump based on the above categories, and in a standard format.


            People

            • Assignee:
              V.V.Chaitanya Krishna
              Reporter:
              rahul k singh
            • Votes:
              0
              Watchers:
              5
