Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.17.0
    • Component/s: scripts
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      New environment variables were introduced to allow finer grained control of Java options passed to server and client JVMs. See the new *_OPTS variables in conf/hadoop-env.sh.

      Description

      We often configure our HADOOP_OPTS on the name node to have JMX running so that we can do JVM monitoring. But doing so means that we need to edit this file if we want to run other hadoop commands, such as fsck. It would be useful if hadoop-env.sh was refactored a bit so that there were different and/or cascading HADOOP_OPTS dependent upon which process/task was being performed.
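      For illustration, a minimal sketch of the kind of conf/hadoop-env.sh setting that creates this problem (the JMX flags and port are examples, not taken from this issue):

          # conf/hadoop-env.sh (before this change): a single HADOOP_OPTS for everything.
          # Enabling JMX here for namenode monitoring also forces the JMX agent onto
          # every client command (hadoop fsck, hadoop dfs, ...) run from the same conf
          # dir, which fails if the fixed port is already held by the running namenode.
          export HADOOP_OPTS="-Dcom.sun.management.jmxremote \
            -Dcom.sun.management.jmxremote.port=8004 \
            -Dcom.sun.management.jmxremote.authenticate=false \
            -Dcom.sun.management.jmxremote.ssl=false"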

        Activity

        Allen Wittenauer added a comment -

        In particular, I was thinking that it might be useful to have:

        HADOOP_GLOBAL_OPTS = applies to all processes

        HADOOP_NAMENODE_OPTS = applies to just the namenode

        HADOOP_TASK_OPTS = applies to just tasks

        HADOOP_JT_OPTS = applies to the job tracker

        HADOOP_TT_OPTS = applies to task trackers

        HADOOP_CLIENT_OPTS = applies to clients, such as hadoop fsck, hadoop dfs, etc.

        Additionally, it might be useful to split out the HADOOP_HEAPSIZE setting as well.

        Doug Cutting added a comment -

        I don't think we need HADOOP_GLOBAL_OPTS; we can just use HADOOP_OPTS for that. But we could add a HADOOP_NAMENODE_OPTS that, when starting the namenode, is appended to HADOOP_OPTS, etc. In general, we could modify bin/hadoop to add the value of HADOOP_${COMMAND}_OPTS to HADOOP_OPTS. Would that suffice?
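        A rough sketch of how bin/hadoop could do that appending (an illustration of the idea, not the committed patch):

            # bin/hadoop (sketch): after COMMAND has been parsed, fold the matching
            # per-command options into HADOOP_OPTS, e.g. COMMAND=namenode picks up
            # HADOOP_NAMENODE_OPTS. Bash indirect expansion; unset variables expand
            # to nothing.
            cmd_opts_var="HADOOP_$(echo "$COMMAND" | tr '[:lower:]' '[:upper:]')_OPTS"
            # Skip COMMAND values (e.g. a fully qualified class name) that would not
            # form a valid shell variable name.
            if [[ "$cmd_opts_var" =~ ^[A-Z_][A-Z0-9_]*$ ]]; then
              HADOOP_OPTS="$HADOOP_OPTS ${!cmd_opts_var}"
            fi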

        Allen Wittenauer added a comment -

        On a first pass, that sounds like a very reasonable fix.

        Nigel Daley added a comment -

        Additionally, it might be useful to split out the HADOOP_HEAPSIZE setting as well.

        Can we just get rid of HADOOP_HEAPSIZE? If people want to set it, use the HADOOP_*_OPTS variables.

        I'm +1 for fixing this issue.

        Joydeep Sen Sarma added a comment -

        +1 for separate heap size setting

        Doug Cutting added a comment -

        Another approach, that also addresses HADOOP-2764, would be to include a hadoop-${COMMAND}-env.sh if it exists. So you could add a hadoop-namenode-env.sh that updates various environment variables to values different from those in the hadoop-env.sh, and, separately, a hadoop-tasktracker-env.sh, etc. Could that work?
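        A sketch of how that optional include could look in bin/hadoop (HADOOP_CONF_DIR is the existing conf-dir variable; the per-command file names follow the comment above):

            # Source the common env file first, then an optional per-command override
            # such as hadoop-namenode-env.sh, if the deployer created one.
            if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
              . "${HADOOP_CONF_DIR}/hadoop-env.sh"
            fi
            if [ -f "${HADOOP_CONF_DIR}/hadoop-${COMMAND}-env.sh" ]; then
              . "${HADOOP_CONF_DIR}/hadoop-${COMMAND}-env.sh"
            fi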

        Marco Nicosia added a comment -

        Setting Fix Version to Hadoop 0.17. It's important to remember that the hadoop.sh control files are dead stupid, and I don't think we should try to over-engineer them.

        Michael Bieniosek added a comment -

        If you're willing to accept an unsupported solution, the bin/hadoop script happens to set the environment variable COMMAND before it sources hadoop-env.sh.
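        In practice that trick amounts to branching on $COMMAND inside conf/hadoop-env.sh itself; a sketch (the JMX flag is illustrative):

            # conf/hadoop-env.sh (unsupported trick described above): bin/hadoop sets
            # COMMAND before sourcing this file, so we can branch on it here.
            case "$COMMAND" in
              namenode)
                HADOOP_OPTS="$HADOOP_OPTS -Dcom.sun.management.jmxremote"
                ;;
              *)
                # clients and other daemons keep the plain HADOOP_OPTS
                ;;
            esac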

        Hemanth Yamijala added a comment -

        I prefer the first approach of using different variables. This would be easier to provision through HOD as well. Are there any specific advantages of using the second approach? (I can see some, but still...)

        Raghu Angadi added a comment -

        What is the consensus? If there are no responses by tomorrow (Tuesday), I will assume the first approach (HADOOP_${COMMAND}_OPTS).

        Raghu Angadi added a comment -

        Attached patch handles the following env variables:

        HADOOP_NAMENODE_OPTS
        HADOOP_SECONDARYNAMENODE_OPTS
        HADOOP_DATANODE_OPTS
        HADOOP_BALANCER_OPTS
        HADOOP_JOBTRACKER_OPTS
        HADOOP_TASKTRACKER_OPTS
        HADOOP_CLIENT_OPTS

        Notes:

        1. There is no HADOOP_TASK_OPTS. The tasks are not started by the scripts; if we need it, it needs to be handled inside mapreduce. A separate JIRA might be better.
        2. As Arun suggested, JobClient and JobShell don't use HADOOP_CLIENT_OPTS.
        3. HADOOP_CLIENT_OPTS applies to any other command that does not have its own variable.

        The default options are exactly the same as before this patch.
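        For illustration, roughly how these variables could be used in conf/hadoop-env.sh once the patch is in (the JMX, GC, and heap flags are only examples):

            # Per-daemon options are added on top of the shared HADOOP_OPTS, so JMX can
            # be limited to the namenode while client commands such as "hadoop fsck"
            # pick up only HADOOP_CLIENT_OPTS.
            export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote"
            export HADOOP_JOBTRACKER_OPTS="-verbose:gc"
            export HADOOP_CLIENT_OPTS="-Xmx128m"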

        Chris Douglas added a comment -

        +1 looks good

        Raghu Angadi added a comment -

        Thanks Chris.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12379176/HADOOP-2551.patch
        against trunk revision 643282.

        @author +1. The patch does not contain any @author tags.

        tests included -1. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2128/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2128/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2128/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2128/console

        This message is automatically generated.

        Raghu Angadi added a comment -

        Unit tests: this patch only contains simple changes to the hadoop scripts.

        Raghu Angadi added a comment -

        I just committed this.

        Hudson added a comment -

        Integrated in Hadoop-trunk #450 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/450/ )

        Raghu Angadi added a comment -

        Release note added. IMHO this doesn't need to be in the top-level release notes.

        Raghu Angadi added a comment -

        Nigel's release note is better.

        Vinod Kumar Vavilapalli added a comment -

        What happened with the idea of doing away with HADOOP_HEAPSIZE completely? The patch doesn't have any fix for this. Track this on another JIRA?

        Currently, if I specify both HADOOP_HEAPSIZE=500 and HADOOP_JOBTRACKER_OPTS=-Xmx1024m, both get passed to the jobtracker (JT command line: "java -Xmx500m -Xmx1024m ...") and the runtime picks up the last value. So it works for now, but it would have been cleaner had HADOOP_HEAPSIZE been removed entirely.
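        For reference, the "last value wins" behaviour can be checked directly; a sketch using the values from the comment above (output abridged, and its exact format varies by JVM version):

            # HotSpot keeps only the final -Xmx it sees, so -Xmx1024m wins here.
            $ java -Xmx500m -Xmx1024m -XX:+PrintFlagsFinal -version 2>/dev/null | grep MaxHeapSize
                uintx MaxHeapSize := 1073741824   {product}   # 1073741824 bytes = 1024m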

        Raghu Angadi added a comment -

        Yes, HADOOP_HEAPSIZE is not part of this JIRA. There was no mention of removing any existing variable.


          People

          • Assignee: Raghu Angadi
          • Reporter: Allen Wittenauer
          • Votes: 0
          • Watchers: 2
