
Key: HADOOP-2551
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: Raghu Angadi
Reporter: Allen Wittenauer
Votes: 0
Watchers: 1
Hadoop Common

hadoop-env.sh needs finer granularity

Created: 08/Jan/08 06:58 PM   Updated: 21/May/08 08:05 PM
Component/s: scripts
Affects Version/s: None
Fix Version/s: 0.17.0

Time Tracking:
Not Specified

File Attachments:
HADOOP-2551.patch (text file, 4 kB), attached 2008-04-02 07:27 PM by Raghu Angadi, licensed for inclusion in ASF works

Hadoop Flags: Reviewed
Release Note: New environment variables were introduced to allow finer-grained control of the Java options passed to server and client JVMs. See the new *_OPTS variables in conf/hadoop-env.sh.
Resolution Date: 03/Apr/08 03:08 AM


Description
We often configure HADOOP_OPTS on the name node to enable JMX so that we can do JVM monitoring. But doing so means that we need to edit this file whenever we want to run other hadoop commands, such as fsck. It would be useful if hadoop-env.sh were refactored a bit so that there were different and/or cascading HADOOP_OPTS settings, depending on which process or task is being run.

Allen Wittenauer added a comment - 08/Jan/08 07:00 PM
In particular, I was thinking that it might be useful to have:

HADOOP_GLOBAL_OPTS = applies to all processes

HADOOP_NAMENODE_OPTS = applies to just the namenode

HADOOP_TASK_OPTS = applies to just tasks

HADOOP_JT_OPTS = applies to the job tracker

HADOOP_TT_OPTS = applies to task trackers

HADOOP_CLIENT_OPTS = applies to clients, such as hadoop fsck, hadoop dfs, etc.

Additionally, it might be useful to split out the HADOOP_HEAPSIZE setting as well.


Doug Cutting added a comment - 08/Jan/08 07:27 PM
I don't think we need HADOOP_GLOBAL_OPTS; we can just use HADOOP_OPTS for that. But we could add a HADOOP_NAMENODE_OPTS that, when starting the namenode, is appended to HADOOP_OPTS, etc. In general, we could modify bin/hadoop to add the value of HADOOP_${COMMAND}_OPTS to HADOOP_OPTS. Would that suffice?
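
For illustration, the appending idea could be sketched roughly as below. This is not the code from bin/hadoop or from the eventual patch: the function name append_command_opts and the example option values are made up, and the sketch assumes the command name is a plain alphanumeric word that only needs upper-casing to form the variable name.

```shell
# Hypothetical helper (not part of bin/hadoop): append the value of
# HADOOP_<COMMAND>_OPTS, if set and non-empty, to HADOOP_OPTS.
append_command_opts() {
  # Reject empty or non-alphanumeric command names so the eval below is safe.
  case "$1" in
    ''|*[!A-Za-z0-9]*) return 1 ;;
  esac
  # namenode -> HADOOP_NAMENODE_OPTS
  _cmd_var="HADOOP_$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')_OPTS"
  # Indirect lookup of the variable named by _cmd_var (empty if unset).
  eval "_cmd_extra=\${$_cmd_var:-}"
  if [ -n "$_cmd_extra" ]; then
    # Add a separating space only when HADOOP_OPTS already has content.
    HADOOP_OPTS="${HADOOP_OPTS:+$HADOOP_OPTS }$_cmd_extra"
  fi
}

# Example values, chosen only for the demonstration.
HADOOP_OPTS="-Dhadoop.example=1"
HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote"

append_command_opts namenode
echo "$HADOOP_OPTS"
```

With this shape, a command that has no matching variable (fsck, say) leaves HADOOP_OPTS untouched, which is the behaviour the description asks for.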

Allen Wittenauer added a comment - 08/Jan/08 08:43 PM
On a first pass, that sounds like a very reasonable fix.

Nigel Daley added a comment - 09/Jan/08 12:53 AM

"Additionally, it might be useful to split out the HADOOP_HEAPSIZE setting as well."

Can we just get rid of HADOOP_HEAPSIZE? If people want to set it, use the HADOOP_*_OPTS variables.

I'm +1 for fixing this issue.


Joydeep Sen Sarma added a comment - 01/Feb/08 07:07 PM
+1 for separate heap size setting

Doug Cutting added a comment - 01/Feb/08 09:22 PM
Another approach, which also addresses HADOOP-2764, would be to include a hadoop-${COMMAND}-env.sh if it exists. So you could add a hadoop-namenode-env.sh that sets various environment variables to values different from those in hadoop-env.sh, and, separately, a hadoop-tasktracker-env.sh, etc. Could that work?
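
A rough sketch of what this second approach might look like, assuming the per-command file is sourced after hadoop-env.sh so that its values win. The function name load_env and the demo files are hypothetical; nothing here is taken from bin/hadoop.

```shell
# Hypothetical helper: source hadoop-env.sh, then hadoop-<command>-env.sh
# from the same directory if that file exists.
load_env() {
  _conf_dir="$1"
  _command="$2"
  if [ -f "$_conf_dir/hadoop-env.sh" ]; then
    . "$_conf_dir/hadoop-env.sh"
  fi
  # The per-command file is sourced last, so it can override shared values.
  if [ -f "$_conf_dir/hadoop-$_command-env.sh" ]; then
    . "$_conf_dir/hadoop-$_command-env.sh"
  fi
}

# Demo with throwaway files; the heap sizes are arbitrary example values.
demo_conf="$(mktemp -d)"
trap 'rm -rf "$demo_conf"' EXIT
echo 'HADOOP_HEAPSIZE=1000' > "$demo_conf/hadoop-env.sh"
echo 'HADOOP_HEAPSIZE=2000' > "$demo_conf/hadoop-namenode-env.sh"

load_env "$demo_conf" namenode
echo "namenode heap size: $HADOOP_HEAPSIZE"
```

Because any variable can be overridden this way, not only the JVM options, the same mechanism would also cover a per-command heap size.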

Marco Nicosia added a comment - 01/Feb/08 11:52 PM
Setting Fix Version to Hadoop 0.17. It's important to remember that the hadoop.sh control files are dead stupid, and I don't think we should try to over-engineer them.

Michael Bieniosek added a comment - 02/Feb/08 12:35 AM
If you're willing to accept an unsupported solution, the bin/hadoop script happens to set the environment variable COMMAND before it sources hadoop-env.sh.
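
As a sketch of that workaround, hadoop-env.sh could branch on COMMAND itself. The function name configure_opts and the JMX option are illustrative assumptions, and the approach depends on COMMAND being set before the file is sourced, which is exactly the unsupported behaviour mentioned above.

```shell
# Fragment in the style of conf/hadoop-env.sh, wrapped in a hypothetical
# function so it can be run on its own. It reads COMMAND, which is assumed
# to have been set by the calling script.
configure_opts() {
  case "${COMMAND:-}" in
    namenode)
      # Only the namenode gets the JMX option.
      HADOOP_OPTS="${HADOOP_OPTS:+$HADOOP_OPTS }-Dcom.sun.management.jmxremote"
      ;;
    *)
      # Every other command keeps HADOOP_OPTS unchanged.
      ;;
  esac
}

# Stand-alone demonstration; in real use the calling script would set COMMAND.
COMMAND=namenode
HADOOP_OPTS=""
configure_opts
echo "$HADOOP_OPTS"
```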

Hemanth Yamijala added a comment - 04/Feb/08 02:51 PM
I prefer the first approach of using different variables. This would be easier to provision through HOD as well. Are there any specific advantages to using the second approach? (I can see some, but still... :))

Raghu Angadi added a comment - 31/Mar/08 11:52 PM
What is the consensus? If there are no responses by tomorrow (Tuesday), I will assume it is the first approach (HADOOP_${COMMAND}_OPTS).

Raghu Angadi added a comment - 02/Apr/08 07:27 PM
Attached patch handles the following env variables :

HADOOP_NAMENODE_OPTS
HADOOP_SECONDARYNAMENODE_OPTS
HADOOP_DATANODE_OPTS
HADOOP_BALANCER_OPTS
HADOOP_JOBTRACKER_OPTS
HADOOP_TASKTRACKER_OPTS
HADOOP_CLIENT_OPTS

Notes:

  1. There is no HADOOP_TASK_OPTS. The tasks are not started by the scripts. If we need it, it needs to be handled inside mapreduce. A different jira might be better.
  2. As Arun suggested, JobClient and JobShell don't use HADOOP_CLIENT_OPTS.
  3. HADOOP_CLIENT_OPTS applies to any other command that does not have its own variable.

The default options are exactly the same as before this patch.
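
As an illustration of how these variables address the original request, a conf/hadoop-env.sh fragment might look like the following. The JMX port number is an arbitrary example, not a default from the patch.

```shell
# Illustrative conf/hadoop-env.sh fragment; the values are assumptions.

# JMX monitoring on the namenode only (port chosen arbitrarily).
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8004"

# Client commands such as "hadoop fsck" get no extra options, so they do
# not try to start a second JMX agent.
export HADOOP_CLIENT_OPTS=""
```

With settings like these, the file no longer needs editing before running fsck on the name node host.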


Chris Douglas added a comment - 02/Apr/08 08:51 PM
+1 looks good

Raghu Angadi added a comment - 02/Apr/08 09:02 PM
Thanks Chris.

Hadoop QA added a comment - 02/Apr/08 11:38 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12379176/HADOOP-2551.patch
against trunk revision 643282.

@author +1. The patch does not contain any @author tags.

tests included -1. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2128/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2128/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2128/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2128/console

This message is automatically generated.


Raghu Angadi added a comment - 02/Apr/08 11:52 PM
Unit tests: this patch only makes simple changes to the hadoop scripts.

Raghu Angadi added a comment - 03/Apr/08 03:08 AM
I just committed this.

Hudson added a comment - 03/Apr/08 12:54 PM

Raghu Angadi added a comment - 30/Apr/08 12:42 AM
Release note added. IMHO this does not need to be in the top-level release notes.

Raghu Angadi added a comment - 30/Apr/08 05:01 PM
Nigel's release note is better.

Vinod K V added a comment - 12/May/08 06:53 AM
What happened with the idea of doing away with HADOOP_HEAPSIZE completely? The patch doesn't have any fix for this. Track this on another JIRA?

Currently, if I specify both HADOOP_HEAPSIZE=500 and HADOOP_JOBTRACKER_OPTS=-Xmx1024m, both get passed to the jobtracker (JT command line: "java -Xmx500m -Xmx1024m .......") and the runtime picks up the last value. So it works for now, but it would have been cleaner had HADOOP_HEAPSIZE been removed entirely.
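
The interaction described above can be mimicked with a small sketch. It only imitates how the two settings end up on one command line and applies the "last -Xmx wins" behaviour reported in the comment; the variable JAVA_HEAP_MAX and the function last_xmx are names assumed for the illustration, and no JVM is started.

```shell
# Hypothetical helper: print the last -Xmx flag among the given arguments,
# mirroring the "runtime picks up the last value" behaviour reported above.
last_xmx() {
  _effective=""
  for _arg in "$@"; do
    case "$_arg" in
      -Xmx*) _effective="$_arg" ;;
    esac
  done
  printf '%s\n' "$_effective"
}

HADOOP_HEAPSIZE=500
HADOOP_JOBTRACKER_OPTS="-Xmx1024m"

# Heap flag derived from HADOOP_HEAPSIZE, followed by the per-command options.
JAVA_HEAP_MAX="-Xmx${HADOOP_HEAPSIZE}m"
java_args="$JAVA_HEAP_MAX $HADOOP_JOBTRACKER_OPTS"
echo "java $java_args ..."

# java_args is deliberately unquoted so that it splits into separate flags.
effective="$(last_xmx $java_args)"
echo "effective heap flag: $effective"
```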


Raghu Angadi added a comment - 12/May/08 04:53 PM
Yes, HADOOP_HEAPSIZE is not part of this jira. There was no mention of removing any existing variable.