[HADOOP-2185] Server ports: to roll or not to roll. - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.15.0
Fix Version/s: 0.16.0
Component/s: conf
Labels: None

Description

I looked at the issues related to port rolling. My impression is that port rolling is required only so that the unit tests can run.
Even the name-node port should roll there, which it currently does not, so that two clusters can be started for testing, say, distcp.

For real clusters, on the contrary, port rolling is not desired and is sometimes even prohibited.
So we should have a way to ban port rolling. My proposal is to:

- use ephemeral port 0 if port rolling is desired;
- if a specific port is specified, do no port rolling at all: the server either starts on that particular port or fails to start.
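The two rules above can be illustrated with a minimal sketch using plain java.net sockets. This is not the Hadoop server code; the class name PortBindingSketch and the helper bind are hypothetical and only show the intended semantics: port 0 lets the OS pick a free port, and any other port is tried exactly once.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration, not part of Hadoop.
public class PortBindingSketch {

    /**
     * Binds a listening socket on the given host and port.
     * Port 0 asks the OS for any free (ephemeral) port. Any other value
     * is tried exactly once: there is no rolling to port + 1 on failure.
     */
    static ServerSocket bind(String host, int port) throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(new InetSocketAddress(host, port));
        } catch (IOException e) {
            socket.close(); // release the unbound socket before reporting failure
            throw e;
        }
        return socket;
    }

    public static void main(String[] args) throws IOException {
        // Test-style usage: let the OS choose a port.
        try (ServerSocket first = bind("127.0.0.1", 0)) {
            int chosen = first.getLocalPort();
            System.out.println("ephemeral port chosen: " + chosen);

            // Cluster-style usage: a fixed port that is busy means the server fails.
            try (ServerSocket second = bind("127.0.0.1", chosen)) {
                System.out.println("unexpected: second bind succeeded");
            } catch (IOException expected) {
                System.out.println("fixed port is busy, server does not start");
            }
        }
    }
}
```

Under this scheme a unit test that needs two clusters would configure port 0 everywhere and read the actual port back from the server after startup.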

The desired port is specified via configuration parameters.

Name-node: fs.default.name = host:port
Data-node: dfs.datanode.port
Job-tracker: mapred.job.tracker = host:port
Task-tracker: mapred.task.tracker.report.bindAddress = host
The task-tracker currently has no option to specify a port; it always uses the ephemeral port 0,
so I propose to add one.
The secondary node does not need a port to listen on.

For info servers we have two sets of config variables, *.info.bindAddress and *.info.port,
except for the task-tracker, which calls them *.http.bindAddress and *.http.port instead of "info".
For the info servers I propose to eliminate the port parameters completely and use the form
*.info.bindAddress = host:port
Info servers should behave the same way: start or fail on the specified port if it is not 0,
and start on any free port if it is 0.
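A combined host:port value has to be split back into its parts before binding. The sketch below shows one way this could be done; BindAddressSketch and parse are hypothetical names, not existing Hadoop utilities, and the sample port 50070 is only an illustrative value.

```java
import java.net.InetSocketAddress;

// Hypothetical illustration, not part of Hadoop.
public class BindAddressSketch {

    /**
     * Parses a "host:port" string into an unresolved socket address.
     * Port 0 is accepted and stands for "any free port".
     * Throws IllegalArgumentException if the value is malformed or the
     * port is outside 0..65535.
     */
    static InetSocketAddress parse(String value) {
        int colon = value.lastIndexOf(':');
        if (colon <= 0 || colon == value.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got: " + value);
        }
        String host = value.substring(0, colon);
        // NumberFormatException is a subclass of IllegalArgumentException.
        int port = Integer.parseInt(value.substring(colon + 1));
        // createUnresolved avoids a DNS lookup and validates the port range.
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        InetSocketAddress fixed = parse("0.0.0.0:50070");
        InetSocketAddress any = parse("0.0.0.0:0");
        System.out.println(fixed.getHostString() + " port " + fixed.getPort());
        System.out.println("ephemeral requested: " + (any.getPort() == 0));
    }
}
```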

For the task-tracker I would rename tasktracker.http.bindAddress to mapred.task.tracker.info.bindAddress.
For the data-node, dfs.datanode.info.bindAddress should be included in the default config.
Is there a reason why it is not there?

This is the summary of proposed changes:

Server	current name = value	proposed name = value
NameNode	fs.default.name = host:port	same
	dfs.info.bindAddress = host	dfs.http.bindAddress = host:port
DataNode	dfs.datanode.bindAddress = host	dfs.datanode.bindAddress = host:port
	dfs.datanode.port = port	eliminate
	dfs.datanode.info.bindAddress = host	dfs.datanode.http.bindAddress = host:port
	dfs.datanode.info.port = port	eliminate
JobTracker	mapred.job.tracker = host:port	same
	mapred.job.tracker.info.bindAddress = host	mapred.job.tracker.http.bindAddress = host:port
	mapred.job.tracker.info.port = port	eliminate
TaskTracker	mapred.task.tracker.report.bindAddress = host	mapred.task.tracker.report.bindAddress = host:port
	tasktracker.http.bindAddress = host	mapred.task.tracker.http.bindAddress = host:port
	tasktracker.http.port = port	eliminate
SecondaryNameNode	dfs.secondary.info.bindAddress = host	dfs.secondary.http.bindAddress = host:port
	dfs.secondary.info.port = port	eliminate
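To make the table concrete, here is a rough sketch of how an existing pair of host and port settings could be folded into one of the proposed host:port settings. It works on a plain java.util.Properties object rather than the Hadoop Configuration class, and BindAddressMigrationSketch and merge are hypothetical helpers that are not part of the patch.

```java
import java.util.Properties;

// Hypothetical illustration, not part of Hadoop or of the attached patches.
public class BindAddressMigrationSketch {

    /**
     * Combines an old host setting and an old port setting into a single
     * "host:port" value stored under newKey, and drops the old keys.
     * Does nothing unless both old settings are present.
     */
    static void merge(Properties conf, String oldHostKey, String oldPortKey, String newKey) {
        String host = conf.getProperty(oldHostKey);
        String port = conf.getProperty(oldPortKey);
        if (host == null || port == null) {
            return;
        }
        // Remove first: for the data-node the old host key and the new key coincide.
        conf.remove(oldHostKey);
        conf.remove(oldPortKey);
        conf.setProperty(newKey, host + ":" + port);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("dfs.datanode.info.bindAddress", "0.0.0.0");
        conf.setProperty("dfs.datanode.info.port", "0");

        merge(conf, "dfs.datanode.info.bindAddress", "dfs.datanode.info.port",
              "dfs.datanode.http.bindAddress");

        System.out.println(conf.getProperty("dfs.datanode.http.bindAddress"));
    }
}
```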

Do we also want to set a uniform naming convention for the configuration variables?
Using hdfs instead of dfs, or info instead of http, or consistently using either datanode
or data.node would make them look better, in my opinion.

So these are all API changes. I would really like some feedback on this, especially from
people who deal with configuration issues in practice.

Attachments

FixedPorts3.patch	04/Dec/07 02:53	63 kB	Konstantin Shvachko
FixedPorts4.patch	04/Dec/07 22:58	63 kB	Konstantin Shvachko
port.stack	02/Dec/07 06:00	14 kB	Dhruba Borthakur

Issue Links

relates to

HADOOP-2404 HADOOP-2185 breaks compatibility with hadoop-0.15.0

Closed

People

Assignee: Konstantin Shvachko

Reporter: Konstantin Shvachko

Votes: 0

Watchers: 1

Dates

Created: 10/Nov/07 03:14

Updated: 08/Jul/09 16:52

Resolved: 05/Dec/07 19:39