Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Duplicate
- 0.2.0
Description
There should be a standard way to indicate that files should be highly replicated, appropriate for files that all nodes will read. This should be settable both on file creation and for already-existing files. Perhaps a particular replication value, like Short.MAX_VALUE or zero, can be used to signal this. The level should not be constant, but should be relative to the cluster size and network topology. If nodes are added or removed, the actual replication count should increase or decrease accordingly.
Initially, all that is needed is an API to specify this. It could initially be implemented with a constant (e.g., 10) or with something related to the number of datanodes (sqrt?), and needn't auto-adjust as the cluster size changes. That is only the long-term goal.
When JobClient copies job files (job.xml & job.jar) into the job's filesystem, it should specify this replication level.
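The description suggests signalling "wide" replication with a sentinel value and deriving the actual factor from the cluster size (e.g., the square root of the datanode count). A minimal sketch of that policy follows; the class and method names are illustrative only and are not part of Hadoop's API:

```java
// Sketch of a cluster-size-relative "wide" replication policy.
// WideReplication and suggestedFactor are hypothetical names for
// illustration; they do not exist in Hadoop.
public class WideReplication {
    // Sentinel a client could pass to request "wide" replication,
    // as the description suggests (e.g., Short.MAX_VALUE).
    public static final short WIDE = Short.MAX_VALUE;

    // Map a requested replication to an actual factor for the current
    // cluster: wide requests scale as ceil(sqrt(datanodes)); all
    // requests are clamped to the number of datanodes.
    public static short suggestedFactor(short requested, int datanodes) {
        if (datanodes <= 0) {
            throw new IllegalArgumentException("no datanodes");
        }
        int actual = (requested == WIDE)
            ? (int) Math.ceil(Math.sqrt(datanodes))
            : requested;
        return (short) Math.min(actual, datanodes);
    }

    public static void main(String[] args) {
        System.out.println(suggestedFactor(WIDE, 100));     // prints 10
        System.out.println(suggestedFactor((short) 3, 2));  // prints 2
    }
}
```

Because the factor is recomputed from the current datanode count, re-applying it as nodes join or leave gives the auto-adjusting behavior described above.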
Attachments
Issue Links
- duplicates:
  - HADOOP-130 Should be able to specify "wide" or "full" replication (Closed)
- is related to:
  - HADOOP-170 setReplication and related bug fixes (Closed)