Description
org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java
We should be smarter about how we pick a replication number. We should add a new configuration property, equivalent to mapreduce.client.submit.file.replication, so the value can be tuned per cluster. A reasonable value is around the square root of the number of nodes; it should not be hard-coded in the code.
public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
private int minReplication = 10;

@Override
protected void initializeOp(Configuration hconf) throws HiveException {
  ...
  int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
  // minReplication value should not cross the value of dfs.replication.max
  minReplication = Math.min(minReplication, dfsMaxReplication);
}
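A minimal sketch of the proposed policy, decoupled from the operator for illustration: pick roughly sqrt(numNodes) unless an explicit replication is configured, and always cap at dfs.replication.max. The property name hive.spark.client.submit.file.replication and the helper class are assumptions modeled on mapreduce.client.submit.file.replication, not names confirmed by the source.

```java
public class ReplicationPolicy {
  // Hypothetical config key, modeled after mapreduce.client.submit.file.replication;
  // the real Hive property name is an assumption.
  public static final String HIVE_SUBMIT_FILE_REPLICATION =
      "hive.spark.client.submit.file.replication";

  /**
   * Choose a replication factor: an explicit configured value wins if set (> 0),
   * otherwise roughly sqrt(numNodes); either way the result is capped by
   * dfs.replication.max and floored at 1.
   */
  public static int chooseReplication(int numNodes,
                                      int configuredReplication,
                                      int dfsMaxReplication) {
    int byClusterSize = (int) Math.ceil(Math.sqrt(numNodes));
    int replication = configuredReplication > 0 ? configuredReplication : byClusterSize;
    return Math.max(1, Math.min(replication, dfsMaxReplication));
  }

  public static void main(String[] args) {
    // 100-node cluster, no explicit override, dfs.replication.max = 512
    System.out.println(chooseReplication(100, 0, 512)); // 10
    // explicit override of 10, capped by dfs.replication.max = 3
    System.out.println(chooseReplication(100, 10, 3)); // 3
  }
}
```

In the operator this would replace the hard-coded minReplication of 10: read the new property with hconf.getInt, then apply the same cap against dfs.replication.max as in the snippet above.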