Description
org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java
We should be smarter about how we pick a replication number. We should add a new configuration property, equivalent to mapreduce.client.submit.file.replication, so the value can be tuned per cluster. A reasonable value is around the square root of the number of nodes; it should not be hard-coded in the code.
public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
private int minReplication = 10;

@Override
protected void initializeOp(Configuration hconf) throws HiveException {
  ...
  int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
  // minReplication value should not cross the value of dfs.replication.max
  minReplication = Math.min(minReplication, dfsMaxReplication);
}
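A minimal sketch of the proposed policy, decoupled from the operator for illustration: pick roughly sqrt(numNodes) unless an explicit replication is configured, and always cap at dfs.replication.max. The property name hive.spark.client.submit.file.replication and the helper class are assumptions modeled on mapreduce.client.submit.file.replication, not names confirmed by the source.

```java
public class ReplicationPolicy {
  // Hypothetical config key, modeled after mapreduce.client.submit.file.replication;
  // the real Hive property name is an assumption.
  public static final String HIVE_SUBMIT_FILE_REPLICATION =
      "hive.spark.client.submit.file.replication";

  /**
   * Choose a replication factor: an explicit configured value wins if set (> 0),
   * otherwise roughly sqrt(numNodes); either way the result is capped by
   * dfs.replication.max and floored at 1.
   */
  public static int chooseReplication(int numNodes,
                                      int configuredReplication,
                                      int dfsMaxReplication) {
    int byClusterSize = (int) Math.ceil(Math.sqrt(numNodes));
    int replication = configuredReplication > 0 ? configuredReplication : byClusterSize;
    return Math.max(1, Math.min(replication, dfsMaxReplication));
  }

  public static void main(String[] args) {
    // 100-node cluster, no explicit override, dfs.replication.max = 512
    System.out.println(chooseReplication(100, 0, 512)); // 10
    // explicit override of 10, capped by dfs.replication.max = 3
    System.out.println(chooseReplication(100, 10, 3)); // 3
  }
}
```

In the operator this would replace the hard-coded minReplication of 10: read the new property with hconf.getInt, then apply the same cap against dfs.replication.max as in the snippet above.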