Hive / HIVE-16758

Better Select Number of Replications

Details

• Type: Improvement
• Status: Closed
• Priority: Minor
• Resolution: Fixed
• Affects Version/s: None
• Fix Version/s: 3.0.0
• Component/s: Spark
• Labels: None

Description

org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java

We should be smarter about how we pick the replication number. We should add a new configuration property equivalent to mapreduce.client.submit.file.replication. The value should be roughly the square root of the number of nodes rather than a hard-coded constant.

public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
private int minReplication = 10;

@Override
protected void initializeOp(Configuration hconf) throws HiveException {
  ...
  int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
  // minReplication should not exceed dfs.replication.max
  minReplication = Math.min(minReplication, dfsMaxReplication);
}
      

https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
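
Below is a minimal sketch of the kind of selection logic being proposed, assuming a hypothetical property name (hive.exec.submit.file.replication) and that the number of cluster nodes is already available to the caller; it is not the code from the attached patches.

import org.apache.hadoop.conf.Configuration;

public final class ReplicationSketch {

  // Hypothetical property name, used here only for illustration.
  private static final String SUBMIT_FILE_REPLICATION = "hive.exec.submit.file.replication";
  private static final String DFS_REPLICATION_MAX = "dfs.replication.max";

  private ReplicationSketch() {
  }

  /**
   * Picks a replication factor: the explicitly configured value if present,
   * otherwise roughly the square root of the cluster size, and never more
   * than dfs.replication.max.
   */
  public static int pickReplication(Configuration conf, int numNodes) {
    int sqrtOfCluster = (int) Math.ceil(Math.sqrt(Math.max(1, numNodes)));
    int requested = conf.getInt(SUBMIT_FILE_REPLICATION, sqrtOfCluster);
    // 512 is the HDFS default for dfs.replication.max.
    int dfsMax = conf.getInt(DFS_REPLICATION_MAX, 512);
    return Math.max(1, Math.min(requested, dfsMax));
  }
}

For a 100-node cluster with neither property set, this picks a replication factor of 10, which matches the default of mapreduce.client.submit.file.replication.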

Attachments

1. HIVE-16758.1.patch (3 kB) by David Mollitor
2. HIVE-16758.2.patch (3 kB) by David Mollitor
3. HIVE-16758.3.patch (3 kB) by David Mollitor

People

• Assignee: David Mollitor (belugabehr)
• Reporter: David Mollitor (belugabehr)
• Votes: 0
• Watchers: 5
