  1. Hive
  2. HIVE-16758

Better Select Number of Replications

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: Spark
    • Labels: None

    Description

      org.apache.hadoop.hive.ql.exec.SparkHashTableSinkOperator.java

      We should be smarter about how we pick the replication number. We should add a new configuration property equivalent to mapreduce.client.submit.file.replication. Its value should be around the square root of the number of nodes rather than hard-coded, as it is today:

      public static final String DFS_REPLICATION_MAX = "dfs.replication.max";
      private int minReplication = 10;
      
        @Override
        protected void initializeOp(Configuration hconf) throws HiveException {
      ...
          int dfsMaxReplication = hconf.getInt(DFS_REPLICATION_MAX, minReplication);
          // minReplication value should not cross the value of dfs.replication.max
          minReplication = Math.min(minReplication, dfsMaxReplication);
        }
      

      https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
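
      A minimal sketch of the square-root heuristic described above, not the code from the attached patches. The class ReplicationSketch and both helper methods are hypothetical names introduced only for illustration; the node count and the dfs.replication.max value are passed in as plain integers so the example runs without Hadoop on the classpath.

```java
// Hypothetical illustration only: none of these names exist in Hive or Hadoop.
final class ReplicationSketch {

    // Suggest a replication factor near the square root of the cluster size,
    // rounded up so that small clusters still get at least one replica.
    static int suggestReplication(int numNodes) {
        if (numNodes < 1) {
            throw new IllegalArgumentException("numNodes must be >= 1");
        }
        return (int) Math.ceil(Math.sqrt(numNodes));
    }

    // Clamp the requested value so it never exceeds dfs.replication.max
    // and never drops below 1.
    static int effectiveReplication(int requested, int dfsMaxReplication) {
        return Math.max(1, Math.min(requested, dfsMaxReplication));
    }

    public static void main(String[] args) {
        int numNodes = 100;           // assumed cluster size
        int dfsMaxReplication = 512;  // assumed value of dfs.replication.max
        int replication =
            effectiveReplication(suggestReplication(numNodes), dfsMaxReplication);
        System.out.println(replication); // prints 10
    }
}
```

      In the operator itself the requested value would presumably come from the new configuration property rather than from a node count computed at runtime; the clamp against dfs.replication.max mirrors the existing Math.min call in initializeOp.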

      Attachments

        1. HIVE-16758.3.patch
          3 kB
          David Mollitor
        2. HIVE-16758.2.patch
          3 kB
          David Mollitor
        3. HIVE-16758.1.patch
          3 kB
          David Mollitor

          People

            Assignee: David Mollitor (belugabehr)
            Reporter: David Mollitor (belugabehr)
            Votes: 0
            Watchers: 5
