diff --git conf/hive-default.xml.template conf/hive-default.xml.template
index 66d22f9..124f048 100644
--- conf/hive-default.xml.template
+++ conf/hive-default.xml.template
@@ -1322,6 +1322,92 @@
 </property>
 
 <property>
+  <name>hive.stats.max.variable.length</name>
+  <value>100</value>
+  <description>
+    To estimate the size of data flowing through operators in Hive/Tez (for reducer estimation etc.), the average row size is multiplied by the total number of rows coming out of each operator.
+    The average row size is computed from the average column size of all columns in the row. In the absence of column statistics, this value is used for variable-length columns (string, bytes etc.).
+    For fixed-length columns, their corresponding Java equivalent sizes are used (float - 4 bytes, double - 8 bytes etc.).
+  </description>
+</property>
+
+<property>
+  <name>hive.stats.list.num.entries</name>
+  <value>10</value>
+  <description>
+    To estimate the size of data flowing through operators in Hive/Tez (for reducer estimation etc.), the average row size is multiplied by the total number of rows coming out of each operator.
+    The average row size is computed from the average column size of all columns in the row. In the absence of column statistics, this config can be used to specify the average number of
+    entries/values for variable-length complex columns like list.
+  </description>
+</property>
+
+<property>
+  <name>hive.stats.map.num.entries</name>
+  <value>10</value>
+  <description>
+    To estimate the size of data flowing through operators in Hive/Tez (for reducer estimation etc.), the average row size is multiplied by the total number of rows coming out of each operator.
+    The average row size is computed from the average column size of all columns in the row. In the absence of column statistics, this config can be used to specify the average number of
+    entries/values for variable-length complex columns like map.
+  </description>
+</property>
+
+<property>
+  <name>hive.stats.map.parallelism</name>
+  <value>1</value>
+  <description>
+    The Hive/Tez optimizer estimates the data size flowing through each operator. Some operators, like GROUPBY, generate a number of rows that corresponds to the number of mappers.
+    By default this value is set to 1 since the optimizer is not aware of the number of mappers. This config can be used to specify the number of mappers to be accounted for in the data size computation.
+  </description>
+</property>
+
+<property>
+  <name>hive.stats.fetch.column.stats</name>
+  <value>false</value>
+  <description>
+    Annotation of the operator tree with statistics information requires column statistics, which are fetched from the metastore. Fetching column statistics for each needed column
+    can be expensive when the number of columns is high. This flag can be used to disable fetching of column statistics from the metastore.
+  </description>
+</property>
+
+<property>
+  <name>hive.stats.fetch.partition.stats</name>
+  <value>true</value>
+  <description>
+    Annotation of the operator tree with statistics information requires partition-level basic statistics like number of rows, data size and file size. Partition statistics are fetched from the metastore.
+    Fetching partition statistics for each needed partition can be expensive when the number of partitions is high. This flag can be used to disable fetching of partition statistics from the metastore.
+    When this flag is disabled, Hive will make calls to the filesystem to get file sizes and will estimate the number of rows from the row schema.
+  </description>
+</property>
+
+<property>
+  <name>hive.stats.avg.row.size</name>
+  <value>10000</value>
+  <description>
+    The Hive/Tez optimizer estimates the data size flowing through each operator. In the absence of basic statistics, the LIMIT operator (which knows the number of rows) will use this value to
+    estimate the size of the data flowing through it.
+  </description>
+</property>
+
+<property>
+  <name>hive.stats.join.factor</name>
+  <value>1.1</value>
+  <description>
+    The Hive/Tez optimizer estimates the data size flowing through each operator. The JOIN operator uses column statistics to estimate the number of rows flowing out of it, and hence the data size.
+    In the absence of column statistics, this factor determines the number of rows that flow out of the JOIN operator.
+  </description>
+</property>
+
+<property>
+  <name>hive.stats.deserialization.factor</name>
+  <value>1.0</value>
+  <description>
+    The Hive/Tez optimizer estimates the data size flowing through each operator. In the absence of basic statistics like number of rows and data size, the file size is used to estimate the number
+    of rows and the data size. Since files in tables/partitions are serialized (and optionally compressed), the number of rows and the data size cannot be reliably determined. This factor is
+    multiplied by the file size to account for serialization and compression.
+  </description>
+</property>
+
+<property>
   <name>hive.support.concurrency</name>
   <value>false</value>
   <description>Whether Hive supports concurrency or not. A ZooKeeper instance must be up and running for the default Hive lock manager to support read-write locks.</description>
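
As a usage sketch (not part of this patch; the values shown are illustrative, not recommendations), the new estimation knobs can be overridden in hive-site.xml, which takes precedence over hive-default.xml.template, or set per session, e.g. SET hive.stats.fetch.column.stats=true; in the Hive CLI/Beeline:

    <!-- hive-site.xml: illustrative overrides of the statistics estimation settings above -->
    <property>
      <name>hive.stats.fetch.column.stats</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.stats.max.variable.length</name>
      <value>200</value>
    </property>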