Index: conf/hive-default.xml.template
===================================================================
--- conf/hive-default.xml.template (revision 1548369)
+++ conf/hive-default.xml.template (working copy)
@@ -102,7 +102,7 @@
if hive is running in test mode, prefixes the output table by this string
-
+
@@ -118,7 +118,7 @@
hive.test.mode.nosamplelist
- if hive is running in test mode, dont sample the above comma seperated list of tables
+ if hive is running in test mode, don't sample the above comma separated list of tables
@@ -262,13 +262,13 @@
hive.metastore.event.listeners
- list of comma seperated listeners for metastore events.
+ list of comma separated listeners for metastore events.
hive.metastore.partition.inherit.table.properties
- list of comma seperated keys occurring in table properties which will get inherited to newly created partitions. * implies all the keys will get inherited.
+ list of comma separated keys occurring in table properties which will get inherited to newly created partitions. * implies all the keys will get inherited.
@@ -411,7 +411,7 @@
- perform the 2 groups bys. This makes sense if map-side aggregation is turned off. However,
- with maps-side aggregation, it might be useful in some cases to treat the 2 inserts independently,
+ perform the 2 group bys. This makes sense if map-side aggregation is turned off. However,
+ with map-side aggregation, it might be useful in some cases to treat the 2 inserts independently,
thereby performing the query above in 2MR jobs instead of 3 (due to spraying by distinct key first).
- If this parameter is turned off, we dont consider the fact that the distinct key is the same across
+ If this parameter is turned off, we don't consider the fact that the distinct key is the same across
different MR jobs.
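For illustration, the kind of multi-insert query this description refers to might look like the sketch below (table and column names are hypothetical):

  FROM src
    INSERT OVERWRITE TABLE dest1 SELECT col1, count(DISTINCT colx) GROUP BY col1
    INSERT OVERWRITE TABLE dest2 SELECT col2, count(DISTINCT colx) GROUP BY col2;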
@@ -505,7 +505,7 @@
a union is performed for the 2 joins generated above. So unless the same skewed key is present
in both the joined tables, the join for the skewed key will be performed as a map-side join.
- The main difference between this paramater and hive.optimize.skewjoin is that this parameter
+ The main difference between this parameter and hive.optimize.skewjoin is that this parameter
uses the skew information stored in the metastore to optimize the plan at compile time itself.
- If there is no skew information in the metadata, this parameter will not have any affect.
+ If there is no skew information in the metadata, this parameter will not have any effect.
Both hive.optimize.skewjoin.compiletime and hive.optimize.skewjoin should be set to true.
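As a reference, enabling both flags named above at the site level could look like the following illustrative hive-site.xml fragment (not part of this patch):

  <property>
    <name>hive.optimize.skewjoin.compiletime</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.optimize.skewjoin</name>
    <value>true</value>
  </property>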
@@ -529,7 +529,7 @@
The merge is triggered if either of hive.merge.mapfiles or hive.merge.mapredfiles is set to true.
If the user has set hive.merge.mapfiles to true and hive.merge.mapredfiles to false, the idea was the
number of reducers are few, so the number of files anyway are small. However, with this optimization,
- we are increasing the number of files possibly by a big margin. So, we merge aggresively.
+ we are increasing the number of files possibly by a big margin. So, we merge aggressively.
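For example, a site configuration that merges small files after both map-only and map-reduce jobs might read as follows (illustrative values, not part of this patch):

  <property>
    <name>hive.merge.mapfiles</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.merge.mapredfiles</name>
    <value>true</value>
  </property>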
@@ -578,7 +578,7 @@
This parameter decides if hive should add an additional map-reduce job. If the grouping set
cardinality (4 in the example above), is more than this value, a new MR job is added under the
- assumption that the orginal group by will reduce the data size.
+ assumption that the original group by will reduce the data size.
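A grouping-sets query with cardinality 4, of the kind the threshold above refers to, might look like this sketch (hypothetical table t and columns a, b):

  SELECT a, b, count(*)
  FROM t
  GROUP BY a, b GROUPING SETS ((a, b), a, b, ());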
@@ -605,7 +605,7 @@
false
Whether to enable skew join optimization.
The algorithm is as follows: At runtime, detect the keys with a large skew. Instead of
- processing those keys, store them temporarily in a hdfs directory. In a follow-up map-reduce
+ processing those keys, store them temporarily in an HDFS directory. In a follow-up map-reduce
job, process those skewed keys. The same key need not be skewed for all the tables, and so,
the follow-up map-reduce job (for the skewed keys) would be much faster, since it would be a
map-join.
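Besides setting it in hive-site.xml, the runtime skew join handling described above can also be enabled per session, for example:

  -- enable runtime skew join handling for this session only
  set hive.optimize.skewjoin=true;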
@@ -653,7 +653,7 @@
hive.enforce.bucketmapjoin
false
If the user asked for bucketed map-side join, and it cannot be performed,
- should the query fail or not ? For eg, if the buckets in the tables being joined are
+ should the query fail or not? For example, if the buckets in the tables being joined are
not a multiple of each other, bucketed map-side join cannot be performed, and the
query will fail if hive.enforce.bucketmapjoin is set to true.
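For instance, with the setting below (an illustrative hive-site.xml fragment), a bucketed map-side join request would make the query fail when one table has 4 buckets and the other 6 (not multiples of each other), while 4 and 8 would be accepted:

  <property>
    <name>hive.enforce.bucketmapjoin</name>
    <value>true</value>
  </property>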
@@ -760,7 +760,7 @@
hive.metastore.init.hooks
- A comma separated list of hooks to be invoked at the beginning of HMSHandler initialization. Aninit hook is specified as the name of Java class which extends org.apache.hadoop.hive.metastore.MetaStoreInitListener.
+ A comma separated list of hooks to be invoked at the beginning of HMSHandler initialization. An init hook is specified as the name of a Java class which extends org.apache.hadoop.hive.metastore.MetaStoreInitListener.
@@ -820,13 +820,13 @@
hive.mapjoin.localtask.max.memory.usage
0.90
- This number means how much memory the local task can take to hold the key/value into in-memory hash table; If the local task's memory usage is more than this number, the local task will be abort by themself. It means the data of small table is too large to be hold in the memory.
+ This number means how much memory the local task can take to hold the key/value in an in-memory hash table. If the local task's memory usage is more than this number, the local task will abort by itself. It means the data of the small table is too large to be held in memory.
hive.mapjoin.followby.gby.localtask.max.memory.usage
0.55
- This number means how much memory the local task can take to hold the key/value into in-memory hash table when this map join followed by a group by; If the local task's memory usage is more than this number, the local task will be abort by themself. It means the data of small table is too large to be hold in the memory.
+ This number means how much memory the local task can take to hold the key/value in an in-memory hash table when this map join is followed by a group by. If the local task's memory usage is more than this number, the local task will abort by itself. It means the data of the small table is too large to be held in memory.
@@ -845,7 +845,7 @@
hive.auto.convert.join.noconditionaltask
true
- Whether Hive enable the optimization about converting common join into mapjoin based on the input file
- size. If this paramater is on, and the sum of size for n-1 of the tables/partitions for a n-way join is smaller than the
+ Whether Hive enables the optimization about converting common join into mapjoin based on the input file
+ size. If this parameter is on, and the sum of size for n-1 of the tables/partitions for an n-way join is smaller than the
specified size, the join is directly converted to a mapjoin (there is no conditional task).
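An illustrative hive-site.xml fragment enabling this behaviour is shown below; the size threshold mentioned above lives in a companion property (hive.auto.convert.join.noconditionaltask.size in stock Hive, not shown in this excerpt):

  <property>
    <name>hive.auto.convert.join.noconditionaltask</name>
    <value>true</value>
  </property>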
@@ -862,13 +862,13 @@
hive.script.auto.progress
false
- Whether Hive Tranform/Map/Reduce Clause should automatically send progress information to TaskTracker to avoid the task getting killed because of inactivity. Hive sends progress information when the script is outputting to stderr. This option removes the need of periodically producing stderr messages, but users should be cautious because this may prevent infinite loops in the scripts to be killed by TaskTracker.
+ Whether Hive Transform/Map/Reduce Clause should automatically send progress information to TaskTracker to avoid the task getting killed because of inactivity. Hive sends progress information when the script is outputting to stderr. This option removes the need to periodically produce stderr messages, but users should be cautious because this may prevent infinite loops in the scripts from being killed by TaskTracker.
hive.script.serde
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
- The default serde for trasmitting input data to and reading output data from the user scripts.
+ The default serde for transmitting input data to and reading output data from the user scripts.
@@ -917,7 +917,7 @@
stream.stderr.reporter.prefix
reporter:
- Streaming jobs that log to stardard error with this prefix can log counter or status information.
+ Streaming jobs that log to standard error with this prefix can log counter or status information.
@@ -941,7 +941,7 @@
hive.udtf.auto.progress
false
- Whether Hive should automatically send progress information to TaskTracker when using UDTF's to prevent the task getting killed because of inactivity. Users should be cautious because this may prevent TaskTracker from killing tasks with infinte loops.
+ Whether Hive should automatically send progress information to TaskTracker when using UDTFs to prevent the task getting killed because of inactivity. Users should be cautious because this may prevent TaskTracker from killing tasks with infinite loops.
@@ -1003,7 +1003,7 @@
hive.optimize.bucketingsorting
true
- If hive.enforce.bucketing or hive.enforce.sorting is true, dont create a reducer for enforcing
+ If hive.enforce.bucketing or hive.enforce.sorting is true, don't create a reducer for enforcing
bucketing/sorting for queries of the form:
insert overwrite table T2 select * from T1;
where T1 and T2 are bucketed/sorted by the same keys into the same number of buckets.
@@ -1045,10 +1045,10 @@
hive.auto.convert.sortmerge.join.to.mapjoin
false
If hive.auto.convert.sortmerge.join is set to true, and a join was converted to a sort-merge join,
- this parameter decides whether each table should be tried as a big table, and effectviely a map-join should be
+ this parameter decides whether each table should be tried as a big table, and effectively a map-join should be
tried. That would create a conditional task with n+1 children for a n-way join (1 child for each table as the
- big table), and the backup task will be the sort-merge join. In some casess, a map-join would be faster than a
- sort-merge join, if there is no advantage of having the output bucketed and sorted. For eg. if a very big sorted
- and bucketed table with few files (say 10 files) are being joined with a very small sorter and bucketed table
+ big table), and the backup task will be the sort-merge join. In some cases, a map-join would be faster than a
+ sort-merge join, if there is no advantage of having the output bucketed and sorted. For example, if a very big sorted
+ and bucketed table with few files (say 10 files) is being joined with a very small sorted and bucketed table
with few files (10 files), the sort-merge join will only use 10 mappers, and a simple map-only join might be faster
if the complete small table can fit in memory, and a map-join can be performed.
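Both switches named above would typically be enabled together, for example (illustrative hive-site.xml fragment):

  <property>
    <name>hive.auto.convert.sortmerge.join</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.auto.convert.sortmerge.join.to.mapjoin</name>
    <value>true</value>
  </property>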
@@ -1058,7 +1058,7 @@
hive.metastore.ds.connection.url.hook
- Name of the hook to use for retriving the JDO connection URL. If empty, the value in javax.jdo.option.ConnectionURL is used
+ Name of the hook to use for retrieving the JDO connection URL. If empty, the value in javax.jdo.option.ConnectionURL is used
@@ -1070,7 +1070,7 @@
hive.metastore.ds.retry.interval
1000
- The number of miliseconds between metastore retry attempts
+ The number of milliseconds between metastore retry attempts
@@ -1198,7 +1198,7 @@
hive.exec.default.partition.name
__HIVE_DEFAULT_PARTITION__
- The default partition name in case the dynamic partition column value is null/empty string or anyother values that cannot be escaped. This value must not contain any special character used in HDFS URI (e.g., ':', '%', '/' etc). The user has to be aware that the dynamic partition value should not contain this value to avoid confusions.
+ The default partition name in case the dynamic partition column value is null/empty string or any other values that cannot be escaped. This value must not contain any special character used in HDFS URI (e.g., ':', '%', '/' etc). The user has to be aware that the dynamic partition value should not contain this value to avoid confusion.
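For illustration, in a dynamic partition insert such as the sketch below (hypothetical tables and columns), rows whose ds value is NULL or empty end up in the partition ds=__HIVE_DEFAULT_PARTITION__:

  INSERT OVERWRITE TABLE pt PARTITION (ds)
  SELECT col, ds FROM src;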
@@ -1252,14 +1252,14 @@
hive.stats.retries.wait
3000
- The base waiting window (in milliseconds) before the next retry. The actual wait time is calculated by baseWindow * failues baseWindow * (failure 1) * (random number between [0.0,1.0]).
+ The base waiting window (in milliseconds) before the next retry. The actual wait time is calculated by baseWindow * failures + baseWindow * (failures + 1) * (random number between [0.0,1.0]).
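Assuming the formula is read as reconstructed above, with the default baseWindow of 3000 ms the wait after 2 failures would fall between 3000 * 2 = 6000 ms and 3000 * 2 + 3000 * 3 = 15000 ms, depending on the random factor.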
hive.stats.reliable
false
Whether queries will fail because stats cannot be collected completely accurately.
- If this is set to true, reading/writing from/into a partition may fail becuase the stats
+ If this is set to true, reading/writing from/into a partition may fail because the stats
could not be computed accurately.
@@ -1355,7 +1355,7 @@
fs.har.impl
org.apache.hadoop.hive.shims.HiveHarFileSystem
- The implementation for accessing Hadoop Archives. Note that this won't be applicable to Hadoop vers less than 0.20
+ The implementation for accessing Hadoop Archives. Note that this won't be applicable to Hadoop versions less than 0.20
@@ -1443,19 +1443,19 @@
hive.conf.validation
true
- Eables type checking for registered hive configurations
+ Enables type checking for registered Hive configurations
hive.security.authorization.enabled
false
- enable or disable the hive client authorization
+ enable or disable the Hive client authorization
hive.security.authorization.manager
org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider
- the hive client authorization manager class name.
+ The Hive client authorization manager class name.
The user defined authorization class should implement interface org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider.
@@ -1516,13 +1516,13 @@
hive.security.command.whitelist
set,reset,dfs,add,delete
- Comma seperated list of non-SQL Hive commands users are authorized to execute
+ Comma separated list of non-SQL Hive commands users are authorized to execute
hive.conf.restricted.list
- Comma seperated list of configuration options which are immutable at runtime
+ Comma separated list of configuration options which are immutable at runtime
@@ -1537,13 +1537,13 @@
hive.error.on.empty.partition
false
- Whether to throw an excpetion if dynamic partition insert generates empty results.
+ Whether to throw an exception if dynamic partition insert generates empty results.
hive.index.compact.file.ignore.hdfs
false
- True the hdfs location stored in the index file will be igbored at runtime.
+ If true, the HDFS location stored in the index file will be ignored at runtime.
If the data got moved or the name of the cluster got changed, the index data should still be usable.
@@ -1678,7 +1678,7 @@
hive.start.cleanup.scratchdir
false
- To cleanup the hive scratchdir while starting the hive server
+ To clean up the Hive scratchdir while starting the Hive Server
@@ -1779,7 +1779,7 @@
Some select queries can be converted to single FETCH task minimizing latency.
Currently the query should be single sourced not having any subquery and should not have
- any aggregations or distincts (which incurrs RS), lateral views and joins.
+ any aggregations or distincts (which incurs RS), lateral views and joins.
1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
2. more : SELECT, FILTER, LIMIT only (TABLESAMPLE, virtual columns)
@@ -1799,8 +1799,8 @@
hive.fetch.task.aggr
false
- Aggregation queries with no group-by clause (for example, select count(*) from src) executes
- final aggregations in single reduce task. If this is set true, hive delegates final aggregation
+ Aggregation queries with no group-by clause (for example, select count(*) from src) execute
+ final aggregations in a single reduce task. If this is set to true, Hive delegates the final aggregation
stage to fetch task, possibly decreasing the query time.
@@ -1826,7 +1826,7 @@
hive.hmshandler.retry.interval
1000
- The number of miliseconds between HMSHandler retry attempts
+ The number of milliseconds between HMSHandler retry attempts
@@ -1838,7 +1838,7 @@
hive.server.tcp.keepalive
true
- Whether to enable TCP keepalive for the Hive server. Keepalive will prevent accumulation of half-open connections.
+ Whether to enable TCP keepalive for the Hive Server. Keepalive will prevent accumulation of half-open connections.
@@ -2018,8 +2018,8 @@
hive.server2.enable.doAs
true
- Setting this property to true will have hive server2 execute
- hive operations as the user making the calls to it.
+ Setting this property to true will have HiveServer2 execute
+ Hive operations as the user making the calls to it.
@@ -2027,9 +2027,9 @@
hive.server2.table.type.mapping
CLASSIC
- This setting reflects how HiveServer will report the table types for JDBC and other
- client implementations that retrieves the available tables and supported table types
- HIVE : Exposes the hive's native table tyes like MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW
+ This setting reflects how HiveServer2 will report the table types for JDBC and other
+ client implementations that retrieve the available tables and supported table types
+ HIVE : Exposes Hive's native table types like MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW
CLASSIC : More generic types like TABLE and VIEW
@@ -2038,11 +2038,11 @@
hive.server2.thrift.sasl.qop
auth
- Sasl QOP value; Set it to one of following values to enable higher levels of
- protection for hive server2 communication with clients.
+ SASL QOP value; set it to one of the following values to enable higher levels of
+ protection for HiveServer2 communication with clients.
"auth" - authentication only (default)
"auth-int" - authentication plus integrity protection
"auth-conf" - authentication plus integrity and confidentiality protection
- This is applicable only hive server2 is configured to use kerberos authentication.
+ This is applicable only if HiveServer2 is configured to use Kerberos authentication.
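For example, requiring confidentiality protection on the wire would be configured as follows (illustrative hive-site.xml fragment, assuming Kerberos authentication is already set up):

  <property>
    <name>hive.server2.thrift.sasl.qop</name>
    <value>auth-conf</value>
  </property>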
@@ -2098,7 +2098,7 @@
Enforce metastore schema version consistency.
True: Verify that version information stored in metastore matches with one from Hive jars. Also disable automatic
- schema migration attempt. Users are required to manully migrate schema after Hive upgrade which ensures
+ schema migration attempt. Users are required to manually migrate the schema after a Hive upgrade, which ensures
proper metastore schema migration. (Default)
- False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
+ False: Warn if the version information stored in metastore doesn't match the one from Hive jars.