Index: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java =================================================================== --- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java (revision 1628586) +++ common/src/java/org/apache/hadoop/hive/conf/HiveConf.java (working copy) @@ -487,25 +487,39 @@ "Partition names will be checked against this regex pattern and rejected if not matched."), METASTORE_INTEGER_JDO_PUSHDOWN("hive.metastore.integral.jdo.pushdown", false, - "Allow JDO query pushdown for integral partition columns in metastore. Off by default. This\n" + - "improves metastore perf for integral columns, especially if there's a large number of partitions.\n" + - "However, it doesn't work correctly with integral values that are not normalized (e.g. have\n" + - "leading zeroes, like 0012). If metastore direct SQL is enabled and works, this optimization\n" + - "is also irrelevant."), - METASTORE_TRY_DIRECT_SQL("hive.metastore.try.direct.sql", true, ""), - METASTORE_TRY_DIRECT_SQL_DDL("hive.metastore.try.direct.sql.ddl", true, ""), + "Allow JDO query pushdown for integral partition columns in metastore. Off by default.\n" + + "This improves metastore performance for integral columns, especially if there's a\n" + + "large number of partitions. However, it doesn't work correctly with integral values\n" + + "that are not normalized (for example, if they have leading zeroes like 0012).\n" + + "If metastore direct SQL is enabled and works (hive.metastore.try.direct.sql),\n" + + "this optimization is also irrelevant."), + METASTORE_TRY_DIRECT_SQL("hive.metastore.try.direct.sql", true, + "Whether the Hive metastore should try to use direct SQL queries instead of the\n" + + "DataNucleus for certain read paths. This can improve metastore performance when\n" + + "fetching many partitions or column statistics by orders of magnitude; however, it\n" + + "is not guaranteed to work on all RDBMS-es and all versions. In case of SQL failures,\n" + + "the metastore will fall back to the DataNucleus, so it's safe even if SQL doesn't\n" + + "work for all queries on your datastore. If all SQL queries fail (for example, your\n" + + "metastore is backed by MongoDB), you might want to disable this to save the\n" + + "try-and-fall-back cost."), + METASTORE_TRY_DIRECT_SQL_DDL("hive.metastore.try.direct.sql.ddl", true, + "Same as hive.metastore.try.direct.sql, for read statements within a transaction that\n" + + "modifies metastore data. Due to non-standard behavior in Postgres, if a direct SQL\n" + + "select query has incorrect syntax or something similar inside a transaction, the\n" + + "entire transaction will fail and fall-back to DataNucleus will not be possible. You\n" + + "should disable the usage of direct SQL inside transactions if that happens in your case."), METASTORE_DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES( "hive.metastore.disallow.incompatible.col.type.changes", false, - "If true (default is false), ALTER TABLE operations which change the type of \n" + - "a column (say STRING) to an incompatible type (say MAP) are disallowed. \n" + + "If true (default is false), ALTER TABLE operations which change the type of a\n" + + "column (say STRING) to an incompatible type (say MAP) are disallowed.\n" + "RCFile default SerDe (ColumnarSerDe) serializes the values in such a way that the\n" + "datatypes can be converted from string to any type. The map is also serialized as\n" + - "a string, which can be read as a string as well. 
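[Illustrative sketch, not part of the patch itself: turning off metastore direct SQL programmatically when the backing datastore cannot execute it (for example, a non-RDBMS such as MongoDB, as mentioned above), to avoid the try-and-fall-back cost. Uses the standard HiveConf setters; both ConfVars names appear in this patch.]

    import org.apache.hadoop.hive.conf.HiveConf;

    HiveConf conf = new HiveConf();
    conf.setBoolVar(HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL, false);      // skip direct SQL on read paths
    conf.setBoolVar(HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL_DDL, false);  // and inside DDL transactions
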
However, with any binary \n" + + "a string, which can be read as a string as well. However, with any binary\n" + "serialization, this is not true. Blocking the ALTER TABLE prevents ClassCastExceptions\n" + - "when subsequently trying to access old partitions. \n" + + "when subsequently trying to access old partitions.\n" + "\n" + - "Primitive types like INT, STRING, BIGINT, etc are compatible with each other and are \n" + - "not blocked. \n" + + "Primitive types like INT, STRING, BIGINT, etc., are compatible with each other and are\n" + + "not blocked.\n" + "\n" + "See HIVE-4409 for more details."), @@ -651,16 +665,18 @@ "Whether Hive should use memory-optimized hash table for MapJoin. Only works on Tez,\n" + "because memory-optimized hashtable cannot be serialized."), HIVEMAPJOINUSEOPTIMIZEDKEYS("hive.mapjoin.optimized.keys", true, - "Whether MapJoin hashtable should use optimized (size-wise), keys, allowing the table to take less\n" + - "memory. Depending on key, the memory savings for entire table can be 5-15% or so."), + "Whether MapJoin hashtable should use optimized (size-wise) keys, allowing the table to\n" + + "take less memory. Depending on the key, memory savings for the entire table can be\n" + + "5-15% or so."), HIVEMAPJOINLAZYHASHTABLE("hive.mapjoin.lazy.hashtable", true, - "Whether MapJoin hashtable should deserialize values on demand. Depending on how many values in\n" + - "the table the join will actually touch, it can save a lot of memory by not creating objects for\n" + - "rows that are not needed. If all rows are needed obviously there's no gain."), + "Whether MapJoin hashtable should deserialize values on demand. Depending on how many\n" + + "values in the table the join will actually touch, it can save a lot of memory by not\n" + + "creating objects for rows that are not needed. If all rows are needed, obviously\n" + + "there's no gain."), HIVEHASHTABLEWBSIZE("hive.mapjoin.optimized.hashtable.wbsize", 10 * 1024 * 1024, "Optimized hashtable (see hive.mapjoin.optimized.hashtable) uses a chain of buffers to\n" + - "store data. This is one buffer size. HT may be slightly faster if this is larger, but for small\n" + - "joins unnecessary memory will be allocated and then trimmed."), + "store data. This is one buffer size. HT may be slightly faster if this is larger, but\n" + + "for small joins unnecessary memory will be allocated and then trimmed."), HIVESMBJOINCACHEROWS("hive.smbjoin.cache.rows", 10000, "How many rows with the same key value should be cached in memory per smb joined table."), @@ -811,9 +827,9 @@ HIVEMERGERCFILEBLOCKLEVEL("hive.merge.rcfile.block.level", true, ""), HIVEMERGEORCFILESTRIPELEVEL("hive.merge.orcfile.stripe.level", true, "When hive.merge.mapfiles or hive.merge.mapredfiles is enabled while writing a\n" + - " table with ORC file format, enabling this config will do stripe level fast merge\n" + - " for small ORC files. Note that enabling this config will not honor padding tolerance\n" + - " config (hive.exec.orc.block.padding.tolerance)."), + "table with ORC file format, enabling this config will do stripe level fast merge\n" + + "for small ORC files. Note that enabling this config will not honor padding tolerance\n" + + "config (hive.exec.orc.block.padding.tolerance)."), HIVEUSEEXPLICITRCFILEHEADER("hive.exec.rcfile.use.explicit.header", true, "If this is set the header for RCFiles will simply be RCF. 
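[Illustrative sketch, not part of the patch itself: enabling stripe-level fast merge of small ORC files as described for hive.merge.orcfile.stripe.level. HIVEMERGEMAPFILES is assumed to be the existing ConfVars entry for hive.merge.mapfiles referenced in that description; values shown are examples.]

    import org.apache.hadoop.hive.conf.HiveConf;

    HiveConf conf = new HiveConf();
    conf.setBoolVar(HiveConf.ConfVars.HIVEMERGEMAPFILES, true);            // merge small files from map-only jobs
    conf.setBoolVar(HiveConf.ConfVars.HIVEMERGEORCFILESTRIPELEVEL, true);  // fast stripe-level merge for ORC output
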
If this is not\n" + @@ -829,51 +845,67 @@ HIVE_ORC_FILE_MEMORY_POOL("hive.exec.orc.memory.pool", 0.5f, "Maximum fraction of heap that can be used by ORC file writers"), HIVE_ORC_WRITE_FORMAT("hive.exec.orc.write.format", null, - "Define the version of the file to write"), + "Define the version of the file to write. Possible values are 0.11 and 0.12.\n" + + "If this parameter is not defined, ORC will use the run length encoding (RLE)\n" + + "introduced in Hive 0.12. Any value other than 0.11 results in the 0.12 encoding."), HIVE_ORC_DEFAULT_STRIPE_SIZE("hive.exec.orc.default.stripe.size", 64L * 1024 * 1024, - "Define the default ORC stripe size"), + "Define the default ORC stripe size, in bytes."), HIVE_ORC_DEFAULT_BLOCK_SIZE("hive.exec.orc.default.block.size", 256L * 1024 * 1024, "Define the default file system block size for ORC files."), HIVE_ORC_DICTIONARY_KEY_SIZE_THRESHOLD("hive.exec.orc.dictionary.key.size.threshold", 0.8f, - "If the number of keys in a dictionary is greater than this fraction of the total number of\n" + - "non-null rows, turn off dictionary encoding. Use 1 to always use dictionary encoding."), - HIVE_ORC_DEFAULT_ROW_INDEX_STRIDE("hive.exec.orc.default.row.index.stride", 10000, "Define the default ORC index stride"), + "If the number of keys in a dictionary is greater than this fraction of the total\n" + + "number of non-null rows, turn off dictionary encoding. Set to 1 to always use\n" + + "dictionary encoding."), + HIVE_ORC_DEFAULT_ROW_INDEX_STRIDE("hive.exec.orc.default.row.index.stride", 10000, + "Define the default ORC index stride in number of rows. (Stride is the number of rows\n" + + "an index entry represents.)"), HIVE_ORC_ROW_INDEX_STRIDE_DICTIONARY_CHECK("hive.orc.row.index.stride.dictionary.check", true, - "If enabled dictionary check will happen after first row index stride (default 10000 rows)\n" + - "else dictionary check will happen before writing first stripe. In both cases, the decision\n" + - "to use dictionary or not will be retained thereafter."), - HIVE_ORC_DEFAULT_BUFFER_SIZE("hive.exec.orc.default.buffer.size", 256 * 1024, "Define the default ORC buffer size"), - HIVE_ORC_DEFAULT_BLOCK_PADDING("hive.exec.orc.default.block.padding", true, "Define the default block padding"), + "If enabled, dictionary check will happen after first row index stride (default\n" + + "10000 rows); else dictionary check will happen before writing first stripe.\n" + + "In both cases, the decision to use dictionary or not will be retained thereafter."), + HIVE_ORC_DEFAULT_BUFFER_SIZE("hive.exec.orc.default.buffer.size", 256 * 1024, + "Define the default ORC buffer size, in bytes."), + HIVE_ORC_DEFAULT_BLOCK_PADDING("hive.exec.orc.default.block.padding", true, + "Define the default block padding. Block padding was added in Hive 0.12.0 by HIVE-5091:\n" + + "ORC files should have an option to pad stripes to the HDFS block boundaries."), HIVE_ORC_BLOCK_PADDING_TOLERANCE("hive.exec.orc.block.padding.tolerance", 0.05f, - "Define the tolerance for block padding as a percentage of stripe size.\n" + - "For the defaults of 64Mb ORC stripe and 256Mb HDFS blocks, a maximum of 3.2Mb will be reserved for padding within the 256Mb block. \n" + - "In that case, if the available size within the block is more than 3.2Mb, a new smaller stripe will be inserted to fit within that space. 
\n" + - "This will make sure that no stripe written will cross block boundaries and cause remote reads within a node local task."), - HIVE_ORC_DEFAULT_COMPRESS("hive.exec.orc.default.compress", "ZLIB", "Define the default compression codec for ORC file"), + "Define the tolerance for block padding as a decimal fraction of stripe size (for\n" + + "example, the default value 0.05 is 5% of the stripe size). For the defaults of 64Mb\n" + + "ORC stripe and 256Mb HDFS blocks, the default block padding tolerance of 5% will\n" + + "reserve a maximum of 3.2Mb for padding within the 256Mb block. In that case, if the\n" + + "available size within the block is more than 3.2Mb, a new smaller stripe will be\n" + + "inserted to fit within that space. This will make sure that no stripe written will\n" + + "cross block boundaries and cause remote reads within a node local task."), + HIVE_ORC_DEFAULT_COMPRESS("hive.exec.orc.default.compress", "ZLIB", + "Define the default compression codec for ORC file"), - HIVE_ORC_ENCODING_STRATEGY("hive.exec.orc.encoding.strategy", "SPEED", new StringSet("SPEED", "COMPRESSION"), + HIVE_ORC_ENCODING_STRATEGY("hive.exec.orc.encoding.strategy", "SPEED", + new StringSet("SPEED", "COMPRESSION"), "Define the encoding strategy to use while writing data. Changing this will\n" + "only affect the light weight encoding for integers. This flag will not\n" + "change the compression level of higher level compression codec (like ZLIB)."), - HIVE_ORC_COMPRESSION_STRATEGY("hive.exec.orc.compression.strategy", "SPEED", new StringSet("SPEED", "COMPRESSION"), + HIVE_ORC_COMPRESSION_STRATEGY("hive.exec.orc.compression.strategy", "SPEED", + new StringSet("SPEED", "COMPRESSION"), "Define the compression strategy to use while writing data. \n" + "This changes the compression level of higher level compression codec (like ZLIB)."), HIVE_ORC_INCLUDE_FILE_FOOTER_IN_SPLITS("hive.orc.splits.include.file.footer", false, - "If turned on splits generated by orc will include metadata about the stripes in the file. This\n" + - "data is read remotely (from the client or HS2 machine) and sent to all the tasks."), + "If turned on splits generated by ORC will include metadata about the stripes in the\n" + + "file. This data is read remotely (from the client or HiveServer2 machine) and sent\n" + + "to all the tasks."), HIVE_ORC_CACHE_STRIPE_DETAILS_SIZE("hive.orc.cache.stripe.details.size", 10000, "Cache size for keeping meta info about orc splits cached in the client."), HIVE_ORC_COMPUTE_SPLITS_NUM_THREADS("hive.orc.compute.splits.num.threads", 10, - "How many threads orc should use to create splits in parallel."), + "How many threads ORC should use to create splits in parallel."), HIVE_ORC_SKIP_CORRUPT_DATA("hive.exec.orc.skip.corrupt.data", false, - "If ORC reader encounters corrupt data, this value will be used to determine\n" + - "whether to skip the corrupt data or throw exception. The default behavior is to throw exception."), + "If ORC reader encounters corrupt data, this value will be used to determine whether\n" + + "to skip the corrupt data or throw exception. The default behavior is to throw exception."), - HIVE_ORC_ZEROCOPY("hive.exec.orc.zerocopy", false, "Use zerocopy reads with ORC."), + HIVE_ORC_ZEROCOPY("hive.exec.orc.zerocopy", false, + "Use zerocopy reads with ORC. 
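[Illustrative sketch, not part of the patch itself: overriding a few of the ORC writer defaults documented above. The values are examples only, not recommendations; all ConfVars names appear in this patch.]

    import org.apache.hadoop.hive.conf.HiveConf;

    HiveConf conf = new HiveConf();
    conf.setVar(HiveConf.ConfVars.HIVE_ORC_WRITE_FORMAT, "0.11");                        // keep the pre-0.12 run length encoding
    conf.setLongVar(HiveConf.ConfVars.HIVE_ORC_DEFAULT_STRIPE_SIZE, 128L * 1024 * 1024); // 128MB stripes instead of the 64MB default
    conf.setFloatVar(HiveConf.ConfVars.HIVE_ORC_BLOCK_PADDING_TOLERANCE, 0.1f);          // allow padding up to 10% of the stripe size
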
(This requires Hadoop 2.3 or later.)"), HIVE_LAZYSIMPLE_EXTENDED_BOOLEAN_LITERAL("hive.lazysimple.extended_boolean_literal", false, "LazySimpleSerde uses this property to determine if it treats 'T', 't', 'F', 'f',\n" + @@ -897,34 +929,40 @@ HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD("hive.auto.convert.join.noconditionaltask.size", 10000000L, - "If hive.auto.convert.join.noconditionaltask is off, this parameter does not take affect. \n" + - "However, if it is on, and the sum of size for n-1 of the tables/partitions for a n-way join is smaller than this size, \n" + - "the join is directly converted to a mapjoin(there is no conditional task). The default is 10MB"), + "If hive.auto.convert.join.noconditionaltask is off, this parameter does not take\n" + + "effect. However, if it is on, and the sum of sizes for n-1 of the tables/partitions\n" + + "for an n-way join is smaller than this size, the join is directly converted to a\n" + + "mapjoin (there is no conditional task). The default is 10MB."), HIVECONVERTJOINUSENONSTAGED("hive.auto.convert.join.use.nonstaged", false, - "For conditional joins, if input stream from a small alias can be directly applied to join operator without \n" + - "filtering or projection, the alias need not to be pre-staged in distributed cache via mapred local task.\n" + - "Currently, this is not working with vectorization or tez execution engine."), + "For conditional joins, if input stream from a small alias can be directly applied to\n" + + "join operator without filtering or projection, the alias need not be pre-staged in\n" + + "distributed cache via mapred local task.\n" + + "Currently, this is not working with vectorization or Tez execution engine."), HIVESKEWJOINKEY("hive.skewjoin.key", 100000, "Determine if we get a skew key in join. If we see more than the specified number of rows with the same key in join operator,\n" + "we think the key as a skew join key. "), HIVESKEWJOINMAPJOINNUMMAPTASK("hive.skewjoin.mapjoin.map.tasks", 10000, - "Determine the number of map task used in the follow up map join job for a skew join.\n" + - "It should be used together with hive.skewjoin.mapjoin.min.split to perform a fine grained control."), + "Determine the number of map tasks used in the follow up map join job for a skew join.\n" + + "It should be used together with hive.skewjoin.mapjoin.min.split to perform a fine\n" + + "grained control."), HIVESKEWJOINMAPJOINMINSPLIT("hive.skewjoin.mapjoin.min.split", 33554432L, - "Determine the number of map task at most used in the follow up map join job for a skew join by specifying \n" + - "the minimum split size. It should be used together with hive.skewjoin.mapjoin.map.tasks to perform a fine grained control."), + "Determine the number of map tasks at most used in the follow up map join job for\n" + + "a skew join by specifying the minimum split size. 
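[Illustrative sketch, not part of the patch itself: the hive.auto.convert.join.noconditionaltask behavior described above. HIVECONVERTJOIN and HIVECONVERTJOINNOCONDITIONALTASK are assumed to be the existing ConfVars entries for hive.auto.convert.join and hive.auto.convert.join.noconditionaltask.]

    import org.apache.hadoop.hive.conf.HiveConf;

    HiveConf conf = new HiveConf();
    conf.setBoolVar(HiveConf.ConfVars.HIVECONVERTJOIN, true);                   // allow automatic mapjoin conversion
    conf.setBoolVar(HiveConf.ConfVars.HIVECONVERTJOINNOCONDITIONALTASK, true);  // convert without a conditional task
    // Convert directly to a mapjoin when the n-1 smaller inputs of an n-way join stay under 10MB in total.
    conf.setLongVar(HiveConf.ConfVars.HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD, 10L * 1000 * 1000);
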
It should be used together with\n" + + "hive.skewjoin.mapjoin.map.tasks to perform a fine grained control."), HIVESENDHEARTBEAT("hive.heartbeat.interval", 1000, "Send a heartbeat after this interval - used by mapjoin and filter operators"), HIVELIMITMAXROWSIZE("hive.limit.row.max.size", 100000L, - "When trying a smaller subset of data for simple LIMIT, how much size we need to guarantee each row to have at least."), + "When trying a smaller subset of data for simple LIMIT, how much size we need\n" + + "to guarantee each row to have at least."), HIVELIMITOPTLIMITFILE("hive.limit.optimize.limit.file", 10, - "When trying a smaller subset of data for simple LIMIT, maximum number of files we can sample."), + "When trying a smaller subset of data for simple LIMIT, maximum number of files\n" + + "we can sample."), HIVELIMITOPTENABLE("hive.limit.optimize.enable", false, - "Whether to enable to optimization to trying a smaller subset of data for simple LIMIT first."), + "Whether to enable optimization to try a smaller subset of data for simple LIMIT first."), HIVELIMITOPTMAXFETCH("hive.limit.optimize.fetch.max", 50000, - "Maximum number of rows allowed for a smaller subset of data for simple LIMIT, if it is a fetch query. \n" + - "Insert queries are not restricted by this limit."), + "Maximum number of rows allowed for a smaller subset of data for simple LIMIT,\n" + + "if it is a fetch query. Insert queries are not restricted by this limit."), HIVELIMITPUSHDOWNMEMORYUSAGE("hive.limit.pushdown.memory.usage", -1f, "The max memory to be used for hash in RS operator for top K selection."), HIVELIMITTABLESCANPARTITION("hive.limit.query.max.table.partition", -1, @@ -932,11 +970,12 @@ "The default value \"-1\" means no limit."), HIVEHASHTABLEKEYCOUNTADJUSTMENT("hive.hashtable.key.count.adjustment", 1.0f, - "Adjustment to mapjoin hashtable size derived from table and column statistics; the estimate" + - " of the number of keys is divided by this value. If the value is 0, statistics are not used" + - "and hive.hashtable.initialCapacity is used instead."), - HIVEHASHTABLETHRESHOLD("hive.hashtable.initialCapacity", 100000, "Initial capacity of " + - "mapjoin hashtable if statistics are absent, or if hive.hashtable.stats.key.estimate.adjustment is set to 0"), + "Adjustment to mapjoin hashtable size derived from table and column statistics;\n" + + "the estimate of the number of keys is divided by this value. If the value is 0,\n" + + "statistics are not used and hive.hashtable.initialCapacity is used instead."), + HIVEHASHTABLETHRESHOLD("hive.hashtable.initialCapacity", 100000, + "Initial capacity of mapjoin hashtable if statistics are absent, or if\n" + + "hive.hashtable.stats.key.estimate.adjustment is set to 0."), HIVEHASHTABLELOADFACTOR("hive.hashtable.loadfactor", (float) 0.75, ""), HIVEHASHTABLEFOLLOWBYGBYMAXMEMORYUSAGE("hive.mapjoin.followby.gby.localtask.max.memory.usage", (float) 0.55, "This number means how much memory the local task can take to hold the key/value into an in-memory hash table \n" + @@ -954,12 +993,14 @@ HIVEINPUTFORMAT("hive.input.format", "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat", "The default input format. Set this to HiveInputFormat if you encounter problems with CombineHiveInputFormat."), HIVETEZINPUTFORMAT("hive.tez.input.format", "org.apache.hadoop.hive.ql.io.HiveInputFormat", - "The default input format for tez. Tez groups splits in the AM."), + "The default input format for Tez. 
Tez groups splits in the AM (ApplicationMaster)."), HIVETEZCONTAINERSIZE("hive.tez.container.size", -1, - "By default Tez will spawn containers of the size of a mapper. This can be used to overwrite."), + "By default Tez will spawn containers of the size of a mapper. This can be used to\n" + + "overwrite the default."), HIVETEZJAVAOPTS("hive.tez.java.opts", null, - "By default Tez will use the Java options from map tasks. This can be used to overwrite."), + "By default Tez will use the Java options from map tasks. This can be used to\n" + + "overwrite the default."), HIVETEZLOGLEVEL("hive.tez.log.level", "INFO", "The log level to use for tasks executing as part of the DAG.\n" + "Used only if hive.tez.java.opts is used to configure Java options."), @@ -1241,13 +1282,13 @@ // Zookeeper related configs HIVE_ZOOKEEPER_QUORUM("hive.zookeeper.quorum", "", - "List of ZooKeeper servers to talk to. This is needed for: " + - "1. Read/write locks - when hive.lock.manager is set to " + - "org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager, " + + "List of ZooKeeper servers to talk to. This is needed for:\n" + + "1. Read/write locks -- when hive.lock.manager is set to\n" + + " org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager,\n" + "2. When HiveServer2 supports service discovery via Zookeeper."), HIVE_ZOOKEEPER_CLIENT_PORT("hive.zookeeper.client.port", "2181", - "The port of ZooKeeper servers to talk to. " + - "If the list of Zookeeper servers specified in hive.zookeeper.quorum," + + "The port of ZooKeeper servers to talk to.\n" + + "If the list of Zookeeper servers specified in hive.zookeeper.quorum\n" + "does not contain port numbers, this value is used."), HIVE_ZOOKEEPER_SESSION_TIMEOUT("hive.zookeeper.session.timeout", 600*1000, "ZooKeeper client's session timeout. The client is disconnected, and as a result, all locks released, \n" + @@ -1259,42 +1300,65 @@ // Transactions HIVE_TXN_MANAGER("hive.txn.manager", - "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager", ""), + "org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager", + "To turn on Hive transactions, set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.\n" + + "The default DummyTxnManager replicates pre Hive-0.13 behavior and provides no\n" + + "transactions."), HIVE_TXN_TIMEOUT("hive.txn.timeout", "300s", new TimeValidator(TimeUnit.SECONDS), - "time after which transactions are declared aborted if the client has not sent a heartbeat."), + "Time after which transactions are declared aborted if the client has not sent a\n" + + "heartbeat, in seconds."), HIVE_TXN_MAX_OPEN_BATCH("hive.txn.max.open.batch", 1000, "Maximum number of transactions that can be fetched in one call to open_txns().\n" + - "Increasing this will decrease the number of delta files created when\n" + - "streaming data into Hive. But it will also increase the number of\n" + - "open transactions at any given time, possibly impacting read performance."), + "This controls how many transactions streaming agents such as Flume or Storm open\n" + + "simultaneously. The streaming agent then writes that number of entries into a single\n" + + "file (per Flume agent or Storm bolt). Thus increasing this value decreases the number\n" + + "of delta files created by streaming agents. 
But it also increases the number of open\n" + + "transactions that Hive has to track at any given time, which may negatively affect\n" + + "read performance."), HIVE_COMPACTOR_INITIATOR_ON("hive.compactor.initiator.on", false, - "Whether to run the compactor's initiator thread in this metastore instance or not."), + "Whether to run the initiator and cleaner threads on this metastore instance or not.\n" + + "Set this to true on one instance of the Thrift metastore service to turn on Hive\n" + + "transactions."), HIVE_COMPACTOR_WORKER_THREADS("hive.compactor.worker.threads", 0, - "Number of compactor worker threads to run on this metastore instance."), + "How many compactor worker threads to run on this metastore instance. Set this to a\n" + + "positive number on one or more instances of the Thrift metastore service to turn on\n" + + "Hive transactions.\n" + + "Worker threads spawn MapReduce jobs to do compactions. They do not do the compactions\n" + + "themselves. Increasing the number of worker threads will decrease the time it takes\n" + + "tables or partitions to be compacted once they are determined to need compaction.\n" + + "It will also increase the background load on the Hadoop cluster as more MapReduce jobs\n" + + "will be running in the background."), HIVE_COMPACTOR_WORKER_TIMEOUT("hive.compactor.worker.timeout", "86400s", new TimeValidator(TimeUnit.SECONDS), - "Time before a given compaction in working state is declared a failure\n" + - "and returned to the initiated state."), + "Time in seconds after which a compaction job will be declared failed and the\n" + + "compaction re-queued."), HIVE_COMPACTOR_CHECK_INTERVAL("hive.compactor.check.interval", "300s", new TimeValidator(TimeUnit.SECONDS), - "Time between checks to see if any partitions need compacted.\n" + - "This should be kept high because each check for compaction requires many calls against the NameNode."), + "Time in seconds between checks to see if any tables or partitions need to be\n" + + "compacted. This should be kept high because each check for compaction requires\n" + + "many calls against the NameNode.\n" + + "Decreasing this value will reduce the time it takes for compaction to be started\n" + + "for a table or partition that requires compaction. However, checking if compaction\n" + + "is needed requires several calls to the NameNode for each table or partition that\n" + + "has had a transaction done on it since the last major compaction. So decreasing this\n" + + "value will increase the load on the NameNode."), HIVE_COMPACTOR_DELTA_NUM_THRESHOLD("hive.compactor.delta.num.threshold", 10, - "Number of delta files that must exist in a directory before the compactor will attempt\n" + - "a minor compaction."), + "Number of delta directories in a table or partition that will trigger a minor\n" + + "compaction."), HIVE_COMPACTOR_DELTA_PCT_THRESHOLD("hive.compactor.delta.pct.threshold", 0.1f, - "Percentage (by size) of base that deltas can be before major compaction is initiated."), + "Percentage (fractional) size of the delta files relative to the base that will trigger\n" + + "a major compaction. 
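[Illustrative sketch, not part of the patch itself: the minimal settings for turning on Hive transactions, combining hive.txn.manager, hive.compactor.initiator.on, and hive.compactor.worker.threads as described above. The compactor settings belong on the Thrift metastore instance(s), not on clients; all ConfVars names appear in this patch.]

    import org.apache.hadoop.hive.conf.HiveConf;

    HiveConf conf = new HiveConf();
    conf.setVar(HiveConf.ConfVars.HIVE_TXN_MANAGER,
        "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");                  // replaces the default DummyTxnManager
    conf.setBoolVar(HiveConf.ConfVars.HIVE_COMPACTOR_INITIATOR_ON, true);   // on exactly one metastore instance
    conf.setIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_WORKER_THREADS, 2);     // positive on one or more metastore instances
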
(1.0 = 100%, so the default 0.1 = 10%.)"), HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD("hive.compactor.abortedtxn.threshold", 1000, - "Number of aborted transactions involving a particular table or partition before major\n" + - "compaction is initiated."), + "Number of aborted transactions involving a given table or partition that will trigger\n" + + "a major compaction."), // For HBase storage handler HIVE_HBASE_WAL_ENABLED("hive.hbase.wal.enabled", true, @@ -1341,7 +1405,12 @@ "The SerDe used by FetchTask to serialize the fetch output."), HIVEEXPREVALUATIONCACHE("hive.cache.expr.evaluation", true, - "If true, evaluation result of deterministic expression referenced twice or more will be cached."), + "If true, the evaluation result of a deterministic expression referenced twice or more\n" + + "will be cached.\n" + + "For example, in a filter condition like '.. where key + 10 = 100 or key + 10 = 0'\n" + + "the expression 'key + 10' will be evaluated/cached once and reused for the following\n" + + "expression ('key + 10 = 0'). Currently, this is applied only to expressions in select\n" + + "or filter operators."), // Hive Variables HIVEVARIABLESUBSTITUTE("hive.variable.substitute", true, @@ -1359,17 +1428,20 @@ "enable or disable the Hive client authorization"), HIVE_AUTHORIZATION_MANAGER("hive.security.authorization.manager", "org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider", - "The Hive client authorization manager class name. The user defined authorization class should implement \n" + - "interface org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider."), + "The Hive client authorization manager class name.\n" + + "The user defined authorization class should implement interface\n" + + "org.apache.hadoop.hive.ql.security.authorization.HiveAuthorizationProvider."), HIVE_AUTHENTICATOR_MANAGER("hive.security.authenticator.manager", "org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator", - "hive client authenticator manager class name. The user defined authenticator should implement \n" + - "interface org.apache.hadoop.hive.ql.security.HiveAuthenticationProvider."), + "Hive client authenticator manager class name. The user defined authenticator should\n" + + "implement interface org.apache.hadoop.hive.ql.security.HiveAuthenticationProvider."), HIVE_METASTORE_AUTHORIZATION_MANAGER("hive.security.metastore.authorization.manager", "org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider", - "authorization manager class name to be used in the metastore for authorization.\n" + - "The user defined authorization class should implement interface \n" + - "org.apache.hadoop.hive.ql.security.authorization.HiveMetastoreAuthorizationProvider. "), + "Names of authorization manager classes (comma separated) to be used in the metastore\n" + + "for authorization. 
The user defined authorization class should implement interface\n" + + "org.apache.hadoop.hive.ql.security.authorization.HiveMetastoreAuthorizationProvider.\n" + + "All authorization manager classes have to successfully authorize the metastore API\n" + + "call for the command execution to be allowed."), HIVE_METASTORE_AUTHORIZATION_AUTH_READS("hive.security.metastore.authorization.auth.reads", true, "If this is true, metastore authorizer authorizes read actions on database, table"), HIVE_METASTORE_AUTHENTICATOR_MANAGER("hive.security.metastore.authenticator.manager", @@ -1389,15 +1461,19 @@ "the privileges automatically granted to some roles whenever a table gets created.\n" + "An example like \"roleX,roleY:select;roleZ:create\" will grant select privilege to roleX and roleY,\n" + "and grant create privilege to roleZ whenever a new table created."), - HIVE_AUTHORIZATION_TABLE_OWNER_GRANTS("hive.security.authorization.createtable.owner.grants", "", - "the privileges automatically granted to the owner whenever a table gets created.\n" + - "An example like \"select,drop\" will grant select and drop privilege to the owner of the table"), + HIVE_AUTHORIZATION_TABLE_OWNER_GRANTS("hive.security.authorization.createtable.owner.grants", + "", + "The privileges automatically granted to the owner whenever a table gets created.\n" + + "An example like \"select,drop\" will grant select and drop privilege to the owner\n" + + "of the table. Note that the default gives the creator of a table no access to the\n" + + "table (but see HIVE-8067)."), // if this is not set default value is added by sql standard authorizer. // Default value can't be set in this constructor as it would refer names in other ConfVars // whose constructor would not have been called HIVE_AUTHORIZATION_SQL_STD_AUTH_CONFIG_WHITELIST("hive.security.authorization.sqlstd.confwhitelist", "", - "interal variable. List of modifiable configurations by user."), + "Internal variable. List of configurations modifiable by user.\n" + + "See HIVE-6846 for default."), HIVE_CLI_PRINT_HEADER("hive.cli.print.header", false, "Whether to print the names of the columns in query output."), @@ -1481,34 +1557,40 @@ "The data format to use for DDL output. One of \"text\" (for human\n" + "readable text) or \"json\" (for a json object)."), HIVE_ENTITY_SEPARATOR("hive.entity.separator", "@", - "Separator used to construct names of tables and partitions. For example, dbname@tablename@partitionname"), + "Separator used to construct names of tables and partitions. For example,\n" + + "dbname@tablename@partitionname uses the default separator '@'."), HIVE_DISPLAY_PARTITION_COLUMNS_SEPARATELY("hive.display.partition.cols.separately", true, - "In older Hive version (0.10 and earlier) no distinction was made between\n" + + "In older Hive versions (0.10 and earlier) no distinction was made between\n" + "partition columns or non-partition columns while displaying columns in describe\n" + "table. From 0.12 onwards, they are displayed separately. This flag will let you\n" + - "get old behavior, if desired. See, test-case in patch for HIVE-6689."), + "get the old behavior, if desired. See test-case in patch for HIVE-6689."), // HiveServer2 specific configs - HIVE_SERVER2_MAX_START_ATTEMPTS("hive.server2.max.start.attempts", 30L, new RangeValidator(0L, null), - "Number of times HiveServer2 will attempt to start before exiting, sleeping 60 seconds " + - "between retries. 
\n The default of 30 will keep trying for 30 minutes."), + HIVE_SERVER2_MAX_START_ATTEMPTS("hive.server2.max.start.attempts", 30L, + new RangeValidator(0L, null), + "The number of times HiveServer2 will attempt to start before exiting, sleeping\n" + + "60 seconds between retries. The default of 30 will keep trying for 30 minutes."), HIVE_SERVER2_SUPPORT_DYNAMIC_SERVICE_DISCOVERY("hive.server2.support.dynamic.service.discovery", false, "Whether HiveServer2 supports dynamic service discovery for its clients. " + "To support this, each instance of HiveServer2 currently uses ZooKeeper to register itself, " + "when it is brought up. JDBC/ODBC clients should use the ZooKeeper ensemble: " + "hive.zookeeper.quorum in their connection string."), HIVE_SERVER2_ZOOKEEPER_NAMESPACE("hive.server2.zookeeper.namespace", "hiveserver2", - "The parent node in ZooKeeper used by HiveServer2 when supporting dynamic service discovery."), + "The parent node in ZooKeeper used by HiveServer2 when supporting dynamic service\n" + + "discovery."), // HiveServer2 global init file location - HIVE_SERVER2_GLOBAL_INIT_FILE_LOCATION("hive.server2.global.init.file.location", "${env:HIVE_CONF_DIR}", - "Either the location of a HS2 global init file or a directory containing a .hiverc file. If the \n" + - "property is set, the value must be a valid path to an init file or directory where the init file is located."), - HIVE_SERVER2_TRANSPORT_MODE("hive.server2.transport.mode", "binary", new StringSet("binary", "http"), + HIVE_SERVER2_GLOBAL_INIT_FILE_LOCATION("hive.server2.global.init.file.location", + "${env:HIVE_CONF_DIR}", + "Either the location of a HiveServer2 global init file or a directory containing a \n" + + ".hiverc file. If the property is set, the value must be a valid path to an init file or\n" + + "directory where the init file is located."), + HIVE_SERVER2_TRANSPORT_MODE("hive.server2.transport.mode", "binary", + new StringSet("binary", "http"), "Transport mode of HiveServer2."), HIVE_SERVER2_THRIFT_BIND_HOST("hive.server2.thrift.bind.host", "", "Bind host on which to run the HiveServer2 Thrift service."), - // http (over thrift) transport settings + // http (over Thrift) transport settings HIVE_SERVER2_THRIFT_HTTP_PORT("hive.server2.thrift.http.port", 10001, "Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'http'."), HIVE_SERVER2_THRIFT_HTTP_PATH("hive.server2.thrift.http.path", "cliservice", @@ -1520,20 +1602,21 @@ HIVE_SERVER2_THRIFT_HTTP_MAX_IDLE_TIME("hive.server2.thrift.http.max.idle.time", "1800s", new TimeValidator(TimeUnit.MILLISECONDS), "Maximum idle time for a connection on the server when in HTTP mode."), - HIVE_SERVER2_THRIFT_HTTP_WORKER_KEEPALIVE_TIME("hive.server2.thrift.http.worker.keepalive.time", "60s", - new TimeValidator(TimeUnit.SECONDS), - "Keepalive time for an idle http worker thread. When the number of workers exceeds min workers, " + - "excessive threads are killed after this time interval."), + HIVE_SERVER2_THRIFT_HTTP_WORKER_KEEPALIVE_TIME("hive.server2.thrift.http.worker.keepalive.time", + "60s", new TimeValidator(TimeUnit.SECONDS), + "Keepalive time for an idle http worker thread. 
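[Illustrative sketch, not part of the patch itself: running the HiveServer2 Thrift service over HTTP instead of the default binary transport, using the transport-mode and HTTP settings above. The port and path shown are the documented defaults; all ConfVars names appear in this patch.]

    import org.apache.hadoop.hive.conf.HiveConf;

    HiveConf conf = new HiveConf();
    conf.setVar(HiveConf.ConfVars.HIVE_SERVER2_TRANSPORT_MODE, "http");         // instead of "binary"
    conf.setIntVar(HiveConf.ConfVars.HIVE_SERVER2_THRIFT_HTTP_PORT, 10001);     // HTTP-mode port
    conf.setVar(HiveConf.ConfVars.HIVE_SERVER2_THRIFT_HTTP_PATH, "cliservice"); // HTTP endpoint path
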
When the number of workers exceeds\n" + + "min workers, excessive threads are killed after this time interval."), // binary transport settings HIVE_SERVER2_THRIFT_PORT("hive.server2.thrift.port", 10000, "Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'."), - // hadoop.rpc.protection being set to a higher level than HiveServer2 - // does not make sense in most situations. - // HiveServer2 ignores hadoop.rpc.protection in favor of hive.server2.thrift.sasl.qop. - HIVE_SERVER2_THRIFT_SASL_QOP("hive.server2.thrift.sasl.qop", "auth", new StringSet("auth", "auth-int", "auth-conf"), - "Sasl QOP value; Set it to one of following values to enable higher levels of\n" + - " protection for HiveServer2 communication with clients.\n" + + HIVE_SERVER2_THRIFT_SASL_QOP("hive.server2.thrift.sasl.qop", "auth", + new StringSet("auth", "auth-int", "auth-conf"), + "Sasl QOP value; set it to one of following values to enable higher levels of\n" + + "protection for HiveServer2 communication with clients.\n" + + "Setting hadoop.rpc.protection to a higher level than HiveServer2 does not\n" + + "make sense in most situations. HiveServer2 ignores hadoop.rpc.protection in favor\n" + + "of hive.server2.thrift.sasl.qop.\n" + " \"auth\" - authentication only (default)\n" + " \"auth-int\" - authentication plus integrity protection\n" + " \"auth-conf\" - authentication plus integrity and confidentiality protection\n" + @@ -1544,24 +1627,26 @@ "Maximum number of Thrift worker threads"), HIVE_SERVER2_THRIFT_WORKER_KEEPALIVE_TIME("hive.server2.thrift.worker.keepalive.time", "60s", new TimeValidator(TimeUnit.SECONDS), - "Keepalive time (in seconds) for an idle worker thread. When the number of workers exceeds min workers, " + - "excessive threads are killed after this time interval."), + "Keepalive time (in seconds) for an idle worker thread. 
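[Illustrative sketch, not part of the patch itself: raising the SASL protection level for HiveServer2 client connections, per the hive.server2.thrift.sasl.qop description above.]

    import org.apache.hadoop.hive.conf.HiveConf;

    HiveConf conf = new HiveConf();
    // "auth" (default) = authentication only; "auth-int" adds integrity protection;
    // "auth-conf" adds integrity and confidentiality protection.
    conf.setVar(HiveConf.ConfVars.HIVE_SERVER2_THRIFT_SASL_QOP, "auth-conf");
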
When the number of workers\n" + + "exceeds min workers, excessive threads are killed after this time interval."), // Configuration for async thread pool in SessionManager HIVE_SERVER2_ASYNC_EXEC_THREADS("hive.server2.async.exec.threads", 100, "Number of threads in the async thread pool for HiveServer2"), HIVE_SERVER2_ASYNC_EXEC_SHUTDOWN_TIMEOUT("hive.server2.async.exec.shutdown.timeout", "10s", new TimeValidator(TimeUnit.SECONDS), - "Maximum time for which HiveServer2 shutdown will wait for async"), + "Time (in seconds) for which HiveServer2 shutdown will wait for async threads to\n" + + "terminate."), HIVE_SERVER2_ASYNC_EXEC_WAIT_QUEUE_SIZE("hive.server2.async.exec.wait.queue.size", 100, "Size of the wait queue for async thread pool in HiveServer2.\n" + "After hitting this limit, the async thread pool will reject new requests."), HIVE_SERVER2_ASYNC_EXEC_KEEPALIVE_TIME("hive.server2.async.exec.keepalive.time", "10s", new TimeValidator(TimeUnit.SECONDS), - "Time that an idle HiveServer2 async thread (from the thread pool) will wait for a new task\n" + - "to arrive before terminating"), + "Time that an idle HiveServer2 async thread (from the thread pool) will wait for a new\n" + + "task to arrive before terminating."), HIVE_SERVER2_LONG_POLLING_TIMEOUT("hive.server2.long.polling.timeout", "5000ms", new TimeValidator(TimeUnit.MILLISECONDS), - "Time that HiveServer2 will wait before responding to asynchronous calls that use long polling"), + "Time that HiveServer2 will wait before responding to asynchronous calls that use long\n" + + "polling."), // HiveServer2 auth configuration HIVE_SERVER2_AUTHENTICATION("hive.server2.authentication", "NONE", @@ -1571,7 +1656,9 @@ " LDAP: LDAP/AD based authentication\n" + " KERBEROS: Kerberos/GSSAPI authentication\n" + " CUSTOM: Custom authentication provider\n" + - " (Use with property hive.server2.custom.authentication.class)"), + " (Use with property hive.server2.custom.authentication.class)\n" + + " PAM: Pluggable authentication module\n" + + " NOSASL: Raw transport"), HIVE_SERVER2_ALLOW_USER_SUBSTITUTION("hive.server2.allow.user.substitution", true, "Allow alternate user to be specified as part of HiveServer2 open connection request."), HIVE_SERVER2_KERBEROS_KEYTAB("hive.server2.authentication.kerberos.keytab", "", @@ -1579,9 +1666,9 @@ HIVE_SERVER2_KERBEROS_PRINCIPAL("hive.server2.authentication.kerberos.principal", "", "Kerberos server principal"), HIVE_SERVER2_SPNEGO_KEYTAB("hive.server2.authentication.spnego.keytab", "", - "keytab file for SPNego principal, optional,\n" + - "typical value would look like /etc/security/keytabs/spnego.service.keytab,\n" + - "This keytab would be used by HiveServer2 when Kerberos security is enabled and \n" + + "Keytab file for SPNego principal, optional,\n" + + "typical value would look like /etc/security/keytabs/spnego.service.keytab.\n" + + "This keytab would be used by HiveServer2 when Kerberos security is enabled and\n" + "HTTP transport mode is used.\n" + "This needs to be set only if SPNEGO is to be used in authentication.\n" + "SPNego authentication would be honored only if valid\n" + @@ -1592,8 +1679,8 @@ HIVE_SERVER2_SPNEGO_PRINCIPAL("hive.server2.authentication.spnego.principal", "", "SPNego service principal, optional,\n" + "typical value would look like HTTP/_HOST@EXAMPLE.COM\n" + - "SPNego service principal would be used by HiveServer2 when Kerberos security is enabled\n" + - "and HTTP transport mode is used.\n" + + "SPNego service principal would be used by HiveServer2 when Kerberos security is\n" + + 
"enabled and HTTP transport mode is used.\n" + "This needs to be set only if SPNEGO is to be used in authentication."), HIVE_SERVER2_PLAIN_LDAP_URL("hive.server2.authentication.ldap.url", null, "LDAP connection URL"), HIVE_SERVER2_PLAIN_LDAP_BASEDN("hive.server2.authentication.ldap.baseDN", null, "LDAP base DN"), @@ -1619,11 +1706,15 @@ " HIVE : Exposes Hive's native table types like MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW\n" + " CLASSIC : More generic types like TABLE and VIEW"), HIVE_SERVER2_SESSION_HOOK("hive.server2.session.hook", "", ""), - HIVE_SERVER2_USE_SSL("hive.server2.use.SSL", false, ""), - HIVE_SERVER2_SSL_KEYSTORE_PATH("hive.server2.keystore.path", "", ""), - HIVE_SERVER2_SSL_KEYSTORE_PASSWORD("hive.server2.keystore.password", "", ""), + HIVE_SERVER2_USE_SSL("hive.server2.use.SSL", false, + "Set this to true for using SSL encryption in HiveServer2."), + HIVE_SERVER2_SSL_KEYSTORE_PATH("hive.server2.keystore.path", "", + "SSL certificate keystore location."), + HIVE_SERVER2_SSL_KEYSTORE_PASSWORD("hive.server2.keystore.password", "", + "SSL certificate keystore password."), - HIVE_SECURITY_COMMAND_WHITELIST("hive.security.command.whitelist", "set,reset,dfs,add,list,delete,reload,compile", + HIVE_SECURITY_COMMAND_WHITELIST("hive.security.command.whitelist", + "set,reset,dfs,add,list,delete,reload,compile", "Comma separated list of non-SQL Hive commands users are authorized to execute"), HIVE_SERVER2_SESSION_CHECK_INTERVAL("hive.server2.session.check.interval", "0ms", @@ -1694,13 +1785,13 @@ "Whether to show the unquoted partition names in query results."), HIVE_EXECUTION_ENGINE("hive.execution.engine", "mr", new StringSet("mr", "tez"), - "Chooses execution engine. Options are: mr (Map reduce, default) or tez (hadoop 2 only)"), + "Chooses execution engine. Options are: mr (MapReduce, default) or tez (Hadoop 2 only)."), HIVE_JAR_DIRECTORY("hive.jar.directory", null, - "This is the location hive in tez mode will look for to find a site wide \n" + - "installed hive instance."), + "This is the location that Hive in Tez mode will look for to find a site-wide \n" + + "installed Hive instance."), HIVE_USER_INSTALL_DIR("hive.user.install.directory", "hdfs:///user/", - "If hive (in tez mode only) cannot find a usable hive jar in \"hive.jar.directory\", \n" + - "it will upload the hive jar to \"hive.user.install.directory/user.name\"\n" + + "If Hive (in Tez mode only) cannot find a usable Hive jar in \"hive.jar.directory\", \n" + + "it will upload the Hive jar to \"hive.user.install.directory/user.name\"\n" + "and use it to run queries."), // Vectorization enabled @@ -1768,29 +1859,32 @@ "Setting to 0.12:\n" + " Maintains division behavior: int / int = double"), HIVE_CONVERT_JOIN_BUCKET_MAPJOIN_TEZ("hive.convert.join.bucket.mapjoin.tez", false, - "Whether joins can be automatically converted to bucket map joins in hive \n" + - "when tez is used as the execution engine."), + "Whether joins can be automatically converted to bucket map joins in Hive \n" + + "when Tez is used as the execution engine."), HIVE_CHECK_CROSS_PRODUCT("hive.exec.check.crossproducts", true, - "Check if a plan contains a Cross Product. If there is one, output a warning to the Session's console."), + "Check if a plan contains a Cross Product. 
If there is one, output a warning to the\n" + + "Session's console."), HIVE_LOCALIZE_RESOURCE_WAIT_INTERVAL("hive.localize.resource.wait.interval", "5000ms", new TimeValidator(TimeUnit.MILLISECONDS), - "Time to wait for another thread to localize the same resource for hive-tez."), + "Time to wait for another thread to localize the same resource for Hive-Tez."), HIVE_LOCALIZE_RESOURCE_NUM_WAIT_ATTEMPTS("hive.localize.resource.num.wait.attempts", 5, - "The number of attempts waiting for localizing a resource in hive-tez."), + "The number of attempts waiting for localizing a resource in Hive-Tez."), TEZ_AUTO_REDUCER_PARALLELISM("hive.tez.auto.reducer.parallelism", false, - "Turn on Tez' auto reducer parallelism feature. When enabled, Hive will still estimate data sizes\n" + - "and set parallelism estimates. Tez will sample source vertices' output sizes and adjust the estimates at runtime as\n" + - "necessary."), + "Turn on Tez' auto reducer parallelism feature. When enabled, Hive will still estimate\n" + + "data sizes and set parallelism estimates. Tez will sample source vertices' output\n" + + "sizes and adjust the estimates at runtime as necessary."), TEZ_MAX_PARTITION_FACTOR("hive.tez.max.partition.factor", 2f, - "When auto reducer parallelism is enabled this factor will be used to over-partition data in shuffle edges."), + "When auto reducer parallelism is enabled this factor will be used to over-partition\n" + + "data in shuffle edges."), TEZ_MIN_PARTITION_FACTOR("hive.tez.min.partition.factor", 0.25f, - "When auto reducer parallelism is enabled this factor will be used to put a lower limit to the number\n" + - "of reducers that tez specifies."), + "When auto reducer parallelism is enabled this factor will be used to put a lower limit\n" + + "to the number of reducers that Tez specifies."), TEZ_DYNAMIC_PARTITION_PRUNING( "hive.tez.dynamic.partition.pruning", true, - "When dynamic pruning is enabled, joins on partition keys will be processed by sending events from the processing " + - "vertices to the tez application master. These events will be used to prune unnecessary partitions."), + "When dynamic pruning is enabled, joins on partition keys will be processed by sending\n" + + "events from the processing vertices to the Tez application master. These events will be\n" + + "used to prune unnecessary partitions."), TEZ_DYNAMIC_PARTITION_PRUNING_MAX_EVENT_SIZE("hive.tez.dynamic.partition.pruning.max.event.size", 1*1024*1024L, "Maximum size of events sent by processors in dynamic pruning. If this size is crossed no pruning will take place."), TEZ_DYNAMIC_PARTITION_PRUNING_MAX_DATA_SIZE("hive.tez.dynamic.partition.pruning.max.data.size", 100*1024*1024L,