A number of configuration variables in Hive can be used by administrators to change the behavior of their installations and user sessions. These variables can be configured in any of the following ways, shown in order of preference:

- Using the set command in the CLI to set session-level values of a configuration variable, for example: set hive.exec.compress.output=true;
- Using the -hiveconf option of the hive command to set values for the duration of that invocation, for example: hive -hiveconf hive.exec.compress.output=true
- In hive-site.xml, for settings that should apply to every session; a minimal example follows this list.
- In hive-default.xml, which contains the default values for these variables.

hive-default.xml is located in the conf directory in your installation root, and hive-site.xml should also be created in the same directory.
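
For instance, a small hive-site.xml that overrides two of the variables described in the table below might look like the following sketch; the chosen values are illustrative, not recommendations:

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- hive-site.xml: site-specific overrides of the defaults in hive-default.xml -->
<configuration>

  <!-- Compress the output of the final map/reduce job of each query -->
  <property>
    <name>hive.exec.compress.output</name>
    <value>true</value>
  </property>

  <!-- Create new tables as SequenceFile rather than TextFile by default -->
  <property>
    <name>hive.default.fileformat</name>
    <value>SequenceFile</value>
  </property>

</configuration>
```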

| Variable Name | Description | Default Value |
| --- | --- | --- |
| hive.exec.script.wrapper | Wrapper around any invocations of the script operator, e.g. if this is set to python, the script passed to the script operator will be invoked as python <script command>. If the value is null or not set, the script is invoked as <script command>. | null |
| hive.exec.plan |  | null |
| hive.exec.scratchdir | This directory is used by Hive to store the plans for the different map/reduce stages of a query, as well as to store the intermediate outputs of these stages. | /tmp/<user.name>/hive |
| hive.querylog.location | Directory where structured Hive query logs are created. One file per session is created in this directory. If this variable is set to an empty string, the structured log is not created. | /tmp/<user.name> |
| hive.exec.submitviachild | Determines whether map/reduce jobs should be submitted through a separate JVM in non-local mode. | false (by default, jobs are submitted through the same JVM as the compiler) |
| hive.exec.script.maxerrsize | Maximum number of serialization errors allowed in a user script invoked through the TRANSFORM, MAP or REDUCE constructs. | 100000 |
| hive.exec.compress.output | Determines whether the output of the final map/reduce job in a query is compressed or not. | false |
| hive.exec.compress.intermediate | Determines whether the output of the intermediate map/reduce jobs in a query is compressed or not. | false |
| hive.jar.path | The location of hive_cli.jar that is used when submitting jobs in a separate JVM. |  |
| hive.aux.jars.path | The location of the plugin jars that contain implementations of user-defined functions and SerDes. |  |
| hive.partition.pruning | A strict value for this variable causes the compiler to throw an error if no partition predicate is provided on a partitioned table. This protects against a user inadvertently issuing a query against all the partitions of the table. | nonstrict |
| hive.map.aggr | Determines whether map-side aggregation is on or not. | true |
| hive.join.emit.interval |  | 1000 |
| hive.map.aggr.hash.percentmemory |  | (float)0.5 |
| hive.default.fileformat | Default file format for CREATE TABLE statements. Options are TextFile, SequenceFile and RCFile. | TextFile |
| hive.merge.mapfiles | Merge small files at the end of a map-only job. | true |
| hive.merge.mapredfiles | Merge small files at the end of a map-reduce job (see the example after this table). | false |
| hive.merge.size.per.task | Size of merged files at the end of the job. | 256000000 |
| hive.merge.smallfiles.avgsize | When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true. | 16000000 |
| hive.enforce.bucketing | If enabled, enforces that inserts into bucketed tables are also bucketed. | false |
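
As a sketch of how the small-file merge settings fit together, the hive-site.xml entries below enable merging for map-reduce jobs as well and adjust the size thresholds. The threshold values are illustrative assumptions, not tuned recommendations:

```xml
<!-- Entries merged into the <configuration> element of hive-site.xml -->
<configuration>
  <!-- Also merge small files after full map-reduce jobs (default is false) -->
  <property>
    <name>hive.merge.mapredfiles</name>
    <value>true</value>
  </property>
  <!-- Trigger the merge job when the average output file size is below ~64 MB (illustrative) -->
  <property>
    <name>hive.merge.smallfiles.avgsize</name>
    <value>64000000</value>
  </property>
  <!-- Target size of the merged files (the default value) -->
  <property>
    <name>hive.merge.size.per.task</name>
    <value>256000000</value>
  </property>
</configuration>
```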

The following variables configure the Hive metastore:

| Variable Name | Description | Default Value |
| --- | --- | --- |
| hive.metastore.metadb.dir |  |  |
| hive.metastore.warehouse.dir | Location of the default database for the warehouse. |  |
| hive.metastore.uris |  |  |
| hive.metastore.usefilestore |  |  |
| hive.metastore.rawstore.impl |  |  |
| hive.metastore.local |  |  |
| javax.jdo.option.ConnectionURL | JDBC connect string for a JDBC metastore (see the example after this table). |  |
| javax.jdo.option.ConnectionDriverName | Driver class name for a JDBC metastore. |  |
| javax.jdo.option.ConnectionUserName |  |  |
| javax.jdo.option.ConnectionPassword |  |  |
| org.jpox.autoCreateSchema | Creates the necessary schema (tables, columns, and so on) on startup if one doesn't exist. Set to false after creating it once. |  |
| org.jpox.fixedDatastore | Whether the datastore schema is fixed. |  |
| hive.metastore.checkForDefaultDb |  |  |
| hive.metastore.ds.connection.url.hook | Name of the hook to use for retrieving the JDO connection URL. If empty, the value in javax.jdo.option.ConnectionURL is used as the connection URL. |  |
| hive.metastore.ds.retry.attempts | The number of times to retry a call to the backing datastore if there was a connection error. | 1 |
| hive.metastore.ds.retry.interval | The number of milliseconds between datastore retry attempts. | 1000 |
| hive.metastore.server.min.threads | Minimum number of worker threads in the Thrift server's pool. | 200 |
| hive.metastore.server.max.threads | Maximum number of worker threads in the Thrift server's pool. | 10000 |
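
For example, pointing the metastore at an external JDBC database involves setting the javax.jdo.option.* variables above. The sketch below assumes a MySQL database named hivemetastore on a host called dbhost; the user name, password, and warehouse path are placeholders to adapt:

```xml
<!-- Entries merged into the <configuration> element of hive-site.xml -->
<configuration>
  <!-- JDBC connect string for the JDBC metastore (host and database name are placeholders) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://dbhost:3306/hivemetastore?createDatabaseIfNotExist=true</value>
  </property>
  <!-- Driver class name for the JDBC metastore (MySQL assumed here) -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>changeme</value>
  </property>
  <!-- Location of the default database for the warehouse -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
```

The JDBC driver jar itself must also be available on Hive's classpath.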

Hive uses temporary folders both on the machine running the Hive client and on the default HDFS instance. These folders are used to store per-query temporary/intermediate data sets and are normally cleaned up by the Hive client when the query is finished. However, in cases of abnormal Hive client termination, some data may be left behind. On HDFS, the location of these folders is controlled by the hive.exec.scratchdir variable described in the first table above.

Note that when writing data to a table or partition, Hive first writes to a temporary location on the target table's filesystem (using hive.exec.scratchdir as the temporary location) and then moves the data to the target table. This applies in all cases, whether tables are stored in HDFS (the normal case) or in file systems like S3 or even NFS.
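
Both the HDFS scratch directory and the client-side query log directory described in the first table default to locations under /tmp. If those locations are not appropriate, they can be redirected in hive-site.xml; the paths below are illustrative placeholders:

```xml
<!-- Entries merged into the <configuration> element of hive-site.xml -->
<configuration>
  <!-- Scratch directory for query plans and intermediate map/reduce output (illustrative path) -->
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/data/hive/scratch</value>
  </property>
  <!-- Directory for structured per-session query logs; set to an empty string to disable (illustrative path) -->
  <property>
    <name>hive.querylog.location</name>
    <value>/data/hive/querylogs</value>
  </property>
</configuration>
```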