Index: docs/xdocs/admin_manual/installation.xml
===================================================================
--- docs/xdocs/admin_manual/installation.xml	(revision 0)
+++ docs/xdocs/admin_manual/installation.xml	(revision 0)
@@ -0,0 +1,85 @@

Hive Installation
Hadoop Hive Documentation Team

Installing Hive is simple and only requires having Java 1.6 and Ant installed on your machine.


Hive is available via SVN at http://svn.apache.org/repos/asf/hive/trunk. You can download it by running the following command.

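For example, to check the trunk out into a local directory named hive (the directory name is an arbitrary choice):

<source>
$ svn co http://svn.apache.org/repos/asf/hive/trunk hive
</source>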

To build Hive, execute the following command in the base directory:

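The exact Ant target may vary between releases; at the time of this patch the documented packaging target was package, so a typical build looks like this:

<source>
$ cd hive
$ ant package
</source>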

It will create the subdirectory build/dist with the following contents:

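A typical build/dist layout of that era looks roughly like the following (verify against your own build):

<source>
README.txt
bin/        (shell scripts, including the hive launcher)
conf/       (configuration files, e.g. hive-default.xml)
lib/        (required jar files)
examples/   (sample input and query files)
</source>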

The build/dist subdirectory should contain all the files necessary to run Hive. You can run it from there, or copy it to a different location if you prefer.


In order to run Hive, you must either have hadoop on your PATH or have the environment variable HADOOP_HOME set to the Hadoop installation directory.
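For example (the installation path below is a placeholder):

<source>
$ export HADOOP_HOME=/path/to/hadoop
# or, equivalently, put the hadoop script on the PATH:
$ export PATH=/path/to/hadoop/bin:$PATH
</source>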


Moreover, we strongly advise users to create the HDFS directories /tmp and /user/hive/warehouse (a.k.a. hive.metastore.warehouse.dir) and to make them group-writable with chmod g+w before creating tables in Hive.
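Assuming HADOOP_HOME is set as described above, the directories can be created and made group-writable like this:

<source>
$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
</source>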


To use the Hive command line interface (CLI), go to the Hive home directory (the one with the contents of build/dist) and execute the following command:

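The launcher script lives in the bin subdirectory of the distribution:

<source>
$ bin/hive
</source>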

Metadata is stored in an embedded Derby database whose disk storage location is determined by the Hive configuration variable named javax.jdo.option.ConnectionURL. By default (see conf/hive-default.xml), this location is ./metastore_db.
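The corresponding entry in hive-default.xml looks like the following; the value shown is the standard embedded-Derby default:

<source>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
</property>
</source>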


Using Derby in embedded mode allows at most one user at a time. To configure Derby to run in server mode, see Hive Derby Server Mode.
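As a rough sketch of what server mode involves (the host is a placeholder, 1527 is simply Derby's default network port, and the linked page remains the authoritative reference), hive-site.xml would point at a running Derby network server:

<source>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
</property>
</source>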

Index: docs/xdocs/admin_manual/configuration.xml
===================================================================
--- docs/xdocs/admin_manual/configuration.xml	(revision 0)
+++ docs/xdocs/admin_manual/configuration.xml	(revision 0)
@@ -0,0 +1,383 @@

Hive Configuration
Hadoop Hive Documentation Team

A number of configuration variables in Hive can be used by the administrator to change the behavior of their installation and user sessions. These variables can be configured in any of the following ways, shown in order of preference:

- the set command in the CLI, for session-level values (highest precedence)
- the -hiveconf option on the hive command line, when launching a session
- hive-site.xml, for installation-specific overrides
- hive-default.xml, which ships the defaults (lowest precedence)

hive-default.xml is located in the conf directory of the installation root; hive-site.xml should be created in the same directory.
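For instance, a minimal hive-site.xml that overrides one of the variables described in the tables below (the directory value is a placeholder):

<source>
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/mydir/hive</value>
  </property>
</configuration>
</source>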


Broadly the configuration variables are categorized into:

- Hive Configuration Variables
- Hive MetaStore Configuration Variables
- Hive Configuration Variables used to interact with Hadoop
- Hive Variables used to pass run time information

Hive Configuration Variables

Variable Name | Description | Default Value
hive.exec.script.wrapper | Wrapper around any invocation of the script operator, e.g. if this is set to python, the script passed to the script operator is invoked as python <script command>. If the value is null or not set, the script is invoked as <script command>. | null
hive.exec.plan | | null
hive.exec.scratchdir | This directory is used by Hive to store the plans for the different map/reduce stages of the query, as well as the intermediate outputs of these stages. | /tmp/<user.name>/hive
hive.querylog.location | Directory where structured Hive query logs are created. One file per session is created in this directory. If this variable is set to an empty string, no structured log is created. | /tmp/<user.name>
hive.exec.submitviachild | Determines whether map/reduce jobs should be submitted through a separate JVM in non-local mode. | false (by default jobs are submitted through the same JVM as the compiler)
hive.exec.script.maxerrsize | Maximum number of serialization errors allowed in a user script invoked through the TRANSFORM, MAP or REDUCE constructs. | 100000
hive.exec.compress.output | Determines whether the output of the final map/reduce job in a query is compressed. | false
hive.exec.compress.intermediate | Determines whether the output of the intermediate map/reduce jobs in a query is compressed. | false
hive.jar.path | The location of hive_cli.jar, used when submitting jobs through a separate JVM. |
hive.aux.jars.path | The location of the plugin jars containing implementations of user-defined functions and SerDes. |
hive.partition.pruning | A value of strict causes the compiler to throw an error if no partition predicate is provided on a partitioned table. This protects against a user inadvertently issuing a query against all partitions of a table. | nonstrict
hive.map.aggr | Determines whether map-side aggregation is enabled. | true
hive.join.emit.interval | | 1000
hive.map.aggr.hash.percentmemory | (float) | 0.5
hive.default.fileformat | Default file format for CREATE TABLE statements. Options are TextFile, SequenceFile and RCFile. | TextFile
hive.merge.mapfiles | Merge small files at the end of a map-only job. | true
hive.merge.mapredfiles | Merge small files at the end of a map-reduce job. | false
hive.merge.size.per.task | Size of the merged files at the end of the job. | 256000000
hive.merge.smallfiles.avgsize | When the average output file size of a job is less than this number, Hive starts an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true. | 16000000
hive.enforce.bucketing | If enabled, enforces that inserts into bucketed tables are also bucketed. | false

Hive MetaStore Configuration Variables

Variable Name | Description | Default Value
hive.metastore.metadb.dir | |
hive.metastore.warehouse.dir | Location of the default database for the warehouse. |
hive.metastore.uris | |
hive.metastore.usefilestore | |
hive.metastore.rawstore.impl | |
hive.metastore.local | |
javax.jdo.option.ConnectionURL | JDBC connect string for a JDBC metastore. |
javax.jdo.option.ConnectionDriverName | Driver class name for a JDBC metastore. |
javax.jdo.option.ConnectionUserName | |
javax.jdo.option.ConnectionPassword | |
org.jpox.autoCreateSchema | Creates the necessary schema (tables, columns, etc.) on startup if one doesn't exist. Set to false after the schema has been created once. |
org.jpox.fixedDatastore | Whether the datastore schema is fixed. |
hive.metastore.checkForDefaultDb | |
hive.metastore.ds.connection.url.hook | Name of the hook to use for retrieving the JDO connection URL. If empty, the value in javax.jdo.option.ConnectionURL is used as the connection URL. |
hive.metastore.ds.retry.attempts | The number of times to retry a call to the backing datastore if there was a connection error. | 1
hive.metastore.ds.retry.interval | The number of milliseconds between datastore retry attempts. | 1000
hive.metastore.server.min.threads | Minimum number of worker threads in the Thrift server's pool. | 200
hive.metastore.server.max.threads | Maximum number of worker threads in the Thrift server's pool. | 10000
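As an illustration of the javax.jdo.* variables above, a MySQL-backed metastore is commonly configured along these lines in hive-site.xml; the host (dbhost), database name, user name, and password are all placeholders:

<source>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://dbhost/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>
</source>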

Hive Configuration Variables used to interact with Hadoop

Variable Name | Description | Default Value
hadoop.bin.path | The location of the hadoop script, used to submit jobs to Hadoop when submitting through a separate JVM. | $HADOOP_HOME/bin/hadoop
hadoop.config.dir | The location of the configuration directory of the Hadoop installation. | $HADOOP_HOME/conf
fs.default.name | | file:///
map.input.file | | null
mapred.job.tracker | The URL of the JobTracker. If this is set to local, map/reduce is run in local mode. | local
mapred.reduce.tasks | The number of reducers for each map/reduce stage in the query plan. | 1
mapred.job.name | The name of the map/reduce job. | null

Hive Variables used to pass run time information

Variable Name | Description | Default Value
hive.session.id | The id of the Hive session. |
hive.query.string | The query string passed to the map/reduce job. |
hive.query.planid | The id of the plan for the map/reduce stage. |
hive.jobname.length | The maximum length of the job name. | 50
hive.table.name | The name of the Hive table. This is passed to user scripts through the script operator. |
hive.partition.name | The name of the Hive partition. This is passed to user scripts through the script operator. |
hive.alias | The alias being processed. This is also passed to user scripts through the script operator. |

Temporary Folders


Hive uses temporary folders both on the machine running the Hive client and on the default HDFS instance. These folders are used to store per-query temporary/intermediate data sets and are normally cleaned up by the Hive client when the query is finished. However, in cases of abnormal Hive client termination, some data may be left behind. The configuration details are as follows:

The HDFS location is controlled by the configuration variable hive.exec.scratchdir, whose default is /tmp/<user.name>/hive (see the first table above).

Note that when writing data to a table/partition, Hive will first write to a temporary location on the target table's filesystem (using hive.exec.scratchdir as the temporary location) and then move the data to the target table. This applies in all cases - whether tables are stored in HDFS (normal case) or in file systems like S3 or even NFS.


Log Files


The Hive client produces logs and history files on the client machine. Please see Getting Started: Error Logs for configuration details.

+ +
Index: docs/stylesheets/project.xml
===================================================================
--- docs/stylesheets/project.xml	(revision 1337002)
+++ docs/stylesheets/project.xml	(working copy)
@@ -31,6 +31,10 @@

Index: docs/stylesheets/site.vsl
===================================================================
--- docs/stylesheets/site.vsl	(revision 1337002)
+++ docs/stylesheets/site.vsl	(working copy)
@@ -307,7 +307,7 @@