Index: src/docs/src/documentation/content/xdocs/dynpartition.xml =================================================================== --- src/docs/src/documentation/content/xdocs/dynpartition.xml (revision 1377542) +++ src/docs/src/documentation/content/xdocs/dynpartition.xml (working copy) @@ -49,7 +49,8 @@

The way dynamic partitioning works is that HCatalog locates partition columns in the data passed to it and uses the data in these columns to split the rows across multiple partitions. (The data passed to HCatalog must have a schema that matches the schema of the destination table and hence should always contain partition columns.) It is important to note that partition columns can’t contain null values or the whole process will fail.

-

It is also important to note that all partitions created during a single run are part of a transaction and if any part of the process fails none of the partitions will be added to the table.

+

It is also important to note that all partitions created during a single run are part of one transaction; +therefore if any part of the process fails, none of the partitions will be added to the table.

@@ -101,7 +102,7 @@ Usage from MapReduce

As with Pig, the only change in dynamic partitioning that a MapReduce programmer sees is that they don't have to specify all the partition key/value combinations.

-

A current code example for writing out a specific partition for (a=1,b=1) would go something like this:

+

A current code example for writing out a specific partition for (a=1, b=1) would go something like this:

Map<String, String> partitionValues = new HashMap<String, String>(); Index: src/docs/src/documentation/content/xdocs/install.xml =================================================================== --- src/docs/src/documentation/content/xdocs/install.xml (revision 1377542) +++ src/docs/src/documentation/content/xdocs/install.xml (working copy) @@ -19,7 +19,7 @@
- Source Installation + Installation from Tarball
@@ -28,11 +28,11 @@

Prerequisites

-

Assumptions

-

When using the HCatalog CLI, you cannot specify a permission string without read permissions for owner, such as -wxrwxr-x. If such a permission setting is desired, you can use the octal version instead, which in this case would be 375. Also, any other kind of permission string where the owner has read permissions (for example r-x------ or r--r--r--) will work fine.

+ +

Owner Permissions

+

When using the HCatalog CLI, you cannot specify a permission string without read permissions for owner, such as -wxrwxr-x, because the string begins with "-". If such a permission setting is desired, you can use the octal version instead, which in this case would be 375. Also, any other kind of permission string where the owner has read permissions (for example r-x------ or r--r--r--) will work fine.

@@ -113,7 +144,7 @@
Create/Drop/Alter View -

Note: Pig and MapReduce coannot read from or write to views.

+

Note: Pig and MapReduce cannot read from or write to views.

CREATE VIEW

Supported. Behavior same as Hive.

@@ -162,7 +193,7 @@
- "dfs" command and "set" command + "dfs" Command and "set" Command

Supported. Behavior same as Hive.

@@ -172,15 +203,20 @@
+
+ CLI Errors

Authentication

If a failure results in a message like "2010-11-03 16:17:28,225 WARN hive.metastore ... - Unable to connect metastore with URI thrift://..." in /tmp/<username>/hive.log, then make sure you have run "kinit <username>@FOO.COM" to get a Kerberos ticket and to be able to authenticate to the HCatalog server.

+ +

Error Log

+

If other errors occur while using the HCatalog CLI, more detailed messages are written to /tmp/<username>/hive.log.

+
- Index: src/docs/src/documentation/content/xdocs/index.xml =================================================================== --- src/docs/src/documentation/content/xdocs/index.xml (revision 1377542) +++ src/docs/src/documentation/content/xdocs/index.xml (working copy) @@ -26,7 +26,7 @@ HCatalog

HCatalog is a table and storage management layer for Hadoop that enables users with different data processing tools – Pig, MapReduce, and Hive – to more easily read and write data on the grid. HCatalog’s table abstraction presents users with a relational view of data in the Hadoop distributed file system (HDFS) and ensures that users need not worry about where or in what format their data is stored – RCFile format, text files, or SequenceFiles.

-

HCatalog supports reading and writing files in any format for which a SerDe can be written. By default, HCatalog supports RCFile, CSV, JSON, and SequenceFile formats. To use a custom format, you must provide the InputFormat, OutputFormat, and SerDe.

+

HCatalog supports reading and writing files in any format for which a SerDe (serializer-deserializer) can be written. By default, HCatalog supports RCFile, CSV, JSON, and SequenceFile formats. To use a custom format, you must provide the InputFormat, OutputFormat, and SerDe.

@@ -42,20 +42,25 @@
Interfaces -

The HCatalog interface for Pig consists of HCatLoader and HCatStorer, which implement the Pig load and store interfaces respectively. HCatLoader accepts a table to read data from; you can indicate which partitions to scan by immediately following the load statement with a partition filter statement. HCatStorer accepts a table to write to and optionally a specification of partition keys to create a new partition. You can write to a single partition by specifying the partition key(s) and value(s) in the STORE clause; and you can write to multiple partitions if the partition key(s) are columns in the data being stored. HCatLoader is implemented on top of HCatInputFormat and HCatStorer is implemented on top of HCatOutputFormat (see HCatalog Load and Store).

+

The HCatalog interface for Pig consists of HCatLoader and HCatStorer, which implement the Pig load and store interfaces respectively. HCatLoader accepts a table to read data from; you can indicate which partitions to scan by immediately following the load statement with a partition filter statement. HCatStorer accepts a table to write to and optionally a specification of partition keys to create a new partition. You can write to a single partition by specifying the partition key(s) and value(s) in the STORE clause; and you can write to multiple partitions if the partition key(s) are columns in the data being stored. HCatLoader is implemented on top of HCatInputFormat and HCatStorer is implemented on top of HCatOutputFormat. +(See Load and Store Interfaces.)

-

The HCatalog interface for MapReduce – HCatInputFormat and HCatOutputFormat – is an implementation of Hadoop InputFormat and OutputFormat. HCatInputFormat accepts a table to read data from and optionally a selection predicate to indicate which partitions to scan. HCatOutputFormat accepts a table to write to and optionally a specification of partition keys to create a new partition. You can write to a single partition by specifying the partition key(s) and value(s) in the setOutput method; and you can write to multiple partitions if the partition key(s) are columns in the data being stored. (See HCatalog Input and Output.)

+

The HCatalog interface for MapReduce — HCatInputFormat and HCatOutputFormat — is an implementation of Hadoop InputFormat and OutputFormat. HCatInputFormat accepts a table to read data from and optionally a selection predicate to indicate which partitions to scan. HCatOutputFormat accepts a table to write to and optionally a specification of partition keys to create a new partition. You can write to a single partition by specifying the partition key(s) and value(s) in the setOutput method; and you can write to multiple partitions if the partition key(s) are columns in the data being stored. +(See Input and Output Interfaces.)

-

Note: There is no Hive-specific interface. Since HCatalog uses Hive's metastore, Hive can read data in HCatalog directly.

+

Note: There is no Hive-specific interface. Since HCatalog uses Hive's metastore, Hive can read data in HCatalog directly.

-

Data is defined using HCatalog's command line interface (CLI). The HCatalog CLI supports all Hive DDL that does not require MapReduce to execute, allowing users to create, alter, drop tables, etc. (Unsupported Hive DDL includes import/export, CREATE TABLE AS SELECT, ALTER TABLE options REBUILD and CONCATENATE, and ANALYZE TABLE ... COMPUTE STATISTICS.) The CLI also supports the data exploration part of the Hive command line, such as SHOW TABLES, DESCRIBE TABLE, etc. (see the HCatalog Command Line Interface).

+

Data is defined using HCatalog's command line interface (CLI). The HCatalog CLI supports all Hive DDL that does not require MapReduce to execute, allowing users to create, alter, drop tables, etc. The CLI also supports the data exploration part of the Hive command line, such as SHOW TABLES, DESCRIBE TABLE, and so on. +Unsupported Hive DDL includes import/export, the REBUILD and CONCATENATE options of ALTER TABLE, CREATE TABLE AS SELECT, and ANALYZE TABLE ... COMPUTE STATISTICS. +(See Command Line Interface.)

Data Model

HCatalog presents a relational view of data. Data is stored in tables and these tables can be placed in databases. Tables can also be hash partitioned on one or more keys; that is, for a given value of a key (or set of keys) there will be one partition that contains all rows with that value (or set of values). For example, if a table is partitioned on date and there are three days of data in the table, there will be three partitions in the table. New partitions can be added to a table, and partitions can be dropped from a table. Partitioned tables have no partitions at create time. Unpartitioned tables effectively have one default partition that must be created at table creation time. There is no guaranteed read consistency when a partition is dropped.

-

Partitions contain records. Once a partition is created records cannot be added to it, removed from it, or updated in it. Partitions are multi-dimensional and not hierarchical. Records are divided into columns. Columns have a name and a datatype. HCatalog supports the same datatypes as Hive (see HCatalog Load and Store).

+

Partitions contain records. Once a partition is created records cannot be added to it, removed from it, or updated in it. Partitions are multi-dimensional and not hierarchical. Records are divided into columns. Columns have a name and a datatype. HCatalog supports the same datatypes as Hive. +See Load and Store Interfaces for more information about datatypes.

Index: src/docs/src/documentation/content/xdocs/notification.xml =================================================================== --- src/docs/src/documentation/content/xdocs/notification.xml (revision 1377542) +++ src/docs/src/documentation/content/xdocs/notification.xml (working copy) @@ -23,7 +23,7 @@ -

Since HCatalog 0.2 provides notifications for certain events happening in the system. This way applications such as Oozie can wait for those events and schedule the work that depends on them. The current version of HCatalog supports two kinds of events:

+

Since version 0.2, HCatalog provides notifications for certain events happening in the system. This way applications such as Oozie can wait for those events and schedule the work that depends on them. The current version of HCatalog supports two kinds of events: