Index: src/docs/src/documentation/content/xdocs/inputoutput.xml
===================================================================
--- src/docs/src/documentation/content/xdocs/inputoutput.xml	(revision 1311550)
+++ src/docs/src/documentation/content/xdocs/inputoutput.xml	(working copy)
@@ -162,7 +162,7 @@
-
Examples
+
Read Example

The following very simple MapReduce program reads data from one table which it assumes to have an integer in the

@@ -266,9 +266,16 @@

To scan just selected partitions of a table, a filter describing the desired partitions can be passed to
-InputJobInfo.create. This filter can contain the operators '=', '<', '>', '<=',
-'>=', '<>', 'and', 'or', and 'like'. Assume for example you have a web_logs
-table that is partitioned by the column datestamp. You could select one partition of the table by changing
+InputJobInfo.create. To scan a single partition, the filter string should look like: "ds=20120401", where ds is the partition column name and 20120401 is the value you want to read.
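For illustration, the single-partition filter format described in the added text ("column=value") can be sketched in Java. The class and helper names here are hypothetical, not part of the HCatalog API:

```java
// Minimal sketch of the "column=value" partition filter string format
// described above. The helper name is illustrative, not HCatalog API.
public class PartitionFilterExample {
    static String singlePartitionFilter(String column, String value) {
        return column + "=" + value;
    }

    public static void main(String[] args) {
        // Read only the ds=20120401 partition:
        String filter = singlePartitionFilter("ds", "20120401");
        System.out.println(filter); // prints "ds=20120401"
    }
}
```

The resulting string would be passed as the third argument of InputJobInfo.create in place of null.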

+
+Filter Operators
+
+A filter can contain the operators 'and', 'or', 'like', '()', '=', '<>', '<', '>', '<='
+and '>='.
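As a rough sketch of how these operators compose into a filter string, the following hypothetical Java helpers build a compound filter; the helper names and the value-quoting convention are assumptions for illustration, not taken from this patch:

```java
// Hypothetical helpers composing a compound partition filter string from the
// operators listed above ('and', 'or', parentheses, comparisons).
public class CompoundFilterExample {
    static String and(String left, String right) {
        return left + " and " + right;
    }

    static String or(String left, String right) {
        return "(" + left + ") or (" + right + ")";
    }

    public static void main(String[] args) {
        // A date range combined with a 'like' pattern (quoting is assumed):
        String range = and("datestamp >= \"20110924\"", "datestamp <= \"20110925\"");
        String filter = or(range, "datestamp like \"201106%\"");
        System.out.println(filter);
    }
}
```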

+
+Scan Filter
+
+Assume for example you have a web_logs table that is partitioned by the column
+datestamp. You could select one partition of the table by changing

HCatInputFormat.setInput(job, InputJobInfo.create(dbName, inputTableName, null));

@@ -281,6 +288,8 @@

This filter must reference only partition columns. Values from other columns will cause the job to fail.

+
+
Write Filter

To write to a single partition you can change the above example to have a Map of key value pairs that describe all of the partition keys and values for that partition. In our example web_logs table, there is only one partition

Index: src/docs/src/documentation/content/xdocs/loadstore.xml
===================================================================
--- src/docs/src/documentation/content/xdocs/loadstore.xml	(revision 1311550)
+++ src/docs/src/documentation/content/xdocs/loadstore.xml	(working copy)
@@ -28,7 +28,7 @@
Set Up

The HCatLoader and HCatStorer interfaces are used with Pig scripts to read and write data in HCatalog managed tables.

-
Authentication
+

@@ -115,12 +115,22 @@
export HADOOP_HOME=<path_to_hadoop_install>
export HCAT_HOME=<path_to_hcat_install>
-PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/hcatalog-0.4.0.jar:$HCAT_HOME/share/hcatalog/lib/hive-metastore-0.8.1.jar:$HCAT_HOME/share/hcatalog/lib/libthrift-0.7.0.jar:$HCAT_HOME/share/hcatalog/lib/hive-exec-0.8.1.jar:$HCAT_HOME/share/hcatalog/lib/libfb303-0.7.0.jar:$HCAT_HOME/share/hcatalog/lib/jdo2-api-2.3-ec.jar:$HCAT_HOME/etc/hcatalog:$HADOOP_HOME/conf:$HCAT_HOME/share/hcatalog/lib/slf4j-api-1.6.1.jar
+PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/hcatalog-0.4.0.jar:$HCAT_HOME/share/hcatalog/lib/
+hive-metastore-0.8.1.jar:$HCAT_HOME/share/hcatalog/lib/libthrift-0.7.0.jar:$HCAT_HOME/
+share/hcatalog/lib/hive-exec-0.8.1.jar:$HCAT_HOME/share/hcatalog/lib/libfb303-0.7.0.jar:
+$HCAT_HOME/share/hcatalog/lib/jdo2-api-2.3-ec.jar:$HCAT_HOME/etc/hcatalog:$HADOOP_HOME/
+conf:$HCAT_HOME/share/hcatalog/lib/slf4j-api-1.6.1.jar
export PIG_OPTS=-Dhive.metastore.uris=thrift://<hostname>:<port>
-<path_to_pig_install>/bin/pig -Dpig.additional.jars=$HCAT_HOME/share/hcatalog/hcatalog-0.4.0.jar:$HCAT_HOME/share/hcatalog/lib/hive-metastore-0.8.1.jar:$HCAT_HOME/share/hcatalog/lib/libthrift-0.7.0.jar:$HCAT_HOME/share/hcatalog/lib/hive-exec-0.8.1.jar:$HCAT_HOME/share/hcatalog/lib/libfb303-0.7.0.jar:$HCAT_HOME/share/hcatalog/lib/jdo2-api-2.3-ec.jar:$HCAT_HOME/etc/hcatalog:$HCAT_HOME/share/hcatalog/lib/slf4j-api-1.6.1.jar <script.pig>
+<path_to_pig_install>/bin/pig -Dpig.additional.jars=$HCAT_HOME/share/hcatalog/
+hcatalog-0.4.0.jar:$HCAT_HOME/share/hcatalog/lib/hive-metastore-0.8.1.jar:$HCAT_HOME/
+share/hcatalog/lib/libthrift-0.7.0.jar:$HCAT_HOME/share/hcatalog/lib/hive-exec-0.8.1.jar:
+$HCAT_HOME/share/hcatalog/lib/libfb303-0.7.0.jar:$HCAT_HOME/share/hcatalog/lib/jdo2-
+api-2.3-ec.jar:$HCAT_HOME/etc/hcatalog:$HCAT_HOME/share/hcatalog/lib/slf4j-api-1.6.1.jar
+ <script.pig>
+

Authentication

@@ -192,6 +202,11 @@
b = filter a by datestamp >= '20110924' and datestamp <= '20110925';
+

+Filter Operators
+
+A filter can contain the operators 'and', 'or', '()', '==', '!=', '<', '>', '<='
+and '>='.
+

@@ -247,8 +262,8 @@

To write into multiple partitions at once, make sure that the partition column is present in your data, then call HCatStorer with no argument:

-store z into 'web_data' using org.apache.hcatalog.pig.HCatStorer(); -- datestamp
-must be a field in the relation z
+store z into 'web_data' using org.apache.hcatalog.pig.HCatStorer();
+ -- datestamp must be a field in the relation z

If you are using a secure cluster and a failure results in a message like "2010-11-03 16:17:28,225 WARN hive.metastore ... - Unable to connect metastore with URI thrift://..." in /tmp/<username>/hive.log, then make sure you have run "kinit <username>@FOO.COM" to get a Kerberos ticket and to be able to authenticate to the HCatalog server.