Pig / PIG-1782

Add ability to load data by column family in HBaseStorage

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Java 6, Mac OS X 10.6

    • Release Note:
      Enhanced HBaseStorage functionality to support loading dynamically named columns by column family or by column name prefixes.

      Javadoc:


      /**
       * An HBase implementation of LoadFunc and StoreFunc.
       * <P>
       * Below is an example showing how to load data from HBase:
       * <pre>{@code
       * raw = LOAD 'hbase://SampleTable'
       * USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
       * 'info:first_name info:last_name friends:* info:*', '-loadKey true -limit 5')
       * AS (id:bytearray, first_name:chararray, last_name:chararray, friends_map:map[], info_map:map[]);
       * }</pre>
       * This example loads data redundantly from the info column family just to
       * illustrate usage. Note that the row key is inserted first in the result schema.
       * To load only column names that start with a given prefix, specify the column
       * name with a trailing '*'. For example passing <code>friends:bob_*</code> to
       * the constructor in the above example would cause only columns that start with
       * <i>bob_</i> to be loaded.
       * <P>
       * Below is an example showing how to store data into HBase:
       * <pre>{@code
       * copy = STORE raw INTO 'hbase://SampleTableCopy'
       * USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
       * 'info:first_name info:last_name friends:* info:*')
       * AS (info:first_name info:last_name buddies:* info:*);
       * }</pre>
       * Note that STORE will expect the first value in the tuple to be the row key.
       * Scalar values need to map to an explicit column descriptor and maps need to
       * map to a column family name. In the above examples, the <code>friends</code>
       * column family data from <code>SampleTable</code> will be written to a
       * <code>buddies</code> column family in the <code>SampleTableCopy</code> table.
       *
       */
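
      For example, given the load above, individual dynamically named columns can be
      projected out of the loaded maps with Pig's map dereference operator. A minimal
      sketch (the column name bob_jones is hypothetical):

      bobs = FOREACH raw GENERATE id, friends_map#'bob_jones';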

      Description

      It would be nice to load all columns in a column family by using shorthand syntax like:

      CpuMetrics = load 'hbase://SystemMetrics' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cpu:','-loadKey');
      

      Assuming there are columns cpu:sys.0, cpu:sys.1, cpu:user.0, and cpu:user.1 in the cpu column family,

      CpuMetrics would contain something like:

      (rowKey, cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1)
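
      As shipped (see the release note above), the family loads as a single map field
      rather than a flat tuple. A minimal sketch of the equivalent load, with
      hypothetical aliases:

      CpuMetrics = LOAD 'hbase://SystemMetrics'
          USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cpu:*', '-loadKey true')
          AS (rowKey:bytearray, cpu:map[]);
      -- e.g. (rowKey, [sys.0#v0, sys.1#v1, user.0#v2, user.1#v3])
      SysMetrics = FOREACH CpuMetrics GENERATE rowKey, cpu#'sys.0';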
      
      Attachments

      1. PIG-1782_4.patch
        33 kB
        Dmitriy V. Ryaboy
      2. PIG-1782_1.patch
        23 kB
        Bill Graham
      3. PIG_1782_3.patch
        30 kB
        Bill Graham
      4. PIG_1782_2.patch
        35 kB
        Bill Graham
      5. apply-PIG-1782-patch.sh
        2 kB
        Bill Graham


          Activity

          Dmitriy V. Ryaboy added a comment -

          Committed to 0.9 trunk.

          Bill Graham added a comment -

          Verified that the patch applies cleanly to trunk, unit tests pass, and a sanity-test job against a cluster using a map of CF name/values runs as expected.

          Dmitriy V. Ryaboy added a comment -

          Attached patch should apply cleanly to the current trunk. Please review.

          Dmitriy V. Ryaboy added a comment -

          Bill, I will definitely look at this by the end of the weekend.

          Bill Graham added a comment -

          Ping. Can anyone please review this patch, and possibly even commit it? I'd like to get this into Pig 0.9.0 if possible. We've been using it for a while without issue.

          Bill Graham added a comment -

          Here's a new patch #3 with the projection unit tests removed. Dmitriy and I synced up off-line and decided to tackle the issue with projections in a separate JIRA. I'll open one and add the relevant unit tests.

          This patch also requires PIG-1680, and it's built from my pig_1782 git repo, FYI:

          https://github.com/billonahill/pig/tree/pig_1782

          Bill Graham added a comment -

          Sorry for the back and forth on this one, but while doing additional testing of patch 3 I've discovered another bug with projections.

          Bill Graham added a comment -

          Attached is a second patch. This one is built to be applied on top of the PIG_1680.3.patch.

          From the Javadocs:

          An HBase implementation of LoadFunc and StoreFunc.

          Below is an example showing how to load data from HBase:

          raw = LOAD 'hbase://SampleTable'
                USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
          'info:first_name info:last_name friends:* info:*', '-loadKey true -limit 5')
                 AS (id:bytearray, first_name:chararray, last_name:chararray, friends_map:map[], info_map:map[]);
          

          This example loads data redundantly from the info column family just to illustrate usage. Note that the row key is inserted first in the result schema. To load only column names that start with a given prefix, specify the column prefix with a trailing *. For example passing friends:bob_* to the constructor in the above example would cause only columns that start with bob_ to be loaded.

          Below is an example showing how to store data into HBase:

           copy = STORE raw INTO 'hbase://SampleTableCopy'
                 USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
                 'info:first_name info:last_name friends:* info:*')
                 AS (info:first_name info:last_name buddies:* info:*);
          

          Note that STORE will expect the first value in the tuple to be the row key. Scalar values need to map to an explicit column descriptor and maps need to map to a column family name. In the above examples, the friends column family data from SampleTable will be written to a buddies column family in the SampleTableCopy table.

          Bill Graham added a comment -

          @Dmitriy, I branched your git clone and incorporated my changes for this patch. There's one bug when using projections, for which I've added a failing unit test. See https://github.com/billonahill/pig/tree/bills_pig_1680

          I'm not yet sure if this is a bug in the base PIG-1680 functionality, or only when using maps per PIG-1782. I'll look into it more tomorrow.

          Dmitriy V. Ryaboy added a comment -

          Michael, that will be addressed in PIG-1832 after we get done with this. The scope of this ticket is creeping already, as it's getting mixed in with 1680.

          Michael Lugassy added a comment -

          Thanks guys, will any of these allow looking at multiple versions / retrieving timestamps of HBase cells?

          Dmitriy V. Ryaboy added a comment -

          Bill and I diverged a bit since I posted a different patch to 1680; I'll post a merged patch soonish.

          Bill Graham added a comment -

          Attached are two files, a patch and a script to apply it. A few things to note about this patch:

          • It relies on HBase 0.89.0 or greater and it effectively replaces PIG-1680.
          • I've updated HBaseStorage for now. If we want to deprecate that class and create a new one instead, I can do that.
          • I added support for a columnPrefix option to filter down the columns returned. Proper column-prefix functionality, though, requires HBASE-3550.
          • I had to do some hackery in setStoreLocation and getOutputFormat with the conf objects to keep NPEs from being thrown from HBase (see comments in code). A review of what I'm doing with the conf objects in that part of the code would be good.
          • There are still no unit tests for this code, since it's a tricky thing to test. I have a few simple HBase and Pig scripts that I've been using that I could provide.
          Bill Graham added a comment -

          Dmitriy, yes, of course you're right, we'd still need shims. Let's see what comes back from your question to the list. Maybe we can just move forward requiring >= 0.89.

          I've got a working patch that I should be able to attach next week, FYI (I'm on vacation this week).

          Michael Lugassy added a comment -

          Can't we just pass (extended = 'true') for the load function?

          Dmitriy V. Ryaboy added a comment -

          Bill,
          I am not sure how we can pull in both versions of HBase (one for the current HBaseStorage we would deprecate, and one for the new HBaseStorage) and not run into compilation nightmares. Seems like we need shims either way, no?

          Bill Graham added a comment -

          @Dmitriy I think the deprecation idea has its merits. The patch I'm working on is actually against HBase 0.90.0. It basically includes the PIG-1680 patch. What if we deprecated the existing HBase classes and created new ones in a new location that required HBase >= 0.90? That way we can clean up the package structure and put off having to shim a little longer.

          Michael Lugassy added a comment -

          +1 for including version timestamps in the response. This would help both processing multiple versions and easily parsing timestamps, which are "free" inside HBase cells.

          Dmitriy V. Ryaboy added a comment -

          That seems reasonable to me.

          The only reason I suggest deprecating the current HBaseStorage is that it's awkwardly placed in backend.hadoop.hbase, which is not where anyone really expects to find it. But I guess we can do that in a different ticket.

          Bill Graham added a comment -

          I agree. Dmitriy, I like where you're going with new classes and deprecation, but maybe we could do this with just an enhanced (and backward compatible) HBaseStorage and a new AdvancedHBaseStorage.

          • HBaseStorage
            • If you specify discrete columns, you get a tuple of values like the current behavior.
            • If you specify one or more CFs (or possibly a CF with a wildcard column expression), you get back a tuple of maps.
            • If you specify a mix, you get a tuple with values and maps. For example 'cf2:foo c1: cf2:bar' would produce ( value, { col => value }, value ) (see the sketch after this list).
            • This is backwards compatible and seems easiest to grok from a user's perspective.
          • AdvancedHBaseStorage
            • Somehow support multiple timestamps with a more complex data structure.
            • One possibility is to use the data structure I suggested in my previous comment, where everything is a map.
            • Another is to return something like the proposed HBaseStorage data structure, where each 'value' is replaced with ( (value, ts), ... ).
            • We could hash out the specifics of AdvancedHBaseStorage in another JIRA if we decide to go this route.
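
          A minimal Pig Latin sketch of the mixed case above (purely illustrative of the
          proposal, not committed behavior; the table, families, and columns are
          hypothetical):

          mixed = LOAD 'hbase://SampleTable'
              USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf2:foo c1: cf2:bar');
          -- proposed result shape: (value, map, value)
          vals = FOREACH mixed GENERATE $0 AS foo, $1#'some_col' AS c1_some_col, $2 AS bar;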
          Dmitriy V. Ryaboy added a comment -

          That's certainly possible; I just don't think it's a good design from a usability standpoint.

          Eric Yang added a comment -

          @Bill, agree. I filed a separate JIRA for supporting timestamps.
          @Dmitriy, would it be possible to add a parameter to switch between the return types?

          Suggested flags:

          • -returnMap (default)
          • -returnTuple

          Example for Map:

          CpuMetrics = load 'hbase://SystemMetrics' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cpu:','-loadKey');
          

          Example for Tuple:

          CpuMetrics = load 'hbase://SystemMetrics' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cpu:','-loadKey -returnTuple');
          
          Dmitriy V. Ryaboy added a comment -

          Bill, I think what you are suggesting is the "correct" way, but I'd prefer not to break people's existing scripts, which is what would happen under your proposal if we changed what we return when a schema like 'cf2:foo cf2:bar' is specified.

          There are also usability benefits to having the flat return schema you get from HBaseStorage now – it looks exactly like loading from PigStorage, so no surprises. You ask for 2 columns and get 2 values in a tuple; it's sort of what you'd expect.

          Perhaps we take your suggestion, put that into builtins.AdvancedHBaseStorage, deprecate the current HBaseStorage, and move the current code to builtins.SimpleHBaseStorage?

          Bill Graham added a comment -

          I was also thinking about a map, but I thought we might want to preserve the ordering of the fields specified when explicit fields are requested, as well as CFs, like Dmitriy's example. We'd get the CF fields in the natural ordering that HBase stores them in, too. The more I think about it, though, I don't think this is that useful, and a map approach seems the way to go.

          @Eric: Yes, Pig doesn't have any ts control upon writes currently (and that should be improved), but that shouldn't rule out the ability to read them. I can see many use cases where some non-Pig process is populating HBase, but Pig is used for queries.

          @Dmitriy: I prototyped that exact use case using tuples of tuples, but ran into the downsides you point out. Also, each row read would have a variable number of tuples, which would seem really difficult to work with.

          I like this approach when reading all columns in a family:

          ( rowKey, { col1 => ((val1, ts), ..), col2 => ((val2, ts), ..) } ) 
          

          For Dmitriy's use case, having the same schema returned (always a map) regardless of how the column families are specified (i.e., 'cf1: cf2:foo' vs 'cf1:' vs 'cf2:foo cf2:bar') is one option. Another is to return a map for CFs and a ((val1, ts), ..) for explicit columns. I'm not sure which approach would make life easier on the script writer.

          Dmitriy V. Ryaboy added a comment -

          To Eric's point, we should add timestamp controls straight into Storage.

          Returning tuples of the form ( optionalRowKey, { col1 => val1, col2 => val2 } ) makes sense to me.

          I don't like the tuple of tuples option because it makes it hard to pull out specific columns in that structure, which is likely what one wants to do.

          We should give some thought to someone loading using HbaseStorage( 'cf1:, cf2:some_col' , '-loadKey')

          Eric Yang added a comment -

          There is no control of the HBase timestamp in Pig. Hence, the timestamp returned is the actual insertion time from when the Pig store function was called. I am not sure how useful this could be. To be more explicit, it will look like:

          ( rowKey,
            (  column_name, ( (  value, ts  ), ...  )  ), ...
          )
          

          It is concise but not user-friendly.

          I am leaning toward returning a map.

          Dmitriy V. Ryaboy added a comment -

          Return a map?

          Bill Graham added a comment -

          Assigning this to myself, since I've got a working patch, but the design needs to be vetted out further with this approach.

          One issue is that the number of columns per family per row is not constant, so with a sparse table you'd have no idea which column names go with each value of the tuple returned. Another issue is that in HBase the column name is oftentimes actually dynamic, descriptive data, and there can be multiple timestamped values for a cell.

          • Option A:
            Instead of returning a tuple of values, the load can return a tuple of tuples. Each inner tuple is a two-tuple containing the column descriptor and the most recent value. This data structure would be returned if a 'cf:'-style column exists in the column list, while the default behavior is kept for explicit column names. This is the simplest approach.
          • Option B:
            Build out an even more rich (and complex) data structure that also takes into account multiple values and their timestamps. A tuple of tuple of tuple of tuples to capture the entire HBase KeyValue data structure. Something like this:
          (
           ( column name, ( (value, ts), ... ) ), ...
          )
          

          Either way, the variable-length tuples returned for each row, themselves containing additional variable-length tuples, would probably require a number of custom UDFs to do anything useful with variably named columns and multiple timestamped values.

          I guess I lean towards option B so we can support more use cases down the road with this refactor. Other opinions?


            People

             • Assignee:
               Bill Graham
             • Reporter:
               Eric Yang
             • Votes:
               1
             • Watchers:
               8
