Pig / PIG-1782

Add ability to load data by column family in HBaseStorage

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels: None
    • Environment: Java 6, Mac OS X 10.6

    • Release Note:
      Enhanced HBaseStorage functionality to support loading dynamically named columns by column family or by column name prefixes.

      Javadoc:


      /**
       * An HBase implementation of LoadFunc and StoreFunc.
       * <P>
       * Below is an example showing how to load data from HBase:
       * <pre>{@code
       * raw = LOAD 'hbase://SampleTable'
       * USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
       * 'info:first_name info:last_name friends:* info:*', '-loadKey true -limit 5')
       * AS (id:bytearray, first_name:chararray, last_name:chararray, friends_map:map[], info_map:map[]);
       * }</pre>
       * This example loads data redundantly from the info column family just to
       * illustrate usage. Note that the row key is inserted first in the result schema.
       * To load only column names that start with a given prefix, specify the column
       * name with a trailing '*'. For example, passing <code>friends:bob_*</code> to
       * the constructor in the above example would cause only columns that start with
       * <i>bob_</i> to be loaded.
       * <P>
       * Below is an example showing how to store data into HBase:
       * <pre>{@code
       * STORE raw INTO 'hbase://SampleTableCopy'
       * USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
       * 'info:first_name info:last_name buddies:* info:*');
       * }</pre>
       * Note that STORE will expect the first value in the tuple to be the row key.
       * Scalar values need to map to an explicit column descriptor and maps need to
       * map to a column family name. In the above examples, the <code>friends</code>
       * column family data from <code>SampleTable</code> will be written to a
       * <code>buddies</code> column family in the <code>SampleTableCopy</code> table.
       *
       */
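
      The prefix form described in the Javadoc can be sketched as follows. This is an illustrative script only: the table, column family, and option string come from the Javadoc example, while the field name bob_friends and the column name bob_smith are assumptions made for this sketch.

      ```pig
      -- Load only the columns in the friends family whose names start with bob_.
      -- The row key comes first because -loadKey is set to true.
      raw = LOAD 'hbase://SampleTable'
            USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
            'friends:bob_*', '-loadKey true')
            AS (id:bytearray, bob_friends:map[]);

      -- Hypothetical lookup of one dynamically named column; assumes the map
      -- is keyed by column name without the family prefix.
      smith = FOREACH raw GENERATE id, bob_friends#'bob_smith';
      ```

      Because the matching columns arrive as a single map field, the script does not need to know the column names in advance; it only names the ones it wants to read.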

    Description

      It would be nice to load all columns in a column family using shorthand syntax like:

      CpuMetrics = load 'hbase://SystemMetrics' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cpu:','-loadKey');
      

      Assuming the cpu column family contains the columns cpu:sys.0, cpu:sys.1, cpu:user.0, and cpu:user.1, CpuMetrics would contain something like:

      (rowKey, cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1)
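
      Under the syntax described in the release note for this issue, the same request would probably be written with a trailing '*' and would yield a map rather than flat columns. The following is a sketch under that assumption; the field names rowKey, cpu, sys0, and sys1 are chosen here for illustration.

      ```pig
      -- Load every column in the cpu family as a single map field.
      CpuMetrics = LOAD 'hbase://SystemMetrics'
                   USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cpu:*', '-loadKey true')
                   AS (rowKey:bytearray, cpu:map[]);

      -- Pull individual columns out of the map; assumes the map is keyed by
      -- column name without the family prefix.
      CpuSys = FOREACH CpuMetrics GENERATE rowKey, cpu#'sys.0' AS sys0, cpu#'sys.1' AS sys1;
      ```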
      

      Attachments

        1. PIG-1782_4.patch
          33 kB
          Dmitriy V. Ryaboy
        2. PIG_1782_3.patch
          30 kB
          William W. Graham Jr
        3. PIG_1782_2.patch
          35 kB
          William W. Graham Jr
        4. apply-PIG-1782-patch.sh
          2 kB
          William W. Graham Jr
        5. PIG-1782_1.patch
          23 kB
          William W. Graham Jr

            People

              Assignee: William W. Graham Jr (billgraham)
              Reporter: Eric Yang (eyang)
              Votes: 1
              Watchers: 8
