Pig / PIG-1782

Add ability to load data by column family in HBaseStorage

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels: None
    • Environment: Java 6, Mac OS X 10.6

    • Release Note:
      Enhanced HBaseStorage functionality to support loading dynamically named columns by column family or by column name prefixes.

      Javadoc:


      /**
       * An HBase implementation of LoadFunc and StoreFunc.
       * <P>
       * Below is an example showing how to load data from HBase:
       * <pre>{@code
       * raw = LOAD 'hbase://SampleTable'
       * USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
       * 'info:first_name info:last_name friends:* info:*', '-loadKey true -limit 5')
       * AS (id:bytearray, first_name:chararray, last_name:chararray, friends_map:map[], info_map:map[]);
       * }</pre>
       * This example loads data redundantly from the info column family just to
       * illustrate usage. Note that the row key is inserted first in the result schema.
       * To load only column names that start with a given prefix, specify the column
       * name with a trailing '*'. For example, passing <code>friends:bob_*</code> to
       * the constructor in the above example would cause only columns that start with
       * <i>bob_</i> to be loaded.
       * <P>
       * Below is an example showing how to store data into HBase:
       * <pre>{@code
       * STORE raw INTO 'hbase://SampleTableCopy'
       * USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
       * 'info:first_name info:last_name buddies:* info:*');
       * }</pre>
       * Note that STORE will expect the first value in the tuple to be the row key.
       * Scalar values need to map to an explicit column descriptor and maps need to
       * map to a column family name. In the above examples, the <code>friends</code>
       * column family data from <code>SampleTable</code> will be written to a
       * <code>buddies</code> column family in the <code>SampleTableCopy</code> table.
       *
       */
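
The matching rule described in the Javadoc above (an exact column name, a whole family via `family:*`, or a name prefix via `family:prefix*`) can be illustrated with a small standalone sketch. This is not the HBaseStorage implementation from the patch; the class `ColumnDescriptorSketch` and its methods are hypothetical names introduced only to show how a space-delimited descriptor list might be parsed and matched against cells.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of one column descriptor, e.g. "info:first_name",
 * "friends:*" or "friends:bob_*". Not the actual HBaseStorage code.
 */
final class ColumnDescriptorSketch {
    final String family;     // text before the first ':'
    final String name;       // exact column name, or the prefix when wildcard is true
    final boolean wildcard;  // true when the descriptor ends with '*'

    private ColumnDescriptorSketch(String family, String name, boolean wildcard) {
        this.family = family;
        this.name = name;
        this.wildcard = wildcard;
    }

    /** Splits "family:name" and strips one trailing '*' if present. */
    static ColumnDescriptorSketch parse(String descriptor) {
        int colon = descriptor.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected family:column, got " + descriptor);
        }
        String family = descriptor.substring(0, colon);
        String rest = descriptor.substring(colon + 1);
        boolean wildcard = rest.endsWith("*");
        String name = wildcard ? rest.substring(0, rest.length() - 1) : rest;
        return new ColumnDescriptorSketch(family, name, wildcard);
    }

    /** Parses a space-delimited list such as the first constructor argument. */
    static List<ColumnDescriptorSketch> parseAll(String spec) {
        List<ColumnDescriptorSketch> out = new ArrayList<>();
        for (String token : spec.trim().split("\\s+")) {
            out.add(parse(token));
        }
        return out;
    }

    /** True when a cell in (cellFamily, cellName) is selected by this descriptor. */
    boolean matches(String cellFamily, String cellName) {
        if (!family.equals(cellFamily)) {
            return false;
        }
        return wildcard ? cellName.startsWith(name) : cellName.equals(name);
    }
}
```

Under these assumptions, `parse("friends:bob_*")` selects a `friends` column named `bob_smith` but not one named `alice`, while `parse("friends:*")` has an empty prefix and therefore selects every column in the family.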

    Description

      It would be nice to load all columns in a column family using shorthand syntax like:

      CpuMetrics = load 'hbase://SystemMetrics' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cpu:','-loadKey');
      

      Assume the cpu column family contains the columns cpu:sys.0, cpu:sys.1, cpu:user.0, and cpu:user.1.

      CpuMetrics would contain something like:

      (rowKey, cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1)
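
The request above asks for a flat tuple, whereas the Javadoc in the release note maps a wildcard descriptor such as `friends:*` to a single `map[]` field. The sketch below shows roughly what that shape would mean for the `cpu` family: one map per row holding the dynamically named columns. It is an illustration only. The class `FamilyMapSketch`, the use of plain strings for cells, and the choice to key the map by column name without the family are all assumptions of this sketch, not details confirmed by the ticket.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Pig or HBase. */
final class FamilyMapSketch {
    /**
     * Collects the cells of one row that belong to the given family and whose
     * column name starts with the given prefix (use "" for the whole family).
     * Input keys look like "cpu:sys.0"; output keys drop the "cpu:" part.
     */
    static Map<String, String> collect(Map<String, String> rowCells,
                                       String family, String prefix) {
        Map<String, String> out = new LinkedHashMap<>();
        String wanted = family + ":" + prefix;
        for (Map.Entry<String, String> cell : rowCells.entrySet()) {
            if (cell.getKey().startsWith(wanted)) {
                // Strip "family:" so the map is keyed by column name only.
                out.put(cell.getKey().substring(family.length() + 1), cell.getValue());
            }
        }
        return out;
    }
}
```

With a row holding the four `cpu` columns listed above, `collect(row, "cpu", "")` would return a four-entry map, so the loaded tuple would look like `(rowKey, cpu_map)` rather than five separate fields, and `collect(row, "cpu", "sys.")` would keep only `sys.0` and `sys.1`.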
      

      Attachments

        1. PIG-1782_1.patch
          23 kB
          William W. Graham Jr
        2. apply-PIG-1782-patch.sh
          2 kB
          William W. Graham Jr
        3. PIG_1782_2.patch
          35 kB
          William W. Graham Jr
        4. PIG_1782_3.patch
          30 kB
          William W. Graham Jr
        5. PIG-1782_4.patch
          33 kB
          Dmitriy V. Ryaboy

          People

            Assignee: billgraham William W. Graham Jr
            Reporter: eyang Eric Yang
            Votes: 1
            Watchers: 8
