PIG-1782

Add ability to load data by column family in HBaseStorage

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Java 6, Mac OS X 10.6

    • Release Note:
      Enhanced HBaseStorage functionality to support loading dynamically named columns by column family or by column name prefixes.

      Javadoc:


      /**
       * An HBase implementation of LoadFunc and StoreFunc.
       * <P>
       * Below is an example showing how to load data from HBase:
       * <pre>{@code
       * raw = LOAD 'hbase://SampleTable'
       * USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
       * 'info:first_name info:last_name friends:* info:*', '-loadKey true -limit 5')
       * AS (id:bytearray, first_name:chararray, last_name:chararray, friends_map:map[], info_map:map[]);
       * }</pre>
       * This example loads data redundantly from the info column family just to
       * illustrate usage. Note that the row key is inserted first in the result schema.
       * To load only column names that start with a given prefix, specify the column
       * name with a trailing '*'. For example passing <code>friends:bob_*</code> to
       * the constructor in the above example would cause only columns that start with
       * <i>bob_</i> to be loaded.
       * <P>
       * Below is an example showing how to store data into HBase:
       * <pre>{@code
       * copy = STORE raw INTO 'hbase://SampleTableCopy'
       * USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
       * 'info:first_name info:last_name buddies:* info:*');
       * }</pre>
       * Note that STORE will expect the first value in the tuple to be the row key.
       * Scalar values need to map to an explicit column descriptor and maps need to
       * map to a column family name. In the above examples, the <code>friends</code>
       * column family data from <code>SampleTable</code> will be written to a
       * <code>buddies</code> column family in the <code>SampleTableCopy</code> table.
       *
       */
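
The three descriptor forms named in the Javadoc above (an explicit `family:column`, a whole family `family:*`, and a prefix `family:prefix*`) can be illustrated with a small standalone sketch. This is not the HBaseStorage implementation: `ColumnSpec`, `parse`, `parseAll`, and `matches` are hypothetical names introduced here, and the sketch assumes only what the Javadoc states, namely that descriptors are whitespace-separated and that a trailing `*` selects by prefix.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Pig): one parsed column descriptor. */
final class ColumnSpec {
    final String family;     // text before the first ':'
    final String name;       // exact column name, or the prefix when wildcard is true
    final boolean wildcard;  // true for "family:*" and "family:prefix*"

    private ColumnSpec(String family, String name, boolean wildcard) {
        this.family = family;
        this.name = name;
        this.wildcard = wildcard;
    }

    /** Parses a single descriptor such as "info:first_name" or "friends:bob_*". */
    static ColumnSpec parse(String descriptor) {
        int sep = descriptor.indexOf(':');
        if (sep <= 0) {
            throw new IllegalArgumentException("expected family:column, got " + descriptor);
        }
        String family = descriptor.substring(0, sep);
        String rest = descriptor.substring(sep + 1);
        boolean wildcard = rest.endsWith("*");
        // For a wildcard, keep only the prefix; "friends:*" yields an empty prefix.
        String name = wildcard ? rest.substring(0, rest.length() - 1) : rest;
        return new ColumnSpec(family, name, wildcard);
    }

    /** Parses a whitespace-separated descriptor list, as passed to the constructor. */
    static List<ColumnSpec> parseAll(String descriptors) {
        List<ColumnSpec> specs = new ArrayList<>();
        for (String d : descriptors.trim().split("\\s+")) {
            specs.add(parse(d));
        }
        return specs;
    }

    /** True if a column name within this family is selected by the descriptor. */
    boolean matches(String columnName) {
        return wildcard ? columnName.startsWith(name) : columnName.equals(name);
    }

    public static void main(String[] args) {
        for (ColumnSpec s : parseAll("info:first_name friends:* friends:bob_*")) {
            System.out.println(s.family + " / '" + s.name + "' / wildcard=" + s.wildcard);
        }
    }
}
```

In this sketch a wildcard descriptor corresponds to a `map[]` field in the LOAD schema, while a non-wildcard descriptor corresponds to a scalar field.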

      Description

      It would be nice to load all columns in a column family using shorthand syntax like:

      CpuMetrics = load 'hbase://SystemMetrics' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cpu:','-loadKey');
      

      Assuming the cpu column family contains the columns cpu:sys.0, cpu:sys.1, cpu:user.0, and cpu:user.1, CpuMetrics would contain something like:

      (rowKey, cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1)
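
The request above lists each column as its own field. The release note on this issue instead describes a whole-family load as arriving in a single map field (`friends_map:map[]`). A minimal sketch of that shape follows, with no Pig or HBase dependency. `FamilyLoadSketch`, `loadFamily`, and the sample values are hypothetical, and keying the map by column name without the family prefix is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration (not Pig code): collect one family's cells into a map. */
final class FamilyLoadSketch {

    /**
     * Returns the cells of the given family, keyed by column name with the
     * "family:" prefix removed. Cells from other families are skipped.
     */
    static Map<String, String> loadFamily(Map<String, String> cells, String family) {
        String prefix = family + ":";
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            if (cell.getKey().startsWith(prefix)) {
                result.put(cell.getKey().substring(prefix.length()), cell.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample row; the values are placeholders, not real metrics.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("cpu:sys.0", "0.10");
        row.put("cpu:sys.1", "0.20");
        row.put("cpu:user.0", "0.70");
        row.put("cpu:user.1", "0.60");
        row.put("mem:free", "512");

        // The tuple holds the row key first, then one map for the cpu family.
        System.out.println("(rowKey, " + loadFamily(row, "cpu") + ")");
    }
}
```

A map suits dynamically named columns because the set of column names does not have to be known when the schema is declared.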
      
      1. PIG-1782_4.patch
        33 kB
        Dmitriy V. Ryaboy
      2. PIG_1782_3.patch
        30 kB
        Bill Graham
      3. PIG_1782_2.patch
        35 kB
        Bill Graham
      4. apply-PIG-1782-patch.sh
        2 kB
        Bill Graham
      5. PIG-1782_1.patch
        23 kB
        Bill Graham

        Issue Links

          Activity

          Eric Yang created issue -
          Bill Graham made changes -
          Field Original Value New Value
          Assignee Bill Graham [ billgraham ]
          Eric Yang made changes -
          Link This issue relates to PIG-1832 [ PIG-1832 ]
          Bill Graham made changes -
          Attachment PIG-1782_1.patch [ 12471489 ]
          Attachment apply-PIG-1782-patch.sh [ 12471490 ]
          Bill Graham made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Bill Graham made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Bill Graham made changes -
          Attachment PIG_1782_2.patch [ 12471880 ]
          Bill Graham made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Release Note Enhanced HBaseStorage functionality to support loading dynamically named columns by column family or by column name prefixes.
          Bill Graham made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Bill Graham made changes -
          Attachment PIG_1782_3.patch [ 12471968 ]
          Bill Graham made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Dmitriy V. Ryaboy made changes -
          Attachment PIG-1782_4.patch [ 12476543 ]
          Dmitriy V. Ryaboy made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Dmitriy V. Ryaboy made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Dmitriy V. Ryaboy made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Release Note Enhanced HBaseStorage functionality to support loading dynamically named columns by column family or by column name prefixes. Enhanced HBaseStorage functionality to support loading dynamically named columns by column family or by column name prefixes. Javadoc: (identical to the Javadoc in the Release Note field above)
          Fix Version/s 0.9.0 [ 12315191 ]
          Resolution Fixed [ 1 ]
          Olga Natkovich made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Bill Graham
              Reporter:
              Eric Yang
            • Votes:
              1
              Watchers:
              8
