Pig / PIG-966: Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces / PIG-1205

Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      HBaseStorage has been significantly reworked with this release.

      Usage:
      {code}
      my_data = LOAD 'hbase://table_name' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfamily:col1 colfamily:col2', '-caching 100') as (col1:int, col2:chararray);

      STORE my_data INTO 'hbase://other_table' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfamily:col1 colfamily:col2');
      {code}

      HBaseStorage can now write data into HBase as well as read it. The first argument is a space-delimited list of columns to be loaded (or stored). Columns are specified as columnfamily:column_name. The second argument is an optional set of key-value pairs used to control HBaseStorage behavior. Available arguments are:

      * {{-loadKey}} Used to load the row key; false by default. If true, the first field in the returned tuple will be the value of the row key.
      * {{-gt, -gte, -lt, and -lte}} Used to specify bounds on the row keys to be scanned. The keys are specified as binary data, using the hex representation. Any slashes have to be double-escaped (two slashes per single "real" slash) to be parsed correctly.
      * {{-caching}} Used to specify the number of rows to be cached per HBase RPC call. See http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#setScannerCaching%28int%29 for more information about this HBase feature.
      * {{-limit}} Used to control how many rows *per scanned region* will be retrieved. This can of course speed up processing if you just want a few rows. The total number of rows returned will be up to number of regions * limit. The limit is applied after any -gt, -lt, etc. filters. Pig's LIMIT operator can be used in conjunction with this argument.
      * {{-caster}} Used to specify a LoadCaster (or LoadStoreCaster, for storage) used to convert the data stored in HBase into Pig data. By default, the Utf8StorageConverter is used, which stores all data as its string representation. The string "HBaseBinaryConverter" can be used to specify that data is stored in HBase's native binary format. Note that the HBaseBinaryConverter does not work with complex data types such as maps, tuples, and bags. You can also specify a fully qualified class name such as org.apache.pig.backend.hadoop.hbase.HBaseBinaryConverter to use your own caster. The default caster can be changed by setting the pig.hbase.caster property in pig.properties.

      HBaseStorage matches column arguments to tuple fields based on their ordinal position. When storing, the first field is expected to be the key value.
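
      For example, a load that also returns the row key, caches 500 rows per RPC, and uses the binary caster could look like the following sketch (illustrative table and column names only; -loadKey shown as a bare flag per the description above):

      {code}
      users = LOAD 'hbase://users'
              USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
                  'info:age info:name', '-loadKey -caching 500 -caster HBaseBinaryConverter')
              AS (id:bytearray, age:int, name:chararray);
      {code}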
    • Tags:
      hbase
    1. PIG_1205.patch
      13 kB
      Jeff Zhang
    2. PIG_1205_9.patch
      75 kB
      Dmitriy V. Ryaboy
    3. PIG_1205_8.patch
      66 kB
      Jeff Zhang
    4. PIG_1205_7.patch
      52 kB
      Dmitriy V. Ryaboy
    5. PIG_1205_6.patch
      35 kB
      Dmitriy V. Ryaboy
    6. PIG_1205_5.path
      32 kB
      Dmitriy V. Ryaboy
    7. PIG_1205_4.patch
      15 kB
      Jeff Zhang
    8. PIG_1205_3.patch
      14 kB
      Jeff Zhang
    9. PIG_1205_2.patch
      13 kB
      Jeff Zhang
    10. hbase-0.20.6-test.jar
      1.94 MB
      Dmitriy V. Ryaboy
    11. hbase-0.20.6.jar
      1.50 MB
      Dmitriy V. Ryaboy

      Activity

      Jeff Zhang added a comment -

      This issue is about enhancing the features of HBaseStorage:
      1. Make it support loading the row key. Add a constructor for HBaseStorage whose second argument can have the value "true" or "false", indicating whether you want to load the row key as the first field of the Tuple:

      public HBaseStorage(String columnList, String loadRowKey){
      ...
      }
      

      2. Make HBaseStorage implement the StoreFunc interface so that Tuples can be stored to HBase. One thing to note is that if you use HBaseStorage, each field is stored in HBase as a String no matter what its type is in the Tuple. That means if you use HBaseStorage to load it again, you have to convert the types yourself; HBaseStorage does not do the type conversion for you.
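
      For illustration, a load under the constructor as proposed here might look like the sketch below (hypothetical table and column names; the option syntax that was eventually committed is different, see the release note above):

      raw = LOAD 'hbase://table_name'
            USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfamily:col1 colfamily:col2', 'true');
      -- with the second argument set to 'true', the row key becomes the first field of each tuple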

      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12431541/PIG_1205.patch
      against trunk revision 903030.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      -1 patch. The patch command could not apply the patch.

      Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/179/console

      This message is automatically generated.

      Pradeep Kamath added a comment -

      Jeff, the patch no longer applies cleanly on trunk - looks like we missed reviewing this earlier - sorry about that - can you regenerate this patch against trunk?

      Jeff Zhang added a comment -

      Submit the patch for trunk

      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12437440/PIG_1205_2.patch
      against trunk revision 916793.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

      -1 javac. The applied patch generated 99 javac compiler warnings (more than the trunk's current 98 warnings).

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/230/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/230/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/230/console

      This message is automatically generated.

      Pradeep Kamath added a comment -

      Review comments:
      1) The top level comment in HBaseStorage reads - "A Hbase loader" - am wondering if it is worth keeping it a loader (maybe change the name to HBaseLoader) and create a separate Storer which extends StoreFunc rather than have HBaseStorage implement StoreFuncInterface - by extending the StoreFunc, if new functions with default implementations are added then the Storer will not need to change. The disadvantage is if we call the loader HBaseLoader, existing users of HBaseStorage would have to change their scripts to use HBaseLoader instead. This is just a suggestion - I am fine if HBaseStorage does both load and store and implements StoreFuncInterface - Jeff I will let you decide which is better. If you choose to do both load and store in HBaseStorage change the top level comment accordingly.
      2) The following method implementation should change from:

            @Override
            public String relToAbsPathForStoreLocation(String location, Path curDir)
                    throws IOException {
                // TODO Auto-generated method stub
                return null;
            }
      

      to

            @Override
            public String relToAbsPathForStoreLocation(String location, Path curDir)
                    throws IOException {
                return location;
            }
      

      Also, do address the javadoc/javac issues reported above.

      If the above are addressed, +1 for the patch (I don't have enough HBase knowledge to review the HBase specific code - I have only reviewed the use of load/store API).

      Jeff Zhang added a comment -

      Pradeep, thanks for your review. I've updated patch. I choose to put the loadfunc and storefunc together into HBaseStorage, make it like PigStorage

      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12438187/PIG_1205_3.patch
      against trunk revision 919634.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      -1 javac. The applied patch generated 99 javac compiler warnings (more than the trunk's current 98 warnings).

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/227/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/227/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/227/console

      This message is automatically generated.

      Pradeep Kamath added a comment -

      New patch looks good - I had one question - cleanupOnFailure() is implemented as an empty-body method - wondering if there should be some cleanup of the hbase data which was written out, since this method is called when the job fails? If so, please add the clean-up code in that method.

      If the above comment and the extra javac warning reported by Hadoop QA in the previous comment are addressed, the patch is good to be committed (again, if someone else who has hbase knowledge can review the hbase portion, that would be good, since I have not reviewed that part).

      Dmitriy V. Ryaboy added a comment -

      Jeff, thanks for all your work on this!
      It occurs to me that the constructor with position-dependent arguments is not a scalable pattern. We may have many options we might want to set for Hbase in the future (only load certain column-families, minimum record date, interpret stored data as strings or bytes, etc), and having to set them by adding on a third, fourth, fifth argument to the constructor is not really great.

      We could instead use commons-cli (already used by Hadoop, so no new dependencies) or args4j to parse a command-line style argument. So invocation would look like this instead:

      load 'hdfs://blah' using HBaseStorage("-columns=col1,col2 -loadRowKey=true -castUsing=Utf8StorageConverter")

      What do you think?
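
      For illustration, a minimal sketch of parsing such an option string with commons-cli (hypothetical option names, slightly different value syntax; not the implementation that was eventually committed):

      import org.apache.commons.cli.CommandLine;
      import org.apache.commons.cli.CommandLineParser;
      import org.apache.commons.cli.GnuParser;
      import org.apache.commons.cli.Options;

      public class HBaseStorageArgsSketch {
          public static void main(String[] args) throws Exception {
              // Options the loader might accept; the names here are made up for the example.
              Options options = new Options();
              options.addOption("columns", true, "space- or comma-separated list of columns to load");
              options.addOption("loadRowKey", true, "whether to load the row key as the first field");
              options.addOption("castUsing", true, "LoadCaster implementation to use");

              // The single constructor argument is split on whitespace into pseudo command-line args.
              String optionString = "-columns col1,col2 -loadRowKey true -castUsing Utf8StorageConverter";
              CommandLineParser parser = new GnuParser();
              CommandLine cmd = parser.parse(options, optionString.split(" "));

              System.out.println("columns    = " + cmd.getOptionValue("columns"));
              System.out.println("loadRowKey = " + cmd.getOptionValue("loadRowKey"));
              System.out.println("castUsing  = " + cmd.getOptionValue("castUsing"));
          }
      }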

      Jeff Zhang added a comment -

      Response to Pradeep,
      1. HBase supports multi-version values, so the failure won't affect the next rerun. Another concern is that cleanupOnFailure is called on the client side, so if the number of result rows is very large, the cost of deletion would be high.
      2. I checked the test report here http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/227/artifact/trunk/patchprocess/trunkJavacWarnings.txt and it is weird that it shows only 98 warnings rather than 99. Do you know the reason? Thanks.

      Jeff Zhang added a comment -

      Response to Dmitriy,

      Thanks for your great idea; it really has better extensibility. But one problem is that this will sacrifice compatibility with the previous version of HBaseStorage. I don't have much experience with HBase; do you think users often need to specify other arguments?

      Dmitriy V. Ryaboy added a comment -

      First, I have to give credit to Ciemo for the idea, he mentioned it a while back on the SchemaStorage ticket.

      You raise a good point, although I doubt there are many users of the old LoadFunc.

      How about making the LoadFunc take an optional second parameter, which has all the options, and maintain the first one as the list of columns to pull?

      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12438187/PIG_1205_3.patch
      against trunk revision 921185.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      -1 javac. The applied patch generated 99 javac compiler warnings (more than the trunk's current 98 warnings).

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/237/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/237/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/237/console

      This message is automatically generated.

      Jeff Zhang added a comment -

      Dmitriy,
      Thanks for your suggestion. I adopted your idea of letting the second argument carry all the options and keeping the first one as the columns argument.

      Pradeep,
      I checked the test report. It seems that the newly generated javac warning is acceptable; it is the same as the one existing warning in PigStorage. And does this need the build engineer to do some extra configuration?

      Pradeep Kamath added a comment -

      Jeff, unless the warning is due to use of a deprecated hadoop API, we should fix it - if it is due to a deprecated hadoop API then it's ok to ignore.

      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12438486/PIG_1205_4.patch
      against trunk revision 921585.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      -1 javac. The applied patch generated 99 javac compiler warnings (more than the trunk's current 98 warnings).

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/234/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/234/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/234/console

      This message is automatically generated.

      Pradeep Kamath added a comment -

      Jeff, if the only issue blocking the commit is the javac warning - unless the warning is due to use of a deprecated hadoop API, we should fix it - if it is due to a deprecated hadoop API then it's ok to ignore. Very soon trunk will be branched for Pig 0.7.0, so if this feature is to make it into Pig 0.7.0, we should get it committed soon.

      Jeff Zhang added a comment -

      Thanks, Pradeep. I am sorry I had no time to fix the javac warning. It seems this is an "unchecked" javac warning, the same as the one in PigStorage.

      Dmitriy V. Ryaboy added a comment -

      You can suppress the unchecked warning with @SuppressWarnings("unchecked"), and comment why it's ok to suppress the warning.
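
      A purely illustrative example (not code from the patch) of scoping the suppression narrowly and documenting why it is safe:

      import java.util.ArrayList;
      import java.util.List;

      public class SuppressExample {
          // Suppress the known-safe unchecked cast at the narrowest scope and explain why.
          @SuppressWarnings("unchecked") // safe here: the object is always built as a List<String>
          static List<String> asStringList(Object o) {
              return (List<String>) o;
          }

          public static void main(String[] args) {
              List<String> cols = new ArrayList<String>();
              cols.add("colfamily:col1");
              System.out.println(asStringList((Object) cols));
          }
      }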

      I've been playing with using HBase through Pig using the 0.6 loader, and I must say, it's very far from ready for prime time. I don't know whether we need to exert too much effort to get this in under the wire when it won't really be usable anyway until much further love is applied.

      -D

      Jeff Zhang added a comment -

      Dmitriy, do you mean HBaseStorage's performance is far from what you expected? There's some discussion on PIG-1209; some extra options should be set to improve the performance.
      With this task, the second parameter can be extended to support other parameters.

      Dmitriy V. Ryaboy added a comment -

      Jeff,
      A brief list of issues that come to mind (these apply to the 0.6 version, but I think things are substantially the same in 0.7):

      1) extending Utf8Converter means data in HBase is expected to be stored as strings. Conversion using the Hbase Bytes class should be supported instead (or at least in addition).
      2) No projection push-down. For some reason even though it is clear what columns to pull, this client pulls everything, and filters out the right columns when constructing a tuple. The columns should be pushed into the Scan.
      3) No filter push-down. HBase has a number of efficient filters available, none of which are used. At a minimum, range constraints on the row key should be supported.
      4) No way to pull out the row key (but you are adding that in this ticket, so that's good).
      5) No way to control row version / timestamp

      None of this is rocket science, and in fact I am making good progress on all of them for 0.6, but it's unlikely to get done and ported for 0.7 by Monday.

      -D

      Olga Natkovich added a comment -

      Jeff, are you still planning to get this into Pig 0.7.0 by Monday or should we move this to Pig 0.8.0?

      Dmitriy V. Ryaboy added a comment -

      fwiw – I have an implementation for 0.6 that does most of what I outlined above; could probably port to 0.7 and make apache-friendly within the next couple of weeks.

      Olga Natkovich added a comment -

      Sounds good. Then I will mark it for inclusion in 0.8.0.

      Olga Natkovich added a comment -

      Jeff and Dmitriy - are you still planning to finish this for the Pig 0.8.0 release?

      Dmitriy V. Ryaboy added a comment -

      When is the cut-off date for that?

      Olga Natkovich added a comment -

      The proposal is to cut the branch on 8/30 so the code needs to be committed by then.

      Dmitriy V. Ryaboy added a comment -

      I can integrate my changes by then.

      Dmitriy V. Ryaboy added a comment -

      This patch (not really review-ready yet) introduces the Elephant-Bird improvements.

      You can use the -gt, -gte, -lt, -lte flags to filter out row ranges, specify caching and per-region row limits, and you can specify the caster to use (interpret Strings, as before, or use bytes directly for more efficient storage and communication).

      The filtering is a bit off because it still spins up all the map tasks; the ones whose keys are filtered out just finish extremely fast.

      The progress reporting is a bit jittery, but better than nothing.

      TODO: fix up filtering, add projection pushdown, add filter pushdown, and write better tests.

      Jeff Zhang added a comment -

      I am reviewing this patch.

      Jeff Zhang added a comment -

      Several comments:

      1. Is it possible to specify min_row_key and max_row_key in the parameters? In the current implementation, hbase will create splits covering the whole table, but sometimes we do not need to scan the whole table. Some regions could be ignored, which can improve performance.

      2. One small suggestion: move line 206 into the if block (setting it once is enough):

             if (scanFilter == null) {
                  scanFilter = new FilterList();
                  scan.setFilter(scanFilter);
             }
      

      3. It would be better to add a warning log in HBaseBinaryConverter when the bytes are cut off during type conversion.

      4. The "per-region limit" parameter is a bit confusing for me; I think users would like to set the limit on the whole table, not per region. What do you think?

      Dmitriy V. Ryaboy added a comment -

      1. Is it possible to specify min_row_key and max_row_key in parameters

      Even better than that – you can specify lt, lte, gt, and gte. It's true that as written splits will be created for the whole table, but the filters will cause most of those splits to immediately exit. Not creating the splits is on my todo list (I already do this in the elephantbird version for 0.6)

      2. One small suggestion: move line 206 to if block (only one time setting is enough)

      Good idea.

      3. It's better to add warning log in HBaseBinaryConverter when the bytes is cut off for type conversion

      Will do.

      4. The parameter "Per-region limit" is a bit confusing for me, I think users would like to the set the limit on the whole table not per region. What do you think ?

      Trouble is, you can't enforce a total limit without post-processing. In practice, I use -limit when I am experimenting and want to get just a few rows from HBase; if I want a specific number of rows, I use both -limit (to speed up the tasks, since the scanners will exit early), and Pig's LIMIT operator (to get the exact number of rows I need).
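
      For example (illustrative table and column names), the combination described above:

      raw = LOAD 'hbase://table_name'
            USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfamily:col1', '-loadKey -limit 10');
      -- -limit caps rows per scanned region; Pig's LIMIT then gives the exact total
      few = LIMIT raw 10;
      DUMP few;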

      Jeff Zhang added a comment -

      Dmitriy,

      One suggestion on point 1:
      You can set the StartRow and EndRow on the Scan if the user specifies the gt, lt, lte, or gte parameters. One thing to note is that StartRow is inclusive and EndRow is exclusive, so when users specify lte, you can add 1 to the EndRow key. This way, TableInputFormat will create fewer InputSplits, which means fewer mapper tasks.
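
      A minimal sketch of that approach against the HBase client Scan API (key values are placeholders; the inclusive upper bound is realized here by appending a zero byte, one way of doing the "add 1 to the EndRow key" step):

      import org.apache.hadoop.hbase.client.Scan;
      import org.apache.hadoop.hbase.util.Bytes;

      public class ScanRangeSketch {
          // -gte / -lt style bounds become the Scan's start/stop rows, so TableInputFormat
          // only creates splits for regions that overlap the requested key range.
          public static Scan boundedScan(String gteKey, String ltKey) {
              Scan scan = new Scan();
              scan.setStartRow(Bytes.toBytes(gteKey)); // inclusive lower bound
              scan.setStopRow(Bytes.toBytes(ltKey));   // exclusive upper bound
              return scan;
          }

          // For -lte, the stop row must be the smallest key strictly greater than the bound.
          public static Scan boundedScanInclusive(String gteKey, String lteKey) {
              Scan scan = new Scan();
              scan.setStartRow(Bytes.toBytes(gteKey));
              scan.setStopRow(Bytes.add(Bytes.toBytes(lteKey), new byte[] { 0 }));
              return scan;
          }
      }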

      Dmitriy V. Ryaboy added a comment -

      Fixed test (but did not add new tests).
      Made default caster configurable by setting pig.hbase.caster property.
      Made rowKey filters (gt, lt, gte, lte) filter out regions when possible. Tested manually.

      Jeff, to your comments about shifting to cut off regions – I think it's better to have the loader think about region sizes, and let the user only worry about key values. If they are intimate enough with their tables to know region boundaries, they should know which end of a region is inclusive and which is exclusive, and provide the correct filters.

      Dmitriy V. Ryaboy added a comment -

      Implemented LoadPushDown (NOTE: this involved a slight backwards-compatible refactoring of Utf8StorageConverter).
      Refactored the tests a bit.

      At this point I think we are good except for further testing and documentation.

      Jeff Zhang added a comment -

      Several updates continuing Dmitriy's work:

      1. Add unit tests to HBaseStorage

      • code refactoring of TestHBaseStorage
      • add unit tests for the parameters gt, lt, gte, lte, and limit, and for HBaseBinaryConverter

      2. Update hbase 0.20 to hbase 0.20.6 (Dmitriy, I found HBaseStorage does not work on hbase 0.20; did you also manually test on hbase 0.20.6 rather than 0.20.0?)

      3. I think we need more documentation for HBaseStorage, especially for the LoadCaster; if a user specifies the wrong LoadCaster, they will get confusing results.

      Dmitriy V. Ryaboy added a comment -

      Jeff,
      Thanks a lot for pitching in with the tests!

      I was using 0.20.0 and the old tests passed. I've only tested the binary conversion stuff and other new features on the Twitter machines, and they do run a later HBase version – perhaps the incompatibility is in the filters or binary casters code?
      Do you know which tests fail with 0.20.0?

      I will definitely add a bunch of documentation.

      Jeff Zhang added a comment -

      Dmitriy,

      The test cases testLoadWithParameters_1 and testLoadWithParameters_2 fail when using hbase 0.20.
      I think TableInputFormat had some updates (maybe bug fixes) between hbase 0.20.0 and hbase 0.20.6.

      The following is log:
      10/08/24 17:28:00 ERROR mapReduceLayer.Launcher: Backend error message during job submission
      org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: hbase://pigtable_1
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
      at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
      at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
      at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
      at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
      at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
      at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
      at java.lang.Thread.run(Thread.java:619)
      Caused by: java.lang.NullPointerException
      at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:365)
      at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:347)
      at org.apache.hadoop.hbase.filter.CompareFilter.readFields(CompareFilter.java:132)
      at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:418)
      at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:347)
      at org.apache.hadoop.hbase.filter.FilterList.readFields(FilterList.java:204)
      at org.apache.hadoop.hbase.client.Scan.readFields(Scan.java:523)
      at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertStringToScan(TableMapReduceUtil.java:94)
      at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:79)
      at org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat$HBaseTableIFBuilder.build(HBaseTableInputFormat.java:77)
      at org.apache.pig.backend.hadoop.hbase.HBaseStorage.getInputFormat(HBaseStorage.java:268)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:257)
      ... 7 more

      10/08/24 17:28:00 ERROR pigstats.PigStats: ERROR 2118: Unable to create input splits for: hbase://pigtable_1
      10/08/24 17:28:00 ERROR pigstats.PigStatsUtil: 1 map reduce job(s) failed!
      10/08/24 17:28:00 INFO pigstats.PigStats: Script Statistics:

      Jeff Zhang added a comment -

      Dmitriy,

      I found the problem. This is really a bug in hbase 0.20.0 in the serialization of filters (https://issues.apache.org/jira/browse/HBASE-1830).
      I think we should update hbase to 0.20.6 in Pig; 0.20.6 is compatible with 0.20.0.

      Dmitriy V. Ryaboy added a comment -

      Ok, let's upgrade to 0.20.6 then. We could work around it by serializing the filters ourselves and applying them to the scan when reading the UDFContext, but that seems a bit overboard, and folks should be upgrading anyway.

      Committers: this is ready for review.

      Alan Gates added a comment -

      Comments

      1. As discussed previously, LoadStoreCaster should be changed so that there is a StoreCaster interface that has the toByte methods, and LoadStoreCaster is a convenience interface that extends LoadCaster and StoreCaster (a brief sketch follows at the end of this comment).
      2. It looks like with HBASE-1933 Hbase is now available via Maven. Can we pull it from Maven rather than check in the jar to our lib directory?

      Since I know little about Hbase I focussed my review on the Pig side.
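
      A brief sketch of the interface split suggested in comment 1 above (names suffixed with "Sketch" to make clear these are illustrations, not the actual Pig interfaces):

      import java.io.IOException;

      interface LoadCasterSketch {                      // stands in for org.apache.pig.LoadCaster
          Integer bytesToInteger(byte[] b) throws IOException;
          String bytesToCharArray(byte[] b) throws IOException;
      }

      interface StoreCasterSketch {                     // the proposed StoreCaster holding the toBytes methods
          byte[] toBytes(Integer i) throws IOException;
          byte[] toBytes(String s) throws IOException;
      }

      interface LoadStoreCasterSketch extends LoadCasterSketch, StoreCasterSketch {
          // convenience interface: a caster that can both load and store
      }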

      Dmitriy V. Ryaboy added a comment -

      Attaching the hbase-0.20.6 jars

      HBase is an apache project, so no license issues.

      Dmitriy V. Ryaboy added a comment -

      Patch with the StoreCaster changes as suggested by Alan. With +1s from Alan and Jeff, committing.

      Dmitriy V. Ryaboy added a comment -

      Re HBASE-1933, they are publishing snapshots of current trunk, not the 0.20 branch. We'll be able to start using maven to pull down hbase when we upgrade to their 0.9 release (which iirc depends on hdfs appends...)


        People

        • Assignee:
          Dmitriy V. Ryaboy
          Reporter:
          Jeff Zhang
        • Votes:
          1
          Watchers:
          8
