Jayesh, this patch is great. Thanks for taking this on. Just a few comments:
- The various options should be listed in the Terms table, instead of in usage.
- Try to talk about what it does, as opposed to what it can do. For example, "HBaseStorage can store and load data from HBase" should be "HBaseStorage stores and loads data from HBase"
- When describing the various params, describe what the param does, as opposed to how it does it. For example, "This specifies to the HBase scan method to read rows greater than minKeyVal" should be "Specifies only rows with a rowKey greater than minKeyVal are to be returned".
- "and a wildcard as a suffix" should be "followed by an asterisk (*)". "using the column family name and a wildcard" should be "using the column family name and an asterisk (i.e., cf:*)"
- "Columns from multiple column families are specified by seperating each column family and column qualifier pair by a single space." should be "Columns from multiple column families can be returned." No need to specify the space delimiter, since you already have.
- Likewise, the last two sentance of Usage can be omitted, since you mention above that not all columns must be specified.
- Should specify that loadKey is false by default as well as how it inserts an extra field as the first element fo the schema, before the columns specified.
- There are a few more options to describe (see the constructor javadocs in the code on the trunk): delim, ignoreWhitespace, noWAL, minTimestamp, maxTimestamp, timestamp. Note that the "extreme caution" warning in the javadoc is mis-located. It should apply to the noWAL option.
- We should add some discussion about STORE and how the first field needs to be the rowKey, as well as how maps and scalars are handled. See the Javadoc of the class for a description of this.
Also, after you upload the next patch (typically named something like
PIG-2341_2.patch) you'll want to set the "patch available" flag, which alerts folks that it's ready for review.