Details

      Description

      One of the nice things about Pig and HBase is that they can be integrated, thanks to the recently committed patch (PIG-1250).

      The documentation has not been updated yet (what exists relates almost entirely to the patch itself). It would be nice to document this feature in detail in the Pig documentation (e.g., here: http://pig.apache.org/docs/r0.9.1/func.html#load-store-functions).

      1. PIG-2341.2.patch
        8 kB
        Bill Graham
      2. PIG-2341.3.patch
        9 kB
        Jayesh Thakrar
      3. PIG-2341.4.patch
        8 kB
        Jayesh Thakrar
      4. PIG-2341.5.patch
        9 kB
        Bill Graham
      5. PIG-2341.patch
        5 kB
        Jayesh Thakrar

        Activity

        Mikael Sitruk created issue -
        Dmitriy V. Ryaboy added a comment -

        Good call.

        Docs on the API are pretty good, but we should put them into func.html as well.

        http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html

        (the class description is half the documentation; the constructor documents arguments HBaseStorage understands, and is therefore also quite important).

        Bill Graham made changes -
        Field Original Value New Value
        Parent PIG-2756 [ 12560811 ]
        Issue Type Improvement [ 4 ] Sub-task [ 7 ]
        Jayesh Thakrar added a comment -

        Hi,

        I am using the HBaseStorage at my work and am very happy about it. I would like to volunteer to take up this task. How can I go about doing it?

        Will greatly appreciate any pointers......

        Thanks,
        Jayesh

        Bill Graham added a comment -

        Thanks Jayesh for volunteering! Having HBaseStorage documented on the site is way overdue.

        To document, you'll want to check out pig from SVN (or git), edit src/docs/src/documentation/content/xdocs/func.xml, build locally to check the generated HTML and then submit a patch.

        This wiki explains how to build the documentation:
        https://cwiki.apache.org/confluence/display/PIG/HowToDocument

        And this one is a more general doc to get set up:
        https://cwiki.apache.org/confluence/display/PIG/HowToContribute

        And of course ask questions here or on the list if you have any.

        Bill Graham made changes -
        Labels hbase
        Jayesh Thakrar added a comment -

        I have attached the patch file for review. This is my first attempt to contribute to Apache, so not sure of the protocol......

        Jayesh Thakrar made changes -
        Attachment PIG-2341.patch [ 12547152 ]
        Bill Graham added a comment -

        Jayesh, this patch is great. Thanks for taking this on. Just a few comments:

        • The various options should be listed in the Terms table, instead of in usage.
        • Try to talk about what it does, as opposed to what it can do. For example, "HBaseStorage can store and load data from HBase" should be "HBaseStorage stores and loads data from HBase"
        • When describing the various params, describe what the param does, as opposed to how it does it. For example, "This specifies to the HBase scan method to read rows greater than minKeyVal" should be "Specifies only rows with a rowKey greater than minKeyVal are to be returned".
        • "and a wildcard as a suffix" should be "followed by an asterisk (*)". "using the column family name and a wildcard" should be "using the column family name and an asterisk (i.e., cf:*)"
        • "Columns from multiple column families are specified by seperating each column family and column qualifier pair by a single space." should be "Columns from multiple column families can be returned." No need to specify the space delimiter, since you already have.
        • Likewise, the last two sentences of Usage can be omitted, since you mention above that not all columns must be specified.
        • Should specify that loadKey is false by default, as well as how it inserts an extra field as the first element of the schema, before the columns specified.
        • There are a few more options to describe (see the constructor javadocs in the code on the trunk): delim, ignoreWhitespace, noWAL, minTimestamp, maxTimestamp, timestamp. Note that the "extreme caution" warning in the javadoc is mis-located. It should apply to the noWAL option.
        • We should add some discussion about STORE and how the first field needs to be the rowKey, as well as how maps and scalars are handled. See the Javadoc of the class for a description of this.
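
        The points above about column specifications, loadKey, and STORE can be illustrated with a minimal Pig Latin sketch. The table names (users, users_copy), the info column family, its qualifiers, and the user100 key are all hypothetical; the option spelling follows the HBaseStorage constructor Javadoc, which should be checked against the Pig version in use.

        ```pig
        -- Hypothetical table 'users' with a column family 'info'.
        -- '-loadKey=true' adds the row key as the first field of each tuple;
        -- '-gt=user100' returns only rows with a row key greater than 'user100'.
        -- 'info:*' returns all columns of the family as a single map field.
        users = LOAD 'hbase://users'
                USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
                    'info:first_name info:last_name info:*',
                    '-loadKey=true -gt=user100')
                AS (id:bytearray, first_name:chararray, last_name:chararray, info_map:map[]);

        -- Keep the row key as the first field, followed by the scalar columns.
        names = FOREACH users GENERATE id, first_name, last_name;

        -- On STORE the first field is used as the row key and is not listed
        -- in the column specification; the remaining fields map to the columns in order.
        STORE names INTO 'hbase://users_copy'
              USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
                  'info:first_name info:last_name');
        ```

        Without '-loadKey=true' the id field would be absent from the loaded tuples, so the AS clause would then start at first_name.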

        Also, after you upload the next patch (typically named something like PIG-2341_2.patch) you'll want to set the "patch available" flag, which alerts folks that it's ready for review.

        Bill Graham made changes -
        Assignee Jayesh Thakrar [ jthakrar ]
        Bill Graham added a comment -

        Attaching a second patch with my comments included. Added a section on using HBaseStorage for loading and added missing options. Will commit if no one has any comments.

        Bill Graham made changes -
        Attachment PIG-2341.2.patch [ 12560146 ]
        Bill Graham made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Affects Version/s 0.10.0 [ 12316246 ]
        Labels hbase documentation hbase
        Fix Version/s 0.11 [ 12318878 ]
        Jayesh Thakrar added a comment -

        Some more details added to the function documentation.

        Jayesh Thakrar made changes -
        Attachment PIG-2341.3.patch [ 12560233 ]
        Jayesh Thakrar added a comment -

        Merged my changes with Bill's patch.

        Jayesh Thakrar made changes -
        Attachment PIG-2341.4.patch [ 12560495 ]
        Bill Graham added a comment -

        Thanks Jayesh for the merge! I think we're all set. Attaching patch 5 which contains some minor tweaks and two main changes:

        • Rebasing the patch to the base of the Pig repo. You generally will want to submit patches so that they apply from the base dir.
        • Rolling javadoc bug PIG-3092 into this one.
        Bill Graham made changes -
        Attachment PIG-2341.5.patch [ 12560733 ]
        Bill Graham added a comment -

        Committed, thanks Jayesh! This documentation is way overdue, so huge props for jumping on it.

        Bill Graham made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Bill Graham made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition                     Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open -> Patch Available        405d 20h 16m            1                 Bill Graham     10/Dec/12 05:31
        Patch Available -> Resolved    3d 2h 46m               1                 Bill Graham     13/Dec/12 08:18
        Resolved -> Closed             70d 20h 34m             1                 Bill Graham     22/Feb/13 04:53

          People

          • Assignee:
            Jayesh Thakrar
            Reporter:
            Mikael Sitruk
          • Votes:
            0
            Watchers:
            4
