Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.3
    • Fix Version/s: 0.3
    • Component/s: None
    • Labels:

      Description

      We need to create an HCatRecordSerDe to enable Hive-HCat integration, in which Hive can read and write through HCatInputFormat/HCatOutputFormat.

      The first step is to write a SerDe that serializes to and deserializes from an HCatRecord and implements the appropriate ObjectInspectors.
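The division of labor described above can be illustrated with a minimal, self-contained sketch. It does not use the Hive or HCatalog APIs: `RowInspector`, `ToyRecord`, and `ToyRecordSerDe` are hypothetical stand-ins for an ObjectInspector, HCatRecord, and HCatRecordSerDe, and the assumption is only that the SerDe reads incoming rows through an inspector and hands out an inspector for the records it produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SerDeSketch {
    public static void main(String[] args) {
        ToyRecordSerDe serde = new ToyRecordSerDe();

        // A source row held as Object[], described by its own inspector.
        Object[] sourceRow = {1, "a"};
        RowInspector arrayInspector = row -> Arrays.asList((Object[]) row);

        ToyRecord record = serde.serialize(sourceRow, arrayInspector);
        Object back = serde.deserialize(record);

        // Read the deserialized row only through the SerDe's inspector.
        System.out.println(serde.getObjectInspector().getFieldsAsList(back)); // prints [1, a]
    }
}

// Hypothetical stand-in for a struct ObjectInspector: it extracts field
// values from a row whose concrete type the caller does not know.
interface RowInspector {
    List<Object> getFieldsAsList(Object row);
}

// Hypothetical stand-in for HCatRecord: an ordered list of field values.
final class ToyRecord {
    private final List<Object> fields;

    ToyRecord(List<Object> fields) {
        this.fields = new ArrayList<>(fields);
    }

    List<Object> getAll() {
        return fields;
    }
}

// Hypothetical stand-in for HCatRecordSerDe.
final class ToyRecordSerDe {
    // Serialize: walk the incoming row via its inspector and build a record.
    ToyRecord serialize(Object row, RowInspector inspector) {
        return new ToyRecord(inspector.getFieldsAsList(row));
    }

    // Deserialize: the record already is the in-memory row, so return it as is.
    Object deserialize(ToyRecord record) {
        return record;
    }

    // Inspector that describes the rows returned by deserialize().
    RowInspector getObjectInspector() {
        return row -> ((ToyRecord) row).getAll();
    }
}
```

The point of the sketch is that callers never cast the row themselves; they always go through whichever inspector belongs to the row's producer.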

      1. HCATALOG-204-4.patch
        26 kB
        Sushanth Sowmyan
      2. HCATALOG-204-3.patch
        15 kB
        Sushanth Sowmyan
      3. HCATALOG-204-2.patch
        24 kB
        Sushanth Sowmyan
      4. HCATALOG-204-2.patch
        24 kB
        Sushanth Sowmyan
      5. HCATALOG-204.patch
        23 kB
        Sushanth Sowmyan

          Activity

          Alan Gates added a comment -

          Issue closed with 0.4 release.

          Alan Gates added a comment -

          Patch 4 checked into trunk and 0.3 branch.

          Sushanth Sowmyan added a comment -

          (Oops, missed a file in 204-3.patch, including now.)

          Sushanth Sowmyan added a comment -

          (Renaming file to be more obvious as to which is the latest file)

          Sushanth Sowmyan added a comment -

          Updating patch

          + Added copy semantics to HCatRecord

          Alan Gates added a comment -

          Patch looks good. It just needs e2e tests and it should be ready for check in.

          Sushanth Sowmyan added a comment -

          Updated patch:

          + All commented-out System.err calls converted to LOG.debug.
          + I was not able to hack up a TextIF/OF that uses this SerDe. After trying, I realized that the HCatRecord it generates is not serializable/deserializable in a way that TextIF/OF can understand; it really needs an underlying IF/OF that understands HCatRecord. So we can only do an e2e test with HCatIF/OF, which depend on this SerDe.
          + However, this piece is really only about serialization/deserialization and generation of the appropriate object inspector, and the unit test covers that. To that end, I've added a section to the test that converts data back and forth between LazySimpleSerDe and HCatRecordSerDe to show that the conversion works. With a complex enough schema (as provided), this test also exercises all of the code in the patch.
          + One bug fixed in this exercise: using LazySimpleSerDe showed me that the primitive object is sometimes not a Java object in itself, and I needed to go a step further and use it to obtain the native Java object. That has now been fixed.
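A minimal sketch of the kind of unwrapping the last item describes, under stated assumptions: the comment does not say how the native object is obtained, so the sketch assumes it goes through the field's primitive inspector. `LazyCell`, `PrimitiveInspector`, `LazyIntInspector`, and `Unwrapper` are hypothetical stand-ins and not Hive classes; only the method name `getPrimitiveJavaObject` mirrors the one on Hive's `PrimitiveObjectInspector`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PrimitiveUnwrapSketch {
    public static void main(String[] args) {
        // A row as a lazy SerDe might hand it over: wrappers, not Integers.
        List<Object> lazyRow = Arrays.asList(new LazyCell("42"), new LazyCell("7"));

        List<Object> fields = Unwrapper.toJavaObjects(lazyRow, new LazyIntInspector());
        System.out.println(fields);                           // prints [42, 7]
        System.out.println(fields.get(0) instanceof Integer); // prints true
    }
}

// Hypothetical stand-in for a lazily parsed primitive: it holds raw text
// and is not itself a java.lang.Integer.
final class LazyCell {
    final String rawText;

    LazyCell(String rawText) {
        this.rawText = rawText;
    }
}

// Hypothetical stand-in for a primitive ObjectInspector.
interface PrimitiveInspector {
    Object getPrimitiveJavaObject(Object o);
}

// Inspector that knows how to turn a LazyCell into an Integer.
final class LazyIntInspector implements PrimitiveInspector {
    @Override
    public Object getPrimitiveJavaObject(Object o) {
        return Integer.valueOf(((LazyCell) o).rawText);
    }
}

final class Unwrapper {
    // Convert every field to its native Java object before storing it.
    static List<Object> toJavaObjects(List<Object> row, PrimitiveInspector inspector) {
        List<Object> out = new ArrayList<>();
        for (Object field : row) {
            // Adding `field` directly would leak the wrapper type into the record.
            out.add(inspector.getPrimitiveJavaObject(field));
        }
        return out;
    }
}
```

A round-trip test against a different SerDe is what surfaces this class of bug, because a test that only feeds plain Java objects never produces a wrapper in the first place.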

          Ashutosh Chauhan added a comment -

          "I could hack one up that uses TextIF/OF that uses this SerDe to test if we want that."

          That will be useful to understand the code flow. Please do so.

          Sushanth Sowmyan added a comment - https://reviews.apache.org/r/3448/
          Sushanth Sowmyan added a comment -

          > Re: e2e tests / Hive query: that follows with the larger task of reading and writing through Hive, and needs other patches. Unit tests are what is relevant for this individual piece. I could hack one up that uses TextIF/OF with this SerDe to test, if we want that.

          > Re: System.err: changed the commented-out System.err calls to LOG.debug calls. I didn't want to remove them outright because I feel they're helpful to programmers looking into this in the future.

          > OK, doing so with the patch update now.

          Ashutosh Chauhan added a comment -

          A few initial comments:

          • The unit test is good, but it would be great to provide a test case in the form of a Hive query to demonstrate how this feature is intended to be used. You can add it either to the e2e tests or to the unit tests.
          • Get rid of System.err.println.
          • Create a Review Board entry for ease of review.
          Sushanth Sowmyan added a comment -

          Patch provided.


            People

            • Assignee:
              Sushanth Sowmyan
              Reporter:
              Sushanth Sowmyan
            • Votes:
              0
              Watchers:
              3