Hive / HIVE-6148

Support arbitrary structs stored in HBase

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 1.1.0
    • Component/s: HBase Handler

      Description

      We should add support for querying arbitrary structs stored in HBase.

      1. HIVE-6148.3.patch.txt
        32 kB
        Swarnim Kulkarni
      2. HIVE-6148.2.patch.txt
        28 kB
        Swarnim Kulkarni
      3. HIVE-6148.1.patch.txt
        26 kB
        Swarnim Kulkarni

        Activity

        Sergey Shelukhin added a comment -

        Can you elaborate? HBase provides an option for structured keys with type support and correct multi-field ordering (as of 0.96 IIRC; Nick Dimiduk might know more).
        Using these, if they fit the use case, might be nice for both sides: they already exist, so no new code is needed, and this would provide a real use case for them.
        We (OK, at least I) can volunteer to fix things quickly to make it work.
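        A minimal sketch of what "correct multi-field ordering" means for a composite row key: each field is encoded so that HBase's unsigned byte-wise comparison of the whole key agrees with comparing the fields one by one. The class and method names (OrderedKeySketch, encodeKey, and so on) are hypothetical, and the encoding below is NOT HBase's actual OrderedBytes wire format; it only illustrates the idea, assuming an (int id, String name) key.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of an order-preserving composite row key (int id, String name).
// Not HBase's OrderedBytes format; it only illustrates the ordering idea.
final class OrderedKeySketch {

    // Flip the sign bit and write big-endian, so unsigned byte order
    // matches signed numeric order (-5 sorts before 3).
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v ^ Integer.MIN_VALUE).array();
    }

    // UTF-8 bytes followed by a 0x00 terminator, so "ab" sorts before "abc".
    // Assumption: names never contain U+0000.
    static byte[] encodeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOf(utf8, utf8.length + 1); // copyOf pads with 0x00
    }

    // Concatenate the fields in declaration order: id first, then name.
    static byte[] encodeKey(int id, String name) {
        byte[] a = encodeInt(id);
        byte[] b = encodeString(name);
        byte[] key = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, key, a.length, b.length);
        return key;
    }

    // HBase orders row keys as unsigned bytes; Arrays.compareUnsigned (Java 9+) does the same.
    static int compare(byte[] x, byte[] y) {
        return Arrays.compareUnsigned(x, y);
    }

    public static void main(String[] args) {
        byte[][] keys = {
            encodeKey(3, "abc"), encodeKey(-5, "z"), encodeKey(3, "ab"), encodeKey(10, "a")
        };
        // Sorting the raw bytes yields (-5,"z"), (3,"ab"), (3,"abc"), (10,"a").
        Arrays.sort(keys, OrderedKeySketch::compare);
        for (byte[] k : keys) {
            System.out.println(Arrays.toString(k));
        }
    }
}
```

        The sign-bit flip and the terminator are the two details that make a plain byte comparison respect both numeric order and field boundaries; a length prefix on the string would instead sort all short names before long ones.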

        Swarnim Kulkarni added a comment -

        Hi Sergey,

        This issue is similar to two other issues [1][2] that I logged to get out-of-the-box support for querying structs stored in HBase. Those two mainly deal with supporting Protobuf, Thrift, and Avro, which can be deeply nested, while this one is for simple structs that fall into neither category (the fix is pretty similar to the one suggested in HIVE-2599). I already have something working, but I am happy to throw it away completely if we can come up with a better strategy.

        [1] https://issues.apache.org/jira/browse/HIVE-3555
        [2] https://issues.apache.org/jira/browse/HIVE-6147

        Nick Dimiduk added a comment -

        How is this not a duplicate of HIVE-3211?

        I think it would be trivial to create Avro and Protobuf implementations of org.apache.hadoop.hbase.types.DataType. The details will be similar to what's already provided in Struct (which I suggested you examine over in HIVE-2599). If you build on top of that interface, all HBase users will benefit from your code, not just Hive + HBase users.
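        A minimal sketch of the composition Nick describes: a struct-like codec that encodes its members back to back, where each member is a pluggable field codec. The FieldCodec interface, IntCodec, StringCodec, and StructCodec are all hypothetical names, and FieldCodec is NOT the real org.apache.hadoop.hbase.types.DataType signature; the sketch only assumes that a codec can report an encoded length, encode into a buffer, and decode from one.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified codec interface. It is NOT
// org.apache.hadoop.hbase.types.DataType; it only mirrors the general idea.
interface FieldCodec<T> {
    int encodedLength(T value);
    void encode(ByteBuffer dst, T value);
    T decode(ByteBuffer src);
}

final class IntCodec implements FieldCodec<Integer> {
    public int encodedLength(Integer v) { return Integer.BYTES; }
    public void encode(ByteBuffer dst, Integer v) { dst.putInt(v); }
    public Integer decode(ByteBuffer src) { return src.getInt(); }
}

// Length-prefixed UTF-8 string.
final class StringCodec implements FieldCodec<String> {
    public int encodedLength(String v) {
        return Integer.BYTES + v.getBytes(StandardCharsets.UTF_8).length;
    }
    public void encode(ByteBuffer dst, String v) {
        byte[] utf8 = v.getBytes(StandardCharsets.UTF_8);
        dst.putInt(utf8.length).put(utf8);
    }
    public String decode(ByteBuffer src) {
        byte[] utf8 = new byte[src.getInt()];
        src.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}

// Struct-like composite: members are encoded back to back in declaration order.
// An Avro- or Protobuf-backed member would simply be another FieldCodec,
// and a StructCodec can itself be a member, which gives nesting.
final class StructCodec implements FieldCodec<Object[]> {
    private final List<FieldCodec<Object>> members = new ArrayList<>();

    @SuppressWarnings("unchecked")
    StructCodec(FieldCodec<?>... codecs) {
        for (FieldCodec<?> c : codecs) {
            members.add((FieldCodec<Object>) c);
        }
    }

    public int encodedLength(Object[] values) {
        int total = 0;
        for (int i = 0; i < members.size(); i++) {
            total += members.get(i).encodedLength(values[i]);
        }
        return total;
    }

    public void encode(ByteBuffer dst, Object[] values) {
        for (int i = 0; i < members.size(); i++) {
            members.get(i).encode(dst, values[i]);
        }
    }

    public Object[] decode(ByteBuffer src) {
        Object[] out = new Object[members.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = members.get(i).decode(src);
        }
        return out;
    }

    byte[] toBytes(Object[] values) {
        ByteBuffer buf = ByteBuffer.allocate(encodedLength(values));
        encode(buf, values);
        return buf.array();
    }

    Object[] fromBytes(byte[] bytes) {
        return decode(ByteBuffer.wrap(bytes));
    }

    public static void main(String[] args) {
        StructCodec codec = new StructCodec(new IntCodec(), new StringCodec());
        byte[] cell = codec.toBytes(new Object[] {42, "hello"});
        System.out.println(Arrays.toString(codec.fromBytes(cell))); // prints [42, hello]
    }
}
```

        The point of the design is that the composite never needs to know what its members are, which is why a codec written against a shared interface would be reusable outside Hive.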

        Swarnim Kulkarni added a comment -

        Patch attached.

        Swarnim Kulkarni added a comment -

        RB: https://reviews.apache.org/r/25669

        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12668872/HIVE-6148.1.patch.txt

        ERROR: -1 due to 1 failed/errored test(s), 6277 tests executed
        Failed tests:

        org.apache.hive.service.TestHS2ImpersonationWithRemoteMS.testImpersonation
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/818/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/818/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-818/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 1 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12668872

        Swarnim Kulkarni added a comment -

        Brock Noland, Xuefu Zhang: if you get a chance to review this, I would highly appreciate it. The one test failure above is unrelated to my patch.

        Brock Noland added a comment -

        Thank you! Comments on RB.

        Swarnim Kulkarni added a comment -

        Thanks for the review, Brock Noland. I addressed the comments and updated the patch.

        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12670972/HIVE-6148.2.patch.txt

        ERROR: -1 due to 1 failed/errored test(s), 6347 tests executed
        Failed tests:

        org.apache.hadoop.hive.ql.parse.TestParse.testParse_union
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/967/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/967/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-967/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 1 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12670972

        Swarnim Kulkarni added a comment -

        The above test failure is unrelated to the change.

        Brock Noland added a comment -

        Patch looks good and I am +1, but I still see:

        @@ -1403,6 +1484,8 @@ private void deserializeAndSerializeHiveAvro(HBaseSerDe serDe, Result r, Put p,
               assertNotNull(fieldData);
               assertEquals(expectedFieldsData[j], fieldData.toString().trim());
             }
        +    
        +    SerDeUtils.getJSONString(row, soi);
         
             // Now serialize
             Put put = ((PutWritable) serDe.serialize(row, soi)).getPut();
        

        which doesn't seem to make sense?

        Swarnim Kulkarni added a comment -

        I had added that to make sure deserialization goes through correctly, because that call seemed to be made for SELECT * style queries. In the latest patch, I added assertions around SerDeUtils.getJSONString.
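        A minimal sketch of the difference between the bare call Brock flagged and an asserted one: a bare call only proves that rendering does not throw, while comparing against an expected string also checks that the nested struct was deserialized with the right values. RowJsonCheck, toJson, and assertJson are hypothetical; toJson is a simplified stand-in, NOT Hive's SerDeUtils.getJSONString, and it models a row and its structs as ordered maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a row-to-JSON renderer; it is NOT Hive's
// SerDeUtils.getJSONString. Structs are modeled as ordered maps.
final class RowJsonCheck {

    static String toJson(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                sb.append('"').append(e.getKey()).append("\":").append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        if (value instanceof String) {
            return "\"" + value + "\""; // no escaping; enough for this sketch
        }
        return String.valueOf(value);
    }

    // Fail loudly when the rendered row differs from the expected text,
    // instead of calling the renderer and discarding its result.
    static void assertJson(Object row, String expected) {
        String actual = toJson(row);
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> struct = new LinkedHashMap<>();
        struct.put("a", 1);
        struct.put("b", "x");
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("key", "r1");
        row.put("s", struct);
        assertJson(row, "{\"key\":\"r1\",\"s\":{\"a\":1,\"b\":\"x\"}}");
        System.out.println(toJson(row));
    }
}
```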

        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12671376/HIVE-6148.3.patch.txt

        ERROR: -1 due to 1 failed/errored test(s), 6357 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority2
        

        Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1001/testReport
        Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1001/console
        Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1001/

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 1 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12671376

        Swarnim Kulkarni added a comment -

        The failed test seems flaky and unrelated to my changes here.

        Brock Noland added a comment -

        +1

        Brock Noland added a comment -

        Thank you very much Swarnim! I have committed this to trunk!

        Lefty Leverenz added a comment -

        By the time Hive 0.15 is released, this should be documented in the wiki: Hive HBase Integration.

          People

          • Assignee: Swarnim Kulkarni
          • Reporter: Swarnim Kulkarni
          • Votes: 0
          • Watchers: 7
