Hive
  1. Hive
  2. HIVE-1229

replace dependencies on HBase deprecated API

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.7.0
    • Component/s: HBase Handler
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Some of these dependencies are on the old Hadoop mapred packages; others are HBase-specific. The former have to wait until the rest of Hive moves over to the new Hadoop mapreduce package, but the HBase-specific ones don't have to wait.

      1. HIVE-1229.2.patch
        78 kB
        Basab Maulik
      2. HIVE-1229.1.patch
        96 kB
        Basab Maulik
      3. HIVE-1229.3.patch
        81 kB
        Basab Maulik
      4. HIVE-1229.4.patch
        81 kB
        Basab Maulik

        Issue Links

          Activity

          Hide
          Basab Maulik added a comment -

          I have attached a preliminary patch for this issue for feedback/comments. The patch includes changes from HIVE-1383.3.patch.

          With the patch applied the following tests pass:

          ant test -Dtestcase=TestHBaseSerDe
          ant test -Dtestcase=TestLazyHBaseObject
          ant test -Dtestcase=TestHBaseCliDriver -Dqfile=hbase_queries.q

          .. the following fails:

          ant clean test tar -logfile ant.log

          'ant testreport' shows six failures in ql/parse.

          I think these failures are from my environment rather than the patch?

          I may also need to fix some checkstyle violations.

          Show
          Basab Maulik added a comment - I have attached a preliminary patch for this issue for feedback/comments. The patch includes changes from HIVE-1383 .3.patch. With the patch applied the following tests pass: ant test -Dtestcase=TestHBaseSerDe ant test -Dtestcase=TestLazyHBaseObject ant test -Dtestcase=TestHBaseCliDriver -Dqfile=hbase_queries.q .. the following fails: ant clean test tar -logfile ant.log 'ant testreport' shows six failures in ql/parse. I think these failures are from my environment rather than the patch? I may also need to fix some checkstyle violations.
          Hide
          John Sichi added a comment -

          Taking a look at this one now.

          Show
          John Sichi added a comment - Taking a look at this one now.
          Hide
          John Sichi added a comment -

          Hey Basab,

          Looks great. I have a few review questions before we commit this, but first, I need to you to submit a new patch rebased against latest trunk now that HIVE-1383 has been committed. (Note that there were some more minor changes in between HIVE-1383.3.patch and HIVE-1383.4.patch.)

          For the last patch, you had a typo in the filename (1129 instead of 1229); call the new one HIVE-1229.2.patch and then delete the old one to avoid confusion.

          Show
          John Sichi added a comment - Hey Basab, Looks great. I have a few review questions before we commit this, but first, I need to you to submit a new patch rebased against latest trunk now that HIVE-1383 has been committed. (Note that there were some more minor changes in between HIVE-1383 .3.patch and HIVE-1383 .4.patch.) For the last patch, you had a typo in the filename (1129 instead of 1229); call the new one HIVE-1229 .2.patch and then delete the old one to avoid confusion.
          Hide
          Basab Maulik added a comment -

          ... recreating patch against current trunk. Thanks for the feedback!

          Show
          Basab Maulik added a comment - ... recreating patch against current trunk. Thanks for the feedback!
          Hide
          Basab Maulik added a comment -

          fixed checkstyle violations and rebased against trunk.

          Tests run successfully:

          ant test -Dtestcase=TestLazyHBaseObject
          ant test -Dtestcase=TestHBaseSerDe
          ant test -Dtestcase=TestHBaseCliDriver -Dqfile=hbase_queries.q

          thanks.

          Show
          Basab Maulik added a comment - fixed checkstyle violations and rebased against trunk. Tests run successfully: ant test -Dtestcase=TestLazyHBaseObject ant test -Dtestcase=TestHBaseSerDe ant test -Dtestcase=TestHBaseCliDriver -Dqfile=hbase_queries.q thanks.
          Hide
          Basab Maulik added a comment -

          Reattaching first patch with correct name.

          Show
          Basab Maulik added a comment - Reattaching first patch with correct name.
          Hide
          HBase Review Board added a comment -

          Message from: "John Sichi" <jsichi@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/238/
          -----------------------------------------------------------

          Review request for Hive Developers.

          Summary
          -------

          review by JVS

          This addresses bug HIVE-1229.
          http://issues.apache.org/jira/browse/HIVE-1229

          Diffs


          http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 906294

          Diff: http://review.hbase.org/r/238/diff

          Testing
          -------

          Thanks,

          John

          Show
          HBase Review Board added a comment - Message from: "John Sichi" <jsichi@facebook.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/238/ ----------------------------------------------------------- Review request for Hive Developers. Summary ------- review by JVS This addresses bug HIVE-1229 . http://issues.apache.org/jira/browse/HIVE-1229 Diffs http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java 906294 Diff: http://review.hbase.org/r/238/diff Testing ------- Thanks, John
          Hide
          HBase Review Board added a comment -

          Message from: "John Sichi" <jsichi@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/239/
          -----------------------------------------------------------

          Review request for Hive Developers.

          Summary
          -------

          review by JVS (please ignore the previous one I created a few minutes ago with the bad patch by accident)

          This addresses bug HIVE-1229.
          http://issues.apache.org/jira/browse/HIVE-1229

          Diffs


          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 957296
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSplit.java 957296
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 957296
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 957296
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java 957296
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java 957296
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java 957296
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java 957296
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java 957296
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java 957296
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 957296

          Diff: http://review.hbase.org/r/239/diff

          Testing
          -------

          Thanks,

          John

          Show
          HBase Review Board added a comment - Message from: "John Sichi" <jsichi@facebook.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/239/ ----------------------------------------------------------- Review request for Hive Developers. Summary ------- review by JVS (please ignore the previous one I created a few minutes ago with the bad patch by accident) This addresses bug HIVE-1229 . http://issues.apache.org/jira/browse/HIVE-1229 Diffs http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 957296 http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSplit.java 957296 http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 957296 http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 957296 http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java 957296 http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java 957296 http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java 957296 http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java 957296 http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java 957296 http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java 957296 http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 957296 Diff: http://review.hbase.org/r/239/diff Testing ------- Thanks, John
          Hide
          John Sichi added a comment -

          Hey Basab,

          See my review comments above. They are all minor except one: hbase_joins.q is currently failing:

          ant test -Dtestcase=TestHBaseCliDriver -Dqfile=hbase_joins.q

          I think it might have to do with the fact that you got rid of the column mapping from the splits; I think we need to keep this because in the presence of multiple tables, the parameter in the job conf changes. But I didn't look into it carefully.

          Thanks for cleaning up so much code in this patch; it is a big improvement overall.

          Show
          John Sichi added a comment - Hey Basab, See my review comments above. They are all minor except one: hbase_joins.q is currently failing: ant test -Dtestcase=TestHBaseCliDriver -Dqfile=hbase_joins.q I think it might have to do with the fact that you got rid of the column mapping from the splits; I think we need to keep this because in the presence of multiple tables, the parameter in the job conf changes. But I didn't look into it carefully. Thanks for cleaning up so much code in this patch; it is a big improvement overall.
          Hide
          HBase Review Board added a comment -

          Message from: "John Sichi" <jsichi@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/239/#review286
          -----------------------------------------------------------

          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
          <http://review.hbase.org/r/239/#comment1286>

          Can we avoid the per-row call to toBytes here by keeping around the bytes instead of the string for the column name?

          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSplit.java
          <http://review.hbase.org/r/239/#comment1287>

          I'm not sure we can get rid of the column mapping info from the split like this. This may be the reason why hbase_joins.q is failing.

          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java
          <http://review.hbase.org/r/239/#comment1285>

          Isn't the "implements OutputFormat" redundant since TableOutputFormat also implements OutputFormat?

          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java
          <http://review.hbase.org/r/239/#comment1284>

          Can we move this (and the Bytes.toBytes(columnFamily) below) to init? That way we don't have to pay the cost for every row.

          Also, if it doesn't end in colon, isn't something wrong?

          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
          <http://review.hbase.org/r/239/#comment1283>

          Normally I would agree that it is good to use List rather than ArrayList, but not in this case since it forces us to cast later on due to LazyStruct's explicit usage of ArrayList. So I think we should suppress the checkstyle warning in this case.

          • John
          Show
          HBase Review Board added a comment - Message from: "John Sichi" <jsichi@facebook.com> ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/239/#review286 ----------------------------------------------------------- http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java < http://review.hbase.org/r/239/#comment1286 > Can we avoid the per-row call to toBytes here by keeping around the bytes instead of the string for the column name? http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSplit.java < http://review.hbase.org/r/239/#comment1287 > I'm not sure we can get rid of the column mapping info from the split like this. This may be the reason why hbase_joins.q is failing. http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java < http://review.hbase.org/r/239/#comment1285 > Isn't the "implements OutputFormat" redundant since TableOutputFormat also implements OutputFormat? http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java < http://review.hbase.org/r/239/#comment1284 > Can we move this (and the Bytes.toBytes(columnFamily) below) to init? That way we don't have to pay the cost for every row. Also, if it doesn't end in colon, isn't something wrong? http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java < http://review.hbase.org/r/239/#comment1283 > Normally I would agree that it is good to use List rather than ArrayList, but not in this case since it forces us to cast later on due to LazyStruct's explicit usage of ArrayList. So I think we should suppress the checkstyle warning in this case. John
          Hide
          Basab Maulik added a comment -

          This patch fixes the issue which was causing hbase_joins.q tests to fail. I needed to update the HBase Scan instance properly.

          I have also made the changes as suggested on Review Board, where I will post the line item details.

          Thanks for the feedback/comments.

          ant test -Dtestcase=TestHBaseSerDe
          ant test -Dtestcase=TestLazyHBaseObject
          ant test -Dtestcase=TestHBaseCliDriver
          ant test -Dtestcase=TestHBaseMinimrCliDriver

          run successfully.

          Show
          Basab Maulik added a comment - This patch fixes the issue which was causing hbase_joins.q tests to fail. I needed to update the HBase Scan instance properly. I have also made the changes as suggested on Review Board, where I will post the line item details. Thanks for the feedback/comments. ant test -Dtestcase=TestHBaseSerDe ant test -Dtestcase=TestLazyHBaseObject ant test -Dtestcase=TestHBaseCliDriver ant test -Dtestcase=TestHBaseMinimrCliDriver run successfully.
          Hide
          HBase Review Board added a comment -

          Message from: bkm.hadoop@gmail.com

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/239/#review309
          -----------------------------------------------------------

          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
          <http://review.hbase.org/r/239/#comment1362>

          I have added a HashMap to HBaseSerDe to cache the String to byte [] mapping. The code looks it up for the serialization path. The map is also passed to LazyHBaseRow and LazyHBaseCellMap for lookup during deserialization. In addition I have moved some function calls to serdeParams and saved their return values as instance variables to reduce the per row calls.

          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSplit.java
          <http://review.hbase.org/r/239/#comment1363>

          I have left this in the 3rd patch. Thanks for explaining this. I don't think we have a test case which exposes this or even whether the serde instance and the record reader instance could get out of sync, but good idea to leave it in.

          The failing tests were due to an improperly initialized Scan instance.

          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java
          <http://review.hbase.org/r/239/#comment1366>

          This OutputFormat is from the deprecated mapred package - this is needed to keep it compatible with the storage handler which needs it to be compatible with this – see the getOutputFormat() method.

          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java
          <http://review.hbase.org/r/239/#comment1365>

          Actually, parse() should also be called only once. I have added the missing call to set parsed to true. In addition the cached values are now passed in from the serde to lazy row to the lazy cell map in the deserialization path.

          http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
          <http://review.hbase.org/r/239/#comment1364>

          Done, reverted these changes.

          • bkm
          Show
          HBase Review Board added a comment - Message from: bkm.hadoop@gmail.com ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/239/#review309 ----------------------------------------------------------- http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java < http://review.hbase.org/r/239/#comment1362 > I have added a HashMap to HBaseSerDe to cache the String to byte [] mapping. The code looks it up for the serialization path. The map is also passed to LazyHBaseRow and LazyHBaseCellMap for lookup during deserialization. In addition I have moved some function calls to serdeParams and saved their return values as instance variables to reduce the per row calls. http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSplit.java < http://review.hbase.org/r/239/#comment1363 > I have left this in the 3rd patch. Thanks for explaining this. I don't think we have a test case which exposes this or even whether the serde instance and the record reader instance could get out of sync, but good idea to leave it in. The failing tests were due to an improperly initialized Scan instance. http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java < http://review.hbase.org/r/239/#comment1366 > This OutputFormat is from the deprecated mapred package - this is needed to keep it compatible with the storage handler which needs it to be compatible with this – see the getOutputFormat() method. http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java < http://review.hbase.org/r/239/#comment1365 > Actually, parse() should also be called only once. I have added the missing call to set parsed to true. In addition the cached values are now passed in from the serde to lazy row to the lazy cell map in the deserialization path. http://svn.apache.org/repos/asf/hadoop/hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java < http://review.hbase.org/r/239/#comment1364 > Done, reverted these changes. bkm
          Hide
          John Sichi added a comment -

          Instead of cachedColumnNameBytes, is it possible to instead keep a separate array of the names in byte form? If we're doing all access positionally, that would allow us to skip the hash map lookups.

          Show
          John Sichi added a comment - Instead of cachedColumnNameBytes, is it possible to instead keep a separate array of the names in byte form? If we're doing all access positionally, that would allow us to skip the hash map lookups.
          Hide
          Basab Maulik added a comment -

          Update patch based on review feedback.

          Show
          Basab Maulik added a comment - Update patch based on review feedback.
          Hide
          John Sichi added a comment -

          +1. Will commit when tests pass.

          Show
          John Sichi added a comment - +1. Will commit when tests pass.
          Hide
          John Sichi added a comment -

          Committed. Thanks Basab!

          Show
          John Sichi added a comment - Committed. Thanks Basab!

            People

            • Assignee:
              Basab Maulik
              Reporter:
              John Sichi
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development