Hive / HIVE-1634

Allow access to Primitive types stored in binary format in HBase

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0, 0.8.0, 0.9.0
    • Fix Version/s: 0.9.0
    • Component/s: HBase Handler
    • Labels:
      None

      Description

      This addresses HIVE-1245 in part, for atomic or primitive types.

      The serde property "hbase.columns.storage.types" = ",b,b,b,b,b,b,b,b" specifies the storage option for each corresponding column in the serde property "hbase.columns.mapping". Allowed values are '' (or '-', as in the second example below) for the table default, 's' for standard string storage, and 'b' for binary storage as would be obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families use a colon-separated pair, such as 's:b', for the key and value specifiers respectively. See the test cases and queries for the HBase handler for additional examples.

      There is also a table property "hbase.table.default.storage.type" = "string" to specify a table-level default storage type. The other valid value is "binary". A column-level specification overrides the table-level default.
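      A minimal Java sketch of how the column-level spec and the table-level default could combine, assuming the rules above: an empty entry (or '-') takes the table default, 's' or 'b' overrides it, and a 'key:value' pair applies to a map column. The class and method names (StorageSpecSketch, resolve) are hypothetical; this is not the HBase handler's actual parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Hive's parser: expands a storage-types spec into
// one explicit entry per column ("s", "b", or "key:value" for map columns).
public final class StorageSpecSketch {

    // Maps the table-level property value to its one-letter code.
    static String defaultCode(String tableDefault) {
        switch (tableDefault) {
            case "string": return "s";
            case "binary": return "b";
            default: throw new IllegalArgumentException("unknown default storage type: " + tableDefault);
        }
    }

    // Resolves one specifier; an empty entry or "-" means "use the table default".
    static String resolveOne(String part, String dflt) {
        String p = part.trim();
        if (p.isEmpty() || p.equals("-")) return dflt;
        if (p.equals("s") || p.equals("b")) return p;
        throw new IllegalArgumentException("bad storage specifier: '" + part + "'");
    }

    static List<String> resolve(String spec, int columnCount, String tableDefault) {
        String dflt = defaultCode(tableDefault);
        // limit -1 keeps trailing empty entries, so "b,b," has three entries
        String[] entries = spec.split(",", -1);
        if (entries.length != columnCount) {
            throw new IllegalArgumentException(
                "expected " + columnCount + " entries, got " + entries.length);
        }
        List<String> out = new ArrayList<>();
        for (String e : entries) {
            int colon = e.indexOf(':');
            if (colon < 0) {
                out.add(resolveOne(e, dflt));
            } else {
                // Map column (HBase column family): key and value resolved separately.
                out.add(resolveOne(e.substring(0, colon), dflt) + ":"
                        + resolveOne(e.substring(colon + 1), dflt));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(resolve(",b,b,b,b,b,,b,b", 9, "string")); // [s, b, b, b, b, b, s, b, b]
        System.out.println(resolve(",s:b", 2, "binary"));            // [b, s:b]
    }
}
```

Resolving everything to explicit codes up front keeps the per-cell read path free of default-lookup logic; whether the real handler does this is not stated here.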

      This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types. The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key.
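      A Java sketch of the binary layout for these types, assuming Bytes.toBytes(...) writes fixed-width big-endian values, stores float and double through their raw IEEE 754 bits, and encodes a boolean as one byte (-1 for true, 0 for false). It uses only java.nio.ByteBuffer so it runs without HBase on the classpath; the encode/decode helper names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch assuming HBase Bytes-style encoding: fixed-width, big-endian
// (ByteBuffer's default byte order), floats/doubles via raw IEEE 754 bits.
public final class BinaryEncodingSketch {

    static byte[] encodeBoolean(boolean v) { return new byte[] { v ? (byte) -1 : (byte) 0 }; }
    static byte[] encodeByte(byte v)       { return new byte[] { v }; }
    static byte[] encodeShort(short v)     { return ByteBuffer.allocate(Short.BYTES).putShort(v).array(); }
    static byte[] encodeInt(int v)         { return ByteBuffer.allocate(Integer.BYTES).putInt(v).array(); }
    static byte[] encodeLong(long v)       { return ByteBuffer.allocate(Long.BYTES).putLong(v).array(); }
    static byte[] encodeFloat(float v)     { return encodeInt(Float.floatToRawIntBits(v)); }
    static byte[] encodeDouble(double v)   { return encodeLong(Double.doubleToRawLongBits(v)); }

    static boolean decodeBoolean(byte[] b) { return b[0] != 0; }
    static byte decodeByte(byte[] b)       { return b[0]; }
    static short decodeShort(byte[] b)     { return ByteBuffer.wrap(b).getShort(); }
    static int decodeInt(byte[] b)         { return ByteBuffer.wrap(b).getInt(); }
    static long decodeLong(byte[] b)       { return ByteBuffer.wrap(b).getLong(); }
    static float decodeFloat(byte[] b)     { return ByteBuffer.wrap(b).getFloat(); }
    static double decodeDouble(byte[] b)   { return ByteBuffer.wrap(b).getDouble(); }

    public static void main(String[] args) {
        byte[] cell = encodeInt(Integer.MIN_VALUE);
        System.out.println(Arrays.toString(cell)); // [-128, 0, 0, 0]
        System.out.println(decodeInt(cell));       // -2147483648
    }
}
```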

      Attached is a program for creating a table and populating it in HBase. The external table in Hive can access the data as shown in the example below.

      hive> create external table TestHiveHBaseExternalTable
      > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
      > c_int int, c_long bigint, c_string string, c_float float, c_double double)
      > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      > with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
      > tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
      OK
      Time taken: 0.691 seconds
      hive> select * from TestHiveHBaseExternalTable;
      OK
      key-1 NULL NULL NULL NULL NULL Test-String NULL NULL
      Time taken: 0.346 seconds
      hive> drop table TestHiveHBaseExternalTable;
      OK
      Time taken: 0.139 seconds
      hive> create external table TestHiveHBaseExternalTable
      > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
      > c_int int, c_long bigint, c_string string, c_float float, c_double double)
      > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      > with serdeproperties (
      > "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
      > "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" )
      > tblproperties (
      > "hbase.table.name" = "TestHiveHBaseExternalTable",
      > "hbase.table.default.storage.type" = "string");
      OK
      Time taken: 0.139 seconds
      hive> select * from TestHiveHBaseExternalTable;
      OK
      key-1 true -128 -32768 -2147483648 -9223372036854775808 Test-String -2.1793132E-11 2.01345E291
      Time taken: 0.151 seconds
      hive> drop table TestHiveHBaseExternalTable;
      OK
      Time taken: 0.154 seconds
      hive> create external table TestHiveHBaseExternalTable
      > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
      > c_int int, c_long bigint, c_string string, c_float float, c_double double)
      > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
      > with serdeproperties (
      > "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
      > "hbase.columns.storage.types" = ",b,b,b,b,b,,b,b" )
      > tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
      OK
      Time taken: 0.347 seconds
      hive> select * from TestHiveHBaseExternalTable;
      OK
      key-1 true -128 -32768 -2147483648 -9223372036854775808 Test-String -2.1793132E-11 2.01345E291
      Time taken: 0.245 seconds
      hive>
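      The NULLs returned by the first table are consistent with it falling back to string storage while the cells hold binary values, which do not parse as text. The Java sketch below is a simplified model of that behavior, not Hive's LazySerDe code: readAsString and readAsBinary are hypothetical names, and the rule "unparsable text becomes NULL" is an assumption about the string path. c_string is unaffected in all three runs, consistent with strings being outside the list of types this control applies to.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Simplified model (an assumption, not Hive's code): the same cell bytes read
// under string storage versus binary storage.
public final class StringVsBinarySketch {

    // String storage: treat bytes as UTF-8 decimal text; unparsable -> null (NULL).
    static Integer readAsString(byte[] cell) {
        try {
            return Integer.parseInt(new String(cell, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Binary storage: interpret exactly 4 bytes as a big-endian int.
    static Integer readAsBinary(byte[] cell) {
        return cell.length == Integer.BYTES ? ByteBuffer.wrap(cell).getInt() : null;
    }

    public static void main(String[] args) {
        byte[] binaryCell = ByteBuffer.allocate(Integer.BYTES).putInt(Integer.MIN_VALUE).array();
        byte[] textCell = "-2147483648".getBytes(StandardCharsets.UTF_8);

        System.out.println(readAsString(binaryCell)); // null
        System.out.println(readAsBinary(binaryCell)); // -2147483648
        System.out.println(readAsString(textCell));   // -2147483648
    }
}
```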

      Attachments

      1. ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.1.patch
        243 kB
        Phabricator
      2. ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.2.patch
        247 kB
        Phabricator
      3. ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.3.patch
        246 kB
        Phabricator
      4. hive-1634_3.patch
        247 kB
        Ashutosh Chauhan
      5. HIVE-1634.0.patch
        561 kB
        Basab Maulik
      6. HIVE-1634.1.patch
        249 kB
        John Sichi
      7. HIVE-1634.branch08.patch
        247 kB
        Alan Gates
      8. TestHiveHBaseExternalTable.java
        2 kB
        Basab Maulik


          Activity

          Nick Dimiduk added a comment -

          Hi Venki Korukanti. The parent ticket for HBase types is HBASE-8089. The groundwork has been laid on the HBase side by way of a DataType API and an order-preserving serialization format. The next step, as I see it, would be to implement HBASE-10091, so that there's a common description language that can be used to declare HBase types. I'd love your thoughts on that topic if you have some moments to spare.

          Ashutosh Chauhan added a comment -

          I don't think there is any JIRA for new types or complex types. At the time this work was done, only those primitive types were supported in Hive.
          However, any new work in this direction should take into account the type support being added to HBase. cc: Nick Dimiduk, who is leading the effort in HBase land.

          Venki Korukanti added a comment -

          From the description, it looks like binary storage support covers only a few primitive types.
          Quoting from description: "This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types"

          Is there any JIRA or requirement to support the rest of primitive types (like binary, timestamp, decimal) in binary storage format?

          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
          HIVE-2958 [jira] GROUP BY causing ClassCastException [LazyDioInteger cannot be
          cast LazyInteger]
          (Navis Ryu via Ashutosh Chauhan)

          Summary:
          DPAL-1111 GROUP BY causing ClassCastException [LazyDioInteger cannot be cast
          LazyInteger]

          This relates to https://issues.apache.org/jira/browse/HIVE-1634.

          The following queries work fine:

          CREATE EXTERNAL TABLE tim_hbase_occurrence (
          id int,
          scientific_name string,
          data_resource_id int
          ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
          SERDEPROPERTIES (
          "hbase.columns.mapping" = ":key#b,v:scientific_name#s,v:data_resource_id#b"
          ) TBLPROPERTIES(
          "hbase.table.name" = "mini_occurrences",
          "hbase.table.default.storage.type" = "binary"
          );
          SELECT * FROM tim_hbase_occurrence LIMIT 3;
          SELECT * FROM tim_hbase_occurrence WHERE data_resource_id=1081 LIMIT 3;

          However, the following fails:

          SELECT data_resource_id, count(*) FROM tim_hbase_occurrence GROUP BY data_resource_id;

          The error given:

          0 TS
          2012-04-17 16:58:45,693 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initialization Done 7 MAP
          2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias tim_hbase_occurrence for file hdfs://c1n2.gbif.org/user/hive/warehouse/tim_hbase_occurrence
          2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarding 1 rows
          2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
          2012-04-17 16:58:45,716 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 1 rows
          2012-04-17 16:58:45,723 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"id":1444,"scientific_name":null,"data_resource_id":1081}
          at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:548)
          at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
          at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
          at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
          at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:396)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
          at org.apache.hadoop.mapred.Child.main(Child.java:264)
          Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyInteger
          at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:737)
          at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
          at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
          at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
          at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
          at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
          at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
          at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
          at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
          at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
          ... 9 more
          Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyInteger
          at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector.copyObject(LazyIntObjectInspector.java:43)
          at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:239)
          at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
          at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
          at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
          at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:750)
          at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:722)
          ... 18 more
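          The innermost frames show the hash-aggregation path copying the GROUP BY key, ending in LazyIntObjectInspector.copyObject, where the cast to LazyInteger fails; the message itself shows that LazyDioInteger is not a subtype of LazyInteger. The plain SELECT queries above do not go through that key copy. The Java sketch below reproduces the shape of the failure with invented stand-ins (TextIntStub, BinaryIntStub and copyObject are hypothetical, not Hive's classes), and makes no claim about how HIVE-2958 fixed it.

```java
// Hypothetical stand-ins, not Hive's classes: two sibling subclasses of a
// common base, and a copy routine written only for the text-format one.
public final class SiblingCastSketch {

    abstract static class LazyPrimitiveStub { }

    static final class TextIntStub extends LazyPrimitiveStub {
        final int value;
        TextIntStub(int value) { this.value = value; }
    }

    static final class BinaryIntStub extends LazyPrimitiveStub {
        final int value;
        BinaryIntStub(int value) { this.value = value; }
    }

    // Same shape as an inspector's copyObject that casts to the text-format class.
    static Object copyObject(Object o) {
        TextIntStub t = (TextIntStub) o; // ClassCastException for a BinaryIntStub
        return new TextIntStub(t.value);
    }

    public static void main(String[] args) {
        copyObject(new TextIntStub(1081)); // succeeds
        try {
            copyObject(new BinaryIntStub(1081));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```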

          Test Plan: EMPTY

          Reviewers: JIRA, ashutoshc

          Reviewed By: ashutoshc

          Differential Revision: https://reviews.facebook.net/D2871 (Revision 1328157)
          HIVE-1634: Allow access to Primitive types stored in binary format in HBase (Basab Maulik, Ashutosh Chauhan via hashutosh) (Revision 1298673)

          Result = ABORTED
          hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1328157
          Files :

          • /hive/trunk/hbase-handler/src/test/queries/hbase_binary_external_table_queries.q
          • /hive/trunk/hbase-handler/src/test/results/hbase_binary_external_table_queries.q.out
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioBoolean.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioByte.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioDouble.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioFloat.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioInteger.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioLong.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioShort.java

          hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298673
          Files :

          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java
          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java
          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java
          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
          • /hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java
          • /hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java
          • /hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
          • /hive/trunk/hbase-handler/src/test/queries/hbase_binary_external_table_queries.q
          • /hive/trunk/hbase-handler/src/test/queries/hbase_binary_map_queries.q
          • /hive/trunk/hbase-handler/src/test/queries/hbase_binary_storage_queries.q
          • /hive/trunk/hbase-handler/src/test/results/hbase_binary_external_table_queries.q.out
          • /hive/trunk/hbase-handler/src/test/results/hbase_binary_map_queries.q.out
          • /hive/trunk/hbase-handler/src/test/results/hbase_binary_storage_queries.q.out
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyPrimitive.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioBoolean.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioByte.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioDouble.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioFloat.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioInteger.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioLong.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioShort.java
          Ashutosh Chauhan added a comment -

          This issue is closed now. It was released with the fix in 0.9.0. If there is a problem, please open a new jira and link this one with that.

          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #1384 (See https://builds.apache.org/job/Hive-trunk-h0.21/1384/)
          HIVE-2958 [jira] GROUP BY causing ClassCastException [LazyDioInteger cannot be
          cast LazyInteger]
          (Navis Ryu via Ashutosh Chauhan)

          Summary:
          DPAL-1111 GROUP BY causing ClassCastException [LazyDioInteger cannot be cast
          LazyInteger]

          This relates to https://issues.apache.org/jira/browse/HIVE-1634.

          The following queries work fine:

          CREATE EXTERNAL TABLE tim_hbase_occurrence (
          id int,
          scientific_name string,
          data_resource_id int
          ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
          SERDEPROPERTIES (
          "hbase.columns.mapping" = ":key#b,v:scientific_name#s,v:data_resource_id#b"
          ) TBLPROPERTIES(
          "hbase.table.name" = "mini_occurrences",
          "hbase.table.default.storage.type" = "binary"
          );
          SELECT * FROM tim_hbase_occurrence LIMIT 3;
          SELECT * FROM tim_hbase_occurrence WHERE data_resource_id=1081 LIMIT 3;

          However, the following fails:

          SELECT data_resource_id, count(*) FROM tim_hbase_occurrence GROUP BY data_resource_id;

          The error given:

          0 TS
          2012-04-17 16:58:45,693 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initialization Done 7 MAP
          2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias tim_hbase_occurrence for file hdfs://c1n2.gbif.org/user/hive/warehouse/tim_hbase_occurrence
          2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarding 1 rows
          2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
          2012-04-17 16:58:45,716 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 1 rows
          2012-04-17 16:58:45,723 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"id":1444,"scientific_name":null,"data_resource_id":1081}
          at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:548)
          at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
          at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
          at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
          at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:396)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
          at org.apache.hadoop.mapred.Child.main(Child.java:264)
          Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyInteger
          at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:737)
          at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
          at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
          at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
          at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
          at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
          at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
          at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
          at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
          at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
          ... 9 more
          Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyInteger
          at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector.copyObject(LazyIntObjectInspector.java:43)
          at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:239)
          at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
          at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
          at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
          at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:750)
          at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:722)
          ... 18 more

          Test Plan: EMPTY

          Reviewers: JIRA, ashutoshc

          Reviewed By: ashutoshc

          Differential Revision: https://reviews.facebook.net/D2871 (Revision 1328157)

          Result = FAILURE
          hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1328157
          Files :

          • /hive/trunk/hbase-handler/src/test/queries/hbase_binary_external_table_queries.q
          • /hive/trunk/hbase-handler/src/test/results/hbase_binary_external_table_queries.q.out
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioBoolean.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioByte.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioDouble.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioFloat.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioInteger.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioLong.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioShort.java
          Alan Gates added a comment -

          Version of the patch that applies to branch 0.8 after applying HIVE-2748 patch 3-1.

          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #1299 (See https://builds.apache.org/job/Hive-trunk-h0.21/1299/)
          HIVE-1634: Allow access to Primitive types stored in binary format in HBase (Basab Maulik, Ashutosh Chauhan via hashutosh) (Revision 1298673)

          Result = FAILURE
          hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298673
          Files :

          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java
          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java
          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java
          • /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
          • /hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java
          • /hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java
          • /hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
          • /hive/trunk/hbase-handler/src/test/queries/hbase_binary_external_table_queries.q
          • /hive/trunk/hbase-handler/src/test/queries/hbase_binary_map_queries.q
          • /hive/trunk/hbase-handler/src/test/queries/hbase_binary_storage_queries.q
          • /hive/trunk/hbase-handler/src/test/results/hbase_binary_external_table_queries.q.out
          • /hive/trunk/hbase-handler/src/test/results/hbase_binary_map_queries.q.out
          • /hive/trunk/hbase-handler/src/test/results/hbase_binary_storage_queries.q.out
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyPrimitive.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioBoolean.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioByte.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioDouble.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioFloat.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioInteger.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioLong.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioShort.java
          Ashutosh Chauhan added a comment -

          Committed to trunk. Thanks, Carl, for the review.

          Credit for this goes to Basab Maulik, who did the initial patch. I just rebased it and addressed the reviewers' comments. Thanks, Basab!

          Carl Steinbach added a comment -

          +1.

          @Ashutosh: can you please commit this? Thanks.

          Ashutosh Chauhan added a comment -

          All tests passed with the latest patch on the latest trunk.

          BUILD SUCCESSFUL
          Total time: 323 minutes 29 seconds
          
          Ashutosh Chauhan added a comment -

          Patch Available

          Ashutosh Chauhan added a comment -

          Patch with ASF perms.

          Phabricator added a comment -

          ashutoshc updated the revision "HIVE-1634 [jira] Allow access to Primitive types stored in binary format in HBase".
          Reviewers: JIRA, jsichi, cwsteinbach

          Rebased to trunk.

          REVISION DETAIL
          https://reviews.facebook.net/D1581

          AFFECTED FILES
          hbase-handler/src/test/results/hbase_binary_external_table_queries.q.out
          hbase-handler/src/test/results/hbase_binary_map_queries.q.out
          hbase-handler/src/test/results/hbase_binary_storage_queries.q.out
          hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java
          hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java
          hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
          hbase-handler/src/test/queries/hbase_binary_map_queries.q
          hbase-handler/src/test/queries/hbase_binary_storage_queries.q
          hbase-handler/src/test/queries/hbase_binary_external_table_queries.q
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyPrimitive.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioByte.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioFloat.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioDouble.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioInteger.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioBoolean.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioLong.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioShort.java

          Phabricator added a comment -

          cwsteinbach has requested changes to the revision "HIVE-1634 [jira] Allow access to Primitive types stored in binary format in HBase".

          Patch needs to be rebased against trunk.

          REVISION DETAIL
          https://reviews.facebook.net/D1581

          BRANCH
          svn

          Phabricator added a comment -

          cwsteinbach has accepted the revision "HIVE-1634 [jira] Allow access to Primitive types stored in binary format in HBase".

          Will commit if tests pass.

          REVISION DETAIL
          https://reviews.facebook.net/D1581

          BRANCH
          svn

          Phabricator added a comment -

          ashutoshc updated the revision "HIVE-1634 [jira] Allow access to Primitive types stored in binary format in HBase".
          Reviewers: JIRA, jsichi

          Fixed a couple of bugs in the test cases; all tests now pass.
          Refactored to move the newly introduced classes into their own package, serde2.lazydio, and renamed them LazyDioInteger and so on.

          REVISION DETAIL
          https://reviews.facebook.net/D1581

          AFFECTED FILES
          hbase-handler/src/test/results/hbase_binary_external_table_queries.q.out
          hbase-handler/src/test/results/hbase_binary_map_queries.q.out
          hbase-handler/src/test/results/hbase_binary_storage_queries.q.out
          hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java
          hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java
          hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
          hbase-handler/src/test/queries/hbase_binary_map_queries.q
          hbase-handler/src/test/queries/hbase_binary_storage_queries.q
          hbase-handler/src/test/queries/hbase_binary_external_table_queries.q
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyPrimitive.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioByte.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioFloat.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioDouble.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioInteger.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioBoolean.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioLong.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioShort.java

          Ashutosh Chauhan added a comment -

          This patch is ready for review.

          Phabricator added a comment -

          ashutoshc requested code review of "HIVE-1634 [jira] Allow access to Primitive types stored in binary format in HBase".
          Reviewers: JIRA

          https://issues.apache.org/jira/browse/HIVE-1634

          Rebased the patch onto trunk. This patch adds binary storage support for HBase tables. This means that existing HBase tables (that is, those not written through Hive) can now be queried using the HBase handler. Without this patch, you can only read HBase tables that were written through Hive.

          Test Plan:
          Three new .q files containing the new tests:
          hbase_binary_external_table_queries.q
          hbase_binary_map_queries.q
          hbase_binary_storage_queries.q

          This addresses HIVE-1245 in part, for atomic or primitive types.

          The serde property "hbase.columns.storage.types" = ",b,b,b,b,b,b,b,b" specifies the storage option for each corresponding column in the serde property "hbase.columns.mapping". Allowed values are '' for the table default, 's' for standard string storage, and 'b' for binary storage as produced by o.a.h.hbase.util.Bytes. Map types for HBase column families use a colon-separated pair, such as 's:b', to specify the key and value parts respectively. See the HBase handler test cases and queries for additional examples.
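          To make the 'b' option concrete, here is a minimal Java sketch of what binary-stored primitives look like inside a cell, compared with 's' (string) storage. It assumes the binary layout matches HBase's Bytes.toBytes(...) overloads: fixed width, big-endian, and raw IEEE 754 bits for float and double. It uses java.nio.ByteBuffer (big-endian by default) so that it runs without HBase on the classpath. The class and method names are hypothetical, and the 0xFF encoding for true is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of 'b' (binary) vs 's' (string) cell contents for Hive primitive types.
 * Assumes the binary layout matches HBase's Bytes.toBytes(...) overloads:
 * fixed width, big-endian, IEEE 754 bit patterns for float/double.
 * ByteBuffer is big-endian by default, so no HBase dependency is needed.
 */
public class BinaryCellCodec {

    // boolean: one byte. Writing 0xFF for true is assumed to mirror Bytes.toBytes(boolean);
    // decoding treats any non-zero byte as true.
    static byte[] encodeBoolean(boolean v) { return new byte[] { v ? (byte) -1 : (byte) 0 }; }
    static boolean decodeBoolean(byte[] b) { return b[0] != 0; }

    // tinyint: the byte itself.
    static byte[] encodeByte(byte v) { return new byte[] { v }; }
    static byte decodeByte(byte[] b) { return b[0]; }

    // smallint / int / bigint: 2, 4 and 8 bytes, big-endian.
    static byte[] encodeShort(short v) { return ByteBuffer.allocate(Short.BYTES).putShort(v).array(); }
    static short decodeShort(byte[] b) { return ByteBuffer.wrap(b).getShort(); }
    static byte[] encodeInt(int v) { return ByteBuffer.allocate(Integer.BYTES).putInt(v).array(); }
    static int decodeInt(byte[] b) { return ByteBuffer.wrap(b).getInt(); }
    static byte[] encodeLong(long v) { return ByteBuffer.allocate(Long.BYTES).putLong(v).array(); }
    static long decodeLong(byte[] b) { return ByteBuffer.wrap(b).getLong(); }

    // float / double: raw IEEE 754 bits, big-endian.
    static byte[] encodeFloat(float v) { return ByteBuffer.allocate(Float.BYTES).putFloat(v).array(); }
    static float decodeFloat(byte[] b) { return ByteBuffer.wrap(b).getFloat(); }
    static byte[] encodeDouble(double v) { return ByteBuffer.allocate(Double.BYTES).putDouble(v).array(); }
    static double decodeDouble(byte[] b) { return ByteBuffer.wrap(b).getDouble(); }

    // 's' storage for comparison: the value's decimal text as UTF-8.
    static byte[] encodeIntAsString(int v) { return Integer.toString(v).getBytes(StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        byte[] bin = encodeInt(Integer.MIN_VALUE);
        byte[] txt = encodeIntAsString(Integer.MIN_VALUE);
        System.out.println(bin.length + " bytes binary, " + txt.length + " bytes as text"); // 4 bytes binary, 11 bytes as text
        System.out.println(decodeInt(bin));                            // -2147483648
        System.out.println(decodeShort(encodeShort(Short.MIN_VALUE))); // -32768
    }
}
```

          The point of the comparison is that a binary int always takes exactly 4 bytes, whereas its string form varies in length. This is why a reader must be told which encoding a cell uses.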

          There is also a table property, "hbase.table.default.storage.type" = "string", to specify a table-level default storage type; the other valid value is "binary". A column-level specification overrides the table-level default.
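          The precedence rule (the column spec first, then the table-level default) can be sketched as follows. StorageSpecResolver and its methods are hypothetical and are not Hive's own classes. The sketch also accepts '-' as "use the table default". That is an assumption drawn from the example session further down, not from the description.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical resolver (not Hive's implementation) for per-column storage types,
 * combining "hbase.columns.storage.types" with "hbase.table.default.storage.type".
 */
public class StorageSpecResolver {

    enum Storage { STRING, BINARY }

    // One single-letter spec: '' (or '-', an assumption) means "use the table default".
    static Storage resolveOne(String spec, Storage tableDefault) {
        switch (spec.trim()) {
            case "":
            case "-":
                return tableDefault;
            case "s":
                return Storage.STRING;
            case "b":
                return Storage.BINARY;
            default:
                throw new IllegalArgumentException("Unknown storage spec: '" + spec + "'");
        }
    }

    /** One entry per mapped column; a map column spec such as "s:b" yields {key, value}. */
    static List<Storage[]> resolve(String columnSpecs, String tableDefaultProp) {
        // A missing table property falls back to string storage.
        Storage tableDefault = "binary".equalsIgnoreCase(tableDefaultProp) ? Storage.BINARY : Storage.STRING;
        List<Storage[]> out = new ArrayList<>();
        // limit -1 keeps trailing empty entries, so ",b," yields three specs rather than two
        for (String spec : columnSpecs.split(",", -1)) {
            String[] parts = spec.split(":", -1);
            Storage[] resolved = new Storage[parts.length];
            for (int i = 0; i < parts.length; i++) {
                resolved[i] = resolveOne(parts[i], tableDefault);
            }
            out.add(resolved);
        }
        return out;
    }

    public static void main(String[] args) {
        for (Storage[] column : resolve(",b,b,b,b,b,,b,b", "string")) {
            System.out.print(column[0] + " ");
        }
        System.out.println();
    }
}
```

          Applied to ",b,b,b,b,b,,b,b" with a "string" default, the sketch resolves the key and c_string columns to string storage and the rest to binary. The limit of -1 on split matters here: a plain split(",") would silently drop a trailing empty spec.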

          This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types. The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key.
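          Because map keys are no longer limited to strings, a column family whose qualifiers were written as binary ints could be exposed as a map<int,string> using a 'b:s' spec. The sketch below illustrates that decoding step. It models the family as qualifier/value byte arrays, with no HBase client involved. The class name, the 4-byte big-endian key layout, and the choice to skip qualifiers that are not exactly 4 bytes are all assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch: turning one HBase column family into a Hive-style map<int,string> under a "b:s" spec.
 * Cells are modeled as qualifier -> value byte arrays; no HBase client is used.
 * Assumption: qualifiers are 4-byte big-endian ints, values are UTF-8 text.
 */
public class BinaryMapKeySketch {

    // Encode an int qualifier the way this sketch assumes the writer did (big-endian, 4 bytes).
    static byte[] intQualifier(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    static Map<Integer, String> decodeFamily(Map<byte[], byte[]> cells) {
        // TreeMap only to give a deterministic, key-ordered result.
        Map<Integer, String> row = new TreeMap<>();
        for (Map.Entry<byte[], byte[]> cell : cells.entrySet()) {
            byte[] qualifier = cell.getKey();
            if (qualifier.length != Integer.BYTES) {
                continue; // hypothetical policy: skip qualifiers that cannot be an int key
            }
            int key = ByteBuffer.wrap(qualifier).getInt();                        // 'b' key part
            String value = new String(cell.getValue(), StandardCharsets.UTF_8);  // 's' value part
            row.put(key, value);
        }
        return row;
    }

    public static void main(String[] args) {
        // byte[] keys use identity equality, which is fine here because we only iterate.
        Map<byte[], byte[]> family = new LinkedHashMap<>();
        family.put(intQualifier(42), "answer".getBytes(StandardCharsets.UTF_8));
        family.put(intQualifier(-7), "negative".getBytes(StandardCharsets.UTF_8));
        family.put("bad".getBytes(StandardCharsets.UTF_8), "ignored".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeFamily(family)); // {-7=negative, 42=answer}
    }
}
```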

          Attached is a program for creating a table and populating it in HBase. The external table in Hive can access the data as shown in the example below.

          hive> create external table TestHiveHBaseExternalTable
          > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
          > c_int int, c_long bigint, c_string string, c_float float, c_double double)
          > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
          > with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
          > tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
          OK
          Time taken: 0.691 seconds
          hive> select * from TestHiveHBaseExternalTable;
          OK
          key-1 NULL NULL NULL NULL NULL Test-String NULL NULL
          Time taken: 0.346 seconds
          hive> drop table TestHiveHBaseExternalTable;
          OK
          Time taken: 0.139 seconds
          hive> create external table TestHiveHBaseExternalTable
          > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
          > c_int int, c_long bigint, c_string string, c_float float, c_double double)
          > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
          > with serdeproperties (
          > "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
          > "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" )
          > tblproperties (
          > "hbase.table.name" = "TestHiveHBaseExternalTable",
          > "hbase.table.default.storage.type" = "string");
          OK
          Time taken: 0.139 seconds
          hive> select * from TestHiveHBaseExternalTable;
          OK
          key-1 true -128 -32768 -2147483648 -9223372036854775808 Test-String -2.1793132E-11 2.01345E291
          Time taken: 0.151 seconds
          hive> drop table TestHiveHBaseExternalTable;
          OK
          Time taken: 0.154 seconds
          hive> create external table TestHiveHBaseExternalTable
          > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
          > c_int int, c_long bigint, c_string string, c_float float, c_double double)
          > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
          > with serdeproperties (
          > "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
          > "hbase.columns.storage.types" = ",b,b,b,b,b,,b,b" )
          > tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
          OK
          Time taken: 0.347 seconds
          hive> select * from TestHiveHBaseExternalTable;
          OK
          key-1 true -128 -32768 -2147483648 -9223372036854775808 Test-String -2.1793132E-11 2.01345E291
          Time taken: 0.245 seconds
          hive>
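          The NULLs in the first query follow from a storage mismatch. The program wrote binary values, but with no storage spec the table defaults to string storage, so the raw bytes are parsed as decimal text. The sketch below reproduces this for a single int cell. Mapping a parse failure to Java null, as a stand-in for the NULL that Hive prints, is a simplification; this is not the LazyInteger code. By contrast, c_string reads correctly in every variant, which is consistent with the binary control applying only to boolean through double.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Sketch: the same binary int cell read with 's' vs 'b' storage. */
public class StringVsBinaryRead {

    // 's' storage: interpret the bytes as decimal text; a parse failure becomes null.
    static Integer readAsString(byte[] cell) {
        try {
            return Integer.parseInt(new String(cell, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // 'b' storage: interpret the bytes as a 4-byte big-endian int.
    static Integer readAsBinary(byte[] cell) {
        if (cell.length != Integer.BYTES) {
            return null;
        }
        return ByteBuffer.wrap(cell).getInt();
    }

    public static void main(String[] args) {
        byte[] cell = ByteBuffer.allocate(Integer.BYTES).putInt(Integer.MIN_VALUE).array();
        System.out.println("as string: " + readAsString(cell)); // as string: null
        System.out.println("as binary: " + readAsBinary(cell)); // as binary: -2147483648
    }
}
```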

          TEST PLAN
          EMPTY

          REVISION DETAIL
          https://reviews.facebook.net/D1581

          AFFECTED FILES
          hbase-handler/src/test/results/hbase_binary_external_table_queries.q.out
          hbase-handler/src/test/results/hbase_binary_map_queries.q.out
          hbase-handler/src/test/results/hbase_binary_storage_queries.q.out
          hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java
          hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java
          hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
          hbase-handler/src/test/queries/hbase_binary_map_queries.q
          hbase-handler/src/test/queries/hbase_binary_storage_queries.q
          hbase-handler/src/test/queries/hbase_binary_external_table_queries.q
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
          hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyIntegerBinary.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyShortBinary.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyLongBinary.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyByteBinary.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFloatBinary.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyDoubleBinary.java
          serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyBooleanBinary.java

          MANAGE HERALD DIFFERENTIAL RULES
          https://reviews.facebook.net/herald/view/differential/

          WHY DID I GET THIS EMAIL?
          https://reviews.facebook.net/herald/transcript/3321/

          Tip: use the X-Herald-Rules header to filter Herald messages in your client.

          John Sichi added a comment -

          This issue is not resolved; the patch has not been completed.

          Developer EA added a comment -

          It's unclear from the last comment whether the issue is resolved or not. I tried with hive-hbase-handler-0.7.1.jar, but the issue still exists. Can you suggest a workaround for this?

          John Sichi added a comment -

          Not sure what happened to the last patch I was reviewing, but here it is again.

          John Sichi added a comment -

          But looking into it further, is it true that the only difference in persistence format is for Long and Integer (due to the zero-compression)? Or are any of the other formats different as well? If it's only these, then adding a whole new set of classes seems like a bad idea, and we should instead do any necessary refactoring now to allow the existing binary classes to be used (and add a couple of new ones for uncompressed int/long).

          Since we eventually want to be able to store map/struct/list as well (the rest of HIVE-1245), it's worth looking into the refactoring now, because the existing lazybinary classes cover those too (and we don't want to duplicate that).

          John Sichi added a comment -

          OK, I finally got some time to look into the Lazy* classes. I see what you mean about the class hierarchy, and I agree that we can leave any refactoring of the existing classes for a followup patch. Also, I was wrong to think that we could reuse the existing binary classes, since they do things such as VInt zero-compression, and that's incompatible with the HBase Bytes format.
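          To make the incompatibility concrete, the sketch below contrasts a fixed-width, big-endian int (the 4-byte layout we understand HBase's Bytes.toBytes(int) to produce) with a variable-length encoding. The vlong method is a standalone re-implementation modeled on Hadoop's WritableUtils.writeVLong, not code copied from Hive's lazybinary package, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

public class EncodingComparison {

    // Fixed-width, big-endian int: always 4 bytes, the layout we understand
    // HBase's Bytes.toBytes(int) to produce.
    static byte[] fixedWidthInt(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    // Variable-length encoding modeled on Hadoop's WritableUtils.writeVLong:
    // values in [-112, 127] take one byte; others get a length/sign marker
    // byte followed by only as many payload bytes as needed.
    static byte[] vlong(long i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (i >= -112 && i <= 127) {
            out.write((byte) i);
            return out.toByteArray();
        }
        int len = -112;
        if (i < 0) {
            i ^= -1L; // one's complement, so the payload is non-negative
            len = -120;
        }
        for (long tmp = i; tmp != 0; tmp >>= 8) {
            len--; // one step per payload byte
        }
        out.write((byte) len);
        int n = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = n; idx != 0; idx--) {
            int shift = (idx - 1) * 8;
            out.write((byte) ((i >> shift) & 0xFF)); // most significant byte first
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (int v : new int[] {1, 128, Integer.MIN_VALUE}) {
            System.out.printf("%d: fixed=%d bytes, variable=%d bytes%n",
                    v, fixedWidthInt(v).length, vlong(v).length);
        }
    }
}
```

          The value 1 takes one byte in the variable-length form but four in the fixed-width form, so a reader expecting one layout cannot decode cells written in the other.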

          However, for this patch, I want to at least get the new classes into their final destination with respect to package name and class name (so that we don't have to move them later, even if we adjust their inheritance). To this end, I suggest a new package serde2.lazydio, and name the classes on the pattern LazyDioInteger. The "Dio" is to indicate DataInput/DataOutput format. (I was thinking of lazybytes and LazyBytesInteger, to indicate HBase Bytes format, but then I saw that Byte is also one of the datatypes, and LazyBytesByte would be puzzling.)

          Having both LazyIntegerBinary and LazyBinaryInteger, as in the current patch, would just be too confusing.

          Also, regarding the implementation of the new classes, most of the init method code is duplicated from class to class. The only thing specific to each class is the actual read+set. Should we factor out a LazyDioObject (similar to the existing pattern for LazyObject and LazyBinaryObject)? Likewise for LazyDioPrimitive and LazyDioNonPrimitive.
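          The factoring suggested above might look roughly like this: a base class owns the shared init logic, and each concrete type supplies only its DataInput read. All class names here are hypothetical, and init(byte[], int, int) is a simplification; as far as we know the existing Lazy* classes are initialized from a ByteArrayRef and typed through object inspectors. Treating a failed read (for example, a cell too short for the type) as NULL is an assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical shared base: holds the init() logic that is otherwise
// duplicated from class to class.
abstract class LazyDioPrimitiveSketch<T> {
    private T value;
    private boolean isNull = true;

    public final void init(byte[] bytes, int start, int length) {
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(bytes, start, length))) {
            value = read(in);
            isNull = false;
        } catch (IOException e) {
            // e.g. EOFException when the cell is shorter than the type needs
            value = null;
            isNull = true;
        }
    }

    // The only type-specific part: read one value in DataInput format.
    protected abstract T read(DataInputStream in) throws IOException;

    public T getValue() { return value; }
    public boolean isNull() { return isNull; }
}

class LazyDioIntegerSketch extends LazyDioPrimitiveSketch<Integer> {
    @Override
    protected Integer read(DataInputStream in) throws IOException {
        return in.readInt();
    }
}

class LazyDioDoubleSketch extends LazyDioPrimitiveSketch<Double> {
    @Override
    protected Double read(DataInputStream in) throws IOException {
        return in.readDouble();
    }
}

public class LazyDioDemo {
    public static void main(String[] args) {
        LazyDioIntegerSketch cell = new LazyDioIntegerSketch();
        cell.init(new byte[] {0, 0, 0, 42}, 0, 4);
        System.out.println(cell.getValue() + " null=" + cell.isNull());
        cell.init(new byte[] {0, 0}, 0, 2); // too short for an int
        System.out.println(cell.getValue() + " null=" + cell.isNull());
    }
}
```

          Adding another type then costs one small subclass; the same template-method shape could carry over to a non-primitive base class.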

          I will ask some others to chime in on this as well.

          John Sichi added a comment -

          Thanks Basab, I'm going to try to take a look at this one next week.

          Basab Maulik added a comment -

          Re: Beyond the review comments I added, I do have some higher-level suggestions:

          • For the column mapping, the reason I suggested "a:b:string" in the original JIRA description is that it's a pain to keep everything lined up by column position. It's already less than ideal that we do the column name mapping by position, so I don't think we should make it worse by having a separate property for type. Using the s/b shorthand is fine, and if you think that we shouldn't overload the colon, we can use a different separator, e.g. "cf:cq#s". Since the existing property name is hbase.columns.mapping, I don't think it will be confusing to roll in the (optional) type info as well.

          I have adopted your suggestion of '#' as the separator to the storage information and use 'hbase.columns.mapping' to carry the additional storage information optionally. I have made a small change to allow any prefix of 'string' or of 'binary' to be valid, i.e. s/b or str/bin or string/binary etc.

          • I'm wondering whether we can just use the existing classes like LazyBinaryByte in package org.apache.hadoop.hive.serde2.lazybinary instead of creating new ones. Or are these not compatible with hbase.util.Bytes?

          I think the incompatibility stems more from trying to stay within the serde2.lazy.Lazy* family of objects, which HBaseSerDe, LazyHBaseRow, and LazyHBaseCellMap extend or depend on. It would be useful to make these two families of classes compatible (inherit from a common base class). Small differences in the object inspector classes that type-parameterize these classes further complicate getting past the type system. It should be doable, but perhaps as a separate patch?

          • For the tests, I noticed that you have attached TestHiveHBaseExternalTable. I think it would be a good idea if you can create and populate such a fixture table in HBaseTestSetup; that way it can be available (treated as read-only) to all of the HBase .q tests. Otherwise, it's hard to verify that we're compatible with a table created directly through HBase APIs rather than Hive.

          Done. Added tests to create a Hive external table associated with this HBase table and test queries.

          • Also for the tests, it would be good if you can filter it down to only a small number of representative rows when pulling the initial test data set from the Hive src table. That way, we can keep the .q.out files smaller.

          Done, the .out files are a lot smaller than in the initial patch.

          • Once we get this one committed, be sure to update the wiki.

          Will do once this is committed.

          HBase Review Board added a comment -

          Message from: bkm.hadoop@gmail.com

          On 2010-09-16 13:28:48, John Sichi wrote:

          > trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 499

          > <http://review.cloudera.org/r/826/diff/1/?file=11523#file11523line499>

          >

          > Doesn't this error message need to change?

          Updated the comment to "' should be mapped to Map<? extends LazyPrimitive<?, ?>,?>, that is " + "the Key for the map should be of primitive type, but is ... "

          On 2010-09-16 13:28:48, John Sichi wrote:

          > trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 623

          > <http://review.cloudera.org/r/826/diff/1/?file=11523#file11523line623>

          >

          > I don't understand these TODO's.

          Removed/updated comment.

          On 2010-09-16 13:28:48, John Sichi wrote:

          > trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 76

          > <http://review.cloudera.org/r/826/diff/1/?file=11523#file11523line76>

          >

          > We keep adding new List data members. Probably time to move to a single List<ColumnMapping>, with a new class ColumnMapping with fields for familyName, familyNameBytes, qualifierName, qualifierNameBytes, familyBinary, qualifierBinary. That will be a lot cleaner and also allow you to avoid the boolean [] here, which is a little clumsy.

          I have changed the code to use List<ColumnMapping> with the fields of interest as members of this data class.
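          A minimal sketch of such a data class, using the field names from the review comment. The constructor, the mapsWholeFamily helper, and the use of UTF-8 for the name bytes (which matches HBase's Bytes.toBytes(String), as far as we know) are assumptions; the review does not spell out how the two boolean flags map onto the key/value specifiers of a map column.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed data class: one object per Hive column instead of
// several parallel lists indexed by column position.
public class ColumnMapping {
    final String familyName;
    final byte[] familyNameBytes;
    final String qualifierName;      // null when the column maps a whole family
    final byte[] qualifierNameBytes; // null in that case too
    // Storage flags, named as in the review comment.
    final boolean familyBinary;
    final boolean qualifierBinary;

    ColumnMapping(String familyName, String qualifierName,
                  boolean familyBinary, boolean qualifierBinary) {
        this.familyName = familyName;
        this.familyNameBytes = familyName.getBytes(StandardCharsets.UTF_8);
        this.qualifierName = qualifierName;
        this.qualifierNameBytes = (qualifierName == null)
                ? null : qualifierName.getBytes(StandardCharsets.UTF_8);
        this.familyBinary = familyBinary;
        this.qualifierBinary = qualifierBinary;
    }

    boolean mapsWholeFamily() {
        return qualifierName == null;
    }

    public static void main(String[] args) {
        List<ColumnMapping> mappings = new ArrayList<>();
        mappings.add(new ColumnMapping("cf", "int", false, true)); // cf:int
        mappings.add(new ColumnMapping("cf", null, false, true));  // cf: (map)
        for (ColumnMapping m : mappings) {
            System.out.println(m.familyName + ":"
                    + (m.mapsWholeFamily() ? "<map>" : m.qualifierName)
                    + " qualifierBinary=" + m.qualifierBinary);
        }
    }
}
```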

          On 2010-09-16 13:28:48, John Sichi wrote:

          > trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java, line 480

          > <http://review.cloudera.org/r/826/diff/1/?file=11526#file11526line480>

          >

          > Why is this assertion commented out?

          I have removed this test; the .q files give us coverage for this case. It was failing due to small differences between the byte arrays produced by DataOutputStream/DataInputStream and those produced by o.a.h.hbase.util.Bytes.

          • bkm

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/826/#review1247
          -----------------------------------------------------------
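          The reply above attributes the removed assertion's failure to small byte-level differences between DataOutputStream and Bytes. One plausible example is booleans: DataOutput.writeBoolean writes 0x01 for true, while, as far as we know, HBase's Bytes.toBytes(boolean) writes 0xFF. The sketch hard-codes that assumed HBase encoding (it does not call HBase) and shows the arrays differ even though DataInputStream.readBoolean decodes both as true. The thread does not say whether this is the exact difference that broke the test.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class BooleanEncoding {

    // java.io encoding: writeBoolean(true) emits a single 0x01 byte.
    static byte[] dioBoolean(boolean b) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeBoolean(b);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bos.toByteArray();
    }

    // ASSUMPTION: mirrors HBase's Bytes.toBytes(boolean), which we believe
    // encodes true as 0xFF (-1) rather than 0x01.
    static byte[] hbaseStyleBoolean(boolean b) {
        return new byte[] {b ? (byte) -1 : (byte) 0};
    }

    // readBoolean() treats any non-zero byte as true, so decoding is lenient.
    static boolean decode(byte[] bytes) {
        try {
            return new DataInputStream(new ByteArrayInputStream(bytes)).readBoolean();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] a = dioBoolean(true);
        byte[] b = hbaseStyleBoolean(true);
        System.out.println(Arrays.toString(a) + " vs " + Arrays.toString(b));
        System.out.println("equal bytes: " + Arrays.equals(a, b));
        System.out.println("both decode true: " + (decode(a) && decode(b)));
    }
}
```

          A byte-for-byte assertion fails on such a pair while a round-trip through the reader passes, which is consistent with relying on the .q files for coverage.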

          HBase Review Board added a comment -

          Message from: bkm.hadoop@gmail.com

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/826/
          -----------------------------------------------------------

          (Updated 2010-10-21 20:11:06.837430)

          Review request for Hive Developers and John Sichi.

          Changes
          -------

          The proposed serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" as a specification of the storage option for the corresponding column in the serde property "hbase.columns.mapping" has been removed. Instead the storage option is an optional part of the "hbase.columns.mapping" and is specified for a column using '#' as a separator following the column family/qualifier. Allowed values are '' for table default, a prefix of 'string' for standard string storage, and a prefix of 'binary' for binary storage as would be obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families use a colon separated pair such as 'str:bin' or 's:b' for the key and value part specifiers respectively.
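          A rough parser for the revised syntax, written only to illustrate the rules stated above: split entries on ',', peel off an optional '#' suffix, split family from qualifier on the first ':', and accept any non-empty prefix of "string" or "binary" (with a key:value pair for map columns). The class and method names are hypothetical, and accepting "-" as "table default" is an assumption based on the "-" used in earlier examples in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class MappingSpecParser {

    // One parsed "hbase.columns.mapping" entry (hypothetical helper type).
    static final class Entry {
        final String family;    // "" for the row key (":key")
        final String qualifier; // "" when a whole family is mapped ("cf:")
        final String storage;   // text after '#', or "" if none was given

        Entry(String family, String qualifier, String storage) {
            this.family = family;
            this.qualifier = qualifier;
            this.storage = storage;
        }
    }

    // Any non-empty prefix of "string" or "binary": s/str/string, b/bin/binary.
    static boolean isValidStorageToken(String token) {
        return !token.isEmpty()
                && ("string".startsWith(token) || "binary".startsWith(token));
    }

    static List<Entry> parse(String mapping) {
        List<Entry> entries = new ArrayList<>();
        for (String raw : mapping.split(",")) {
            String col = raw.trim();
            String storage = "";
            int hash = col.indexOf('#');
            if (hash >= 0) { // optional storage suffix
                storage = col.substring(hash + 1);
                col = col.substring(0, hash);
            }
            int colon = col.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected family:qualifier in " + raw);
            }
            // Map columns carry a "key:value" pair such as "s:b".
            // Empty (and, by assumption, "-") means "use the table default".
            for (String token : storage.split(":", -1)) {
                if (!token.isEmpty() && !token.equals("-") && !isValidStorageToken(token)) {
                    throw new IllegalArgumentException("bad storage type '" + token + "' in " + raw);
                }
            }
            entries.add(new Entry(col.substring(0, colon), col.substring(colon + 1), storage));
        }
        return entries;
    }

    public static void main(String[] args) {
        for (Entry e : parse(":key,cf:int#b,cf:string#s,cf:#s:bin")) {
            System.out.println("family=" + e.family + " qualifier=" + e.qualifier
                    + " storage=" + e.storage);
        }
    }
}
```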

          The tests TestHBaseSerDe, TestLazyHBaseObject, TestHBaseCliDriver, and TestHBaseMinimrCliDriver pass.

          Summary
          -------

          This addresses HIVE-1245 in part, for atomic or primitive types.

          The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a specification of the storage option for the corresponding column in the serde property "hbase.columns.mapping". Allowed values are '' for table default, 's' for standard string storage, and 'b' for binary storage as would be obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families use a colon separated pair such as 's:b' for the key and value part specifiers respectively. See the test cases and queries for HBase handler for additional examples.

          There is also a table property "hbase.table.default.storage.type" = "string" to specify a table level default storage type. The other valid specification is "binary". The table level default is overridden by a column level specification.
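          The precedence rule above (a column-level specification overrides the table-level default) can be sketched as a small resolver. The names are hypothetical; treating an unset table property as "string" and accepting "-" as "no column spec" are assumptions, not statements from the patch.

```java
import java.util.Properties;

public class StorageTypeResolver {

    static final String TABLE_DEFAULT = "hbase.table.default.storage.type";

    // Decides whether one column is stored in binary form.
    // columnSpec: the column-level specifier, "" when none was given.
    static boolean isBinary(String columnSpec, Properties tblProps) {
        if (columnSpec.isEmpty() || columnSpec.equals("-")) {
            // No column-level spec, so the table-level default applies.
            // Falling back to "string" when the property is unset is an assumption.
            String tableDefault = tblProps.getProperty(TABLE_DEFAULT, "string");
            switch (tableDefault) {
                case "string": return false;
                case "binary": return true;
                default:
                    throw new IllegalArgumentException(
                            "invalid " + TABLE_DEFAULT + ": " + tableDefault);
            }
        }
        // A column-level spec (any prefix of "binary" or "string") wins.
        if ("binary".startsWith(columnSpec)) return true;
        if ("string".startsWith(columnSpec)) return false;
        throw new IllegalArgumentException("invalid column storage type: " + columnSpec);
    }

    public static void main(String[] args) {
        Properties tbl = new Properties();
        tbl.setProperty(TABLE_DEFAULT, "binary");
        System.out.println(isBinary("", tbl));  // table default applies
        System.out.println(isBinary("s", tbl)); // column spec overrides it
    }
}
```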

          This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types. The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key.
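          For the seven primitive types listed above, a binary cell is expected to be a fixed-width, big-endian value: 1 byte for boolean and tinyint, 2 for smallint, 4 for int and float, 8 for bigint and double. The decoder below is a hypothetical sketch of that mapping using java.nio.ByteBuffer rather than HBase's Bytes; returning null on a width mismatch, and treating any non-zero byte as true for boolean, are choices made for this sketch.

```java
import java.nio.ByteBuffer;

public class BinaryCellDecoder {

    // Decodes a cell stored in fixed-width big-endian form, the layout we
    // understand HBase's Bytes.toBytes(...) overloads to produce.
    // Returns null when the cell length does not match the Hive type.
    static Object decode(String hiveType, byte[] cell) {
        ByteBuffer buf = ByteBuffer.wrap(cell); // big-endian by default
        switch (hiveType) {
            case "boolean":  return cell.length == 1 ? cell[0] != 0 : null;
            case "tinyint":  return cell.length == 1 ? cell[0] : null;
            case "smallint": return cell.length == 2 ? buf.getShort() : null;
            case "int":      return cell.length == 4 ? buf.getInt() : null;
            case "bigint":   return cell.length == 8 ? buf.getLong() : null;
            case "float":    return cell.length == 4 ? buf.getFloat() : null;
            case "double":   return cell.length == 8 ? buf.getDouble() : null;
            default:
                throw new IllegalArgumentException("no binary storage for " + hiveType);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("int", new byte[] {(byte) 0x80, 0, 0, 0}));
        System.out.println(decode("bigint", new byte[] {1, 2, 3})); // wrong width
    }
}
```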

          This addresses bug HIVE-1634.
          http://issues.apache.org/jira/browse/HIVE-1634

          Diffs (updated)


          trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 1023967
          trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java 1023967
          trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java 1023967
          trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 1023967
          trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 1023967
          trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java 1023967
          trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java 1023967
          trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java 1023967
          trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java 1023967
          trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 1023967
          trunk/hbase-handler/src/test/queries/hbase_binary_external_table_queries.q PRE-CREATION
          trunk/hbase-handler/src/test/queries/hbase_binary_map_queries.q PRE-CREATION
          trunk/hbase-handler/src/test/queries/hbase_binary_storage_queries.q PRE-CREATION
          trunk/hbase-handler/src/test/results/hbase_binary_external_table_queries.q.out PRE-CREATION
          trunk/hbase-handler/src/test/results/hbase_binary_map_queries.q.out PRE-CREATION
          trunk/hbase-handler/src/test/results/hbase_binary_storage_queries.q.out PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyBooleanBinary.java PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyByteBinary.java PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyDoubleBinary.java PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 1023967
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFloatBinary.java PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyIntegerBinary.java PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyLongBinary.java PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyShortBinary.java PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 1023967

          Diff: http://review.cloudera.org/r/826/diff

          Testing
          -------

          The HBase handler tests TestHBaseSerDe, TestLazyHBaseObject, TestHBaseCliDriver, and TestHBaseMinimrCliDriver pass.

          New tests have been added to TestHBaseSerDe and TestLazyHBaseObject to test this feature.

          New queries which exercise this feature have been added to query files hbase_binary_map_queries.q and hbase_binary_storage_queries.q.

          Thanks,

          bkm

          HBase Review Board added a comment -

          Message from: "John Sichi" <jsichi@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/826/#review1247
          -----------------------------------------------------------

          trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
          <http://review.cloudera.org/r/826/#comment4213>

          We keep adding new List data members. Probably time to move to a single List<ColumnMapping>, with a new class ColumnMapping with fields for familyName, familyNameBytes, qualifierName, qualifierNameBytes, familyBinary, qualifierBinary. That will be a lot cleaner and also allow you to avoid the boolean [] here, which is a little clumsy.

          trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
          <http://review.cloudera.org/r/826/#comment4210>

          Doesn't this error message need to change?

          trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
          <http://review.cloudera.org/r/826/#comment4214>

          I don't understand these TODO's.

          trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java
          <http://review.cloudera.org/r/826/#comment4215>

          Why is this assertion commented out?

          • John
          John Sichi added a comment -

          Hey Basab,

          This is a great start. Beyond the review comments I added, I do have some higher-level suggestions:

          • For the column mapping, the reason I suggested "a:b:string" in the original JIRA description is that it's a pain to keep everything lined up by column position. It's already less than ideal that we do the column name mapping by position, so I don't think we should make it worse by having a separate property for type. Using the s/b shorthand is fine, and if you think that we shouldn't overload the colon, we can use a different separator, e.g. "cf:cq#s". Since the existing property name is hbase.columns.mapping, I don't think it will be confusing to roll in the (optional) type info as well.
          • I'm wondering whether we can just use the existing classes like LazyBinaryByte in package org.apache.hadoop.hive.serde2.lazybinary instead of creating new ones. Or are these not compatible with hbase.util.Bytes?
          • For the tests, I noticed that you have attached TestHiveHBaseExternalTable. I think it would be a good idea to create and populate such a fixture table in HBaseTestSetup; that way it can be available (treated as read-only) to all of the HBase .q tests. Otherwise, it's hard to verify that we're compatible with a table created directly through HBase APIs rather than Hive.
          • Also for the tests, it would be good if you can filter it down to only a small number of representative rows when pulling the initial test data set from the Hive src table. That way, we can keep the .q.out files smaller.
          • Once we get this one committed, be sure to update the wiki.
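          The "cf:cq#s" form suggested in the first point could be parsed roughly as follows. This is a hypothetical sketch. These are all assumptions:
          • the MappingParser and Entry names,
          • the use of '-' to mean "no suffix, use the table default",
          • the handling of ":key" and of bare "cf:" (whole-family) entries.
          Separate key/value flags for map columns are not covered.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for entries like "cf:cq#b" in a combined hbase.columns.mapping. */
final class MappingParser {

  /** One parsed entry; storage is 's', 'b', or '-' (no suffix, table default). */
  static final class Entry {
    final String family;     // ":key" for the row key
    final String qualifier;  // null when a whole family is mapped ("cf:")
    final char storage;

    Entry(String family, String qualifier, char storage) {
      this.family = family;
      this.qualifier = qualifier;
      this.storage = storage;
    }
  }

  static List<Entry> parse(String mapping) {
    List<Entry> entries = new ArrayList<>();
    for (String raw : mapping.split(",")) {
      String spec = raw.trim();
      char storage = '-';                    // no suffix: fall back to the table default
      int hash = spec.lastIndexOf('#');
      if (hash >= 0) {
        String suffix = spec.substring(hash + 1);
        if (!suffix.equals("s") && !suffix.equals("b")) {
          throw new IllegalArgumentException("Bad storage suffix in: " + spec);
        }
        storage = suffix.charAt(0);
        spec = spec.substring(0, hash);      // strip the "#x" suffix
      }
      if (spec.equals(":key")) {
        entries.add(new Entry(":key", null, storage));
        continue;
      }
      int colon = spec.indexOf(':');
      if (colon <= 0) {
        throw new IllegalArgumentException("Expected family:qualifier in: " + spec);
      }
      String family = spec.substring(0, colon);
      String qualifier = spec.substring(colon + 1);
      entries.add(new Entry(family, qualifier.isEmpty() ? null : qualifier, storage));
    }
    return entries;
  }
}
```

Keeping the type flag inside each mapping entry means a column's name and storage can no longer drift out of alignment, which is the concern raised in the first point.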
          HBase Review Board added a comment -

          Message from: bkm.hadoop@gmail.com

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/826/
          -----------------------------------------------------------

          Review request for Hive Developers and John Sichi.

          Summary
          -------

          This addresses HIVE-1245 in part, for atomic or primitive types.

          The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" specifies the storage option for each corresponding column in the serde property "hbase.columns.mapping". Allowed values are '-' for table default, 's' for standard string storage, and 'b' for binary storage as would be obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families use a colon-separated pair such as 's:b' for the key and value part specifiers respectively. See the HBase handler test cases and queries for additional examples.
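          A hypothetical parser for the positional "hbase.columns.storage.types" value described above. It treats '-' as the table default, and, as an assumption, also treats an empty field that way. It accepts 's' or 'b'. It reads an 's:b'-style pair as the key and value flags of a map column. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for the per-column "hbase.columns.storage.types" property. */
final class StorageTypes {

  /** Storage flags for one column; for a plain column, key and value are equal. */
  static final class Spec {
    final char key;    // '-', 's' or 'b'
    final char value;  // same as key unless the column is a map with an "x:y" pair

    Spec(char key, char value) { this.key = key; this.value = value; }
  }

  private static char check(String s) {
    if (s.isEmpty() || s.equals("-")) return '-';  // table default (empty field: assumption)
    if (s.equals("s") || s.equals("b")) return s.charAt(0);
    throw new IllegalArgumentException("Expected '-', 's' or 'b' but got '" + s + "'");
  }

  static List<Spec> parse(String property) {
    List<Spec> specs = new ArrayList<>();
    // limit -1 keeps empty fields, so positions stay aligned with hbase.columns.mapping
    for (String field : property.split(",", -1)) {
      String[] parts = field.trim().split(":", -1);
      if (parts.length == 1) {
        char c = check(parts[0]);
        specs.add(new Spec(c, c));
      } else if (parts.length == 2) {
        specs.add(new Spec(check(parts[0]), check(parts[1])));  // map key : map value
      } else {
        throw new IllegalArgumentException("Bad storage spec: " + field);
      }
    }
    return specs;
  }
}
```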

          There is also a table property "hbase.table.default.storage.type" = "string" that sets a table-level default storage type; the other valid value is "binary". A column-level specification overrides the table-level default.
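          The following sketch shows how a column flag and the table-level default might combine. A column-level 's' or 'b' wins, and '-' defers to "hbase.table.default.storage.type". Two things here are assumptions: treating a missing table property as "string", and the class and method names.

```java
/** Hypothetical resolution of a column flag against "hbase.table.default.storage.type". */
final class StorageResolver {

  /** Returns true when the column should be read and written in binary form. */
  static boolean isBinary(char columnFlag, String tableDefault) {
    switch (columnFlag) {
      case 'b': return true;   // column-level setting overrides the table default
      case 's': return false;
      case '-': break;         // defer to the table default below
      default: throw new IllegalArgumentException("Unknown column flag: " + columnFlag);
    }
    // Assumption: an absent table property behaves like "string".
    String d = tableDefault == null ? "string" : tableDefault;
    if (d.equals("binary")) return true;
    if (d.equals("string")) return false;
    throw new IllegalArgumentException("Unknown table default storage type: " + d);
  }
}
```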

          This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types. The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key.
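          For the listed types, reading a binary cell amounts to interpreting fixed-width big-endian bytes. This assumes the layout of org.apache.hadoop.hbase.util.Bytes:
          • 1 byte for boolean and tinyint, with nonzero meaning true,
          • 2 bytes for smallint, 4 for int and 8 for bigint,
          • raw IEEE 754 bits for float and double.
          The sketch below uses java.nio.ByteBuffer rather than HBase itself. Throwing on a cell of the wrong width is an assumed policy, not necessarily what the patch's Lazy*Binary classes do.

```java
import java.nio.ByteBuffer;

/** Sketch: decode Bytes-style fixed-width big-endian cells into Java primitives. */
final class BinaryCellDecoder {
  private static void requireLength(byte[] b, int n) {
    if (b.length != n) {
      throw new IllegalArgumentException("Expected " + n + " bytes, got " + b.length);
    }
  }

  static boolean toBoolean(byte[] b) { requireLength(b, 1); return b[0] != 0; }
  static byte toByte(byte[] b)       { requireLength(b, 1); return b[0]; }
  // ByteBuffer reads big-endian by default.
  static short toShort(byte[] b)     { requireLength(b, 2); return ByteBuffer.wrap(b).getShort(); }
  static int toInt(byte[] b)         { requireLength(b, 4); return ByteBuffer.wrap(b).getInt(); }
  static long toLong(byte[] b)       { requireLength(b, 8); return ByteBuffer.wrap(b).getLong(); }
  // float and double are stored as their raw bit patterns.
  static float toFloat(byte[] b)     { return Float.intBitsToFloat(toInt(b)); }
  static double toDouble(byte[] b)   { return Double.longBitsToDouble(toLong(b)); }
}
```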

          This addresses bug HIVE-1634.
          http://issues.apache.org/jira/browse/HIVE-1634

          Diffs


          trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 990439
          trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java 990439
          trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java 990439
          trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java 990439
          trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 990439
          trunk/hbase-handler/src/test/queries/hbase_binary_map_queries.q PRE-CREATION
          trunk/hbase-handler/src/test/queries/hbase_binary_storage_queries.q PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyBooleanBinary.java PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyByteBinary.java PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyDoubleBinary.java PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 990439
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFloatBinary.java PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyIntegerBinary.java PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyLongBinary.java PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyShortBinary.java PRE-CREATION
          trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 990439

          Diff: http://review.cloudera.org/r/826/diff

          Testing
          -------

          The HBase handler tests TestHBaseSerDe, TestLazyHBaseObject, TestHBaseCliDriver, and TestHBaseMinimrCliDriver pass.

          New tests have been added to TestHBaseSerDe and TestLazyHBaseObject to test this feature.

          New queries which exercise this feature have been added to query files hbase_binary_map_queries.q and hbase_binary_storage_queries.q.

          Thanks,

          bkm

          Basab Maulik added a comment -

          Attached is a preliminary patch for this issue.

          A scan of the HBase table for the example above:

          hbase(main):004:0> scan 'TestHiveHBaseExternalTable'
          ROW COLUMN+CELL
          key-1 column=cf:boolean, timestamp=1284406847770, value=\xFF
          key-1 column=cf:byte, timestamp=1284406847770, value=\x80
          key-1 column=cf:double, timestamp=1284406847770, value=|i\xD3lwy\xDCb
          key-1 column=cf:float, timestamp=1284406847770, value=\xAD\xBF\xB1\xC5
          key-1 column=cf:int, timestamp=1284406847770, value=\x80\x00\x00\x00
          key-1 column=cf:long, timestamp=1284406847770, value=\x80\x00\x00\x00\x00\x00\x00\x00
          key-1 column=cf:short, timestamp=1284406847770, value=\x80\x00
          key-1 column=cf:string, timestamp=1284406847770, value=Test-String
          1 row(s) in 0.3670 seconds
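          In the scan above, the shell prints printable ASCII characters as-is and other bytes as \xNN. Assuming that convention, the sketch below turns a printed value back into bytes and reads it as a signed big-endian number. It shows that the byte, short, int and long cells each hold the minimum value of their type. For example, \x80\x00\x00\x00 is Integer.MIN_VALUE. The ShellValue class is hypothetical, and escapes other than \xNN are not handled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Sketch: decode a value as printed by the HBase shell ("\x80\x00..." style). */
final class ShellValue {

  static byte[] unescape(String printed) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < printed.length(); i++) {
      char c = printed.charAt(i);
      if (c == '\\' && i + 3 < printed.length() && printed.charAt(i + 1) == 'x') {
        out.write(Integer.parseInt(printed.substring(i + 2, i + 4), 16));
        i += 3;  // skip the "xNN" that follows the backslash
      } else {
        out.write(c);  // printable ASCII characters stand for themselves
      }
    }
    return out.toByteArray();
  }

  /** Reads a 1-, 2-, 4- or 8-byte value as a signed big-endian integer. */
  static long asSignedBigEndian(String printed) {
    ByteBuffer buf = ByteBuffer.wrap(unescape(printed));
    switch (buf.remaining()) {
      case 1: return buf.get();
      case 2: return buf.getShort();
      case 4: return buf.getInt();
      case 8: return buf.getLong();
      default: throw new IllegalArgumentException("Unsupported width: " + buf.remaining());
    }
  }
}
```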


            People

            • Assignee:
              Ashutosh Chauhan
              Reporter:
              Basab Maulik
            • Votes:
              3
              Watchers:
              11
