Hive / HIVE-758

function to load data from hive to hbase

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: HBase Handler, UDF
    • Labels:
      None

      Description

      Support a query like: SELECT hbase_put('hive_hbase_table', rowid, colfamily, col, value, ts) FROM src;

      1. hive-758.2.patch
        21 kB
        Raghotham Murthy
      2. hive-758.1.patch
        6 kB
        Raghotham Murthy

        Issue Links

          Activity

          Nick Dimiduk added a comment -

          Is this feature deprecated by support for external HBase tables?

          Carl Steinbach made changes -
          Component/s UDF [ 12313585 ]
          John Sichi made changes -
          Component/s HBase Handler [ 12313461 ]
          Component/s Contrib [ 12313001 ]
          SeanM added a comment -

          This UDAF works well, but I've encountered two gotchas:

          Nested queries with a WHERE clause that filters out records will throw an exception, even if there are no null values in the table whatsoever

           
          Hive> SELECT hbase_put("test", rowid, "data", colfamily, value, 0) FROM ( SELECT * FROM some_table WHERE value = "some_value") t1;
          
          java.lang.RuntimeException: Error while closing operators
          	at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:232)
          	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
          	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
          	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
          	at org.apache.hadoop.mapred.Child.main(Child.java:170)
          Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public boolean org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator.iterate(java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int)  on object org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator@70e35d5 of class org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator with arguments {null, null, null, null, null, null} of size 6
          	at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:799)
          	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:462)
          	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
          	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
          	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
          	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
          	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
          	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:470)
          	at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:211)
          	... 4 more
          Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public boolean org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator.iterate(java.lang.String,java.lang.String,java.lang.String,java.lang.String,java.lang.String,int)  on object org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator@70e35d5 of class org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut$UDAFHbasePutEvaluator with arguments {null, null, null, null, null, null} of size 6
          	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:661)
          	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.iterate(GenericUDAFBridge.java:167)
          	at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:110)
          	at org.apache.hadoop.hive.ql.exec.GroupByOperator.closeOp(GroupByOperator.java:768)
          	... 12 more
          Caused by: java.lang.IllegalArgumentException
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          	at java.lang.reflect.Method.invoke(Method.java:597)
          	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:638)
          	... 15 more
          
          
          It's just a hunch, but it seems like the UDAF's iterate() is being called with null values for rows that were filtered out?
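
One detail in the trace fits that hunch: the innermost cause is an IllegalArgumentException raised inside Method.invoke, and the iterate() signature ends in a primitive int. Java reflection cannot unwrap a null argument into a primitive parameter, so a reflective call with all-null arguments fails before the method body runs. The standalone sketch below reproduces only that mechanism. NullArgDemo and its two methods are hypothetical stand-ins, not Hive code, and whether this is the whole story for the UDAF is an assumption.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Standalone sketch; NullArgDemo and its methods are hypothetical, not Hive code. */
final class NullArgDemo {

    // Mirrors the shape of the failing signature: the last parameter is a primitive int.
    public static boolean iteratePrimitive(String rowId, int ts) {
        return true;
    }

    // Same shape, but the boxed Integer parameter can hold null.
    public static boolean iterateBoxed(String rowId, Integer ts) {
        return ts != null;
    }

    /** Reflectively calls the named two-argument method with (null, null) and reports the outcome. */
    static String invokeWithNulls(String methodName, Class<?> tsType) {
        try {
            Method m = NullArgDemo.class.getMethod(methodName, String.class, tsType);
            Object result = m.invoke(null, new Object[] {null, null});
            return "returned " + result;
        } catch (IllegalArgumentException e) {
            // null cannot be unwrapped into a primitive parameter
            return "IllegalArgumentException";
        } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeWithNulls("iteratePrimitive", int.class));  // IllegalArgumentException
        System.out.println(invokeWithNulls("iterateBoxed", Integer.class)); // returned false
    }
}
```

If this is what happens in the UDAF, declaring the timestamp parameter as a boxed type and checking for null inside iterate() would be one way to avoid the exception. That is a suggestion, not something the attached patches are known to do.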
          

          Null values
          The UDAF is very sensitive to null values. If using mapped or array types, or any field that may possibly be null, use an if construct for safety:

          if (some_field IS NULL, "", some_field)
          
          Zheng Shao made changes -
          Link This issue relates to HIVE-806 [ HIVE-806 ]
          Zheng Shao made changes -
          Link This issue relates to HIVE-705 [ HIVE-705 ]
          Raghotham Murthy made changes -
          Attachment hive-758.2.patch [ 12417495 ]
          Raghotham Murthy added a comment -

          Added a warning to the description. But it doesn't look like I can describe a temporary function. I get an error:

          hive> ADD FILE /data/users/rmurthy/dev/hive/build/dist/lib/hive_contrib.jar;                          
          hive> CREATE TEMPORARY FUNCTION hbase_put AS 'org.apache.hadoop.hive.contrib.udaf.hbase.UDAFHbasePut';
          OK
          Time taken: 0.263 seconds
          hive> DESCRIBE FUNCTION hash_put;                                                                     
          FAILED: Error in metadata: java.lang.NullPointerException
          FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
          
          Raghotham Murthy made changes -
          Summary: "[contrib] function to load data from hive to hbase" -> "function to load data from hive to hbase"
          Component/s Contrib [ 12313001 ]
          Component/s Query Processor [ 12312586 ]
          Zheng Shao added a comment -

          Both this and HIVE-645 should put a javadoc comment/description on the UDAF/UDF warning people about the possibility of side effects (failed tasks, speculative executions).
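
To illustrate the kind of side effect meant here: because the puts are issued while the task runs, a failed-and-retried or speculative task attempt repeats them. The toy model below is a sketch only. CellStore is a hypothetical in-memory stand-in, not the HBase client API, and it assumes that a put to identical row, family, qualifier and timestamp overwrites the existing cell.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only: CellStore is a hypothetical in-memory stand-in, not the HBase client API. */
final class RetrySideEffectDemo {

    /** Cells keyed by "row/family:qualifier@timestamp", so identical coordinates overwrite. */
    static final class CellStore {
        private final Map<String, String> cells = new TreeMap<>();
        private int putCalls = 0;

        void put(String row, String family, String qualifier, long ts, String value) {
            cells.put(row + "/" + family + ":" + qualifier + "@" + ts, value);
            putCalls++;
        }

        int cellCount() { return cells.size(); }

        int putCalls() { return putCalls; }
    }

    /** One simulated task attempt: writes every input row as a side effect. */
    static void runAttempt(CellStore store, List<String> rowIds, long ts) {
        for (String rowId : rowIds) {
            store.put(rowId, "data", "col", ts, "value-" + rowId);
        }
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r1", "r2", "r3");

        // Same explicit timestamp in both attempts: the retry rewrites the same cells.
        CellStore fixedTs = new CellStore();
        runAttempt(fixedTs, rows, 0L);
        runAttempt(fixedTs, rows, 0L);
        System.out.println(fixedTs.putCalls() + " puts, " + fixedTs.cellCount() + " cells");

        // Timestamp differs per attempt: the retry leaves extra versions behind.
        CellStore varyingTs = new CellStore();
        runAttempt(varyingTs, rows, 1L);
        runAttempt(varyingTs, rows, 2L);
        System.out.println(varyingTs.putCalls() + " puts, " + varyingTs.cellCount() + " cells");
    }
}
```

In this model, a repeated attempt with a fixed timestamp leaves the stored cells unchanged but doubles the number of puts issued, while an attempt-dependent timestamp also leaves extra versions behind.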

          Zheng Shao made changes -
          Link This issue relates to HIVE-645 [ HIVE-645 ]
          Raghotham Murthy made changes -
          Field Original Value New Value
          Attachment hive-758.1.patch [ 12416640 ]
          Raghotham Murthy added a comment -

          This is a UDAF to load the data. It returns the total number of rows loaded. Currently, we need to set HBASE_HOME, HIVE_HOME and HADOOP_HOME before running the test script at contrib/src/test/scripts/udaf_hbase_put_test.sh. We need to use this script because hive -f currently does not support running shell commands within a file.

          Raghotham Murthy created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              Raghotham Murthy
            • Votes:
              1
              Watchers:
              13

              Dates

              • Created:
                Updated:

                Development