Hive
  1. Hive
  2. HIVE-2361

Add some UDFs which help to migrate Oracle to Hive

    Details

    • Type: New Feature New Feature
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: UDF
    • Labels:

      Description

      Here some UDFs which can be matched to oracle functions:
      There are two kinds of oracle functions. one is scalar function and another is analytic function.
      Most scalar functions in Oracle can be converted to hive's udf directly.

      Oracle Scalar Function

      GenericUDFDecode : Compares first argument to each other value one by one. e.g., DECODE(x,0,'zero',1,'one') will return 'zero' if x is 0
      GenericUDFGreatest : Return the greatest of the list of one or more expressions. e.g., GREATEST(2,5,12,3) will return 12
      GenericUDFInstr : Return the location of a substring in a string. e.g., INSTR('next', 'e') will return 2
      GenericUDFLnnvl : Evaluate a condition when one or both operands of the condition may be null. e.g., LNNVL(2 > 4) will return true
      GenericUDFNVL : Replace null with a string in the results of a query. e.g., NVL(null,'hive') will return hive
      GenericUDFNVL2 : Determine the value returned by a query based on whether a specified expression is null or not null. e.g., NVL2(null,'not null','null value') will return 'null value'
      GenericUDFToNumber : Convert a string to a number. e.g., TO_NUMBER('112','999') will return 112
      GenericUDFTrunc : Returns a date truncated to a specific unit of measure. e.g., TRUNC('2002-11-02 01:01:01','YYYY') will return '2002-01-01 00:00:00'

      Oracle Analytic Function

      Most analytic functions in Oracle can't be converted to hive's query and udf directly.
      Following udfs should be used with DISTRIBUTED, SORT BY and HASH of hive to support analytic functions

      e.q., SELECT FUNC(hash(col1), col2, ...) FROM SELECT ~ FROM table DISTRIBUTED BY hash(col1) SORT BY col1, col2 ...

      GenericUDFSum : Calculate a cumulative sum.
      GenericUDFRank : Assign a sequential order, or rank within some group based on key.
      GenericUDFDenseRank : Act like RANK function except that it assigns consecutive ranks.
      GenericUDFRowNumber : Return sequence integer value within some group based on key.
      GenericUDFMax : Determine the highest value within some group based on key.
      GenericUDFMin : Determine the lowest value within some group based on key.
      GenericUDFLag : Access data from a previous row.

      This udfs was developed with hive-pdk

      1. nexr-udf.tar
        185 kB
        JunHo Cho

        Issue Links

          Activity

          Hide
          Aniket Mokashi added a comment -

          This issue depends on Hive-2524 for better utility functions. I suggest to commit these UDFs on 0.8. I will create a patch for the same.

          Show
          Aniket Mokashi added a comment - This issue depends on Hive-2524 for better utility functions. I suggest to commit these UDFs on 0.8. I will create a patch for the same.
          Hide
          Carl Steinbach added a comment -

          @JunHo: It would be great to get these committed. Do you have time to modify the patch so that they are added as standard UDFs? Thanks.

          Show
          Carl Steinbach added a comment - @JunHo: It would be great to get these committed. Do you have time to modify the patch so that they are added as standard UDFs? Thanks.
          Hide
          neelesh gadhia added a comment -

          New to using UDF's on hive. But implemented these on hive 0.9.0 and hadoop 1.1.1. Was able to add jar to hive and kicked the sample sql
          SELECT t.empno, t.deptno, t.sal, nexr_sum(hash(t.deptno),t.sal) as sal_sum
          FROM (
          select a.empno, a.deptno, a.sal from emp a
          distribute by hash(a.deptno)
          sort BY a.deptno, a.empno
          ) t;

          The sql failed with errors

          2013-02-13 23:30:18,925 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201302132324_0002_r_000000_3' to tip task_201302132324_0002_r_000000, for tracker 'tracker_192.168.0.151:localhost/127.0.0.1:50099'
          2013-02-13 23:30:18,925 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201302132324_0002_r_000000_2'
          2013-02-13 23:30:26,484 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201302132324_0002_r_000000_3: java.lang.RuntimeException: Error in configuring object
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
          at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
          at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
          at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:486)
          at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
          at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:396)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
          at org.apache.hadoop.mapred.Child.main(Child.java:249)
          Caused by: java.lang.reflect.InvocationTargetException
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
          ... 9 more
          Caused by: java.lang.RuntimeException: Reduce operator initialization failed
          at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:157)
          ... 14 more
          Caused by: java.lang.NullPointerException
          at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:137)
          at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:896)
          at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:922)
          at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60)
          at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
          at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
          at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389)
          at org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40)
          at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
          at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:150)
          ... 14 more

          2013-02-13 23:30:29,819 INFO org.apache.hadoop.mapred.TaskInProgress: TaskInProgress task_201302132324_0002_r_000000 has failed 4 times.
          2013-02-13 23:30:29,820 INFO org.apache.hadoop.mapred.JobInProgress: TaskTracker at '192.168.0.151' turned 'flaky'
          .... 12 more lines..

          Tried different function "GenericUDFMax".. same error.

          Any pointers/advise, what could be wrong?

          Show
          neelesh gadhia added a comment - New to using UDF's on hive. But implemented these on hive 0.9.0 and hadoop 1.1.1. Was able to add jar to hive and kicked the sample sql SELECT t.empno, t.deptno, t.sal, nexr_sum(hash(t.deptno),t.sal) as sal_sum FROM ( select a.empno, a.deptno, a.sal from emp a distribute by hash(a.deptno) sort BY a.deptno, a.empno ) t; The sql failed with errors 2013-02-13 23:30:18,925 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201302132324_0002_r_000000_3' to tip task_201302132324_0002_r_000000, for tracker 'tracker_192.168.0.151:localhost/127.0.0.1:50099' 2013-02-13 23:30:18,925 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201302132324_0002_r_000000_2' 2013-02-13 23:30:26,484 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201302132324_0002_r_000000_3: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:486) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 9 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:157) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:137) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:896) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:922) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:60) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) at org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:150) ... 14 more 2013-02-13 23:30:29,819 INFO org.apache.hadoop.mapred.TaskInProgress: TaskInProgress task_201302132324_0002_r_000000 has failed 4 times. 2013-02-13 23:30:29,820 INFO org.apache.hadoop.mapred.JobInProgress: TaskTracker at '192.168.0.151' turned 'flaky' .... 12 more lines.. Tried different function "GenericUDFMax".. same error. Any pointers/advise, what could be wrong?
          Hide
          Aniket Mokashi added a comment -

          Please post this discussion to hive-user mailing list for faster help.

          Show
          Aniket Mokashi added a comment - Please post this discussion to hive-user mailing list for faster help.
          Hide
          neelesh gadhia added a comment -

          can you post the url for discussion/forum for hive-user mailing list?. or is it just the email address I need to send the details about the issue?

          Show
          neelesh gadhia added a comment - can you post the url for discussion/forum for hive-user mailing list?. or is it just the email address I need to send the details about the issue?
          Hide
          Kostiantyn Kudriavtsev added a comment -

          It seems useful, specially Scalar functions... does this code available somewhere?

          Show
          Kostiantyn Kudriavtsev added a comment - It seems useful, specially Scalar functions... does this code available somewhere?

            People

            • Assignee:
              JunHo Cho
              Reporter:
              JunHo Cho
            • Votes:
              2 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:

                Development