Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-2361

Add some UDFs which help to migrate Oracle to Hive

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • 0.8.0
    • None
    • UDF

    Description

      Here some UDFs which can be matched to oracle functions:
      There are two kinds of oracle functions. one is scalar function and another is analytic function.
      Most scalar functions in Oracle can be converted to hive's udf directly.

      Oracle Scalar Function

      GenericUDFDecode : Compares first argument to each other value one by one. e.g., DECODE(x,0,'zero',1,'one') will return 'zero' if x is 0
      GenericUDFGreatest : Return the greatest of the list of one or more expressions. e.g., GREATEST(2,5,12,3) will return 12
      GenericUDFInstr : Return the location of a substring in a string. e.g., INSTR('next', 'e') will return 2
      GenericUDFLnnvl : Evaluate a condition when one or both operands of the condition may be null. e.g., LNNVL(2 > 4) will return true
      GenericUDFNVL : Replace null with a string in the results of a query. e.g., NVL(null,'hive') will return hive
      GenericUDFNVL2 : Determine the value returned by a query based on whether a specified expression is null or not null. e.g., NVL2(null,'not null','null value') will return 'null value'
      GenericUDFToNumber : Convert a string to a number. e.g., TO_NUMBER('112','999') will return 112
      GenericUDFTrunc : Returns a date truncated to a specific unit of measure. e.g., TRUNC('2002-11-02 01:01:01','YYYY') will return '2002-01-01 00:00:00'

      Oracle Analytic Function

      Most analytic functions in Oracle can't be converted to hive's query and udf directly.
      Following udfs should be used with DISTRIBUTED, SORT BY and HASH of hive to support analytic functions

      e.q., SELECT FUNC(hash(col1), col2, ...) FROM SELECT ~ FROM table DISTRIBUTED BY hash(col1) SORT BY col1, col2 ...

      GenericUDFSum : Calculate a cumulative sum.
      GenericUDFRank : Assign a sequential order, or rank within some group based on key.
      GenericUDFDenseRank : Act like RANK function except that it assigns consecutive ranks.
      GenericUDFRowNumber : Return sequence integer value within some group based on key.
      GenericUDFMax : Determine the highest value within some group based on key.
      GenericUDFMin : Determine the lowest value within some group based on key.
      GenericUDFLag : Access data from a previous row.

      This udfs was developed with hive-pdk

      Attachments

        1. nexr-udf.tar
          185 kB
          JunHo Cho

        Issue Links

          Activity

            People

              miloveme JunHo Cho
              miloveme JunHo Cho
              Votes:
              2 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated: