Details

Type: New Feature

Status: Open

Priority: Minor

Resolution: Unresolved

Affects Version/s: None

Fix Version/s: None

Component/s: UDF

Labels: None
Description
Here are some UD(A)Fs that could be incorporated into the Hive distribution:
UDFArgMax: Finds the 0-indexed position of the largest argument, e.g., ARGMAX(4, 5, 3) returns 1.
UDFBucket: Finds the bucket to which the first argument belongs. For BUCKET(x, b_1, b_2, b_3, ...), returns the smallest i such that x > b_i and x <= b_{i+1}. Returns 0 if x is not larger than any boundary (x <= b_1).
UDFFindInArray: Finds the 1-indexed position of the first occurrence of the first argument in the array given as the second argument. Returns 0 if not found and NULL if either argument is NULL. E.g., FIND_IN_ARRAY(5, array(1,2,5)) returns 3; FIND_IN_ARRAY(5, array(1,2,3)) returns 0.
UDFGreatCircleDist: Finds the great-circle distance (in km) between two latitude/longitude coordinates (in degrees).
UDFLDA: Performs LDA (latent Dirichlet allocation) inference on a vector, given fixed topics.
UDFNumberRows: Numbers successive rows starting from 1. The counter resets to 1 whenever any of its parameters changes.
UDFPmax: Finds the maximum of a set of columns, e.g., PMAX(4, 5, 3) returns 5.
UDFRegexpExtractAll: Like REGEXP_EXTRACT, except that it returns all matches in an array.
UDFUnescape: Returns the string unescaped (using C/Java-style escape sequences).
UDFWhich: Given a boolean array, returns the indices of the elements that are TRUE.
UDFJaccard
UDAFCollect: Collects all the values associated with a key (i.e., within a GROUP BY group) into a list. Make sure to run: set hive.map.aggr = false;
UDAFCollectMap: Like COLLECT, except that it takes (key, value) tuples and generates a map.
UDAFEntropy: Computes the entropy of a column.
UDAFPearson (BROKEN): Computes the Pearson correlation between two columns.
UDAFTop: TOP(KEY, VAL) returns the KEY associated with the largest value of VAL.
UDAFTopN (BROKEN): Like TOP, except that it returns a list of the keys associated with the N largest values of VAL, where N is passed as the third parameter.
UDAFHistogram
Here is a tarball of these UDFs (poorly documented and tested).
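For reference, the ARGMAX and PMAX semantics described above fit in a few lines of plain Java. This is a hypothetical sketch, not the code in the tarball: the class and method names are made up, ties are assumed to go to the leftmost maximum, and NULL handling is left out. In Hive this logic would sit behind a UDF's `evaluate` method; the wrapper is omitted so the sketch runs without Hive on the classpath.

```java
// Hypothetical helper class; names and tie-breaking rule are assumptions.
final class ArgMaxPmaxSketch {

    // ARGMAX: 0-indexed position of the largest argument.
    // Assumption: on ties the first (leftmost) maximum wins; NULLs are not handled.
    static int argMax(double... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("ARGMAX needs at least one argument");
        }
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i; // strictly greater, so earlier maxima are kept on ties
            }
        }
        return best;
    }

    // PMAX: the largest of the arguments (a row-wise max across columns).
    static double pmax(double... values) {
        return values[argMax(values)];
    }
}
```

PMAX is written in terms of ARGMAX so the two can never disagree about which argument is the maximum.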
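A hypothetical sketch of the BUCKET rule, assuming the boundaries b_1, b_2, ... are passed in ascending order. Counting the boundaries strictly below x gives exactly the smallest i with b_i < x <= b_{i+1}, and 0 when x <= b_1. Returning n for values above the last boundary is an assumption; the description does not cover that case.

```java
// Hypothetical sketch of BUCKET(x, b_1, b_2, ...); not the attached implementation.
final class BucketSketch {

    // Assumes boundaries are sorted ascending.
    static int bucket(double x, double... boundaries) {
        int i = 0;
        // Count boundaries strictly below x; afterwards b_i < x <= b_{i+1} (1-indexed b).
        while (i < boundaries.length && boundaries[i] < x) {
            i++;
        }
        return i; // 0 when x <= b_1; boundaries.length when x exceeds every boundary
    }
}
```

For long boundary lists, a binary search over the sorted boundaries would give the same answer in O(log n).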
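The FIND_IN_ARRAY contract above (1-indexed result, 0 when absent, NULL when either input is NULL) can be sketched in plain Java as follows. The class name is hypothetical, and `Objects.equals` is used as the comparison; that is an assumption, since Java treats, e.g., `Integer` 5 and `Long` 5 as unequal, so a real UDF would first convert both arguments to a common type.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of FIND_IN_ARRAY(needle, array).
final class FindInArraySketch {

    // Returns the 1-based position of the first element equal to needle,
    // 0 if it is absent, or null if either argument is null (SQL NULL semantics).
    static Integer findInArray(Object needle, List<?> haystack) {
        if (needle == null || haystack == null) {
            return null;
        }
        for (int i = 0; i < haystack.size(); i++) {
            if (Objects.equals(needle, haystack.get(i))) {
                return i + 1; // 1-indexed, as in the description
            }
        }
        return 0;
    }
}
```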
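The description does not say which formula or Earth radius UDFGreatCircleDist uses. The sketch below assumes the haversine formula on a spherical Earth with a mean radius of 6371 km; the class name and constant are hypothetical.

```java
// Hypothetical sketch of GREAT_CIRCLE_DIST(lat1, lon1, lat2, lon2); inputs in degrees.
final class GreatCircleSketch {

    // Assumption: spherical Earth with mean radius 6371 km.
    static final double EARTH_RADIUS_KM = 6371.0;

    // Haversine great-circle distance in kilometres.
    static double greatCircleDistKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double sinHalfDPhi = Math.sin(dPhi / 2);
        double sinHalfDLambda = Math.sin(dLambda / 2);
        // a = sin^2(dPhi/2) + cos(phi1) * cos(phi2) * sin^2(dLambda/2)
        double a = sinHalfDPhi * sinHalfDPhi
                + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLambda * sinHalfDLambda;
        // Central angle between the two points, in radians.
        double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        return EARTH_RADIUS_KM * c;
    }
}
```

Haversine is a common choice because, unlike the spherical law of cosines, it stays well-conditioned for very short distances.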
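NUMBER_ROWS is stateful: it only produces meaningful numbers if rows reach a single UDF instance already clustered by its parameters (for example, after DISTRIBUTE BY ... SORT BY in a subquery). That ordering requirement is an assumption about how the original is used. A minimal, hypothetical sketch of the counter:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stateful sketch of NUMBER_ROWS(p1, p2, ...).
final class NumberRowsSketch {
    private List<Object> previous = null; // parameters seen on the previous row
    private long counter = 0;

    long evaluate(Object... params) {
        // Copy the varargs array so later reuse by the caller cannot alias our state.
        List<Object> current = Arrays.asList(params.clone());
        if (current.equals(previous)) {
            counter++;          // same parameters as the previous row
        } else {
            counter = 1;        // any parameter changed: restart numbering
            previous = current;
        }
        return counter;
    }
}
```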
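REGEXP_EXTRACT_ALL can be sketched with `java.util.regex` by calling `Matcher.find()` in a loop. The group-index parameter is assumed to mirror the index argument of Hive's REGEXP_EXTRACT (0 meaning the whole match); the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of REGEXP_EXTRACT_ALL(subject, pattern, index).
final class RegexpExtractAllSketch {

    // Returns capture group `group` from every match of `regex` in `subject`
    // (0 = whole match), or null if either string is null.
    static List<String> regexpExtractAll(String subject, String regex, int group) {
        if (subject == null || regex == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(subject);
        List<String> matches = new ArrayList<>();
        while (m.find()) {          // advance to the next non-overlapping match
            matches.add(m.group(group));
        }
        return matches;
    }
}
```

A real UDF would cache the compiled `Pattern` across rows when the pattern argument is constant, rather than compiling it on every call.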
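For UDAFEntropy, one mergeable approach is to keep a count per distinct value and compute H = -sum p log2 p only at the end. This sketch is hypothetical: the base-2 logarithm (entropy in bits) and the skipping of NULLs are assumptions. The `iterate`/`merge`/`terminate` names loosely follow the lifecycle of Hive's UDAF evaluators, but the class is plain Java.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical partial-aggregation state for ENTROPY(col); not the attached UDAF.
final class EntropySketch {
    // Occurrence count per distinct value seen so far.
    private final Map<Object, Long> counts = new HashMap<>();

    // Called once per input row. Assumption: NULL values are skipped.
    void iterate(Object value) {
        if (value != null) {
            counts.merge(value, 1L, Long::sum);
        }
    }

    // Folds in the partial counts produced by another task.
    void merge(EntropySketch other) {
        other.counts.forEach((value, count) -> counts.merge(value, count, Long::sum));
    }

    // Shannon entropy in bits (assumed base 2): H = -sum_v p(v) * log2 p(v).
    // Returns null when no non-NULL values were seen.
    Double terminate() {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        if (total == 0) {
            return null;
        }
        double h = 0.0;
        for (long c : counts.values()) {
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }
}
```

The partial state is the full map of distinct values, so a high-cardinality column makes the shuffled partials large.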
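Since UDAFPearson is marked broken, here is a hypothetical sketch of a merge-safe state for it. The sketch makes no claim about why the attached version fails. It keeps running means plus sums of squared and cross deviations (Welford-style online updates), and combines partials with the standard pairwise merge formula for co-moments. This avoids the cancellation that the naive n*sum(xy) - sum(x)*sum(y) form can suffer. Returning NULL for fewer than two rows or a constant column is an assumption.

```java
// Hypothetical mergeable state for PEARSON(x, y); not the attached UDAF.
final class PearsonSketch {
    private long n = 0;
    private double meanX = 0, meanY = 0;
    private double m2X = 0, m2Y = 0; // sums of squared deviations from the means
    private double cXY = 0;          // sum of cross deviations (co-moment)

    // Welford-style online update for one (x, y) row.
    void iterate(double x, double y) {
        n++;
        double dx = x - meanX;       // deviation from the old mean
        meanX += dx / n;
        double dy = y - meanY;
        meanY += dy / n;
        m2X += dx * (x - meanX);     // old deviation times new deviation
        m2Y += dy * (y - meanY);
        cXY += dx * (y - meanY);
    }

    // Pairwise combination of two partial states.
    void merge(PearsonSketch o) {
        if (o.n == 0) {
            return;
        }
        long total = n + o.n;
        double dx = o.meanX - meanX;
        double dy = o.meanY - meanY;
        double w = (double) n * o.n / total;
        m2X += o.m2X + dx * dx * w;
        m2Y += o.m2Y + dy * dy * w;
        cXY += o.cXY + dx * dy * w;
        meanX += dx * o.n / total;
        meanY += dy * o.n / total;
        n = total;
    }

    // r = cXY / sqrt(m2X * m2Y); null when undefined.
    Double terminate() {
        if (n < 2 || m2X == 0 || m2Y == 0) {
            return null;
        }
        return cXY / Math.sqrt(m2X * m2Y);
    }
}
```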
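TOP and TOPN can share one mergeable state: a min-heap that keeps only the N entries with the largest VAL, with TOP as the N = 1 case. This is a hypothetical sketch, not the attached UDAFTopN. VAL is assumed numeric, tie-breaking is left unspecified, and the result is assumed to be ordered from largest to smallest VAL.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical partial-aggregation state for TOPN(KEY, VAL, N); TOP(KEY, VAL) is N = 1.
final class TopNSketch<K> {
    private final int n;
    // Min-heap on VAL: the root is the smallest of the (at most N) values kept.
    private final PriorityQueue<Map.Entry<K, Double>> heap =
            new PriorityQueue<>(Map.Entry.<K, Double>comparingByValue());

    TopNSketch(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("N must be positive");
        }
        this.n = n;
    }

    void iterate(K key, double val) {
        heap.offer(new AbstractMap.SimpleImmutableEntry<K, Double>(key, val));
        if (heap.size() > n) {
            heap.poll(); // evict the current smallest so at most N entries remain
        }
    }

    // Combine with a partial result from another task.
    void merge(TopNSketch<K> other) {
        for (Map.Entry<K, Double> e : other.heap) {
            iterate(e.getKey(), e.getValue());
        }
    }

    // Keys ordered from largest to smallest VAL.
    List<K> terminate() {
        List<Map.Entry<K, Double>> entries = new ArrayList<>(heap);
        entries.sort(Map.Entry.<K, Double>comparingByValue().reversed());
        List<K> keys = new ArrayList<>();
        for (Map.Entry<K, Double> e : entries) {
            keys.add(e.getKey());
        }
        return keys;
    }
}
```

Bounding the heap at N keeps each partial result O(N) in size, no matter how many rows a task sees.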