Hive
  1. Hive
  2. HIVE-512

[GenericUDF] new string function ELT(N,str1,str2,str3,...)

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.4.0
    • Component/s: Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      ELT(N,str1,str2,str3,...)

      Returns str1 if N = 1, str2 if N = 2, and so on. Returns NULL if N is less than 1 or greater than the number of arguments. ELT() is the complement of FIELD().

      mysql> SELECT ELT(1, 'ej', 'Heja', 'hej', 'foo');
              -> 'ej'
      mysql> SELECT ELT(4, 'ej', 'Heja', 'hej', 'foo');
              -> 'foo'
      
      1. HIVE-512.2.patch
        11 kB
        Min Zhou
      2. HIVE-512.patch
        5 kB
        Min Zhou

        Issue Links

          Activity

          Hide
          Min Zhou added a comment -

          patch w/o unit test. the api of GenericUDF leaded something inconvenient to custom an GenericUDF.
          I've commentted those inconvenience in my patch.

          Show
          Min Zhou added a comment - patch w/o unit test. the api of GenericUDF leaded something inconvenient to custom an GenericUDF. I've commentted those inconvenience in my patch.
          Hide
          Min Zhou added a comment -

          patch, tested all test cases.

          Show
          Min Zhou added a comment - patch, tested all test cases.
          Hide
          Namit Jain added a comment -

          mysql> create table t1 as select elt(1, 2, 3);
          Query OK, 1 row affected (0.27 sec)
          Records: 1 Duplicates: 0 Warnings: 0

          mysql> describe t1;
          ---------------------------------------------+

          Field Type Null Key Default Extra

          ---------------------------------------------+

          elt(1, 2, 3) varbinary(1) YES   NULL  

          ---------------------------------------------+
          1 row in set (0.00 sec)

          mysql> create table t2 as select elt(1, 'a', 'b');
          Query OK, 1 row affected (0.07 sec)
          Records: 1 Duplicates: 0 Warnings: 0

          mysql> describe t2;
          -----------------------------------------------+

          Field Type Null Key Default Extra

          -----------------------------------------------+

          elt(1, 'a', 'b') varchar(1) YES   NULL  

          -----------------------------------------------+
          1 row in set (0.00 sec)

          the return datatype of elt is the least common denominator of all the arguments, whereas it is always string in your case.
          Can you change the function to return the least common denominator ?

          Show
          Namit Jain added a comment - mysql> create table t1 as select elt(1, 2, 3); Query OK, 1 row affected (0.27 sec) Records: 1 Duplicates: 0 Warnings: 0 mysql> describe t1; ------------- ------------ ---- --- ------- ------+ Field Type Null Key Default Extra ------------- ------------ ---- --- ------- ------+ elt(1, 2, 3) varbinary(1) YES   NULL   ------------- ------------ ---- --- ------- ------+ 1 row in set (0.00 sec) mysql> create table t2 as select elt(1, 'a', 'b'); Query OK, 1 row affected (0.07 sec) Records: 1 Duplicates: 0 Warnings: 0 mysql> describe t2; ----------------- ---------- ---- --- ------- ------+ Field Type Null Key Default Extra ----------------- ---------- ---- --- ------- ------+ elt(1, 'a', 'b') varchar(1) YES   NULL   ----------------- ---------- ---- --- ------- ------+ 1 row in set (0.00 sec) the return datatype of elt is the least common denominator of all the arguments, whereas it is always string in your case. Can you change the function to return the least common denominator ?
          Hide
          Min Zhou added a comment -

          actually, elt return only two types of results in mysql : varbinary, varchar.
          varchar will be returned if all arguments are varchars, or varbinarys will be returned.

          mysql> create table t3 as select elt(1, 'a', 3);
          Query OK, 1 row affected (0.01 sec)
          Records: 1 Duplicates: 0 Warnings: 0

          mysql> describe t3;
          ------------------------------------------------+

          Field Type Null Key Default Extra

          ------------------------------------------------+

          elt(1, 'a', 3) varbinary(1) YES   NULL  

          ------------------------------------------------+
          1 row in set (0.00 sec)

          mysql> create table t4 as select elt(1, true, false);
          Query OK, 1 row affected (0.00 sec)
          Records: 1 Duplicates: 0 Warnings: 0

          mysql> describe t4;
          -----------------------------------------------------+

          Field Type Null Key Default Extra

          -----------------------------------------------------+

          elt(1, true, false) varbinary(1) YES   NULL  

          -----------------------------------------------------+
          1 row in set (0.00 sec)

          mysql> create table t5 as select elt(1, 2.0, false);
          Query OK, 1 row affected (0.01 sec)
          Records: 1 Duplicates: 0 Warnings: 0

          mysql> describe t5;
          ----------------------------------------------------+

          Field Type Null Key Default Extra

          ----------------------------------------------------+

          elt(1, 2.0, false) varbinary(4) YES   NULL  

          ----------------------------------------------------+
          1 row in set (0.00 sec)

          Based on the above, I think it better return string as binary is commonly used in hive.

          Show
          Min Zhou added a comment - actually, elt return only two types of results in mysql : varbinary, varchar. varchar will be returned if all arguments are varchars, or varbinarys will be returned. mysql> create table t3 as select elt(1, 'a', 3); Query OK, 1 row affected (0.01 sec) Records: 1 Duplicates: 0 Warnings: 0 mysql> describe t3; ---------------- ------------ ---- --- ------- ------+ Field Type Null Key Default Extra ---------------- ------------ ---- --- ------- ------+ elt(1, 'a', 3) varbinary(1) YES   NULL   ---------------- ------------ ---- --- ------- ------+ 1 row in set (0.00 sec) mysql> create table t4 as select elt(1, true, false); Query OK, 1 row affected (0.00 sec) Records: 1 Duplicates: 0 Warnings: 0 mysql> describe t4; --------------------- ------------ ---- --- ------- ------+ Field Type Null Key Default Extra --------------------- ------------ ---- --- ------- ------+ elt(1, true, false) varbinary(1) YES   NULL   --------------------- ------------ ---- --- ------- ------+ 1 row in set (0.00 sec) mysql> create table t5 as select elt(1, 2.0, false); Query OK, 1 row affected (0.01 sec) Records: 1 Duplicates: 0 Warnings: 0 mysql> describe t5; -------------------- ------------ ---- --- ------- ------+ Field Type Null Key Default Extra -------------------- ------------ ---- --- ------- ------+ elt(1, 2.0, false) varbinary(4) YES   NULL   -------------------- ------------ ---- --- ------- ------+ 1 row in set (0.00 sec) Based on the above, I think it better return string as binary is commonly used in hive.
          Hide
          Min Zhou added a comment -

          some typo.
          Based on the above, I think it better return string sine binary isnot commonly used in hive.

          Show
          Min Zhou added a comment - some typo. Based on the above, I think it better return string sine binary isnot commonly used in hive.
          Hide
          Namit Jain added a comment -

          select elt(1,2,3) should return an int instead of string

          Show
          Namit Jain added a comment - select elt(1,2,3) should return an int instead of string
          Hide
          Min Zhou added a comment -

          Here is the definition of elt: Return string at index number.
          It's essentially a string function
          select elt(1, 2, 3) will return a varbinary in mysql, rather than a int. I still insist returning string is better.

          Even if do it as you said, what type of results will return when doing queries like below?

          select(1, '2', 3)
          select(2, '2', 3)
          select(1, true, 3)
          select(2, 2.0, cast(3 as double))

          Show
          Min Zhou added a comment - Here is the definition of elt: Return string at index number. It's essentially a string function select elt(1, 2, 3) will return a varbinary in mysql, rather than a int. I still insist returning string is better. Even if do it as you said, what type of results will return when doing queries like below? select(1, '2', 3) select(2, '2', 3) select(1, true, 3) select(2, 2.0, cast(3 as double))
          Hide
          Prasad Chakka added a comment -

          it may have been more useful to return a common denominator type but mysql behaves differently. varbinary in mysql seems to be more closer to a string than to a number. so if we want to follow mysql it is better to return number and let user cast it to int. it is weird though to return a string if all arguments are numeric.

          Show
          Prasad Chakka added a comment - it may have been more useful to return a common denominator type but mysql behaves differently. varbinary in mysql seems to be more closer to a string than to a number. so if we want to follow mysql it is better to return number and let user cast it to int. it is weird though to return a string if all arguments are numeric.
          Hide
          Min Zhou added a comment -

          Originally, arguments after the first one of elt the can only be strings, but for more widely applicable, automatic conversion is added.
          It's a string function, not a function on ints and other types besides strings. I think is very resonable return a string.

          Show
          Min Zhou added a comment - Originally, arguments after the first one of elt the can only be strings, but for more widely applicable, automatic conversion is added. It's a string function, not a function on ints and other types besides strings. I think is very resonable return a string.
          Hide
          Namit Jain added a comment -

          I think we should follow the same convention as we do for CASE statement. The same denominator type should be returned

          Show
          Namit Jain added a comment - I think we should follow the same convention as we do for CASE statement. The same denominator type should be returned
          Hide
          Min Zhou added a comment -

          can you answer me about this queries?

          Show
          Min Zhou added a comment - can you answer me about this queries?
          Hide
          Min Zhou added a comment -

          select(1, '2', 3)
          select(2, '2', 3)
          select(1, true, 3)
          select(2, 2.0, cast(3 as double))

          if we don't uniformly return strings, it would be confused to user detemining which type will return.

          Show
          Min Zhou added a comment - select(1, '2', 3) select(2, '2', 3) select(1, true, 3) select(2, 2.0, cast(3 as double)) if we don't uniformly return strings, it would be confused to user detemining which type will return.
          Hide
          Namit Jain added a comment -

          As I said before, the return type should be the same as would be for case

          select case x then '2' else 3
          select case x then '2' else 3
          select case x then true else 3
          select case x then 2.0 else cast(3 as double)

          The first argument's value is not important as far as the return type is concerned

          Show
          Namit Jain added a comment - As I said before, the return type should be the same as would be for case select case x then '2' else 3 select case x then '2' else 3 select case x then true else 3 select case x then 2.0 else cast(3 as double) The first argument's value is not important as far as the return type is concerned
          Hide
          Min Zhou added a comment -

          If you inspected the implementation of case, you will know it's unacceptable to case with different types of arguments.
          see: GenericUDFCase.java , GenericUDFWhen.java

          hive> select case when true then '2' else 3 end from pokes limit 1;
          FAILED: Error in semantic analysis: line 1:36 Argument Type Mismatch 3: The expression after ELSE should have the same type as those after THEN: "string" is expected but "int" is found
          

          elt is a string function, confusion will be caused if we casually change its behavior. It no need make things more complex.

          Show
          Min Zhou added a comment - If you inspected the implementation of case, you will know it's unacceptable to case with different types of arguments. see: GenericUDFCase.java , GenericUDFWhen.java hive> select case when true then '2' else 3 end from pokes limit 1; FAILED: Error in semantic analysis: line 1:36 Argument Type Mismatch 3: The expression after ELSE should have the same type as those after THEN: "string" is expected but " int " is found elt is a string function, confusion will be caused if we casually change its behavior. It no need make things more complex.
          Hide
          Namit Jain added a comment -

          OK

          +1

          I will commit it if the tests pass

          Show
          Namit Jain added a comment - OK +1 I will commit it if the tests pass
          Hide
          Namit Jain added a comment -

          Committed. Thanks Min

          Show
          Namit Jain added a comment - Committed. Thanks Min

            People

            • Assignee:
              Min Zhou
              Reporter:
              Min Zhou
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development