Hive
  1. Hive
  2. HIVE-2418

replace or translate function in hive

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.10.0
    • Component/s: UDF
    • Labels:
    • Environment:

      hive-0.7.0

    • Hadoop Flags:
      Incompatible change
    • Tags:
      replace or translate function

      Description

      replace or translate function in hive

      1. udf_translate_v3_with_3_negative_tests.patch
        27 kB
        Mark Grover
      2. udf_translate_v3_with_1_negative_test.patch
        26 kB
        Mark Grover
      3. udf_translate_v2_with_3_negative_tests.patch
        27 kB
        Mark Grover
      4. udf_translate_v2_with_1_negative_test.patch
        26 kB
        Mark Grover
      5. udf_translate_v1.patch
        17 kB
        Mark Grover
      6. hive-2418.1.patch.txt
        26 kB
        Edward Capriolo

        Activity

        Hide
        Edward Capriolo added a comment -

        What would the function do more details are needed.

        Show
        Edward Capriolo added a comment - What would the function do more details are needed.
        Hide
        Joey Echeverria added a comment -

        Perhaps they meant a function similar to Postgres's translate function:

        http://www.postgresql.org/docs/9.1/interactive/functions-string.html

        Show
        Joey Echeverria added a comment - Perhaps they meant a function similar to Postgres's translate function: http://www.postgresql.org/docs/9.1/interactive/functions-string.html
        Hide
        Edward Capriolo added a comment -

        This would be a nice issue to tackle for someone looking to break into hive.

        Show
        Edward Capriolo added a comment - This would be a nice issue to tackle for someone looking to break into hive.
        Hide
        Mark Grover added a comment -

        Edward, I would be happy to work on this. Not sure how to assign this to myself but if you could do that, that would be much appreciated. Thanks!

        Show
        Mark Grover added a comment - Edward, I would be happy to work on this. Not sure how to assign this to myself but if you could do that, that would be much appreciated. Thanks!
        Hide
        Ashutosh Chauhan added a comment -

        Mark,
        Assigned to you. Hack away!

        Show
        Ashutosh Chauhan added a comment - Mark, Assigned to you. Hack away!
        Hide
        Mark Grover added a comment -

        The code and test .q files are ready but I am having some trouble generating the .q.out files for the tests.

        As a matter of fact, I am not even able to run the existing unit tests successfully. I believe it's because of the fact the source tables that these tests access e.g. "src" are not available. Do I have to edit some configuration or run some init script before I can run the test?
        The command I am using to run my test is something like (this runs an existing test and fails):
        ant -lib testlibs clean-test test -Dtestcase=TestCliDriver -Dqfile=udf_concat_ws.q

        with an exception:

        2012-06-03 19:33:21,399 ERROR exec.Task (SessionState.java:printError(400)) - Execution failed with exit status: 127
        2012-06-03 19:33:21,400 ERROR exec.Task (SessionState.java:printError(400)) - Obtaining error information
        2012-06-03 19:33:21,401 ERROR exec.Task (SessionState.java:printError(400)) -
        Task failed!
        Task ID:
        Stage-1

        Logs:

        2012-06-03 19:33:21,401 ERROR exec.Task (SessionState.java:printError(400)) - /home/mgrover/hive_install/src/trunk/build/ql/tmp//hive.log
        2012-06-03 19:33:21,402 ERROR exec.ExecDriver (MapRedTask.java:execute(286)) - Execution failed with exit status: 127
        2012-06-03 19:33:21,594 ERROR ql.Driver (SessionState.java:printError(400)) - FAILED: Execution Error, return code 127 from org.apache.hadoop.hive.ql.exec.MapRedTask

        Any thoughts?

        Show
        Mark Grover added a comment - The code and test .q files are ready but I am having some trouble generating the .q.out files for the tests. As a matter of fact, I am not even able to run the existing unit tests successfully. I believe it's because of the fact the source tables that these tests access e.g. "src" are not available. Do I have to edit some configuration or run some init script before I can run the test? The command I am using to run my test is something like (this runs an existing test and fails): ant -lib testlibs clean-test test -Dtestcase=TestCliDriver -Dqfile=udf_concat_ws.q with an exception: 2012-06-03 19:33:21,399 ERROR exec.Task (SessionState.java:printError(400)) - Execution failed with exit status: 127 2012-06-03 19:33:21,400 ERROR exec.Task (SessionState.java:printError(400)) - Obtaining error information 2012-06-03 19:33:21,401 ERROR exec.Task (SessionState.java:printError(400)) - Task failed! Task ID: Stage-1 Logs: 2012-06-03 19:33:21,401 ERROR exec.Task (SessionState.java:printError(400)) - /home/mgrover/hive_install/src/trunk/build/ql/tmp//hive.log 2012-06-03 19:33:21,402 ERROR exec.ExecDriver (MapRedTask.java:execute(286)) - Execution failed with exit status: 127 2012-06-03 19:33:21,594 ERROR ql.Driver (SessionState.java:printError(400)) - FAILED: Execution Error, return code 127 from org.apache.hadoop.hive.ql.exec.MapRedTask Any thoughts?
        Hide
        Mark Grover added a comment -

        Here is what I have so far. It's missing the .q.out files since I haven't been able to run the tests yet but I am posting it so folks can start the review.

        Show
        Mark Grover added a comment - Here is what I have so far. It's missing the .q.out files since I haven't been able to run the tests yet but I am posting it so folks can start the review.
        Hide
        Mark Grover added a comment -

        Here is what I have so far. It's missing the .q.out files since I haven't been able to run the tests yet but I am posting it so folks can start the review.

        Show
        Mark Grover added a comment - Here is what I have so far. It's missing the .q.out files since I haven't been able to run the tests yet but I am posting it so folks can start the review.
        Hide
        Edward Capriolo added a comment -

        127 might mean hive is not able to find java aka ... no JAVA_HOME is set. Code looks great so far. You do not need so many negative tests, tests take a long time to run. 1 or 0 is more then enough.

        Nice job so far.

        Show
        Edward Capriolo added a comment - 127 might mean hive is not able to find java aka ... no JAVA_HOME is set. Code looks great so far. You do not need so many negative tests, tests take a long time to run. 1 or 0 is more then enough. Nice job so far.
        Hide
        Mark Grover added a comment -

        Thanks, Ed. That fixed the issue!

        Show
        Mark Grover added a comment - Thanks, Ed. That fixed the issue!
        Hide
        Mark Grover added a comment -

        A new version of the patch with a minor bug fix related to complex argument being passed as a parameter.
        This has only 1 negative test.

        Show
        Mark Grover added a comment - A new version of the patch with a minor bug fix related to complex argument being passed as a parameter. This has only 1 negative test.
        Hide
        Mark Grover added a comment -

        A new version of the patch with a minor bug fix related to complex argument being passed as a parameter.
        This has 3 negative tests.

        Show
        Mark Grover added a comment - A new version of the patch with a minor bug fix related to complex argument being passed as a parameter. This has 3 negative tests.
        Hide
        Mark Grover added a comment -

        I uploaded 2 v2 patches. One has 3 negative tests, the only has 1. Take your pick on which one you'd like to deploy.

        Show
        Mark Grover added a comment - I uploaded 2 v2 patches. One has 3 negative tests, the only has 1. Take your pick on which one you'd like to deploy.
        Hide
        Edward Capriolo added a comment -

        Mark all looks good. 2 exceptions you need to remove the @author tag and follow the hive style rules mainly 2 spaces for an indent not tabs. There should be en eclipse style template to help with this.

        Show
        Edward Capriolo added a comment - Mark all looks good. 2 exceptions you need to remove the @author tag and follow the hive style rules mainly 2 spaces for an indent not tabs. There should be en eclipse style template to help with this.
        Hide
        Mark Grover added a comment -

        New version of the patch with minor changes related to formatting and checkstyle.

        With only one negative test.

        Show
        Mark Grover added a comment - New version of the patch with minor changes related to formatting and checkstyle. With only one negative test.
        Hide
        Mark Grover added a comment -

        New version of the patch with minor changes related to formatting and checkstyle.

        With three negative tests

        Show
        Mark Grover added a comment - New version of the patch with minor changes related to formatting and checkstyle. With three negative tests
        Hide
        Mark Grover added a comment -

        Thanks for reviewing, Ed. I have taken out the @author tag and fixed the indentation. Made some minor checkstyle related fixes too. Please let me know if there is anything else.

        Show
        Mark Grover added a comment - Thanks for reviewing, Ed. I have taken out the @author tag and fixed the indentation. Made some minor checkstyle related fixes too. Please let me know if there is anything else.
        Hide
        Edward Capriolo added a comment -

        @Mark sorry to make you go through a couple iterations because I did only did partial reviews. One last question/comment.

        Hive tries to avoid allocating new objects each evaluate if possible.

        I notice in your process input method.

        +  private String processInput(Text input) {
        +    StringBuilder resultBuilder = new StringBuilder();
        

        Then later.

        +    String resultString = resultBuilder.toString();
        +    return resultString;
        

        It is possible to change your return type to TEXT and declare the StringBuilder and result object outside the method, then each call to evaluate can clear these objects?

        If you do not believe coding the feature this way will create less temporary objects/garbage just say 'no', and I will commit as is.

        Also you need only provide the version with the single negative test if you commit a new version.

        Show
        Edward Capriolo added a comment - @Mark sorry to make you go through a couple iterations because I did only did partial reviews. One last question/comment. Hive tries to avoid allocating new objects each evaluate if possible. I notice in your process input method. + private String processInput(Text input) { + StringBuilder resultBuilder = new StringBuilder(); Then later. + String resultString = resultBuilder.toString(); + return resultString; It is possible to change your return type to TEXT and declare the StringBuilder and result object outside the method, then each call to evaluate can clear these objects? If you do not believe coding the feature this way will create less temporary objects/garbage just say 'no', and I will commit as is. Also you need only provide the version with the single negative test if you commit a new version.
        Hide
        Mark Grover added a comment -

        No worries, Ed. I thought about that too. However, StringBuilder doesn't have a clear() so in order to clear it (in a non-hacky way), we have to re-create it. Alternatively, if we use Text as the return type of processInput() and get rid of the StringBuilder object, we will have to call Text.append() to append new code points to the result. Text.append() takes a byte[] as parameter. However, all the ways I can think of right now to convert a char[] (containing the code point) to byte[] (for appending to the Text object) involve creating an immutable String object as an intermediary which kind of defeats the purpose of moving away from the StringBuilder object.

        Given this information, I think it's best if we keep the StringBuilder creation and the code the way it is. Thanks!

        Show
        Mark Grover added a comment - No worries, Ed. I thought about that too. However, StringBuilder doesn't have a clear() so in order to clear it (in a non-hacky way), we have to re-create it. Alternatively, if we use Text as the return type of processInput() and get rid of the StringBuilder object, we will have to call Text.append() to append new code points to the result. Text.append() takes a byte[] as parameter. However, all the ways I can think of right now to convert a char[] (containing the code point) to byte[] (for appending to the Text object) involve creating an immutable String object as an intermediary which kind of defeats the purpose of moving away from the StringBuilder object. Given this information, I think it's best if we keep the StringBuilder creation and the code the way it is. Thanks!
        Hide
        Edward Capriolo added a comment -

        +1 will commit if test pass.

        Show
        Edward Capriolo added a comment - +1 will commit if test pass.
        Hide
        Edward Capriolo added a comment -

        Removed negative test cases.

        Show
        Edward Capriolo added a comment - Removed negative test cases.
        Hide
        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1469 (See https://builds.apache.org/job/Hive-trunk-h0.21/1469/)
        HIVE-2418 Translate/Replace UDF (Mark Grover via egc) (Revision 1346933)

        Result = FAILURE
        ecapriolo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1346933
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTranslate.java
        • /hive/trunk/ql/src/test/queries/clientpositive/udf_translate.q
        • /hive/trunk/ql/src/test/results/clientpositive/show_functions.q.out
        • /hive/trunk/ql/src/test/results/clientpositive/udf_translate.q.out
        Show
        Hudson added a comment - Integrated in Hive-trunk-h0.21 #1469 (See https://builds.apache.org/job/Hive-trunk-h0.21/1469/ ) HIVE-2418 Translate/Replace UDF (Mark Grover via egc) (Revision 1346933) Result = FAILURE ecapriolo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1346933 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTranslate.java /hive/trunk/ql/src/test/queries/clientpositive/udf_translate.q /hive/trunk/ql/src/test/results/clientpositive/show_functions.q.out /hive/trunk/ql/src/test/results/clientpositive/udf_translate.q.out
        Hide
        Mark Grover added a comment -

        Seems like this failure is not related to the check-in and it was still committed (I see it on svn). So, I guess all is well? Also, I was thinking of documenting this UDF on the Hive UDF wiki page and putting a note saying it's available starting Hive 0.10. Is that OK? Or, is the convention to wait until the release is out and then document it? (If I can still remember to do so)

        Show
        Mark Grover added a comment - Seems like this failure is not related to the check-in and it was still committed (I see it on svn). So, I guess all is well? Also, I was thinking of documenting this UDF on the Hive UDF wiki page and putting a note saying it's available starting Hive 0.10. Is that OK? Or, is the convention to wait until the release is out and then document it? (If I can still remember to do so )
        Hide
        Edward Capriolo added a comment -

        @Mark yes. If anything is failing the last thing committed "fails" You are more then welcome to document anything you like in the wiki. You can put a caveat about the version if you like.

        Show
        Edward Capriolo added a comment - @Mark yes. If anything is failing the last thing committed "fails" You are more then welcome to document anything you like in the wiki. You can put a caveat about the version if you like.
        Hide
        Mark Grover added a comment -

        Thanks, Ed. Looks like Lars Francke beat me to it (love your dedication to documentation, Lars!). So, it's now appropriately documented at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions

        Show
        Mark Grover added a comment - Thanks, Ed. Looks like Lars Francke beat me to it (love your dedication to documentation, Lars!). So, it's now appropriately documented at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions
        Hide
        Hudson added a comment -

        Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
        HIVE-2418 Translate/Replace UDF (Mark Grover via egc) (Revision 1346933)

        Result = ABORTED
        ecapriolo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1346933
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTranslate.java
        • /hive/trunk/ql/src/test/queries/clientpositive/udf_translate.q
        • /hive/trunk/ql/src/test/results/clientpositive/show_functions.q.out
        • /hive/trunk/ql/src/test/results/clientpositive/udf_translate.q.out
        Show
        Hudson added a comment - Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/ ) HIVE-2418 Translate/Replace UDF (Mark Grover via egc) (Revision 1346933) Result = ABORTED ecapriolo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1346933 Files : /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTranslate.java /hive/trunk/ql/src/test/queries/clientpositive/udf_translate.q /hive/trunk/ql/src/test/results/clientpositive/show_functions.q.out /hive/trunk/ql/src/test/results/clientpositive/udf_translate.q.out
        Hide
        Ashutosh Chauhan added a comment -

        This issue is fixed and released as part of 0.10.0 release. If you find an issue which seems to be related to this one, please create a new jira and link this one with new jira.

        Show
        Ashutosh Chauhan added a comment - This issue is fixed and released as part of 0.10.0 release. If you find an issue which seems to be related to this one, please create a new jira and link this one with new jira.

          People

          • Assignee:
            Mark Grover
            Reporter:
            kranthikiran
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 96h
              96h
              Remaining:
              Remaining Estimate - 96h
              96h
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development