substr on string containing UTF-8 characters produces StringIndexOutOfBoundsException

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels: None

      Description

      After HIVE-2792, the substr function throws a StringIndexOutOfBoundsException when it is called without the length argument on a string containing UTF-8 characters.

      E.g.
      select substr(str, 1) from table1;

      now fails with that exception if str contains a UTF-8 character for any row in the table.
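The failure mode can be reproduced outside Hive. The following is a minimal sketch, not the UDFSubstr code itself: it assumes the end index was taken from a byte-oriented length (Hadoop's Text reports its length in UTF-8 bytes) while the substring was taken on a Java String, whose length counts chars. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class SubstrLengthMismatch {
    // Number of bytes in the UTF-8 encoding of s. This stands in for the
    // length a byte-oriented container would report.
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Assumed faulty pattern: 1-based start position, but the end index is
    // the UTF-8 byte length rather than the String length.
    static String substrWithByteLength(String s, int pos) {
        return s.substring(pos - 1, utf8ByteLength(s));
    }

    public static void main(String[] args) {
        String ascii = "cafe";      // 4 chars, 4 UTF-8 bytes
        String utf8 = "caf\u00e9";  // 4 chars, 5 UTF-8 bytes (the last char takes 2 bytes)

        // Works: byte length and char length agree for pure ASCII.
        System.out.println(substrWithByteLength(ascii, 1));

        // Fails: end index 5 exceeds the String length 4.
        try {
            substrWithByteLength(utf8, 1);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("StringIndexOutOfBoundsException for non-ASCII input");
        }
    }
}
```

Pure ASCII rows never trigger the problem, which is consistent with the query failing only when some row contains a multi-byte character.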

        Activity

        Phabricator added a comment -

        kevinwilfong requested code review of "HIVE-2942 [jira] substr on string containing UTF-8 characters produces StringIndexOutOfBoundsException".
        Reviewers: JIRA

        https://issues.apache.org/jira/browse/HIVE-2942

        Fixed UDFSubstr so that, for strings, substr now succeeds when the input contains a UTF-8 character, by using the String length instead of the Text length.

        Also, updated QTestUtil so that we can now write tests which include UTF-8 characters.
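The idea behind the fix, using the String length for bounds, can be sketched as follows. This is an illustration under stated assumptions, not the committed UDFSubstr change: the helper name is hypothetical, and the position handling (1-based, 0 treated as 1, negative values counting from the end) is an assumed convention for the sketch.

```java
public class SubstrSketch {
    // Hypothetical helper: returns the tail of s starting at 1-based position pos.
    // All bounds are derived from s.length(), so they always match the indices
    // that String.substring expects.
    static String substr(String s, int pos) {
        int len = s.length();
        if (Math.abs(pos) > len) {
            return ""; // position lies outside the string
        }
        int start;
        if (pos > 0) {
            start = pos - 1;   // 1-based to 0-based
        } else if (pos == 0) {
            start = 0;         // assumed: 0 behaves like 1
        } else {
            start = len + pos; // negative positions count from the end
        }
        return s.substring(start, len);
    }

    public static void main(String[] args) {
        String str = "caf\u00e9";
        System.out.println(substr(str, 1));
        System.out.println(substr(str, 4));
        System.out.println(substr(str, -1));
    }
}
```

Note that String.length() counts UTF-16 code units rather than code points, so a character outside the Basic Multilingual Plane counts as two. The indices stay consistent with substring either way, so no exception is thrown.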

        TEST PLAN
        EMPTY

        REVISION DETAIL
        https://reviews.facebook.net/D2727

        AFFECTED FILES
        ql/src/test/results/clientpositive/udf_substr.q.out
        ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java
        ql/src/test/queries/clientpositive/udf_substr.q
        ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java

        Phabricator added a comment -

        kevinwilfong has commented on the revision "HIVE-2942 [jira] substr on string containing UTF-8 characters produces StringIndexOutOfBoundsException".

        Verified all the tests still pass.

        REVISION DETAIL
        https://reviews.facebook.net/D2727

        Phabricator added a comment -

        pauly has accepted the revision "HIVE-2942 [jira] substr on string containing UTF-8 characters produces StringIndexOutOfBoundsException".

        +1

        REVISION DETAIL
        https://reviews.facebook.net/D2727

        BRANCH
        svn

        Namit Jain added a comment -

        +1

        running tests

        Paul Yang added a comment -

        Whoops, forgot to mention that I was running the tests too - sorry Namit. They passed, and I committed. Thanks Kevin!

        Hudson added a comment -

        Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
        HIVE-2942. substr on string containing UTF-8 characters produces StringIndexOutOfBoundsException (Kevin Wilfong via pauly) (Revision 1326444)

        Result = ABORTED
        pauly : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1326444
        Files :

        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java
        • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java
        • /hive/trunk/ql/src/test/queries/clientpositive/udf_substr.q
        • /hive/trunk/ql/src/test/results/clientpositive/udf_substr.q.out
        Ashutosh Chauhan added a comment -

        This issue was fixed and released as part of the 0.10.0 release. If you find an issue that seems to be related to this one, please create a new JIRA and link it to this one.

          People

          • Assignee: Kevin Wilfong
          • Reporter: Kevin Wilfong
          • Votes: 0
          • Watchers: 2
