HIVE-5506

Hive SPLIT function does not return array correctly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 0.10.0, 0.11.0
    • Fix Version/s: 0.13.0
    • Component/s: SQL, UDF
    • Labels:
      None
    • Environment:

      Hive

    • Tags:
      HIVE SPLIT UDF

      Description

      Hello all, I think I have outlined a bug in the hive split function:

      Summary: When calling split on a string, it returns all array items only if the last item has a value. For example, if I have a tab-delimited string with 7 columns where the first four are filled but the last three are blank, split returns only a 4-element array. If any number of "middle" columns are empty but the last item still has a value, it returns the proper number of columns. This was tested in Hive 0.9 and Hive 0.11.

      Data:
      (Note: \t represents a tab character, \x09. The line endings should be \n (UNIX style); I'm not sure what email will do to them.) Basically, my data is 7 lines containing the first 7 letters of the alphabet separated by tabs. On some lines I've left out certain letters but kept the number of tabs exactly the same.

      input.txt
      a\tb\tc\td\te\tf\tg
      a\tb\tc\td\te\t\tg
      a\tb\t\td\t\tf\tg
      \t\t\td\te\tf\tg
      a\tb\tc\td\t\t\t
      a\t\t\t\te\tf\tg
      a\t\t\td\t\t\tg
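Because literal tabs are easy to lose when copying the listing above, the file can also be generated. This is a minimal sketch assuming Python 3; the names `ROWS` and `write_input` are illustrative and do not come from the original report.

```python
# The seven rows from the report; "" marks a column left blank.
ROWS = [
    ["a", "b", "c", "d", "e", "f", "g"],
    ["a", "b", "c", "d", "e", "", "g"],
    ["a", "b", "", "d", "", "f", "g"],
    ["", "", "", "d", "e", "f", "g"],
    ["a", "b", "c", "d", "", "", ""],
    ["a", "", "", "", "e", "f", "g"],
    ["a", "", "", "d", "", "", "g"],
]


def write_input(path):
    """Write ROWS as tab-separated lines with UNIX (\\n) line endings."""
    with open(path, "w", newline="\n") as f:
        for row in ROWS:
            # Joining 7 fields always yields exactly 6 tabs per line,
            # including the trailing tabs on the fifth row.
            f.write("\t".join(row) + "\n")
```

Calling `write_input("/tmp/input.txt")` would produce the file at the path used by the LOAD DATA statement in this report.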

      I then created a table with one column from that data:

      DROP TABLE tmp_jo_tab_test;
      CREATE table tmp_jo_tab_test (message_line STRING)
      STORED AS TEXTFILE;

      LOAD DATA LOCAL INPATH '/tmp/input.txt'
      OVERWRITE INTO TABLE tmp_jo_tab_test;

      To validate, I created a Python counting script:

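      #!/usr/bin/python

      import sys

      # Print the number of tab-separated fields on each input line.
      for line in sys.stdin:
          line = line.rstrip("\n")  # drop the newline, keep trailing tabs
          out = line.split("\t")
          print(len(out))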

      The output is:
      $ cat input.txt | ./cnt_tabs.py
      7
      7
      7
      7
      7
      7
      7

      Based on that, splitting on tab in Hive should also return 7 for each line:

      hive -e "select size(split(message_line, '\\t')) from tmp_jo_tab_test;"

      7
      7
      7
      7
      4
      7
      7

      However, it does not. The line where only the first four letters are filled in (and blanks are passed for the last three) returns only 4 splits, where there should be 7: four for the letters included and three blanks.

      a\tb\tc\td\t\t\t
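The pattern in the Hive output is what you would expect from a split that discards trailing empty strings, which is how Java's `String.split(regex)` behaves when no limit is given (a negative limit keeps them). Whether that is the actual cause inside Hive's split UDF is an assumption here; the report does not state it. A minimal Python sketch comparing the two behaviours follows. `split_drop_trailing` is a hypothetical helper written for this illustration, not a Hive or Java API.

```python
def split_drop_trailing(line, sep="\t"):
    """Split on sep, then discard trailing empty fields.

    Emulates a split that drops trailing empty strings. A line that
    does not contain sep at all is returned as a single field.
    """
    if sep not in line:
        return [line]
    parts = line.split(sep)
    while parts and parts[-1] == "":
        parts.pop()
    return parts


# The seven lines from input.txt, with real tab characters.
LINES = [
    "a\tb\tc\td\te\tf\tg",
    "a\tb\tc\td\te\t\tg",
    "a\tb\t\td\t\tf\tg",
    "\t\t\td\te\tf\tg",
    "a\tb\tc\td\t\t\t",
    "a\t\t\t\te\tf\tg",
    "a\t\t\td\t\t\tg",
]

for line in LINES:
    # Left: plain split (keeps trailing blanks). Right: trailing blanks dropped.
    print(len(line.split("\t")), len(split_drop_trailing(line)))
```

Only the fifth line differs between the two columns (7 versus 4), which matches the counts reported above. Leading and middle blanks are kept in both cases.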

      1. HIVE-5506.2.patch
        3 kB
        Vikram Dixit K
      2. HIVE-5506.1.patch
        3 kB
        Vikram Dixit K

        Activity

        Vikram Dixit K added a comment -

        This should fix this issue.

        Vikram Dixit K added a comment -

        Missed the input file.

        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12608605/HIVE-5506.1.patch

        ERROR: -1 due to 1 failed/errored test(s), 4412 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_split
        

        Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1136/testReport
        Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1136/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests failed with: TestsFailedException: 1 tests failed
        

        This message is automatically generated.

        Vikram Dixit K added a comment -

        Fix for failing test case. Golden file updated.

        Vikram Dixit K added a comment -

        Missed the input file again.

        Ashutosh Chauhan added a comment -

        +1

        Hive QA added a comment -

        Overall: +1 all checks pass

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12608816/HIVE-5506.2.patch

        SUCCESS: +1 4412 tests passed

        Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1151/testReport
        Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1151/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        

        This message is automatically generated.

        Ashutosh Chauhan added a comment -

        Committed to trunk. Thanks, Vikram!

        Hudson added a comment -

        FAILURE: Integrated in Hive-trunk-h0.21 #2416 (See https://builds.apache.org/job/Hive-trunk-h0.21/2416/)
        HIVE-5506 : Hive SPLIT function does not return array correctly (Vikram Dixit via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1534775)

        • /hive/trunk/data/files/input.txt
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSplit.java
        • /hive/trunk/ql/src/test/queries/clientpositive/split.q
        • /hive/trunk/ql/src/test/results/clientpositive/split.q.out
        • /hive/trunk/ql/src/test/results/clientpositive/udf_split.q.out
        Hudson added a comment -

        FAILURE: Integrated in Hive-trunk-hadoop2-ptest #151 (See https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/151/)
        HIVE-5506 : Hive SPLIT function does not return array correctly (Vikram Dixit via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1534775)

        • /hive/trunk/data/files/input.txt
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSplit.java
        • /hive/trunk/ql/src/test/queries/clientpositive/split.q
        • /hive/trunk/ql/src/test/results/clientpositive/split.q.out
        • /hive/trunk/ql/src/test/results/clientpositive/udf_split.q.out
        Hudson added a comment -

        SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #214 (See https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/214/)
        HIVE-5506 : Hive SPLIT function does not return array correctly (Vikram Dixit via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1534775)

        • /hive/trunk/data/files/input.txt
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSplit.java
        • /hive/trunk/ql/src/test/queries/clientpositive/split.q
        • /hive/trunk/ql/src/test/results/clientpositive/split.q.out
        • /hive/trunk/ql/src/test/results/clientpositive/udf_split.q.out
        Hudson added a comment -

        ABORTED: Integrated in Hive-trunk-hadoop2 #518 (See https://builds.apache.org/job/Hive-trunk-hadoop2/518/)
        HIVE-5506 : Hive SPLIT function does not return array correctly (Vikram Dixit via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1534775)

        • /hive/trunk/data/files/input.txt
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSplit.java
        • /hive/trunk/ql/src/test/queries/clientpositive/split.q
        • /hive/trunk/ql/src/test/results/clientpositive/split.q.out
        • /hive/trunk/ql/src/test/results/clientpositive/udf_split.q.out

          People

          • Assignee:
            Vikram Dixit K
          • Reporter:
            John Omernik
          • Votes:
            0
          • Watchers:
            5
