Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0, 2.0.0
    • Component/s: UDF
    • Labels:
      None

      Description

      SUBSTRING_INDEX(str,delim,count)

      Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. SUBSTRING_INDEX() performs a case-sensitive match when searching for delim.
      Examples:

      SELECT SUBSTRING_INDEX('www.mysql.com', '.', 3);
      --www.mysql.com
      SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
      --www.mysql
      SELECT SUBSTRING_INDEX('www.mysql.com', '.', 1);
      --www
      SELECT SUBSTRING_INDEX('www.mysql.com', '.', 0);
      --''
      SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1);
      --com
      SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
      --mysql.com
      SELECT SUBSTRING_INDEX('www.mysql.com', '.', -3);
      --www.mysql.com
      
      
      --#delim does not exist in str
      SELECT SUBSTRING_INDEX('www.mysql.com', 'Q', 1);
      --www.mysql.com
      
      --#delim is 2 chars
      SELECT SUBSTRING_INDEX('www||mysql||com', '||', 2);
      --www||mysql
      
      --#delim is empty string
      SELECT SUBSTRING_INDEX('www.mysql.com', '', 2);
      --''
      
      --#str is empty string
      SELECT SUBSTRING_INDEX('', '.', 2);
      --''
      
      
      --#null params
      SELECT SUBSTRING_INDEX(null, '.', 1);
      --null
      SELECT SUBSTRING_INDEX('www.mysql.com', null, 1);
      --null
      SELECT SUBSTRING_INDEX('www.mysql.com', '.', null);
      --null
      
      1. HIVE-686.patch
        10 kB
        Larry Ogrodnek
      2. HIVE-686.patch
        11 kB
        Larry Ogrodnek
      3. HIVE-686.1.patch
        18 kB
        Alexander Pivovarov
      4. HIVE-686.1.patch
        18 kB
        Alexander Pivovarov

        Issue Links

          Activity

          Hide
          Larry Ogrodnek added a comment -

          I've attached code for a hive UDF that implements the substring_index function described, as well as some unit tests.

          Please review. Thanks.

          Show
          Larry Ogrodnek added a comment - I've attached code for a hive UDF that implements the substring_index function described, as well as some unit tests. Please review. Thanks.
          Hide
          Namit Jain added a comment -

          Please run all the tests - you should get a diff in show_functions.q

          Show
          Namit Jain added a comment - Please run all the tests - you should get a diff in show_functions.q
          Hide
          Larry Ogrodnek added a comment -

          attached a new patch. thanks.

          Show
          Larry Ogrodnek added a comment - attached a new patch. thanks.
          Hide
          Namit Jain added a comment -

          Can you regenerate the patch - you need to include the new file in FunctionRegistry -
          udf.* is not included anymore.

          Show
          Namit Jain added a comment - Can you regenerate the patch - you need to include the new file in FunctionRegistry - udf.* is not included anymore.
          Hide
          CHEN GEN added a comment -

          BUG: this funtion is not support
          substring_index("www.test.com","test",-1)="com"
          FIX Suggestion:
          last line:
          r.set(input.substring(k + delim.length()));

          Show
          CHEN GEN added a comment - BUG: this funtion is not support substring_index("www.test.com","test",-1)="com" FIX Suggestion: last line: r.set(input.substring(k + delim.length()));
          Hide
          Alexander Pivovarov added a comment -

          I think this function can be replaced with regexp_extract in many cases, e.g.

          --get dir from path
          select regexp_extract('/aaa/bbb/ccc.txt', '(.*)/(.*)', 1);
          /aaa/bbb
          
          --get filename from path
          select regexp_extract('/aaa/bbb/ccc.txt', '(.*)/(.*)', 2);
          ccc.txt
          
          Show
          Alexander Pivovarov added a comment - I think this function can be replaced with regexp_extract in many cases, e.g. --get dir from path select regexp_extract('/aaa/bbb/ccc.txt', '(.*)/(.*)', 1); /aaa/bbb --get filename from path select regexp_extract('/aaa/bbb/ccc.txt', '(.*)/(.*)', 2); ccc.txt
          Hide
          Alexander Pivovarov added a comment -

          patch #1

          • derive substring_index from GenericUDF
          • add Junit and qtest tests
          Show
          Alexander Pivovarov added a comment - patch #1 derive substring_index from GenericUDF add Junit and qtest tests
          Hide
          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12735517/HIVE-686.1.patch

          ERROR: -1 due to 57 failed/errored test(s), 8948 tests executed
          Failed tests:

          TestContribCliDriver - did not produce a TEST-*.xml file
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_null_element
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_multi_field_struct
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_optional_elements
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_required_elements
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_single_field_struct
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_structs
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_groups
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_primitives
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_primitives
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_single_field_struct
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_create
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal1
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_null
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_arrays_of_ints
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_maps
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_nested_complex
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_read_backward_compatible_files
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_schema_evolution
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_primitives
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_single_field_struct
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types
          org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
          org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAmbiguousSingleFieldGroupInList
          org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroPrimitiveInList
          org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroSingleFieldGroupInList
          org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testHiveRequiredGroupInList
          org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testMultiFieldGroupInList
          org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewOptionalGroupInList
          org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewRequiredGroupInList
          org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftPrimitiveInList
          org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftSingleFieldGroupInList
          org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testUnannotatedListOfGroups
          org.apache.hadoop.hive.ql.io.parquet.TestDataWritableWriter.testSimpleType
          org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testDoubleMapWithStructValue
          org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testMapWithComplexKey
          org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testNestedMap
          org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalArray
          org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalIntArray
          org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOptionalPrimitive
          org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapRequiredPrimitive
          org.apache.hadoop.hive.ql.io.parquet.TestParquetSerDe.testParquetHiveSerDe
          org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testEmptyContainer
          org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testNullContainer
          org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testRegularMap
          org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testEmptyContainer
          org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testNullContainer
          org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testRegularMap
          org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetHiveArrayInspector.testEmptyContainer
          org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetHiveArrayInspector.testNullContainer
          org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetHiveArrayInspector.testRegularList
          org.apache.hadoop.hive.ql.io.parquet.serde.TestStandardParquetHiveMapInspector.testEmptyContainer
          org.apache.hadoop.hive.ql.io.parquet.serde.TestStandardParquetHiveMapInspector.testNullContainer
          org.apache.hadoop.hive.ql.io.parquet.serde.TestStandardParquetHiveMapInspector.testRegularMap
          org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4059/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4059/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4059/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 57 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12735517 - PreCommit-HIVE-TRUNK-Build

          Show
          Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12735517/HIVE-686.1.patch ERROR: -1 due to 57 failed/errored test(s), 8948 tests executed Failed tests: TestContribCliDriver - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_null_element org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_multi_field_struct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_optional_elements org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_required_elements org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_single_field_struct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_structs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_groups org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_array_of_unannotated_primitives org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_primitives org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_avro_array_of_single_field_struct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_create org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_null org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_arrays_of_ints org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_map_of_maps org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_nested_complex org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_read_backward_compatible_files org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_schema_evolution org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_primitives org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_thrift_array_of_single_field_struct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAmbiguousSingleFieldGroupInList org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroPrimitiveInList org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testAvroSingleFieldGroupInList org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testHiveRequiredGroupInList org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testMultiFieldGroupInList org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewOptionalGroupInList org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testNewRequiredGroupInList org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftPrimitiveInList org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testThriftSingleFieldGroupInList org.apache.hadoop.hive.ql.io.parquet.TestArrayCompatibility.testUnannotatedListOfGroups org.apache.hadoop.hive.ql.io.parquet.TestDataWritableWriter.testSimpleType org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testDoubleMapWithStructValue org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testMapWithComplexKey org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testNestedMap org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalArray org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOfOptionalIntArray org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapOptionalPrimitive org.apache.hadoop.hive.ql.io.parquet.TestMapStructures.testStringMapRequiredPrimitive org.apache.hadoop.hive.ql.io.parquet.TestParquetSerDe.testParquetHiveSerDe org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testEmptyContainer org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testNullContainer org.apache.hadoop.hive.ql.io.parquet.serde.TestAbstractParquetMapInspector.testRegularMap org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testEmptyContainer org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testNullContainer org.apache.hadoop.hive.ql.io.parquet.serde.TestDeepParquetHiveMapInspector.testRegularMap org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetHiveArrayInspector.testEmptyContainer org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetHiveArrayInspector.testNullContainer org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetHiveArrayInspector.testRegularList org.apache.hadoop.hive.ql.io.parquet.serde.TestStandardParquetHiveMapInspector.testEmptyContainer org.apache.hadoop.hive.ql.io.parquet.serde.TestStandardParquetHiveMapInspector.testNullContainer org.apache.hadoop.hive.ql.io.parquet.serde.TestStandardParquetHiveMapInspector.testRegularMap org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4059/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4059/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4059/ Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 57 tests failed This message is automatically generated. ATTACHMENT ID: 12735517 - PreCommit-HIVE-TRUNK-Build
          Hide
          Alexander Pivovarov added a comment -

          prev builds - 4057, 4058 have 55 failed tests as well

          Show
          Alexander Pivovarov added a comment - prev builds - 4057, 4058 have 55 failed tests as well
          Hide
          Sergio Peña added a comment -

          +1

          You may upload another patch to re-rerun the tests. Those failed tests are fixed now.

          Show
          Sergio Peña added a comment - +1 You may upload another patch to re-rerun the tests. Those failed tests are fixed now.
          Hide
          Alexander Pivovarov added a comment -

          reattaching path #1 to rerun tests

          Show
          Alexander Pivovarov added a comment - reattaching path #1 to rerun tests
          Hide
          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12736253/HIVE-686.1.patch

          ERROR: -1 due to 5 failed/errored test(s), 8989 tests executed
          Failed tests:

          TestUtil - did not produce a TEST-*.xml file
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4110/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4110/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4110/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 5 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12736253 - PreCommit-HIVE-TRUNK-Build

          Show
          Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12736253/HIVE-686.1.patch ERROR: -1 due to 5 failed/errored test(s), 8989 tests executed Failed tests: TestUtil - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_fold_case org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4110/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4110/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4110/ Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed This message is automatically generated. ATTACHMENT ID: 12736253 - PreCommit-HIVE-TRUNK-Build
          Hide
          Alexander Pivovarov added a comment -

          Committed to master. Thank you Sergio for your review!

          Show
          Alexander Pivovarov added a comment - Committed to master. Thank you Sergio for your review!
          Hide
          Alexander Pivovarov added a comment -
          Show
          Alexander Pivovarov added a comment - Added the function description to the Language Manual https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-StringFunctions

            People

            • Assignee:
              Alexander Pivovarov
              Reporter:
              Namit Jain
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development