Hadoop Common / HADOOP-6623

Add StringUtils.split for non-escaped single-character separator

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: util
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This is for HDFS-1028, but it is useful generally. String.split("/"), for example, is much slower than an implementation specialized for single-character separators.
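      As a rough illustration of where the speedup can come from: String.split interprets its argument as a regular expression, while a single-character splitter only needs String.indexOf(char) and substring. The sketch below is a minimal version under that assumption. The class name CharSplit is hypothetical and this is not the patch attached to this issue. It tries to mimic String.split's handling of empty tokens (leading empties kept, trailing empties dropped, and "" yielding one empty token).

```java
import java.util.ArrayList;
import java.util.List;

public class CharSplit {
  /**
   * Hypothetical splitter: splits str around every occurrence of separator,
   * with no escaping and no regular expressions. Like String.split(regex)
   * with limit 0, leading empty tokens are kept and trailing ones dropped.
   */
  public static String[] split(String str, char separator) {
    // "".split("/") yields a single empty string; match that.
    if (str.isEmpty()) {
      return new String[] {""};
    }
    List<String> parts = new ArrayList<String>();
    int start = 0;
    int next;
    // indexOf(char) is a plain linear scan; no Pattern/Matcher objects.
    while ((next = str.indexOf(separator, start)) != -1) {
      parts.add(str.substring(start, next));
      start = next + 1;
    }
    parts.add(str.substring(start));
    // Drop trailing empty tokens, as String.split does with limit 0.
    int last = parts.size();
    while (last > 0 && parts.get(last - 1).isEmpty()) {
      last--;
    }
    return parts.subList(0, last).toArray(new String[last]);
  }

  public static void main(String[] args) {
    String[] parts = split("/user/todd/file", '/');
    System.out.println(java.util.Arrays.toString(parts)); // [, user, todd, file]
  }
}
```

      Keeping the output identical to String.split for single-character separators means callers can switch over without changing how they treat empty path components.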

      1. hadoop-6623.txt
        3 kB
        Todd Lipcon
      2. hadoop-6623.txt
        3 kB
        Todd Lipcon


          Activity

          Todd Lipcon added a comment -

          Simple implementation similar to the existing escapable StringUtils.split. I have not yet benchmarked it, but my guess is that it's faster than String.split.
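          For contrast, an escape-aware splitter has to inspect every character to decide whether a separator is escaped, and it typically accumulates each token in a buffer instead of taking a substring. The sketch below is a hypothetical EscapedSplit.split(str, escapeChar, separator), written under the assumption that escape characters are copied into the output tokens unchanged and only unescaped separators split. It only illustrates the extra per-character work and is not the source of Hadoop's existing escapable StringUtils.split.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapedSplit {
  /**
   * Hypothetical escape-aware splitter: a separator preceded by escapeChar
   * does not split. Escape characters are kept in the tokens as-is.
   * Trailing empty tokens are dropped.
   */
  public static String[] split(String str, char escapeChar, char separator) {
    List<String> parts = new ArrayList<String>();
    StringBuilder token = new StringBuilder();
    boolean escaped = false;
    for (int i = 0; i < str.length(); i++) {
      char c = str.charAt(i);
      if (escaped) {
        // Previous char was the escape: take this one literally.
        token.append(c);
        escaped = false;
      } else if (c == escapeChar) {
        token.append(c);
        escaped = true;
      } else if (c == separator) {
        // Unescaped separator: finish the current token.
        parts.add(token.toString());
        token.setLength(0);
      } else {
        token.append(c);
      }
    }
    parts.add(token.toString());
    int last = parts.size();
    while (last > 0 && parts.get(last - 1).isEmpty()) {
      last--;
    }
    return parts.subList(0, last).toArray(new String[last]);
  }

  public static void main(String[] args) {
    // Input a\,b,c with '\' as escape and ',' as separator.
    String[] parts = split("a\\,b,c", '\\', ',');
    System.out.println(java.util.Arrays.toString(parts)); // [a\,b, c]
  }
}
```

          Dropping the escape check lets the non-escaped variant hand the scanning to String.indexOf and avoid the per-token StringBuilder copy.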

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12438086/hadoop-6623.txt
          against trunk revision 918880.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/412/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/412/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/412/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/412/console

          This message is automatically generated.

          Tom White added a comment -

          Looks good. Since this is an optimization, we should really have a benchmark showing a speed improvement.

          Todd Lipcon added a comment -

          Attached is an updated patch that adds a main() function to TestStringUtils, which acts as a benchmark. Benchmark results:

          Java impl #4:1339ms
          Java impl #5:1095ms
          Java impl #6:1257ms
          Java impl #7:1386ms
          Java impl #8:1470ms
          Java impl #9:1467ms
          StringUtils impl #4:274ms
          StringUtils impl #5:274ms
          StringUtils impl #6:274ms
          StringUtils impl #7:277ms
          StringUtils impl #8:289ms
          StringUtils impl #9:291ms

          If I double the number of separators in the test string to 10, results are:

          Java impl #4:1407ms
          Java impl #5:1411ms
          Java impl #6:1449ms
          Java impl #7:1443ms
          Java impl #8:1641ms
          Java impl #9:1409ms
          StringUtils impl #4:347ms
          StringUtils impl #5:347ms
          StringUtils impl #6:346ms
          StringUtils impl #7:347ms
          StringUtils impl #8:355ms
          StringUtils impl #9:346ms
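          The benchmark code itself is not reproduced here. A minimal timing loop in the same spirit might look like the sketch below. The class name SplitBenchmark, the iteration count, the test strings, and the number of rounds are all assumptions, and split() is a stand-in single-character splitter rather than the committed StringUtils.split. It runs both implementations against strings with 5 and 10 separators, mirroring the two result sets above; absolute timings will depend on the machine and JVM.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitBenchmark {
  // Stand-in single-character splitter (hypothetical, not Hadoop's source).
  static String[] split(String str, char sep) {
    if (str.isEmpty()) {
      return new String[] {""};
    }
    List<String> parts = new ArrayList<String>();
    int start = 0;
    int next;
    while ((next = str.indexOf(sep, start)) != -1) {
      parts.add(str.substring(start, next));
      start = next + 1;
    }
    parts.add(str.substring(start));
    int last = parts.size();
    while (last > 0 && parts.get(last - 1).isEmpty()) {
      last--; // drop trailing empty tokens, like String.split
    }
    return parts.subList(0, last).toArray(new String[last]);
  }

  // Builds a path-like test string containing exactly n '/' separators.
  static String makePath(int n) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < n; i++) {
      sb.append('/').append("dir").append(i);
    }
    return sb.toString();
  }

  // Times iters calls of String.split("/") in ms. Token counts go into
  // sink[0] so the JIT has a harder time eliminating the loop.
  static long timeJava(String s, int iters, long[] sink) {
    long t0 = System.nanoTime();
    for (int i = 0; i < iters; i++) {
      sink[0] += s.split("/").length;
    }
    return (System.nanoTime() - t0) / 1000000L;
  }

  // Same loop, using the single-character splitter above.
  static long timeCustom(String s, int iters, long[] sink) {
    long t0 = System.nanoTime();
    for (int i = 0; i < iters; i++) {
      sink[0] += split(s, '/').length;
    }
    return (System.nanoTime() - t0) / 1000000L;
  }

  public static void main(String[] args) {
    int iters = 1000000; // assumed iteration count per round
    long[] sink = new long[1];
    for (int seps : new int[] {5, 10}) {
      String s = makePath(seps);
      System.out.println(seps + " separators:");
      // Several rounds per implementation give the JIT a chance to warm up.
      for (int round = 0; round < 6; round++) {
        System.out.println("  Java impl #" + round + ": " + timeJava(s, iters, sink) + "ms");
      }
      for (int round = 0; round < 6; round++) {
        System.out.println("  Custom impl #" + round + ": " + timeCustom(s, iters, sink) + "ms");
      }
    }
    System.out.println("checksum: " + sink[0]);
  }
}
```

          A dedicated harness such as JMH would give more reliable numbers than hand-rolled System.nanoTime loops, but a loop like this is enough to show an order-of-magnitude difference.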

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12442970/hadoop-6623.txt
          against trunk revision 938590.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/483/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/483/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/483/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/483/console

          This message is automatically generated.

          Tom White added a comment -

          I've just committed this. Thanks Todd!


            People

            • Assignee: Todd Lipcon
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 1
