Uploaded image for project: 'Commons Lang'
  1. Commons Lang
  2. LANG-162

[lang] replace() length calculation improvement

Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 2.1
    • 2.2
    • None
    • None
    • Operating System: other
      Platform: Other

    • 39368

    Description

      static String replace(String text, String repl, String with, int max)

      PROBLEM:
      This method incurs the calculation of the search string ("repl") every time a
      match is found.

      int start = 0, end = 0;
      while ((end = text.indexOf(repl, start)) != -1)

      { buf.append(text.substring(start, end)).append(with); start = end + repl.length(); ... }

      PROPOSED SOLUTION:
      Move it out of the loop.

      int start = 0, end = 0, replLen = repl.length();
      while ((end = text.indexOf(repl, start)) != -1)

      { buf.append(text.substring(start, end)).append(with); start = end + replLen; ... }

      Attachments

        1. ASF.LICENSE.NOT.GRANTED--PerfTest.java
          0.5 kB
          Henri Yandell
        2. ASF.LICENSE.NOT.GRANTED--PerfTest.java
          0.6 kB
          Henri Yandell

        Activity

          ggregory@seagullsw.com ggregory@seagullsw.com added a comment -

          Since a String is immutable, I would hope the compiler if not the VM would
          already do this. IMO, unless this is a proven problem with data from a
          profiler, I do not do this type of optimization.

          ggregory@seagullsw.com ggregory@seagullsw.com added a comment - Since a String is immutable, I would hope the compiler if not the VM would already do this. IMO, unless this is a proven problem with data from a profiler, I do not do this type of optimization.
          bayard Henri Yandell added a comment -

          I'll take a look at it if Paul doesn't mention his evidence for optimisation - mostly just because profiling is
          fun and it'll be interesting to know if it is a speed issue.

          bayard Henri Yandell added a comment - I'll take a look at it if Paul doesn't mention his evidence for optimisation - mostly just because profiling is fun and it'll be interesting to know if it is a speed issue.
          pbenedict pbenedict added a comment -

          I won't hold anyone in suspense. I simply eye-balled this as an optimization
          because I was designing functionality where string replacement has heavy usage;
          so I took a look at the code. It's not critical and I am generally satisified
          with the code as-is, but putting in my 2cents in case it helps.

          pbenedict pbenedict added a comment - I won't hold anyone in suspense. I simply eye-balled this as an optimization because I was designing functionality where string replacement has heavy usage; so I took a look at the code. It's not critical and I am generally satisified with the code as-is, but putting in my 2cents in case it helps.
          bayard Henri Yandell added a comment -

          Created an attachment (id=18153)
          Simple perf test file

          bayard Henri Yandell added a comment - Created an attachment (id=18153) Simple perf test file
          bayard Henri Yandell added a comment -

          Not sure which source you're looking at, the latest HEAD is different, but does still contain the
          nested .length() call. Looking at svn log etc, looks like we have an optimisation in already for the next
          release (see COM-2395).

          I created a performance test (attached), and running it against both pre and post change jars, there's no
          obvious difference in the numbers I'm getting back.

          So marking this as a WONTFIX.

          bayard Henri Yandell added a comment - Not sure which source you're looking at, the latest HEAD is different, but does still contain the nested .length() call. Looking at svn log etc, looks like we have an optimisation in already for the next release (see COM-2395 ). I created a performance test (attached), and running it against both pre and post change jars, there's no obvious difference in the numbers I'm getting back. So marking this as a WONTFIX.
          sjr James Ring added a comment -

          I think it's silly to mark this as a WONTFIX, and the attached test case doesn't
          address the problem at hand.

          The test shows the call to length() being called only once per iteration of the
          main loop. If there are many matches, then length() will be called more times,
          and will add more overhead to the function. Of course, there's calls to
          append(), substring() and indexOf(), which probably dominate the running time of
          the function.

          All the same, why repeat a call multiple times when the result is always going
          to be the same? It seems the Sun Java compiler won't save you, because I haven't
          managed to get it to "optimize out" a repeated call to length() on an immutable
          string.

          The proposed solution is small and sensible and there's no good reason not to
          apply it.

          sjr James Ring added a comment - I think it's silly to mark this as a WONTFIX, and the attached test case doesn't address the problem at hand. The test shows the call to length() being called only once per iteration of the main loop. If there are many matches, then length() will be called more times, and will add more overhead to the function. Of course, there's calls to append(), substring() and indexOf(), which probably dominate the running time of the function. All the same, why repeat a call multiple times when the result is always going to be the same? It seems the Sun Java compiler won't save you, because I haven't managed to get it to "optimize out" a repeated call to length() on an immutable string. The proposed solution is small and sensible and there's no good reason not to apply it.
          bayard Henri Yandell added a comment -

          Created an attachment (id=18216)
          Second attempt at PerfTest

          Attaching a perf test that increases the size of the central text so that the
          number of matches increases.

          bayard Henri Yandell added a comment - Created an attachment (id=18216) Second attempt at PerfTest Attaching a perf test that increases the size of the central text so that the number of matches increases.
          bayard Henri Yandell added a comment -

          Second perf test didn't show any real increase, but given that it's a standard
          optimisation you're right James - might as well apply it. Have done the same to
          replaceChars too.

          "Applying the optimisation advised by Paul Benedict in #39368"
          src/java/org/apache/commons/lang/StringUtils.java
          Sending src/java/org/apache/commons/lang/StringUtils.java
          Transmitting file data .
          Committed revision 399127.

          bayard Henri Yandell added a comment - Second perf test didn't show any real increase, but given that it's a standard optimisation you're right James - might as well apply it. Have done the same to replaceChars too. "Applying the optimisation advised by Paul Benedict in #39368" src/java/org/apache/commons/lang/StringUtils.java Sending src/java/org/apache/commons/lang/StringUtils.java Transmitting file data . Committed revision 399127.

          People

            Unassigned Unassigned
            pbenedict Paul Benedict
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: