Pig
  1. Pig
  2. PIG-106

Optimize Pig by replacing String '+' and StringBuffer with StringBuilder

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.1.0
    • Fix Version/s: 0.1.0
    • Component/s: impl
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      While investigating PIG-99, in TestBuiltin.java line 315:

      for (int i = 0; i < LOOP_COUNT; i++) {
          for (int j = 0; j < LOOP_COUNT; j++) {
              sb.append(i + "\t" + i + "\t" + j % 2 + "\n");
          }
      }
      

      doing "i + "\t" + i + "\t" + j % 2 + "\n"" creates temporary String(s) reducing the advantages of using a StringBuffer.

      Could be replace with:

      for (int i = 0; i < LOOP_COUNT; i++) {
          for (int j = 0; j < LOOP_COUNT; j++) {
              sb.append(i);
              sb.append("\t");
              sb.append(i);
              sb.append("\t");
              sb.append(j % 2);
              sb.append("\n");
          }
      }
      
      1. PIG-106-v01.patch
        0.7 kB
        Benjamin Francisoud
      2. PIG-106-v02.patch
        57 kB
        Benjamin Francisoud
      3. PIG-106-v03.patch
        56 kB
        Benjamin Francisoud

        Issue Links

          Activity

          Hide
          Benjamin Francisoud added a comment -

          Execution time:
          before: 687 ms - after: 515 ms
          other runs give almost the same numbers...

          Show
          Benjamin Francisoud added a comment - Execution time: before: 687 ms - after: 515 ms other runs give almost the same numbers...
          Hide
          Benjamin Reed added a comment -

          It will probably run even faster with StringBuilder instead of StringBuffer.

          Show
          Benjamin Reed added a comment - It will probably run even faster with StringBuilder instead of StringBuffer.
          Hide
          Craig Macdonald added a comment -

          I count 84 uses of StringBuffer in the src/ and test/ folders currently. One mega patch, similar to the space/tab patch replace all of these. Best done by someone with commit priv - any patch would quickly become dated.

          Show
          Craig Macdonald added a comment - I count 84 uses of StringBuffer in the src/ and test/ folders currently. One mega patch, similar to the space/tab patch replace all of these. Best done by someone with commit priv - any patch would quickly become dated.
          Hide
          Benjamin Francisoud added a comment -

          I was more concern in replacing String with StringBuffer
          But if StringBuilder is even more efficient let's replace both String and StringBuffer with it
          Except in case when there is no need, no string accumulations, of course...

          Show
          Benjamin Francisoud added a comment - I was more concern in replacing String with StringBuffer But if StringBuilder is even more efficient let's replace both String and StringBuffer with it Except in case when there is no need, no string accumulations, of course...
          Hide
          Olga Natkovich added a comment -

          cleared "patch available" flags since the agreement is to use StringBuilder rather than StringBuffer

          Show
          Olga Natkovich added a comment - cleared "patch available" flags since the agreement is to use StringBuilder rather than StringBuffer
          Hide
          Pi Song added a comment -

          Pig is a total single-threaded frontend. StringBuilder should be definitely more appropriate.
          Though, is there any plan to move Pig to multithreaded client-server model?

          Show
          Pi Song added a comment - Pig is a total single-threaded frontend. StringBuilder should be definitely more appropriate. Though, is there any plan to move Pig to multithreaded client-server model?
          Hide
          Benjamin Francisoud added a comment -

          Changing the name since it's drifting from the original purpose...

          Show
          Benjamin Francisoud added a comment - Changing the name since it's drifting from the original purpose...
          Hide
          Benjamin Francisoud added a comment -

          Replaced every patterns:

          • String s = "something" + value + "else";
          • usage of StringBuffer

          with StringBuilder

          Also used log.isDebugEnable() method to avoid computing String(s) when log level is not debug.

          There must be some String accumulation left but that's already a good start.

          Show
          Benjamin Francisoud added a comment - Replaced every patterns: String s = "something" + value + "else"; usage of StringBuffer with StringBuilder Also used log.isDebugEnable() method to avoid computing String(s) when log level is not debug. There must be some String accumulation left but that's already a good start.
          Hide
          Olga Natkovich added a comment -

          I tried to apply the patch to the latest trunk and got failures. Also, as I was looking at the failures, some of the changes specifically in Tuple.java go beyond just String Builder. For instance, it adds if-else construct into the getField call. We don't want to do that - we don't want an extra branch on the critical path.

          Could you regenerate the patch with just changes to to string formatting, thanks.

          Show
          Olga Natkovich added a comment - I tried to apply the patch to the latest trunk and got failures. Also, as I was looking at the failures, some of the changes specifically in Tuple.java go beyond just String Builder. For instance, it adds if-else construct into the getField call. We don't want to do that - we don't want an extra branch on the critical path. Could you regenerate the patch with just changes to to string formatting, thanks.
          Hide
          Benjamin Francisoud added a comment -

          I was up-to-date with the latest svn rev when I did the patch, I'm sorry you got errors

          I will take a look at the if/else you are mentioning... if I did that that was be mistake.

          Few cases where I add to modify code:

          When there was this kind of code:

          if (boolean)
              something("a" + "b" + "c");
          else
              else("a" + "b" + "c");
          

          I had to put brackets when using the StringBuilder

          
          if (boolean) {
              StringBuilder sb = new StringBuilder();
              sb.append("a");
              sb.append("b");
              sb.append("c");
              something(sb.toString());
          } else {
              StringBuilder sb = new StringBuilder();
              sb.append("a");
              sb.append("b");
              sb.append("c");
              else(sb.toString());
          }
          

          Other case:

          log.debug("a" + "b" + "c");
          

          replaced with

          if (log.isDebugEnable()) {
              StringBuilder sb = new StringBuilder();
              sb.append("a");
              sb.append("b");
              sb.append("c");
              log.debug(sb.toString());
          }
          
          Show
          Benjamin Francisoud added a comment - I was up-to-date with the latest svn rev when I did the patch, I'm sorry you got errors I will take a look at the if/else you are mentioning... if I did that that was be mistake. Few cases where I add to modify code: When there was this kind of code: if ( boolean ) something( "a" + "b" + "c" ); else else ( "a" + "b" + "c" ); I had to put brackets when using the StringBuilder if ( boolean ) { StringBuilder sb = new StringBuilder(); sb.append( "a" ); sb.append( "b" ); sb.append( "c" ); something(sb.toString()); } else { StringBuilder sb = new StringBuilder(); sb.append( "a" ); sb.append( "b" ); sb.append( "c" ); else (sb.toString()); } Other case: log.debug( "a" + "b" + "c" ); replaced with if (log.isDebugEnable()) { StringBuilder sb = new StringBuilder(); sb.append( "a" ); sb.append( "b" ); sb.append( "c" ); log.debug(sb.toString()); }
          Hide
          Benjamin Francisoud added a comment -

          Regenerate the patch.
          Fixed svn conflict in Tuple.java and removed the extra else that Olga was mentioning (tanks for pointing it).

          Show
          Benjamin Francisoud added a comment - Regenerate the patch. Fixed svn conflict in Tuple.java and removed the extra else that Olga was mentioning (tanks for pointing it).
          Hide
          Alan Gates added a comment -

          Fix checked in. Thanks Benjamin.

          Show
          Alan Gates added a comment - Fix checked in. Thanks Benjamin.

            People

            • Assignee:
              Benjamin Francisoud
              Reporter:
              Benjamin Francisoud
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development