Nutch
  1. Nutch
  2. NUTCH-1079

StringBuffer converted to StringBuilder

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 1.9
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      All across the codebase, it contains StringBuffer, when thread-safety is probably not intended.

      This patch replaces StringBuffer to StringBuilder, as applicable.

      1. NUTCH-rel_14-1079.patch
        39 kB
        Karthik K
      2. NUTCH-1079.patch
        32 kB
        Karthik K

        Activity

        Hide
        Julien Nioche added a comment -

        No progress on this for almost 3 years + no consensus on whether this is needed. Feel free to reopen if you feel strongly about it or can contribute to the discussion

        Show
        Julien Nioche added a comment - No progress on this for almost 3 years + no consensus on whether this is needed. Feel free to reopen if you feel strongly about it or can contribute to the discussion
        Hide
        Markus Jelsma added a comment -

        20120304-push-1.6

        Show
        Markus Jelsma added a comment - 20120304-push-1.6
        Hide
        Lewis John McGibbney added a comment -

        I kinda got this feeling Julien. Thanks. We'll I think based on the discussion above, there seems to be no overwhelming reason for changing all of this. You did however begin to make a point of sorts Markus, any thoughts now that this one has had a bit of time to settle in?

        Show
        Lewis John McGibbney added a comment - I kinda got this feeling Julien. Thanks. We'll I think based on the discussion above, there seems to be no overwhelming reason for changing all of this. You did however begin to make a point of sorts Markus, any thoughts now that this one has had a bit of time to settle in?
        Hide
        Julien Nioche added a comment -

        Don't rely on me for this one. I am not in favour of changing the APIs for a dubious gain in performance but if any committer thinks that it is worth doing then go ahead

        Show
        Julien Nioche added a comment - Don't rely on me for this one. I am not in favour of changing the APIs for a dubious gain in performance but if any committer thinks that it is worth doing then go ahead
        Hide
        Lewis John McGibbney added a comment -

        How is this guys? It seems that there was a level of agreement wrt appends over concats, but the patch/issue never seemed to get updated and has now stagnated slightly. Any chance of reviving the patient?

        Show
        Lewis John McGibbney added a comment - How is this guys? It seems that there was a level of agreement wrt appends over concats, but the patch/issue never seemed to get updated and has now stagnated slightly. Any chance of reviving the patient?
        Hide
        Julien Nioche added a comment -

        Not a bug but an improvement. Moved from 1.4 to 1.5

        Show
        Julien Nioche added a comment - Not a bug but an improvement. Moved from 1.4 to 1.5
        Hide
        Karthik K added a comment -

        @Markus:
        So - I shall go ahead and create a separate jira about replacing concats with appends, while this can possibly go in now ?

        Show
        Karthik K added a comment - @Markus: So - I shall go ahead and create a separate jira about replacing concats with appends, while this can possibly go in now ?
        Hide
        Markus Jelsma added a comment -

        Kay, i'd prefer replacing concats within appends in this issue too. Can be a separate patch though.

        Show
        Markus Jelsma added a comment - Kay, i'd prefer replacing concats within appends in this issue too. Can be a separate patch though.
        Hide
        Aravind Srini added a comment -

        This seems interesting, and personally interesting about the 1.4 branch, because I was planning to do something similar.

        Can somebody give a heads-up, about where we are , with the current patches attached ?

        Show
        Aravind Srini added a comment - This seems interesting, and personally interesting about the 1.4 branch, because I was planning to do something similar. Can somebody give a heads-up, about where we are , with the current patches attached ?
        Hide
        Karthik K added a comment -

        Well - the string+string concat would still translate to a StringBuilder and not a StringBuffer, as per javase compiler at least. (agree about the unnecessary extra instantiation of the empty StringBuilder , that happens as part of it though).

        Would it be acceptable if we revisit, those independent appends on a priority basis (starting with all the non, toString() methods ), in a different jira, say ?

        Show
        Karthik K added a comment - Well - the string+string concat would still translate to a StringBuilder and not a StringBuffer, as per javase compiler at least. (agree about the unnecessary extra instantiation of the empty StringBuilder , that happens as part of it though). Would it be acceptable if we revisit, those independent appends on a priority basis (starting with all the non, toString() methods ), in a different jira, say ?
        Hide
        Julien Nioche added a comment -

        this is not going to make much of a difference anyway if we still use the String + String concatenation within the appends e.g.

             buffer.append("CrawlDatum: " + crawlDatum+"\n" );
        

        Any chance you could replace those with appends instead?

        Show
        Julien Nioche added a comment - this is not going to make much of a difference anyway if we still use the String + String concatenation within the appends e.g. buffer.append( "CrawlDatum: " + crawlDatum+ "\n" ); Any chance you could replace those with appends instead?
        Hide
        Karthik K added a comment - - edited

        @Julien: FYI, the public API-s are still backward compatible, as per the patch. The private API-s are changed though.

        Of concern, are the public API-s still as it propagates the bad behavior still further up, due to StringBuffer being there.

        Show
        Karthik K added a comment - - edited @Julien: FYI, the public API-s are still backward compatible, as per the patch. The private API-s are changed though. Of concern, are the public API-s still as it propagates the bad behavior still further up, due to StringBuffer being there.
        Hide
        Markus Jelsma added a comment -

        I don't think it's going to be significant too but it'll surely increase performance for textprofilesignature and in domcontentutils. Both do a lot of SB operations per document. AFAIK this patch already updates all API calls. Do you think there are a number of users that call one of these three API's in custom code? Also, the getText and getTitle methods are private in parse-tika where they're public in parse-html.

        What do you think?

        Show
        Markus Jelsma added a comment - I don't think it's going to be significant too but it'll surely increase performance for textprofilesignature and in domcontentutils. Both do a lot of SB operations per document. AFAIK this patch already updates all API calls. Do you think there are a number of users that call one of these three API's in custom code? Also, the getText and getTitle methods are private in parse-tika where they're public in parse-html. What do you think?
        Hide
        Julien Nioche added a comment -

        Can we expect to get any significant performance improvement? According to a few tests I have seen this is not likely to be the case. If, so is it worth changing the API for that?

        Show
        Julien Nioche added a comment - Can we expect to get any significant performance improvement? According to a few tests I have seen this is not likely to be the case. If, so is it worth changing the API for that?
        Hide
        Markus Jelsma added a comment -

        Thanks! All seems to work fine. I see no reason not to include these patches.

        Show
        Markus Jelsma added a comment - Thanks! All seems to work fine. I see no reason not to include these patches.
        Hide
        Karthik K added a comment -

        For branch 1.4

        Show
        Karthik K added a comment - For branch 1.4
        Hide
        Karthik K added a comment -

        Patch attached , created against trunk. (NUTCH-1079.patch).

        Show
        Karthik K added a comment - Patch attached , created against trunk. ( NUTCH-1079 .patch).
        Hide
        Markus Jelsma added a comment -

        Interesting. Can you provide a patch for 1.4 and trunk?

        Show
        Markus Jelsma added a comment - Interesting. Can you provide a patch for 1.4 and trunk?
        Hide
        Karthik K added a comment -

        public API-s changed, but ensuring backward compatibility still:

        o.a.n.parse.tika.DOMContentUtils:
        public void getText(Appendable sb, Node node);
        public boolean getTitle(Appendable sb, Node node);

        o.a.n.parse.html.XMLCharacterRecognizer:

        public static boolean isWhiteSpace(CharSequence buf)

        Show
        Karthik K added a comment - public API-s changed, but ensuring backward compatibility still: o.a.n.parse.tika.DOMContentUtils: public void getText(Appendable sb, Node node); public boolean getTitle(Appendable sb, Node node); o.a.n.parse.html.XMLCharacterRecognizer: public static boolean isWhiteSpace(CharSequence buf)

          People

          • Assignee:
            Markus Jelsma
            Reporter:
            Karthik K
          • Votes:
            1 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development