Commons IO
  1. Commons IO
  2. IO-191

Possible improvements using static analysis.

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0
    • Component/s: None
    • Labels:
      None

      Activity

      Hide
      Sebb added a comment -

      As far as I can tell, all the accepted improvements have been implemented.

      If not, please re-open with a updated patch(es).

      Show
      Sebb added a comment - As far as I can tell, all the accepted improvements have been implemented. If not, please re-open with a updated patch(es).
      Hide
      Peter Lawrey added a comment -

      Appending or indexing on a character instead of a String is about 25% faster.
      If performances isn't an issue, then it should be left as people see as clearer.

      Show
      Peter Lawrey added a comment - Appending or indexing on a character instead of a String is about 25% faster. If performances isn't an issue, then it should be left as people see as clearer.
      Hide
      Niall Pemberton added a comment -

      I have applied some of the changes

      http://svn.apache.org/viewvc?view=rev&revision=736890

      I agree with Jukka about changing String to characters - don't see much point.

      Show
      Niall Pemberton added a comment - I have applied some of the changes http://svn.apache.org/viewvc?view=rev&revision=736890 I agree with Jukka about changing String to characters - don't see much point.
      Hide
      Jukka Zitting added a comment -

      Sebb, you're right, the collection field should be volatile. My assumption (not true for AndFileFilter) was that the collection itself is immutable, so partial updates would not have been a problem.

      Anyway, my objection on grounds of thread safety is moot since the AndFileFilter is not (and probably does not need to be) thread-safe. So making the field final and using Collection.clear() is fine by me.

      Show
      Jukka Zitting added a comment - Sebb, you're right, the collection field should be volatile. My assumption (not true for AndFileFilter) was that the collection itself is immutable, so partial updates would not have been a problem. Anyway, my objection on grounds of thread safety is moot since the AndFileFilter is not (and probably does not need to be) thread-safe. So making the field final and using Collection.clear() is fine by me.
      Hide
      Sebb added a comment -

      @Jukka (was @Sebb)

      The problem is - what does A: see for the collection the next time it accesses it?

      Unless the collection field is volatile, there is no guarantee that A will see the updated value for the field.
      And unless the A and B synch. on the same lock, A can see a partially updated object.

      AIUI the example works because thread A owns the iterator across the change made by B.
      When A needs to fetch the iterator again, it won't necessarily see the new collection unless synch. is used.

      Show
      Sebb added a comment - @Jukka (was @Sebb) The problem is - what does A: see for the collection the next time it accesses it? Unless the collection field is volatile, there is no guarantee that A will see the updated value for the field. And unless the A and B synch. on the same lock, A can see a partially updated object. AIUI the example works because thread A owns the iterator across the change made by B. When A needs to fetch the iterator again, it won't necessarily see the new collection unless synch. is used.
      Hide
      Jukka Zitting added a comment -

      Oh yeah, ignore my comment about the foreach loop, IO trunk is already using Java 5 features.

      Show
      Jukka Zitting added a comment - Oh yeah, ignore my comment about the foreach loop, IO trunk is already using Java 5 features.
      Hide
      Jukka Zitting added a comment -

      Replying to Peter.

      I don't imagine multiple patches are easier to apply.

      As noted above, we probably don't want to apply all the changes in the patch. Having smaller patches would make it easier to selectively apply the changes you propose.

      > Changing single character string literals to character literals in string concatenations.
      This is a fair comment. If you feel its worth reviewing on a case by case basis I am happy to do this.

      I don't think it's worth the effort, but perhaps I'm missing some really convincing reason why we should do this.

      > why are some parts of the expression strings and other characters.
      Is this a question you would like me to answer or you are just raising this as a hypothetical question someone might ask.

      Just a hypothetical question that a future developer that looks at the code might think about.

      Show
      Jukka Zitting added a comment - Replying to Peter. I don't imagine multiple patches are easier to apply. As noted above, we probably don't want to apply all the changes in the patch. Having smaller patches would make it easier to selectively apply the changes you propose. > Changing single character string literals to character literals in string concatenations. This is a fair comment. If you feel its worth reviewing on a case by case basis I am happy to do this. I don't think it's worth the effort, but perhaps I'm missing some really convincing reason why we should do this. > why are some parts of the expression strings and other characters. Is this a question you would like me to answer or you are just raising this as a hypothetical question someone might ask. Just a hypothetical question that a future developer that looks at the code might think about.
      Hide
      Jukka Zitting added a comment -

      Responding to Sebb's comment first.

      If one thread creates a new collection whilst another is iterating it, then in the absence of synchronisation there is no guarantee what state the other thread will next see for the collection.

      Not true. For example, consider the following pseudocode where two threads, A and B, concurrently access the same collection.

      A: iterator = collection.iterator();
      A: iterator.next();
      B: collection = new Collection();
      A: iterator.next();
      

      A continues to see the contents of the original collection while iterating, which IMHO is the only reasonable and deterministic behaviour in such cases. If thread B use collection.clear(), thread A would likely fail with a ConcurrentModificationException.

      However, my objection applies only to cases where the collection is immutable after initialization (otherwise the threads would in any case need to worry about synchronization). In AndFileFilter this is not the case, so there I think using Collection.clear() is actually a valid option. This context is not visible in the patch, so I'd rather consider such changes on a case-by-case basis instead of as a part of a bigger changeset generated by an analyzer tool.

      Show
      Jukka Zitting added a comment - Responding to Sebb's comment first. If one thread creates a new collection whilst another is iterating it, then in the absence of synchronisation there is no guarantee what state the other thread will next see for the collection. Not true. For example, consider the following pseudocode where two threads, A and B, concurrently access the same collection. A: iterator = collection.iterator(); A: iterator.next(); B: collection = new Collection(); A: iterator.next(); A continues to see the contents of the original collection while iterating, which IMHO is the only reasonable and deterministic behaviour in such cases. If thread B use collection.clear(), thread A would likely fail with a ConcurrentModificationException. However, my objection applies only to cases where the collection is immutable after initialization (otherwise the threads would in any case need to worry about synchronization). In AndFileFilter this is not the case, so there I think using Collection.clear() is actually a valid option. This context is not visible in the patch, so I'd rather consider such changes on a case-by-case basis instead of as a part of a bigger changeset generated by an analyzer tool.
      Hide
      Sebb added a comment -

      "Make methods/fields static if possible."

      • making fields static is OK if they are also final, otherwise it makes the class thread-hostile.
      • making methods static prevents them from being overloaded, so needs to be considered case-by-case.
      Show
      Sebb added a comment - "Make methods/fields static if possible." making fields static is OK if they are also final, otherwise it makes the class thread-hostile. making methods static prevents them from being overloaded, so needs to be considered case-by-case.
      Hide
      Peter Lawrey added a comment -

      Here are the types of changes

        • Making private and package local final if they can be made final.
          This improves clarity as it makes it obvious which fields are changed and which are not. It improves thread safety and to a small degree performance.
        • Use character instead of String were the outcome is the same.
          For those who typical perform character calculations instead of String calculations, this may be confusing. However, my guess is that most people assume a String append as this is the most common usage.
        • Fix javadoc links and remove refer to a class which does not exist.
        • Note that the use of foreach loops needs to wait until we switch to Java 5.
          I assumed that since generics were used that Java 5 was being used. Which version of Java are you using which supports Generics but not the foreach loop?
        • Make methods/fields static if possible.
          This improves clarity that this method/field is not dependant on the instance and has a performance benefit.
        • Remove redundant initialisers
          This improve clarity as having a redundant initialiser implies its is used for something when it is not.
        • Using Character.toString(char) instead of new Character(char).toString();
          This has margin benefit but saves a pointless object.
        • Using String.indexOf('') instead of String.indexof("").
          This improves clarity by stating you are looking for just one character, not a String. There is also a performance benefit.
        • Using System.arraycopy() instead of a manual array copy.
          This can be significantly faster.
        • Call Thread.currentThread().interrupt() rather than swallowing an interrupt, or highlighting when it is ignored.
        • Reduce duplication when two constructors are almost the same.

      > It would be easier to review and apply this patch if it was broken down to pieces based on the different types of changes.
      If people feel multiple patches is easier to review I can break it down. I don't imagine multiple patches are easier to apply.

      > > Changing single character string literals to character literals in string concatenations.
      > The benefit is insignificant and the drawback is added conceptual complexity. Also, in expressions where other parts are variables, there is no syntactical hint that it's a string concatenation expression instead of an integer sum.
      This is a fair comment. If you feel its worth reviewing on a case by case basis I am happy to do this. Adding a char to a String should be consider a mysterious hack, but it may be of marginal benefit.
      > why are some parts of the expression strings and other characters.
      Is this a question you would like me to answer or you are just raising this as a hypothetical question someone might ask.

      >> Clearing an existing collection instead of replacing it with a newly allocated one.
      > Again, the benefit is typically insignificant, but as a drawback an immutable collection may become mutable. What if some other code is still concurrently iterating the collection? Perhaps the static analyzer has taken this into account, but will a future programmer that wants to modify the class?
      If the field is final, a future programmer will need to consider this or the program won't compile.

      Show
      Peter Lawrey added a comment - Here are the types of changes Making private and package local final if they can be made final. This improves clarity as it makes it obvious which fields are changed and which are not. It improves thread safety and to a small degree performance. Use character instead of String were the outcome is the same. For those who typical perform character calculations instead of String calculations, this may be confusing. However, my guess is that most people assume a String append as this is the most common usage. Fix javadoc links and remove refer to a class which does not exist. Note that the use of foreach loops needs to wait until we switch to Java 5. I assumed that since generics were used that Java 5 was being used. Which version of Java are you using which supports Generics but not the foreach loop? Make methods/fields static if possible. This improves clarity that this method/field is not dependant on the instance and has a performance benefit. Remove redundant initialisers This improve clarity as having a redundant initialiser implies its is used for something when it is not. Using Character.toString(char) instead of new Character(char).toString(); This has margin benefit but saves a pointless object. Using String.indexOf(' ') instead of String.indexof(" "). This improves clarity by stating you are looking for just one character, not a String. There is also a performance benefit. Using System.arraycopy() instead of a manual array copy. This can be significantly faster. Call Thread.currentThread().interrupt() rather than swallowing an interrupt, or highlighting when it is ignored. Reduce duplication when two constructors are almost the same. > It would be easier to review and apply this patch if it was broken down to pieces based on the different types of changes. If people feel multiple patches is easier to review I can break it down. I don't imagine multiple patches are easier to apply. > > Changing single character string literals to character literals in string concatenations. > The benefit is insignificant and the drawback is added conceptual complexity. Also, in expressions where other parts are variables, there is no syntactical hint that it's a string concatenation expression instead of an integer sum. This is a fair comment. If you feel its worth reviewing on a case by case basis I am happy to do this. Adding a char to a String should be consider a mysterious hack, but it may be of marginal benefit. > why are some parts of the expression strings and other characters. Is this a question you would like me to answer or you are just raising this as a hypothetical question someone might ask. >> Clearing an existing collection instead of replacing it with a newly allocated one. > Again, the benefit is typically insignificant, but as a drawback an immutable collection may become mutable. What if some other code is still concurrently iterating the collection? Perhaps the static analyzer has taken this into account, but will a future programmer that wants to modify the class? If the field is final, a future programmer will need to consider this or the program won't compile.
      Hide
      Sebb added a comment -

      I agree with all Jukka's comments, except for making collections immutable.

      Where the Collections are private - as in AndFileFilter for example - making the item immutable improves thread safety, because the fields are then properly published.
      If there is a chance for concurrent access, then the class can and should protect the caller.
      If one thread creates a new collection whilst another is iterating it, then in the absence of synchronisation there is no guarantee what state the other thread will next see for the collection.

      Show
      Sebb added a comment - I agree with all Jukka's comments, except for making collections immutable. Where the Collections are private - as in AndFileFilter for example - making the item immutable improves thread safety, because the fields are then properly published. If there is a chance for concurrent access, then the class can and should protect the caller. If one thread creates a new collection whilst another is iterating it, then in the absence of synchronisation there is no guarantee what state the other thread will next see for the collection.
      Hide
      Jukka Zitting added a comment -

      It would be easier to review and apply this patch if it was broken down to pieces based on the different types of changes.

      See below for a list of the changes I'd rather not apply. Other changes seem reasonable enough, though it's debatable whether changing working code for no functional reason is wise as there's always the chance of accidentally introducing an error. Note that the use of foreach loops needs to wait until we switch to Java 5.

      > Changing single character string literals to character literals in string concatenations.

      The benefit is insignificant and the drawback is added conceptual complexity (why are some parts of the expression strings and other characters). Also, in expressions where other parts are variables, there is no syntactical hint that it's a string concatenation expression instead of an integer sum.

      > Introducing an initial size constant to collection constructors where the expected size is known.

      The benefit is in most cases insignificant and the drawback is the introduction of magic numbers in the code. Note that in specific cases this might give real-world performance or memory improvements, but those cases are better covered in separate issues with more detailed analysis.

      > Clearing an existing collection instead of replacing it with a newly allocated one.

      Again, the benefit is typically insignificant, but as a drawback an immutable collection may become mutable. What if some other code is still concurrently iterating the collection? Perhaps the static analyzer has taken this into account, but will a future programmer that wants to modify the class?

      Show
      Jukka Zitting added a comment - It would be easier to review and apply this patch if it was broken down to pieces based on the different types of changes. See below for a list of the changes I'd rather not apply. Other changes seem reasonable enough, though it's debatable whether changing working code for no functional reason is wise as there's always the chance of accidentally introducing an error. Note that the use of foreach loops needs to wait until we switch to Java 5. > Changing single character string literals to character literals in string concatenations. The benefit is insignificant and the drawback is added conceptual complexity (why are some parts of the expression strings and other characters). Also, in expressions where other parts are variables, there is no syntactical hint that it's a string concatenation expression instead of an integer sum. > Introducing an initial size constant to collection constructors where the expected size is known. The benefit is in most cases insignificant and the drawback is the introduction of magic numbers in the code. Note that in specific cases this might give real-world performance or memory improvements, but those cases are better covered in separate issues with more detailed analysis. > Clearing an existing collection instead of replacing it with a newly allocated one. Again, the benefit is typically insignificant, but as a drawback an immutable collection may become mutable. What if some other code is still concurrently iterating the collection? Perhaps the static analyzer has taken this into account, but will a future programmer that wants to modify the class?

        People

        • Assignee:
          Unassigned
          Reporter:
          Peter Lawrey
        • Votes:
          1 Vote for this issue
          Watchers:
          0 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Time Tracking

            Estimated:
            Original Estimate - 3h
            3h
            Remaining:
            Remaining Estimate - 3h
            3h
            Logged:
            Time Spent - Not Specified
            Not Specified

              Development