Commons IO / IO-305

New copy() method in IOUtils that takes additional offset, length and buffersize arguments

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2
    • Component/s: Utilities
    • Labels:
      None

      Description

      /**
       * Copy from input to output stream
       * @param is : input stream
       * @param os : output stream
       * @param offset : number of bytes to skip from input before copying;
       *                 negative values are ignored
       * @param len : number of bytes to copy. -1 means all
       * @param bufferSize : buffer size to use for copying
       * @throws IOException
       */
      public static void copy(InputStream is, OutputStream os, int offset, int len, int bufferSize) throws IOException
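      The signature above could be implemented roughly as follows. This is a minimal sketch, not the attached IOUtils.java: the class name RangeCopySketch is hypothetical, any negative len is treated as "copy all", and rejecting a non-positive bufferSize is an assumption the description does not state.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical class name; only the copy() signature comes from the issue description.
final class RangeCopySketch {

    static void copy(InputStream is, OutputStream os, int offset, int len, int bufferSize)
            throws IOException {
        if (bufferSize <= 0) {
            // Assumption: a zero-length buffer would make read() return 0 forever.
            throw new IllegalArgumentException("bufferSize must be positive");
        }

        // Skip 'offset' bytes; negative offsets are ignored because the loop never runs.
        long toSkip = offset;
        while (toSkip > 0) {
            long skipped = is.skip(toSkip);
            if (skipped <= 0) {
                // skip() may return 0 before EOF, so probe with a single read.
                if (is.read() == -1) {
                    return; // stream ended before the offset was reached
                }
                skipped = 1;
            }
            toSkip -= skipped;
        }

        byte[] buffer = new byte[bufferSize];
        boolean copyAll = len < 0; // assumption: any negative len means "all"
        int remaining = len;
        while (copyAll || remaining > 0) {
            int want = copyAll ? buffer.length : Math.min(buffer.length, remaining);
            int n = is.read(buffer, 0, want);
            if (n == -1) {
                break; // end of input
            }
            os.write(buffer, 0, n);
            if (!copyAll) {
                remaining -= n;
            }
        }
    }
}
```

      In this sketch the -1 check costs one boolean test per buffer, not per byte.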
      1. IOUtilsTest.java
        5 kB
        Manoj Mokashi
      2. IOUtils.java
        1 kB
        Manoj Mokashi

          Activity

          Manoj Mokashi added a comment -

          The new method is in the IOUtils class. Note that it's my own class, not the one from the Apache repository.
          IOUtilsTest is a JUnit test case.

          Sebb added a comment -

          Should use long for the offset and length.

          It would be useful to return the number of bytes actually copied, as per the copyLarge methods.

          Sebb added a comment -

          The code would probably be simplified by using copyLarge for the case where len == -1 (otherwise it has to keep checking for this).

          Also, unless the copyLarge methods are enhanced to allow the buffer or its size to be specified, it would be more consistent to use the same buffer size as the copyLarge methods, and omit the size parameter.
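          The suggestions in this comment and the previous one (long offset and length, returning the byte count, delegating the len == -1 case, and no size parameter) might combine as in the sketch below. It is an illustration only, not the Commons IO source: the class name RangeCopyLargeSketch is hypothetical, and returning 0 quietly when the stream ends before the offset is an assumption; a real implementation might prefer to throw.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical standalone class; method names mirror the thread's "copyLarge".
final class RangeCopyLargeSketch {

    // Same default size the thread mentions.
    private static final int DEFAULT_BUFFER_SIZE = 4096;

    /** Copies everything and returns the number of bytes copied. */
    static long copyLarge(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
        long count = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            count += n;
        }
        return count;
    }

    /** Skips 'offset' bytes, then copies up to 'length' bytes (negative = all). */
    static long copyLarge(InputStream in, OutputStream out, long offset, long length)
            throws IOException {
        long toSkip = offset;
        while (toSkip > 0) {
            long skipped = in.skip(toSkip);
            if (skipped <= 0) {
                if (in.read() == -1) {
                    return 0; // assumption: EOF before the offset copies nothing
                }
                skipped = 1;
            }
            toSkip -= skipped;
        }

        if (length < 0) {
            // Delegate so the bounded loop below never re-checks for "copy all".
            return copyLarge(in, out);
        }

        byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
        long remaining = length;
        long count = 0;
        while (remaining > 0) {
            int want = (int) Math.min(buffer.length, remaining);
            int n = in.read(buffer, 0, want);
            if (n == -1) {
                break;
            }
            out.write(buffer, 0, n);
            count += n;
            remaining -= n;
        }
        return count;
    }
}
```

          Returning the count lets a caller detect a short copy, for example when the requested length runs past the end of the input.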

          Sebb added a comment -

          Added methods for bytes and chars based on copyLarge.

          Dropped buffer size parameter as not essential.

          Manoj Mokashi added a comment -

          In my humble opinion, the buffer size parameter is important for controlling performance.
          The default buffer size is 4096, and for large files we would need more. If copyLarge should not accept a buffer size parameter,
          maybe we should use the copy method. Is the check for len == -1 really a performance issue, especially since we don't read byte-by-byte?

          Sebb added a comment -

          The default buffer size of 4096 was chosen because it gives good performance.

          Have you any performance tests that show otherwise?

          If so, we can consider implementing this for all the copyLarge methods, see: IO-308

          "Is the check for len == -1 really a performance issue?"

          Code no longer checks the length twice; I reimplemented the loop in order to support returning the copied length.

          Manoj Mokashi added a comment -

          I tested copying a 500 MB tar archive with different buffer sizes, and it does make a difference.
          e.g. buffer size => time in millis:
          4096=>129954, *16=>71734, *64=>91328
          4096=>120406, *16=>80219, *64=>69687

          btw, accessing buffer.length inside the loop seems to affect performance for bigger lengths.
          As seen in the first set of figures, the *64 run actually takes longer than the *16 run.
          In the second set I used a buffersize variable outside the loop, since it is constant.
          I guess the results will vary with available memory, OS, disk types etc.
          But specifying the buffer size does make a difference.

          wrt IO-308, I agree that passing a buffer would be even better.
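          A comparison like the one described above could be reproduced with a small harness such as the following. It is only a rough sketch under stated assumptions: the class and method names are hypothetical, it copies a generated 16 MB temp file instead of the 500 MB archive, and it takes a single un-warmed timing per size, so the numbers it prints are indicative at best and will vary by machine.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical timing harness; not the code used for the figures in the comment.
final class BufferSizeTimingSketch {

    /** Copies the whole stream with the given buffer size; returns bytes copied. */
    static long copyWithBuffer(InputStream in, OutputStream out, int bufferSize)
            throws IOException {
        final byte[] buffer = new byte[bufferSize];
        final int size = bufferSize; // kept in a local, as in the second set of runs
        long count = 0;
        int n;
        while ((n = in.read(buffer, 0, size)) != -1) {
            out.write(buffer, 0, n);
            count += n;
        }
        return count;
    }

    /** Copies source to target and returns the elapsed time in nanoseconds. */
    static long timeCopyNanos(Path source, Path target, int bufferSize) throws IOException {
        long start = System.nanoTime();
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            copyWithBuffer(in, out, bufferSize);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("copy-src", ".bin");
        Path target = Files.createTempFile("copy-dst", ".bin");
        try {
            Files.write(source, new byte[16 * 1024 * 1024]); // 16 MB of zeros
            for (int factor : new int[] {1, 16, 64}) {
                int bufferSize = 4096 * factor;
                long nanos = timeCopyNanos(source, target, bufferSize);
                System.out.println(bufferSize + " => " + (nanos / 1_000_000) + " ms");
            }
        } finally {
            Files.deleteIfExists(source);
            Files.deleteIfExists(target);
        }
    }
}
```

          Repeating each measurement several times and alternating the order of buffer sizes would help separate the effect of buffer size from disk caching and JIT warm-up.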

          Gary Gregory added a comment -

          Version 2.2 has been released and addresses this issue.


            People

            • Assignee:
              Unassigned
              Reporter:
              Manoj Mokashi