Commons Compress
  1. Commons Compress
  2. COMPRESS-34

Compress should allow for writing to Zip Files

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0
    • Component/s: None
    • Labels:
      None

      Description

      Compress should be able to modify existing ZipFiles.

      1. patch-testcases.txt
        26 kB
        Christian Grobmeier
      2. patch-changeset-review1.txt
        17 kB
        Christian Grobmeier
      3. patch-changeset-improvement.txt
        36 kB
        Christian Grobmeier
      4. myzip2.zip
        232 kB
        Will Pugh
      5. myzip.zip
        219 kB
        Will Pugh

        Issue Links

          Activity

          Torsten Curdt made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Christian Grobmeier made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          Christian Grobmeier added a comment -

          The ChangeSet has finally made its way into trunk and works
          Code is marked experimental since several ideas for improvment came up.

          Show
          Christian Grobmeier added a comment - The ChangeSet has finally made its way into trunk and works Code is marked experimental since several ideas for improvment came up.
          Christian Grobmeier made changes -
          Fix Version/s 1.0 [ 12313768 ]
          Hide
          Christian Grobmeier added a comment -

          planned as experimental for version 1.0

          Show
          Christian Grobmeier added a comment - planned as experimental for version 1.0
          Dennis Lundberg made changes -
          Project Commons Sandbox [ 12310491 ] Commons Compress [ 12310904 ]
          Key SANDBOX-183 COMPRESS-34
          Affects Version/s Nightly Builds [ 12311693 ]
          Component/s Compress [ 12311183 ]
          Christian Grobmeier made changes -
          Attachment patch-changeset-improvement.txt [ 12401163 ]
          Hide
          Christian Grobmeier added a comment -

          This patch improves the ChangeSet implementation. It should support most common usecases now. Additionally the Testcases have been improved.
          AbstractTestCase adds some files to the test-zip-file.

          Show
          Christian Grobmeier added a comment - This patch improves the ChangeSet implementation. It should support most common usecases now. Additionally the Testcases have been improved. AbstractTestCase adds some files to the test-zip-file.
          Christian Grobmeier made changes -
          Attachment patch-testcases.txt [ 12400241 ]
          Hide
          Christian Grobmeier added a comment -

          Please apply the attached patch. It improves the TestCases a bit. Its not perfect, but before the patch beomes to big, I would like to see it applied.
          Thanks!

          Changes:

          ChangeSet: removed System.out.println

          AbstracTestCase: added a method which adds the src/test/resources folder to the classpath. This is cool when starting the JUnit from eclipse. Additionally two new helper which create zipfiles and help with checking the content.

          DetectArchiverTestCase: extends from AbstractTestCase to get the "add to eclipse classpath" feature

          ZipTestCase: built in one assertion
          ChangeSetTestCase: several business logic testcases and assertion stuff. Assertions fail at the moment, but this is due to ChangeSet-Implementation bugs.

          src/test/resources: New files for better testcases

          Show
          Christian Grobmeier added a comment - Please apply the attached patch. It improves the TestCases a bit. Its not perfect, but before the patch beomes to big, I would like to see it applied. Thanks! Changes: ChangeSet: removed System.out.println AbstracTestCase: added a method which adds the src/test/resources folder to the classpath. This is cool when starting the JUnit from eclipse. Additionally two new helper which create zipfiles and help with checking the content. DetectArchiverTestCase: extends from AbstractTestCase to get the "add to eclipse classpath" feature ZipTestCase: built in one assertion ChangeSetTestCase: several business logic testcases and assertion stuff. Assertions fail at the moment, but this is due to ChangeSet-Implementation bugs. src/test/resources: New files for better testcases
          Hide
          Torsten Curdt added a comment -

          Applied a modified version (and fix the Ar-Classes). I didn't like the use of a worker class for example.

          Now while the general idea is implemented correctly it can be much more complicated. We should start creating more complex testcases for this. The problem is that the order in which the changes get applied will matter.

          Also how do we deal with directories?

          delete dir1
          add dir1/test.txt

          mv dir1/test.text dir2/test.txt
          delete dir1

          Also the question whether we should simplify changes before we apply them

          add test.txt
          mv test.txt test2.txt

          What will that it be? It's obvious it's the same as just

          add test2.txt

          but we would have to merge this in the changeset.

          Show
          Torsten Curdt added a comment - Applied a modified version (and fix the Ar-Classes). I didn't like the use of a worker class for example. Now while the general idea is implemented correctly it can be much more complicated. We should start creating more complex testcases for this. The problem is that the order in which the changes get applied will matter. Also how do we deal with directories? delete dir1 add dir1/test.txt mv dir1/test.text dir2/test.txt delete dir1 Also the question whether we should simplify changes before we apply them add test.txt mv test.txt test2.txt What will that it be? It's obvious it's the same as just add test2.txt but we would have to merge this in the changeset.
          Torsten Curdt made changes -
          Assignee Torsten Curdt [ tcurdt ]
          Christian Grobmeier made changes -
          Attachment patch-changeset-review1.txt [ 12397484 ]
          Hide
          Christian Grobmeier added a comment -

          First version of a possible changeset design. Please review and come back with comments. Supported are: Delete, Add.

          The ArOutputStream-Test fails. This isn't a bug of the ChangeSet but seems to be located in the Ar-Classes directly. Torsten, can you have a look at this? I didn't see the problem quickly.

          If you apply this patch and expierence trouble with the testcases, please try to copy the bla.ar and the test.txt file to java/main/resources.

          Show
          Christian Grobmeier added a comment - First version of a possible changeset design. Please review and come back with comments. Supported are: Delete, Add. The ArOutputStream-Test fails. This isn't a bug of the ChangeSet but seems to be located in the Ar-Classes directly. Torsten, can you have a look at this? I didn't see the problem quickly. If you apply this patch and expierence trouble with the testcases, please try to copy the bla.ar and the test.txt file to java/main/resources.
          Hide
          Christian Grobmeier added a comment -

          this cannot be fixed without a correct TAR implementation (patch provided in 273)

          Show
          Christian Grobmeier added a comment - this cannot be fixed without a correct TAR implementation (patch provided in 273)
          Christian Grobmeier made changes -
          Link This issue is blocked by SANDBOX-273 [ SANDBOX-273 ]
          Hide
          Christian Grobmeier added a comment -

          There are actually classes for this issue in the sourcebase. Some features are allready implemented.
          See also:
          http://svn.apache.org/viewvc/commons/sandbox/compress/trunk/src/main/java/org/apache/commons/compress/changes/

          Show
          Christian Grobmeier added a comment - There are actually classes for this issue in the sourcebase. Some features are allready implemented. See also: http://svn.apache.org/viewvc/commons/sandbox/compress/trunk/src/main/java/org/apache/commons/compress/changes/
          Hide
          Torsten Curdt added a comment -

          This should be implemented with a changeset design. You basically have an archive, then build up a set of changes and then apply them on the fly to the stream.

          src -> changes -> dst

          At least this means only what has changed needs to be modified. Everything else will get just copied.

          TBD in trunk.

          Show
          Torsten Curdt added a comment - This should be implemented with a changeset design. You basically have an archive, then build up a set of changes and then apply them on the fly to the stream. src -> changes -> dst At least this means only what has changed needs to be modified. Everything else will get just copied. TBD in trunk.
          Hide
          Will Pugh added a comment -

          I fixed the style concerns, but did not fix the allocation size issue yet. Should not be hard to do, and it should be done.

          I'm not sure how you plan to modify a zip file, whether you do it the "temp file" way or the "change in place" way, where a process crash would not create a corrupted version of the file.

          Traditionally, you would do this by creating a temporary zip file, and doing a "mv" on it, so the name changed to the new name, but no data needed to be moved. I'm not sure you can do this in Java, unless you had planned on shelling out the OS (and detecting different OSes to do this).

          There were two main reasons I wanted to do this in memory:

          1) So you would not need any type of temp directory
          2) The performance differences can be VAST in some cases. Imagine dealing with large XML or text files, or anything where the compression is significant. If you do the temp file approach, you need to uncompress everything, and write it to a file. Then, re-open all those files and re-compress them. This is not only a lot more CPU time in compressing and uncompressing, but also a lot more data-movement using the disk drive.

          Show
          Will Pugh added a comment - I fixed the style concerns, but did not fix the allocation size issue yet. Should not be hard to do, and it should be done. I'm not sure how you plan to modify a zip file, whether you do it the "temp file" way or the "change in place" way, where a process crash would not create a corrupted version of the file. Traditionally, you would do this by creating a temporary zip file, and doing a "mv" on it, so the name changed to the new name, but no data needed to be moved. I'm not sure you can do this in Java, unless you had planned on shelling out the OS (and detecting different OSes to do this). There were two main reasons I wanted to do this in memory: 1) So you would not need any type of temp directory 2) The performance differences can be VAST in some cases. Imagine dealing with large XML or text files, or anything where the compression is significant. If you do the temp file approach, you need to uncompress everything, and write it to a file. Then, re-open all those files and re-compress them. This is not only a lot more CPU time in compressing and uncompressing, but also a lot more data-movement using the disk drive.
          Will Pugh made changes -
          Attachment myzip2.zip [ 12361090 ]
          Hide
          Will Pugh added a comment -

          Fixed most of Mario's style concerns.

          Did not address the memory issue yet.

          Show
          Will Pugh added a comment - Fixed most of Mario's style concerns. Did not address the memory issue yet.
          Henri Yandell made changes -
          Fix Version/s Nightly Builds [ 12311693 ]
          Hide
          Christian Grobmeier added a comment -

          another thing: this newimplementation does depend on commons IO. I am not sure, but i think there was a discussion in the past that this is not a good idea. This "copy" method used is also somewhere in the compress code when i remember correct. I think its CompressUtils or something like that.

          Btw, you can add this implemetnation at runtime via the static addArchiver() method in the factory. May be helpful...

          And. for delete/modification support the Archiver-interfaces must be extended with such methods too- no big deal. Idea was:

          • get an list of ArchiveEntrys with: list() or entries()
          • do: delete(ArchiveEntry entry) for each entry you dont want any longer
          • do. update() to rewrite your zip file
          Show
          Christian Grobmeier added a comment - another thing: this newimplementation does depend on commons IO. I am not sure, but i think there was a discussion in the past that this is not a good idea. This "copy" method used is also somewhere in the compress code when i remember correct. I think its CompressUtils or something like that. Btw, you can add this implemetnation at runtime via the static addArchiver() method in the factory. May be helpful... And. for delete/modification support the Archiver-interfaces must be extended with such methods too- no big deal. Idea was: get an list of ArchiveEntrys with: list() or entries() do: delete(ArchiveEntry entry) for each entry you dont want any longer do. update() to rewrite your zip file
          Hide
          Christian Grobmeier added a comment -

          Adding it as altnerate would be possible. Archivers are registered at the beginning of "ArchiverFactory". There we could add this implementation. Lets name it "ZipQuick" cause it may be more quick but a bit unsure? The new stuff must implement the archiver-interfaces when integrating it (all in org.apache.commons.compress)

          Show
          Christian Grobmeier added a comment - Adding it as altnerate would be possible. Archivers are registered at the beginning of "ArchiverFactory". There we could add this implementation. Lets name it "ZipQuick" cause it may be more quick but a bit unsure? The new stuff must implement the archiver-interfaces when integrating it (all in org.apache.commons.compress)
          Hide
          Mario Ivankovits added a comment -

          > wouldn't it be easier just to expand the zipfile in an temp directory

          Yea, I am a little bit "conservative" about this topic too.
          How can we know that we do not damage a zip in any case, though, the algorithmic used currently seems not that complicated - I don't know.

          We should give it a try, maybe.

          Instead of incorporating it into the existing codebase, can't we add it as alternative implementation? Something like ZipEx method.
          So that we can have both methods (direct alteration and unzip/zip through temp dir).

          Show
          Mario Ivankovits added a comment - > wouldn't it be easier just to expand the zipfile in an temp directory Yea, I am a little bit "conservative" about this topic too. How can we know that we do not damage a zip in any case, though, the algorithmic used currently seems not that complicated - I don't know. We should give it a try, maybe. Instead of incorporating it into the existing codebase, can't we add it as alternative implementation? Something like ZipEx method. So that we can have both methods (direct alteration and unzip/zip through temp dir).
          Hide
          Christian Grobmeier added a comment -

          Hi,
          it seems difficult to integrate this in the existing codebase. It looks like we would have to throw away the old zip implementation and use this one, if we want this feature. Please help if i am wrong.

          For the memory problem: wouldn't it be easier just to expand the zipfile in an temp directory, take the intrnal catalog to see what files we need again and rebuild the whole stuff? I think this is the way winzip handles this. This would be quite safe and the old file would't get corrupt in case of any trouble. This strategy would also allow us to delete from the zip package. No?

          Cheers,
          Chris.

          Show
          Christian Grobmeier added a comment - Hi, it seems difficult to integrate this in the existing codebase. It looks like we would have to throw away the old zip implementation and use this one, if we want this feature. Please help if i am wrong. For the memory problem: wouldn't it be easier just to expand the zipfile in an temp directory, take the intrnal catalog to see what files we need again and rebuild the whole stuff? I think this is the way winzip handles this. This would be quite safe and the old file would't get corrupt in case of any trouble. This strategy would also allow us to delete from the zip package. No? Cheers, Chris.
          Hide
          Mario Ivankovits added a comment -

          I haven't had the time to give it a try, but looks promising.

          Just a few things in advance:

          • use the org.apache.commons.compress package prefix (instead of the com.* one)
          • add the ASF license header
          • use method names with lower-case first letter
          • did you already sent a CLA? [1]
          • do not use e.printStrackTrace()

          Then something technically

          • ModifiableZipFile.WriteExistingEntries: when you "shift" the content you allocate a buffer with the full sizeOfBlock size, isn't it dangerous (out of memeory) when you have to deal with large zip files, or at least with a large compressed entry?

          Thanks!
          Mario

          [1] http://www.apache.org/licenses/#clas

          Show
          Mario Ivankovits added a comment - I haven't had the time to give it a try, but looks promising. Just a few things in advance: use the org.apache.commons.compress package prefix (instead of the com.* one) add the ASF license header use method names with lower-case first letter did you already sent a CLA? [1] do not use e.printStrackTrace() Then something technically ModifiableZipFile.WriteExistingEntries: when you "shift" the content you allocate a buffer with the full sizeOfBlock size, isn't it dangerous (out of memeory) when you have to deal with large zip files, or at least with a large compressed entry? Thanks! Mario [1] http://www.apache.org/licenses/#clas
          Will Pugh made changes -
          Field Original Value New Value
          Attachment myzip.zip [ 12348211 ]
          Will Pugh created issue -

            People

            • Assignee:
              Torsten Curdt
              Reporter:
              Will Pugh
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development