Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3
    • Component/s: Archivers
    • Labels:
      None

      Description

      I'm submitting a series of patches to the ext2/3/4 dump utility and noticed that the commons-compress library doesn't have an archiver for it. It's as old as tar and fills a similar niche, but the latter has become much more widely used. Dump includes support for sparse files, extended attributes, Mac OS Finder info, SELinux labels (I think), and more. Incremental dumps can capture that files have been deleted.

      I should have initial support for a decoder this weekend. I can read the directory entries and inode information (file permissions, etc.) but need a bit more work on extracting the content as an InputStream.

      1. dump.zip
        32 kB
        Bear Giles
      2. dump-20110722.zip
        37 kB
        Bear Giles
      3. test.dump
        4.03 MB
        Bear Giles
      4. test-z.dump
        155 kB
        Bear Giles

        Activity

        Stefan Bodewig added a comment -

        I've settled for an offset of 24 bytes for format detection with svn revision 1170505.
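A minimal sketch of what such a magic-number check could look like, assuming the dump magic 60012 (new-fs dump; 60011 is the old-fs variant mentioned below) is stored little-endian at byte offset 24 of the first record. The class and helper names are invented for illustration and are not the actual Commons Compress code:

```java
// Sketch of dump-archive signature detection at offset 24 (assumption:
// little-endian storage; names here are illustrative, not the real API).
public class DumpMagicSketch {
    static final int MAGIC_OFFSET = 24;
    static final int NFS_MAGIC = 60012; // new-fs dump magic; old-fs uses 60011

    // Read a 4-byte little-endian int at the given offset.
    static int leInt(byte[] b, int off) {
        return (b[off] & 0xff)
             | (b[off + 1] & 0xff) << 8
             | (b[off + 2] & 0xff) << 16
             | (b[off + 3] & 0xff) << 24;
    }

    // Returns true if the signature bytes look like a dump archive.
    static boolean matches(byte[] signature, int length) {
        if (length < MAGIC_OFFSET + 4) {
            return false;
        }
        return leInt(signature, MAGIC_OFFSET) == NFS_MAGIC;
    }

    public static void main(String[] args) {
        byte[] header = new byte[32];
        header[24] = 0x6c;          // 60012 == 0xEA6C, little-endian: 6C EA 00 00
        header[25] = (byte) 0xea;
        System.out.println(matches(header, header.length)); // prints true
    }
}
```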

        Stefan Bodewig added a comment -

        I've added dump support, based on the current trunk code, to the Compress Antlib[1]'s trunk, and its different type of test archive works as well.

        So I see autodetection and docs as the only things missing to close this, yay.

        The current javadocs say we'd support BZIP2 (in package.html), but this is not true, is it?

        [1] http://ant.apache.org/antlibs/compress/index.html

        Stefan Bodewig added a comment - edited

        extract test passes with svn revision 1158723 - ArchiveEntry#getSize() is supposed to return -1 for directories and the test relied on it, DumpArchiveEntry now does.
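The contract described above can be illustrated with a minimal sketch; the class and field names are invented for the example and are not DumpArchiveEntry's actual internals:

```java
// Sketch of the ArchiveEntry#getSize() contract: directories report -1.
// (Illustrative names only; not the real DumpArchiveEntry implementation.)
public class EntrySizeSketch {
    static final long SIZE_UNKNOWN = -1;

    private final boolean directory;
    private final long storedSize;

    EntrySizeSketch(boolean directory, long storedSize) {
        this.directory = directory;
        this.storedSize = storedSize;
    }

    // Directories are expected to return -1 rather than a stored size.
    long getSize() {
        return directory ? SIZE_UNKNOWN : storedSize;
    }
}
```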

        Stefan Bodewig added a comment -

        I've managed to create archives and committed an initial testcase which fails to extract the files (they are empty). I'll investigate later whether this is due to changes I have made or whether it already happens with your code base.

        The other thing that fails is format recognition. The archives I have created contain the magic number 60012 at offset 24 (where my magic file expects it to be) but the code in the matches method is looking at offset 7. Is the code wrong?

        Also, do you think we'd need to also check for little endian order or the old-fs dump (60011) or does the current code not support those anyway?

        Stefan Bodewig added a comment -

        Ah, yes, a loopback fs should do the trick, will try that later. Thanks!

        Bear Giles added a comment -

        I used a loopback filesystem:

        1. truncate -s 4g test1fs (creates large sparse file)
        2. losetup /dev/loop7 test1fs
        3. mkfs.ext2 /dev/loop7
        4. losetup -d /dev/loop7
        5. mount -oloopback test1fs /mnt
        6. (populate partition with test data)
        7. dump ... (note: mount will probably mount on a different loopback device)

        (or something like that - I may have various arguments reversed)
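The steps above can be collected into one script, with the same caveat: the device name, mount point, and the dump arguments at the end are examples (the note above warns the arguments may be reversed), and everything except the first command needs root:

```shell
# Sketch of the loopback-filesystem procedure described above.
# /dev/loop7, /mnt, and the final dump invocation are example values.
truncate -s 4g test1fs          # create a large sparse backing file
losetup /dev/loop7 test1fs      # attach it to a free loop device
mkfs.ext2 /dev/loop7            # create an ext2 filesystem on it
losetup -d /dev/loop7           # detach; mount will reattach it below
mount -o loop test1fs /mnt      # loop-mount the filesystem
# ... populate /mnt with test data ...
dump -0 -f test.dump /mnt       # level-0 dump of the mounted filesystem
umount /mnt
```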

        Stefan Bodewig added a comment -

        dump on Linux - Ubuntu 10.4 in my case - claims you can dump individual files.

        Sebb added a comment -

        According to "man dump" on people.a.o, you can only dump filesystems, so test1.xml and test2.xml should be replaced by /dev/sda6.

        Perhaps try dumping a USB device instead, as you can populate that as needed?

        Stefan Bodewig added a comment -

        Later revisions have fixed some issues detected by findbugs, made some methods less public or fixed javadocs, so it has changed quite a bit.

        My initial attempt to run your testcase resulted in Java spinning in an infinite loop; I'll investigate this further.

        I tried to create a dump file on my Linux box - preferably one that has the same contents as src/test/resources/bla.* in Compress' trunk source tree - but failed so far. Cursory reading of the manual page is obviously not enough to make it work. Right now I don't know what to make of

        stefan@birdy:~/cc$ sudo dump -v -f bla.dump test1.xml test2.xml 
          DUMP: Date of this level 0 dump: Tue Aug 16 06:34:18 2011
          DUMP: Dumping /dev/sda6 (/home (dir /stefan/cc/test1.xml)) to bla.dump
          DUMP: Excluding inode 8 (journal inode) from dump
          DUMP: Excluding inode 7 (resize inode) from dump
          DUMP: Label: none
          DUMP: Writing 10 Kilobyte records
          DUMP: mapping (Pass I) [regular files]
        /dev/sda6: File not found by ext2_lookup while translating .xml
        
        Stefan Bodewig added a comment -

        svn revision 1157769 contains a repackaged version of the main tree of your code.

        Things I've changed:

        • repackaged to live in org.apache.commons land
        • removed all @author tags and instead added you to the POM as contributor; hope this is OK with you (we don't do @author tags). Should this be a problem for you, I'll simply remove the code again.
        • merged POSIXArchiveEntry into DumpArchiveEntry for now
        • renamed getModTime to getLastModifiedDate as your class didn't implement that method (it was added in Compress 1.1)

        Missing for me in order to close this are tests - will add some once I have access to a machine that has dump installed - and initial documentation for the site. I'll take care of that as well.
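The getModTime-to-getLastModifiedDate rename mentioned above can be sketched as follows; the millisecond field is an assumption for illustration, not the actual DumpArchiveEntry layout:

```java
// Sketch of the renamed accessor: ArchiveEntry gained getLastModifiedDate()
// in Compress 1.1, so the old getModTime() name no longer satisfied the
// interface. (Field name is illustrative.)
import java.util.Date;

public class ModTimeSketch {
    private long modTimeMillis;

    // Implements the ArchiveEntry contract added in Compress 1.1.
    public Date getLastModifiedDate() {
        return new Date(modTimeMillis);
    }

    public void setLastModifiedDate(Date d) {
        modTimeMillis = d.getTime();
    }
}
```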

        Stefan Bodewig added a comment -

        First of all, thanks!

        Don't waste your time on porting the code to Java 1.4 since the CC 1.2 release is expected to be days rather than weeks away.

        The OutputStream problem is not unique to the dump format - which I've not made myself familiar with, yet - but may affect it to a stronger degree. A similar simple case is AR using the GNU method for storing long file names, where all file names of the archive appear in a separate entry at the front of the archive (at least that's where GNU ar puts it). ZIP has the problem on the reading side, and we introduced ZipFile (well, the java.util.zip designers did) to deal with it. Maybe we'd need to introduce a DumpFile class for writing dump archives that can use RandomAccessFile underneath.
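A hypothetical illustration of that RandomAccessFile idea: write a placeholder length field, append the payload, then seek back and patch in the real value once it is known. The four-byte length field and its position are invented for the example; they are not the dump format's actual header layout:

```java
// Sketch: patch a length header after the fact via RandomAccessFile,
// which a plain OutputStream cannot do. (Field layout is invented.)
import java.io.IOException;
import java.io.RandomAccessFile;

public class HeaderPatchSketch {
    public static void write(RandomAccessFile raf, byte[] data) throws IOException {
        long lengthField = raf.getFilePointer();
        raf.writeInt(0);           // placeholder for the entry length
        raf.write(data);           // entry payload
        long end = raf.getFilePointer();
        raf.seek(lengthField);     // go back and patch the real length
        raf.writeInt(data.length);
        raf.seek(end);             // continue appending after the entry
    }
}
```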

        As for JNI/JNA, ouch! 8-)

        Bear Giles added a comment -

        Attached is the latest snapshot. It contains a unit test and two sample files. The sample files (which are identical except for compression) contain various special files, SELinux labels and one user-defined attribute. Extended attributes (XA) are recognized but ignored.

        There's a POSIXArchiveEntry class but it hasn't been synced with recent changes in the commons library.

        Since there isn't a lot of code it could be made consistent with 1.4 if desired.

        On output streams - I have been giving this a lot of thought since the archives aren't simple streams. I can skim the extra information in the input stream since it only affects internal state. I can't do that on the output stream because some of the header information is exactly sized to the amount of data that will be archived - you can't keep adding to it like a tar or zip file. (Unless you're willing to cache everything, of course.) I haven't given up on this approach yet, but it's a non-trivial problem.

        The other issue is getting some of the gory details that aren't available in the standard java.io.* classes. The cleanest way is using JNA to bind to libc... but that introduces dependencies on JNA.

        Stefan Bodewig added a comment -

        Since the implementation uses Java5 features internally it will have to be pushed to 1.3, at least.

        We'll have to decide whether we want to include an implementation without an OutputStream once 1.3 is close enough to actually talk about a release.

        Bear Giles added a comment -

        This is a proof-of-concept implementation for dump archive files. It handles ZLIB compression (and can easily add BZLIB compression when available) and sparse files. I haven't yet added support for extended attributes or for deleted files (in incremental backups).

        I have verified the test extraction of a 1GB compressed backup.


          People

          • Assignee:
            Unassigned
          • Reporter:
            Bear Giles
          • Votes:
            0
          • Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved: