Hadoop Common: HADOOP-7139

Allow appending to existing SequenceFiles

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: io
    • Labels:
      None
    • Release Note:
      Existing SequenceFiles can now be appended to
    • Target Version/s:

    Attachments
    1. HADOOP-7139.patch
      7 kB
      Stephen Rose
    2. HADOOP-7139.patch
      7 kB
      Stephen Rose
    3. HADOOP-7139.patch
      7 kB
      Stephen Rose
    4. HADOOP-7139.patch
      7 kB
      Stephen Rose
    5. HADOOP-7139-kt.patch
      41 kB
      Kristofer Tomasette

        Activity

        Stephen Rose added a comment -

        Didn't mean to close it

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12470968/HADOOP-7139.patch
        against trunk revision 1070021.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.io.compress.TestCodec
        org.apache.hadoop.io.TestArrayFile
        org.apache.hadoop.io.TestBloomMapFile
        org.apache.hadoop.io.TestMapFile
        org.apache.hadoop.io.TestSequenceFileSerialization
        org.apache.hadoop.io.TestSequenceFile
        org.apache.hadoop.io.TestSetFile

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/232//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/232//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/232//console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12470970/HADOOP-7139.patch
        against trunk revision 1070021.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.io.compress.TestCodec
        org.apache.hadoop.io.TestArrayFile
        org.apache.hadoop.io.TestBloomMapFile
        org.apache.hadoop.io.TestMapFile
        org.apache.hadoop.io.TestSequenceFileSerialization
        org.apache.hadoop.io.TestSequenceFile
        org.apache.hadoop.io.TestSetFile

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/233//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/233//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/233//console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12471759/HADOOP-7139.patch
        against trunk revision 1071364.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/295//testReport/
        Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/295//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/295//console

        This message is automatically generated.

        Stephen Rose added a comment -

        No additional unit tests were created because ChecksumFileSystem doesn't support append. I have tested on HDFS.

        Todd Lipcon added a comment -
        • Looks like the patch introduces some incorrect whitespace: our coding style is to use two spaces for indentation and no hard tabs.
        • Appending to a SequenceFile should probably check that the version of the file being appended to is the same as the current file format version. I see you added a getVersion() function, but it's not used.
        • The Release Note and JIRA description indicate support for appending to MapFile, but that isn't available in the API.
        • The ability to specify the compression block size seems to be a separate logical change from the ability to append. We should probably break this into two JIRAs since they're two different features.
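        The version check suggested above could look roughly like the following. This is a minimal, self-contained sketch using only java.io; HeaderVersionCheck and its methods are hypothetical names, not part of the patch or of Hadoop's API. It relies only on the documented SequenceFile header prefix: the three magic bytes SEQ followed by a one-byte version number. A real writer would compare against its own format version rather than a caller-supplied value.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper (not Hadoop API): validates a SequenceFile-style header before appending. */
final class HeaderVersionCheck {
  // SequenceFile headers begin with the magic bytes 'S','E','Q' and then one version byte.
  static final byte[] MAGIC = {'S', 'E', 'Q'};

  private HeaderVersionCheck() {}

  /** Reads the 4-byte magic + version prefix and returns the version byte. */
  static int readVersion(InputStream in) throws IOException {
    DataInputStream din = new DataInputStream(in);
    byte[] prefix = new byte[MAGIC.length];
    din.readFully(prefix);
    if (!Arrays.equals(prefix, MAGIC)) {
      throw new IOException("not a SequenceFile: bad magic " + Arrays.toString(prefix));
    }
    return din.readUnsignedByte();
  }

  /** Refuses to append when the on-disk version differs from the version the writer produces. */
  static void checkAppendable(InputStream in, int writerVersion) throws IOException {
    int fileVersion = readVersion(in);
    if (fileVersion != writerVersion) {
      throw new IOException("cannot append: file version " + fileVersion
          + " != writer version " + writerVersion);
    }
  }
}
```

        Reporting the mismatch as an IOException lets the caller refuse to open the file for append instead of silently mixing two record formats in one file.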
        Todd Lipcon added a comment -

        Also, could you write a unit test for this against RawLocalFileSystem? For example: create a SequenceFile, close it, reopen it, append, close it, and then verify you can read the whole thing.
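        The test flow requested above (create, close, reopen, append, close, read everything back) can be outlined without Hadoop on the classpath. The sketch below is a stand-in under stated assumptions: ToyAppendableFile and its layout are hypothetical, using a SequenceFile-like SEQ + version header followed by strings written with DataOutputStream.writeUTF, not real SequenceFile records. It pins down the two properties such a test should check: the header is written only when the file is created, and every record from both writes is readable afterwards.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical toy format: a SEQ + version header written once, then writeUTF records. */
final class ToyAppendableFile {
  // Assumed header: the SequenceFile magic "SEQ" plus version byte 6.
  static final byte[] HEADER = {'S', 'E', 'Q', 6};

  private ToyAppendableFile() {}

  /** Creates the file (with header) or appends to it (header checked, never rewritten). */
  static void write(File f, boolean append, String... records) {
    try {
      if (append) {
        readHeader(f); // fail fast if the existing file has an unexpected header
      }
      try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f, append))) {
        if (!append) {
          out.write(HEADER); // only a newly created file gets a header
        }
        for (String r : records) {
          out.writeUTF(r);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads the single header, then every record up to end of file. */
  static List<String> readAll(File f) {
    List<String> records = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
      checkHeader(in);
      while (true) {
        try {
          records.add(in.readUTF());
        } catch (EOFException eof) {
          return records; // end of data (this toy does not detect truncated records)
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private static void readHeader(File f) throws IOException {
    try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
      checkHeader(in);
    }
  }

  private static void checkHeader(DataInputStream in) throws IOException {
    byte[] header = new byte[HEADER.length];
    in.readFully(header);
    if (!Arrays.equals(header, HEADER)) {
      throw new IOException("unexpected header " + Arrays.toString(header));
    }
  }
}
```

        Against the actual patch, the same steps would go through SequenceFile.Writer and SequenceFile.Reader on RawLocalFileSystem; the toy format only stands in for them so the shape of the test is visible in isolation.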

        Tom White added a comment -

        Sorry, I made a mistake assigning this a moment ago when marking it as open (while Todd's feedback is addressed).

        Kristofer Tomasette added a comment -

        This patch depends on HADOOP-7817

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12510845/HADOOP-7139-kt.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 26 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/515//console

        This message is automatically generated.

        Keith Wyss added a comment -

        Looking at this patch, it looks like a bunch of bookkeeping about compression metadata, plus support for not initializing the file with the typical SequenceFile header. Am I reading it correctly? Will this apply cleanly to CDH3U4 or CDH3U5? Has anyone tested it on those systems? Thank you.


          People

          • Assignee:
            Stephen Rose
            Reporter:
            Stephen Rose
          • Votes:
            5
            Watchers:
            11

            Dates

            • Created:
              Updated:

              Time Tracking

              Estimated:
              2h
              Remaining:
              2h
              Logged:
              Not Specified
