Hadoop Common / HADOOP-3977

SequenceFile.Writer reopen (hdfs append)

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: io
    • Labels:
      None

      Description

      Allows for reopening and appending to a SequenceFile

      1. HADOOP-3977.txt
        43 kB
        Karl Wettin
      2. HADOOP-3977.txt
        37 kB
        Karl Wettin

        Issue Links

          Activity

          Karl Wettin created issue -
          Karl Wettin added a comment -

          No Javadocs or test case available yet. I'll be back with a new patch soon enough, but would very much like to hear that I didn't miss anything important. It seems to work just fine over here (famous last words).

          Karl Wettin made changes -
          Field Original Value New Value
          Attachment HADOOP-3977.txt [ 12388590 ]
          Karl Wettin added a comment -
          • @param javadocs
          • TestSequenceFileReopen

          Caveat: block-compressed sequence files do not support reopening. I'm not sure how feasible that is. An IOException will be thrown if it is attempted.

          Also, TestMapRed fails over here. I don't think it is related to my patch.

          Testcase: testCompression took 1,39 sec
          	Caused an ERROR
          Job failed!
          java.io.IOException: Job failed!
          	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1123)
          	at org.apache.hadoop.mapred.TestMapRed.checkCompression(TestMapRed.java:441)
          	at org.apache.hadoop.mapred.TestMapRed.testCompression(TestMapRed.java:463)
          
          Karl Wettin made changes -
          Attachment HADOOP-3977.txt [ 12388698 ]
          Karl Wettin made changes -
          Link This issue blocks MAHOUT-19 [ MAHOUT-19 ]
          Doug Cutting added a comment -

          At a glance, this looks reasonable. The SequenceFile constructors are a mess, and this only makes them worse, but that's a separate issue, I guess. Let's see what Hudson says...

          Doug Cutting made changes -
          Assignee Karl Wettin [ karl.wettin ]
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12388698/HADOOP-3977.txt
          against trunk revision 689666.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 4 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          -1 release audit. The applied patch generated 272 release audit warnings (more than the trunk's current 271 warnings).

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3131/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3131/artifact/trunk/current/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3131/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3131/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3131/console

          This message is automatically generated.

          Owen O'Malley added a comment -

          The back up was put in because of a corner case. It is fixable, but I believe you are still vulnerable to it.

          (start) a b c \r (start) \n d e f \r \n g h i \r \n

          Clearly, the lines should be:

          split1: a b c / d e f
          split2: g h i

          But I believe your patch will lose the d e f line.
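
To make the corner case above concrete, here is a minimal, self-contained sketch of split-aware line reading in which a "\r\n" terminator straddles a split boundary. It is not the Hadoop reader and not the patch under review: the class name `SplitLineDemo`, its helper methods, and the ownership convention (a non-first split backs up one byte and discards through the first terminator, and each split reads lines that begin before its end) are assumptions chosen for illustration. Under this convention "d e f" is emitted by the second split rather than the first; which split owns it depends on the convention, and the property of interest here is only that no line is lost or read twice.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Hadoop code.
final class SplitLineDemo {

    /** One line of text plus the offset just past its terminator. */
    private static final class Line {
        final String text;
        final int next;

        Line(String text, int next) {
            this.text = text;
            this.next = next;
        }
    }

    /** Reads one line starting at pos; "\n", "\r" and "\r\n" each end a line. */
    private static Line readLine(byte[] data, int pos) {
        int i = pos;
        while (i < data.length && data[i] != '\r' && data[i] != '\n') {
            i++;
        }
        String text = new String(data, pos, i - pos, StandardCharsets.US_ASCII);
        if (i < data.length) {
            // Consume "\r\n" as a single terminator, otherwise a single byte.
            boolean crlf = data[i] == '\r' && i + 1 < data.length && data[i + 1] == '\n';
            i += crlf ? 2 : 1;
        }
        return new Line(text, i);
    }

    /** Returns the lines owned by the split [start, end) under the assumed convention. */
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start > 0) {
            // Back up one byte so that a "\r" just before the boundary is seen
            // together with the "\n" at the boundary, then discard through the
            // first terminator. The previous split has already read that line.
            pos = readLine(data, start - 1).next;
        }
        List<String> lines = new ArrayList<>();
        // A line that begins before 'end' is read in full, even past 'end'.
        while (pos < end && pos < data.length) {
            Line line = readLine(data, pos);
            lines.add(line.text);
            pos = line.next;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "abc\r\ndef\r\nghi\r\n".getBytes(StandardCharsets.US_ASCII);
        int boundary = 4; // the second split starts on the "\n" of the first "\r\n"
        System.out.println("split1: " + readSplit(data, 0, boundary));
        System.out.println("split2: " + readSplit(data, boundary, data.length));
    }
}
```

With the boundary at offset 4, the first split yields "abc" and the second yields "def" and "ghi". Without the one-byte back up, the second split would have to decide on its own whether the leading "\n" is a complete terminator or the tail of a "\r\n", which is the ambiguity the comment describes.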

          Owen O'Malley made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Karl Wettin added a comment -

          I think I'm giving up on this patch. It might be cleaner to implement a new thin layer on top of DataOutputStream that allows for appending and is binary compatible with a limited subset of the SequenceFile features, i.e. no compression support.

          I'll report back on this here as soon as I've got something (any year now).
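
As a rough idea of what such a thin, append-only layer over DataOutputStream could look like, here is a minimal sketch. It is not the proposed patch and it does not reproduce the SequenceFile on-disk format (no header, no sync markers, no compression): it writes plain length-prefixed key/value records to a local file opened in append mode. The class name `AppendableRecordLog`, its methods, and the record layout are all assumptions made for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not SequenceFile and not Hadoop code.
final class AppendableRecordLog {

    /** Reopens the file in append mode and writes one length-prefixed record. */
    static void append(Path file, String key, String value) throws IOException {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        // 'true' opens the local file for appending instead of truncating it.
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file.toFile(), true)))) {
            out.writeInt(k.length);
            out.write(k);
            out.writeInt(v.length);
            out.write(v);
        }
    }

    /** Reads every record back as {key, value} pairs. */
    static List<String[]> readAll(Path file) throws IOException {
        List<String[]> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file.toFile())))) {
            while (true) {
                int keyLength;
                try {
                    keyLength = in.readInt();
                } catch (EOFException endOfFile) {
                    break; // no further record starts here
                }
                byte[] k = new byte[keyLength];
                in.readFully(k);
                byte[] v = new byte[in.readInt()];
                in.readFully(v);
                records.add(new String[] {
                    new String(k, StandardCharsets.UTF_8),
                    new String(v, StandardCharsets.UTF_8)
                });
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("append-log", ".bin");
        append(file, "k1", "v1"); // first open
        append(file, "k2", "v2"); // reopen and append
        for (String[] record : readAll(file)) {
            System.out.println(record[0] + " = " + record[1]);
        }
        Files.delete(file);
    }
}
```

Because every record is self-delimiting and there is no trailing state, reopening only needs an append-mode stream. Block compression would break that property, since a compressed block spans many records, which fits the "no compression support" restriction mentioned above.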

          Harsh J made changes -
          Link This issue is superseded by HADOOP-7139 [ HADOOP-7139 ]
          Harsh J added a comment -

          A fresher effort is ongoing at HADOOP-7139

          (Resolving as duplicate)

          Harsh J made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Duplicate [ 3 ]

            People

            • Assignee:
              Karl Wettin
              Reporter:
              Karl Wettin
            • Votes:
              2
              Watchers:
              4
