Pig / PIG-151

Zero length BZip files are not generated correctly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.1.0
    • Component/s: impl
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      If a zero-length BZip file is created as the result of a store, the resulting file is invalid: it contains part of the BZip header but nothing else. The BZip library does not behave correctly when a BZip file is created and then closed without anything being written to it.
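
For reference, a well-formed bzip2 stream for empty input is not zero bytes long: it still carries a stream header and an end-of-stream trailer. The sketch below uses Python's standard `bz2` module, not Pig's CBZip2OutputStream, so it only illustrates the file format and not the patched code. The 4-byte truncation point is an assumption chosen for illustration; the issue does not say how much of the header was actually written.

```python
import bz2

# A valid bzip2 stream for empty input still has a header and a trailer.
empty_stream = bz2.compress(b"")

# Layout: "BZh" plus a block-size digit, then the end-of-stream magic
# (0x177245385090) and a 32-bit combined CRC, which is 0 when no data
# blocks were written.
header = empty_stream[:4]
end_magic = empty_stream[4:10]
combined_crc = empty_stream[10:14]


def can_decompress(data: bytes) -> bool:
    """Return True if data decompresses as a complete bzip2 stream."""
    try:
        bz2.decompress(data)
        return True
    except (OSError, ValueError):
        return False


# Simulate the failure mode described above: only the header reached disk.
# (Assumed truncation point, for illustration only.)
header_only = empty_stream[:4]

if __name__ == "__main__":
    print(len(empty_stream), header, end_magic.hex(), combined_crc.hex())
    print(can_decompress(empty_stream), can_decompress(header_only))
```

A writer that emits the header on open but skips the trailer when no data was written produces something like `header_only`, which a reader cannot accept as a complete stream.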

      1. bz.patch
        6 kB
        Benjamin Reed
      2. bz.patch
        2 kB
        Benjamin Reed

        Activity

        Benjamin Reed created issue -
        Benjamin Reed made changes -
        Field Original Value New Value
        Attachment bz.patch [ 12377947 ]
        Pi Song added a comment -

        The solution looks good.

        • It would be better if the unit test tested only CBZip2OutputStream/CBZip2InputStream.
        Benjamin Reed added a comment -

        Agreed. I made it end to end in part because we don't have an end to end test of the bzip code. I will add another test that just tests CBZip2OutputStream/CBZip2InputStream.
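
A stream-only test of the kind discussed here could look roughly like the following. This is a sketch in Python using the standard `bz2.BZ2File` class as a stand-in for CBZip2OutputStream/CBZip2InputStream; the helper names `write_and_close` and `read_all` are hypothetical and do not come from the patch.

```python
import bz2
import io
import unittest


def write_and_close(payload: bytes) -> bytes:
    """Open a bzip2 output stream, write payload (possibly nothing), close it."""
    sink = io.BytesIO()
    # BZ2File does not close a file object it was handed, so the
    # buffer can still be inspected after the with-block.
    with bz2.BZ2File(sink, mode="wb") as out:
        if payload:
            out.write(payload)
    return sink.getvalue()


def read_all(compressed: bytes) -> bytes:
    """Read a complete bzip2 stream back through the input-stream class."""
    with bz2.BZ2File(io.BytesIO(compressed), mode="rb") as src:
        return src.read()


class ZeroLengthBzipTest(unittest.TestCase):
    def test_empty_round_trip(self):
        # Nothing is written between open and close.
        compressed = write_and_close(b"")
        self.assertGreater(len(compressed), 0)
        self.assertEqual(read_all(compressed), b"")

    def test_non_empty_round_trip(self):
        self.assertEqual(read_all(write_and_close(b"a\tb\n")), b"a\tb\n")


if __name__ == "__main__":
    unittest.main()
```

Keeping the test at the stream level isolates the compression classes from the store path, so a failure points at the library rather than at the surrounding job.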

        Olga Natkovich made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Benjamin Reed added a comment -

        This patch adds a unit test specifically for reading zero-length bzip files, as well as another end-to-end test.

        The patch also fixes an error related to figuring out the name of the output directory when determining whether this is a bzip file. Note, the source of this error was Hadoop-16, so the fix is specific to Hadoop-16. This will not work with Hadoop-15.
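
To illustrate why the output name matters, here is a minimal sketch that assumes bzip output is recognized by a suffix on the user-supplied output name. The function name, the suffix list, and the example paths are all hypothetical; this is Python for illustration and not the code in the patch.

```python
import posixpath

# Hypothetical suffixes; the issue does not list the extensions Pig checks.
BZIP_SUFFIXES = (".bz", ".bz2")


def is_bzip_output(user_path: str) -> bool:
    """Decide from the last path component whether output should be bzipped."""
    name = posixpath.basename(user_path.rstrip("/"))
    return name.endswith(BZIP_SUFFIXES)


if __name__ == "__main__":
    # The name the user asked for carries the suffix.
    print(is_bzip_output("/user/ben/out.bz"))
    # A framework-rewritten path (hypothetical layout) no longer ends in it.
    print(is_bzip_output("/user/ben/out.bz/_temporary/part-00000"))
```

If the framework hands the writer a rewritten path rather than the name the user supplied, a check like this no longer sees the suffix, which is why the fix depends on how a particular Hadoop version exposes the output name.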

        Benjamin Reed made changes -
        Attachment bz.patch [ 12380900 ]
        Olga Natkovich added a comment -

        Ben, thanks for figuring this out.

        I have reviewed and tested the patch and will commit it once the SVN is back to normal.

        I agree that this is pretty hacky. I spoke with Owen, and in Hadoop 0.17 things change again, so we will need to make more changes. Fortunately, Hadoop is providing us a way to find the original name provided by the user via a static method in FileOutputFormat.

        Ben, how do we handle gzip files? Do we need to make any changes there?

        Olga Natkovich added a comment -

        patch committed. Thanks, Ben

        Olga Natkovich made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Amir Youssefi added a comment -

        Ben,

        Using the Pig tutorial's excite.log.bz2 doesn't work, but excite.log (the uncompressed version) runs. Here are the URL and stack trace:

        http://wiki.apache.org/pig/PigTutorial

        java -cp pig_latest.jar org.apache.pig.Main -x local script1-local.pig
        2008-06-26 20:27:09,708 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.io.IOException: Unable to store alias null
        at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:16)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:457)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:63)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:60)
        at org.apache.pig.Main.main(Main.java:294)
        [unprintable binary data]
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:136)
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:27)
        at org.apache.pig.PigServer.optimizeAndRunQuery(PigServer.java:413)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:293)
        ... 5 more
        [unprintable binary data]
        at org.apache.pig.data.Tuple.getField(Tuple.java:176)
        at org.apache.pig.impl.eval.ProjectSpec.eval(ProjectSpec.java:84)
        at org.apache.pig.impl.eval.SimpleEvalSpec$1.add(SimpleEvalSpec.java:38)
        at org.apache.pig.impl.eval.GenerateSpec$CrossProductItem.exec(GenerateSpec.java:264)
        at org.apache.pig.impl.eval.GenerateSpec$1.add(GenerateSpec.java:88)
        at org.apache.pig.impl.eval.EvalSpec.simpleEval(EvalSpec.java:223)
        at org.apache.pig.impl.eval.cond.FuncCond.eval(FuncCond.java:72)
        at org.apache.pig.impl.eval.FilterSpec$1.add(FilterSpec.java:60)
        at org.apache.pig.backend.local.executionengine.POEval.getNext(POEval.java:113)
        at org.apache.pig.backend.local.executionengine.POEval.getNext(POEval.java:107)
        at org.apache.pig.backend.local.executionengine.POEval.getNext(POEval.java:107)
        at org.apache.pig.backend.local.executionengine.POEval.getNext(POEval.java:107)
        at org.apache.pig.backend.local.executionengine.POCogroup.open(POCogroup.java:88)
        at org.apache.pig.impl.physicalLayer.PhysicalOperator.open(PhysicalOperator.java:68)
        at org.apache.pig.backend.local.executionengine.POEval.open(POEval.java:74)
        at org.apache.pig.impl.physicalLayer.PhysicalOperator.open(PhysicalOperator.java:68)
        at org.apache.pig.backend.local.executionengine.POCogroup.open(POCogroup.java:66)
        at org.apache.pig.impl.physicalLayer.PhysicalOperator.open(PhysicalOperator.java:68)
        at org.apache.pig.backend.local.executionengine.POEval.open(POEval.java:74)
        at org.apache.pig.impl.physicalLayer.PhysicalOperator.open(PhysicalOperator.java:68)
        at org.apache.pig.backend.local.executionengine.POCogroup.open(POCogroup.java:66)
        at org.apache.pig.impl.physicalLayer.PhysicalOperator.open(PhysicalOperator.java:68)
        at org.apache.pig.backend.local.executionengine.POEval.open(POEval.java:74)
        at org.apache.pig.impl.physicalLayer.PhysicalOperator.open(PhysicalOperator.java:68)
        at org.apache.pig.backend.local.executionengine.POEval.open(POEval.java:74)
        at org.apache.pig.impl.physicalLayer.PhysicalOperator.open(PhysicalOperator.java:68)
        at org.apache.pig.backend.local.executionengine.POEval.open(POEval.java:74)
        at org.apache.pig.impl.physicalLayer.PhysicalOperator.open(PhysicalOperator.java:68)
        at org.apache.pig.backend.local.executionengine.POSort.open(POSort.java:54)
        at org.apache.pig.impl.physicalLayer.PhysicalOperator.open(PhysicalOperator.java:68)
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:126)
        ... 8 more

        Olga Natkovich made changes -
        Fix Version/s 0.1.0 [ 12312848 ]
        Alan Gates made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Benjamin Reed
            Reporter:
            Benjamin Reed
          • Votes:
            0
            Watchers:
            0
