PIG-4599: tar.gz compression doesn't produce correct output

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.12.1
    • Fix Version/s: None
    • Component/s: None

      Description

      I'm not completely sure whether this is the right place to report this issue: Pig is involved, but Pig leaves decompression of tar.gz files to hadoop-common.

      How to reproduce the issue:

      1. Put a simple file (file1) containing arbitrary text lines into in1 in HDFS.
      2. Compress the same file (file1) with tar -cvzf file1.tar.gz file and put the archive into in2 in HDFS.
      3. Run these simple commands in Pig:

        raw = load 'in1/' USING TextLoader AS (line: bytearray);
        dump raw;

        Run this for both inputs: the uncompressed file (in1) and the compressed file (in2).

      4. For the compressed input, the first line of the output is garbled:

        a0000644000570000001440000000002512534073736011260 0ustar loadhadoopusersa
        ...
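
      A likely explanation, consistent with the Won't Fix resolution: a .tar.gz file is a tar archive wrapped in gzip, and a gzip codec chosen by the .gz suffix undoes only the gzip layer. The reader then sees the raw tar stream, so the 512-byte tar header (file name, mode 0000644, owner IDs, the "ustar" magic) is glued onto the first text line. The Python sketch below reproduces this outside Hadoop. It is an illustration under assumptions, not Pig or Hadoop code: the helper make_tar_gz and the sample text are hypothetical, and the archive is built in memory instead of with the tar command.

```python
import gzip
import io
import tarfile


def make_tar_gz(name: str, data: bytes) -> bytes:
    """Build an in-memory .tar.gz holding one file (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        info.mode = 0o644
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


text = b"first line\nsecond line\n"
archive = make_tar_gz("file1", text)

# Undo only the gzip layer, which is all a gzip codec can do.
gunzipped = gzip.decompress(archive)

# What a line-oriented reader sees first: tar header + real first line.
header = gunzipped[:512]
first_line = gunzipped.split(b"\n", 1)[0]

# Reading the archive as a tar file recovers the original content.
with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
    extracted = tar.extractfile("file1").read()

# Plain gzip, with no tar layer, round-trips to the original text.
plain = gzip.decompress(gzip.compress(text))
```

      Under this reading, a practical workaround is to compress the input with gzip alone (for example gzip file1, producing file1.gz) before putting it into HDFS, so that no tar header is present in the decompressed stream.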

            People

            • Assignee: Unassigned
            • Reporter: Tomas Hudik (xhudik)