  1. Pig
  2. PIG-4599

tar.gz compression doesn't produce correct output

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.12.1
    • None
    • None

    Description

      I'm not completely sure whether this is the right place to report this issue: Pig is involved, but Pig leaves decompression of tar.gz files to hadoop-common.

      How to reproduce the issue:

      1. Put a simple file (file1) containing arbitrary text lines into in1 in HDFS.
      2. Compress the same file (file1) with tar -cvzf file1.tar.gz file1 and put the archive into in2 in HDFS.
      3. Run these simple commands in Pig:

        raw = load 'in1/' USING TextLoader AS (line: bytearray);
        dump raw;

        Run this for both inputs: the uncompressed file (in1) and the compressed one (in2).

      4. With the compressed version, the output starts with a strange first line:

        a0000644000570000001440000000002512534073736011260 0ustar loadhadoopusersa
        ...
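
      The "ustar" marker in that first line suggests a likely explanation: a tar.gz file is a tar archive wrapped in gzip, and a reader that undoes only the gzip layer sees the tar header block before the file's own text. The following is a minimal Python sketch of that effect, not Pig or Hadoop code. It assumes the input is handled as plain gzip (chosen by the .gz suffix) and never untarred; the helper names, the member name file1, and the sample payload are hypothetical.

```python
import gzip
import io
import tarfile


def make_tar_gz(name: str, payload: bytes) -> bytes:
    """Build a tar.gz archive in memory holding one regular file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def gunzip_only(archive: bytes) -> bytes:
    """Undo only the gzip layer; the result is still a raw tar stream."""
    return gzip.decompress(archive)


def untar(archive: bytes, name: str) -> bytes:
    """Undo both layers and return the content of one archive member."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        member = tar.extractfile(name)
        return member.read()


payload = b"a\nb\nc\n"                    # hypothetical content of file1
archive = make_tar_gz("file1", payload)

raw = gunzip_only(archive)                # what a gzip-only reader sees
first_line = raw.split(b"\n", 1)[0]       # tar header block + first text line

# Strip the NUL padding to make the header fields readable.
print(first_line.replace(b"\x00", b"").decode("ascii", errors="replace"))
print(untar(archive, "file1"))            # the original lines, header removed
```

      If this is the cause, compressing the file with gzip alone (for example gzip file1, which yields file1.gz with no tar wrapper) should avoid the extra header line.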

          People

            Assignee: Unassigned
            Reporter: Tomas Hudik (xhudik)
            Votes: 0
            Watchers: 3
