Pig / PIG-2746

Pig doesn't detect all forms of compression extensions properly

Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      PigStorage contains the following snippet:

      private void setCompression(Path path, Job job) {
          String location = path.getName();
          // Only .bz2/.bz and .gz are recognized; any other extension turns compression off.
          if (location.endsWith(".bz2") || location.endsWith(".bz")) {
              FileOutputFormat.setCompressOutput(job, true);
              FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
          } else if (location.endsWith(".gz")) {
              FileOutputFormat.setCompressOutput(job, true);
              FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
          } else {
              FileOutputFormat.setCompressOutput(job, false);
          }
      }

      As a result, compression is enabled automatically only when the STORE location ends in .bz2, .bz, or .gz (for example, 'output.bz2' or 'output.gz'). For any other codec, such as LZO, the user has to specify the codec and enable compression manually.
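      To make the limitation concrete, here is a stand-alone Java model of the ladder. It is a hypothetical sketch: the Hadoop codec classes are replaced by plain name strings so it runs without Hadoop on the classpath, and LadderDemo and ladderCodec are illustrative names, not Pig APIs.

```java
import java.util.Optional;

// Hypothetical stand-alone model of the extension ladder in PigStorage.setCompression.
// Codecs are represented by name strings so the sketch runs without Hadoop.
public class LadderDemo {
    // Mirrors the if/else ladder: only .bz2/.bz and .gz enable compression.
    static Optional<String> ladderCodec(String location) {
        if (location.endsWith(".bz2") || location.endsWith(".bz")) {
            return Optional.of("BZip2Codec");
        } else if (location.endsWith(".gz")) {
            return Optional.of("GzipCodec");
        }
        return Optional.empty(); // corresponds to setCompressOutput(job, false)
    }

    public static void main(String[] args) {
        String[] names = {"output.gz", "output.bz2", "output.lzo", "output.snappy"};
        for (String name : names) {
            System.out.println(name + " -> " + ladderCodec(name).orElse("no compression"));
        }
    }
}
```

      Running it shows that output.lzo and output.snappy fall through to the "no compression" branch even though Hadoop codecs exist for those formats.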

      Ideally, Pig should rely on Hadoop's extension-to-codec detection (CompressionCodecFactory) instead of this if/else ladder.
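      One way to replace the ladder is a lookup table keyed by file extension. The following is a minimal, hypothetical Java sketch of that idea, loosely modeled on what Hadoop's CompressionCodecFactory does. ExtensionCodecLookup, register, getCodec, and the registry contents are assumptions for illustration, and codecs are again represented by name strings rather than Hadoop classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: replace the if/else ladder with a table from file
// extension to codec name. Not Pig's or Hadoop's actual implementation.
public class ExtensionCodecLookup {
    private final Map<String, String> codecsByExtension = new HashMap<>();

    // Register a codec under its file extension (e.g. ".gz").
    public void register(String extension, String codecName) {
        codecsByExtension.put(extension, codecName);
    }

    // Returns the codec whose extension is the longest suffix of fileName,
    // or null when nothing matches (i.e. leave compression off).
    public String getCodec(String fileName) {
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> e : codecsByExtension.entrySet()) {
            String ext = e.getKey();
            if (fileName.endsWith(ext) && ext.length() > bestLength) {
                best = e.getValue();
                bestLength = ext.length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ExtensionCodecLookup lookup = new ExtensionCodecLookup();
        // Illustrative registry; a real job would derive it from configuration.
        lookup.register(".gz", "GzipCodec");
        lookup.register(".bz2", "BZip2Codec");
        lookup.register(".lzo", "LzopCodec");
        lookup.register(".deflate", "DefaultCodec");

        for (String name : new String[] {"output.gz", "output.lzo", "output.txt"}) {
            String codec = lookup.getCodec(name);
            System.out.println(name + " -> " + (codec == null ? "no compression" : codec));
        }
    }
}
```

      In Hadoop itself, the corresponding call is new CompressionCodecFactory(conf).getCodec(path), which returns null when no registered codec matches; the returned codec's class could then be passed to FileOutputFormat.setOutputCompressorClass. One consequence worth checking: a pure extension lookup only matches each codec's registered extension, so the ladder's extra ".bz" alias would likely need separate handling.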

      Attachments

        1. PIG-2746.patch (2 kB, Harsh J)
        2. PIG-2746.patch (2 kB, Harsh J)
        3. PIG-2746.patch (3 kB, Harsh J)
        4. PIG-2746.patch (4 kB, Harsh J)

            People

              Assignee: Unassigned
              Reporter: Harsh J (qwertymaniac)
              Votes: 0
              Watchers: 4
