Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-752

local mode doesn't read bzip2 and gzip compressed data files

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Won't Fix
    • 0.4.0
    • None
    • None
    • None

    Description

      Problem 1) use of .bz2 file extension does not store results bzip2 compressed in Local mode (-exectype local)

      If I use the .bz2 filename extension in a STORE statement on HDFS, the results are stored with bzip2 compression.
      If I use the .bz2 filename extension in a STORE statement on local file system, the results are NOT stored with bzip2 compression.

      compact.bz2.pig:

      A = load 'events.test' using PigStorage();
      store A into 'events.test.bz2' using PigStorage();
      
      C = load 'events.test.bz2' using PigStorage();
      C = limit C 10;
      
      dump C;
      
      -bash-3.00$ pig -exectype local compact.bz2.pig
      
      -bash-3.00$ file events.test
      events.test: ASCII English text, with very long lines
      -bash-3.00$ file events.test.bz2
      events.test.bz2: ASCII English text, with very long lines
      
      -bash-3.00$ cat events.test | bzip2 > events.test.bz2
      -bash-3.00$ file events.test.bz2
      events.test.bz2: bzip2 compressed data, block size = 900k
      

      The output format in local mode is definitely not bzip2, but it should be.

      
      Problem 2) pig in local mode does not decompress bzip2 compressed files, but should, to be consistent with HDFS
      
      read.bz2.pig:
      

      A = load 'events.test.bz2' using PigStorage();
      A = limit A 10;
      dump A;

      The output should be human readable but is instead garbage, indicating no decompression took place during the load:
      
      

      -bash-3.00$ pig -exectype local read.bz2.pig
      USING: /grid/0/gs/pig/current
      2009-04-03 18:26:30,455 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
      2009-04-03 18:26:30,456 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
      (BZh91AY&SYoz?u??@{????????????????x_?d?|u??-mK?;????????????4?C)
      ((R? 6?*m?&?g, ?6?Zj?k,?0???QT?d?hY?#m??J?>????[j?z?m?t?u?K)K5+)?m?E7j?X?8??????a
      U?p@@MT?$?B?P??N=?(??z<}GK?E

      {@????c$\??I????]?G:?J) a(R?,?U?V??????@?I@??J??!D?)???A?PP?IY??m? (m????P(i?4,#F[?I)@????>??@??|7^?}

      U??w??wg,?u?$?T???((Q!D?=`?}h????P_|=?(??2?m=???xG?(?rC?B?(33:4?N?????t??|T???k??NT?x?=?fyv?w>f????4z?4t?)
      (?oou?t?Kwl?3?nCM?WS?;l?P?s?x
      a?e??)B??9? ?44
      ((??@4?)
      (f????)
      (?@+?d?0@>?U)
      (Q?SR)
      -bash-3.00$

      
      

      Attachments

        1. Pig_752.Patch
          10 kB
          Jeff Zhang

        Activity

          People

            zjffdu Jeff Zhang
            ciemo David Ciemiewicz
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: