Hadoop Common / HADOOP-1694

LZO-compressed input files not properly recognized


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • 0.14.0
    • None
    • None
    • None

    Description

      When running the wordcount example over plain-text, gzip-compressed, and LZO-compressed input files, the LZO-compressed files are not recognized as compressed and are read as plain text.

      With an input directory containing

      /user/hadoopqa/input/part-001.txt
      /user/hadoopqa/input/part-002.txt.gz
      /user/hadoopqa/input/part-003.txt.lzo

      and running this command

      bin/hadoopqa jar hadoop-examples.jar wordcount /user/hadoopqa/input /user/hadoopqa/output

      I get output like the following:

      row 4
      royal 4
      rt$3-ex?ÔøΩ?÷µIStÔøΩ"4D%ÔøΩ9$UÔøΩÔøΩ"ÔøΩ, 1
      ru$ÔøΩÔøΩ#~t"@ÔøΩm*d#\/$ÔøΩÔøΩl.t"XÔøΩÔøΩDi" 1
      rubbÔøΩdÔøΩ&@bT 1
      rubbed 2

      To LZO-compress the file, I used lzop:
      http://www.lzop.org/download/lzop-1.01-linux_i386.tar.gz
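To check what each input file actually contains, independently of its name, a small diagnostic like the one below can compare a suffix-based guess with the file's leading bytes. This is a hypothetical sketch, not part of Hadoop: the helper names (`sniff_format`, `suffix_format`, `check_file`) and the script name are made up for illustration, and the suffix check only loosely mirrors extension-based codec selection. The magic bytes are the gzip header (RFC 1952) and the header written by the lzop tool. It reads local files, so run it against local copies of the input files.

```python
import sys

# Leading bytes ("magic numbers") of the two compressed formats in this report.
GZIP_MAGIC = b"\x1f\x8b"                # gzip member header (RFC 1952)
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"  # file header written by the lzop tool


def sniff_format(data: bytes) -> str:
    """Classify data by its leading bytes: 'lzop', 'gzip', or 'text'."""
    if data.startswith(LZOP_MAGIC):
        return "lzop"
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    return "text"


def suffix_format(name: str) -> str:
    """Guess the format from the file-name suffix alone (illustrative only)."""
    if name.endswith(".lzo"):
        return "lzop"
    if name.endswith(".gz"):
        return "gzip"
    return "text"


def check_file(path: str) -> str:
    """Report the suffix-based and header-based guesses for one local file."""
    with open(path, "rb") as f:
        header = f.read(len(LZOP_MAGIC))  # enough bytes for either magic number
    by_suffix, by_header = suffix_format(path), sniff_format(header)
    note = "" if by_suffix == by_header else "  <-- mismatch"
    return f"{path}: suffix={by_suffix} header={by_header}{note}"


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(check_file(arg))
```

For example, `python3 sniff_input.py part-001.txt part-002.txt.gz part-003.txt.lzo` prints one line per file. A file named `.lzo` whose header really is lzop's confirms that the input is valid and that the problem lies in how the job reads it, not in the file itself.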

      Attachments

        part-201.txt.lzo (80 kB, uploaded by Nigel Daley)

            People

              Assignee: Arun Murthy (acmurthy)
              Reporter: Nigel Daley (nidaley)
              Votes: 0
              Watchers: 0