IMPALA-6829

How to get compressed HDFS files using Impala or Hive

Details

    • Type: Question
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Bug

    Description

      Hi,

      I am teaching myself Impala and trying to enable compression for a table, but the HDFS data file does not get a compression extension.

      I am following this page:

      https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_txtfile.html

      However, it is not clear to me how the final compressed files are created.

      When I use Sqoop, I do get a compressed file. Please advise.

      This is the example I am following:

      create table csv_compressed (a string, b string, c string)
      row format delimited fields terminated by ",";

      insert into csv_compressed values
      ('one - uncompressed', 'two - uncompressed', 'three - uncompressed'),
      ('abc - uncompressed', 'xyz - uncompressed', '123 - uncompressed');
      ...make equivalent .gz, .bz2, and .snappy files and load them into same table directory...

      select * from csv_compressed;
      +--------------------+--------------------+----------------------+
      | a                  | b                  | c                    |
      +--------------------+--------------------+----------------------+
      | one - snappy       | two - snappy       | three - snappy       |
      | one - uncompressed | two - uncompressed | three - uncompressed |
      | abc - uncompressed | xyz - uncompressed | 123 - uncompressed   |
      | one - bz2          | two - bz2          | three - bz2          |
      | abc - bz2          | xyz - bz2          | 123 - bz2            |
      | one - gzip         | two - gzip         | three - gzip         |
      | abc - gzip         | xyz - gzip         | 123 - gzip           |
      +--------------------+--------------------+----------------------+

      $ hdfs dfs -ls 'hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/';
      ...truncated for readability...
      75 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed.snappy
      79 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_bz2.csv.bz2
      80 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_gzip.csv.gz
      116 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/dd414df64d67d49b_data.0.
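
      The "make equivalent .gz, .bz2 ... files and load them into same table directory" step in the example is not spelled out, so here is a minimal sketch of one way it might be done by hand. It assumes the compressed files are produced outside Impala with the ordinary gzip and bzip2 tools and then copied into the table directory. The file names mirror the listing above; the UPLOAD_TO_HDFS switch and the temporary working directory are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# Sketch only: builds small compressed CSV files locally and, if asked to,
# copies them into the table directory shown in the listing above.
set -eu

workdir=$(mktemp -d)
# Table directory copied from the hdfs dfs -ls command above.
table_dir='hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/'

# Comma-delimited rows matching the three string columns (a, b, c).
printf '%s\n' \
  'one - gzip,two - gzip,three - gzip' \
  'abc - gzip,xyz - gzip,123 - gzip' > "$workdir/csv_compressed_gzip.csv"

# gzip replaces the input file with csv_compressed_gzip.csv.gz.
gzip "$workdir/csv_compressed_gzip.csv"

# Same idea for bzip2, skipped if the tool is not installed.
if command -v bzip2 >/dev/null 2>&1; then
  printf '%s\n' \
    'one - bz2,two - bz2,three - bz2' \
    'abc - bz2,xyz - bz2,123 - bz2' > "$workdir/csv_compressed_bz2.csv"
  bzip2 "$workdir/csv_compressed_bz2.csv"
fi

# UPLOAD_TO_HDFS is a hypothetical opt-in switch for this sketch, so the
# script can be tried without a cluster.
if [ "${UPLOAD_TO_HDFS:-no}" = "yes" ]; then
  hdfs dfs -put "$workdir"/*.gz "$table_dir"
  if [ -f "$workdir/csv_compressed_bz2.csv.bz2" ]; then
    hdfs dfs -put "$workdir/csv_compressed_bz2.csv.bz2" "$table_dir"
  fi
fi

echo "compressed files are in $workdir"
```

      After copying files into the table directory this way, running REFRESH csv_compressed; in impala-shell should make Impala pick up the new files. Note that the file written by the INSERT statement in the example stays uncompressed, which matches the extension-less data file in the listing.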

          People

            Assignee: Unassigned
            Reporter: sathishkumar paramasivam (sat123_123)
            Votes: 0
            Watchers: 2