  1. IMPALA
  2. IMPALA-6829

How to get compressed HDFS files using Impala or Hive

    Details

    • Type: Question
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Hi,

      I am learning Impala on my own and trying to enable compression for a table, but the HDFS files written for the table do not get a compression extension.

      I am referring to

      https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_txtfile.html

      but I am not sure how the final compressed files in that example are created.

      When I use Sqoop, I do get a compressed file. Please guide me. The example from the documentation is:

      create table csv_compressed (a string, b string, c string)
      row format delimited fields terminated by ",";

      insert into csv_compressed values
      ('one - uncompressed', 'two - uncompressed', 'three - uncompressed'),
      ('abc - uncompressed', 'xyz - uncompressed', '123 - uncompressed');
      ...make equivalent .gz, .bz2, and .snappy files and load them into same table directory...

      select * from csv_compressed;
      +--------------------+--------------------+----------------------+
      | a                  | b                  | c                    |
      +--------------------+--------------------+----------------------+
      | one - snappy       | two - snappy       | three - snappy       |
      | one - uncompressed | two - uncompressed | three - uncompressed |
      | abc - uncompressed | xyz - uncompressed | 123 - uncompressed   |
      | one - bz2          | two - bz2          | three - bz2          |
      | abc - bz2          | xyz - bz2          | 123 - bz2            |
      | one - gzip         | two - gzip         | three - gzip         |
      | abc - gzip         | xyz - gzip         | 123 - gzip           |
      +--------------------+--------------------+----------------------+

      $ hdfs dfs -ls 'hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/';
      ...truncated for readability...
      75 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed.snappy
      79 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_bz2.csv.bz2
      80 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_gzip.csv.gz
      116 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/dd414df64d67d49b_data.0.
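
For the "make equivalent .gz, .bz2, and .snappy files" step in the example above, here is a minimal sketch of one way such files could be produced outside Impala. As far as I understand the linked documentation, Impala at that version can query compressed text files but does not write them itself, which would explain why the file created by INSERT has no compression extension. The sketch uses only the Python standard library, so it covers gzip and bzip2 but not Snappy. The function names, the row data, and the output directory are assumptions for illustration; the file names are taken from the listing above.

```python
import bz2
import gzip
import os
import tempfile

# Illustrative rows matching the three string columns (a, b, c) of csv_compressed.
ROWS = {
    "gzip": [("one - gzip", "two - gzip", "three - gzip"),
             ("abc - gzip", "xyz - gzip", "123 - gzip")],
    "bz2": [("one - bz2", "two - bz2", "three - bz2"),
            ("abc - bz2", "xyz - bz2", "123 - bz2")],
}


def to_csv(rows):
    """Join fields with ',' to match 'fields terminated by ","' in the DDL."""
    return "".join(",".join(fields) + "\n" for fields in rows)


def write_compressed(out_dir):
    """Write one gzip file and one bzip2 file; return their paths by codec."""
    paths = {
        "gzip": os.path.join(out_dir, "csv_compressed_gzip.csv.gz"),
        "bz2": os.path.join(out_dir, "csv_compressed_bz2.csv.bz2"),
    }
    with gzip.open(paths["gzip"], "wt", encoding="utf-8") as f:
        f.write(to_csv(ROWS["gzip"]))
    with bz2.open(paths["bz2"], "wt", encoding="utf-8") as f:
        f.write(to_csv(ROWS["bz2"]))
    return paths


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()  # assumption: any local scratch directory works
    for codec, path in write_compressed(out_dir).items():
        print(codec, path)
```

The resulting files would then be copied into the table directory (for example with `hdfs dfs -put`), followed by `REFRESH csv_compressed;` in Impala so that the new files are picked up.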

            People

            • Assignee:
              Unassigned
            • Reporter:
              sat123_123 sathishkumar paramasivam