IMPALA-11120

load-data.py does not load ORC files with specified codec

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.1.0
    • Component/s: Infrastructure
    • Labels: None

    Description

      I ran the following command to generate TPC-H tables in ORC format using SNAPPY compression:

      bin/load-data.py -w tpch -e core --table_formats=orc/snap/block
      

      After it succeeded, I found that the files were still compressed with ZLIB:

      $ hive --service orcfiledump hdfs://localhost:20500/test-warehouse/tpch.lineitem_orc_snap/000000_0
      Processing data file hdfs://localhost:20500/test-warehouse/tpch.lineitem_orc_snap/000000_0 [length: 149783256]
      Structure for hdfs://localhost:20500/test-warehouse/tpch.lineitem_orc_snap/000000_0
      File Version: 0.12 with ORC_135
      Rows: 6001215
      Compression: ZLIB         <-------- not SNAPPY
      Compression size: 262144
      Calendar: Julian/Gregorian
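
The check above can be automated. The following is a minimal sketch, assuming the orcfiledump output has already been captured as text; the helper name orc_compression is hypothetical and is not part of Impala or Hive.

```python
import re


def orc_compression(dump_text):
    """Return the codec named on the 'Compression:' line of orcfiledump output.

    Hypothetical helper for illustration. Returns None if no such line exists.
    The pattern requires a colon directly after 'Compression', so the
    'Compression size:' line is not matched.
    """
    match = re.search(r"^\s*Compression:\s*(\w+)", dump_text, re.MULTILINE)
    return match.group(1) if match else None


# Excerpt of the dump shown in this report.
sample = """\
File Version: 0.12 with ORC_135
Rows: 6001215
Compression: ZLIB
Compression size: 262144
"""

print(orc_compression(sample))  # prints ZLIB
```

A data-loading test could compare the returned value against the codec requested in --table_formats and fail on a mismatch.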
      

      The Hive statements we use to generate the data are:

      SET hive.exec.compress.output=true;
      SET mapred.output.compression.type=BLOCK;
      SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
      SET hive.exec.dynamic.partition.mode=nonstrict;
      SET hive.exec.dynamic.partition=true;
      SET hive.exec.max.dynamic.partitions=10000;
      SET hive.exec.max.dynamic.partitions.pernode=10000;
      set hive.auto.convert.join=true;
      SET mapred.max.split.size=256000000;
      SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
      INSERT OVERWRITE TABLE tpch_orc_snap.lineitem SELECT * FROM tpch.lineitem;
      

      Setting mapred.output.compression.codec has no effect on ORC output, so the files are written with the ORC default codec, ZLIB. Instead, we need to set the table property "orc.compress" to "SNAPPY".

      ref: https://orc.apache.org/docs/hive-config.html
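
One possible shape of the fix, as a minimal sketch only: translate the codec part of the table format string into an "orc.compress" value and emit an ALTER TABLE statement to run before the INSERT OVERWRITE. The helper name, the mapping table, and the "none" and "def" abbreviations are assumptions for illustration; this is not the actual load-data.py change.

```python
# Hypothetical mapping from the codec abbreviation in --table_formats to an
# ORC compression kind. Only "snap" appears in this report; "none" and "def"
# are assumed abbreviations.
ORC_COMPRESS_BY_CODEC = {
    "none": "NONE",
    "def": "ZLIB",
    "snap": "SNAPPY",
}


def orc_compress_statement(db, table, table_format):
    """Build a Hive statement that sets orc.compress on an ORC table.

    table_format is a string like "orc/snap/block"
    (file format / codec / compression type).
    """
    file_format, codec, _compression_type = table_format.split("/")
    if file_format != "orc":
        raise ValueError("not an ORC table format: %s" % table_format)
    kind = ORC_COMPRESS_BY_CODEC[codec]
    return "ALTER TABLE %s.%s SET TBLPROPERTIES ('orc.compress'='%s');" % (
        db, table, kind)


print(orc_compress_statement("tpch_orc_snap", "lineitem", "orc/snap/block"))
# prints: ALTER TABLE tpch_orc_snap.lineitem SET TBLPROPERTIES ('orc.compress'='SNAPPY');
```

The same property could instead be set in the CREATE TABLE statement's TBLPROPERTIES clause, which avoids a separate ALTER TABLE step.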

          People

            Assignee: Quanlong Huang (stigahuang)
            Reporter: Quanlong Huang (stigahuang)