HIVE-2250

"DESCRIBE EXTENDED table_name" shows inconsistent compression information.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0
    • Fix Version/s: None
    • Component/s: CLI, Diagnosability
    • Labels: None
    • Environment:

      RHEL, Full Cloudera stack

      Description

      Commands executed in this order:

      user@node # hive
      hive> SET hive.exec.compress.output=true;
      hive> SET io.seqfile.compression.type=BLOCK;
      hive> CREATE TABLE table_name ( [...] ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS SEQUENCEFILE;
      hive> CREATE TABLE staging_table ( [...] ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
      hive> LOAD DATA LOCAL INPATH 'file:///root/input/' OVERWRITE INTO TABLE staging_table;
      hive> INSERT OVERWRITE TABLE table_name SELECT * FROM staging_table;
      (MapReduce job runs to convert the data to a sequence file...)
      hive> DESCRIBE EXTENDED table_name;

      Detailed Table Information Table(tableName:table_name, dbName:benchmarking, owner:root, createTime:1309480053, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:session_key, type:string, comment:null), FieldSchema(name:remote_address, type:string, comment:null), FieldSchema(name:canister_lssn, type:string, comment:null), FieldSchema(name:canister_session_id, type:bigint, comment:null), FieldSchema(name:tltsid, type:string, comment:null), FieldSchema(name:tltuid, type:string, comment:null), FieldSchema(name:tltvid, type:string, comment:null), FieldSchema(name:canister_server, type:string, comment:null), FieldSchema(name:session_timestamp, type:string, comment:null), FieldSchema(name:session_duration, type:string, comment:null), FieldSchema(name:hit_count, type:bigint, comment:null), FieldSchema(name:http_user_agent, type:string, comment:null), FieldSchema(name:extractid, type:bigint, comment:null), FieldSchema(name:site_link, type:string, comment:null), FieldSchema(name:dt, type:string, comment:null), FieldSchema(name:hour, type:int, comment:null)], location:hdfs://hadoop2/user/hive/warehouse/benchmarking.db/table_name, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= , field.delim=

          • SEE ABOVE: compressed is reported as false, even though the contents of the table are compressed.
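One way to check what is actually on disk, independent of the metastore flag, is to read the SequenceFile header directly: the two compression booleans are stored right after the key and value class names. Below is a minimal sketch in Python (assumptions: the current version-6 header layout, and class names short enough for single-byte vint lengths); grab a header first with something like `hadoop fs -cat <table_location>/000000_0 | head -c 4096 > header.bin`.

```python
import struct

def read_vint(buf: bytes, pos: int):
    # Hadoop WritableUtils vint: values in [-112, 127] are stored as a
    # single byte, which covers typical class-name lengths.
    first = struct.unpack_from('b', buf, pos)[0]
    if first >= -112:
        return first, pos + 1
    raise NotImplementedError('multi-byte vints not handled in this sketch')

def read_text(buf: bytes, pos: int):
    # Text.writeString: vint length followed by UTF-8 bytes.
    length, pos = read_vint(buf, pos)
    return buf[pos:pos + length].decode('utf-8'), pos + length

def sequencefile_compression(header: bytes):
    """Return (compressed, block_compressed) from a SequenceFile header."""
    if header[:3] != b'SEQ':
        raise ValueError('not a SequenceFile')
    _version = header[3]          # layout below assumes the current version (6)
    pos = 4
    _key_class, pos = read_text(header, pos)
    _value_class, pos = read_text(header, pos)
    compressed = header[pos] != 0
    block_compressed = header[pos + 1] != 0
    return compressed, block_compressed
```

For the table above (io.seqfile.compression.type=BLOCK, hive.exec.compress.output=true), both flags should come back true, while DESCRIBE EXTENDED still reports compressed:false.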
      1. HIVE-2250.patch
        12 kB
        subramanian raghunathan

          Activity

          subramanian raghunathan added a comment -

          Handled the following scenarios

          Create table
          Create table like
          Alter table fileformat

          Based on the InputFormat: if its type is SequenceFileInputFormat, the compression flag is set to true, and otherwise to false.
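The heuristic described can be sketched roughly as follows (a hypothetical simplification in Python; the actual patch is Java inside Hive, and the class-name constant is the input format shown in the DESCRIBE output in the description):

```python
# Input format Hive uses for tables STORED AS SEQUENCEFILE.
SEQUENCEFILE_INPUT_FORMAT = "org.apache.hadoop.mapred.SequenceFileInputFormat"

def infer_compressed_flag(input_format: str) -> bool:
    # Heuristic from the comment above: mark the table as compressed
    # if and only if its input format is the SequenceFile input format.
    # (Hypothetical simplification -- not the actual patch code.)
    return input_format == SEQUENCEFILE_INPUT_FORMAT
```

As the next comment points out, this is only a heuristic: a SequenceFile table's data is not necessarily compressed.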

          He Yongqiang added a comment -

          If a table's file format is seqfile, it does not mean the data is compressed.
          I guess a better way to populate this information is from the MoveTask. E.g. an 'insert overwrite ...' normally has 2 tasks: one is the MapReduce task, which uses the compression configuration, and the other is the load task, which is a data load task. And I think they share the same conf.

          But I am not sure how to get the compression info for queries like "load data local inpath ''".

          We can focus on correcting the "insert overwrite" case in this task.

          Harsh J added a comment -

          If we don't really make use of the IS_COMPRESSED attribute of a table, should we just get rid of it (or at least not print it in the describe extended/formatted output, which causes great confusion, as it is almost certainly always No)?


            People

            • Assignee: subramanian raghunathan
            • Reporter: Travis Powell
            • Votes: 0
            • Watchers: 6
