Details

    Description

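      When a compressed file is queried with Hive on Tez, the header and footer rows are returned even though skip.header.line.count and skip.footer.line.count are set. This affects both SELECT * and SELECT COUNT(*):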

      printf "offset,id,other\n9,\"20200315 X00 1356\",123\n17,\"20200315 X00 1357\",123\nrst,rst,rst" > data.csv
      hdfs dfs -put -f data.csv /apps/hive/warehouse/bz2test/bz2tbl1/
      bzip2 -f data.csv 
      hdfs dfs -put -f data.csv.bz2 /apps/hive/warehouse/bz2test/bz2tbl2/
      
      beeline -e "CREATE EXTERNAL TABLE default.bz2tst2 (
        sequence   int,
        id         string,
        other      string) 
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
      LOCATION '/apps/hive/warehouse/bz2test/bz2tbl2' 
      TBLPROPERTIES (
        'skip.header.line.count'='1',
        'skip.footer.line.count'='1');"
      
      beeline -e "
        SET hive.fetch.task.conversion = none;
        SELECT * FROM default.bz2tst2;"
      +-------------------+--------------------+----------------+
      | bz2tst2.sequence  |     bz2tst2.id     | bz2tst2.other  |
      +-------------------+--------------------+----------------+
      | offset            | id                 | other          |
      | 9                 | 20200315 X00 1356  | 123            |
      | 17                | 20200315 X00 1357  | 123            |
      | rst               | rst                | rst            |
      +-------------------+--------------------+----------------+
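
For comparison, the result the table properties are meant to produce can be worked out locally without Hive. The sketch below is only an illustration: it rebuilds the sample file from the report in a scratch directory and drops the first and last lines with sed, which mirrors skip.header.line.count=1 and skip.footer.line.count=1. The variable names (workdir, expected_rows, expected_count) are made up for this example and are not part of the report.

```shell
# Recreate the sample file from the report in a scratch directory.
workdir=$(mktemp -d)
printf 'offset,id,other\n9,"20200315 X00 1356",123\n17,"20200315 X00 1357",123\nrst,rst,rst' > "$workdir/data.csv"

# Drop the first line (header) and the last line (footer), mirroring
# skip.header.line.count=1 and skip.footer.line.count=1.
expected_rows=$(sed '1d;$d' "$workdir/data.csv")

# Count the remaining lines; tr strips the padding some wc builds add.
expected_count=$(printf '%s\n' "$expected_rows" | wc -l | tr -d ' ')

printf '%s\n' "$expected_rows"
echo "expected row count: $expected_count"

# Clean up the scratch directory.
rm -rf "$workdir"
```

This should print the two data rows (ids 1356 and 1357) and a count of 2, whereas the query output above contains four rows because the header and footer were not skipped.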
      

      PS: HIVE-22769 addressed the issue for Hive on LLAP.

            People

              pgaref Panagiotis Garefalakis

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 50m