Hive / HIVE-25505

Incorrect results with skip.header.line.count if the first line is blank

Details

    Description

      A table with skip.header.line.count=1 does not skip the first line if it is blank, except in a fetch task.

      To reproduce, create a CSV table and set skip.header.line.count=1 in the table properties.

      In the table location, create a single file with a blank (empty) first line followed by, say, 2 further lines.

      A select * on it returns 2 rows (correct).
      A select count(*) on it returns 3 (incorrect).
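The reproduction file described above can be sketched in Python. This is only an illustration under stated assumptions: the file name `data.csv`, the row values, and the helper names are hypothetical and do not come from the report. It writes a file with an empty first line plus two data lines, then compares the number of physical lines (3, which matches the incorrect count reported) with the number of rows left after dropping one header line (2, the correct count).

```python
import tempfile
from pathlib import Path

HEADER_LINE_COUNT = 1  # mirrors skip.header.line.count=1 from the report


def write_repro_file(directory: Path) -> Path:
    """Write a CSV file whose first line is empty, followed by two data lines."""
    path = directory / "data.csv"  # hypothetical file name
    path.write_text("\n1,a\n2,b\n", encoding="utf-8")  # hypothetical row values
    return path


def physical_line_count(path: Path) -> int:
    """Count every line in the file, including the blank first line."""
    return len(path.read_text(encoding="utf-8").splitlines())


def expected_row_count(path: Path, header_lines: int = HEADER_LINE_COUNT) -> int:
    """Count rows after dropping the first `header_lines` lines, blank or not."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return len(lines[header_lines:])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repro = write_repro_file(Path(tmp))
        print("physical lines:", physical_line_count(repro))
        print("rows after skipping header:", expected_row_count(repro))
```

The file written this way would be placed in the table location before running the queries below.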

      CREATE EXTERNAL TABLE `testcase1`(id int, name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
        LOCATION '${system:test.tmp.dir}/testcase1'
        TBLPROPERTIES ("skip.header.line.count"="1");
      
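      -- Allow simple queries to run as a fetch task.
      SET hive.fetch.task.conversion=more;
      select * from testcase1;
      select count(*) from testcase1;

      -- Disable fetch task conversion so that no query is served by a fetch task.
      SET hive.fetch.task.conversion=none;
      select * from testcase1;
      select count(*) from testcase1;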
      
      Test file:
      
      1,2019-12-31
      2,2019-12-31
      3,2019-12-31
      
      
      
      Both settings should yield the following (with the above test file):
      #### A masked pattern was here ####
      1	2019-12-31
      2	2019-12-31
      3	2019-12-31
      
      3
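As a cross-check of the expected output above, the following Python sketch models the intended semantics: drop the first physical line even when it is empty, then parse the remainder as CSV. It assumes the test file starts with an empty line (as the description states) followed by the three rows shown. The constant `TEST_FILE` and the helper `read_rows` are hypothetical names, and this is not how Hive itself reads the file.

```python
import csv
from typing import List

# Assumed test file contents: an empty first line, then the three rows from the report.
TEST_FILE = "\n1,2019-12-31\n2,2019-12-31\n3,2019-12-31\n"


def read_rows(text: str, header_lines: int = 1) -> List[List[str]]:
    """Drop the first `header_lines` physical lines, then parse the rest as CSV."""
    remaining = text.splitlines()[header_lines:]
    return list(csv.reader(remaining))


if __name__ == "__main__":
    rows = read_rows(TEST_FILE)
    for row in rows:
        print("\t".join(row))  # tab-separated, like the select * output
    print()
    print(len(rows))  # the value select count(*) should report
```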
      
      

            People

              pgaref Panagiotis Garefalakis
              scarlin Steve Carlin

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 50m