Hive / HIVE-25505

Incorrect results with skip.header.line.count if first line is blank


Details

    Description

      A table with skip.header.line.count=1 does not skip the first line if it is blank, except in a fetch task.

      To reproduce, create a CSV table and set skip.header.line.count=1 in the table properties.

      In the table location, create a single file, with a blank (empty) first line, and say 2 further lines.

      If you run select * on it, you see 2 rows (correct).
      If you run select count(*) on it, you get 3 (incorrect).

      -- CSV-backed external table; the first line of each file is treated as a header.
      CREATE EXTERNAL TABLE `testcase1`(id int, name string)
        ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
        LOCATION '${system:test.tmp.dir}/testcase1'
        TBLPROPERTIES ("skip.header.line.count"="1");

      -- Fetch task conversion enabled.
      SET hive.fetch.task.conversion=more;
      SELECT * FROM testcase1;
      SELECT count(*) FROM testcase1;

      -- Fetch task conversion disabled.
      SET hive.fetch.task.conversion=none;
      SELECT * FROM testcase1;
      SELECT count(*) FROM testcase1;
      
      Test file:
      
      1,2019-12-31
      2,2019-12-31
      3,2019-12-31
      
      
      
      Should both yield (with the above test file):
      #### A masked pattern was here ####
      1	2019-12-31
      2	2019-12-31
      3	2019-12-31
      
      3
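The expected behaviour above can be modelled outside Hive with a minimal Python sketch. It assumes the header is simply the first physical line of the file, blank or not. The file name `data.csv` and both helper functions are hypothetical, introduced only for illustration. The sketch does not reproduce Hive's reader or the bug itself; it only shows the result both queries should agree on.

```python
import os
import tempfile

HEADER_LINE_COUNT = 1  # mirrors skip.header.line.count=1 from the table properties


def write_test_file(directory):
    """Write the test file: a blank first line followed by three data lines."""
    path = os.path.join(directory, "data.csv")  # hypothetical file name
    with open(path, "w", newline="") as f:
        f.write("\n1,2019-12-31\n2,2019-12-31\n3,2019-12-31\n")
    return path


def expected_rows(path, header_line_count=HEADER_LINE_COUNT):
    """Skip the first N physical lines, blank or not, and return the rest as rows."""
    with open(path, newline="") as f:
        lines = f.read().splitlines()
    return [line.split(",") for line in lines[header_line_count:]]


with tempfile.TemporaryDirectory() as tmp:
    rows = expected_rows(write_test_file(tmp))

# Equivalent of "select *": one tab-separated line per row.
for row in rows:
    print("\t".join(row))

# Equivalent of "select count(*)": must match the number of rows printed above.
print(len(rows))
```

Under this model, the row listing and the count are derived from the same post-header lines, so they cannot disagree regardless of whether the skipped line is blank.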
      
      


          People

            pgaref Panagiotis Garefalakis
            scarlin Steve Carlin

              Time Tracking

              Original Estimate: Not Specified
              Remaining Estimate: 0h
              Time Spent: 1h 50m
