  1. Hive
  2. HIVE-5795

Hive should be able to skip header and footer rows when reading the data file for a table

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: 0.13.0
    • None
    • Release Note:
      hive.file.max.footer
        Default Value: 100
        Maximum number of footer lines a user can set for a table file.
      skip.header.line.count
        Default Value: 0
        Number of header lines for the table file.
      skip.footer.line.count
        Default Value: 0
        Number of footer lines for the table file.

      Specify "skip.footer.line.count" and "skip.header.line.count" as table properties when creating the table. The following example shows how to use these two properties:

      CREATE EXTERNAL TABLE testtable (name STRING, message STRING)
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '\t'
        LINES TERMINATED BY '\n'
      LOCATION '/testtable'
      TBLPROPERTIES ("skip.header.line.count"="1", "skip.footer.line.count"="2");
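
The way the three settings above interact can be sketched in Java. This is a minimal illustration and not the code from the attached patches: the class `SkipLineCounts` and its method names are hypothetical. It assumes the table properties arrive as a string map, that a missing property falls back to the default of 0, and that a footer count above the `hive.file.max.footer` limit is rejected.

```java
import java.util.Map;

/** Hypothetical helper (not Hive's actual code): reads the skip counts from table properties. */
final class SkipLineCounts {
    static final String HEADER_KEY = "skip.header.line.count";
    static final String FOOTER_KEY = "skip.footer.line.count";

    final int header;
    final int footer;

    private SkipLineCounts(int header, int footer) {
        this.header = header;
        this.footer = footer;
    }

    /**
     * @param tblProperties key/value pairs as written in TBLPROPERTIES
     * @param maxFooter     value of hive.file.max.footer (100 by default)
     */
    static SkipLineCounts fromTableProperties(Map<String, String> tblProperties, int maxFooter) {
        int header = parseCount(tblProperties, HEADER_KEY);
        int footer = parseCount(tblProperties, FOOTER_KEY);
        // Assumption: exceeding the configured maximum is treated as an error.
        if (footer > maxFooter) {
            throw new IllegalArgumentException(
                FOOTER_KEY + "=" + footer + " exceeds hive.file.max.footer=" + maxFooter);
        }
        return new SkipLineCounts(header, footer);
    }

    private static int parseCount(Map<String, String> props, String key) {
        String raw = props.get(key);
        if (raw == null) {
            return 0; // documented default for both properties
        }
        int value = Integer.parseInt(raw.trim());
        if (value < 0) {
            throw new IllegalArgumentException(key + " must not be negative: " + value);
        }
        return value;
    }
}
```

With the example table above, `fromTableProperties` would yield a header count of 1 and a footer count of 2.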

    Description

      Hive should be able to skip header and footer lines when reading a table's data files. That way, users do not need to preprocess data that another application generated with a header or footer, and can use the files directly for table operations.
      To implement this, the idea is to add new table properties that define the number of header and footer lines, and to skip those lines when reading records from the record reader. A DDL example for creating a table with a header and footer looks like this:

      CREATE EXTERNAL TABLE testtable (name STRING, message STRING)
      ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '\t'
        LINES TERMINATED BY '\n'
      LOCATION '/testtable'
      TBLPROPERTIES ("skip.header.line.count"="1", "skip.footer.line.count"="2");
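
The skipping idea described above can be illustrated with a small Java sketch. It is not taken from the attached patches: `HeaderFooterSkipper` and `skip` are hypothetical names, and a plain `Iterator<String>` stands in for the record reader. Header lines are discarded up front. Footer lines cannot be recognized until the end of the file is reached, so the sketch holds back the most recent `footerCount` lines in a buffer and releases a line only once enough newer lines have arrived behind it. The sketch assumes one reader sees the whole file and ignores how a file may be divided among several readers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration of header/footer skipping; not Hive's actual implementation. */
final class HeaderFooterSkipper {

    /** Returns the data rows of one file, without the leading header and trailing footer lines. */
    static List<String> skip(Iterator<String> lines, int headerCount, int footerCount) {
        if (headerCount < 0 || footerCount < 0) {
            throw new IllegalArgumentException("line counts must not be negative");
        }

        // Discard the header lines (or the whole file, if it is shorter than the header).
        for (int i = 0; i < headerCount && lines.hasNext(); i++) {
            lines.next();
        }

        // Hold back the last footerCount lines seen so far; they may turn out to be the footer.
        Deque<String> pending = new ArrayDeque<>(footerCount + 1);
        List<String> rows = new ArrayList<>();
        while (lines.hasNext()) {
            pending.addLast(lines.next());
            if (pending.size() > footerCount) {
                // The oldest buffered line now has footerCount lines after it, so it is data.
                rows.add(pending.removeFirst());
            }
        }
        // Whatever is still buffered at end of input is the footer and is dropped.
        return rows;
    }
}
```

For a file with one header line, two data lines, and two footer lines, calling `skip(lines, 1, 2)` returns only the two data lines. Buffering at most `footerCount` lines keeps memory use bounded, which is one plausible reason for capping the footer size with a setting such as `hive.file.max.footer`.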
      

      Attachments

        1. HIVE-5795.1.patch
          33 kB
          Shuaishuai Nie
        2. HIVE-5795.2.patch
          39 kB
          Shuaishuai Nie
        3. HIVE-5795.3.patch
          42 kB
          Shuaishuai Nie
        4. HIVE-5795.4.patch
          44 kB
          Shuaishuai Nie
        5. HIVE-5795.5.patch
          42 kB
          Shuaishuai Nie

            People

              Assignee: Shuaishuai Nie (shuainie)
              Reporter: Shuaishuai Nie (shuainie)
              Votes: 0
              Watchers: 12
