Hive / HIVE-5795

Hive should be able to skip header and footer rows when reading data file for a table

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
    • Release Note:
      hive.file.max.footer
        Default Value: 100
        Maximum number of footer lines a user can set for a table file.
      skip.header.line.count
        Default Value: 0
        Number of header lines to skip at the start of the table file.
      skip.footer.line.count
        Default Value: 0
        Number of footer lines to skip at the end of the table file.

      "skip.footer.line.count" and "skip.header.line.count" should be specified in the table properties when creating the table. The following example shows the usage of these two properties:

      Create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2");
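      The relationship between the three properties can be sketched in Python. This is only an illustration, not Hive's code: the function name read_skip_counts and the dict of table properties are hypothetical. How Hive reacts when the footer count exceeds hive.file.max.footer is not stated in this release note; the sketch assumes the value is rejected.

```python
# Hypothetical helper, not Hive's code: read the two skip counts from a
# dict of table properties and check the footer count against a limit.
DEFAULT_MAX_FOOTER = 100  # documented default of hive.file.max.footer


def read_skip_counts(tblproperties, max_footer=DEFAULT_MAX_FOOTER):
    """Return (header_count, footer_count) from a table-property dict."""
    # Both properties default to 0 when absent, as in the release note.
    header = int(tblproperties.get("skip.header.line.count", "0"))
    footer = int(tblproperties.get("skip.footer.line.count", "0"))
    if header < 0 or footer < 0:
        raise ValueError("skip counts must be non-negative")
    # Assumption: a footer count above the limit is treated as an error.
    if footer > max_footer:
        raise ValueError(
            f"skip.footer.line.count={footer} exceeds the limit of {max_footer}"
        )
    return header, footer


# Same property values as the DDL example above.
props = {"skip.header.line.count": "1", "skip.footer.line.count": "2"}
counts = read_skip_counts(props)
```

      With the property values from the example table, counts holds one header line and two footer lines.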

      Description

      Hive should be able to skip header and footer lines when reading a table's data file. That way, users do not need to preprocess data that another application generated with a header or footer, and can use the file directly for table operations.
      To implement this, the idea is to add new table properties that define the number of header and footer lines, and to skip those lines when reading records from the record reader. A DDL example for creating a table with a header and footer looks like this:

      Create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2");
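      The skipping idea described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code in the attached patches: the function name skip_header_and_footer and the sample lines are hypothetical. The header is dropped by counting lines. The footer is dropped by holding back the most recent footer_count lines in a small buffer, so the reader never needs to know the file length in advance.

```python
from collections import deque


def skip_header_and_footer(lines, header_count=0, footer_count=0):
    """Yield lines, dropping the first header_count and last footer_count."""
    # Holds the lines that might still turn out to be footer lines.
    buffer = deque()
    for index, line in enumerate(lines):
        if index < header_count:
            continue  # still inside the header
        buffer.append(line)
        # Once more than footer_count lines are held back, the oldest
        # one cannot be part of the footer and is safe to emit.
        if len(buffer) > footer_count:
            yield buffer.popleft()
    # Whatever remains in the buffer is the footer and is discarded.


# Hypothetical file contents: one header line and two footer lines.
raw = ["name\tmessage", "alice\thello", "bob\thi", "rows: 2", "end of file"]
rows = list(skip_header_and_footer(raw, header_count=1, footer_count=2))
```

      Here rows contains only the two data lines. The buffer never grows beyond footer_count + 1 lines, which is one reason a cap such as hive.file.max.footer is useful.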
      
      1. HIVE-5795.5.patch
        42 kB
        Shuaishuai Nie
      2. HIVE-5795.4.patch
        44 kB
        Shuaishuai Nie
      3. HIVE-5795.3.patch
        42 kB
        Shuaishuai Nie
      4. HIVE-5795.2.patch
        39 kB
        Shuaishuai Nie
      5. HIVE-5795.1.patch
        33 kB
        Shuaishuai Nie


            People

            • Assignee:
              Shuaishuai Nie
              Reporter:
              Shuaishuai Nie
            • Votes:
              0
              Watchers:
              9
