Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 2.4.0, 3.2.0, 4.0.0
Description
int headerCount = 0;
int footerCount = 0;
if (table != null) {
  headerCount = Utilities.getHeaderCount(table);
  footerCount = Utilities.getFooterCount(table, conf);
  if (headerCount != 0 || footerCount != 0) {
    // Input file has header or footer, cannot be splitted.
    HiveConf.setLongVar(conf, ConfVars.MAPREDMINSPLITSIZE, Long.MAX_VALUE);
  }
}
This piece of code makes CSV files (or any text files with a header/footer) non-splittable whenever a header or footer is present: setting the minimum split size to Long.MAX_VALUE forces each file into a single split.
If only a header is present, we can find the offset just after the first line break and start splitting from there. Similarly, for a footer, we could read a few KB of data at the end of the file and find the offset of the last line break. Together these offsets give the data range that can be used for splitting. A few extra reads during split generation are cheaper than not splitting the file at all.
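The offset search described above can be sketched as follows. This is only an illustration of the idea, not the change that was committed to Hive: the class DataRangeSketch and its methods dataStart and dataEnd are hypothetical names, and the sketch assumes an uncompressed local file, '\n' line breaks, and an encoding such as UTF-8 in which the byte 0x0A only ever means a line break.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not a Hive class): finds the byte range holding data rows. */
final class DataRangeSketch {

    /** Offset of the first byte after the first {@code headerLines} line breaks. */
    public static long dataStart(RandomAccessFile file, int headerLines) throws IOException {
        if (headerLines == 0) {
            return 0;
        }
        file.seek(0);
        byte[] buf = new byte[4096];
        long pos = 0;
        int remaining = headerLines;
        int n;
        while ((n = file.read(buf)) > 0) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == '\n' && --remaining == 0) {
                    return pos + i + 1;
                }
            }
            pos += n;
        }
        return file.length(); // fewer lines than the header count: no data rows
    }

    /** Exclusive end offset of the data, i.e. where the last {@code footerLines} lines begin. */
    public static long dataEnd(RandomAccessFile file, int footerLines) throws IOException {
        long length = file.length();
        if (footerLines == 0 || length == 0) {
            return length;
        }
        // A trailing line break terminates the last footer line, so skip one extra break.
        file.seek(length - 1);
        int toFind = footerLines + (file.read() == '\n' ? 1 : 0);
        byte[] buf = new byte[4096];
        long chunkEnd = length;
        while (chunkEnd > 0) { // scan backwards, a few KB at a time
            int n = (int) Math.min(buf.length, chunkEnd);
            long chunkStart = chunkEnd - n;
            file.seek(chunkStart);
            file.readFully(buf, 0, n);
            for (int i = n - 1; i >= 0; i--) {
                if (buf[i] == '\n' && --toFind == 0) {
                    return chunkStart + i + 1;
                }
            }
            chunkEnd = chunkStart;
        }
        return 0; // fewer lines than the footer count: no data rows
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("range-demo", ".csv");
        Files.write(p, "id,name\n1,a\n2,b\ntotal=2\n".getBytes(StandardCharsets.UTF_8));
        try (RandomAccessFile f = new RandomAccessFile(p.toFile(), "r")) {
            System.out.println("data range: [" + dataStart(f, 1) + ", " + dataEnd(f, 1) + ")");
        } finally {
            Files.delete(p);
        }
    }
}
```

Only the bytes between dataStart and dataEnd would then be handed to split generation. Compressed inputs and multi-byte encodings such as UTF-16 would need different handling, which the sketch does not attempt.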
Attachments
Issue Links
- causes
  - HIVE-26284 ClassCastException: java.io.PushbackInputStream cannot be cast to org.apache.hadoop.fs.Seekable when table properties contains 'skip.header.line.count' = '1' and datafiles are in UTF-16 encoding (Open)
- contains
  - HIVE-21951 Llap query on external table with header or footer returns incorrect row count. (Resolved)
- Is contained by
  - HIVE-26751 Bug Fixes and Improvements for 3.2.0 release (Open)
- is fixed by
  - HIVE-24224 Fix skipping header/footer for Hive on Tez on compressed files (Closed)
  - HIVE-24381 Compressed text input returns 0 rows if skip header/footer is mentioned (Closed)
  - HIVE-22769 Incorrect query results and query failure during split generation for compressed text files (Closed)
- links to