Description
Certain data types require conversion and/or validation when read from a Parquet file. For example, timestamps can require conversion to account for different storage offsets, and char/varchar fields can require conversion to handle lengths and space padding. Timestamps also require validation, because not every bit pattern is a valid timestamp.
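Right now, this is done for each value as it is read. For dictionary-encoded columns, it would save processing to do the conversion/validation once per dictionary entry at dictionary construction, so that every row referencing an entry reuses the already converted and validated value.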
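A minimal sketch of the idea in C++17, not the scanner's actual code. The names `RawTimestamp`, `ConvertAndValidate`, and `ConvertedDictionary` are hypothetical, and the conversion shown (INT96-style day plus nanoseconds to microseconds since the Unix epoch) is only a stand-in for whatever conversion a given type needs. The point is that the conversion/validation cost scales with the number of dictionary entries, not the number of rows.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical raw dictionary entry, laid out like an INT96 timestamp:
// nanoseconds within the day plus a Julian day number.
struct RawTimestamp {
  int64_t nanos_of_day;
  int32_t julian_day;
};

constexpr int64_t kNanosPerDay = 86400LL * 1000000000LL;
constexpr int64_t kMicrosPerDay = kNanosPerDay / 1000;
constexpr int32_t kUnixEpochJulianDay = 2440588;  // Julian day of 1970-01-01.

// Validates one raw value and converts it to microseconds since the Unix
// epoch. Returns std::nullopt when the bit pattern is not a valid timestamp.
// (Assumption: only the time-of-day range is checked in this sketch.)
std::optional<int64_t> ConvertAndValidate(const RawTimestamp& raw) {
  if (raw.nanos_of_day < 0 || raw.nanos_of_day >= kNanosPerDay) {
    return std::nullopt;
  }
  const int64_t days =
      static_cast<int64_t>(raw.julian_day) - kUnixEpochJulianDay;
  return days * kMicrosPerDay + raw.nanos_of_day / 1000;
}

// Dictionary that converts and validates every entry exactly once, at
// construction. Per-row decoding is then a plain indexed lookup.
class ConvertedDictionary {
 public:
  explicit ConvertedDictionary(const std::vector<RawTimestamp>& raw_entries) {
    entries_.reserve(raw_entries.size());
    for (const RawTimestamp& raw : raw_entries) {
      entries_.push_back(ConvertAndValidate(raw));
      ++conversions_;
    }
  }

  // Returns the converted value for a dictionary index, or std::nullopt if
  // the index is out of range or the entry failed validation.
  std::optional<int64_t> Get(uint32_t index) const {
    if (index >= entries_.size()) return std::nullopt;
    return entries_[index];
  }

  // Number of conversions performed; equals the dictionary size.
  std::size_t conversions() const { return conversions_; }

 private:
  std::vector<std::optional<int64_t>> entries_;
  std::size_t conversions_ = 0;
};
```

With this shape, a column of millions of rows backed by a dictionary of a few hundred entries pays for only a few hundred conversions. Invalid entries are recorded once at construction, so the per-row path only has to check whether the looked-up entry is present.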
Issue Links
- relates to: IMPALA-5050 Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS to the parquet scanner (Resolved)