Description
When selecting from date_tbl, I got the following warning:
[localhost:21050] functional_orc_def> select * from date_tbl order by id_col;
Query: select * from date_tbl order by id_col
Query submitted at: 2022-02-20 11:19:36 (Coordinator: http://quanlong-OptiPlex-BJ:25000)
Query progress can be monitored at: http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=a14cc5049351c48a:703197c000000000
+--------+------------+------------+
| id_col | date_col   | date_part  |
+--------+------------+------------+
| 0      | NULL       | 0001-01-01 |
| 1      | 0001-12-29 | 0001-01-01 |
| 2      | 0001-12-30 | 0001-01-01 |
| 3      | 1400-01-08 | 0001-01-01 |
| 4      | 2017-11-28 | 0001-01-01 |
| 5      | 9999-12-31 | 0001-01-01 |
| 6      | NULL       | 0001-01-01 |
| 10     | 2017-11-28 | 1399-06-27 |
| 11     | NULL       | 1399-06-27 |
| 12     | 2018-12-31 | 1399-06-27 |
| 20     | 0001-06-19 | 2017-11-27 |
| 21     | 0001-06-20 | 2017-11-27 |
| 22     | 0001-06-21 | 2017-11-27 |
| 23     | 0001-06-22 | 2017-11-27 |
| 24     | 0001-06-23 | 2017-11-27 |
| 25     | 0001-06-24 | 2017-11-27 |
| 26     | 0001-06-25 | 2017-11-27 |
| 27     | 0001-06-26 | 2017-11-27 |
| 28     | 0001-06-27 | 2017-11-27 |
| 29     | 2017-11-28 | 2017-11-27 |
| 30     | 9999-12-01 | 9999-12-31 |
| 31     | 9999-12-31 | 9999-12-31 |
+--------+------------+------------+
WARNINGS: ORC file 'hdfs://localhost:20500/test-warehouse/managed/date_tbl_orc_def/date_part=0001-01-01/base_0000005/bucket_00000_0' column '8' contains an out of range date. The valid date range is 0001-01-01..9999-12-31.
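This table has only 3 columns, so it is unclear to users what column '8' is. In fact, 8 is the ORC column type ID, which is not the column index in the table schema. This table is ACID-enabled, so its ORC files carry additional ACID metadata fields ahead of the user columns, which shifts the type IDs even further away from the schema positions.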
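To illustrate where the 8 plausibly comes from, here is a minimal sketch of ORC's pre-order type ID numbering (the root struct is 0, then each field depth-first). It assumes the usual Hive ACID file layout, where the user columns sit inside a `row` struct after five metadata fields, and that the partition column `date_part` is not stored in the file. The `OrcType` class and `assign_type_ids` helper are hypothetical names for this example, not Impala or ORC library APIs.

```python
class OrcType:
    """Hypothetical stand-in for a node in an ORC file schema tree."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

def assign_type_ids(root):
    """Map each ORC type ID to a dotted field path, numbering in pre-order from 0."""
    ids = {}
    def visit(node, path):
        ids[len(ids)] = path  # the next free ID goes to the node being visited
        for child in node.children:
            visit(child, f"{path}.{child.name}" if path else child.name)
    visit(root, "")  # the root struct gets ID 0 and an empty path
    return ids

# Assumed file schema of an ACID ORC file for date_tbl (partition column excluded).
acid_schema = OrcType("root", [
    OrcType("operation"),
    OrcType("originalTransaction"),
    OrcType("bucket"),
    OrcType("rowId"),
    OrcType("currentTransaction"),
    OrcType("row", [OrcType("id_col"), OrcType("date_col")]),
])

type_ids = assign_type_ids(acid_schema)
print(type_ids[8])  # row.date_col
```

Under these assumptions, type ID 8 resolves to `date_col`, the only DATE column stored in the file, which is consistent with the warning above.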
The warnings for out of range timestamps from the ORC scanner have the same issue.
The Parquet scanner produces a better warning that includes the column name. We should improve the ORC scanner to do the same.
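As a rough sketch of the proposed behaviour (not the actual Impala implementation, which is C++), the warning could look up the column name for the ORC type ID and fall back to the raw ID only when no name is known. The function name, the lookup table, and the exact message wording are all assumptions made for this example.

```python
def out_of_range_warning(filename, type_id, names_by_type_id, kind="date"):
    """Build a warning that names the column; hypothetical wording and helper."""
    name = names_by_type_id.get(type_id)
    if name:
        column = f"column '{name}'"
    else:
        # No name resolved: say explicitly that the number is an ORC type ID.
        column = f"column with ORC type id {type_id}"
    return f"ORC file '{filename}' {column} contains an out of range {kind}."

# Hypothetical lookup table from ORC type ID to table column name.
names = {7: "id_col", 8: "date_col"}
print(out_of_range_warning("bucket_00000_0", 8, names))
```

Passing `kind="timestamp"` would cover the timestamp warning with the same helper.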