Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: Impala 2.3
None
Environment:
CDH 5.5.2 / Impala 2.3
Parquet table with a timestamp column
Secure cluster
convert_legacy_hive_parquet_utc_timestamps=true
Timestamp column is not being filtered on
Description
Enabling convert_legacy_hive_parquet_utc_timestamps=true makes simple queries perform very poorly, even when they do not filter on a timestamp attribute.
Parquet table.
Impala 2.3 / CDH 5.5.2.
convert_legacy_hive_parquet_utc_timestamps=true makes the following simple query roughly 30x slower (1.1 minutes -> over 30 minutes):
select * from parquet_table_with_a_timestamp_attribute where bigint_attribute=1000771658169
Notice I did not even filter on a timestamp attribute.
I ran multiple tests with and without convert_legacy_hive_parquet_utc_timestamps=true set on the impalad.
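As a rough check on the "30x" figure, the two reported timings can be compared directly. This is only a sketch: 30 minutes is a lower bound taken from "over 30 minutes", and the helper name `slowdown_factor` is made up for illustration.

```python
def slowdown_factor(baseline_minutes: float, slow_minutes: float) -> float:
    """Return how many times slower the second run is than the first."""
    if baseline_minutes <= 0:
        raise ValueError("baseline must be positive")
    return slow_minutes / baseline_minutes


# Timings from the report: 1.1 minutes without the flag, over 30 minutes with it.
# 30.0 is a lower bound, so the real factor is at least this value.
factor = slowdown_factor(1.1, 30.0)
print(f"at least {factor:.1f}x slower")
```

With these inputs the factor comes out at roughly 27, which is consistent with the approximate "30x" above given that the slow run took more than 30 minutes.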
Also, from https://issues.cloudera.org/browse/IMPALA-1658
Casey Ching added a comment - 15/Jun/15 5:12 PM
Btw, a perf test showed enabling this flag was 10x slower.
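Why would a query with no timestamp predicate be affected at all? One plausible explanation, stated here as an assumption and not confirmed anywhere in this report, is that select * materializes the timestamp column for every row read, and with the flag enabled each of those values is converted from UTC to local time. The Python sketch below only illustrates that per-value work. It is not Impala's implementation, and `LOCAL_TZ`, `to_local`, `read_rows`, the sample timestamp, and the fixed UTC-8 offset are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical local zone: a fixed UTC-8 offset, chosen only for illustration.
LOCAL_TZ = timezone(timedelta(hours=-8))


def to_local(utc_naive: datetime) -> datetime:
    """Treat a naive timestamp as UTC and return the naive local wall-clock time."""
    aware_utc = utc_naive.replace(tzinfo=timezone.utc)
    return aware_utc.astimezone(LOCAL_TZ).replace(tzinfo=None)


def read_rows(rows, convert: bool):
    """Materialize every column of every row, as select * would (assumption)."""
    out = []
    for bigint_value, ts in rows:
        # The conversion is paid once per timestamp value read,
        # whether or not the query filters on that column.
        out.append((bigint_value, to_local(ts) if convert else ts))
    return out


# One sample row: the bigint value from the query plus an invented timestamp.
rows = [(1000771658169, datetime(2016, 1, 1, 12, 0, 0))]
print(read_rows(rows, convert=True))
```

In this model the bigint predicate does nothing to avoid the conversion, because the timestamp column is still part of the select list.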
Attachments
Issue Links
- relates to
  - IMPALA-3307 add support for IANA time zone database (Resolved)
  - IMPALA-1773 Implement TIMESTAMP WITH TIME ZONE data type (Open)
  - IMPALA-2125 Improve perf when reading timestamps from parquet files written by hive (Closed)
  - IMPALA-1658 Add a compatibility option for reading parquet timestamps written by Hive (Resolved)