Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: Impala 2.3
Fix Version/s: Impala 3.1.0
Component/s: Backend
Labels: None
Environment:
CDH 5.5.2 / Impala 2.3
Parquet table with a timestamp column
Secure cluster
convert_legacy_hive_parquet_utc_timestamps=true
Timestamp column is not being filtered on
Target Version:
Enabling convert_legacy_hive_parquet_utc_timestamps=true makes simple queries perform very poorly, even when they do not filter on a timestamp attribute.
Setup: Parquet table, Impala 2.3 / CDH 5.5.2.
With convert_legacy_hive_parquet_utc_timestamps=true, the following simple query is about 30x slower (1.1 minutes -> over 30 minutes):
select * from parquet_table_with_a_timestamp_attribute where bigint_attribute=1000771658169
Note that the query does not even filter on a timestamp attribute.
I ran multiple tests with and without convert_legacy_hive_parquet_utc_timestamps=true set on the impalad.
Also, from https://issues.cloudera.org/browse/IMPALA-1658:
Casey Ching added a comment - 15/Jun/15 5:12 PM
Btw, a perf test showed enabling this flag was 10x slower.
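One possible reading of the report, sketched in Python: if the scanner converts every timestamp value it reads before evaluating the predicate, then a `select *` pays the conversion cost for all rows, not only for the ones that match. The report does not confirm that this is what Impala does. The `scan` and `utc_to_local` helpers and the fixed UTC-8 offset are hypothetical names and values chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset stands in for the server's local time zone.
LOCAL_TZ = timezone(timedelta(hours=-8))


def utc_to_local(ts):
    """Treat a naive timestamp as UTC and return the naive local wall-clock time."""
    return ts.replace(tzinfo=timezone.utc).astimezone(LOCAL_TZ).replace(tzinfo=None)


def scan(rows, wanted_id, convert):
    """Hypothetical scan loop: convert each timestamp first, then apply the filter."""
    matches = []
    conversions = 0
    for bigint_value, ts in rows:
        if convert:
            # The conversion runs for every row read, matching or not.
            ts = utc_to_local(ts)
            conversions += 1
        if bigint_value == wanted_id:
            matches.append((bigint_value, ts))
    return matches, conversions
```

Under this assumption the number of conversions equals the number of rows scanned, which would be consistent with a slowdown that appears even though the filter is on bigint_attribute only.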
relates to:
- IMPALA-3307 add support for IANA time zone database (Resolved)
- IMPALA-1773 Implement TIMESTAMP WITH TIME ZONE data type (Open)
- IMPALA-2125 Improve perf when reading timestamps from parquet files written by hive (Closed)
- IMPALA-1658 Add a compatibility option for reading parquet timestamps written by Hive (Resolved)