Details
-
Improvement
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
Impala 1.4, Impala 2.0, Impala 2.1, Impala 2.2
-
None
Description
When I run a query over a 4 billion row table that returns a single row, it takes ~30 seconds if I do 'select * ...'. It takes only ~3 seconds if I do 'select field1, field2 ...'. This is repeatable.
Given these times, it would seem that the 'select *' query is materializing all the fields for rows whether they match or not.
Lazy materialization of columns when they are needed could improve performance.
These four queries were run back to back. The actual returned data is elided (sorry). The table has 35 fields.
0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select * from events where event_id=1416403791;
<elided>
1 row selected (33.777 seconds)
0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select event_id, client_id from events where event_id=1416403791;
+-------------+------------+--+
|  event_id   | client_id  |
+-------------+------------+--+
| 1416403791  | <elided>   |
+-------------+------------+--+
1 row selected (3.363 seconds)
0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select * from events where event_id=1416403791;
<elided>
1 row selected (33.138 seconds)
0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure> select event_id, client_id from events where event_id=1416403791;
+-------------+------------+--+
|  event_id   | client_id  |
+-------------+------------+--+
| 1416403791  | <elided>   |
+-------------+------------+--+
1 row selected (3.074 seconds)
0: jdbc:hive2://atl1c1r2data09.vldb-bo.secure>
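To illustrate the lazy materialization suggested above, here is a minimal Python sketch over a toy in-memory columnar table. It is not Impala's scanner code; the names `eager_scan`, `lazy_scan`, and the `values_read` counter are hypothetical and exist only to show the difference in work done. The eager version reads every field of every row before applying the predicate, while the lazy version reads the predicate column first and fetches the remaining fields only for rows that match.

```python
from typing import Callable, Dict, List, Tuple

# Toy columnar table: column name -> list of values (all lists the same length).
Columns = Dict[str, List[object]]
Row = Dict[str, object]


def eager_scan(columns: Columns, pred_col: str,
               pred: Callable[[object], bool]) -> Tuple[List[Row], int]:
    """Materialize every field of every row, then apply the predicate."""
    names = list(columns)
    values_read = 0
    matches: List[Row] = []
    for i in range(len(columns[pred_col])):
        row = {name: columns[name][i] for name in names}
        values_read += len(names)  # all fields touched, match or not
        if pred(row[pred_col]):
            matches.append(row)
    return matches, values_read


def lazy_scan(columns: Columns, pred_col: str,
              pred: Callable[[object], bool]) -> Tuple[List[Row], int]:
    """Read only the predicate column first; fetch other fields for matches."""
    values_read = 0
    matching_indexes: List[int] = []
    for i, value in enumerate(columns[pred_col]):
        values_read += 1  # only the predicate column is touched here
        if pred(value):
            matching_indexes.append(i)

    matches: List[Row] = []
    for i in matching_indexes:
        row: Row = {pred_col: columns[pred_col][i]}  # already read above
        for name in columns:
            if name != pred_col:
                row[name] = columns[name][i]
                values_read += 1
        matches.append(row)
    return matches, values_read


# Hypothetical data: 35 fields (as in the report) but only 1000 rows.
NUM_ROWS = 1000
table: Columns = {"event_id": list(range(NUM_ROWS))}
for n in range(34):
    table[f"field{n}"] = [f"v{n}_{i}" for i in range(NUM_ROWS)]

eager_rows, eager_reads = eager_scan(table, "event_id", lambda v: v == 42)
lazy_rows, lazy_reads = lazy_scan(table, "event_id", lambda v: v == 42)
print(eager_reads, lazy_reads)  # 35000 vs 1034 values read
```

Both functions return the same single row. In this toy setup the eager scan touches 35 values per row, while the lazy scan touches one value per row plus 34 more for the one match. Whether this accounts for the timings reported above depends on the file format and scanner, which this sketch does not model.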
Attachments
Issue Links
- is blocked by
-
IMPALA-2736 Column-wise value materialisation in Parquet scanner
- Resolved
- is duplicated by
-
IMPALA-3052 Reorder Parquet Column readers such that slots with probe filters are read first
- Resolved
- is related to
-
IMPALA-3841 Avoid materializing nested collections if top-level predicates already disqualify the row.
- Open
-
IMPALA-8077 Avoid converting timestamps in dropped rows during Parquet scanning
- Resolved
- relates to
-
IMPALA-9810 Support Kudu's columnar scan format (Apache Arrow)
- Open