Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
Impala 2.9.0
Description
There are scenarios when you (1) want to reduce the number of partitions by hash&mod or part of a date, etc and (2) do not want to expose this complexity to end users who do not know how the table might be partitioned.
As an example: Say I have a column that has 1000 values and I want to partition on customer_id, but the data is too small for 1000 partitions so I want to mod the customer_id to put it into 100 'buckets', to give me a 100x faster scan when selecting a single customer_id.
When using 3rd party tools or exposing this table to end users who may not be educated on the partitioning scheme, those tools will not properly filter based on simply selecting a specific customer_id.
If this is my partitioned table:
CREATE TABLE default.lesspartitions ( customer_id INT, customer_name STRING, some_data STRING ) PARTITIONED BY ( partition_id INT )
...
select customer_id, customer_name, some_data, customer_id % 100 as partition_id from source
It would be nice to create a view where the partition is hidden from users and filtered on automatically:
CREATE VIEW default.hiddenpartitions AS SELECT customer_id, customer_name, some_data FROM default.lesspartitions WHERE partition_id = customer_id % 100
So then a user/3rd party tool can write the following:
select * from hiddenpartitions where customer_id = 4
This jira would adding the 'constant propagation' of the filter into the view so that only 1 partition is read.
Attachments
Issue Links
- is a child of
-
IMPALA-5180 Non-deterministic exprs without slot refs cause HDFS query failure
- Resolved