Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.38.0
Description
In some cases, the number of unique keys grows exponentially for certain relational expressions.
Consider a table with a composite key {k1, k2, k3} and the following query:
SELECT k1, k1, k2, k2, k3, k3 FROM composite
The number of minimal unique keys for this query is 2^3:
{0, 2, 4}, {0, 3, 4}, {1, 2, 4}, {1, 3, 4}, {0, 2, 5}, {0, 3, 5}, {1, 2, 5}, {1, 3, 5}
In this case, the size of minimal keys is exponential to the number of columns in the composite key. The number of keys is f(x,y) = x^y where x is the number of repetitions of each column and y is the number of columns in the composite key. The example is taken out from CALCITE-6640.
A large number of unique keys may occur in other use-cases as well. Computing the entire result set may become prohibitively expensive and cause OOM and other failures.
To prevent this from happening we propose to add a new parameter to the UniqueKeys interface allowing the users to specify a limit on the size of the result set.
Attachments
Issue Links
- is related to
-
HIVE-28582 OOM when compiling query with many GROUP BY columns aliased multiple times
- Open
-
CALCITE-6640 RelMdUniqueKeys generates non-minimal keys when columns are repeated in projections
- Resolved
- links to