[CALCITE-6640] RelMdUniqueKeys generates non-minimal keys when columns are repeated in projections - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.39.0
Component/s: core
Labels:
- pull-request-available

Description

Consider the following table where empno is a unique key column.

CREATE TABLE emp (
 empno INT,
 ename VARCHAR,
 job VARCHAR,
 PRIMARY KEY (empno));

For the following queries, RelMetadataQuery#getUniqueKeys returns these unique keys (each number is a 0-based column index in the query's output):

SELECT empno FROM emp;
{0}
SELECT ename, empno FROM emp;
{1} 
SELECT empno, ename, empno FROM emp;
{0}, {2}, {0, 2}
SELECT empno, ename, empno, empno FROM emp;
{0}, {2}, {3}, {0, 2}, {0, 3}, {2, 3}, {0, 2, 3}

When key columns are repeated in the projection, the result grows exponentially: a single-column key that appears k times in the output yields 2^k - 1 keys (3 keys for two copies of empno, 7 keys for three copies). This makes the unique key computation very expensive when there are many keys or when keys are repeated many times. The problem can lead to OOM errors and to queries or rules hanging indefinitely while trying to extract the keys.
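To make the growth concrete, the sketch below is a hypothetical model (not Calcite's actual RelMdUniqueKeys code) that reproduces the outputs listed above for a single-column key: it enumerates every non-empty subset of the output positions that reference the key column. The class and method names, and the int[] encoding of a projection (projection[i] is the input column read by output position i), are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the non-minimal behavior; not Calcite code.
class NaiveProjectKeys {

  /** Output positions of the projection that reference the given input column. */
  static List<Integer> positionsOf(int inputColumn, int[] projection) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < projection.length; i++) {
      if (projection[i] == inputColumn) {
        positions.add(i);
      }
    }
    return positions;
  }

  /**
   * Returns every non-empty subset of the output positions of a
   * single-column input key: k copies give 2^k - 1 keys.
   */
  static List<Set<Integer>> naiveKeys(int keyColumn, int[] projection) {
    List<Integer> positions = positionsOf(keyColumn, projection);
    int k = positions.size();
    List<Set<Integer>> keys = new ArrayList<>();
    // Each bit mask from 1 to 2^k - 1 selects one non-empty subset of positions.
    for (int mask = 1; mask < (1 << k); mask++) {
      Set<Integer> key = new TreeSet<>();
      for (int bit = 0; bit < k; bit++) {
        if ((mask & (1 << bit)) != 0) {
          key.add(positions.get(bit));
        }
      }
      keys.add(key);
    }
    return keys;
  }

  public static void main(String[] args) {
    // SELECT empno, ename, empno, empno FROM emp, with emp columns
    // empno=0, ename=1, job=2; empno is input key column 0.
    int[] projection = {0, 1, 0, 0};
    List<Set<Integer>> keys = naiveKeys(0, projection);
    // prints: 7 [[0], [2], [0, 2], [3], [0, 3], [2, 3], [0, 2, 3]]
    System.out.println(keys.size() + " " + keys);
  }
}
```

Each additional copy of the key column doubles the number of subsets (plus one), which is why repeated aliasing of key columns quickly exhausts memory.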

Observe that the results above are not minimal, so we currently create and return a lot of redundant information.

{0}, {2}, {3}, {0, 2}, {0, 3}, {2, 3}, {0, 2, 3}

If we know that {0}, {2}, and {3} are each unique keys on their own, then any superset of them is also a unique key, so it is sufficient to return just {0}, {2}, and {3}.
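A minimal sketch of two ways to obtain the minimal answer, using hypothetical helper names (this is not the code of the actual fix in GitHub Pull Request #4013). minimize drops any key that strictly contains another key, since the smaller key already implies it. mapKey avoids generating supersets in the first place by choosing exactly one output position for each input key column. As before, projection[i] is assumed to be the input column read by output position i.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helpers for illustration; not Calcite code.
class MinimalProjectKeys {

  /** Drops every key that strictly contains another key (it is implied by it). */
  static List<Set<Integer>> minimize(List<Set<Integer>> keys) {
    List<Set<Integer>> minimal = new ArrayList<>();
    for (Set<Integer> candidate : keys) {
      boolean implied = false;
      for (Set<Integer> other : keys) {
        if (other.size() < candidate.size() && candidate.containsAll(other)) {
          implied = true;
          break;
        }
      }
      // Skip duplicates so each minimal key is reported once.
      if (!implied && !minimal.contains(candidate)) {
        minimal.add(candidate);
      }
    }
    return minimal;
  }

  /**
   * Maps an input key through a projection by picking exactly one output
   * position per key column, so no superset is ever generated.
   */
  static List<Set<Integer>> mapKey(List<Integer> keyColumns, int[] projection) {
    List<Set<Integer>> keys = new ArrayList<>();
    keys.add(new TreeSet<>());
    for (int column : keyColumns) {
      List<Set<Integer>> extended = new ArrayList<>();
      for (Set<Integer> partial : keys) {
        for (int i = 0; i < projection.length; i++) {
          if (projection[i] == column) {
            Set<Integer> key = new TreeSet<>(partial);
            key.add(i);
            extended.add(key);
          }
        }
      }
      // If a key column is not projected at all, extended is empty and the key is lost.
      keys = extended;
    }
    return keys;
  }

  public static void main(String[] args) {
    List<Set<Integer>> naive = List.of(
        Set.of(0), Set.of(2), Set.of(3),
        Set.of(0, 2), Set.of(0, 3), Set.of(2, 3), Set.of(0, 2, 3));
    System.out.println(minimize(naive)); // [[0], [2], [3]]
    // SELECT empno, ename, empno, empno FROM emp; empno is input column 0.
    System.out.println(mapKey(List.of(0), new int[] {0, 1, 0, 0})); // [[0], [2], [3]]
  }
}
```

Pruning after the fact still pays for the exponential enumeration, so generating one position per key column is the cheaper design. Note that for a composite key the number of results is still the product of the repetition counts of its columns.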

Issue Links

is caused by
- CALCITE-3666 Refine RelMdUniqueKeys and RelMdColumnUniqueness for Calc (Closed)

relates to
- HIVE-28582 OOM when compiling query with many GROUP BY columns aliased multiple times (Open)
- CALCITE-6704 Limit result size of RelMdUniqueKeys handler (Resolved)

links to
- GitHub Pull Request #4013

People

Assignee: Stamatis Zampetakis
Reporter: Stamatis Zampetakis
Votes: 0
Watchers: 4

Dates

Created: 23/Oct/24 09:24
Updated: 25/Nov/24 13:36
Resolved: 25/Nov/24 13:13