[IMPALA-8423] Add rule to remove useless SELECT node - ASF JIRA

Details

Type: Improvement
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Frontend
Labels:
- performance

Description

We can add rules that optimize the plan after the cheapest plan has been chosen based on cost. For example, one useful rule would be "remove useless SELECT nodes".

Impala generates a useless SELECT node for the following query:

SELECT t.id, t.int_col
FROM functional.alltypestiny t
LEFT JOIN
  (SELECT id, int_col
  FROM functional.alltypestiny) t2
ON (t.id = t2.id)
WHERE t.int_col = t.id
UNION ALL
VALUES (NULL, NULL)

Its single-node plan is:

PLAN-ROOT SINK
|
00:UNION
|  constant-operands=1
|  row-size=8B cardinality=1
|
04:SELECT
|  predicates: t.id = t.int_col
|  row-size=12B cardinality=0
|
03:HASH JOIN [RIGHT OUTER JOIN]
|  hash predicates: id = t.id
|  runtime filters: RF000 <- t.id
|  row-size=12B cardinality=1
|
|--01:SCAN HDFS [functional.alltypestiny t]
|     HDFS partitions=4/4 files=4 size=460B
|     predicates: t.int_col = t.id
|     row-size=8B cardinality=1
|
02:SCAN HDFS [functional.alltypestiny]
   HDFS partitions=4/4 files=4 size=460B
   runtime filters: RF000 -> id
   row-size=4B cardinality=8

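The SELECT node (id=04) is useless: its only predicate, "t.id = t.int_col", is already enforced in the SCAN node (id=01), which is the right-hand side of the RIGHT OUTER JOIN. A RIGHT OUTER JOIN keeps every row of its right-hand side and never NULL-extends that side's columns, so the predicate still holds on every join output row and the SELECT node cannot filter out any more rows.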
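
A minimal sketch of what such a rule could look like, written in Python on a toy plan tree rather than against Impala's actual planner classes. Everything here is hypothetical and for illustration only: `PlanNode`, `normalize`, `enforced`, `remove_useless_selects` and the `nullable_children` field are not Impala APIs. The sketch assumes predicates are deterministic, that they are plain strings with unique table aliases, and that only "a = b" equalities need to be canonicalized (the plan above prints the same predicate as "t.id = t.int_col" in the SELECT node and "t.int_col = t.id" in the SCAN node).

```python
from dataclasses import dataclass, field
from typing import List, Set


def normalize(pred: str) -> str:
    """Canonicalize 'a = b' so that operand order does not matter."""
    if " = " in pred:
        lhs, rhs = (s.strip() for s in pred.split(" = ", 1))
        return " = ".join(sorted((lhs, rhs)))
    return pred.strip()


@dataclass
class PlanNode:
    # Hypothetical toy node, not an Impala class.
    kind: str  # "SCAN", "SELECT", "JOIN", "UNION"
    predicates: List[str] = field(default_factory=list)
    children: List["PlanNode"] = field(default_factory=list)
    # JOIN only: indexes of children whose columns may be NULL-extended.
    nullable_children: List[int] = field(default_factory=list)


def enforced(node: PlanNode) -> Set[str]:
    """Predicates guaranteed to hold on every output row of `node`."""
    own = {normalize(p) for p in node.predicates}
    if node.kind == "SCAN":
        return own
    if node.kind == "SELECT":
        return own | enforced(node.children[0])
    if node.kind == "JOIN":
        # Only inputs that are never NULL-extended keep their guarantees.
        # The join's own predicates are ignored, which is conservative.
        result: Set[str] = set()
        for i, child in enumerate(node.children):
            if i not in node.nullable_children:
                result |= enforced(child)
        return result
    return set()  # Conservative default for UNION and anything else.


def remove_useless_selects(node: PlanNode) -> PlanNode:
    """Drop SELECT nodes whose predicates are all enforced below them."""
    node.children = [remove_useless_selects(c) for c in node.children]
    if node.kind == "SELECT":
        child = node.children[0]
        if {normalize(p) for p in node.predicates} <= enforced(child):
            return child
    return node


# The plan from this issue: for a RIGHT OUTER JOIN the left input
# (index 0, scan 02) is the nullable side; scan 01 on t is preserved.
scan_02 = PlanNode("SCAN")
scan_01 = PlanNode("SCAN", predicates=["t.int_col = t.id"])
join_03 = PlanNode("JOIN", children=[scan_02, scan_01], nullable_children=[0])
select_04 = PlanNode("SELECT", predicates=["t.id = t.int_col"], children=[join_03])
union_00 = PlanNode("UNION", children=[select_04])

plan = remove_useless_selects(union_00)
```

After the rewrite, the UNION node's child is the join node directly. If `t` were instead on the nullable side of the join, `enforced` would not propagate the scan predicate and the SELECT node would be kept.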

People

Assignee: Unassigned

Reporter: Quanlong Huang

Votes: 0

Watchers: 4

Dates

Created: 16/Apr/19 13:06

Updated: 23/Feb/23 07:54