[DRILL-6880] Hash-Join: Many null keys on the build side form a long linked chain in the Hash Table - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.14.0
Fix Version/s: 1.16.0
Component/s: Execution - Relational Operators
Labels:
None

Description

When building the Hash Table for the Hash-Join, each new key is matched with an existing key (same bucket) by calling the generated method `isKeyMatchInternalBuild`, which compares the two. However when both keys are null, the method returns false (meaning not-equal; i.e. it is a new key), thus the new key is added into the list following the old key. When a third null key is found, it would be matched with the prior two, and added as well. Etc etc ...

This way many null values would perform checks at order N^2 / 2.

Suggested improvement: The generated code should return a third result, meaning "two null keys". Then in case of Inner or Left joins all the duplicate nulls can be discarded.

Below is a simple example, note the time difference between non-null and the all-nulls tables (also instrumentation showed that for nulls, the method above was called 1249975000 times!!)

0: jdbc:drill:zk=local> use dfs.tmp;
0: jdbc:drill:zk=local> create table testNull as (select cast(null as int) mycol from 
 dfs.`/data/test128M.tbl` limit 50000);
0: jdbc:drill:zk=local> create table test1 as (select cast(1 as int) mycol1 from 
 dfs.`/data/test128M.tbl` limit 60000);
0: jdbc:drill:zk=local> create table test2 as (select cast(2 as int) mycol2 from dfs.`/data/test128M.tbl` limit 50000);
0: jdbc:drill:zk=local> select count(*) from test1 join test2 on test1.mycol1 = test2.mycol2;
+---------+
| EXPR$0  |
+---------+
| 0       |
+---------+
1 row selected (0.443 seconds)
0: jdbc:drill:zk=local> select count(*) from test1 join testNull on test1.mycol1 = testNull.mycol;
+---------+
| EXPR$0  |
+---------+
| 0       |
+---------+
1 row selected (140.098 seconds)

Attachments

Issue Links

links to

GitHub Pull Request #1598

Activity

People

Assignee:: Boaz Ben-Zvi

Reporter:: Boaz Ben-Zvi

Reviewer:: Aman Sinha

Votes:: 1 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 05/Dec/18 04:34

Updated:: 15/Jan/19 02:22

Resolved:: 14/Jan/19 23:44