Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-6880

Hash-Join: Many null keys on the build side form a long linked chain in the Hash Table

    XMLWordPrintableJSON

    Details

      Description

      When building the Hash Table for the Hash-Join, each new key is matched with an existing key (same bucket) by calling the generated method `isKeyMatchInternalBuild`, which compares the two. However when both keys are null, the method returns false (meaning not-equal; i.e. it is a new key), thus the new key is added into the list following the old key. When a third null key is found, it would be matched with the prior two, and added as well. Etc etc ...

      This way many null values would perform checks at order N^2 / 2.

      Suggested improvement: The generated code should return a third result, meaning "two null keys". Then in case of Inner or Left joins all the duplicate nulls can be discarded.

      Below is a simple example, note the time difference between non-null and the all-nulls tables (also instrumentation showed that for nulls, the method above was called 1249975000 times!!)

      0: jdbc:drill:zk=local> use dfs.tmp;
      0: jdbc:drill:zk=local> create table testNull as (select cast(null as int) mycol from 
       dfs.`/data/test128M.tbl` limit 50000);
      0: jdbc:drill:zk=local> create table test1 as (select cast(1 as int) mycol1 from 
       dfs.`/data/test128M.tbl` limit 60000);
      0: jdbc:drill:zk=local> create table test2 as (select cast(2 as int) mycol2 from dfs.`/data/test128M.tbl` limit 50000);
      0: jdbc:drill:zk=local> select count(*) from test1 join test2 on test1.mycol1 = test2.mycol2;
      +---------+
      | EXPR$0  |
      +---------+
      | 0       |
      +---------+
      1 row selected (0.443 seconds)
      0: jdbc:drill:zk=local> select count(*) from test1 join testNull on test1.mycol1 = testNull.mycol;
      +---------+
      | EXPR$0  |
      +---------+
      | 0       |
      +---------+
      1 row selected (140.098 seconds)
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                ben-zvi Boaz Ben-Zvi
                Reporter:
                ben-zvi Boaz Ben-Zvi
                Reviewer:
                Aman Sinha
              • Votes:
                1 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: