Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-5725

coalesce() not being fully applied with outer joins on kudu tables

    XMLWordPrintableJSON

Details

    Description

      SELECT COALESCE() on multiple arguments spanning different tables with an OUTER JOIN on Kudu tables is not properly being applied. This behavior is

      1. different relative to Kudu tables in 2.9
      2. different relative to the 2.10 behavior with HDFS, seemingly making this Kudu-specific
      3. different from Postgres, which matches the HDFS behavior, further making this seem Kudu-specific

      Consider this query:

      USE tpch_kudu;
      SELECT
      COALESCE(a2.n_nationkey, a1.p_size),
      a2.n_nationkey,
      a1.p_size
      FROM part a1
      LEFT JOIN nation a2 ON (a1.p_size) = (a2.n_nationkey);
      

      Some of the rows returned include:

      +-------------------------------------+-------------+--------+
      | coalesce(a2.n_nationkey, a1.p_size) | n_nationkey | p_size |
      +-------------------------------------+-------------+--------+
      [snip]
      | 21                                  | 21          | 21     |
      | 22                                  | 22          | 22     |
      | 23                                  | 23          | 23     |
      | 24                                  | 24          | 24     |
      | NULL                                | NULL        | 25     |
      | NULL                                | NULL        | 26     |
      | NULL                                | NULL        | 27     |
      [snip]
      

      The COALESCE() column is not returning the value of p_size when its first argument, n_nationkey is NULL. tpch_kudu.nation n_nationkey has values between 0 and 24, hence the NULL values in that column when part.p_size is greater.

      This goes away if you keep the query above but switch the ordering of the COALESCE() arguments.

      I can see the same sort of problems if I write similar RIGHT or FULL OUTER JOIN queries:

      USE tpch_kudu;
      SELECT
      DISTINCT
      COALESCE(a2.n_nationkey, a1.p_size),
      a2.n_nationkey,
      a1.p_size
      FROM part a1
      FULL OUTER JOIN nation a2 ON (a1.p_size) = (a2.n_nationkey)
      ORDER BY 1,2,3;
      
      USE tpch_kudu;
      SELECT
      DISTINCT
      COALESCE(a2.n_nationkey, a1.p_size),
      a2.n_nationkey,
      a1.p_size
      FROM nation a2
      RIGHT JOIN part a1 ON (a1.p_size) = (a2.n_nationkey)
      ORDER BY 1,2,3;
      

      Explain-level 2 plans and profiles will be attached.

      Attachments

        1. explain-l2-2.10.txt
          3 kB
          Michael Brown
        2. explain-l2-2.9.txt
          2 kB
          Michael Brown
        3. profile-2.10.txt
          41 kB
          Michael Brown
        4. profile-2.9.txt
          41 kB
          Michael Brown

        Issue Links

          Activity

            People

              twmarshall Thomas Tauber-Marshall
              mikeb Michael Brown
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: