Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-5725

coalesce() not being fully applied with outer joins on kudu tables

Agile BoardAttach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      SELECT COALESCE() on multiple arguments spanning different tables with an OUTER JOIN on Kudu tables is not properly being applied. This behavior is

      1. different relative to Kudu tables in 2.9
      2. different relative to the 2.10 behavior with HDFS, seemingly making this Kudu-specific
      3. different from Postgres, which matches the HDFS behavior, further making this seem Kudu-specific

      Consider this query:

      USE tpch_kudu;
      SELECT
      COALESCE(a2.n_nationkey, a1.p_size),
      a2.n_nationkey,
      a1.p_size
      FROM part a1
      LEFT JOIN nation a2 ON (a1.p_size) = (a2.n_nationkey);
      

      Some of the rows returned include:

      +-------------------------------------+-------------+--------+
      | coalesce(a2.n_nationkey, a1.p_size) | n_nationkey | p_size |
      +-------------------------------------+-------------+--------+
      [snip]
      | 21                                  | 21          | 21     |
      | 22                                  | 22          | 22     |
      | 23                                  | 23          | 23     |
      | 24                                  | 24          | 24     |
      | NULL                                | NULL        | 25     |
      | NULL                                | NULL        | 26     |
      | NULL                                | NULL        | 27     |
      [snip]
      

      The COALESCE() column is not returning the value of p_size when its first argument, n_nationkey is NULL. tpch_kudu.nation n_nationkey has values between 0 and 24, hence the NULL values in that column when part.p_size is greater.

      This goes away if you keep the query above but switch the ordering of the COALESCE() arguments.

      I can see the same sort of problems if I write similar RIGHT or FULL OUTER JOIN queries:

      USE tpch_kudu;
      SELECT
      DISTINCT
      COALESCE(a2.n_nationkey, a1.p_size),
      a2.n_nationkey,
      a1.p_size
      FROM part a1
      FULL OUTER JOIN nation a2 ON (a1.p_size) = (a2.n_nationkey)
      ORDER BY 1,2,3;
      
      USE tpch_kudu;
      SELECT
      DISTINCT
      COALESCE(a2.n_nationkey, a1.p_size),
      a2.n_nationkey,
      a1.p_size
      FROM nation a2
      RIGHT JOIN part a1 ON (a1.p_size) = (a2.n_nationkey)
      ORDER BY 1,2,3;
      

      Explain-level 2 plans and profiles will be attached.

      Attachments

        1. profile-2.9.txt
          41 kB
          Michael Brown
        2. profile-2.10.txt
          41 kB
          Michael Brown
        3. explain-l2-2.9.txt
          2 kB
          Michael Brown
        4. explain-l2-2.10.txt
          3 kB
          Michael Brown

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            twmarshall Thomas Tauber-Marshall
            mikeb Michael Brown
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment