Phoenix / PHOENIX-4586

UPSERT SELECT doesn't take into account comparison operators for subqueries.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 4.14.0
    • Fix Version/s: 4.14.0
    • Labels:
      None

      Description

      If an UPSERT SELECT has a WHERE condition that compares a column with a subquery using any comparison operator (including ANY/SOME/etc.), the whole WHERE clause is simply ignored. Table:

      create table T (id integer primary key, i1 integer);
      upsert into T values (1,1);
      upsert into T values (2,2);
      

      A query that should not upsert anything, because its WHERE clause requires I1 to be greater than any value already in the table and also requires an ID that does not exist:

      0: jdbc:phoenix:> upsert into T select id, 4 from T where id = 3 AND i1 > (select i1 from T);
      2 rows affected (0.02 seconds)
      0: jdbc:phoenix:> select * from T;
      +-----+-----+
      | ID  | I1  |
      +-----+-----+
      | 1   | 4   |
      | 2   | 4   |
      +-----+-----+
      2 rows selected (0.014 seconds)
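To make the expected result concrete, here is a minimal Python sketch that evaluates this statement's WHERE clause by hand against a plain dict standing in for T. The dict model and the function name `expected_rows` are hypothetical, and reading `i1 > (select i1 from T)` as "greater than every value the subquery returns" is an assumption based on the wording of the report. Because no row has ID 3, the conjunction is false for every row whatever the subquery returns, so zero rows should be affected.

```python
# Hypothetical in-memory model of table T after the two initial UPSERTs (id -> i1).
# Illustrates the expected SQL semantics only; this is not Phoenix code.
from typing import Dict, List, Tuple

Table = Dict[int, int]


def expected_rows(table: Table, new_i1: int = 4) -> List[Tuple[int, int]]:
    """Rows that SELECT id, 4 FROM T WHERE id = 3 AND i1 > (SELECT i1 FROM T)
    should feed into the UPSERT. The comparison is read as 'greater than every
    value the subquery returns' (an assumption, not Phoenix's definition)."""
    subquery_values = list(table.values())  # result of SELECT i1 FROM T
    return [
        (row_id, new_i1)
        for row_id, i1 in table.items()
        # Both conjuncts must hold; no row has id = 3, so nothing qualifies.
        if row_id == 3 and all(i1 > v for v in subquery_values)
    ]


table: Table = {1: 1, 2: 2}
rows = expected_rows(table)
print(f"{len(rows)} rows affected")  # prints: 0 rows affected
```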
      

      Now with ANY. This should not upsert anything either, because the IDs are [1,2] while every I1 value is 4:

      0: jdbc:phoenix:> upsert into T select id, 5 from T where id = 2 AND i1 = ANY (select ID from T);
      2 rows affected (0.016 seconds)
      0: jdbc:phoenix:> select * from T;
      +-----+-----+
      | ID  | I1  |
      +-----+-----+
      | 1   | 5   |
      | 2   | 5   |
      +-----+-----+
      2 rows selected (0.013 seconds)
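The same hand evaluation for the ANY statement, again as a hypothetical Python model rather than Phoenix code (the function name `expected_rows_any` is illustrative). `= ANY` holds when I1 equals at least one value returned by the subquery; row 2 has I1 = 4, which is not among the IDs [1, 2], so nothing should be upserted.

```python
# Hypothetical in-memory model of table T as it stood before the ANY statement
# (after the faulty first UPSERT set every i1 to 4). Not Phoenix code.
from typing import Dict, List, Tuple

Table = Dict[int, int]  # id -> i1


def expected_rows_any(table: Table, new_i1: int = 5) -> List[Tuple[int, int]]:
    """Rows that SELECT id, 5 FROM T WHERE id = 2 AND i1 = ANY (SELECT ID FROM T)
    should produce. '= ANY' holds when i1 equals at least one subquery value."""
    subquery_ids = list(table.keys())  # result of SELECT ID FROM T
    return [
        (row_id, new_i1)
        for row_id, i1 in table.items()
        if row_id == 2 and any(i1 == v for v in subquery_ids)
    ]


print(expected_rows_any({1: 4, 2: 4}))  # []: row 2 has i1 = 4, which is not in [1, 2]
# For contrast, a table where row 2 does satisfy the predicate:
print(expected_rows_any({1: 4, 2: 1}))  # [(2, 5)]
```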
      

      A similar query with IN works just fine:

      0: jdbc:phoenix:> upsert into T select id, 6 from T where id = 2 AND i1 IN (select ID from T);
      No rows affected (0.094 seconds)
      0: jdbc:phoenix:> select * from T;
      +-----+-----+
      | ID  | I1  |
      +-----+-----+
      | 1   | 5   |
      | 2   | 5   |
      +-----+-----+
      2 rows selected (0.014 seconds)
      

      The reason for this behavior is that for IN we convert the subselect to a semi-join and execute the upsert on the client side. For comparisons, we don't perform any transformation, so the query is considered flat and is executed on the server side. For reasons not yet clear, we also completely ignore the other condition in the WHERE clause, which may lead to serious data loss.
      James Taylor, Maryann Xue: any thoughts or suggestions on how to fix this would be much appreciated.
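The row-level effect of the two execution paths described in the report can be sketched as follows. This is a hypothetical Python model, not Phoenix's planner or executor: `semi_join_path` stands in for the IN rewrite (evaluate the subquery, keep outer rows with at least one match), while `where_ignored_path` reproduces the observed symptom for comparison/ANY subqueries, where the WHERE clause is dropped and every row is rewritten. All function names are illustrative.

```python
# Hypothetical model contrasting the two execution paths described in the report.
# Neither function is Phoenix code; both only illustrate the row-level effect.
from typing import Dict, List, Tuple

Table = Dict[int, int]  # id -> i1
Rows = List[Tuple[int, int]]


def semi_join_path(table: Table, new_i1: int) -> Rows:
    """IN path: evaluate SELECT ID FROM T, then keep each outer row whose
    i1 has at least one match (a semi-join) and whose id = 2."""
    inner_keys = set(table.keys())
    return [(rid, new_i1) for rid, i1 in table.items() if rid == 2 and i1 in inner_keys]


def where_ignored_path(table: Table, new_i1: int) -> Rows:
    """Observed behavior for comparison/ANY subqueries: the WHERE clause is
    dropped, so every row of T is selected and overwritten."""
    return [(rid, new_i1) for rid in table]


def upsert(table: Table, rows: Rows) -> Table:
    """UPSERT semantics: insert new keys, overwrite existing ones."""
    result = dict(table)
    result.update(rows)
    return result


before: Table = {1: 5, 2: 5}
print(upsert(before, semi_join_path(before, 6)))      # {1: 5, 2: 5}  (no rows affected)
print(upsert(before, where_ignored_path(before, 6)))  # {1: 6, 2: 6}  (every row overwritten)
```

The second call shows why the bug is critical: an UPSERT SELECT meant to touch no rows silently overwrites the whole table.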

            People

            • Assignee:
              Maryann Xue
              Reporter:
              Sergey Soldatov
            • Votes:
              0
              Watchers:
              5
