Phoenix / PHOENIX-4586

UPSERT SELECT doesn't take into account comparison operators for subqueries.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 4.14.0
    • Fix Version/s: 4.14.0
    • Component/s: None
    • Labels: None

    Description

      If an UPSERT SELECT has a WHERE condition that uses any comparison operator on a subquery (including ANY/SOME/etc.), the whole WHERE clause is simply ignored. Table:

      create table T (id integer primary key, i1 integer);
      upsert into T values (1,1);
      upsert into T values (2,2);
      

      The following query should not upsert anything, because the WHERE clause requires a non-existent ID and also requires I1 to be greater than the values we already have:

      0: jdbc:phoenix:> upsert into T select id, 4 from T where id = 3 AND i1 > (select i1 from T);
      2 rows affected (0.02 seconds)
      0: jdbc:phoenix:> select * from T;
      +-----+-----+
      | ID  | I1  |
      +-----+-----+
      | 1   | 4   |
      | 2   | 4   |
      +-----+-----+
      2 rows selected (0.014 seconds)
      

      Now with ANY. This should not upsert anything either, because the IDs are [1, 2] while every I1 is 4:

      0: jdbc:phoenix:> upsert into T select id, 5 from T where id = 2 AND i1 = ANY (select ID from T);
      2 rows affected (0.016 seconds)
      0: jdbc:phoenix:> select * from T;
      +-----+-----+
      | ID  | I1  |
      +-----+-----+
      | 1   | 5   |
      | 2   | 5   |
      +-----+-----+
      2 rows selected (0.013 seconds)
      

      A similar query with IN works just fine:

      0: jdbc:phoenix:> upsert into T select id, 6 from T where id = 2 AND i1 IN (select ID from T);
      No rows affected (0.094 seconds)
      0: jdbc:phoenix:> select * from T;
      +-----+-----+
      | ID  | I1  |
      +-----+-----+
      | 1   | 5   |
      | 2   | 5   |
      +-----+-----+
      2 rows selected (0.014 seconds)
      

      The reason for this behavior is that for IN we convert the subselect to a semi-join and execute the upsert on the client side. For comparisons, we don't perform any transformation, so the query is considered flat and is ultimately executed on the server side. Not sure why, but we also completely ignore the second condition in the WHERE clause, and that may lead to serious data loss.
      James R. Taylor, Wei Xue, any thoughts or suggestions on how to fix this would be really appreciated.
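
For comparison, here is a minimal sketch of the result the report expects: the SELECT side of each statement should produce no rows, so nothing should be upserted. It is not run against Phoenix. It uses Python's built-in sqlite3 module as a stand-in engine, which is an assumption of this sketch. The helper name rows_to_upsert is hypothetical. SQLite has no ANY operator, so the second query is spelled as an equivalent EXISTS. SQLite also evaluates a multi-row scalar subquery using only its first row, which may differ from Phoenix.

```python
import sqlite3


def rows_to_upsert(table_rows, select_sql):
    """Load table_rows into a fresh table T and return the rows select_sql yields."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table T (id integer primary key, i1 integer)")
        conn.executemany("insert into T values (?, ?)", table_rows)
        return conn.execute(select_sql).fetchall()
    finally:
        conn.close()


# Each call starts from the table state the report shows before that statement.

# Comparison operator: id = 3 matches no row, so the SELECT must be empty.
comparison = rows_to_upsert(
    [(1, 1), (2, 2)],
    "select id, 4 from T where id = 3 and i1 > (select i1 from T)",
)

# "i1 = ANY (select ID from T)" rewritten as EXISTS (assumed equivalent).
any_as_exists = rows_to_upsert(
    [(1, 4), (2, 4)],
    "select t1.id, 5 from T t1 where t1.id = 2 "
    "and exists (select 1 from T t2 where t2.id = t1.i1)",
)

# The IN form, which the report says already behaves correctly.
in_subquery = rows_to_upsert(
    [(1, 5), (2, 5)],
    "select id, 6 from T where id = 2 and i1 in (select ID from T)",
)

print(comparison, any_as_exists, in_subquery)
```

All three lists come back empty on the stand-in engine, which matches the "No rows affected" result the IN query already gives in Phoenix.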


          People

            Assignee: Wei Xue (maryannxue)
            Reporter: Sergey Soldatov (sergey.soldatov)
            Votes: 0
            Watchers: 5
