SPARK-46446

Correctness bug in correlated subquery with OFFSET


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: SQL

    Description

      Correlated subqueries under LIMIT with OFFSET have a correctness bug. It was introduced recently, when support for correlation under OFFSET was enabled but the OFFSET was not handled correctly. (So we went from unsupported, where the query throws an error, to wrong results.)

      The bug affects all types of correlated subqueries: scalar, lateral, IN, and EXISTS.

      It is easy to reproduce with a query like:

      create table x(x1 int, x2 int);
      insert into x values (1, 1), (2, 2);
      create table y(y1 int, y2 int);
      insert into y values (1, 1), (1, 2), (2, 4);
      
      
      select * from x where exists (select * from y where x1 = y1 limit 1 offset 2)

      Correct result: empty set, because no x1 value matches more than two rows in y, so OFFSET 2 skips every match. See PostgreSQL: https://www.db-fiddle.com/f/dtXNn7hwDnemiCTUhvwgYM/0

      Spark result: Array([2,2])
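
The expected empty set can be checked by modeling the subquery's semantics directly: for each outer row, filter y on the correlation predicate, then apply OFFSET followed by LIMIT. The following is a minimal Python sketch of that reference behavior, not Spark code; the helper name `exists_with_limit_offset` is hypothetical.

```python
# Rows from the repro tables x(x1, x2) and y(y1, y2).
x = [(1, 1), (2, 2)]
y = [(1, 1), (1, 2), (2, 4)]


def exists_with_limit_offset(outer_row, inner_rows, limit, offset):
    """Model `exists (select * from y where x1 = y1 limit L offset O)`
    for a single outer row. Hypothetical helper, for illustration only."""
    x1 = outer_row[0]
    matches = [row for row in inner_rows if row[0] == x1]  # where x1 = y1
    page = matches[offset:offset + limit]                   # skip `offset` rows, keep `limit`
    return len(page) > 0


# select * from x where exists (... limit 1 offset 2)
result = [row for row in x if exists_with_limit_offset(row, y, limit=1, offset=2)]
print(result)  # []
```

With `offset=1` the same model returns only `(1, 1)`, since x1 = 1 is the only key with at least two matching rows in y.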

       

      The PR that introduced the bug added a test for this case, but the golden file results for that test were incorrect and we didn't notice. (The bug was initially found by https://github.com/apache/spark/pull/44084)

      I'll work on both:

      • Adding support for OFFSET in DecorrelateInnerQuery (the subquery is rewritten into a filter on a row_number window function, similar to how LIMIT is handled).
      • Adding a feature flag to enable or disable support for OFFSET in subqueries.
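
The row_number idea in the first item can be sketched outside Spark: number the inner rows within each correlation key, keep the rows whose number lies in the range (offset, offset + limit], and then semi-join the outer rows on the key. This is an illustrative Python model under those assumptions, not the actual DecorrelateInnerQuery implementation; the function names are hypothetical.

```python
from collections import defaultdict

# Rows from the repro tables x(x1, x2) and y(y1, y2).
x = [(1, 1), (2, 2)]
y = [(1, 1), (1, 2), (2, 4)]


def decorrelated_inner(inner_rows, limit, offset):
    """Keep inner rows with offset < rn <= offset + limit, where rn models
    row_number() over (partition by y1). Hypothetical helper."""
    counters = defaultdict(int)
    kept = []
    for row in inner_rows:
        counters[row[0]] += 1      # next row number within this y1 partition
        rn = counters[row[0]]
        if offset < rn <= offset + limit:
            kept.append(row)
    return kept


def exists_via_row_number(outer_rows, inner_rows, limit, offset):
    """Semi-join the outer rows against the filtered, decorrelated inner rows."""
    keys = {row[0] for row in decorrelated_inner(inner_rows, limit, offset)}
    return [row for row in outer_rows if row[0] in keys]


print(exists_via_row_number(x, y, limit=1, offset=2))  # []
print(exists_via_row_number(x, y, limit=1, offset=1))  # [(1, 1)]
```

The repro query has no ORDER BY, so which rows are skipped within a partition is arbitrary. For EXISTS only the number of rows per key matters, which is why the sketch can number rows in input order.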

          People

            Assignee: Jack Chen (jchen5)
            Reporter: Jack Chen (jchen5)
            Votes: 0
            Watchers: 2
