Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-2398

IS NOT DISTINCT FROM predicate returns incorrect result when used as a join filter

VotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      count should return 0 and not NULL

      0: jdbc:drill:schema=dfs> select
      . . . . . . . . . . . . >         count(*)
      . . . . . . . . . . . . > from
      . . . . . . . . . . . . >         j1 INNER JOIN j2 ON
      . . . . . . . . . . . . >         ( j1.c_double = j2.c_double)
      . . . . . . . . . . . . > where
      . . . . . . . . . . . . >         j1.c_bigint IS NOT DISTINCT FROM j2.c_bigint
      . . . . . . . . . . . . > ;
      +------------+
      |   EXPR$0   |
      +------------+
      +------------+
      

      These are the values in the table

      0: jdbc:drill:schema=dfs> select j1.c_bigint, j2.c_bigint, count(*) from j1 INNER JOIN j2 ON (j1.c_double = j2.c_double) group by j1.c_bigint, j2.c_bigint;
      +------------+------------+------------+
      |  c_bigint  | c_bigint1  |   EXPR$1   |
      +------------+------------+------------+
      | 460194667  | -498749284 | 1          |
      | 464547172  | -498828740 | 1          |
      | 467451850  | -498966611 | 2          |
      | 471050029  | -499154096 | 3          |
      | 472873799  | -499233550 | 3          |
      | 475698977  | -499395929 | 2          |
      | 478986584  | -499564607 | 1          |
      | 488139464  | -499763274 | 3          |
      | 498214699  | -499871720 | 2          |
      +------------+------------+------------+
      9 rows selected (0.339 seconds)
      

      IS DISTINCT FROM predicate returns correct result

      select
              count(*)
      from
              j1 INNER JOIN j2 ON
              ( j1.c_double = j2.c_double)
      where
              j1.c_bigint IS DISTINCT FROM j2.c_bigint
      

      Explain plan for query that returns incorrect result:

      00-01      StreamAgg(group=[{}], EXPR$0=[COUNT()])
      00-02        Project($f0=[0])
      00-03          SelectionVectorRemover
      00-04            Filter(condition=[CAST(CASE(IS NULL($1), IS NULL($3), IS NULL($3), IS NULL($1), =($1, $3))):BOOLEAN NOT NULL])
      00-05              HashJoin(condition=[=($0, $2)], joinType=[inner])
      00-07                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/joins/j1]], selectionRoot=/joins/j1, numFiles=1, columns=[`c_double`, `c_bigint`]]])
      00-06                Project(c_double0=[$0], c_bigint0=[$1])
      00-08                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/joins/j2]], selectionRoot=/joins/j2, numFiles=1, columns=[`c_double`, `c_bigint`]]])
      

      Attachments

        1. j1.parquet
          742 kB
          Victoria Markman
        2. j2.parquet
          2 kB
          Victoria Markman

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            amansinha100 Aman Sinha
            vicky Victoria Markman
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment