Apache Drill
DRILL-2967

Incompatible types error reported in a "not in" query with compatible data types


Details

    Description

      Two tables stored as Parquet files (attached to this bug):

      0: jdbc:drill:schema=dfs> select * from t1;
      +------------+------------+------------+
      |     a1     |     b1     |     c1     |
      +------------+------------+------------+
      | 1          | aaaaa      | 2015-01-01 |
      | 2          | bbbbb      | 2015-01-02 |
      | 3          | ccccc      | 2015-01-03 |
      | 4          | null       | 2015-01-04 |
      | 5          | eeeee      | 2015-01-05 |
      | 6          | fffff      | 2015-01-06 |
      | 7          | ggggg      | 2015-01-07 |
      | null       | hhhhh      | 2015-01-08 |
      | 9          | iiiii      | null       |
      | 10         | jjjjj      | 2015-01-10 |
      +------------+------------+------------+
      10 rows selected (0.119 seconds)
      
      0: jdbc:drill:schema=dfs> select * from t2;
      +------------+------------+------------+
      |     a2     |     b2     |     c2     |
      +------------+------------+------------+
      | 0          | zzz        | 2014-12-31 |
      | 1          | aaaaa      | 2015-01-01 |
      | 2          | bbbbb      | 2015-01-02 |
      | 2          | bbbbb      | 2015-01-02 |
      | 2          | bbbbb      | 2015-01-02 |
      | 3          | ccccc      | 2015-01-03 |
      | 4          | ddddd      | 2015-01-04 |
      | 5          | eeeee      | 2015-01-05 |
      | 6          | fffff      | 2015-01-06 |
      | 7          | ggggg      | 2015-01-07 |
      | 7          | ggggg      | 2015-01-07 |
      | 8          | hhhhh      | 2015-01-08 |
      | 9          | iiiii      | 2015-01-09 |
      +------------+------------+------------+
      13 rows selected (0.116 seconds)
      

      Disable hash join and set slice_target = 1:

      alter session set `planner.enable_hashjoin` = false;
      alter session set `planner.slice_target` = 1;

      Correct result:

      0: jdbc:drill:schema=dfs> select * from t1 where b1 not in (select b2 from t2);
      +------------+------------+------------+
      |     a1     |     b1     |     c1     |
      +------------+------------+------------+
      | 10         | jjjjj      | 2015-01-10 |
      +------------+------------+------------+
      1 row selected (0.625 seconds)
      

      Swap the tables and the query fails with an error:

      0: jdbc:drill:schema=dfs> select * from t2 where b2 not in (select b1 from t1);
      +------------+------------+------------+
      |     a2     |     b2     |     c2     |
      +------------+------------+------------+
      Query failed: SYSTEM ERROR: Join only supports implicit casts between 1. Numeric data
       2. Varchar, Varbinary data Left type: INT, Right type: VARCHAR. Add explicit casts to avoid this error
      
      Fragment 1:0
      
      [1a83aa50-39aa-452c-91dd-970bf4a8f03d on atsqa4-133.qa.lab:31010]
      java.lang.RuntimeException: java.sql.SQLException: Failure while executing query.
              at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
              at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
              at sqlline.SqlLine.print(SqlLine.java:1809)
              at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
              at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
              at sqlline.SqlLine.dispatch(SqlLine.java:889)
              at sqlline.SqlLine.begin(SqlLine.java:763)
              at sqlline.SqlLine.start(SqlLine.java:498)
              at sqlline.SqlLine.main(SqlLine.java:460)
      

      Explain plan for the failing query:

      0: jdbc:drill:schema=dfs> explain plan for select * from t2 where b2 not in (select b1 from t1); 
      +------------+------------+ 
      | text | json | 
      +------------+------------+ 
      | 00-00 Screen 
      00-01 Project(*=[$0]) 
      00-02 UnionExchange 
      01-01 Project(T27¦¦*=[$0]) 
      01-02 SelectionVectorRemover 
      01-03 Filter(condition=[NOT(CASE(=($2, 0), false, IS NOT NULL($6), true, IS NULL($4), null, <($3, $2), null, false))]) 
      01-04 MergeJoin(condition=[=($4, $5)], joinType=[left]) 
      01-06 SelectionVectorRemover 
      01-08 Sort(sort0=[$4], dir0=[ASC]) 
      01-10 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$4]) 
      01-12 HashToRandomExchange(dist0=[[$4]]) 
      02-01 UnorderedMuxExchange 
      04-01 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$4], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($4))]) 
      04-02 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$1]) 
      04-03 NestedLoopJoin(condition=[true], joinType=[inner]) 
      04-05 Project(T27¦¦*=[$0], b2=[$1]) 
      04-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t2]], selectionRoot=/drill/testdata/aggregation/t2, numFiles=1, columns=[`*`]]]) 
      04-04 BroadcastExchange 
      06-01 StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)]) 
      06-02 UnionExchange 
      07-01 StreamAgg(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)]) 
      07-02 Project(b1=[$0], $f1=[true]) 
      07-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]]) 
      01-05 Project(b1=[$0], $f10=[$1]) 
      01-07 SelectionVectorRemover 
      01-09 Sort(sort0=[$0], dir0=[ASC]) 
      01-11 HashAgg(group=[{0}], agg#0=[MIN($1)]) 
      01-13 Project(b1=[$0], $f1=[$1]) 
      01-14 HashToRandomExchange(dist0=[[$0]]) 
      03-01 UnorderedMuxExchange 
      05-01 Project(b1=[$0], $f1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))]) 
      05-02 HashAgg(group=[{0}], agg#0=[MIN($1)]) 
      05-03 Project(b1=[$0], $f1=[true]) 
      05-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]]) 
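The Filter condition in the plan above is the planner's standard count-based NOT IN rewrite: count all subquery rows ($f0) and its non-null values ($f1), left-join the outer table to the distinct subquery values, then evaluate a CASE over the match flag. A minimal Python sketch of that rewrite, using the sample data from this report (illustrative only, not Drill's implementation):

```python
def not_in_rewrite(outer_vals, inner_vals):
    """Evaluate `v NOT IN (inner_vals)` for each outer value using the
    count-based rewrite visible in the plan's Filter CASE condition."""
    c = len(inner_vals)                               # $f0: COUNT()
    ck = sum(1 for v in inner_vals if v is not None)  # $f1: COUNT(b1)
    matches = {v for v in inner_vals if v is not None}
    kept = []
    for v in outer_vals:
        # CASE(c = 0 -> false, matched -> true, v IS NULL -> null,
        #      ck < c -> null, else false) yields the IN truth value
        if c == 0:
            in_truth = False
        elif v in matches:
            in_truth = True
        elif v is None:
            in_truth = None
        elif ck < c:            # subquery contained a NULL
            in_truth = None
        else:
            in_truth = False
        # WHERE NOT(in_truth) keeps a row only when in_truth is False
        if in_truth is False:
            kept.append(v)
    return kept

# Column data copied from the tables shown earlier in this report
t1_b1 = ['aaaaa', 'bbbbb', 'ccccc', None, 'eeeee',
         'fffff', 'ggggg', 'hhhhh', 'iiiii', 'jjjjj']
t2_b2 = ['zzz', 'aaaaa', 'bbbbb', 'bbbbb', 'bbbbb', 'ccccc', 'ddddd',
         'eeeee', 'fffff', 'ggggg', 'ggggg', 'hhhhh', 'iiiii']

print(not_in_rewrite(t2_b2, t1_b1))  # [] -> "No rows selected"
print(not_in_rewrite(t1_b1, t2_b2))  # ['jjjjj'] -> the one surviving row
```

The rewrite itself is type-correct (the join compares b2 to b1, both VARCHAR), which is why the INT vs. VARCHAR error from the merge-join plan is spurious.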
      

      Correct result and correct plan with hash join enabled in a distributed plan (planner.slice_target = 1):

      alter session set `planner.enable_hashjoin` = true;
      alter session set `planner.slice_target` = 1;

      0: jdbc:drill:schema=dfs> select * from t2 where b2 not in (select b1 from t1);
      +------------+------------+------------+
      |     a2     |     b2     |     c2     |
      +------------+------------+------------+
      +------------+------------+------------+
      No rows selected (0.458 seconds)
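"No rows selected" is indeed the correct answer here: t1.b1 contains a NULL (the row with a1 = 4), and under SQL's three-valued logic a NOT IN predicate can never be TRUE when the subquery produces a NULL. A minimal Python sketch of those semantics (illustrative only, not Drill code):

```python
def sql_not_in(value, subquery):
    """Three-valued truth of SQL `value NOT IN (subquery)`."""
    if value is None:
        return None              # NULL NOT IN (...) -> NULL
    if value in subquery:
        return False             # a match makes NOT IN false
    if None in subquery:
        return None              # no match, but NULL present -> NULL
    return True

# Column data copied from the tables shown earlier in this report
t1_b1 = ['aaaaa', 'bbbbb', 'ccccc', None, 'eeeee',
         'fffff', 'ggggg', 'hhhhh', 'iiiii', 'jjjjj']
t2_b2 = ['zzz', 'aaaaa', 'bbbbb', 'bbbbb', 'bbbbb', 'ccccc', 'ddddd',
         'eeeee', 'fffff', 'ggggg', 'ggggg', 'hhhhh', 'iiiii']

# WHERE keeps only rows where the predicate is TRUE
print([v for v in t2_b2 if sql_not_in(v, t1_b1) is True])  # []
print([v for v in t1_b1 if sql_not_in(v, t2_b2) is True])  # ['jjjjj']
```

Since t2.b2 has no NULLs, the first query (t1 filtered against t2) returns the one unmatched value, while the swapped query correctly returns nothing.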
      
      0: jdbc:drill:schema=dfs> explain plan for select * from t2 where b2 not in (select b1 from t1);
      +------------+------------+
      |    text    |    json    |
      +------------+------------+
      | 00-00    Screen
      00-01      Project(*=[$0])
      00-02        Project(T25¦¦*=[$0])
      00-03          SelectionVectorRemover
      00-04            Filter(condition=[NOT(CASE(=($2, 0), false, IS NOT NULL($6), true, IS NULL($4), null, <($3, $2), null, false))])
      00-05              HashJoin(condition=[=($4, $5)], joinType=[left])
      00-07                Project(T25¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$1])
      00-09                  NestedLoopJoin(condition=[true], joinType=[inner])
      00-11                    Project(T25¦¦*=[$0], b2=[$1])
      00-12                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t2]], selectionRoot=/drill/testdata/aggregation/t2, numFiles=1, columns=[`*`]]])
      00-10                    BroadcastExchange
      01-01                      StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])
      01-02                        UnionExchange
      03-01                          StreamAgg(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
      03-02                            Project(b1=[$0], $f1=[true])
      03-03                              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
      00-06                Project(b1=[$0], $f10=[$1])
      00-08                  BroadcastExchange
      02-01                    HashAgg(group=[{0}], agg#0=[MIN($1)])
      02-02                      Project(b1=[$0], $f1=[$1])
      02-03                        HashToRandomExchange(dist0=[[$0]])
      04-01                          UnorderedMuxExchange
      05-01                            Project(b1=[$0], $f1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
      05-02                              HashAgg(group=[{0}], agg#0=[MIN($1)])
      05-03                                Project(b1=[$0], $f1=[true])
      05-04                                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
      

      The same error occurs with columns of DATE, TIME, and TIMESTAMP types.

      Attachments

        1. j2_j6_tables.tar (344 kB, Victoria Markman)
        2. t2.parquet (0.5 kB, Victoria Markman)
        3. t1.parquet (0.5 kB, Victoria Markman)

      People

        Assignee: Abdel Hakim Deneche (adeneche)
        Reporter: Victoria Markman (vicky)
        Votes: 0
        Watchers: 5