Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-2967

Incompatible types error reported in a "not in" query with compatible data types

    XMLWordPrintableJSON

Details

    Description

      Two tables, parquet files (attached in the bug):

      0: jdbc:drill:schema=dfs> select * from t1;
      +------------+------------+------------+
      |     a1     |     b1     |     c1     |
      +------------+------------+------------+
      | 1          | aaaaa      | 2015-01-01 |
      | 2          | bbbbb      | 2015-01-02 |
      | 3          | ccccc      | 2015-01-03 |
      | 4          | null       | 2015-01-04 |
      | 5          | eeeee      | 2015-01-05 |
      | 6          | fffff      | 2015-01-06 |
      | 7          | ggggg      | 2015-01-07 |
      | null       | hhhhh      | 2015-01-08 |
      | 9          | iiiii      | null       |
      | 10         | jjjjj      | 2015-01-10 |
      +------------+------------+------------+
      10 rows selected (0.119 seconds)
      
      0: jdbc:drill:schema=dfs> select * from t2;
      +------------+------------+------------+
      |     a2     |     b2     |     c2     |
      +------------+------------+------------+
      | 0          | zzz        | 2014-12-31 |
      | 1          | aaaaa      | 2015-01-01 |
      | 2          | bbbbb      | 2015-01-02 |
      | 2          | bbbbb      | 2015-01-02 |
      | 2          | bbbbb      | 2015-01-02 |
      | 3          | ccccc      | 2015-01-03 |
      | 4          | ddddd      | 2015-01-04 |
      | 5          | eeeee      | 2015-01-05 |
      | 6          | fffff      | 2015-01-06 |
      | 7          | ggggg      | 2015-01-07 |
      | 7          | ggggg      | 2015-01-07 |
      | 8          | hhhhh      | 2015-01-08 |
      | 9          | iiiii      | 2015-01-09 |
      +------------+------------+------------+
      13 rows selected (0.116 seconds)
      

      Disable hash join and set slice_target = 1:

      alter session set `planner.enable_hashjoin` = false;
      alter session set `planner.slice_target` = 1;

      Correct result:

      0: jdbc:drill:schema=dfs> select * from t1 where b1 not in (select b2 from t2);
      +------------+------------+------------+
      |     a1     |     b1     |     c1     |
      +------------+------------+------------+
      | 10         | jjjjj      | 2015-01-10 |
      +------------+------------+------------+
      1 row selected (0.625 seconds)
      

      Swap tables and you get an error:

      0: jdbc:drill:schema=dfs> select * from t2 where b2 not in (select b1 from t1);
      +------------+------------+------------+
      |     a1     |     b1     |     c1     |
      +------------+------------+------------+
      Query failed: SYSTEM ERROR: Join only supports implicit casts between 1. Numeric data
       2. Varchar, Varbinary data Left type: INT, Right type: VARCHAR. Add explicit casts to avoid this error
      
      Fragment 1:0
      
      [1a83aa50-39aa-452c-91dd-970bf4a8f03d on atsqa4-133.qa.lab:31010]
      java.lang.RuntimeException: java.sql.SQLException: Failure while executing query.
              at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
              at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
              at sqlline.SqlLine.print(SqlLine.java:1809)
              at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
              at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
              at sqlline.SqlLine.dispatch(SqlLine.java:889)
              at sqlline.SqlLine.begin(SqlLine.java:763)
              at sqlline.SqlLine.start(SqlLine.java:498)
              at sqlline.SqlLine.main(SqlLine.java:460)
      

      Explain plan for the query with an error:

       
      0: jdbc:drill:schema=dfs> explain plan for select * from t2 where b2 not in (select b1 from t1); 
      +------------+------------+ 
      | text | json | 
      +------------+------------+ 
      | 00-00 Screen 
      00-01 Project(*=[$0]) 
      00-02 UnionExchange 
      01-01 Project(T27¦¦*=[$0]) 
      01-02 SelectionVectorRemover 
      01-03 Filter(condition=[NOT(CASE(=($2, 0), false, IS NOT NULL($6), true, IS NULL($4), null, <($3, $2), null, false))]) 
      01-04 MergeJoin(condition=[=($4, $5)], joinType=[left]) 
      01-06 SelectionVectorRemover 
      01-08 Sort(sort0=[$4], dir0=[ASC]) 
      01-10 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$4]) 
      01-12 HashToRandomExchange(dist0=[[$4]]) 
      02-01 UnorderedMuxExchange 
      04-01 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$4], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($4))]) 
      04-02 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$1]) 
      04-03 NestedLoopJoin(condition=[true], joinType=[inner]) 
      04-05 Project(T27¦¦*=[$0], b2=[$1]) 
      04-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t2]], selectionRoot=/drill/testdata/aggregation/t2, numFiles=1, columns=[`*`]]]) 
      04-04 BroadcastExchange 
      06-01 StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)]) 
      06-02 UnionExchange 
      07-01 StreamAgg(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)]) 
      07-02 Project(b1=[$0], $f1=[true]) 
      07-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]]) 
      01-05 Project(b1=[$0], $f10=[$1]) 
      01-07 SelectionVectorRemover 
      01-09 Sort(sort0=[$0], dir0=[ASC]) 
      01-11 HashAgg(group=[{0}], agg#0=[MIN($1)]) 
      01-13 Project(b1=[$0], $f1=[$1]) 
      01-14 HashToRandomExchange(dist0=[[$0]]) 
      03-01 UnorderedMuxExchange 
      05-01 Project(b1=[$0], $f1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))]) 
      05-02 HashAgg(group=[{0}], agg#0=[MIN($1)]) 
      05-03 Project(b1=[$0], $f1=[true]) 
      05-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]]) 
      

      Correct result and correct plan with hash join distributed plan ( planner.slice_target = 1)

      alter session set `planner.enable_hashjoin` = true;
      alter session set `planner.slice_target` = 1;

      0: jdbc:drill:schema=dfs> select * from t2 where b2 not in (select b1 from t1);
      +------------+------------+------------+
      |     a2     |     b2     |     c2     |
      +------------+------------+------------+
      +------------+------------+------------+
      No rows selected (0.458 seconds)
      
      0: jdbc:drill:schema=dfs> explain plan for select * from t2 where b2 not in (select b1 from t1);
      +------------+------------+
      |    text    |    json    |
      +------------+------------+
      | 00-00    Screen
      00-01      Project(*=[$0])
      00-02        Project(T25¦¦*=[$0])
      00-03          SelectionVectorRemover
      00-04            Filter(condition=[NOT(CASE(=($2, 0), false, IS NOT NULL($6), true, IS NULL($4), null, <($3, $2), null, false))])
      00-05              HashJoin(condition=[=($4, $5)], joinType=[left])
      00-07                Project(T25¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$1])
      00-09                  NestedLoopJoin(condition=[true], joinType=[inner])
      00-11                    Project(T25¦¦*=[$0], b2=[$1])
      00-12                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t2]], selectionRoot=/drill/testdata/aggregation/t2, numFiles=1, columns=[`*`]]])
      00-10                    BroadcastExchange
      01-01                      StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])
      01-02                        UnionExchange
      03-01                          StreamAgg(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
      03-02                            Project(b1=[$0], $f1=[true])
      03-03                              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
      00-06                Project(b1=[$0], $f10=[$1])
      00-08                  BroadcastExchange
      02-01                    HashAgg(group=[{0}], agg#0=[MIN($1)])
      02-02                      Project(b1=[$0], $f1=[$1])
      02-03                        HashToRandomExchange(dist0=[[$0]])
      04-01                          UnorderedMuxExchange
      05-01                            Project(b1=[$0], $f1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
      05-02                              HashAgg(group=[{0}], agg#0=[MIN($1)])
      05-03                                Project(b1=[$0], $f1=[true])
      05-04                                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
      

      Same error with the columns of date, time and timestamp types.

      Attachments

        1. t1.parquet
          0.5 kB
          Victoria Markman
        2. t2.parquet
          0.5 kB
          Victoria Markman
        3. j2_j6_tables.tar
          344 kB
          Victoria Markman

        Activity

          People

            adeneche Abdel Hakim Deneche
            vicky Victoria Markman
            Victoria Markman Victoria Markman
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: